Intro Map Reduce

  • Until now, all the data we have worked with has been relatively small and manageable (gigabytes).
  • But often we will face far larger data (terabytes or petabytes).
  • This lesson will teach us how to write a MapReduce algorithm that solves a simple problem and keeps working as the data scales.



  • Big data is not, as commonly said, just data that can't be processed in Excel.
  • Rather, it is data that's too large to fit on one disk.



  • Suppose Google wants to rank all the books in the world.
  • It would be impossible to fit all that data on one disk.
  • That's why we use MapReduce.
  • A simple Python script on a single machine would become very complex (and slow) when handling all that data.
  • Use MapReduce when the data is too large or complex for a single machine.

  • Many different kinds of problems can be handled with MapReduce:
  • Chevron uses the MapReduce programming model to handle data from ships that track seismic activity all over the world.
  • eBay uses it to handle large volumes of data about sellers, buyers, and transactions.
  • ipTrust uses it to aggregate data on large numbers of cyber attacks and build more sophisticated online security.
  • Apixio uses it to help answer questions about patients' health.

  • Suppose we have 5 terabytes of data. This is a good case for MapReduce.
  • MapReduce is, generally, a parallel programming model that consists of two parts: a mapper and a reducer.
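The two phases can be sketched in plain Python. This is a toy single-machine sketch of the programming model, not a distributed framework; `map_fn`, `shuffle`, and `reduce_fn` are illustrative names, not real API calls:

```python
from collections import defaultdict

def map_fn(document):
    """Map phase: emit a (key, value) pair for each word in a document."""
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    """Group all values by key (what the framework does between map and reduce)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_fn(key, values):
    """Reduce phase: combine all values for one key into a single result."""
    return key, sum(values)

def map_reduce(documents):
    """Run map over every document, group by key, then reduce each group."""
    mapped = (pair for doc in documents for pair in map_fn(doc))
    grouped = shuffle(mapped)
    return dict(reduce_fn(key, values) for key, values in grouped.items())
```

For example, `map_reduce(["the cat sat", "the dog"])` counts word occurrences across both documents. In a real framework, the map and reduce calls would run in parallel on many machines instead of in a single loop.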

  • Suppose our data is the full text of many books.
  • We want one key per book, with the book's document contents as the value.
  • This generates a (key, value) pair for every book.
  • The data is then sharded across our mappers, with each shard handled by one worker thread.
  • Each worker thread then produces its own (key, value) pairs.
  • It's important that all values belonging to the same key end up at the same worker; otherwise the reducer cannot see every value for that key, which gives an undesirable (wrong) result.
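The last point, routing every pair with the same key to the same worker, is typically guaranteed by hashing the key. A minimal sketch, assuming hash-based partitioning (the `partition` and `route` helpers are hypothetical names, not from any particular framework):

```python
def partition(key, num_workers):
    """Deterministically assign a key to one worker: same key -> same worker."""
    return hash(key) % num_workers

def route(pairs, num_workers):
    """Place each (key, value) pair into the bucket of its assigned worker."""
    buckets = [[] for _ in range(num_workers)]
    for key, value in pairs:
        buckets[partition(key, num_workers)].append((key, value))
    return buckets
```

Because `partition` depends only on the key, every value for a given book lands in the same bucket, so one reducer sees all of them together.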