Intro Map Reduce

  • Until now, all the data we have worked with has been relatively small and manageable (gigabytes).
  • But often we will face far larger data (terabytes or petabytes).
  • This lesson will teach us how to write a MapReduce algorithm that solves a simple problem and keeps working as the data scales.



  • Big data is not, as commonly said, just data that can't be processed in Excel.
  • Rather, it is data that's too large to fit on one disk.



  • Suppose Google wants to rank all the books in the world.
  • It would be impossible to fit all that data on one disk.
  • That's why we use MapReduce.
  • A simple Python script on a single machine would become very complex (and slow) when handling all that data.
  • Use MapReduce when the data is too large or complex for a single machine.

  • Many different kinds of problems can be handled with MapReduce:
  • Chevron uses the MapReduce programming model to handle data from ships that track seismic activity all over the world.
  • eBay uses it to handle large volumes of data about sellers, buyers, and transactions.
  • ipTrust uses it to aggregate data on large numbers of cyber attacks and build more sophisticated online security.
  • Apixio uses it to help answer questions about patients' health.

  • Suppose we have 5 terabytes of data. This is a good case for MapReduce.
  • MapReduce is, generally, a parallel programming model that consists of two parts: a mapper and a reducer.
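The two phases can be sketched in plain Python. This is a toy single-machine sketch of the programming model, not a distributed framework; `map_fn`, `shuffle`, and `reduce_fn` are illustrative names, not real API calls:

```python
from collections import defaultdict

def map_fn(document):
    """Map phase: emit a (key, value) pair for each word in a document."""
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    """Group all values by key (what the framework does between map and reduce)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_fn(key, values):
    """Reduce phase: combine all values for one key into a single result."""
    return key, sum(values)

def map_reduce(documents):
    """Run map over every document, group by key, then reduce each group."""
    mapped = (pair for doc in documents for pair in map_fn(doc))
    grouped = shuffle(mapped)
    return dict(reduce_fn(key, values) for key, values in grouped.items())
```

For example, `map_reduce(["the cat sat", "the dog"])` counts word occurrences across both documents. In a real framework, the map and reduce calls would run in parallel on many machines instead of in a single loop.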

  • Suppose our data is the full text of many books.
  • We want one key per book, with the book's document contents as the value.
  • This generates a (key, value) pair for every book.
  • The data is then sharded across our mappers, with each shard handled by one worker thread.
  • Each worker thread then produces its own (key, value) pairs.
  • It's important that all values belonging to the same key end up at the same worker; otherwise the reducer cannot see every value for that key, which gives an undesirable (wrong) result.
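The last point, routing every pair with the same key to the same worker, is typically guaranteed by hashing the key. A minimal sketch, assuming hash-based partitioning (the `partition` and `route` helpers are hypothetical names, not from any particular framework):

```python
def partition(key, num_workers):
    """Deterministically assign a key to one worker: same key -> same worker."""
    return hash(key) % num_workers

def route(pairs, num_workers):
    """Place each (key, value) pair into the bucket of its assigned worker."""
    buckets = [[] for _ in range(num_workers)]
    for key, value in pairs:
        buckets[partition(key, num_workers)].append((key, value))
    return buckets
```

Because `partition` depends only on the key, every value for a given book lands in the same bucket, so one reducer sees all of them together.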