Intro to MapReduce
-
Until now, everything we have learned deals with relatively small, manageable data (on the order of gigabytes).
-
But often we will face much larger data sets (terabytes or even petabytes).
-
This lesson will teach us how to write a MapReduce algorithm that handles a simple problem, even as the data scales.
-
Colloquially, big data is not just data that can't be processed in Excel,
-
but rather data that is too large to fit on a single disk.
-
Suppose Google wants to rank all the books in the world.
-
It would be impossible to fit all that data on one disk.
-
That's why we use MapReduce.
-
Handling all that data with a simple Python script on one machine would become very complex.
-
Use MapReduce when the data is too large for a single machine.
-
All of these cases can be handled by MapReduce.
-
Chevron uses the MapReduce programming model to process seismic data gathered by ships all over the world.
-
eBay uses it to handle large volumes of data about sellers, buyers, and transactions.
-
IPtrust uses it to aggregate data on large numbers of cyber attacks, in order to build more sophisticated online security.
-
Apixio uses it to help answer questions about patients' health.
-
Suppose we have 5 terabytes of data. That is a good scale at which to use MapReduce.
-
MapReduce is, broadly, a parallel programming model that consists of two parts: a map phase and a reduce phase.
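As a minimal sketch of those two phases in plain Python (not a real MapReduce framework), here is an illustrative word-count task; the function names and the example documents are assumptions for illustration:

```python
from collections import defaultdict

# Map phase: turn each input record into (key, value) pairs.
# For word counting, each word becomes a key with value 1.
def mapper(document):
    for word in document.split():
        yield (word.lower(), 1)

# Reduce phase: combine all values that share the same key.
def reducer(key, values):
    return (key, sum(values))

documents = ["the cat sat", "the dog sat"]

# Group intermediate pairs by key (the "shuffle" step).
groups = defaultdict(list)
for doc in documents:
    for key, value in mapper(doc):
        groups[key].append(value)

results = dict(reducer(k, v) for k, v in groups.items())
print(results)  # {'the': 2, 'cat': 1, 'sat': 2, 'dog': 1}
```

In a real framework the mappers and reducers run in parallel on many machines; this sketch just shows the shape of the two phases.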
-
Suppose our data is the collection of books themselves.
-
We want to make one key per book, with the book's document (its contents) as the value.
-
This generates a (key, value) pair for every book.
-
Then we feed the data to the mappers: it is sharded (split) across all of our workers, whether threads or machines.
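Sharding can be sketched as simply splitting the input records among the mappers; the round-robin scheme and book names below are assumptions for illustration:

```python
# Split the input into shards, one per mapper, round-robin style.
def shard(records, num_mappers):
    shards = [[] for _ in range(num_mappers)]
    for i, record in enumerate(records):
        shards[i % num_mappers].append(record)
    return shards

books = ["book_a", "book_b", "book_c", "book_d", "book_e"]
print(shard(books, 2))
# [['book_a', 'book_c', 'book_e'], ['book_b', 'book_d']]
```

Each shard would then be handed to a different worker, so the map phase runs in parallel.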
-
Each worker then produces its own (key, value) pairs.
-
It's important that all values belonging to the same key end up on the same worker for the reduce step; otherwise the results will be wrong.
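One common way to guarantee this is hash partitioning: hash the key and take it modulo the number of reducers, so every pair with the same key lands in the same bucket. This is a sketch, not any particular framework's partitioner; `crc32` is used only as a simple deterministic hash:

```python
import zlib

# Route a key to a reducer; the same key always maps to the same reducer.
def partition(key, num_reducers):
    return zlib.crc32(key.encode()) % num_reducers

pairs = [("book_a", 10), ("book_b", 7), ("book_a", 3)]
buckets = {}
for key, value in pairs:
    buckets.setdefault(partition(key, 4), []).append((key, value))

# Both ("book_a", ...) pairs are guaranteed to land in the same bucket,
# so one reducer sees every value for "book_a".
```

If values for one key were scattered across workers, each reducer would only see a partial sum, which is exactly the undesirable outcome the note warns about.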