-
Often the data is so huge that we have to use more than one machine (e.g., a Hadoop cluster).
-
This video explains how map-reduce can handle datasets much larger than what stochastic GD can fit on a single machine.
-
Note that just because it is explained less in the videos doesn't mean it is any less important than stochastic GD.
-
This method is used by one of the most legendary engineers in Silicon Valley, who works at Google.
-
Divide the m training examples across the n machines. If m = 400 and n = 4, each machine is assigned one quarter of the data (100 examples).
-
Each machine stores its partial sum in a temp variable, and the final update includes the sum of all the temps, as in the equations below.
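Concretely, for batch gradient descent with m = 400 and n = 4, machine k computes a partial sum over its own 100 examples, and a central server combines the temps into the usual update:

$$\text{temp}_j^{(k)} = \sum_{i=100(k-1)+1}^{100k} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}, \qquad k = 1, \dots, 4$$

$$\theta_j := \theta_j - \alpha \, \frac{1}{400} \left( \text{temp}_j^{(1)} + \text{temp}_j^{(2)} + \text{temp}_j^{(3)} + \text{temp}_j^{(4)} \right)$$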
-
This way we can get up to a 4x speedup over batch gradient descent on a single machine.
-
This is how we divide the training examples based on the number of machines.
-
It may not reach the full n-times speedup (network latency and combining the results add overhead), but it is still several times faster than a single machine.
-
Keep in mind to ask whether our learning algorithm can be expressed as a sum over the training set; if so, we can use map-reduce.
-
In logistic regression (even with advanced optimizers like L-BFGS), the cost function and its gradient are exactly such sums over the training set, so each machine computes a partial sum and the central server waits for all the temps before combining them. That waiting adds overhead; keep it in mind, but it is still possible.
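A minimal Python sketch of this idea for logistic regression (the function names and the even `np.array_split` sharding are my own assumptions; on a real cluster each shard would live on a different machine):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def partial_grad(theta, X_shard, y_shard):
    """'Map' step: one machine's temp, the unscaled gradient sum over its shard."""
    h = sigmoid(X_shard @ theta)
    return X_shard.T @ (h - y_shard)   # sum of (h(x) - y) * x over the shard

def mapreduce_gradient(theta, X, y, n_machines=4):
    """'Reduce' step: sum the temps from every shard and scale by 1/m."""
    m = len(y)
    shards = zip(np.array_split(X, n_machines), np.array_split(y, n_machines))
    temps = [partial_grad(theta, Xs, ys) for Xs, ys in shards]  # in parallel on a cluster
    return sum(temps) / m

# Toy usage: m = 400 examples split across 4 "machines" of 100 examples each
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
y = (rng.random(400) < 0.5).astype(float)
theta = np.zeros(3)
alpha = 0.1
for _ in range(100):
    theta -= alpha * mapreduce_gradient(theta, X, y)
```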
-
Of course, nowadays we also have single machines with multiple cores, and the same concept from earlier applies: split the training set across the cores.
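The same split-and-sum works across cores with processes instead of networked machines; a hedged sketch using Python's standard multiprocessing module (a linear-regression gradient here for brevity):

```python
from multiprocessing import Pool
import numpy as np

def partial_sum(args):
    # One core's temp: the unscaled gradient sum over its shard.
    theta, X_shard, y_shard = args
    return X_shard.T @ (X_shard @ theta - y_shard)

def multicore_gradient(theta, X, y, n_cores=4):
    shards = zip(np.array_split(X, n_cores), np.array_split(y, n_cores))
    with Pool(n_cores) as pool:  # call under `if __name__ == "__main__":` on Windows/macOS
        temps = pool.map(partial_sum, [(theta, Xs, ys) for Xs, ys in shards])
    return sum(temps) / len(y)  # combine the temps, scale by 1/m
```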
-
Note that many (though not all) numerical computing libraries already implement this kind of parallelism for us; in that case we only need to worry about a good vectorized implementation.
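For example, a fully vectorized gradient (reusing `X`, `y`, and `theta` from the sketch above) lets the linear-algebra library underneath NumPy spread the matrix products across cores without any explicit map-reduce code on our part:

```python
grad = X.T @ (X @ theta - y) / len(y)  # one vectorized expression; the BLAS backend can parallelize it
```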
-
But if the library doesn't provide this, we can parallelize it ourselves using the algorithm above.
-
On one machine we don't have to worry about network latency, since everything runs locally, unlike the multi-machine setting.
-
This is how map-reduce can be used to handle much larger datasets.
-
Many open-source map-reduce frameworks, such as Hadoop, can also be used to manage the cluster.