-
Often the data is so huge that we have to use more than one machine (e.g., a Hadoop cluster).
-
This video explains how map-reduce can handle datasets much larger than what stochastic GD can fit on a single machine.
-
Note that just because it is explained less in the videos doesn't mean it is any less important than stochastic GD.
-
This method is used by one of the most legendary engineers in Silicon Valley, who works at Google.
-
Divide the m training examples across the n machines. If m = 400 and n = 4, each machine is assigned one quarter of the data (100 examples).
-
Each machine stores its partial sum in a temp variable, and the final update includes the sum of all the temps, as in the equations below.
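Concretely, for batch gradient descent with m = 400 and n = 4, machine k computes a partial sum over its own 100 examples, and a central server combines the temps into the usual update:

$$\text{temp}_j^{(k)} = \sum_{i=100(k-1)+1}^{100k} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}, \qquad k = 1, \dots, 4$$

$$\theta_j := \theta_j - \alpha \, \frac{1}{400} \left( \text{temp}_j^{(1)} + \text{temp}_j^{(2)} + \text{temp}_j^{(3)} + \text{temp}_j^{(4)} \right)$$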
-
This way we can get up to a 4x speedup over batch gradient descent on a single machine.
-
This is how we divide the training examples based on the number of machines.
-
It may not reach the full n-times speedup (network latency and combining the results add overhead), but it is still several times faster than a single machine.
-
Keep in mind to ask whether our learning algorithm can be expressed as a sum over the training set; if so, we can use map-reduce.
-
In logistic regression (even with advanced optimizers like L-BFGS), the cost function and its gradient are exactly such sums over the training set, so each machine computes a partial sum and the central server waits for all the temps before combining them. That waiting adds overhead; keep it in mind, but it is still possible.
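A minimal Python sketch of this idea for logistic regression (the function names and the even `np.array_split` sharding are my own assumptions; on a real cluster each shard would live on a different machine):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def partial_grad(theta, X_shard, y_shard):
    """'Map' step: one machine's temp, the unscaled gradient sum over its shard."""
    h = sigmoid(X_shard @ theta)
    return X_shard.T @ (h - y_shard)   # sum of (h(x) - y) * x over the shard

def mapreduce_gradient(theta, X, y, n_machines=4):
    """'Reduce' step: sum the temps from every shard and scale by 1/m."""
    m = len(y)
    shards = zip(np.array_split(X, n_machines), np.array_split(y, n_machines))
    temps = [partial_grad(theta, Xs, ys) for Xs, ys in shards]  # in parallel on a cluster
    return sum(temps) / m

# Toy usage: m = 400 examples split across 4 "machines" of 100 examples each
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
y = (rng.random(400) < 0.5).astype(float)
theta = np.zeros(3)
alpha = 0.1
for _ in range(100):
    theta -= alpha * mapreduce_gradient(theta, X, y)
```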
-
Of course, nowadays we also have single machines with multiple cores, and the same concept from earlier applies: split the training set across the cores.
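The same split-and-sum works across cores with processes instead of networked machines; a hedged sketch using Python's standard multiprocessing module (a linear-regression gradient here for brevity):

```python
from multiprocessing import Pool
import numpy as np

def partial_sum(args):
    # One core's temp: the unscaled gradient sum over its shard.
    theta, X_shard, y_shard = args
    return X_shard.T @ (X_shard @ theta - y_shard)

def multicore_gradient(theta, X, y, n_cores=4):
    shards = zip(np.array_split(X, n_cores), np.array_split(y, n_cores))
    with Pool(n_cores) as pool:  # call under `if __name__ == "__main__":` on Windows/macOS
        temps = pool.map(partial_sum, [(theta, Xs, ys) for Xs, ys in shards])
    return sum(temps) / len(y)  # combine the temps, scale by 1/m
```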
-
Note that many (though not all) numerical computing libraries already implement this kind of parallelism for us; in that case we only need to worry about a good vectorized implementation.
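For example, a fully vectorized gradient (reusing `X`, `y`, and `theta` from the sketch above) lets the linear-algebra library underneath NumPy spread the matrix products across cores without any explicit map-reduce code on our part:

```python
grad = X.T @ (X @ theta - y) / len(y)  # one vectorized expression; the BLAS backend can parallelize it
```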
-
But if the library doesn't provide this, we can parallelize it ourselves using the algorithm above.
-
On one machine we don't have to worry about network latency, since everything runs locally, unlike the multi-machine setting.
-
This is how map-reduce can be used to handle much larger datasets.
-
Many open-source map-reduce frameworks, such as Hadoop, can also be used to manage the cluster.