- Often the dataset is so large that it cannot fit on a single machine, so we use more than one machine (e.g., a Hadoop cluster)
- This video explains how map-reduce can scale to much larger datasets than stochastic GD can handle on a single machine
- Note that just because it gets less coverage in the videos doesn't mean it is any less important than stochastic GD

- This method was popularized by some of the most legendary engineers in Silicon Valley, working at Google
- Divide the m training examples across n machines. If m = 400 and n = 4, each machine is assigned one quarter (100 examples)
- Each machine stores its partial sum in a temp variable, and the final update combines the sum of all the temps
- This way we run roughly 4 times faster than batch gradient descent on a single machine
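The split-compute-combine steps above can be sketched in numpy. This is a minimal single-process simulation, not a real cluster: the "machines" are just chunks of the arrays, and the function name `partial_gradient`, the toy data, and the learning rate are all my own illustrative choices.

```python
import numpy as np

def partial_gradient(X_chunk, y_chunk, theta):
    """'Map' step: one machine computes the partial sum of gradient
    terms for its own slice of the training set (its temp value)."""
    errors = X_chunk @ theta - y_chunk
    return X_chunk.T @ errors  # sum over this chunk's examples

# Toy data: m = 400 examples split across n = 4 simulated "machines"
rng = np.random.default_rng(0)
m, n_machines = 400, 4
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, 1))])
y = X @ np.array([2.0, 3.0]) + rng.normal(scale=0.1, size=m)

theta = np.zeros(2)
alpha = 0.1
for _ in range(100):
    # Each machine gets m / n = 100 examples and returns a temp value
    temps = [partial_gradient(Xc, yc, theta)
             for Xc, yc in zip(np.array_split(X, n_machines),
                               np.array_split(y, n_machines))]
    # 'Reduce' step: the master sums all the temps, then takes one
    # ordinary batch gradient descent step
    theta -= (alpha / m) * np.sum(temps, axis=0)

print(theta)  # ≈ [2.0, 3.0], the coefficients used to generate y
```

Because the gradient is a plain sum over examples, splitting that sum into four temps and adding them back gives exactly the same update as single-machine batch gradient descent.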

- This is how we divide the training examples based on the number of machines
- Because of the overhead of combining results, it may not reach the full n-times speedup, but it is still several times faster than a single machine

- Keep in mind that we can only use map-reduce if our learning algorithm can be expressed as a sum over the training set
- In logistic regression (including advanced optimizers like L-BFGS), the cost and gradient are exactly such summations; with map-reduce the master simply waits for every machine's temp before combining. Keep this requirement in mind, but it is usually satisfiable
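To see why the "expressible as a sum" condition matters, here is a small check, under my own toy setup, that the logistic regression gradient computed as a sum of per-machine temps equals the full-batch gradient. The helper names (`sigmoid`, `logistic_gradient_sum`) are illustrative, not from any library.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_gradient_sum(X, y, theta):
    """The logistic regression gradient is a sum over examples,
    so each machine can compute the sum for just its own slice."""
    return X.T @ (sigmoid(X @ theta) - y)

rng = np.random.default_rng(1)
m, n_machines = 400, 4
X = rng.normal(size=(m, 3))
y = (rng.random(m) < 0.5).astype(float)
theta = rng.normal(size=3)

# Map: each "machine" returns a temp for its 100 examples
temps = [logistic_gradient_sum(Xc, yc, theta)
         for Xc, yc in zip(np.array_split(X, n_machines),
                           np.array_split(y, n_machines))]

# Reduce: summing the temps reproduces the full-batch gradient
full = logistic_gradient_sum(X, y, theta)
assert np.allclose(np.sum(temps, axis=0), full)
```

Any quantity with this sum structure (cost, gradient, counts) parallelizes the same way; a quantity that is not a sum over examples would need to be reformulated first.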

- Of course, nowadays a single machine can also have multiple cores, and the same concept applies within one machine
- Note that many (though not all) numerical computing libraries already implement map-reduce-style parallelism internally; in that case we only need to worry about a good vectorized implementation
- But if the library doesn't, we can apply the algorithm above ourselves
- On one machine we also don't have to worry about network latency, since everything happens locally, unlike with multiple machines
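On a single multi-core machine, the "map" step can run on local threads and the "reduce" step is a plain in-memory sum, with no network involved. A minimal sketch, assuming a sum-of-squares computation as the per-core workload (the function name and data are my own):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def partial_sum_sq(chunk):
    # Each core computes the sum of squared values for its slice;
    # numpy releases the GIL during the heavy work, so threads help
    return float(np.sum(chunk ** 2))

data = np.arange(400, dtype=float)
chunks = np.array_split(data, 4)

# Map step on 4 local threads; the "network" is just shared memory
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(partial_sum_sq, chunks))

# Reduce step: combine the per-core temps
total = sum(partials)
assert total == float(np.sum(data ** 2))
```

The structure is identical to the multi-machine version; only the transport (shared memory vs. the network) changes, which is why the single-machine case avoids latency concerns.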

- This is how map-reduce can be used to handle much larger datasets
- Many open-source map-reduce implementations, like Hadoop, can also be used to manage such clusters