Map-reduce and data-parallelism

  • Often the dataset is so huge that we have to use more than one machine (e.g., a Hadoop cluster).
  • This video explains how map-reduce lets us scale to much larger datasets than even stochastic gradient descent can handle on a single machine.
  • It gets less time in the videos, but that doesn't mean it is any less important than stochastic GD.


  • This method has been used by some of the most legendary engineers in Silicon Valley, including at Google.
  • Divide the training examples across n machines. If m = 400 and n = 4, each machine is assigned one quarter (100 examples).
  • Each machine computes a partial sum over its share and stores it in a temp variable; a central server then combines them, and the final update uses the sum of all the temps (see the sketch below).
  • This way we can be up to 4 times faster than batch gradient descent on a single machine.
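
As a rough sketch of the split described above, assuming the batch gradient descent update for linear regression and the m = 400, n = 4 example (so each machine handles 100 examples):

```latex
% Machine k (k = 1, ..., 4) computes a partial sum (a "temp") over its 100 examples:
temp_j^{(k)} = \sum_{i \in \text{shard}_k} \bigl( h_\theta(x^{(i)}) - y^{(i)} \bigr)\, x_j^{(i)}

% A central server sums the temps and applies the usual batch update:
\theta_j := \theta_j - \alpha \, \frac{1}{400} \bigl( temp_j^{(1)} + temp_j^{(2)} + temp_j^{(3)} + temp_j^{(4)} \bigr)
```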

  • This is how we divide the training examples based on the number of machines; a code sketch follows below.
  • It may not reach the full N× speedup (there is network and combining overhead), but it is still many times faster than a single machine.
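
Below is a minimal single-process sketch of the idea, assuming linear regression with a squared-error cost. The shard splitting, function names, and data are illustrative only; on a real cluster each partial-gradient call would run on its own machine.

```python
import numpy as np

def partial_gradient(theta, X_shard, y_shard):
    """Map step: one machine computes the gradient sum over its shard only."""
    errors = X_shard @ theta - y_shard          # h_theta(x) - y for each example
    return X_shard.T @ errors                   # sum_i (h - y) * x_j, the shard's temps

def mapreduce_gradient_step(theta, X, y, n_machines=4, alpha=0.01):
    """Split the data, 'map' a partial sum on each shard, then 'reduce' by summing."""
    m = len(y)
    X_shards = np.array_split(X, n_machines)
    y_shards = np.array_split(y, n_machines)
    # In a real cluster, each call below would run on a different machine.
    temps = [partial_gradient(theta, Xs, ys) for Xs, ys in zip(X_shards, y_shards)]
    total = np.sum(temps, axis=0)               # reduce: combine the temps
    return theta - alpha * (1.0 / m) * total    # same update as single-machine batch GD

# Tiny usage example with made-up data (m = 400, 2 features)
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
y = X @ np.array([2.0, -1.0]) + rng.normal(scale=0.1, size=400)
theta = np.zeros(2)
for _ in range(100):
    theta = mapreduce_gradient_step(theta, X, y)
print(theta)  # should approach [2, -1]
```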

  • The key question for using map-reduce is whether our learning algorithm can be expressed as a sum over the training set.
  • For logistic regression with an advanced optimizer (such as BFGS or L-BFGS), the cost and gradient are both sums over the training set, so each machine computes its partial sum (temp) and the server waits for all the temps before combining them. Keep this requirement in mind, but it is still possible (see the sketch below).
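
A minimal sketch of this, assuming the logistic regression cost and gradient are computed as sums of per-shard temps and then handed to an off-the-shelf L-BFGS optimizer; the simulated shards, names, and data here are illustrative, not from the videos.

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def shard_cost_grad(theta, X_shard, y_shard):
    """Map step: cost and gradient contributions from one machine's shard."""
    h = np.clip(sigmoid(X_shard @ theta), 1e-12, 1 - 1e-12)  # clip to avoid log(0)
    cost = -np.sum(y_shard * np.log(h) + (1 - y_shard) * np.log(1 - h))
    grad = X_shard.T @ (h - y_shard)
    return cost, grad

def total_cost_grad(theta, X_shards, y_shards, m):
    """Reduce step: wait for every shard's temp, then sum and scale by 1/m."""
    parts = [shard_cost_grad(theta, Xs, ys) for Xs, ys in zip(X_shards, y_shards)]
    cost = sum(c for c, _ in parts) / m
    grad = sum(g for _, g in parts) / m
    return cost, grad

# Usage with made-up data split over 4 simulated machines
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 3))
y = (X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=400) > 0).astype(float)
X_shards, y_shards = np.array_split(X, 4), np.array_split(y, 4)
res = minimize(total_cost_grad, np.zeros(3), args=(X_shards, y_shards, 400),
               jac=True, method="L-BFGS-B")
print(res.x)
```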



  • Of course, nowadays we also have single machines with multiple cores, and the same idea applies there: split the training set across the cores.
  • Keep in mind that many (though not all) numerical computing libraries already implement this kind of parallelism across cores; with such a library we only need to worry about having a good vectorized implementation (see the sketch after this list).
  • If the library doesn't do this, we can apply the map-reduce algorithm above ourselves across the cores.
  • On a single machine we also don't have to worry about network latency, since everything runs locally, unlike the multi-machine setup.
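
A minimal sketch of the "just vectorize" approach, assuming a linear regression gradient; whether the matrix products actually run on multiple cores depends on the BLAS library NumPy is linked against, so this is an assumption about the environment, not something the videos specify.

```python
import numpy as np

def vectorized_gradient(theta, X, y):
    """Whole-dataset gradient in two matrix operations; no explicit loops or shards.
    If NumPy is linked against a multi-threaded BLAS (e.g., OpenBLAS or MKL),
    these products are parallelized across cores for us."""
    return X.T @ (X @ theta - y) / len(y)

# Usage: one batch gradient descent step on made-up data
rng = np.random.default_rng(2)
X = rng.normal(size=(400, 2))
y = X @ np.array([2.0, -1.0])
theta = np.zeros(2)
theta -= 0.1 * vectorized_gradient(theta, X, y)
```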

  • This is how map-reduce can be used to handle much larger datasets.
  • Many open-source map-reduce libraries, like Hadoop, can also be used to manage the cluster.