Counting Words

  • Counting words serially: scan the text one word at a time and keep a running count for each word

  • This is easy enough for a single book, but a whole library of books cannot fit on one disk
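
As a point of comparison, serial counting on one machine might look like the sketch below. The function name `count_words_serially` and the lowercase-letters-only tokenization are assumptions made for illustration; they are not taken from the course code.

```python
import re
from collections import Counter

def count_words_serially(text):
    """Count every word in `text` in one pass on a single machine."""
    counts = Counter()
    # Assumption: a "word" is a run of letters, compared case-insensitively.
    for word in re.findall(r"[a-z]+", text.lower()):
        counts[word] += 1
    return counts

counts = count_words_serially("Hello my name is Dave, Dave is my name")
print(counts["dave"], counts["hello"])
```

This works only while the whole text and the dictionary of counts fit on one machine, which is the limitation the notes point out.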

  • Every word in the documents is mapped to a reducer, chosen by its key.
  • At first there are many pairs for the same key (word), each with the value 1.
  • The reducers then sum the counts for each key (word), so each word ends up with a single value: its total count
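
The idea in the three points above can be shown in memory, without Hadoop. This is only a sketch: the `pairs` list is made-up sample data and `sum_by_key` is a hypothetical helper name.

```python
from collections import defaultdict

# Made-up mapper output: one (word, 1) pair per occurrence of a word.
pairs = [("my", 1), ("name", 1), ("is", 1), ("my", 1)]

def sum_by_key(pairs):
    """Collapse repeated (key, 1) pairs into one total per key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

print(sum_by_key(pairs))
```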

  • If we run the code on the sentence 'Hello my name is Dave, Dave is my name', it produces all the (key, value) tuples above.
  • (Recall string substitution: placeholder 0 is replaced with cleaned_data and placeholder 1 with the value 1.)
  • The code above is the 'mapper' function
  • The pairs are then shuffled to the reducers based on their keys. If we have two reducers, the keys are split in half between them
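
A mapper along the lines described here could be sketched as follows. Only the name `cleaned_data` and the 0/1 substitution come from the notes. The cleaning rule (lowercase, strip punctuation) and the `partition` helper with its CRC32 hash are assumptions chosen for illustration; the notes do not say how the keys are actually divided between reducers.

```python
import re
import zlib

def mapper(line):
    """Return one 'word<TAB>1' string for each word in `line`."""
    out = []
    for word in line.split():
        # Assumption: cleaning means lowercasing and dropping punctuation.
        cleaned_data = re.sub(r"[^a-z0-9]", "", word.lower())
        if cleaned_data:
            # {0} is filled with cleaned_data, {1} with the count 1.
            out.append("{0}\t{1}".format(cleaned_data, 1))
    return out

def partition(key, num_reducers):
    """Hypothetical shuffle rule: a stable hash sends equal keys to the same reducer."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

for pair in mapper("Hello my name is Dave, Dave is my name"):
    key = pair.split("\t")[0]
    print(repr(pair), "-> reducer", partition(key, 2))
```

Whatever rule is used, the property that matters is that every occurrence of a given word reaches the same reducer.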

  • The reducer reads one line at a time, for example 'my\t1'
  • It splits the line on '\t', giving a two-element list: ['my', '1']
  • It then checks that the list really has length 2; otherwise break
  • If the previous key differs from the current key, start a new group: assign the new key and set word_count = 0
  • Then add the count (here 1). After that, whenever the same key arrives again, just increment word_count by its count
  • Finally, print each key with its count, provided the key is not None.
  • Note that this relies on the shuffle having sorted the keys, so all lines with the same key arrive one after another.
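
The reducer steps listed above can be sketched as below. This is not the course's code: it returns a list instead of printing so it is easy to check, and it skips malformed lines with `continue` rather than stopping, which is a choice made for this sketch. The names `old_key` and `reducer` are assumptions; `word_count` follows the notes.

```python
def reducer(lines):
    """Sum counts over key-sorted 'word<TAB>count' lines; return (word, total) pairs."""
    results = []
    old_key = None
    word_count = 0
    for line in lines:
        data = line.strip().split("\t")
        if len(data) != 2:
            continue  # sketch choice: skip a malformed line instead of stopping
        key, count = data
        if old_key is not None and old_key != key:
            # The key changed, so the previous word's total is complete.
            results.append((old_key, word_count))
            word_count = 0
        old_key = key
        word_count += int(count)
    if old_key is not None:
        # Emit the total for the last key seen.
        results.append((old_key, word_count))
    return results

# Sorting stands in for the shuffle, which delivers equal keys together.
sorted_lines = sorted(["my\t1", "name\t1", "dave\t1", "my\t1", "dave\t1"])
print(reducer(sorted_lines))
```

If the input were not sorted, the same word could appear in several separate runs and would be emitted more than once with partial totals.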