Counting Words

It may be easy to count the words in just one book on a single machine, but fitting all the books on one disk is impossible.

All the words in the documents are mapped to reducers based on their key.
At first, there are multiple copies of each key (word), each with the value 1.
The reducers then produce the total count for each word (key), so in the result each word appears once, with its total count as the value.

As a result, if we run the code on the sentence 'Hello my name is Dave, Dave is my name', it will produce all the tuples (key, value) above.
(Recall string substitution: placeholder 0 is substituted with cleaned_data and placeholder 1 with 1.)
The code above is the 'mapper' function.
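
A mapper of this kind can be sketched as follows. This is only a minimal sketch: the function name mapper, the generator style, and the cleaning rule (strip punctuation and lowercase) are assumptions made for illustration; only cleaned_data and the 0/1 substitution come from the notes.

```python
import string


def mapper(lines):
    # Yield one "word<TAB>1" record per word found in the input lines.
    # Assumption: "cleaning" means removing punctuation and lowercasing.
    strip_punct = str.maketrans("", "", string.punctuation)
    for line in lines:
        for token in line.strip().split():
            cleaned_data = token.translate(strip_punct).lower()
            if cleaned_data:
                # Placeholder 0 becomes the word, placeholder 1 becomes the count 1.
                yield "{0}\t{1}".format(cleaned_data, 1)


records = list(mapper(["Hello my name is Dave, Dave is my name"]))
for record in records:
    print(record)
```

Every word is emitted with the value 1, including repeats, so 'dave', 'is', 'my' and 'name' each appear twice in the output.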
Then the mapper output is shuffled to the reducers based on key. If we have two reducers, the keys are split in half between them.
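
The shuffle step can be imitated locally with a short sketch. The function name shuffle and the way the sorted keys are cut into halves are assumptions that simply mirror the description; real frameworks usually decide which reducer owns a key by hashing it.

```python
def shuffle(records, num_reducers=2):
    # Sort the "key<TAB>value" records so that equal keys sit next to each other.
    pairs = sorted(record.split("\t") for record in records)
    keys = sorted({key for key, _ in pairs})
    # Ceiling division: how many distinct keys each reducer receives.
    chunk = max(1, -(-len(keys) // num_reducers))
    owner = {key: index // chunk for index, key in enumerate(keys)}
    partitions = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        partitions[owner[key]].append("{0}\t{1}".format(key, value))
    return partitions


records = ["hello\t1", "my\t1", "name\t1", "is\t1", "dave\t1",
           "dave\t1", "is\t1", "my\t1", "name\t1"]
partitions = shuffle(records)
for index, part in enumerate(partitions):
    print(index, part)
```

With five distinct keys and two reducers, the first reducer receives 'dave', 'hello' and 'is', and the second receives 'my' and 'name'. All records for a given key land on the same reducer.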

The reducer takes a line such as 'my\t1'.
It splits the line on '\t', producing a list: ['my', '1'] (the count is still a string at this point).
It then checks that the list really has two elements (len(list) == 2); otherwise it breaks.
If the old key is different from the key we currently have, we assign the new key and initialize word_count = 0.
Then we add the count (which is 1). After that, whenever we receive the same key again, we just increment word_count by count.
Finally, we print every key with its count, as long as the key is not None.
Note that this relies on the keys having been shuffled, which means they arrive sorted, so all the lines for one key are adjacent.
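
The reducer steps above can be sketched as follows. This is a minimal sketch rather than the original listing: the function name reducer and the generator style are assumptions, and it expects its input to be sorted by key already.

```python
def reducer(lines):
    # Yield "word<TAB>total" for each key in key-sorted "word<TAB>1" lines.
    old_key = None
    word_count = 0
    for line in lines:
        data = line.strip().split("\t")
        if len(data) != 2:
            # Mirrors the notes: stop on a malformed line.
            # Using `continue` here would skip the bad line instead.
            break
        this_key, count = data
        if old_key is not None and old_key != this_key:
            # The key changed, so the total for the previous key is complete.
            yield "{0}\t{1}".format(old_key, word_count)
            word_count = 0
        old_key = this_key
        word_count += int(count)
    if old_key is not None:
        # Emit the total for the last key seen.
        yield "{0}\t{1}".format(old_key, word_count)


sorted_lines = ["dave\t1", "dave\t1", "hello\t1", "is\t1", "is\t1",
                "my\t1", "my\t1", "name\t1", "name\t1"]
totals = list(reducer(sorted_lines))
for total in totals:
    print(total)
```

Because only the current key and its running count are kept in memory, the reducer can handle far more keys than would fit in a dictionary. The price is that the input must already be sorted by key.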