The idea here is that you have a huge set of web pages (URLs) and their content, and you want to create a huge table, indexed by word, that shows which URLs each word appears in. Fill in the bodies of the mapper and reducer for the Google sprite and test them. Hint: This problem is very similar to WordCount.

Problem: Google simulation! Given web pages (URLs) and their data, create a massive reverse-lookup table that lets us quickly query, for any single word, which web pages it appeared on.

Input: A list of lists. The first element of each inner list is the web page address; the second element is the content of the web page.

Map Domain: A two-element list: the web page address and the text of the web page.

Map Range: A list of lists, where each inner list has the word as its first element and the URLs that contain the word as the following elements. E.g., if the input were ("hamlet" "to be or not to be"), the output would be ((to hamlet) (be hamlet) (or hamlet) (not hamlet)).

Map Function: For every unique word in the web page, make a list of the word and the URL. Return a list of all these pairs.

Binary reducer function: Take two lists of words and their URLs and merge them. E.g., given ((to hamlet) (be hamlet) (or hamlet) (not hamlet)) and ((to webster) (wit webster)), it would return ((to hamlet webster) (be hamlet) (or hamlet) (not hamlet) (wit webster)).

Output: A single list of lists, where each inner list has a unique word as its first element and the URLs that contain the word as the following elements.
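To make the mapper and reducer contracts concrete, here is a minimal sketch in Python rather than in the Google sprite's blocks. The names `mapper`, `reducer`, and `build_index` are hypothetical and chosen only for illustration, and splitting the page text on whitespace is an assumption about what counts as a word. It is one possible reading of the specification above, not the intended solution.

```python
from functools import reduce


def mapper(page):
    """Map one (url, text) pair to a list of [word, url] pairs, one per unique word."""
    url, text = page
    unique_words = []
    for word in text.split():  # assumption: words are separated by whitespace
        if word not in unique_words:
            unique_words.append(word)
    return [[word, url] for word in unique_words]


def reducer(left, right):
    """Merge two lists of [word, url, ...] entries into a single list."""
    merged = [entry[:] for entry in left]          # copy so the inputs stay unchanged
    by_word = {entry[0]: entry for entry in merged}
    for word, *urls in right:
        if word in by_word:
            # Word already present: append any URLs not listed yet.
            for url in urls:
                if url not in by_word[word][1:]:
                    by_word[word].append(url)
        else:
            # New word: add its entry at the end.
            new_entry = [word, *urls]
            merged.append(new_entry)
            by_word[word] = new_entry
    return merged


def build_index(pages):
    """Apply the mapper to every page, then fold the results with the reducer."""
    return reduce(reducer, map(mapper, pages), [])


pages = [("hamlet", "to be or not to be"), ("webster", "to wit")]
print(build_index(pages))
```

As in WordCount, the reducer takes two partial results and returns a result of the same shape, which is what allows the partial results to be combined pairwise in any grouping.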