# Analyzing Data

• This lesson teaches how to analyze data using a Twitter dataset.

• The dataset is social network data in general; in this case, specifically Twitter.
• The lesson introduces the Aggregation Framework, MongoDB's powerful data analysis tool, which we use to analyze the kind of data we've been working with.

• Here are the steps to find the user who tweeted the most, based on the structure of the Twitter data.

• MongoDB's Aggregation Framework implements this.
• The framework uses a pipeline to solve the problem.
• First it uses the `$group` operator. The `_id` field sets the grouping key: all tweets are grouped by the user's screen name, so each unique screen name becomes one group. `"$user.screen_name"` is not an operator; the leading `$` means "the value of the field `user.screen_name`". Then, for every tweet with the same screen name, `count` is incremented by one (`{"$sum": 1}`).
• `$sort` then sorts the groups by `count` in descending (-1) order.
• These are the two stages the aggregation pipeline performs.
• An aggregation pipeline can consist of a single stage or a series of stages that together produce the result.
• Here we reshape the tweets into an intermediate form (whatever we need) and then sort them in the `$sort` stage.
• Aggregation operators:
• `$project`: reshapes documents so they are presented the way we want, either for the next stage or as the final result.
• `$match`: filters documents.
• `$group`: combines multiple documents that share the same grouping key into a single document. Accumulator operators available in `$group`:
• `$sum`
• `$first`, `$last`
• `$max`, `$min`
• `$avg`
• `$push`: deals with arrays; appends each value to an array.
• `$addToSet`: deals with arrays; like `$push`, but treats the array as a set, so a value is added only if it is not already present.
• `$skip`: skips the first n documents.
• `$limit`: limits the number of documents passed on; a limit of 3 means only the first three are allowed through.
• `$unwind`: unwinds an array field into multiple documents, one per array element; each copy has the same data except for that field, which holds a single element of the array. This is useful for Twitter data, e.g. when we want to group by hashtag.
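The two-stage "who tweeted the most" pipeline described above can be sketched in Python. `PIPELINE` is a list of stage documents in the form pymongo's `collection.aggregate()` accepts; the collection name `tweets`, the `db` handle and the sample documents are assumptions for illustration. Because running the real pipeline needs a MongoDB server, the hypothetical helper `top_tweeters` mirrors what `$group` followed by `$sort` computes, so the logic can be checked offline. It is a sketch, not how MongoDB executes the pipeline internally.

```python
from collections import Counter

# Stage documents in the form pymongo's collection.aggregate() accepts.
# Usage (assumes a running server and a hypothetical "tweets" collection):
#   results = list(db.tweets.aggregate(PIPELINE))
PIPELINE = [
    # $group: one output document per distinct user.screen_name value;
    # count gets +1 for every tweet in that group.
    {"$group": {"_id": "$user.screen_name", "count": {"$sum": 1}}},
    # $sort: highest count first (-1 = descending).
    {"$sort": {"count": -1}},
]


def top_tweeters(tweets):
    """Plain-Python mirror of PIPELINE over in-memory tweet dicts."""
    counts = Counter(t["user"]["screen_name"] for t in tweets)
    groups = [{"_id": name, "count": n} for name, n in counts.items()]
    return sorted(groups, key=lambda g: g["count"], reverse=True)


# Hypothetical sample tweets containing only the field the pipeline reads.
sample = [
    {"user": {"screen_name": "alice"}},
    {"user": {"screen_name": "bob"}},
    {"user": {"screen_name": "alice"}},
]
print(top_tweeters(sample)[0])  # {'_id': 'alice', 'count': 2}
```

Note that `_id` in the output holds the grouping key (the screen name), not the original tweet id.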

• This produces a 4-stage aggregation pipeline.

• In Twitter's terminology, friends are the accounts a user follows.
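The notes don't list the four stages, so the sketch below is a hypothetical four-stage pipeline that chains `$match`, `$project`, `$sort` and `$limit` to find the user with the highest followers-to-friends ratio. The field names `user.followers_count` and `user.friends_count` follow the Twitter API's user object and are an assumption about this dataset; `top_ratio` and `sample` are hypothetical names. The `$match` stage keeps only non-zero counts, which also avoids dividing by zero in `$divide`.

```python
# Hypothetical 4-stage pipeline: highest followers/friends ratio.
# Usage (assumes a live server): list(db.tweets.aggregate(PIPELINE))
PIPELINE = [
    # 1. $match: keep tweets whose author has at least one friend and follower.
    {"$match": {"user.friends_count": {"$gt": 0},
                "user.followers_count": {"$gt": 0}}},
    # 2. $project: reshape each tweet into {screen_name, ratio}, dropping _id.
    {"$project": {"_id": 0,
                  "screen_name": "$user.screen_name",
                  "ratio": {"$divide": ["$user.followers_count",
                                        "$user.friends_count"]}}},
    # 3. $sort: largest ratio first.
    {"$sort": {"ratio": -1}},
    # 4. $limit: keep only the top document.
    {"$limit": 1},
]


def top_ratio(tweets):
    """Plain-Python mirror of PIPELINE over in-memory tweet dicts."""
    matched = [t for t in tweets
               if t["user"]["friends_count"] > 0 and t["user"]["followers_count"] > 0]
    projected = [{"screen_name": t["user"]["screen_name"],
                  "ratio": t["user"]["followers_count"] / t["user"]["friends_count"]}
                 for t in matched]
    projected.sort(key=lambda d: d["ratio"], reverse=True)
    return projected[:1]


sample = [
    {"user": {"screen_name": "alice", "followers_count": 100, "friends_count": 10}},
    {"user": {"screen_name": "bob", "followers_count": 50, "friends_count": 0}},
    {"user": {"screen_name": "carol", "followers_count": 30, "friends_count": 1}},
]
print(top_ratio(sample))  # [{'screen_name': 'carol', 'ratio': 30.0}]
```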

• This pipeline finds the user who included the most user mentions.
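A sketch of such a pipeline, assuming mentions are stored in `entities.user_mentions` as in the Twitter API (an assumption about this dataset). After `$unwind`, each document represents exactly one mention, so `{"$sum": 1}` in `$group` counts mentions per author. The hypothetical helpers `unwind` and `most_mentions` mirror the stages in plain Python for offline checking; `unwind` follows `$unwind`'s default of emitting nothing for a missing, null or empty array.

```python
import copy
from collections import Counter

# Who included the most user mentions? Field paths are assumptions.
PIPELINE = [
    # $unwind: one document per element of entities.user_mentions.
    {"$unwind": "$entities.user_mentions"},
    # $group: each unwound document is one mention, so $sum: 1 counts mentions.
    {"$group": {"_id": "$user.screen_name", "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
    {"$limit": 1},
]


def unwind(docs, path):
    """Mirror of $unwind for a dotted path to an array field.
    Documents whose array is missing, null or empty produce no output."""
    *parents, leaf = path.split(".")
    for doc in docs:
        container = doc
        for key in parents:
            container = container.get(key, {})
        for value in container.get(leaf) or []:
            out = copy.deepcopy(doc)  # leave the input document untouched
            target = out
            for key in parents:
                target = target[key]
            target[leaf] = value  # the array is replaced by one element
            yield out


def most_mentions(tweets):
    """Plain-Python mirror of PIPELINE."""
    counts = Counter(d["user"]["screen_name"]
                     for d in unwind(tweets, "entities.user_mentions"))
    return [{"_id": name, "count": n} for name, n in counts.most_common(1)]


sample = [
    {"user": {"screen_name": "alice"},
     "entities": {"user_mentions": [{"screen_name": "bob"},
                                    {"screen_name": "carol"}]}},
    {"user": {"screen_name": "bob"},
     "entities": {"user_mentions": [{"screen_name": "alice"}]}},
    {"user": {"screen_name": "carol"}, "entities": {"user_mentions": []}},
]
print(len(list(unwind(sample, "entities.user_mentions"))))  # 3
print(most_mentions(sample))  # [{'_id': 'alice', 'count': 2}]
```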

• This produces the unique hashtags as an array that contains no duplicate values (via `$addToSet`).
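A possible shape for this pipeline, assuming hashtags live in `entities.hashtags` with a `text` field (Twitter API layout, assumed here) and grouping by author, which is also an assumption. The hypothetical `group_hashtags` helper mirrors it in plain Python; its `as_set` flag contrasts `$addToSet` (no duplicates) with `$push` (keeps every value).

```python
# Unique hashtags per user. Field paths and grouping key are assumptions.
PIPELINE = [
    {"$unwind": "$entities.hashtags"},
    {"$group": {"_id": "$user.screen_name",
                "unique_hashtags": {"$addToSet": "$entities.hashtags.text"}}},
]


def group_hashtags(tweets, as_set=True):
    """Plain-Python mirror of PIPELINE.
    as_set=True mirrors $addToSet (no duplicates); as_set=False mirrors $push.
    MongoDB does not guarantee element order in an $addToSet result; this
    sketch keeps first-seen order only to make the output predictable."""
    result = {}
    for tweet in tweets:
        # Iterating the array plays the role of $unwind: users whose
        # hashtag array is empty never get a group.
        for tag in tweet["entities"]["hashtags"]:
            tags = result.setdefault(tweet["user"]["screen_name"], [])
            if not as_set or tag["text"] not in tags:
                tags.append(tag["text"])
    return result


sample = [
    {"user": {"screen_name": "alice"},
     "entities": {"hashtags": [{"text": "mongodb"}, {"text": "python"}]}},
    {"user": {"screen_name": "alice"},
     "entities": {"hashtags": [{"text": "mongodb"}]}},
    {"user": {"screen_name": "bob"}, "entities": {"hashtags": []}},
]
print(group_hashtags(sample))
# {'alice': ['mongodb', 'python']}
print(group_hashtags(sample, as_set=False))
# {'alice': ['mongodb', 'python', 'mongodb']}
```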

• A pipeline can use the same operator in multiple stages.
• This one finds the user with the most unique user mentions, i.e. the user who mentions the largest number of distinct users.
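A sketch of a pipeline that repeats `$unwind` and `$group`, assuming the Twitter API field layout for mentions (an assumption about this dataset). The first `$unwind`/`$group` pair builds a set of distinct mentioned users per author; the second pair unwinds that set and counts its elements. In the second `$group`, `"$_id"` refers to the `_id` produced by the first `$group`. The hypothetical `most_unique_mentions` helper mirrors the result in plain Python.

```python
# Most unique user mentions: $unwind and $group each appear twice.
# Field paths are assumptions about this dataset.
PIPELINE = [
    # 1-2: per author, collect the set of distinct mentioned screen names.
    {"$unwind": "$entities.user_mentions"},
    {"$group": {"_id": "$user.screen_name",
                "mset": {"$addToSet": "$entities.user_mentions.screen_name"}}},
    # 3-4: unwind that set and count its elements per author.
    {"$unwind": "$mset"},
    {"$group": {"_id": "$_id", "count": {"$sum": 1}}},
    # 5-6: most unique mentions first, top ten only.
    {"$sort": {"count": -1}},
    {"$limit": 10},
]


def most_unique_mentions(tweets, limit=10):
    """Plain-Python mirror of PIPELINE."""
    msets = {}
    for tweet in tweets:
        for mention in tweet["entities"]["user_mentions"]:
            author = tweet["user"]["screen_name"]
            msets.setdefault(author, set()).add(mention["screen_name"])
    # Unwinding a set and summing 1 per element is just the set's size.
    groups = [{"_id": author, "count": len(s)} for author, s in msets.items()]
    groups.sort(key=lambda g: g["count"], reverse=True)
    return groups[:limit]


sample = [
    {"user": {"screen_name": "alice"},
     "entities": {"user_mentions": [{"screen_name": "bob"}, {"screen_name": "bob"}]}},
    {"user": {"screen_name": "alice"},
     "entities": {"user_mentions": [{"screen_name": "bob"}, {"screen_name": "carol"}]}},
    {"user": {"screen_name": "bob"},
     "entities": {"user_mentions": [{"screen_name": "alice"}, {"screen_name": "carol"},
                                    {"screen_name": "dave"}]}},
]
print(most_unique_mentions(sample))
# [{'_id': 'bob', 'count': 3}, {'_id': 'alice', 'count': 2}]
```

Here alice has four mentions in total but only two distinct ones, so bob ranks first on unique mentions.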

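• We can index the database to speed up our queries.
• To do this, we specify the index fields in order, leftmost first: hashtag, then username. A compound index like this also supports queries on its leftmost field alone (hashtag), but not on username alone.
• Keep in mind that although reads become faster, writes become slower, because every index also has to be updated when the data changes.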
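A sketch of the hashtag-then-username compound index, assuming the Twitter API field paths `entities.hashtags.text` and `user.screen_name` (assumptions about this dataset). pymongo's `Collection.create_index()` accepts a list of `(field, direction)` pairs, where `1` means ascending. The hypothetical `usable_prefix` helper is a simplified rule of thumb for the "leftmost" idea, not MongoDB's query planner: it returns the leading index fields that a query also filters on.

```python
# Compound index keys: hashtag first (leftmost), then username.
INDEX_KEYS = [("entities.hashtags.text", 1), ("user.screen_name", 1)]


def create_hashtag_user_index(collection):
    """Create the compound index on a pymongo Collection (needs a live server).
    create_index() returns the name of the created index."""
    return collection.create_index(INDEX_KEYS)


def usable_prefix(query_fields, index_keys=INDEX_KEYS):
    """Rule-of-thumb sketch: the leading index fields the query filters on.
    An empty list means the query skips the index's leftmost field."""
    prefix = []
    for field, _direction in index_keys:
        if field not in query_fields:
            break
        prefix.append(field)
    return prefix


print(usable_prefix({"entities.hashtags.text"}))  # ['entities.hashtags.text']
print(usable_prefix({"user.screen_name"}))  # []
```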

• Here is the index command from the mongo shell.
• Executing the second line (the query) takes a few seconds, because the collection holds about 7 million documents.
• But once we create the index (on `tg`), the same query returns results almost immediately.

• We can specify a field name (e.g. a location field) and the index type, but the field's value must follow the `[x, y]` format.

• Then we can query with the `$near` operator, which returns documents ordered from nearest to farthest from a given point.
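A sketch of a 2d geospatial index and a `$near` query, assuming a hypothetical field named `loc` that stores `[x, y]` pairs. With pymongo, the index would be created with `collection.create_index([("loc", "2d")])` on a live server. The hypothetical `nearest` helper mirrors what `$near` returns (documents sorted by distance) using flat Euclidean distance, which is the geometry a 2d index uses for legacy `[x, y]` coordinates.

```python
import math

# 2d index keys for a hypothetical "loc" field holding [x, y] pairs.
# With pymongo (needs a live server): collection.create_index(GEO_INDEX_KEYS)
GEO_INDEX_KEYS = [("loc", "2d")]


def near_query(x, y):
    """Filter document for a legacy-coordinate $near query around (x, y)."""
    return {"loc": {"$near": [x, y]}}


def nearest(docs, x, y, limit=3):
    """Plain-Python mirror of $near: documents ordered nearest to farthest,
    using flat Euclidean distance."""
    def dist(doc):
        dx, dy = doc["loc"][0] - x, doc["loc"][1] - y
        return math.hypot(dx, dy)
    return sorted(docs, key=dist)[:limit]


places = [
    {"name": "A", "loc": [0, 0]},
    {"name": "B", "loc": [5, 5]},
    {"name": "C", "loc": [1, 2]},
]
print(near_query(1, 1))  # {'loc': {'$near': [1, 1]}}
print([p["name"] for p in nearest(places, 1, 1, limit=2)])  # ['C', 'A']
```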