Earlier in my blog, I said that data wrangling is not a skill that statisticians usually have. It takes a lot of processing to clean the data. When there are outliers, two things come to mind. First, is this an error in the input? If the data was entered by a human, there may be human error. There may also be a sensor error if a sensor is involved, or a network error. Second, the outliers may be genuine outliers. They are just the kind of noise that data usually has. It's up to us whether to treat the outliers or ignore them, as we want to focus on the data.
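As a rough illustration of spotting candidate outliers before deciding whether they are input errors or genuine values, here is a minimal sketch. It uses the common interquartile range (IQR) rule, which is my assumption and not necessarily the method used elsewhere in this blog. The function name flag_outliers, the multiplier k, and the sample readings are all made up for the example.

```python
import statistics

def flag_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR].

    The IQR rule and the default k=1.5 are assumptions chosen for
    illustration; other rules (z-scores, domain limits) may fit better.
    """
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical temperature readings; 210.0 looks like a typo for 21.0.
readings = [21.0, 21.5, 22.1, 20.8, 21.9, 22.4, 210.0, 21.2]
suspects = flag_outliers(readings)
print(suspects)
```

The code only flags the suspicious values. Whether to correct them, drop them, or keep them as real noise is still a judgment call that depends on where the data came from.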
So what is Data Science? Why has this field, which seemed to come from nowhere, suddenly become the most popular career? Data itself has become cheap, free, and spread across the internet. Back in the day, companies kept their data to themselves, and this made it rare and expensive. But it turns out that the data they hold on their own is not much if they want to see the bigger picture. This is why they need more data, from other sources. They can't keep learning only from the data they already have. Generating more data can be easy, but it takes people who can get insights from a huge number of sources. 90% of the data was created in the last two years. Of course, the people already in the field can't keep up with this. And suddenly the world needs data scientists.
MapReduce can handle all your processing on the Hadoop Distributed File System (HDFS). It breaks your data into chunks that reside on the nodes of the cluster, then processes them in parallel. Suppose we need a hash table and we keep adding key-value pairs to it. With millions of records, we can run low on memory, or even run out of it, and the hash table will take a long time to build. For this kind of task, parallel processing can work.
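To make the idea of chunking and parallel processing concrete, here is a minimal sketch of the map and reduce pattern in plain Python. It is not Hadoop's API: the thread pool only stands in for the nodes of a cluster, and the names split_into_chunks, map_chunk, merge_counts, and count_keys are hypothetical helpers invented for this example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def split_into_chunks(records, chunk_size):
    """Break the input into fixed-size chunks, like blocks spread over nodes."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]

def map_chunk(chunk):
    """Map step: build a small hash table (key -> count) for one chunk only."""
    return Counter(chunk)

def merge_counts(left, right):
    """Reduce step: combine two partial hash tables into one."""
    return left + right

def count_keys(records, chunk_size=3, workers=2):
    chunks = split_into_chunks(records, chunk_size)
    # Each chunk is processed independently; the pool is a stand-in
    # for worker nodes, not a real distributed runtime.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(merge_counts, partials, Counter())

records = ["a", "b", "a", "c", "b", "a", "c"]
totals = count_keys(records)
print(totals)
```

No single worker has to hold the whole hash table while it is being built. Each one only sees its own chunk, and the partial results are merged at the end.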
In our engineering world, we are now familiar with the term Big Data. But is everything about Big Data? For a phone company, the details of the smartphone owned by one person may not be Big Data, but those of hundreds of people may be. How Big Data is defined depends on your perspective. Some say that Big Data means working with terabytes of data. Others say that Big Data is data so big that it can't fit on a single machine.
In this section, we want to know how to use interaction and animation in various narrative structures.
Narrative is the one thing that binds all your stories. It acts as a string, tying together all your findings from beginning to end.
This blog will explain more about how we can recreate the visualization. We will use metrics that help assess the effectiveness of your graphics. We will also dig deep into the problems that charts and graphics address, and explore how D3 can engage users more.
Dimple JS has been popular with newcomers to D3 because it has a gentler learning curve. It abstracts the lower-level parts of D3, such as append(), enter(), and data(). Dimple also abstracts D3's shapes, scales, and positioning. It is built on top of D3, so you can still dig into further customization with D3 from within Dimple JS.
To get an overview of the basics of D3, you may want to check out this link
Visualization is important. "A picture is worth a thousand words" is no joke. Data visualization is about how we turn raw data, the numbers in tables, rows, or columns, into visuals. It's about presenting that data with the right kind of plot and colors. It takes creativity to make a visualization that is interesting and simple, yet still understandable.