Statistics and Exploratory Data Analysis
Earlier in my blog, I said that data wrangling is not a skill that statisticians typically have. It takes a lot of processing to clean the data. When there are outliers, two possibilities come to mind. The first is an error in the input: if the data was entered by a human, there may be human error, and if it came from a sensor, there may be a sensor or network error. The second is that the outliers are genuine, just the kind of noise that data usually has. It's up to us whether to treat the outliers or to ignore them, depending on what we want to focus on in the data.
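As a minimal sketch of how candidate outliers might be flagged before deciding what to do with them, the snippet below uses the common 1.5 × IQR rule. The rule itself, the function name `iqr_outliers`, and the sample readings are assumptions for illustration, not something taken from the original post.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values that fall outside [Q1 - k*IQR, Q3 + k*IQR].

    The 1.5 * IQR threshold is a common convention, assumed here
    for illustration only.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical sensor readings; 95.0 could be a sensor or input error,
# or a genuine extreme value.
readings = [21.0, 21.5, 22.1, 21.8, 22.4, 21.9, 95.0, 22.0]
suspects = iqr_outliers(readings)
print(suspects)
```

The function only flags values. Whether a flagged value is an input error to fix or genuine noise to keep is still a judgment call for the analyst.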
Why Data Science?
So what is Data Science? Why has a field that seemed to come from nowhere suddenly become the most popular career? Data itself has become cheap, even free, and is spread across the internet. Back in the day, companies kept their data to themselves, which made it rare and expensive. But it turns out that the data a single company holds is not much when it wants to see the bigger picture. This is why companies need more data from other sources; they can't keep learning only from the data they already have. Generating more data can be easy, but it takes people who can get insights from a huge number of sources. Reportedly, 90% of the data was created in the last two years. The people already in the field can't keep up with this, and suddenly the world needs data scientists.
Using MapReduce and Design Pattern
MapReduce can process data stored in the Hadoop Distributed File System (HDFS). It breaks your data into chunks that reside on the nodes of the cluster, then processes them in parallel. Suppose we need a hash table and keep adding key-value pairs to it. With millions of records, we could run low on memory, or even out of memory, and the hash table would take a long time to build. For this kind of problem, parallel processing can help.
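The idea can be sketched in plain Python, without Hadoop: split the records into chunks, build a small hash table per chunk (the map step), then merge the partial tables (the reduce step). The function names, the chunk size, and the use of a thread pool are assumptions for illustration. On a real cluster the chunks would live on different machines, and this is not Hadoop's actual API.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_chunks(records, chunk_size):
    """Break the input into fixed-size chunks, as HDFS does with blocks."""
    return [records[i:i + chunk_size]
            for i in range(0, len(records), chunk_size)]


def map_chunk(chunk):
    """Map step: build a small hash table (key -> count) for one chunk."""
    return Counter(chunk)


def merge_counts(left, right):
    """Reduce step: combine two partial hash tables into one."""
    return left + right


def count_keys(records, chunk_size=3):
    chunks = split_into_chunks(records, chunk_size)
    # Threads stand in for cluster nodes here; each worker only ever
    # holds the hash table for its own chunk.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(merge_counts, partials, Counter())


records = ["a", "b", "a", "c", "b", "a", "d"]
totals = count_keys(records)
print(dict(totals))
```

Because each map step sees only its own chunk, no single worker needs to hold the full hash table until the partial results are merged.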
Hadoop and Big Data
In the engineering world, we are now familiar with the term Big Data. But is everything about Big Data? Details about the smartphone owned by one person may not be Big Data for a phone company, but the details for hundreds of people may be. Big Data is defined from different perspectives. Some say that Big Data means working with terabytes of data. Others say that Big Data is data so big that it can't fit on a single machine.
Interaction and Animation with D3.js
In this section we look at how to add interaction and animation within various narrative structures.
Narrative Structures of Data Journalism
Narrative is the one thing that binds all your stories together. It acts as a string, tying your findings together from beginning to end.
Design and Principles of Data Visualization
This blog will explain more about how we can recreate a visualization. We will look at metrics that help assess the effectiveness of your graphics. We also dig deep into the problems to address when using charts and graphics, and we explore how D3 can engage users more.
Dimple Basics
Dimple JS has been popular with newcomers to D3 because it has a gentler learning curve. It abstracts away the lower-level parts of D3, such as append(), enter(), and data(). Dimple also abstracts the shapes, scales, and positioning of D3. It's built on top of D3, so you can still dig into further customization with D3 from within Dimple JS.
D3 Basics
To get an overview of the basics of D3, you may want to check out this link
D3 assumes that users are proficient in HTML, CSS, JavaScript, and SVG. Visual elements are represented in the DOM, which acts as a bridge between the visualization and the HTML source. This means we don't have to work with the HTML source programmatically, and with developer tools like those in Chrome, we can hover over the code to inspect the elements.
Fundamentals of Data Visualization
Visualization is important. "A picture is worth a thousand words" is no joke. Data visualization is about how we turn raw data, the numbers in tables, rows, or columns, into visuals. It's about presenting the data with the right kind of plot and colors. It takes creativity to make a visualization that is interesting and simple, yet still understandable.