Visualization for Big Data


So the problem is your data is big. Really big. So big that you can't even fit it into your visualization. There is too much data to display and still get insights from it. We could buy a larger display to fit every visualization we need, but data keeps growing, and someday we would still need another display. Let's step through what else we can do.

You can take a look at the 3V definition here. The next V, the fourth, is Veracity: whether the data is certain or not, and whether it contains errors.

We could do some panning, one of the techniques in temporal partitioning: all the data can be scrolled left or right depending on what we need. Panning can also involve an angle, rotating the object, which makes the most sense in 3D. We could also zoom in and out. With semantic zooming, the more you zoom in, the more information is revealed. This ties into abstraction in visualization: you want to hide the details and let users discover information as they zoom.
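As a rough sketch of semantic zooming (my own example, not from the article, assuming pandas), the data shown could be aggregated at different granularities depending on a hypothetical zoom level: zoomed out you see monthly means, zoomed in you see the raw daily points.

```python
import numpy as np
import pandas as pd

# Synthetic daily time series standing in for a "big" dataset.
days = pd.date_range("2020-01-01", periods=3 * 365, freq="D")
values = pd.Series(np.random.randn(len(days)).cumsum(), index=days)

def semantic_zoom(series: pd.Series, zoom: float) -> pd.Series:
    """Return a view of the data whose detail depends on the zoom level."""
    if zoom < 1:                              # far away: coarse overview
        return series.resample("MS").mean()   # monthly means
    if zoom < 3:                              # closer: medium detail
        return series.resample("W").mean()    # weekly means
    return series                             # fully zoomed in: raw points

print(len(semantic_zoom(values, 0.5)))  # few points, overview
print(len(semantic_zoom(values, 5.0)))  # many points, full detail
```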

You can use Multiple Coordinated Views (MCV) to divide your data into small multiples and let readers focus on a specific category of your data. The downside of MCV is that you have to compare the subsets against each other to spot the differences. It is like a spot-the-difference game all over again: it takes effort to really see what changed. Big data still benefits from MCV, but you have to keep this in mind. Another option is the scatterplot matrix (SPLOM), sketched below.
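As a small sketch (my own example, assuming scikit-learn's Iris data just to have numeric columns), a SPLOM can be drawn directly from a DataFrame: one scatter plot per pair of columns, with histograms on the diagonal.

```python
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix
from sklearn.datasets import load_iris

# Any DataFrame with numeric columns works here.
iris = load_iris(as_frame=True)
df = iris.frame.drop(columns=["target"])

# Scatterplot matrix: every pair of variables plotted against each other.
scatter_matrix(df, figsize=(8, 8), diagonal="hist", alpha=0.5)
plt.show()
```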

One way to reduce your data is Dimensionality Reduction (DR). It is used in almost every area of data science: visualization, exploratory data analysis, and machine learning. It also helps when you want to fit your data onto a single hard disk and compress it with DR. Some DR methods have an inverse transform, so the compressed data can be approximately reconstructed.
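A minimal sketch of the compress/reconstruct idea, assuming scikit-learn and PCA as the DR method (the article does not name a specific one): project 10-dimensional data down to 2 components, then map it back.

```python
import numpy as np
from sklearn.decomposition import PCA

# Fake wide dataset: 1,000 rows, 10 highly correlated columns.
rng = np.random.default_rng(0)
base = rng.normal(size=(1000, 3))
X = base @ rng.normal(size=(3, 10)) + 0.05 * rng.normal(size=(1000, 10))

# "Compress" to 2 dimensions for plotting or storage.
pca = PCA(n_components=2)
X_small = pca.fit_transform(X)

# Approximate "decompression" back into the original space.
X_back = pca.inverse_transform(X_small)
print(X_small.shape)              # (1000, 2)
print(np.abs(X - X_back).mean())  # small reconstruction error
```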

Clustering is something we have done since childhood. We can cluster tuna and sharks into fish, humans and rabbits into mammals, because we know the characteristics we use to judge which group a species belongs to. To fit this into a machine, we specify those characteristics, as columns, that define similarity across species. The similarities are computed as a distance matrix, and each pairwise distance is compared; see the sketch below.
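A tiny sketch with a made-up feature table (my own example, assuming scikit-learn): describe each animal by a few numeric characteristics, compute the pairwise distance matrix, then cluster.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances

animals = ["tuna", "shark", "human", "rabbit"]
# Made-up characteristic columns: [lives_in_water, has_fur, body_temp_C]
X = np.array([
    [1.0, 0.0, 20.0],   # tuna
    [1.0, 0.0, 22.0],   # shark
    [0.0, 1.0, 37.0],   # human
    [0.0, 1.0, 38.5],   # rabbit
])

# Distance matrix: how dissimilar each pair of animals is.
print(pairwise_distances(X))

# Group the animals into two clusters (fish vs. mammals here).
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(dict(zip(animals, labels)))
```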

Singular Value Decomposition (SVD) lets you drop the dimensions that do not give you much information. If several columns carry essentially the same information, they collapse into fewer components. How much information each component carries is read off the singular values, in the same spirit as entropy.
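A short sketch (assuming NumPy; the article shows no code): decompose a matrix whose columns are nearly redundant, keep only the largest singular values, and rebuild a low-rank approximation.

```python
import numpy as np

rng = np.random.default_rng(1)
# 8 columns that are really combinations of 2 underlying factors, plus noise.
A = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 8)) \
    + 0.01 * rng.normal(size=(100, 8))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(s)  # trailing singular values are close to zero: little information there

k = 2  # keep only the informative components
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(np.abs(A - A_k).mean())  # the rank-2 approximation is nearly exact
```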