-
In this case, split the data into training and test sets and cross-validate to avoid overfitting or underfitting
Kurt divides the work into qualitative and quantitative approaches
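A minimal sketch of the hold-out split and k-fold cross-validation idea, using only the Python standard library. The data is synthetic, and the helper names (`fit_line`, `mse`, `train_test_split`, `cross_val_mse`) are made up for this illustration. They are not from the course material.

```python
import random

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def mse(model, xs, ys):
    """Mean squared error of the fitted line on (xs, ys)."""
    a, b = model
    return sum((a + b * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def train_test_split(xs, ys, test_fraction=0.25, seed=0):
    """Shuffle the indices and hold out a fraction of rows as the test set."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(len(idx) * test_fraction))
    test, train = idx[:n_test], idx[n_test:]
    return ([xs[i] for i in train], [ys[i] for i in train],
            [xs[i] for i in test], [ys[i] for i in test])

def cross_val_mse(xs, ys, k=5, seed=0):
    """Average held-out MSE over k folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held = set(fold)
        train = [i for i in idx if i not in held]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        scores.append(mse(model, [xs[i] for i in fold], [ys[i] for i in fold]))
    return sum(scores) / k

# Synthetic data (an assumption for the example): y = 2 + 3x plus noise.
rng = random.Random(42)
xs = [float(i) for i in range(40)]
ys = [2.0 + 3.0 * x + rng.gauss(0, 1) for x in xs]

x_tr, y_tr, x_te, y_te = train_test_split(xs, ys)
model = fit_line(x_tr, y_tr)
train_err = mse(model, x_tr, y_tr)
test_err = mse(model, x_te, y_te)
cv_err = cross_val_mse(xs, ys)
print("train MSE:", train_err, "test MSE:", test_err, "5-fold CV MSE:", cv_err)
```

A test error much larger than the training error hints at overfitting. If both errors are large, the model is probably underfitting.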
-
Qualitative: Build better intuition by visualizing the data, so we understand it better and can ask good questions of it. This may involve dimensionality reduction (PCA) to reduce the number of dimensions so the data is easier to visualize, as well as clustering (K-means)
-
Quantitative: Not just throwing a bunch of data at the model, but selecting the features that drive the model, based on analysis rather than instinct
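One simple way to choose features by analysis rather than instinct is to rank them by their correlation with the target. The sketch below does this with the standard library only. The feature names (`hour`, `rain`, `noise`) and the synthetic data are assumptions made for the example, and correlation is only a linear, one-feature-at-a-time screen, not a full feature-selection method.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Synthetic data (hypothetical): the target depends on hour and rain, not on noise.
rng = random.Random(0)
n = 200
features = {
    "hour": [rng.uniform(0, 23) for _ in range(n)],
    "rain": [float(rng.random() < 0.3) for _ in range(n)],
    "noise": [rng.gauss(0, 1) for _ in range(n)],
}
target = [50 * h + 300 * r + rng.gauss(0, 50)
          for h, r in zip(features["hour"], features["rain"])]

# Rank features by absolute correlation with the target, strongest first.
ranking = sorted(
    ((abs(pearson(col, target)), name) for name, col in features.items()),
    reverse=True,
)
for score, name in ranking:
    print(f"{name}: |r| = {score:.3f}")
```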
Kurt's Advice
-Kurt gives 3 specific fields of particular interest
-
Building models: We like to model our system or data, for example a recommender system. This means improving our coding skills
-
Data analysis: Improve statistics and machine learning skills, as well as mathematical analysis
-
Communication: Improve communication skills, distill high-level analysis of the data, and present the conclusions to the company
-
It's important to know which of the three (or which mix) is our specific interest, and to improve the skills of that particular field
-
The assignment is to run a t-test on the subway ridership data: do more people ride the subway under different conditions (raining/not raining/weekend)?
-
We have talked about modeling existing data for analysis, as well as predicting future data
-
We have also talked about statistics (Welch's t-test) and machine learning (linear regression)
-
Now we want to analyze the subway ridership data and draw conclusions from it
-
After that, we want to present our findings to family/friends. To do so, we will need data visualization
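The Welch's t-test mentioned above for the subway assignment can be sketched with the standard library as follows. The ridership numbers are made up for illustration, and `welch_t` is a hypothetical helper name. It returns only the t statistic and the Welch-Satterthwaite degrees of freedom. Turning these into a p-value needs the t distribution, which a library call such as `scipy.stats.ttest_ind(a, b, equal_var=False)` provides.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and degrees of freedom for two independent samples."""
    va = variance(a) / len(a)  # sample variance divided by sample size
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Made-up hourly entries for rainy and non-rainy periods (not real data).
rainy = [1100.0, 1250.0, 980.0, 1320.0, 1190.0]
dry = [1000.0, 1050.0, 990.0, 1100.0, 1020.0, 1010.0]

t, df = welch_t(rainy, dry)
print("t statistic:", t, "degrees of freedom:", df)
```

Unlike Student's t-test, Welch's version does not assume the two groups have equal variance, which is why the degrees of freedom are estimated from the data.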