Conclusion, Other Advice, Assignment, and Recap
- When building a model, split the data into a training set and a test set, and use cross-validation to avoid overfitting or underfitting.
- Kurt divides the analysis into qualitative and quantitative approaches:
- Qualitative: build intuition by visualizing the data, so we understand it better and can ask good questions of it. Dimensionality reduction (PCA) reduces the number of dimensions so the data is easier to visualize, and clustering (K-means) can be used in the same exploratory way.
- Quantitative: rather than throwing a bunch of data at the model, select the features that actually drive it, based on analysis rather than instinct.
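The split-then-cross-validate advice above can be sketched in plain Python. This is a minimal sketch, assuming a one-feature linear model as the thing being validated and mean squared error as the score; the helper names (`train_test_split`, `k_folds`, `fit_line`, `mse`, `cross_validate`) are hypothetical. In practice, scikit-learn provides equivalents such as `sklearn.model_selection.train_test_split`, `KFold`, and `cross_val_score`.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and hold out a fraction of them as the test set."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]  # (train, test)

def k_folds(rows, k):
    """Split rows into k disjoint, nearly equal folds."""
    return [rows[i::k] for i in range(k)]

def fit_line(rows):
    """Ordinary least squares for y = a + b*x on (x, y) pairs."""
    n = len(rows)
    mx = sum(x for x, _ in rows) / n
    my = sum(y for _, y in rows) / n
    sxx = sum((x - mx) ** 2 for x, _ in rows)
    sxy = sum((x - mx) * (y - my) for x, y in rows)
    b = sxy / sxx
    return my - b * mx, b

def mse(model, rows):
    """Mean squared error of the fitted line on the given rows."""
    a, b = model
    return sum((y - (a + b * x)) ** 2 for x, y in rows) / len(rows)

def cross_validate(rows, k=5):
    """Average validation MSE over k folds; each fold is held out exactly once."""
    folds = k_folds(rows, k)
    scores = []
    for i, held_out in enumerate(folds):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        scores.append(mse(fit_line(train), held_out))
    return sum(scores) / k
```

The test set is used only once, at the end; cross-validation runs entirely inside the training set, which keeps model choices from being tuned to the test data.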
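To make the PCA step in the qualitative approach concrete, here is a hedged sketch that finds only the first principal component, using power iteration on the covariance matrix, and projects each point onto it to reduce the data to one dimension. The function names are hypothetical, and the random starting vector is an assumption used to avoid starting orthogonal to the top eigenvector; a real analysis would more likely use a library routine such as `sklearn.decomposition.PCA`.

```python
import random

def mean_center(points):
    """Subtract the per-column mean from every point."""
    d = len(points[0])
    means = [sum(p[j] for p in points) / len(points) for j in range(d)]
    return [[p[j] - means[j] for j in range(d)] for p in points]

def covariance(centered):
    """Sample covariance matrix (n - 1 denominator) of mean-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_component(cov, iters=200, seed=0):
    """Power iteration: repeatedly multiply by the covariance matrix and
    renormalize; this converges to the eigenvector with the largest eigenvalue."""
    rng = random.Random(seed)
    d = len(cov)
    v = [rng.uniform(-1, 1) for _ in range(d)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

def project_1d(points):
    """Return the first principal component and each point's score along it."""
    centered = mean_center(points)
    pc = top_component(covariance(centered))
    return pc, [sum(a * b for a, b in zip(row, pc)) for row in centered]
```

The sign of a principal component is arbitrary, so only the direction (up to sign) and the spread of the projected scores are meaningful.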
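K-means, mentioned above as an exploratory clustering tool, alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. The sketch below is Lloyd's algorithm under stated assumptions: Euclidean distance, initial centroids sampled from the data, and a hypothetical `kmeans` helper rather than any particular library's API.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm; returns (centroids, labels from the last assignment step)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Assignment step: label each point with the index of its nearest centroid.
        labels = [min(range(k), key=lambda c: dist2(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new.append(centroids[c])  # leave an empty cluster's centroid in place
        if new == centroids:  # converged: assignments will no longer change
            break
        centroids = new
    return centroids, labels
```

K-means only finds a local optimum, so results can depend on the initial centroids; running it with several seeds and keeping the tightest clustering is a common precaution.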
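As one possible way to select features "with analysis" in the quantitative approach, the sketch below ranks candidate features by the absolute Pearson correlation with the target. This is a swapped-in, deliberately simple filter method, not necessarily what the course uses: it detects only linear relationships and ignores interactions between features. The helper names are hypothetical.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def rank_features(features, target):
    """features maps name -> column of values; returns names sorted by |r|, strongest first."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

A feature near the bottom of this ranking is a candidate for removal, but a low linear correlation does not prove the feature is useless, so the ranking is a starting point for analysis rather than a final decision.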
Kurt's Advice
- Kurt gives three specific fields to focus on, depending on your particular interest:
- Building models: if you like modeling systems and data (for example, building a recommender system), focus on improving your coding skills.
- Data analysis: improve your statistics, machine learning, and mathematical analysis skills.
- Communication: improve your communication skills, so you can distill complex analysis into clear conclusions for the company.
- It's important to know which of the three fields (or which mix of them) is your specific interest, and to build your skills in that field.
- The assignment is to run a t-test on the subway ridership data to find out whether more people ride the subway under certain conditions (raining vs. not raining, weekend vs. weekday).
- We have talked about modeling to analyze existing data, as well as to predict future data.
- We have also talked about statistics (Welch's t-test) and machine learning (linear regression).
- Now we want to analyze the subway ridership data and draw conclusions from it.
- After that, we want to present our findings to family and friends, which requires data visualization.
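Welch's t-test, used in the assignment, compares two means without assuming the groups have equal variances. A minimal sketch using only the standard library `statistics` module computes the t statistic and the Welch-Satterthwaite degrees of freedom. Turning these into a p-value requires the t-distribution CDF, which `scipy.stats.ttest_ind(a, b, equal_var=False)` provides directly. The `welch_t` name is hypothetical, and the inputs are assumed to be, for example, hourly entry counts for rainy and non-rainy hours.

```python
from statistics import mean, variance

def welch_t(a, b):
    """Return (t, df) for Welch's two-sample t-test.

    t  = (mean_a - mean_b) / sqrt(s_a^2/n_a + s_b^2/n_b)
    df = (s_a^2/n_a + s_b^2/n_b)^2
         / ((s_a^2/n_a)^2/(n_a - 1) + (s_b^2/n_b)^2/(n_b - 1))
    """
    # Squared standard errors; statistics.variance uses the n - 1 sample formula.
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical toy counts, for illustration only.
rainy = [12, 15, 11, 14, 13]
dry = [10, 9, 12, 8, 11]
t, df = welch_t(rainy, dry)
```

A large positive t with a small p-value (from the t-distribution with `df` degrees of freedom) would suggest ridership is higher in the first group; the sign of t only says which mean is larger.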
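For the linear regression mentioned in the recap, one way to fit the model is batch gradient descent on mean squared error (a closed-form least-squares solution is another). The sketch below assumes a learning rate `alpha`, a fixed iteration count, and the convention that each row of `X` starts with a 1.0 so that `theta[0]` acts as the intercept; `gradient_descent` is a hypothetical helper.

```python
def gradient_descent(X, y, alpha=0.01, iters=5000):
    """Fit theta so that y is approximately X . theta, by minimizing
    J(theta) = (1 / 2m) * sum((X . theta - y)^2) with batch gradient descent.
    Each row of X should start with 1.0 for the intercept term."""
    m, d = len(X), len(X[0])
    theta = [0.0] * d
    for _ in range(iters):
        preds = [sum(w * x for w, x in zip(theta, row)) for row in X]
        errors = [p - target for p, target in zip(preds, y)]
        # dJ/dtheta_j = (1/m) * sum(error_i * x_ij)
        grad = [sum(e * row[j] for e, row in zip(errors, X)) / m for j in range(d)]
        theta = [w - alpha * g for w, g in zip(theta, grad)]
    return theta
```

If `alpha` is too large the updates diverge, and if it is too small convergence is slow; scaling features to similar ranges before fitting usually makes a single learning rate work better.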