Data Wrangling, Analyzing Messy Data, and Nick's Experiences
- This is data about the players on a baseball team.
- If we look into the data, some things are wrong or missing.
- A batting average can't be more than 1.000 (read as "one thousand"), and the handedness (left or right) is missing for Ichiro Suzuki.
- Data wrangling is the art of gathering, fixing, and cleaning up the data that we have.
- There are 3 kinds of sources to gather data from: files, databases, and web APIs.
- At some point in our lives, we have all had to deal with messy data,
- like when we have to work out a personal budget for our projects,
- or when there are too many items on the menu at a bake sale.
- We should realize by now that cleaning data takes a lot of work.
- It's important to visualize the data and find out its structure to really understand what data we have.
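The two kinds of problems noted above (an impossible batting average and a missing handedness) can be caught with simple sanity checks. This is only a minimal sketch: the sample rows, the column names (`name`, `handedness`, `avg`), and the function `find_problems` are all made up for illustration and are not the actual dataset from the lesson.

```python
import csv
import io

# Invented sample rows for illustration only; these are NOT real statistics.
SAMPLE_CSV = """name,handedness,avg
Player A,R,0.310
Player B,,0.285
Player C,L,1000.0
"""

def find_problems(csv_text):
    """Return (name, problem) pairs for rows that fail basic sanity checks."""
    problems = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # The csv module reads a missing value as an empty string.
        if not row["handedness"].strip():
            problems.append((row["name"], "missing handedness"))
        # A batting average is hits / at-bats, so it must lie in [0, 1].
        try:
            avg = float(row["avg"])
        except ValueError:
            problems.append((row["name"], "avg is not a number"))
            continue
        if not 0.0 <= avg <= 1.0:
            problems.append((row["name"], "avg out of range"))
    return problems

print(find_problems(SAMPLE_CSV))
```

Checks like these only flag the suspicious rows. Deciding how to fix them (drop the row, look up the right value, rescale) is still a judgment call that depends on the data.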
Nick Gustafson said:
- On an average project, about 50% of the time is spent gathering the data.
- If the data can't be used by our algorithm, then we have to repeat the process.
- There is no single specific way to acquire data.
- We can just look on the internet; for example, http://www.seanlahman.com/baseball-archive/statistics will give us the data in 3 different formats.