Data Science, Python, Games (Data Science)
http://napitupulu-jon.appspot.com//
Sat, 14 Jan 2017 01:22:03 GMT
Deep Learning for Letter Recognition with Tensorflow
http://napitupulu-jon.appspot.com/posts/deep-learning-tensorflow-mnist.html
Jonathan Hari Napitupulu
<div tabindex="-1" id="notebook" class="border-box-sizing"> <div class="container" id="notebook-container"> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>It's been six months since my last blog post. To be honest, I have several drafts, but many of them are taking a really long time to finish. I guess I should have split the material into multiple posts. Anyway, in this post I want to show how to achieve more than 95% accuracy with just a MacBook Air. So let's dig into the details.</p> <p>This material originally comes from <a href="https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/udacity/4_convolutions.ipynb">Deep Learning Lecture 4 from Udacity</a>. In this post, the pickling, reformatting, accuracy, and session code are theirs, but the architecture, which is the core of deep learning, is my own. It's kind of refreshing because I have experienced it before (yes, in Andrew Ng's Coursera Machine Learning course, on neural networks). 
</p><p><a href="http://napitupulu-jon.appspot.com/posts/deep-learning-tensorflow-mnist.html">Read more…</a> (11 min remaining to read)</p></div></div></div></div></div>
Data Science, Deep Learning, machine learning, Neural Networks, Tensorflow, Udacity
Fri, 13 Jan 2017 13:00:14 GMT
Statistical Modeling vs Machine Learning
http://napitupulu-jon.appspot.com/posts/statistics-vs-ml.html
Jonathan Hari Napitupulu
<div tabindex="-1" id="notebook" class="border-box-sizing"> <div class="container" id="notebook-container"> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>In a data science process, after the data scientist has questioned the data and extracted useful information, it's time to get into the modeling process.</p> <p><a href="http://napitupulu-jon.appspot.com/posts/statistics-vs-ml.html">Read more…</a> (4 min remaining to read)</p></div></div></div></div></div>
Data Science, machine learning, Statistics
Fri, 15 Jul 2016 13:00:14 GMT
10 Minutes into Data Science
http://napitupulu-jon.appspot.com/posts/10minutes-ds.html
Jonathan Hari Napitupulu
<div tabindex="-1" id="notebook" class="border-box-sizing"> <div class="container" id="notebook-container"> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>I just followed the Johns Hopkins Executive Data Science team. 
In the <a href="https://www.coursera.org/learn/data-science-course/lecture/X4Z9T/what-is-data-science">first chapter of the course</a>, Jeff Leek said,</p> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <blockquote><p></p><center> In Data Science, the importance is science and not data. Data Science is only useful when we use data to answer the question. </center> </blockquote> <p><a href="http://napitupulu-jon.appspot.com/posts/10minutes-ds.html">Read more…</a> (5 min remaining to read)</p></div></div></div></div></div>
Data Science, machine learning, Statistics
Sat, 04 Jun 2016 23:00:14 GMT
Visualization for Big Data
http://napitupulu-jon.appspot.com/posts/bigdata-visualization-cs109.html
Jonathan Hari Napitupulu
<div tabindex="-1" id="notebook" class="border-box-sizing"> <div class="container" id="notebook-container"> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>So the problem is that your data is big. Really big. So big that you can't even fit the data in your visualization. There's too much data. So how do you display it and get insights from it? We could buy a larger display to fit all the visualizations we need. But data keeps growing, so someday we would still need another display. Let's step through what else we can do. 
</p><p><a href="http://napitupulu-jon.appspot.com/posts/bigdata-visualization-cs109.html">Read more…</a> (1 min remaining to read)</p></div></div></div></div></div>
cs109-harvard, Data Analysis, Data Science
Thu, 12 Feb 2015 10:58:14 GMT
Design Principles of Visualization
http://napitupulu-jon.appspot.com/posts/visualization-design-principles-cs109.html
Jonathan Hari Napitupulu
<div tabindex="-1" id="notebook" class="border-box-sizing"> <div class="container" id="notebook-container"> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>When you want to communicate your findings through visualization, there are two approaches to choose from: Expressive Visualization and Effective Visualization. Expressive Visualization shows you the data and just the data. We show the visualization and force the readers to focus on one end, one specific end that we choose. Effective Visualization, however, is about letting users interact with our visualization so they can find the insights themselves. Expressive Visualization is also called author-driven, and Effective Visualization is called reader-driven. You may want to check this <a href="http://napitupulu-jon.appspot.com/posts/inter-anim-ud507.html">article</a>. 
</p><p><a href="http://napitupulu-jon.appspot.com/posts/visualization-design-principles-cs109.html">Read more…</a> (6 min remaining to read)</p></div></div></div></div></div>
cs109-harvard, Data Analysis, Data Science
Wed, 11 Feb 2015 10:58:14 GMT
Visualization for Multi-Dimensional Data
http://napitupulu-jon.appspot.com/posts/multidim-visualization-cs109.html
Jonathan Hari Napitupulu
<div tabindex="-1" id="notebook" class="border-box-sizing"> <div class="container" id="notebook-container"> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>If you want to visualize more than three dimensions, it's best to use retinal variables like shapes and colors. But multi-dimensional data isn't just 3D or 4D. Your data could have thousands or hundreds of thousands of columns, all with different characteristics. It's important to get your data on the same page: the same direction and the same data type. This way, when you visualize it, you can expect to see a pattern. Try to manipulate the data so that it falls into the same group without introducing bias. You might also want to check the scales to see whether all the data is in tune. Data that is on the same page is called homogeneous; otherwise it is called heterogeneous. 
</p><p><a href="http://napitupulu-jon.appspot.com/posts/multidim-visualization-cs109.html">Read more…</a> (5 min remaining to read)</p></div></div></div></div></div>
cs109-harvard, Data Analysis, Data Science
Wed, 11 Feb 2015 10:58:14 GMT
Graph and Visualization
http://napitupulu-jon.appspot.com/posts/graph-visualization-cs109.html
Jonathan Hari Napitupulu
<div tabindex="-1" id="notebook" class="border-box-sizing"> <div class="container" id="notebook-container"> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>Visualization is important in the data science field. One reason is the famous Anscombe's Quartet, where the datasets share the same summary statistics but turn out to be very different kinds of data once we visualize them. A similar thing happened in the last Russian presidential election: the voter data, which was supposed to be normally distributed, turned out not to be. You can look it up on the internet; the public even trusts the Gaussian (normal) distribution more than the head of the election committee. 
</p><p><a href="http://napitupulu-jon.appspot.com/posts/graph-visualization-cs109.html">Read more…</a> (8 min remaining to read)</p></div></div></div></div></div>
cs109-harvard, Data Science, Data Visualization
Wed, 28 Jan 2015 10:58:14 GMT
Statistics and Exploratory Data Analysis
http://napitupulu-jon.appspot.com/posts/statistical-eda-cs109.html
Jonathan Hari Napitupulu
<div tabindex="-1" id="notebook" class="border-box-sizing"> <div class="container" id="notebook-container"> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>Earlier in my blog, I said that data wrangling is not a skill that statisticians have. It takes a lot of work to clean the data. If there are outliers, two things come to mind. First, is this an error in the input? If the data was entered by a human, there may be human error; if it came from a sensor, there may be a sensor error or a network error. Second, the outliers may be genuine outliers: just the kind of noise that data usually has. It's up to us whether to treat the outliers or to ignore them so that we can focus on the rest of the data. </p><p><a href="http://napitupulu-jon.appspot.com/posts/statistical-eda-cs109.html">Read more…</a> (2 min remaining to read)</p></div></div></div></div></div>
cs109-harvard, Data Analysis, Data Science
Tue, 27 Jan 2015 10:58:14 GMT
Why Data Science?
http://napitupulu-jon.appspot.com/posts/introds-cs109.html
Jonathan Hari Napitupulu
<div tabindex="-1" id="notebook" class="border-box-sizing"> <div class="container" id="notebook-container"> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>So what is Data Science? 
Why has this field, which seemed to come from nowhere, suddenly become the most popular career? Data itself has become cheap, even free, and is spread across the internet. Back in the day, companies kept their data to themselves, which made it rare and expensive. But it turns out that the data they hold on their own isn't much if they want to see the bigger picture. This is why they need more data from other sources; they can't keep learning only from the data they already have. Generating more data can be easy, but it takes people who can get insights from a huge number of sources. 90% of the data was created in the last two years. The people already in the field can't keep up with this, of course. And suddenly the world needs data scientists. </p><p><a href="http://napitupulu-jon.appspot.com/posts/introds-cs109.html">Read more…</a> (3 min remaining to read)</p></div></div></div></div></div>
cs109-harvard, Data Science
Mon, 26 Jan 2015 10:58:14 GMT
Using MapReduce and Design Pattern
http://napitupulu-jon.appspot.com/posts/mapreduce-ud617.html
Jonathan Hari Napitupulu
<div tabindex="-1" id="notebook" class="border-box-sizing"> <div class="container" id="notebook-container"> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>MapReduce can handle all your processing on the Hadoop Distributed File System (HDFS). It breaks your data into chunks that reside on the nodes of the cluster, then processes them in parallel. Suppose we build a hashtable and add key-value pairs to it one by one. With millions of records, we can run low on memory, or perhaps run out of it entirely, and the hashtable will take a long time to build. For this kind of problem, parallel processing can work. 
</p><p><a href="http://napitupulu-jon.appspot.com/posts/mapreduce-ud617.html">Read more…</a> (5 min remaining to read)</p></div></div></div></div></div>
Data Science, Hadoop, Udacity
Fri, 23 Jan 2015 10:58:14 GMT
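
To make the MapReduce teaser above a bit more concrete, here is a minimal sketch of the map, shuffle, and reduce steps applied to the hashtable-of-counts problem it describes. This is an assumption-laden illustration, not the post's own code and not the Hadoop API: the function names (`chunked`, `map_chunk`, `shuffle`, `reduce_counts`, `count_keys`) and the sample records are hypothetical, and the chunks are processed one after another in a single process rather than on separate nodes.

```python
from collections import defaultdict
from itertools import chain


def chunked(records, size):
    """Split the input into fixed-size chunks, a stand-in for file blocks."""
    return [records[i:i + size] for i in range(0, len(records), size)]


def map_chunk(chunk):
    """Map step: emit one (key, 1) pair per record in the chunk."""
    return [(key, 1) for key in chunk]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """Reduce step: combine the values of each key into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


def count_keys(records, chunk_size=2):
    """Run map on every chunk, then shuffle and reduce the combined output."""
    # In a real cluster each chunk would be mapped on a different node;
    # here the chunks are simply mapped in a loop (an assumption of this sketch).
    mapped = [map_chunk(chunk) for chunk in chunked(records, chunk_size)]
    return reduce_counts(shuffle(chain.from_iterable(mapped)))


records = ["a", "b", "a", "c", "b", "a"]  # hypothetical sample keys
counts = count_keys(records)
```

The point of the design is that no single step needs the whole dataset at once: each map call only sees its own chunk, and each reduce only sees the values for one key, which is what lets the work be spread across machines.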