Data Science, Python, Games (machine learning)http://napitupulu-jon.appspot.com//enSat, 14 Jan 2017 01:22:08 GMThttps://getnikola.com/http://blogs.law.harvard.edu/tech/rssDeep Learning for Letter Recognition with Tensorflowhttp://napitupulu-jon.appspot.com/posts/deep-learning-tensorflow-mnist.htmlJonathan Hari Napitupulu<div tabindex="-1" id="notebook" class="border-box-sizing"> <div class="container" id="notebook-container"> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>It's been six months since my last blog post. To be honest, I have several drafts, but many of them are taking a really long time to finish. I guess I should have split the material across multiple posts. Anyway, in this post I want to show how to achieve more than 95% accuracy with just a MacBook Air. So let's dig into the details.</p> <p>This material originally comes from <a href="https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/udacity/4_convolutions.ipynb">Deep Learning Lecture 4 from Udacity</a>. In this post, the pickling, reformatting, accuracy, and session code are theirs, but the architecture, which is the core of deep learning, is my own. It's kind of refreshing because I have experienced this before (yes, in the neural network part of Andrew Ng's Coursera Machine Learning course). 
</p><p><a href="http://napitupulu-jon.appspot.com/posts/deep-learning-tensorflow-mnist.html">Read more…</a> (11 min remaining to read)</p></div></div></div></div></div>Data ScienceDeep Learningmachine learningNeural NetworksTensorflowUdacityhttp://napitupulu-jon.appspot.com/posts/deep-learning-tensorflow-mnist.htmlFri, 13 Jan 2017 13:00:14 GMTStatistical Modeling vs Machine Learninghttp://napitupulu-jon.appspot.com/posts/statistics-vs-ml.htmlJonathan Hari Napitupulu<div tabindex="-1" id="notebook" class="border-box-sizing"> <div class="container" id="notebook-container"> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>In a data science process, after the data scientist has questioned the data and extracted useful information, it's time to get into the modeling process.</p> <p><a href="http://napitupulu-jon.appspot.com/posts/statistics-vs-ml.html">Read more…</a> (4 min remaining to read)</p></div></div></div></div></div>Data Sciencemachine learningStatisticshttp://napitupulu-jon.appspot.com/posts/statistics-vs-ml.htmlFri, 15 Jul 2016 13:00:14 GMT10 Minutes into Data Sciencehttp://napitupulu-jon.appspot.com/posts/10minutes-ds.htmlJonathan Hari Napitupulu<div tabindex="-1" id="notebook" class="border-box-sizing"> <div class="container" id="notebook-container"> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>I just followed the Johns Hopkins Executive Data Science team's course. 
In the <a href="https://www.coursera.org/learn/data-science-course/lecture/X4Z9T/what-is-data-science">first chapter of the course</a>, Jeff Leek said,</p> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <blockquote><p></p><center> In Data Science, the importance is science and not data. Data Science is only useful when we use data to answer the question. </center> </blockquote> <p><a href="http://napitupulu-jon.appspot.com/posts/10minutes-ds.html">Read more…</a> (5 min remaining to read)</p></div></div></div></div></div>Data Sciencemachine learningStatisticshttp://napitupulu-jon.appspot.com/posts/10minutes-ds.htmlSat, 04 Jun 2016 23:00:14 GMTvalidation with scikit-learnhttp://napitupulu-jon.appspot.com/posts/validation-ud120.htmlJonathan Hari Napitupulu<div tabindex="-1" id="notebook" class="border-box-sizing"> <div class="container" id="notebook-container"> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p><img src="http://napitupulu-jon.appspot.com/galleries/validation/1.jpg" alt="jpeg"></p> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>With separate training and testing datasets, we can measure how our learning model performs on data it hasn't seen. This tells us how well the model generalizes to new examples. It also acts as a check on whether your model is overfitting. Holding out data leaves fewer examples for training, but it's a step that is worth the cost. Keep in mind that every machine learning algorithm is fit on the training set, not on the test set. 
If you fit on the test set and also score on the test set, you will almost certainly get high performance. And that's called tremendous CHEATING in machine learning. </p><p><a href="http://napitupulu-jon.appspot.com/posts/validation-ud120.html">Read more…</a> (6 min remaining to read)</p></div></div></div></div></div>Advice Apply MLmachine learningUdacityhttp://napitupulu-jon.appspot.com/posts/validation-ud120.htmlFri, 12 Dec 2014 10:58:14 GMTevaluation with scikit-learnhttp://napitupulu-jon.appspot.com/posts/evaluation-ud120.htmlJonathan Hari Napitupulu<div tabindex="-1" id="notebook" class="border-box-sizing"> <div class="container" id="notebook-container"> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p><img src="http://napitupulu-jon.appspot.com/galleries/evaluation/1.jpg" alt="jpeg"></p> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>A skewed class is one whose label is far scarcer than the other label (suppose it's a binary classification). Take POI for example, where non-POI examples vastly outnumber POI examples. This gives imbalanced labels, and the learning model has too few POI examples to train on. 
</p><p><a href="http://napitupulu-jon.appspot.com/posts/evaluation-ud120.html">Read more…</a> (8 min remaining to read)</p></div></div></div></div></div>machine learningSystem DesignUdacityhttp://napitupulu-jon.appspot.com/posts/evaluation-ud120.htmlWed, 10 Dec 2014 10:58:14 GMTPCA with scikit-learnhttp://napitupulu-jon.appspot.com/posts/pca-ud120.htmlJonathan Hari Napitupulu<div tabindex="-1" id="notebook" class="border-box-sizing"> <div class="container" id="notebook-container"> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p><img src="http://napitupulu-jon.appspot.com/galleries/pca/2.jpg" alt="jpeg"></p> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>PCA is used most of the time for data visualization, alongside feature set compression. It's hard (if not impossible) to interpret data with more than three dimensions. So we reduce it to two or three dimensions, which allows us to visualize it. This comes in very handy if we want to capture general information about the data, or to present it in a business context. But it's not the tool to reach for when simply combining two features or compressing them, as PCA throws away some (possibly important) information. 
</p><p><a href="http://napitupulu-jon.appspot.com/posts/pca-ud120.html">Read more…</a> (12 min remaining to read)</p></div></div></div></div></div>Dimensionality Reductionmachine learningUdacityhttp://napitupulu-jon.appspot.com/posts/pca-ud120.htmlTue, 09 Dec 2014 10:58:14 GMTFeature Selection with scikit-learnhttp://napitupulu-jon.appspot.com/posts/feature-selection-ud120.htmlJonathan Hari Napitupulu<div tabindex="-1" id="notebook" class="border-box-sizing"> <div class="container" id="notebook-container"> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p><img src="http://napitupulu-jon.appspot.com/galleries/feature-selection/1.jpg" alt="jpeg"></p> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>Feature selection is one of the things we should pay attention to when building a machine learning algorithm. Among all the available features, there may be some unnecessary ones that will cause your predictive model to overfit if you include them. So choose the features that give the best performance, and prioritize those. On the other hand, we may not notice that a feature we really need is missing (for example, when doing EDA). In that case, what we want to do is synthesize a new feature from the features available. 
</p><p><a href="http://napitupulu-jon.appspot.com/posts/feature-selection-ud120.html">Read more…</a> (17 min remaining to read)</p></div></div></div></div></div>Advice Apply MLmachine learningRegularizationUdacityhttp://napitupulu-jon.appspot.com/posts/feature-selection-ud120.htmlWed, 03 Dec 2014 10:58:14 GMTText Learning with scikit-learnhttp://napitupulu-jon.appspot.com/posts/text-learning-ud120.htmlJonathan Hari Napitupulu<div tabindex="-1" id="notebook" class="border-box-sizing"> <div class="container" id="notebook-container"> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p><img src="http://napitupulu-jon.appspot.com/galleries/text_learning/2.jpg" alt="jpeg"></p> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>Text learning is a broad area of machine learning that works with text. Many search giants, like Google, Yahoo, and Baidu, try to learn from the text of various searches. In this example we take a look at bag of words, which takes the words in the data and counts how often each word occurs in the text. 
</p><p><a href="http://napitupulu-jon.appspot.com/posts/text-learning-ud120.html">Read more…</a> (10 min remaining to read)</p></div></div></div></div></div>machine learningUdacityhttp://napitupulu-jon.appspot.com/posts/text-learning-ud120.htmlWed, 03 Dec 2014 10:58:14 GMTFeature Scaling with scikit-learnhttp://napitupulu-jon.appspot.com/posts/feature-scaling-ud120.htmlJonathan Hari Napitupulu<div tabindex="-1" id="notebook" class="border-box-sizing"> <div class="container" id="notebook-container"> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p><img src="http://napitupulu-jon.appspot.com/galleries/feature_scaling/2.jpg" alt="jpeg"></p> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>Feature scaling is an important tool in machine learning. It's about normalizing the range of each feature so that no feature dominates the others. Let's take this picture for example. 
</p><p><a href="http://napitupulu-jon.appspot.com/posts/feature-scaling-ud120.html">Read more…</a> (4 min remaining to read)</p></div></div></div></div></div>Advice Apply MLmachine learningUdacityhttp://napitupulu-jon.appspot.com/posts/feature-scaling-ud120.htmlTue, 02 Dec 2014 10:58:14 GMTK-Means with scikit-learnhttp://napitupulu-jon.appspot.com/posts/kmeans-ud120.htmlJonathan Hari Napitupulu<div tabindex="-1" id="notebook" class="border-box-sizing"> <div class="container" id="notebook-container"> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p><img src="http://napitupulu-jon.appspot.com/galleries/kmeans/2.jpg" alt="jpeg"></p> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>Clustering is one example of unsupervised learning. Unsupervised learning is about learning something from a dataset even though we don't know how to label the data. The groups in the plot, called clusters, are what we want to learn about in this lesson. Dimensionality reduction (PCA) is another unsupervised learning algorithm, which I will discuss in a future post. </p><p><a href="http://napitupulu-jon.appspot.com/posts/kmeans-ud120.html">Read more…</a> (10 min remaining to read)</p></div></div></div></div></div>Clusteringmachine learningUdacityhttp://napitupulu-jon.appspot.com/posts/kmeans-ud120.htmlMon, 01 Dec 2014 10:58:14 GMT