Data Science, Python, Games (Statistics)http://napitupulu-jon.appspot.com//enFri, 15 Jul 2016 13:30:57 GMThttps://getnikola.com/http://blogs.law.harvard.edu/tech/rssStatistical Modeling vs Machine Learninghttp://napitupulu-jon.appspot.com/posts/statistics-vs-ml.htmlJonathan Hari Napitupulu<div tabindex="-1" id="notebook" class="border-box-sizing"> <div class="container" id="notebook-container"> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>In a Data Science process, after the Data Scientist has questioned the data and extracted useful information, it's time to move on to the modeling process.</p> <p><a href="http://napitupulu-jon.appspot.com/posts/statistics-vs-ml.html">Read more…</a> (4 min remaining to read)</p></div></div></div></div></div>Data Sciencemachine learningStatisticshttp://napitupulu-jon.appspot.com/posts/statistics-vs-ml.htmlFri, 15 Jul 2016 13:00:14 GMT10 Minutes into Data Sciencehttp://napitupulu-jon.appspot.com/posts/10minutes-ds.htmlJonathan Hari Napitupulu<div tabindex="-1" id="notebook" class="border-box-sizing"> <div class="container" id="notebook-container"> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>I just followed Johns Hopkins' Executive Data Science course. In the <a href="https://www.coursera.org/learn/data-science-course/lecture/X4Z9T/what-is-data-science">first chapter of the course</a> Jeff Leek said,</p> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <blockquote><p></p><center> In Data Science, the important thing is the science, not the data. Data Science is only useful when we use data to answer a question. 
</center> </blockquote> <p><a href="http://napitupulu-jon.appspot.com/posts/10minutes-ds.html">Read more…</a> (5 min remaining to read)</p></div></div></div></div></div>Data Sciencemachine learningStatisticshttp://napitupulu-jon.appspot.com/posts/10minutes-ds.htmlSat, 04 Jun 2016 23:00:14 GMTA/B Testing Multiple Metricshttp://napitupulu-jon.appspot.com/posts/multiple-metric-abtesting-udacity.htmlJonathan Hari Napitupulu<div tabindex="-1" id="notebook" class="border-box-sizing"> <div class="container" id="notebook-container"> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>One thing changes when you evaluate multiple metrics instead of a single metric: one of them could appear significantly different purely by chance. That is, if you choose a fixed 5% significance level, one metric might show up as significant, but only that one time; if you rerun the experiment on another day, it shouldn't recur. One thing we can do is perform multiple comparisons and see which of the metrics truly behave differently.</p> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>We can use multiple comparisons when, for example, we have automated alerting, to see whether a metric suddenly starts behaving differently. 
Or if we use an automated framework for exploratory data analysis, you want to make sure the metric's difference actually shows up and is repeatable.</p> <p><a href="http://napitupulu-jon.appspot.com/posts/multiple-metric-abtesting-udacity.html">Read more…</a> (9 min remaining to read)</p></div></div></div></div></div>A/B TestingData AnalysisStatisticsUdacityhttp://napitupulu-jon.appspot.com/posts/multiple-metric-abtesting-udacity.htmlWed, 06 Jan 2016 03:00:14 GMTA/B Testing Single Metrichttp://napitupulu-jon.appspot.com/posts/single-metric-abtesting-udacity.htmlJonathan Hari Napitupulu<div tabindex="-1" id="notebook" class="border-box-sizing"> <div class="container" id="notebook-container"> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>When you have a single evaluation metric, you have to know the impact of your results on the business side. Analytically speaking, you want to find out whether your results are significantly different. You also want to know the magnitude and direction of the change.</p> <p>If your results are statistically significant, you can interpret them based on how you characterized the metric and the intuition you built from it, just as we discussed in the previous post. You also want to check the variability of the metric you are experimenting with.</p> <p>If your results are not statistically significant when they really should be, you can do two things. You could subset your experiment by platform or time (day of the week) to see what went wrong, or whether significance differs when you subset by those features. This could lead you to a new hypothesis test and a better understanding of how your participants react. 
If you are just starting out with your experiments, you should also cross-check your parametric hypothesis test against a non-parametric test.</p> <p><a href="http://napitupulu-jon.appspot.com/posts/single-metric-abtesting-udacity.html">Read more…</a> (6 min remaining to read)</p></div></div></div></div></div>A/B TestingData AnalysisStatisticsUdacityhttp://napitupulu-jon.appspot.com/posts/single-metric-abtesting-udacity.htmlTue, 29 Dec 2015 03:00:14 GMTA/B Testing Sanity Checkhttp://napitupulu-jon.appspot.com/posts/sanity-check-abtesting-udacity.htmlJonathan Hari Napitupulu<div tabindex="-1" id="notebook" class="border-box-sizing"> <div class="container" id="notebook-container"> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>In this blog series, we're going to talk about analyzing our results: what we can interpret from the results of the experiment, and what we can and can't conclude. We will use invariant metrics for sanity checks, as discussed in this post, then evaluate single and multiple metrics, along with some gotchas in the analysis.</p> <p><a href="http://napitupulu-jon.appspot.com/posts/sanity-check-abtesting-udacity.html">Read more…</a> (6 min remaining to read)</p></div></div></div></div></div>A/B TestingData AnalysisStatisticsUdacityhttp://napitupulu-jon.appspot.com/posts/sanity-check-abtesting-udacity.htmlWed, 23 Dec 2015 03:00:14 GMTDuration of Experimenthttp://napitupulu-jon.appspot.com/posts/duration-experiment-abtesting-udacity.htmlJonathan Hari Napitupulu<div tabindex="-1" id="notebook" class="border-box-sizing"> <div class="container" id="notebook-container"> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>What should the duration of the experiment be? Is it a long time? 
How long before participants give any feedback? This is the final piece in designing the subject of our experiment. We will also be talking about exposure: how many users you want to see your experimental features will affect the duration of your experiment.</p> <p><a href="http://napitupulu-jon.appspot.com/posts/duration-experiment-abtesting-udacity.html">Read more…</a> (5 min remaining to read)</p></div></div></div></div></div>A/B TestingData AnalysisStatisticsUdacityhttp://napitupulu-jon.appspot.com/posts/duration-experiment-abtesting-udacity.htmlWed, 16 Dec 2015 03:00:14 GMTSize of Experimenthttp://napitupulu-jon.appspot.com/posts/size-experiment-abtesting-udacity.htmlJonathan Hari Napitupulu<div tabindex="-1" id="notebook" class="border-box-sizing"> <div class="container" id="notebook-container"> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>There are many things to take into account when choosing the size of your experiment. The practical significance level, statistical significance level, sensitivity, metric, cohort, and population will each result in different variability, which in turn affects the duration of your experiment.</p> <p>Suppose you want to run an experiment that will affect global users. Running an experiment worldwide is time-consuming, since you observe a lot of users. What you can do instead is take a subset of the population, for example by using a cohort. This choice gives you a much smaller size and different variability, but it will give you some intuition about whether your experiment actually has an effect.</p> <p>Suppose you know, from the video latency example in the previous post, that what you really want is users at the 90th percentile, that is, people with slower internet connections. And because you want immediate feedback, you build a cohort from users whose last activity was within the past two months. 
This experiment could help you decide whether to continue on to a worldwide experiment.</p> <p><a href="http://napitupulu-jon.appspot.com/posts/size-experiment-abtesting-udacity.html">Read more…</a> (5 min remaining to read)</p></div></div></div></div></div>A/B TestingData AnalysisStatisticsUdacityhttp://napitupulu-jon.appspot.com/posts/size-experiment-abtesting-udacity.htmlWed, 09 Dec 2015 03:00:14 GMTPopulation of Experimenthttp://napitupulu-jon.appspot.com/posts/population-experiment-abtesting-udacity.htmlJonathan Hari Napitupulu<div tabindex="-1" id="notebook" class="border-box-sizing"> <div class="container" id="notebook-container"> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>When choosing a population, we are left with two kinds of user experiments: inter-user and intra-user. Intra-user experiments are like event-based experiments, where the same user can end up in both groups: the same user serves in the control group and the experiment group. The thing to keep in mind is that you don't want to run the experiments before and after a big event, like Christmas, because behavior can vary greatly.</p> <p>The other alternative is inter-user experiments, which means different users in each group. We also want to watch out for lurking variables: variables that can introduce bias if the groups are not balanced on them. Medical experiments are usually more careful about this, randomly assigning groups that are balanced on variables like demographics, gender, and age. A/B testing on internet experiments lacks such controls. 
We don't even know whether the participants are real people.</p> <p><a href="http://napitupulu-jon.appspot.com/posts/population-experiment-abtesting-udacity.html">Read more…</a> (4 min remaining to read)</p></div></div></div></div></div>A/B TestingData AnalysisStatisticsUdacityhttp://napitupulu-jon.appspot.com/posts/population-experiment-abtesting-udacity.htmlWed, 02 Dec 2015 03:00:14 GMTSubject of Experimenthttp://napitupulu-jon.appspot.com/posts/subject-experiment-abtesting-udacity.htmlJonathan Hari Napitupulu<div tabindex="-1" id="notebook" class="border-box-sizing"> <div class="container" id="notebook-container"> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>When designing an A/B Testing experiment, there are four main considerations:</p> <ul> <li>Choosing the subject</li> <li>Choosing the population</li> <li>Size</li> <li>Duration</li> </ul> <p>In this post, we want to choose the subject of our experiment, that is, the unit we measure for our control and experiment groups. 
This unit is called the <strong>unit of diversion</strong>.</p> <p><a href="http://napitupulu-jon.appspot.com/posts/subject-experiment-abtesting-udacity.html">Read more…</a> (7 min remaining to read)</p></div></div></div></div></div>A/B TestingData AnalysisStatisticsUdacityhttp://napitupulu-jon.appspot.com/posts/subject-experiment-abtesting-udacity.htmlTue, 01 Dec 2015 11:00:14 GMTVariability of Metricshttp://napitupulu-jon.appspot.com/posts/variability-abtesting-udacity.htmlJonathan Hari Napitupulu<div tabindex="-1" id="notebook" class="border-box-sizing"> <div class="container" id="notebook-container"> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"> </div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html"> <p>When characterizing our experiment, we're talking about how the metric can be used in a particular situation. We want to know the range of conditions under which a metric can be used, and that range is its variability. Usually, when we're dealing with a probability (e.g. click-through probability) or a count of users, we see a nice normal distribution when we plot it in a histogram. That makes it easy to use theoretically computed variability, in the form of a confidence interval, together with our usual practical significance level. However, when a metric such as the video latency from the previous post gives us a lumpy shape, we want to compute the variability empirically. This post discusses how we characterize metrics for our experiment, specifically through <strong>measures of spread</strong>.</p> <p><a href="http://napitupulu-jon.appspot.com/posts/variability-abtesting-udacity.html">Read more…</a> (8 min remaining to read)</p></div></div></div></div></div>A/B TestingData AnalysisStatisticsUdacityhttp://napitupulu-jon.appspot.com/posts/variability-abtesting-udacity.htmlMon, 16 Nov 2015 11:00:14 GMT
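Computing variability empirically, as the last excerpt describes, usually means bootstrapping the statistic from the observed data instead of relying on a normal approximation. A minimal Python sketch of the idea (the function name and the latency numbers are hypothetical, not taken from the linked post):

```python
import random
import statistics

def empirical_ci(samples, stat=statistics.median, n_boot=2000, alpha=0.05, seed=0):
    """Bootstrap an empirical confidence interval for a statistic.

    Useful when the metric's distribution is lumpy (e.g. video latency)
    and a normal-theory confidence interval is not trustworthy.
    """
    rng = random.Random(seed)
    # Resample the data with replacement, recompute the statistic each time,
    # and take percentiles of the resulting distribution of estimates.
    estimates = sorted(
        stat([rng.choice(samples) for _ in range(len(samples))])
        for _ in range(n_boot)
    )
    lo = estimates[int((alpha / 2) * n_boot)]
    hi = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# A lumpy, non-normal latency sample (seconds): most users are fast,
# with a heavy tail of slow connections.
latencies = [0.2, 0.3, 0.3, 0.4, 0.5, 0.5, 0.6, 2.0, 5.0, 9.0]
lo, hi = empirical_ci(latencies, stat=statistics.median)
```

Because the interval is read straight off the resampled estimates, it needs no assumption about the metric's distribution; the same function works for the median, a high percentile, or any other measure of spread.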