It's been 6 months since my last blog post. To be honest, I have several drafts, but many of them took a really long time to finish. I guess I should have split the material into multiple posts. Anyway, in this post I want to show how to achieve more than 95% accuracy with just a MacBook Air. So let's dig into the details.
This material originally comes from Lecture 4 of Udacity's Deep Learning course. In this post, the pickling, reformatting, accuracy, and session code are theirs, but the architecture, which is the core of deep learning, is my own. It was kind of refreshing, because I had experienced this before (yes, the neural network part of Andrew Ng's Machine Learning course on Coursera).
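For reference, an accuracy helper of the kind mentioned above can be sketched as follows. This is a minimal version assuming NumPy arrays where each row of `predictions` holds class scores and each row of `labels` is one-hot encoded; it is not necessarily identical to the course code.

```python
import numpy as np

def accuracy(predictions: np.ndarray, labels: np.ndarray) -> float:
    """Percentage of rows whose highest-scoring class matches the one-hot label."""
    # argmax along axis 1 turns a row of scores (or a one-hot row) into a class index
    predicted = np.argmax(predictions, axis=1)
    actual = np.argmax(labels, axis=1)
    return 100.0 * np.sum(predicted == actual) / predictions.shape[0]

if __name__ == "__main__":
    preds = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])
    labels = np.array([[1, 0], [0, 1], [0, 1]])
    print(accuracy(preds, labels))  # 2 of the 3 rows are predicted correctly
```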
A fintech company is a place where you can borrow and lend money. The power to send is limitless. As one Forbes article described it, “Fintech companies, as they’ve come to be called, are easing payment processes, reducing fraud, saving users money, promoting financial planning, and ultimately moving a giant industry forward.” When talking about fintech companies, one that comes to mind is Prosper. In this post, I will use their data to perform the analysis.
One thing changes when you evaluate multiple metrics instead of a single metric: a metric can come out significantly different purely by chance. For example, if you use a fixed 5% significance level, one of your metrics may appear significant, but only once; if you repeat the experiment on another day, the difference shouldn't recur. One thing we can do is apply a multiple comparison correction and see which of the metrics really behave differently.
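To see why a metric can look significant by chance, here is a small sketch of the family-wise error rate: the probability that at least one of N metrics comes out significant when none of them actually changed. It assumes the metrics are independent, which real metrics often are not, so treat the numbers as illustrative.

```python
def family_wise_error_rate(alpha: float, num_metrics: int) -> float:
    """Probability that at least one of `num_metrics` independent tests is a false positive."""
    # Each test avoids a false positive with probability (1 - alpha)
    return 1 - (1 - alpha) ** num_metrics

for n in (1, 3, 10, 20):
    print(n, round(family_wise_error_rate(0.05, n), 3))
```

With 10 independent metrics at α = 0.05, the chance of at least one spurious "significant" metric is already about 40%.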
Multiple comparison corrections are useful, for example, when we have automated alerting that flags any metric that suddenly behaves differently. They also help when we use an automated framework for exploratory data analysis, where we want to make sure that a difference in a metric is real and repeatable.
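As a sketch of what an automated alerting job could do with a batch of per-metric p-values, here are two standard corrections: Bonferroni and the less conservative Holm-Bonferroni step-down procedure. The metric names and p-values are hypothetical, and the choice of correction is an assumption for illustration.

```python
def bonferroni(p_values: dict[str, float], alpha: float = 0.05) -> set[str]:
    """Metrics that remain significant after a Bonferroni correction."""
    threshold = alpha / len(p_values)
    return {name for name, p in p_values.items() if p <= threshold}

def holm(p_values: dict[str, float], alpha: float = 0.05) -> set[str]:
    """Holm-Bonferroni step-down procedure: controls the same error rate, rejects more often."""
    ordered = sorted(p_values.items(), key=lambda item: item[1])
    m = len(ordered)
    significant = set()
    for rank, (name, p) in enumerate(ordered):
        # The k-th smallest p-value (k starting at 0) is compared with alpha / (m - k)
        if p <= alpha / (m - rank):
            significant.add(name)
        else:
            break  # stop at the first metric that is not rejected
    return significant

# Hypothetical p-values produced by an alerting job, one per metric
p_values = {"click_through_rate": 0.001, "retention": 0.015,
            "revenue_per_user": 0.03, "latency_p90": 0.2}
print(sorted(bonferroni(p_values)))  # ['click_through_rate']
print(sorted(holm(p_values)))        # ['click_through_rate', 'retention']
```

Only metrics that survive the correction would trigger an alert, which keeps chance fluctuations across many metrics from paging someone every day.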
When you have a single evaluation metric, you need to know what impact your results have on the business side. Analytically speaking, you want to find out whether your results are significantly different. Then you also want to know the magnitude and direction of the change.
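For a rate metric such as a conversion or click-through probability, one common way to get all three answers is a two-proportion z-test plus a confidence interval on the difference. The sketch below assumes that kind of metric; the counts and the practical significance boundary `d_min` are hypothetical.

```python
import math
from statistics import NormalDist

def compare_proportions(x_control, n_control, x_exp, n_exp, alpha=0.05):
    """Return (difference, confidence interval, two-sided p-value) for experiment minus control."""
    p_control = x_control / n_control
    p_exp = x_exp / n_exp
    d_hat = p_exp - p_control  # sign gives the direction, absolute value gives the magnitude

    # Pooled standard error under the null hypothesis of no difference, used for the z-test
    p_pool = (x_control + x_exp) / (n_control + n_exp)
    se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n_control + 1 / n_exp))
    p_value = 2 * (1 - NormalDist().cdf(abs(d_hat) / se_pool))

    # Unpooled standard error for the confidence interval around the observed difference
    se_diff = math.sqrt(p_control * (1 - p_control) / n_control + p_exp * (1 - p_exp) / n_exp)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (d_hat - z * se_diff, d_hat + z * se_diff)
    return d_hat, ci, p_value

# Hypothetical counts: 1,000 of 10,000 control users and 1,100 of 10,000 experiment users converted
d_hat, ci, p_value = compare_proportions(1000, 10000, 1100, 10000)
d_min = 0.005  # hypothetical practical significance boundary agreed with the business side
print(f"difference={d_hat:.4f}, CI=({ci[0]:.4f}, {ci[1]:.4f}), p={p_value:.4f}")
print("statistically significant:", ci[0] > 0 or ci[1] < 0)
print("practically significant (increase):", ci[0] > d_min)
```

In this made-up case the change is statistically significant but not practically significant, because the lower end of the interval is below `d_min`. That gap is exactly the business-impact question.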
If your results are statistically significant, you can interpret them based on how you characterized the metric and build intuition from it, just as we discussed in the previous post. You also want to check the variability of the metric you are experimenting on.
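One way to check a metric's variability, especially for statistics like medians where an analytic standard error is awkward, is the bootstrap: resample the data with replacement and look at how much the statistic moves. This is a minimal sketch with hypothetical session lengths; it is one option among others (A/A tests are another).

```python
import random
import statistics

def bootstrap_std_error(values, statistic=statistics.mean, num_resamples=1000, seed=0):
    """Estimate the standard error of `statistic` by resampling `values` with replacement."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    estimates = []
    for _ in range(num_resamples):
        resample = rng.choices(values, k=len(values))
        estimates.append(statistic(resample))
    # Spread of the statistic across resamples approximates its standard error
    return statistics.stdev(estimates)

# Hypothetical per-user session lengths in minutes
sessions = [3.1, 4.7, 2.2, 8.9, 5.0, 3.3, 6.1, 4.4, 2.8, 7.5]
print(bootstrap_std_error(sessions))                               # variability of the mean
print(bootstrap_std_error(sessions, statistic=statistics.median))  # variability of the median
```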
If your results are not statistically significant when you expected them to be, there are two things you can do. First, you can break down your experiment by platform or by time (for example, day of the week) to see what went wrong, or whether the result becomes significant within those segments. This could lead you to new hypotheses and help you understand how your participants react. Second, if you are just starting out with your experiments, you should cross-check your parametric hypothesis test against a non-parametric one.
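A simple non-parametric cross-check is the sign test: count on how many days the experiment group beat the control group and ask how likely that count would be if each day were a fair coin flip. The sketch below is an exact two-sided binomial version; the day counts are hypothetical.

```python
from math import comb

def sign_test_p_value(num_positive_days: int, num_days: int) -> float:
    """Two-sided exact sign test against p = 0.5."""
    # Use the smaller tail so the test is symmetric in "wins" and "losses"
    k = min(num_positive_days, num_days - num_positive_days)
    tail = sum(comb(num_days, i) for i in range(k + 1)) / 2 ** num_days
    return min(1.0, 2 * tail)  # double one tail for a two-sided test, capped at 1

# Hypothetical: the experiment beat control on 10 out of 14 days
print(sign_test_p_value(10, 14))
```

If the parametric test says "significant" but the sign test does not (or vice versa), that disagreement is itself a signal to look more closely at the day-by-day data.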
In this part of the blog series, we're going to talk about analyzing the results of an experiment: how to interpret them, and what we can and can't conclude. We will use invariant metrics for sanity checks, as discussed in this post, evaluate a single metric and multiple metrics, and also cover some gotchas in analysis.
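As an example of a sanity check, take an invariant metric such as the number of cookies assigned to each group. If the split was meant to be 50/50, the observed share in control should fall inside a confidence interval around 0.5. A minimal sketch, with hypothetical counts and the intended split as a parameter:

```python
import math
from statistics import NormalDist

def invariant_check(count_control: int, count_exp: int, expected_share: float = 0.5, alpha: float = 0.05):
    """Check whether the observed share of units in control matches the intended split."""
    total = count_control + count_exp
    # Standard error of a binomial proportion under the intended split
    se = math.sqrt(expected_share * (1 - expected_share) / total)
    margin = NormalDist().inv_cdf(1 - alpha / 2) * se
    observed = count_control / total
    lower, upper = expected_share - margin, expected_share + margin
    return observed, (lower, upper), lower <= observed <= upper

# Hypothetical cookie counts per group
print(invariant_check(50200, 49800))  # small imbalance, within the interval
print(invariant_check(51000, 49000))  # larger imbalance, fails the sanity check
```

If an invariant metric fails this check, it usually points to a problem with the experiment setup, and the evaluation metrics shouldn't be trusted until that is explained.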
What is the duration of the experiment? Will it take a long time? How long before participants give any feedback? This is the final topic in designing the experiment. We will also talk about exposure: how many users you want to see your experimental feature will affect the duration of your experiment.
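The link between exposure and duration is simple arithmetic: the fewer users you expose per day, the longer it takes to collect the sample you need. A minimal sketch, with hypothetical traffic and sample-size numbers:

```python
import math

def experiment_duration_days(required_per_group: int, num_groups: int,
                             daily_traffic: int, exposure: float) -> int:
    """Days needed when only a fraction `exposure` of daily traffic enters the experiment."""
    if not 0 < exposure <= 1:
        raise ValueError("exposure must be in (0, 1]")
    users_per_day = daily_traffic * exposure
    # Round up: a partial day still has to be run in full
    return math.ceil(required_per_group * num_groups / users_per_day)

# Hypothetical: 50,000 users per group, 2 groups, 40,000 eligible users per day
print(experiment_duration_days(50000, 2, 40000, 0.25))  # 10 days at 25% exposure
print(experiment_duration_days(50000, 2, 40000, 0.5))   # 5 days at 50% exposure
```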
There are many things to take into account when choosing the size of your experiment. The practical significance level, statistical significance level, and sensitivity all change the size you need, while the choice of metric, cohort, and population results in different variability, which changes it as well.
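To make those dependencies concrete, here is a sketch of the standard normal-approximation sample size formula for comparing two proportions. The practical significance level enters as the minimum detectable effect, the statistical significance level as `alpha`, sensitivity as `power`, and the metric's variability through the binomial variance p(1 − p) of the baseline rate. The example numbers are hypothetical, and this is not necessarily the calculator used elsewhere in this series.

```python
import math
from statistics import NormalDist

def sample_size_per_group(baseline_rate: float, min_detectable_effect: float,
                          alpha: float = 0.05, power: float = 0.8) -> int:
    """Users needed per group to detect an absolute change in a rate metric."""
    p1 = baseline_rate
    p2 = baseline_rate + min_detectable_effect
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)           # sensitivity (statistical power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / min_detectable_effect ** 2)

# Hypothetical: 10% baseline conversion, want to detect an absolute change of 2 percentage points
print(sample_size_per_group(0.10, 0.02))
print(sample_size_per_group(0.10, 0.01))              # smaller effect -> much larger sample
print(sample_size_per_group(0.10, 0.02, power=0.9))   # more sensitivity -> larger sample
```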
Variability also affects the duration of your experiment. Suppose you want to run an experiment that will affect users globally. Running an experiment worldwide is time-consuming, since you have to observe a lot of users. What you can do instead is take a subset of the population, for example a cohort. This gives you a much smaller size and a different variability, but it will give you some intuition about whether your experiment actually has an effect.
Suppose that, from the video latency example in the previous post, you know that what you really care about is the 90th percentile, that is, users with slower internet connections. And because you want immediate feedback, you build a cohort of users whose last activity was within the past 2 months. This experiment can help you decide whether to continue with a worldwide experiment.
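Building such a cohort and computing its 90th-percentile latency could look like the sketch below. It assumes each record carries a user id, a last-active date, and a latency measurement; the field layout, the dates, and treating "2 months" as 60 days are all hypothetical choices for illustration.

```python
import statistics
from datetime import date, timedelta

# Hypothetical records: (user_id, last_active_date, video_latency_ms)
users = [
    ("u1", date(2024, 5, 20), 120),
    ("u2", date(2024, 1, 3), 900),
    ("u3", date(2024, 4, 15), 300),
    ("u4", date(2024, 5, 30), 850),
    ("u5", date(2023, 12, 1), 200),
    ("u6", date(2024, 4, 10), 410),
]
TODAY = date(2024, 6, 1)

def active_cohort(records, today, window_days=60):
    """Keep users whose last activity falls within the past `window_days` days."""
    cutoff = today - timedelta(days=window_days)
    return [r for r in records if r[1] >= cutoff]

def p90(values):
    """90th percentile via linear interpolation between data points (needs at least 2 values)."""
    # quantiles(n=10) returns 9 cut points; the last one is the 90th percentile
    return statistics.quantiles(values, n=10, method="inclusive")[-1]

cohort = active_cohort(users, TODAY)
print([r[0] for r in cohort])
print(p90([r[2] for r in cohort]))
```

The `inclusive` method is used so the percentile never extrapolates beyond the largest observed latency, which the default `exclusive` method can do on small samples.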