We have two initial hypotheses.

Unit of diversion: cookie

We expect the analytical variance to match the empirical variance, because the unit of analysis and the unit of diversion are the same.
To calculate the standard deviation of a proportion, we use the formula np.sqrt(p * (1-p) / n) together with the baseline data below.
baselines= """Unique cookies to view page per day: 40000
Unique cookies to click "Start free trial" per day: 3200
Enrollments per day: 660
Click-through-probability on "Start free trial": 0.08
Probability of enrolling, given click: 0.20625
Probability of payment, given enroll: 0.53
Probability of payment, given click: 0.1093125"""
lines = baselines.split('\n')
d_baseline = dict([(e.split(':\t')[0],float(e.split(':\t')[1])) for e in lines])
Since we have a sample of 5,000 cookies instead of the original 40,000, we scale the baseline counts accordingly. For both evaluation metrics the denominator is the number of users who click the "Start free trial" button, calculated as
n = 5000
n_click = n * d_baseline['Click-through-probability on "Start free trial"']
n_click
400.0
Next, the standard deviation for Gross Conversion is
p = d_baseline['Probability of enrolling, given click']
round(np.sqrt(p * (1-p) / n_click),4)
0.0202
and for Net Conversion,
p = d_baseline['Probability of payment, given click']
round(np.sqrt(p * (1-p) / n_click),4)
0.0156
For both Gross Conversion and Net Conversion, the empirical variance should approximate the analytical variance, because the unit of analysis and the unit of diversion are the same (cookie-ids/user-ids).
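When the unit of diversion and the unit of analysis coincide, each click can be treated as an independent draw, which is exactly the assumption behind the analytical formula. A minimal simulation sketch (with p = 0.20625 and n = 400 taken from the Gross Conversion numbers above) shows the two agree:

# Simulation sketch: with independent clicks, the empirical SD of the
# proportion matches the analytical sqrt(p * (1-p) / n).
rng = np.random.RandomState(42)
p, n = 0.20625, 400            # Gross Conversion baseline, clicks per sample
sim = rng.binomial(n, p, size=10000) / float(n)
print(sim.std(), np.sqrt(p * (1 - p) / n))   # both ≈ 0.0202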
We feed these numbers into a sample size calculator. We take the larger of the two required sample sizes, so the minimum number of clicks covers both metrics. The calculator's output is for one group only, so it must be doubled, and since it counts only users who click, we convert clicks to pageviews by dividing by the click-through probability. The pageviews needed are then:
(27411 * 2) / d_baseline['Click-through-probability on "Start free trial"']
685275.0
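As a rough cross-check of the calculator's 27,411 clicks per group, we can evaluate the standard two-proportion power formula directly. This sketch assumes alpha = 0.05 and 80% power with the Gross Conversion baseline and d_min = 0.01; online calculators may use exact binomial methods, so the result is in the same ballpark rather than identical:

from scipy.stats import norm

# Approximate per-group sample size to detect a shift of d_min in a
# proportion with baseline p (normal approximation).
def sample_size(p, d_min, alpha=0.05, beta=0.2):
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(1 - beta)
    sd1 = np.sqrt(2 * p * (1 - p))                               # SD under H0
    sd2 = np.sqrt(p * (1 - p) + (p + d_min) * (1 - p - d_min))   # SD under H1
    return ((z_alpha * sd1 + z_beta * sd2) / d_min) ** 2

round(sample_size(0.20625, 0.01))   # ≈ 25835, close to the calculator's 27411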
For the sanity checks, we use three invariant metrics:

- Number of cookies
- Number of clicks on "Start free trial"
- Click-through-probability on "Start free trial"

Once all of these pass, we can continue to analyze the experiment.
control = pd.read_csv('control_data.csv')
experiment = pd.read_csv('experiment.csv')
control.head()
| | Date | Pageviews | Clicks | Enrollments | Payments |
|---|---|---|---|---|---|
| 0 | Sat, Oct 11 | 7723 | 687 | 134 | 70 |
| 1 | Sun, Oct 12 | 9102 | 779 | 147 | 70 |
| 2 | Mon, Oct 13 | 10511 | 909 | 167 | 95 |
| 3 | Tue, Oct 14 | 9871 | 836 | 156 | 105 |
| 4 | Wed, Oct 15 | 10014 | 837 | 163 | 64 |
experiment.head()
| | Date | Pageviews | Clicks | Enrollments | Payments |
|---|---|---|---|---|---|
| 0 | Sat, Oct 11 | 7716 | 686 | 105 | 34 |
| 1 | Sun, Oct 12 | 9288 | 785 | 116 | 91 |
| 2 | Mon, Oct 13 | 10480 | 884 | 145 | 79 |
| 3 | Tue, Oct 14 | 9867 | 827 | 138 | 92 |
| 4 | Wed, Oct 15 | 9793 | 832 | 140 | 94 |
Next, we count the total views and clicks for both control and experiment groups.
control_views = control.Pageviews.sum()
control_clicks = control.Clicks.sum()
experiment_views = experiment.Pageviews.sum()
experiment_clicks = experiment.Clicks.sum()
For counts like the number of cookies and the number of clicks on the "Start free trial" button, we build a confidence interval around the fraction we expect in the control group and compare it with the actual fraction as the observed outcome. Since we expect the control and experiment groups to receive equal traffic, we set the expected proportion to 0.5. For both invariant metrics, the sanity-check confidence interval uses the function below.
def sanity_check_CI(control, experiment, expected):
    # Standard error of the expected proportion over the combined count
    SE = np.sqrt((expected * (1 - expected)) / (control + experiment))
    ME = 1.96 * SE  # margin of error at the 95% confidence level
    return (expected - ME, expected + ME)
Now for the sanity-check confidence interval on the number of cookies that view the page,
sanity_check_CI(control_views,experiment_views,0.5)
(0.49882039214902313, 0.50117960785097693)
The actual proportion is
float(control_views)/(control_views+experiment_views)
0.5006396668806133
Since 0.5006 is within the interval, the experiment passes the sanity check for number of cookies.
Next, we calculate the confidence interval for the number of clicks on the "Start free trial" button.
sanity_check_CI(control_clicks,experiment_clicks,0.5)
(0.49588449572378945, 0.50411550427621055)
And the actual proportion,
float(control_clicks)/(control_clicks+experiment_clicks)
0.5004673474066628
Again, the actual proportion (0.5005) is within the interval, so our experiment also passes this sanity check.
The sanity check for click-through-probability (CTP) requires a slightly different calculation. For the simple counts above, we knew that a properly set up experiment should split traffic evenly, so the expected control proportion was 0.5. For CTP we do not know the true proportion in advance, so instead we build a confidence interval around the control group's CTP and treat the experiment group's CTP as the observed outcome. If the experiment's CTP falls outside the control CTP's confidence interval, the experiment fails the sanity check and we cannot continue the analysis.
ctp_control = float(control_clicks)/control_views
ctp_experiment = float(experiment_clicks)/experiment_views
#%%R
# 95% CI around the control group's click-through-probability
c = 28378        # total control clicks
n = 345543       # total control pageviews
CL = 0.95
pe = c/n
SE = sqrt(pe*(1-pe)/n)
z_star = round(qnorm((1-CL)/2, lower.tail=F), digits=2)
ME = z_star * SE
c(pe-ME, pe+ME)
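The same interval can be computed without leaving Python; this is a direct translation of the R cell above, using the totals we already have:

# 95% CI around the control CTP, mirroring the R cell above
SE = np.sqrt(ctp_control * (1 - ctp_control) / control_views)
ME = 1.96 * SE
(ctp_control - ME, ctp_control + ME)   # ≈ (0.0812, 0.0830)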
ctp_experiment
0.082125813574576823
As you can see, the experiment group's click-through-probability is within the confidence interval of the control group's click-through-probability. Since we have passed all of the sanity checks, we can continue to analyze the experiment.
# Gross Conversion = enrollments / clicks; Net Conversion = payments / clicks.
# dropna() restricts both numerator and denominator to days with recorded data.
get_gross = lambda group: float(group.dropna().Enrollments.sum()) / group.dropna().Clicks.sum()
get_net = lambda group: float(group.dropna().Payments.sum()) / group.dropna().Clicks.sum()
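As a quick check, applying these to the two groups gives the conversion rates that the confidence intervals below are built on (the values follow from the day-level sums printed below):

# Example usage: per-group conversion rates
get_gross(control), get_gross(experiment)   # ≈ (0.2189, 0.1983)
get_net(control), get_net(experiment)       # ≈ (0.1176, 0.1127)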
Keep in mind that the observed difference (observed_diff below) can be negative.
print('N_cont = %i'%control.dropna().Clicks.sum())
print('X_cont = %i'%control.dropna().Enrollments.sum())
print('N_exp = %i'%experiment.dropna().Clicks.sum())
print('X_exp = %i'%experiment.dropna().Enrollments.sum())
N_cont = 17293
X_cont = 3785
N_exp = 17260
X_exp = 3423
X_exp/N_exp
X_cont/N_cont
#%%R
# Gross Conversion: 95% CI for the difference, using the pooled probability
N_cont = 17293   # control clicks (days with enrollment data)
X_cont = 3785    # control enrollments
N_exp = 17260    # experiment clicks
X_exp = 3423     # experiment enrollments
observed_diff = X_exp/N_exp - X_cont/N_cont
p_pool = (X_cont+X_exp)/(N_cont+N_exp)
SE = sqrt((p_pool*(1-p_pool)) * ((1/N_cont) + (1/N_exp)))
ME = 1.96 * SE
c(observed_diff-ME, observed_diff+ME)
observed_diff
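The same computation can be wrapped in a small Python helper. This is a sketch that mirrors the R cell (pooled probability, normal approximation at the 95% level), reusable for both metrics:

# Sketch: pooled CI for the difference between two proportions
def diff_CI(X_cont, N_cont, X_exp, N_exp, z=1.96):
    d = float(X_exp) / N_exp - float(X_cont) / N_cont
    p_pool = float(X_cont + X_exp) / (N_cont + N_exp)
    SE = np.sqrt(p_pool * (1 - p_pool) * (1.0 / N_cont + 1.0 / N_exp))
    return d, (d - z * SE, d + z * SE)

diff_CI(3785, 17293, 3423, 17260)   # Gross Conversion: d ≈ -0.0206, CI ≈ (-0.0291, -0.0120)
diff_CI(2033, 17293, 1945, 17260)   # Net Conversion:   d ≈ -0.0049, CI ≈ (-0.0116, 0.0019)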
Zero lies outside the confidence interval, so the observed difference is statistically significant. Its magnitude (about 0.0206) is also above d_min = 0.01, the minimum detectable effect, so it is practically significant as well. Based on Gross Conversion alone, we should definitely launch.
print('N_cont = %i'%control.dropna().Clicks.sum())
print('X_cont = %i'%control.dropna().Payments.sum())
print('N_exp = %i'%experiment.dropna().Clicks.sum())
print('X_exp = %i'%experiment.dropna().Payments.sum())
N_cont = 17293
X_cont = 2033
N_exp = 17260
X_exp = 1945
X_exp/N_exp
X_cont/N_cont
#%%R
# Net Conversion: 95% CI for the difference, using the pooled probability
N_cont = 17293   # control clicks (days with payment data)
X_cont = 2033    # control payments
N_exp = 17260    # experiment clicks
X_exp = 1945     # experiment payments
observed_diff = X_exp/N_exp - X_cont/N_cont
p_pool = (X_cont+X_exp)/(N_cont+N_exp)
SE = sqrt((p_pool*(1-p_pool)) * ((1/N_cont) + (1/N_exp)))
ME = 1.96 * SE
c(observed_diff-ME, observed_diff+ME)
observed_diff

Here zero lies inside the confidence interval, so the difference in Net Conversion is not statistically significant. The lower end of the interval also extends past the practical significance boundary, so we cannot rule out a practically significant drop in revenue.
# Day-by-day comparison for the sign test: on each day, is the
# experiment group's rate higher than the control group's?
compare_prob = lambda col: ((control.dropna()[col] / control.dropna().Clicks) <
                            (experiment.dropna()[col] / experiment.dropna().Clicks))
Counting the days where the experiment's Gross Conversion exceeds the control's, I got:
compare_prob('Enrollments').value_counts()
False    19
True      4
dtype: int64
compare_prob('Payments').value_counts()
False    13
True     10
dtype: int64
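These counts feed a two-sided binomial sign test: under the null hypothesis, each of the 23 days is a fair coin flip. A minimal sketch using scipy (assuming scipy is available):

from scipy.stats import binom_test

binom_test(4, n=23, p=0.5)    # Gross Conversion: 4 of 23 days, p ≈ 0.0026
binom_test(10, n=23, p=0.5)   # Net Conversion: 10 of 23 days, p ≈ 0.6776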
I got a p-value of 0.6776 for Net Conversion, so the sign test agrees that the change in Net Conversion is not statistically significant.
Net Conversion: no significant change detected, but the confidence interval leaves room for losing potential revenue.

Decision: risky. Delay the launch for further experiments, or cancel it.
For the follow-up experiment, we can reuse this experiment's invariant metrics:

And the evaluation metric:

We use user-ids as the unit of diversion, and we expect all of the evaluation metrics to be practically significant.