Overview: Free Trial Screener

Background

  • “Start free trial”: enroll in a free trial of the paid version
  • “Access course materials”: use the course materials for free

Experiment

[Screenshot: the free trial screener popup]

  • Users see this popup after clicking “Start free trial”; it asks how much time they can devote to the course.
  • If they indicate they can't commit enough time, the popup warns them and suggests accessing the course materials for free instead.

We have two initial hypotheses:

  • “.. this might set clearer expectations for students upfront, thus reducing the number of frustrated students who left the free trial because they didn't have enough time”
  • “.. without significantly reducing the number of students to continue past the free trial and eventually complete the course.”

Unit of diversion: cookie

  • Users in the free trial are tracked by user-id
  • The same user-id cannot enroll in the free trial twice
  • Users who never enroll are not tracked

Experiment Design

Metric Choice

Invariant metrics

  • Number of cookies
  • Number of clicks
  • Click-through-probability

Evaluation metrics

  • Gross Conversion
  • Net Conversion


Measuring Standard Deviation

  • Gross Conversion: 0.0202
  • Net Conversion: 0.0156

We expect the analytical variance to match the empirical variance because the unit of analysis and the unit of diversion are the same.

To calculate the standard deviation, we use the standard error of a proportion,

SE = np.sqrt(p * (1-p) / n)

with the baseline data below.

In [34]:
import numpy as np

baselines= """Unique cookies to view page per day:	40000
Unique cookies to click "Start free trial" per day:	3200
Enrollments per day:	660
Click-through-probability on "Start free trial":	0.08
Probability of enrolling, given click:	0.20625
Probability of payment, given enroll:	0.53
Probability of payment, given click:	0.1093125"""

lines  = baselines.split('\n')
d_baseline = dict([(e.split(':\t')[0],float(e.split(':\t')[1])) for e in lines])

Since we have 5,000 sample cookies instead of the original 40,000, we scale the baseline counts accordingly. Both evaluation metrics use the number of users who click the "Start free trial" button as their denominator, which is calculated as

In [79]:
n = 5000
n_click = n * d_baseline['Click-through-probability on "Start free trial"']
n_click
Out[79]:
400.0

Next, standard deviation for Gross conversion is

In [78]:
p = d_baseline['Probability of enrolling, given click']
round(np.sqrt(p * (1-p) / n_click),4)
Out[78]:
0.0202

and for Net Conversion,

In [77]:
p = d_baseline['Probability of payment, given click']
round(np.sqrt(p * (1-p) / n_click),4)
Out[77]:
0.0156

For both Gross Conversion and Net Conversion, the empirical variance should approximate the analytical variance, because the unit of analysis and the unit of diversion are the same (cookies).

Sizing

Number of Samples vs. Power

  • Gross Conversion. Baseline: 0.20625, dmin: 0.01 → 25,839 cookies that click, per group.
  • Net Conversion. Baseline: 0.1093125, dmin: 0.0075 → 27,411 cookies that click, per group.
  • Not using the Bonferroni correction.
  • Using alpha = 0.05 and beta = 0.2.
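The per-group click counts above come from an online sample size calculator. A sketch of the same calculation in Python, using the standard two-proportion power formula (the calculator uses a slightly different formulation, so its output differs by a few percent):

```python
import math
from statistics import NormalDist

def sample_size(p, d_min, alpha=0.05, beta=0.2):
    """Approximate per-group sample size needed to detect a change of
    d_min in a baseline proportion p, with a two-sided test."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # critical value for alpha
    z_b = NormalDist().inv_cdf(1 - beta)        # critical value for power
    p2 = p + d_min
    p_bar = (p + p2) / 2
    n = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
         + z_b * math.sqrt(p * (1 - p) + p2 * (1 - p2))) ** 2 / d_min ** 2
    return math.ceil(n)

print(sample_size(0.20625, 0.01))      # Gross Conversion: ~26,000 clicks
print(sample_size(0.1093125, 0.0075))  # Net Conversion:   ~28,000 clicks
```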

We feed these numbers into the sample size calculator. The calculator's output covers only one group, so it must be doubled to cover both groups. Net Conversion requires the larger sample (27,411 clicks per group), so it drives the requirement. And since these are cookies that click, we convert clicks to pageviews by dividing by the click-through probability. The pageviews needed will be:

In [89]:
(27411 * 2) / d_baseline['Click-through-probability on "Start free trial"']
Out[89]:
685275.0

Duration vs. Exposure

  • Fraction of traffic diverted: 0.8 (low risk)
  • Duration: 22 days (at 40,000 pageviews/day)
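The duration follows directly from the pageview requirement; a quick check:

```python
import math

pageviews_needed = 685275  # from the sizing calculation above
daily_pageviews = 40000    # baseline unique cookies per day
fraction = 0.8             # fraction of traffic diverted to the experiment

days = math.ceil(pageviews_needed / (daily_pageviews * fraction))
print(days)  # 22
```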

Experiment Analysis

Sanity Checks

  • Number of Cookies:

    • Bounds = (0.4988,0.5012)
    • Observed = 0.5006
    • Passes? Yes
  • Number of clicks on “Start free trial”:

    • Bounds = (0.4959,0.5041)
    • Observed = 0.5005
    • Passes? Yes
  • Click-through-probability on “Start free trial”:

    • Bounds = (0.0812,0.0830)
    • Observed = 0.0821
    • Passes? Yes

Since we have passed all of the sanity checks, we can continue to analyze the experiment.

In [4]:
import pandas as pd

control = pd.read_csv('control_data.csv')
experiment = pd.read_csv('experiment.csv')
In [5]:
control.head()
Out[5]:
Date Pageviews Clicks Enrollments Payments
0 Sat, Oct 11 7723 687 134 70
1 Sun, Oct 12 9102 779 147 70
2 Mon, Oct 13 10511 909 167 95
3 Tue, Oct 14 9871 836 156 105
4 Wed, Oct 15 10014 837 163 64
In [6]:
experiment.head()
Out[6]:
Date Pageviews Clicks Enrollments Payments
0 Sat, Oct 11 7716 686 105 34
1 Sun, Oct 12 9288 785 116 91
2 Mon, Oct 13 10480 884 145 79
3 Tue, Oct 14 9867 827 138 92
4 Wed, Oct 15 9793 832 140 94

Next, we count the total views and clicks for both control and experiment groups.

In [38]:
control_views = control.Pageviews.sum()
control_clicks = control.Clicks.sum()

experiment_views = experiment.Pageviews.sum()
experiment_clicks = experiment.Clicks.sum()

For counts like the number of cookies and the number of clicks on the "Start free trial" button, we can build a confidence interval around the fraction we expect in the control group and treat the actual fraction as the observed outcome. Since we expect the control and experiment groups to be split evenly, we set the expected proportion to 0.5. For both invariant metrics, the sanity-check confidence interval uses the function below.

In [7]:
def sanity_check_CI(control,experiment,expected):
    SE = np.sqrt((expected*(1-expected))/(control + experiment))
    ME = 1.96 * SE
    return (expected-ME,expected+ME)
    

Now, the sanity-check confidence interval for the number of cookies that view the page is

In [42]:
sanity_check_CI(control_views,experiment_views,0.5)
Out[42]:
(0.49882039214902313, 0.50117960785097693)

The actual proportion is

In [60]:
float(control_views)/(control_views+experiment_views)
Out[60]:
0.5006396668806133

Since the observed 0.5006 is within the interval, the experiment passes the sanity check for the number of cookies.

Next, we calculate confidence interval of number of clicks at "Start free trial" button.

In [44]:
sanity_check_CI(control_clicks,experiment_clicks,0.5)
Out[44]:
(0.49588449572378945, 0.50411550427621055)

And the actual proportion,

In [61]:
float(control_clicks)/(control_clicks+experiment_clicks)
Out[61]:
0.5004673474066628

This time the observed 0.5005 is within the interval, so the experiment passes this sanity check as well.

The sanity check for click-through probability (CTP) uses a slightly different calculation. For the simple counts above, a properly set up experiment implies a true control proportion of 0.5. We don't know the true CTP of the control group, however, so we build a confidence interval around the control group's CTP and use the experiment group's CTP as the observed outcome. If the experiment's CTP falls outside that interval, the experiment fails the sanity check and we can't continue the analysis.

In [21]:
ctp_control = float(control_clicks)/control_views
ctp_experiment = float(experiment_clicks)/experiment_views
In [3]:
%%R
c = 28378    # total clicks in the control group (control_clicks)
n = 345543   # total pageviews in the control group (control_views)
CL = 0.95

pe = c/n
SE = sqrt(pe*(1-pe)/n)
z_star = round(qnorm((1-CL)/2,lower.tail=F),digits=2)
ME = z_star * SE

c(pe-ME, pe+ME)
Out[3]:
  1. 0.0812103597525297
  2. 0.0830412673966239
In [4]:
ctp_experiment
Out[4]:
0.082125813574576823

And as you can see, click-through-probability of the experiment is still within the confidence interval of click-through-probability control groups. Since we have passed all of the sanity checks, we can continue to analyze the experiment.
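The same interval can be recomputed in the notebook's Python (a sketch; the totals 28378 clicks and 345543 pageviews are the control-group sums from above):

```python
import math

clicks, views = 28378, 345543          # control-group totals
p = clicks / views                     # control CTP
se = math.sqrt(p * (1 - p) / views)    # standard error of a proportion
lo, hi = p - 1.96 * se, p + 1.96 * se  # 95% confidence interval
print(round(lo, 5), round(hi, 5))      # 0.08121 0.08304
```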

Effect Size Test

  • Did not use Bonferroni correction
  • Gross Conversion
    • Bounds = (-0.0291, -0.0120)
    • Statistical Significance? Yes
    • Practical Significance? Yes
  • Net Conversion
    • Bounds = (-0.0116,0.0019)
    • Statistical Significance? No
    • Practical Significance? No
In [29]:
get_gross = lambda group: float(group.dropna().Enrollments.sum())/ group.Clicks.sum()
get_net = lambda group: float(group.dropna().Payments.sum())/ group.Clicks.sum()
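The effect-size confidence intervals below are computed in R cells; the same pooled-standard-error calculation can be sketched in Python:

```python
import math

def pooled_ci(x_cont, n_cont, x_exp, n_exp, z=1.96):
    """95% CI for the difference in proportions (experiment - control),
    using the pooled standard error."""
    diff = x_exp / n_exp - x_cont / n_cont
    p_pool = (x_cont + x_exp) / (n_cont + n_exp)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_cont + 1 / n_exp))
    return diff - z * se, diff + z * se

print(pooled_ci(3785, 17293, 3423, 17260))  # Gross Conversion: ~(-0.0291, -0.0120)
print(pooled_ci(2033, 17293, 1945, 17260))  # Net Conversion:   ~(-0.0116,  0.0019)
```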

Gross Conversion

Keep in mind that the observed difference can be negative.

In [40]:
print('N_cont = %i'%control.dropna().Clicks.sum())
print('X_cont = %i'%control.dropna().Enrollments.sum())
print('N_exp =  %i'%experiment.dropna().Clicks.sum())
print('X_exp =  %i'%experiment.dropna().Enrollments.sum())
N_cont = 17293
X_cont = 3785
N_exp =  17260
X_exp =  3423
In [3]:
X_exp/N_exp
Out[3]:
0.198319814600232
In [4]:
X_cont/N_cont
Out[4]:
0.218874689180593
In [1]:
%%R

N_cont = 17293
X_cont = 3785
N_exp = 17260
X_exp = 3423

observed_diff = X_exp/N_exp - X_cont/N_cont
# print(observed_diff)
p_pool = (X_cont+X_exp)/(N_cont+N_exp)
SE = sqrt( (p_pool*(1-p_pool)) * ((1/N_cont) + (1/N_exp)))
ME = 1.96 * SE
# print(p_pool)
c(observed_diff-ME, observed_diff+ME)
Out[1]:
  1. -0.0291233583354044
  2. -0.0119863908253187
In [2]:
observed_diff
Out[2]:
-0.0205548745803616

The confidence interval does not include zero, so the difference is statistically significant. Its magnitude also exceeds the minimum detectable effect, dmin = 0.01, so it is practically significant: the screener reduces Gross Conversion as intended.

Net Conversion

In [43]:
print('N_cont = %i'%control.dropna().Clicks.sum())
print('X_cont = %i'%control.dropna().Payments.sum())
print('N_exp =  %i'%experiment.dropna().Clicks.sum())
print('X_exp =  %i'%experiment.dropna().Payments.sum())
N_cont = 17293
X_cont = 2033
N_exp =  17260
X_exp =  1945
In [5]:
X_exp/N_exp
Out[5]:
0.1126883
In [6]:
X_cont/N_cont
Out[6]:
0.117562
In [11]:
%%R
N_cont = 17293
X_cont = 2033
N_exp = 17260
X_exp = 1945

observed_diff = X_exp/N_exp - X_cont/N_cont
# print(observed_diff)
p_pool = (X_cont+X_exp)/(N_cont+N_exp)
SE = sqrt( (p_pool*(1-p_pool)) * ((1/N_cont) + (1/N_exp)))
ME = 1.96 * SE
# print(p_pool)
c(observed_diff-ME, observed_diff+ME)
Out[11]:
  1. -0.0116046243598917
  2. 0.00185717901080338
In [12]:
observed_diff
Out[12]:
-0.00487372267454417

The confidence interval contains zero, so the change in Net Conversion is not statistically significant. The interval also extends past the negative practical significance boundary (dmin = 0.0075), so we cannot rule out a practically significant loss.

Sign Test

  • Did not use Bonferroni correction
  • Gross Conversion
    • p-value = 0.0026
    • Statistical Significance? Yes
  • Net Conversion
    • p-value = 0.6776
    • Statistical Significance? No
In [1]:
compare_prob = lambda col: ((control.dropna()[col] / control.dropna().Clicks) <
                            (experiment.dropna()[col]/experiment.dropna().Clicks))
                                   

Gross Conversion

Counting the days on which the experiment group's gross conversion exceeded the control group's, I got

In [4]:
compare_prob('Enrollments').value_counts()
Out[4]:
False    19
True      4
dtype: int64

Net Conversion

In [5]:
compare_prob('Payments').value_counts()
Out[5]:
False    13
True     10
dtype: int64

With 10 out of 23 days in favor of the experiment group, I got a p-value of 0.6776 for Net Conversion.
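The p-values come from a two-tailed sign test under the null hypothesis that the experiment beats the control on any given day with probability 0.5; an exact binomial sketch:

```python
from math import comb

def sign_test_p(successes, n):
    """Two-tailed exact binomial (sign test) p-value for `successes`
    out of `n` trials under H0: P(success) = 0.5."""
    k = min(successes, n - successes)           # smaller tail
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)                   # double for two tails

print(round(sign_test_p(4, 23), 4))   # Gross Conversion: 0.0026
print(round(sign_test_p(10, 23), 4))  # Net Conversion:   0.6776
```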

Conclusion

  • Did not use the Bonferroni correction.
  • Gross Conversion is statistically and practically significant; Net Conversion is neither.

Recommendation

  • Gross Conversion: passes; the screener reduces enrollments by uncommitted students, as intended.
  • Net Conversion: inconclusive; the confidence interval extends past the negative practical significance boundary, so the change could cost money.

  • Decision: launching is risky. Delay for a further experiment, or cancel the launch.

Follow-Up Experiment

  • It may not be necessary to show a warning at all.
  • Alternative: offer a “Start Debt Program” incentive instead.
  • Risky: users may break the agreement, e.g. by
    • not becoming a Udacity Code Reviewer
    • cancelling midway through the program
  • Hypotheses
    • Non-serious users become more committed once given the incentive
    • The number of users who cancel early is reduced
    • The boost comes from previously uncommitted users rather than those who were already committed

For the follow-up experiment, we can use the following invariant metrics:

  • Number of cookies: That is, number of unique cookies to view the course overview page.
  • Number of clicks: That is, number of unique cookies to click the "Start free trial" button (which happens before the free trial screener is triggered).
  • Click-through-probability: That is, number of unique cookies to click the "Start free trial" button divided by number of unique cookies to view the course overview page
  • Gross conversion: That is, number of user-ids to complete checkout and enroll in the free trial divided by number of unique cookies to click the "Start free trial" button.

And the evaluation metric:

  • Debt Conversion: That is, number of user-ids to click “Start Debt Program” divided by number of user-ids that enroll in the free trial.
  • Debt-Net conversion: That is, number of user-ids to click “Start Debt Program” divided by number of user-ids to remain enrolled past the 14-day boundary (and thus make at least one payment)
  • Net conversion: That is, number of user-ids to remain enrolled past the 14-day boundary (and thus make at least one payment) divided by the number of unique cookies to click the "Start free trial" button.

We use user-id as the unit of diversion, and we expect all of the evaluation metrics to be practically significant.