- Start free trial to allow paid version
- Access course materials for free

- Users see this popup after clicking "Start free trial".
- If they are not committed enough, the popup warns them.

We have two initial hypotheses.

- "..this might set clearer expectations for students upfront, thus reducing the number of frustrated students who left the free trial because they didn't have enough time"
- "..without significantly reducing the number of students to continue past the free trial and eventually complete the course."

Unit of diversion: cookie

- Users in the free trial are tracked by user-id.
- The same user-id can't enroll in the free trial twice.
- Users who do not enroll can't be tracked by user-id.

- Number of cookies
- Number of clicks
- Click through probability

- Gross Conversion
- Net Conversion

- Gross Conversion: 0.0202
- Net Conversion: 0.0156

We expect the analytical variance to match the empirical variance because the unit of analysis and the unit of diversion are the same.

To calculate standard deviation, we use this formula

```
Formula = np.sqrt(p * (1-p) / n)
```

and using baseline data below.

In [34]:

```
import numpy as np

baselines = """Unique cookies to view page per day: 40000
Unique cookies to click "Start free trial" per day: 3200
Enrollments per day: 660
Click-through-probability on "Start free trial": 0.08
Probability of enrolling, given click: 0.20625
Probability of payment, given click: 0.1093125
Probability of payment, given enroll: 0.53"""

# Parse each "name: value" line into a dict of floats.
# Split on the last colon so surrounding whitespace does not matter.
d_baseline = {}
for line in baselines.split('\n'):
    name, value = line.rsplit(':', 1)
    d_baseline[name.strip()] = float(value)
```

In [79]:

```
n = 5000
n_click = n * d_baseline['Click-through-probability on "Start free trial"']
n_click
```

Out[79]:

400.0

Next, standard deviation for Gross conversion is

In [78]:

```
p = d_baseline['Probability of enrolling, given click']
round(np.sqrt(p * (1-p) / n_click),4)
```

Out[78]:

0.0202

and for Net Conversion,

In [77]:

```
p = d_baseline['Probability of payment, given click']
round(np.sqrt(p * (1-p) / n_click),4)
```

Out[77]:

0.0156

- Gross Conversion. Baseline: 0.20625, dmin: 0.01, required sample = 25,839 cookies that click.
- Net Conversion. Baseline: 0.1093125, dmin: 0.0075, required sample = 27,411 cookies that click.
- Bonferroni correction is not used.
- Using alpha = 0.05 and beta = 0.2.
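As a rough cross-check on the calculator figures, the sketch below applies a common textbook approximation for the per-group sample size of a two-proportion test. The function name `sample_size_per_group` is hypothetical, and the formula is an assumption: it is not necessarily what the calculator implements, so the results land near the numbers quoted above without necessarily matching them exactly.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_group(p, d_min, alpha=0.05, beta=0.2):
    """Approximate per-group sample size to detect an absolute change d_min from baseline p.

    Assumption: standard normal-approximation formula for a two-sided test.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(1 - beta)        # quantile for power = 1 - beta
    sd_null = sqrt(2 * p * (1 - p))                # spread if both groups share p
    sd_alt = sqrt(p * (1 - p) + (p + d_min) * (1 - p - d_min))  # spread if p shifts by d_min
    return ceil(((z_alpha * sd_null + z_beta * sd_alt) / d_min) ** 2)


n_gross = sample_size_per_group(0.20625, 0.01)
n_net = sample_size_per_group(0.1093125, 0.0075)
print(n_gross, n_net)
```

Net conversion requires the larger sample, so it drives the overall size of the experiment.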

The pageviews needed will then be 685275.

We feed these values into a sample size calculator.

We use the larger of the two sample sizes (27,411) so that the minimum requirement is met for both metrics. The calculator's output is for a single group, so it must be doubled to cover both control and experiment. Because this figure counts only cookies that click, we divide by the click-through probability to convert it into pageviews. The pageviews needed are then:

In [89]:

```
(27411 * 2) / d_baseline['Click-through-probability on "Start free trial"']
```

Out[89]:

685275.0

- Fraction: 0.8 (*Low risk*)
- Duration: 22 days (*40000 pageviews/day*)
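The duration follows from the required pageviews, the daily traffic, and the fraction of traffic diverted. A minimal sketch of that arithmetic, with the baseline values hard-coded and variable names chosen only for illustration:

```python
from math import ceil

clicks_per_group = 27411   # larger of the two per-group sample sizes quoted above
ctp = 0.08                 # baseline click-through probability
daily_pageviews = 40000    # baseline unique cookies viewing the page per day
fraction = 0.8             # share of traffic diverted to the experiment

# Two groups, and only a fraction `ctp` of pageviews turn into clicks.
pageviews_needed = clicks_per_group * 2 / ctp

# Round up: a partial day still has to be run in full.
days = ceil(pageviews_needed / (daily_pageviews * fraction))
print(pageviews_needed, days)
```

Diverting all traffic instead of 80% would shorten the run, at the cost of exposing every visitor to the change.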

Number of Cookies:

- Bounds = (0.4988,0.5012)
- Observed = 0.5006
- Passes? Yes

Number of clicks on “Start free trial”:

- Bounds = (0.4959,0.5041)
- Observed = 0.5005
- Passes? Yes

Click-through-probability on “Start free trial”:

- Bounds = (0.0812,0.0830)
- Observed = 0.0821
- Passes? Yes

Since we have passed all of the sanity checks, we can continue to analyze the experiment.

In [4]:

```
import pandas as pd

control = pd.read_csv('control_data.csv')
experiment = pd.read_csv('experiment.csv')
```

In [5]:

```
control.head()
```

Out[5]:

 | Date | Pageviews | Clicks | Enrollments | Payments
---|---|---|---|---|---
0 | Sat, Oct 11 | 7723 | 687 | 134 | 70
1 | Sun, Oct 12 | 9102 | 779 | 147 | 70
2 | Mon, Oct 13 | 10511 | 909 | 167 | 95
3 | Tue, Oct 14 | 9871 | 836 | 156 | 105
4 | Wed, Oct 15 | 10014 | 837 | 163 | 64

In [6]:

```
experiment.head()
```

Out[6]:

 | Date | Pageviews | Clicks | Enrollments | Payments
---|---|---|---|---|---
0 | Sat, Oct 11 | 7716 | 686 | 105 | 34
1 | Sun, Oct 12 | 9288 | 785 | 116 | 91
2 | Mon, Oct 13 | 10480 | 884 | 145 | 79
3 | Tue, Oct 14 | 9867 | 827 | 138 | 92
4 | Wed, Oct 15 | 9793 | 832 | 140 | 94

Next, we count the total views and clicks for both control and experiment groups.

In [38]:

```
control_views = control.Pageviews.sum()
control_clicks = control.Clicks.sum()
experiment_views = experiment.Pageviews.sum()
experiment_clicks = experiment.Clicks.sum()
```

In [7]:

```
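def sanity_check_CI(control, experiment, expected):
    # 95% confidence interval around the expected proportion,
    # using the combined count of both groups as the sample size
    SE = np.sqrt((expected * (1 - expected)) / (control + experiment))
    ME = 1.96 * SE
    return (expected - ME, expected + ME)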
```

Now we compute the sanity-check confidence interval for the number of cookies that view the page,

In [42]:

```
sanity_check_CI(control_views,experiment_views,0.5)
```

Out[42]:

(0.49882039214902313, 0.50117960785097693)

The actual proportion is

In [60]:

```
float(control_views)/(control_views+experiment_views)
```

Out[60]:

0.5006396668806133

Since 0.5006 is within the interval, the experiment passes the sanity check for number of cookies.

Next, we calculate confidence interval of number of clicks at "Start free trial" button.

In [44]:

```
sanity_check_CI(control_clicks,experiment_clicks,0.5)
```

Out[44]:

(0.49588449572378945, 0.50411550427621055)

And the actual proportion,

In [61]:

```
float(control_clicks)/(control_clicks+experiment_clicks)
```

Out[61]:

0.5004673474066628

Again, 0.5005 is within the interval, so our experiment also passes this sanity check.

In [21]:

```
ctp_control = float(control_clicks)/control_views
ctp_experiment = float(experiment_clicks)/experiment_views
```

In [3]:

```
# %%R
c = 28378
n = 345543
CL = 0.95
pe = c/n
SE = sqrt(pe*(1-pe)/n)
z_star = round(qnorm((1-CL)/2,lower.tail=F),digits=2)
ME = z_star * SE
c(pe-ME, pe+ME)
```

Out[3]:

- 0.0812103597525297
- 0.0830412673966239

In [4]:

```
ctp_experiment
```

Out[4]:

0.082125813574576823
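For readers who prefer to stay in Python, the R cell above can be mirrored as follows. This is a sketch that hard-codes the same control totals and the `ctp_experiment` value printed above. It builds the interval around the control group's click-through probability and checks whether the experiment's value falls inside it.

```python
from math import sqrt

control_clicks, control_views = 28378, 345543   # totals used in the R cell
p_control = control_clicks / control_views

se = sqrt(p_control * (1 - p_control) / control_views)
me = 1.96 * se                                   # same rounded z-score as the R cell
lower, upper = p_control - me, p_control + me

ctp_experiment = 0.082125813574576823            # value printed above
passes = lower <= ctp_experiment <= upper
print(round(lower, 4), round(upper, 4), passes)
```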

- Did not use Bonferroni correction
- Gross Conversion
  - Bounds = (-0.0291, -0.0120)
  - Statistical Significance? Yes
  - Practical Significance? Yes
- Net Conversion
  - Bounds = (-0.0116, 0.0019)
  - Statistical Significance? No
  - Practical Significance? No

In [29]:

```
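# Drop incomplete days for both numerator and denominator so the clicks match the counted enrollments/payments
get_gross = lambda group: float(group.dropna().Enrollments.sum()) / group.dropna().Clicks.sum()
get_net = lambda group: float(group.dropna().Payments.sum()) / group.dropna().Clicks.sum()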
```

Keep in mind that observed_difference can be negative

In [40]:

```
print('N_cont = %i'%control.dropna().Clicks.sum())
print('X_cont = %i'%control.dropna().Enrollments.sum())
print('N_exp = %i'%experiment.dropna().Clicks.sum())
print('X_exp = %i'%experiment.dropna().Enrollments.sum())
```

N_cont = 17293 X_cont = 3785 N_exp = 17260 X_exp = 3423

In [3]:

```
X_exp/N_exp
```

Out[3]:

0.198319814600232

In [4]:

```
X_cont/N_cont
```

Out[4]:

0.218874689180593

In [1]:

```
#%%R
N_cont = 17293
X_cont = 3785
N_exp = 17260
X_exp = 3423
observed_diff = X_exp/N_exp - X_cont/N_cont
# print(observed_diff)
p_pool = (X_cont+X_exp)/(N_cont+N_exp)
SE = sqrt( (p_pool*(1-p_pool)) * ((1/N_cont) + (1/N_exp)))
ME = 1.96 * SE
# print(p_pool)
c(observed_diff-ME, observed_diff+ME)
```

Out[1]:

- -0.0291233583354044
- -0.0119863908253187

In [2]:

```
observed_diff
```

Out[2]:

-0.0205548745803616

In [43]:

```
print('N_cont = %i'%control.dropna().Clicks.sum())
print('X_cont = %i'%control.dropna().Payments.sum())
print('N_exp = %i'%experiment.dropna().Clicks.sum())
print('X_exp = %i'%experiment.dropna().Payments.sum())
```

N_cont = 17293 X_cont = 2033 N_exp = 17260 X_exp = 1945

In [5]:

```
X_exp/N_exp
```

Out[5]:

0.1126883

In [6]:

```
X_cont/N_cont
```

Out[6]:

0.1175620

In [11]:

```
#%%R
N_cont = 17293
X_cont = 2033
N_exp = 17260
X_exp = 1945
observed_diff = X_exp/N_exp - X_cont/N_cont
# print(observed_diff)
p_pool = (X_cont+X_exp)/(N_cont+N_exp)
SE = sqrt( (p_pool*(1-p_pool)) * ((1/N_cont) + (1/N_exp)))
ME = 1.96 * SE
# print(p_pool)
c(observed_diff-ME, observed_diff+ME)
```

Out[11]:

- -0.0116046243598917
- 0.00185717901080338

In [12]:

```
observed_diff
```

Out[12]:

-0.00487372267454417
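The two R cells above repeat the same pooled-standard-error calculation, so it can be wrapped in one helper. The sketch below is a Python version using the counts printed above. The names `pooled_diff_ci` and `significance` are hypothetical, and the practical-significance rule (the whole interval must lie beyond the dmin boundary) is an assumption about how that check is read here.

```python
from math import sqrt


def pooled_diff_ci(n_cont, x_cont, n_exp, x_exp, z=1.96):
    """Confidence interval for (experiment rate - control rate) using a pooled SE."""
    diff = x_exp / n_exp - x_cont / n_cont
    p_pool = (x_cont + x_exp) / (n_cont + n_exp)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_cont + 1 / n_exp))
    return diff - z * se, diff + z * se


def significance(ci, d_min):
    """Return (statistically significant, practically significant) for an interval."""
    lower, upper = ci
    statistical = not (lower <= 0 <= upper)        # interval excludes zero
    practical = upper < -d_min or lower > d_min    # assumption: interval clears the dmin boundary
    return statistical, practical


gross_ci = pooled_diff_ci(17293, 3785, 17260, 3423)
net_ci = pooled_diff_ci(17293, 2033, 17260, 1945)
print(gross_ci, significance(gross_ci, 0.01))
print(net_ci, significance(net_ci, 0.0075))
```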

- Did not use Bonferroni correction
- Gross Conversion
  - p-value = 0.0026
  - Statistical Significance? Yes
- Net Conversion
  - p-value = 0.6776
  - Statistical Significance? No

In [1]:

```
compare_prob = lambda col: ((control.dropna()[col] / control.dropna().Clicks) <
(experiment.dropna()[col]/experiment.dropna().Clicks))
```

For the sign test, we count the days on which the experiment group's rate exceeds the control group's. For gross conversion, I got,

In [4]:

```
compare_prob('Enrollments').value_counts()
```

Out[4]:

False 19 True 4 dtype: int64

In [5]:

```
compare_prob('Payments').value_counts()
```

Out[5]:

False 13 True 10 dtype: int64

From these counts, the sign test gives a p-value of 0.0026 for Gross Conversion and 0.6776 for Net Conversion.
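The notebook does not show how the p-values were obtained from the day counts. One way to reproduce them is an exact two-sided binomial test with success probability 0.5, sketched below using only the standard library. The function name `sign_test_p_value` is hypothetical.

```python
from math import comb


def sign_test_p_value(successes, n):
    """Exact two-sided sign test: under H0 each day is a fair coin flip."""
    k = min(successes, n - successes)               # size of the smaller tail
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)                       # symmetric at p = 0.5, so double one tail


p_gross = sign_test_p_value(4, 23)    # experiment higher on 4 of 23 days
p_net = sign_test_p_value(10, 23)     # experiment higher on 10 of 23 days
print(round(p_gross, 4), round(p_net, 4))
```

Doubling one tail is valid here only because the null distribution is symmetric. For a success probability other than 0.5 the two tails would have to be summed separately.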

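- Bonferroni correction was not used.
- Gross conversion needs to show a significant change, but net conversion does not; net conversion only needs to avoid a significant decrease.

- Gross Conversion: pass
- Net Conversion: marginal pass; the confidence interval extends below -0.0075, the negative practical significance boundary, so launching could lose revenue

Decision: risky. Delay the launch pending further experiments, or cancel it.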

- Not necessary to show warning
- Start Debt Program
- Risky, users break agreement
- Not become Udacity Code Reviewer
- Cancel in midway program

- Hypothesis
  - Non-serious users become more committed after incentive
  - Number of users who cancel early is reduced
  - Boost compared to already committed

For the follow-up experiment, we can use these invariant metrics:

- **Number of cookies**: That is, number of unique cookies to view the course overview page.
- **Number of clicks**: That is, number of unique cookies to click the "Start free trial" button (which happens before the free trial screener is triggered).
- **Click-through-probability**: That is, number of unique cookies to click the "Start free trial" button divided by number of unique cookies to view the course overview page.
- **Gross conversion**: That is, number of user-ids to complete checkout and enroll in the free trial divided by number of unique cookies to click the "Start free trial" button.

And the evaluation metrics:

- **Debt Conversion**: That is, number of user-ids to click "Start Debt Program" divided by number of user-ids that enroll in the free trial.
- **Debt-Net conversion**: That is, number of user-ids to click "Start Debt Program" divided by number of user-ids to remain enrolled past the 14-day boundary (and thus make at least one payment).
- **Net conversion**: That is, number of user-ids to remain enrolled past the 14-day boundary (and thus make at least one payment) divided by the number of unique cookies to click the "Start free trial" button.