Steps to Completing an A/B Test
Whether you’re launching a new product, changing a website design, or testing a new price, an A/B test can help you make a decision
with confidence when you don’t have a lot of data. Below is a high-level guide to designing and running an A/B test.
Select performance metric
It’s important to define the metric used to evaluate the results of the test. Whether the goal is to increase sales, profit, conversion rate, etc., this should be specified up front.
Select experimental design
Matched pair - when the sample size is small and/or the data is difficult to collect, a matched pair experiment should be used.
Randomized design - when the sample size is large and the data is easy to collect, a randomized experiment should be used. Randomized experiments are very common for web-based A/B tests.
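As a rough illustration of a randomized design, the sketch below shuffles a list of units and splits it into two equal groups. The function name `assign_groups`, the integer unit IDs, and the 50/50 split are assumptions made for the example, not something this guide prescribes.

```python
import random

def assign_groups(unit_ids, seed=0):
    """Randomly split units into treatment and control groups of equal size."""
    rng = random.Random(seed)   # fixed seed makes the split reproducible
    shuffled = list(unit_ids)   # copy so the caller's data is not modified
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical units: 100 visitors identified by integer IDs.
treatment, control = assign_groups(range(100), seed=42)
print(len(treatment), len(control))
```

A matched pair design would instead pair similar units first and then randomly pick one unit of each pair for treatment.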
Select the treatment and control units
Each individual in the test is considered a unit. The unit can be a person, store, etc. In a test, units are split into two groups, the treatment group and the control group. Treatment and control units are compared against each other.
Select experimental and control variables
Experimental variable - The experimental, or treatment, variable is the variable that differs between treatment and control units.
Control variables - The control variables are the variables that should remain constant between treatment and control groups. These ensure that the treatment and control groups are comparable to each other and that the results will apply to the population.
Determine sample size and test duration
Sample size and test duration contribute most directly to statistical significance. A larger sample or a longer test makes a real difference between the groups more likely to show up as statistically significant.
Duration: Generally the duration of a test should be at least long enough to capture a representative group. If users generally visit a store or website once a week, then the test duration should be at least a week.
Clean and prepare data
Clean and filter the data appropriately. This could mean filtering for the dates of the test, ensuring there are no duplicate records, removing records with incomplete data, etc.
Remove outliers. Make sure that any approach to remove outliers is unbiased.
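The cleaning steps named in this guide (filtering to the test dates, removing duplicates, dropping incomplete records) might look like the sketch below. The record fields `unit_id`, `day`, and `value`, and the rule that a duplicate is a repeated (`unit_id`, `day`) pair, are assumptions for illustration only.

```python
from datetime import date

REQUIRED_FIELDS = ("unit_id", "day", "value")  # hypothetical field names

def clean_records(records, start, end):
    """Keep complete, de-duplicated records dated within [start, end]."""
    seen = set()
    cleaned = []
    for rec in records:
        # Drop records with any missing required field.
        if any(rec.get(field) is None for field in REQUIRED_FIELDS):
            continue
        # Keep only records that fall inside the test window.
        if not (start <= rec["day"] <= end):
            continue
        # Treat a repeated (unit, day) pair as a duplicate; keep the first.
        key = (rec["unit_id"], rec["day"])
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(rec)
    return cleaned

raw = [
    {"unit_id": 1, "day": date(2024, 1, 2), "value": 10.0},
    {"unit_id": 1, "day": date(2024, 1, 2), "value": 10.0},   # duplicate
    {"unit_id": 2, "day": date(2024, 1, 3), "value": None},   # incomplete
    {"unit_id": 3, "day": date(2023, 12, 30), "value": 7.0},  # before test
    {"unit_id": 4, "day": date(2024, 1, 5), "value": 12.0},
]
cleaned = clean_records(raw, start=date(2024, 1, 1), end=date(2024, 1, 7))
```

Whatever rule is used to remove outliers, applying the same rule to both groups is one way to keep it unbiased.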
Calculate lift
Compare the average performance between the two groups. It can also be useful to understand the distribution of the performance of the units.
Determine statistical significance
Performing a t-test gives a p-value. If the p-value is below 0.05, the results are considered statistically significant.
Use a paired t-test for matched pair experiments.
Use an unpaired t-test for randomized experiments.
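The lift and significance steps can be sketched with the Python standard library alone. Several things here are assumptions rather than part of this guide: lift is taken to be the relative difference in means, the unpaired statistic is Welch's version (which does not assume equal variances), and the p-value uses a normal approximation that is only reasonable for large samples. A real t-based p-value would come from a statistics package, for example `scipy.stats.ttest_ind` or `scipy.stats.ttest_rel`. The data is simulated, not a real result.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, variance

def lift(treatment, control):
    """Relative lift: (treatment mean - control mean) / control mean."""
    return (mean(treatment) - mean(control)) / mean(control)

def welch_t(treatment, control):
    """Unpaired (Welch) t statistic for a randomized experiment."""
    se = sqrt(variance(treatment) / len(treatment)
              + variance(control) / len(control))
    return (mean(treatment) - mean(control)) / se

def paired_t(treatment, control):
    """Paired t statistic for a matched pair experiment (same order, same length)."""
    diffs = [t - c for t, c in zip(treatment, control)]
    return mean(diffs) / sqrt(variance(diffs) / len(diffs))

def approx_p_value(t_stat):
    """Two-sided p-value from a normal approximation (large samples only)."""
    return 2 * (1 - NormalDist().cdf(abs(t_stat)))

# Simulated metric values for 500 units per group (illustrative only).
rng = random.Random(0)
control = [rng.gauss(10.0, 2.0) for _ in range(500)]
treatment = [rng.gauss(10.5, 2.0) for _ in range(500)]

t_stat = welch_t(treatment, control)
print(f"lift={lift(treatment, control):.3f}  p~{approx_p_value(t_stat):.4f}")
```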
Estimate impact of broad implementation
In order to provide an expected impact of broad
implementation of the treatment, apply the lift calculation
to the entire population.
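One simple reading of this step, assuming lift is expressed as a relative change and the population's baseline total for the metric is known, is to multiply the two. The function name and the numbers below are hypothetical.

```python
def estimate_impact(relative_lift, population_baseline_total):
    """Expected change in the metric if the treatment is applied to everyone."""
    return relative_lift * population_baseline_total

# Hypothetical inputs: a 5% lift on a baseline of 1,000,000 in sales.
impact = estimate_impact(0.05, 1_000_000)
print(f"Expected impact: {impact:,.0f}")
```

This assumes the test units are representative of the population, which is what the control variables are meant to ensure.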