Picture this: you set up your experiment to have 50% of users on variant A and 50% on variant B. The experiment is over and you have 44,000 users in A, and 45,000 in B. Is this a problem?
Most likely. If your assignment worked properly, the probability of seeing this imbalance (or a larger one) is less than 0.1%.
When the difference between the ratios is significant, you have a Sample Ratio Mismatch (SRM). Unless you understand why it happened, you should not analyse the results of the experiment; your setup may be flawed, invalidating any conclusions.
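If you want to sanity-check those numbers yourself, here is a quick snippet (not part of this repository) using scipy, the reference implementation the calculator is validated against. It assumes the 44,000/45,000 split from the example above:

```python
# Cross-check of the example above: 44,000 vs 45,000 users
# under an intended 50/50 split.
from scipy.stats import chisquare

observed = [44_000, 45_000]
total = sum(observed)
expected = [total * 0.5, total * 0.5]  # 44,500 expected per variant

statistic, p_value = chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square = {statistic:.3f}, p-value = {p_value:.5f}")
# chi-square ≈ 11.236, p-value ≈ 0.0008: well below 0.1%
```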
- Precise: less than 0.01% deviation from the chi-square calculations in Python's scipy
- Unlimited variants: can be used for A/B/n tests
- Custom ratios: you didn't run a 50-50 split? No problem!
- Share your results with colleagues through custom links
- Private: no accounts, no tracking, works offline, and no data is sent to any server
This tool uses the Chi-Squared Goodness-of-Fit test to detect Sample Ratio Mismatch (SRM). Here's the process:
1. You provide:
   - Observed counts: The actual number of users in each experiment variant.
   - Expected distribution: How you intended to split traffic (e.g., 50/50, or a custom ratio like 60/40).
2. The calculator then:
   - Determines the expected user counts for each variant based on your input.
   - Calculates the Chi-Square statistic ($\chi^2$). This value measures how much your observed counts ($O$) deviate from the expected ones ($E$). The formula is: $\chi^2 = \sum \frac{(O - E)^2}{E}$
   - Derives a p-value using the regularised incomplete gamma function $P$. The degrees of freedom ($df$) are calculated as (number of variants - 1). The p-value is then: $p\text{-value} = 1 - P(\frac{df}{2}, \frac{\chi^2}{2})$. The incomplete gamma function is evaluated with:
     - For smaller chi-square values: a series expansion
     - For larger values: a continued fraction

   A rough Python sketch of this step appears after this list.
3. Interpretation:
   - The p-value tells you the probability of seeing your observed user distribution (or one even more imbalanced) if the traffic splitting was actually working as intended.
   - p < 0.01: Strong evidence of SRM.
   - 0.01 ≤ p < 0.05: Possible SRM.
   - p ≥ 0.05: No significant evidence of SRM.
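The real implementation lives in `app.js`; what follows is only a minimal Python sketch of the steps above, so you can see the whole calculation in one place. The function names (`regularized_gamma_p`, `srm_p_value`) are made up for illustration and are not part of this repository. It computes the chi-square statistic, then derives the p-value through the regularised lower incomplete gamma function, using a series expansion for smaller values and a continued-fraction evaluation (Lentz's method) otherwise.

```python
import math


def regularized_gamma_p(a, x):
    """Regularised lower incomplete gamma function P(a, x)."""
    if x <= 0:
        return 0.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1:
        # Series expansion, used for smaller x (x < a + 1).
        term = 1.0 / a
        total = term
        n = a
        for _ in range(500):
            n += 1.0
            term *= x / n
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return total * math.exp(log_prefactor)
    # Continued fraction (modified Lentz's method) gives Q(a, x); P = 1 - Q.
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 500):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return 1.0 - math.exp(log_prefactor) * h


def srm_p_value(observed, ratios):
    """Chi-square goodness-of-fit p-value for an observed split vs intended ratios."""
    total = sum(observed)
    expected = [total * r for r in ratios]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    # p-value = 1 - P(df/2, chi2/2)
    return chi2, 1.0 - regularized_gamma_p(df / 2.0, chi2 / 2.0)


chi2, p = srm_p_value([44_000, 45_000], [0.5, 0.5])
print(f"chi-square = {chi2:.3f}, p-value = {p:.5f}")
# chi-square ≈ 11.236, p-value ≈ 0.0008 → p < 0.01: strong evidence of SRM
```

On the example from the introduction this gives the same chi-square and p-value as the scipy cross-check above, and the p < 0.01 threshold flags it as strong evidence of SRM.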
Please do! I'd appreciate bug reports, improvements (however minor), suggestions…
The calculator uses vanilla JavaScript, HTML, and CSS. To run locally:
- Clone the repository: `git clone https://github.com/welpo/srm.git`
- Navigate to the project directory: `cd srm`
- Start a local server: `python3 -m http.server`
- Visit `http://localhost:8000` in your browser
The important files are:
- `index.html`: Basic structure
- `styles.css`: Styles
- `app.js`: Logic
- `tests.js`: Tests, generated with `srm/srm_test_generator.py` (add `?test` to the URL to run validation tests)
Something not working? Have an idea? Let me know!
- Questions or ideas → Start a discussion
- Found a bug? → Report it here
- Feature request? → Let me know
This SRM calculator is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.