statsmodels · victormvy · May 28, 2025 · May 29, 2025 · May 30, 2025
diff --git a/examples/notebooks/stats_tukey_test.ipynb b/examples/notebooks/stats_tukey_test.ipynb
@@ -0,0 +1,312 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "d5bb79cd",
+   "metadata": {},
+   "source": [
+    "# Tukey's multiple comparisons tests\n",
+    "\n",
+    "The Tukey Honestly Significant Difference (HSD) test is a widely used post hoc method for identifying which specific group means differ after an ANOVA has returned a statistically significant result. ANOVA can tell us that at least one group mean differs from the others, but not which ones. The `pairwise_tukeyhsd` function in statsmodels fills this gap by performing all pairwise comparisons between group means while controlling the family-wise error rate. This makes it especially useful for categorical factors with three or more levels, where it supports a detailed interpretation of group differences without inflating the Type I error rate."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "3a1bf6a0",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from IPython.display import display\n",
+    "from statsmodels.stats.multicomp import pairwise_tukeyhsd\n",
+    "from statsmodels.formula.api import ols\n",
+    "from statsmodels.stats.anova import anova_lm\n",
+    "import numpy as np"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "80ee232d",
+   "metadata": {},
+   "source": [
+    "## Generate random balanced data\n",
+    "\n",
+    "First, we will generate synthetic data by sampling from normal distributions. Specifically, we will create 8 groups, each containing 30 samples drawn from a normal distribution with a fixed variance of 1. The group means will vary to ensure that some groups differ significantly from others, allowing us to demonstrate the utility of post hoc analysis."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "5aa208b7",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "data = []\n",
+    "groups = []\n",
+    "locs = [0, 0.1, 1, 1.2, 1.5, 1.7, 2.0, 3.0]\n",
+    "np.random.seed(0)\n",
+    "for i in range(8):\n",
+    "    group_data = np.random.normal(loc=locs[i], scale=1.0, size=30)\n",
+    "    data.extend(group_data)\n",
+    "    groups.extend([f'Group {i+1}'] * 30)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6bc47f10",
+   "metadata": {},
+   "source": [
+    "## One-way ANOVA test\n",
+    "\n",
+    "Before applying the Tukey HSD test, we will first perform a one-way ANOVA to determine whether there are any statistically significant differences among the group means. ANOVA serves as an initial global test, assessing whether at least one group differs from the others. If the ANOVA result is significant, we can proceed with the Tukey HSD test to identify which specific pairs of groups are significantly different."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "40c9bb58",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "anova_model = ols('value ~ C(group)', data={'value': data, 'group': groups}).fit()\n",
+    "anova_results = anova_lm(anova_model, typ=2)\n",
+    "display(anova_results)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3aeed8a9",
+   "metadata": {},
+   "source": [
+    "## Pairwise Tukey HSD test\n",
+    "\n",
+    "Since our factor has more than two levels, a significant ANOVA result only tells us that at least one group differs from the others, but not which ones. To investigate these differences in more detail, we will use the Tukey HSD test, which performs all pairwise comparisons between group means while controlling for the increased risk of Type I error due to multiple testing. This allows us to determine exactly which group pairs show statistically significant differences."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "35609eb9",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "alpha = 0.05\n",
+    "\n",
+    "tukey = pairwise_tukeyhsd(data, groups, alpha=alpha)\n",
+    "print(tukey)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c65b21c4",
+   "metadata": {},
+   "source": [
+    "### Tukey summary frame\n",
+    "\n",
+    "Using `tukey.summary_frame()`, we can display a DataFrame with the results of all the pairwise comparisons."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "4c29239c",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "display(tukey.summary_frame())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0cb787c8",
+   "metadata": {},
+   "source": [
+    "### Tukey Homogeneous Subsets\n",
+    "\n",
+    "Using `tukey.create_homogeneous_subsets_dataframe()`, we can also display a summary table of the test results.\n",
+    "\n",
+    "This method constructs a DataFrame that groups factor levels into **homogeneous subsets** based on the results of the Tukey HSD test. Groups belong to the same subset if their pairwise differences are **not statistically significant** (i.e., *p* > alpha). Each group appears once in the table, with its mean shown under every subset to which it belongs. At the bottom of the table, a **\"min p-value\"** row shows the smallest p-value among all pairwise comparisons within each subset. This gives a compact view of which groups are statistically indistinguishable, so the post hoc results can be read at a glance."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f7afb083",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "hs_df = tukey.create_homogeneous_subsets_dataframe()\n",
+    "display(hs_df.fillna(''))  # Fill NaN with empty strings for better display"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "67a30aee",
+   "metadata": {},
+   "source": [
+    "### Tukey simultaneous plot\n",
+    "\n",
+    "To visualise group mean differences in a more interpretable way, we can use the `plot_simultaneous()` method to create a **universal confidence interval plot**. Instead of displaying all pairwise confidence intervals (which can be overwhelming with many groups), this method shows a **single confidence interval for each group mean**.\n",
+    "\n",
+    "This approach, originally proposed by Hochberg and Tamhane (1987), uses Tukey's Q critical value to calculate interval widths, allowing us to compare any two group means visually. If the confidence intervals of two groups **do not overlap**, the difference between them is statistically significant."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "4075aa0b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Assign the returned Figure so the notebook does not render the plot twice\n",
+    "fig = tukey.plot_simultaneous()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8f4ab9ff",
+   "metadata": {},
+   "source": [
+    "## Generate random unbalanced data\n",
+    "\n",
+    "Now we repeat the same analysis with unbalanced data (i.e. a different sample size for each group)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "58dea005",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "data = []\n",
+    "groups = []\n",
+    "locs = [0, 0.1, 1, 1.2, 1.5, 1.7, 2.0, 3.0]\n",
+    "sizes = [10, 8, 12, 50, 20, 11, 60, 15]\n",
+    "np.random.seed(0)\n",
+    "for i in range(8):\n",
+    "    group_data = np.random.normal(loc=locs[i], scale=1.0, size=sizes[i])\n",
+    "    data.extend(group_data)\n",
+    "    groups.extend([f'Group {i+1}'] * sizes[i])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0f9bf711",
+   "metadata": {},
+   "source": [
+    "## One-way ANOVA test"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f1e6d002",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "anova_model = ols('value ~ C(group)', data={'value': data, 'group': groups}).fit()\n",
+    "anova_results = anova_lm(anova_model, typ=2)\n",
+    "display(anova_results)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1791e15d",
+   "metadata": {},
+   "source": [
+    "## Pairwise Tukey HSD test"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b499e1b7",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "alpha = 0.05\n",
+    "\n",
+    "tukey = pairwise_tukeyhsd(data, groups, alpha=alpha)\n",
+    "print(tukey)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c6232074",
+   "metadata": {},
+   "source": [
+    "### Tukey summary frame"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "2d68b993",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "display(tukey.summary_frame())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "93979fe9",
+   "metadata": {},
+   "source": [
+    "### Tukey Homogeneous Subsets"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "fb83eee3",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "hs_df = tukey.create_homogeneous_subsets_dataframe()\n",
+    "display(hs_df.fillna(''))  # Fill NaN with empty strings for better display"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4bf77d3e",
+   "metadata": {},
+   "source": [
+    "### Tukey simultaneous plot"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "2dde4236",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Assign the returned Figure so the notebook does not render the plot twice\n",
+    "fig = tukey.plot_simultaneous()"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.13.2"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}