Conversation

@victormvy commented May 28, 2025

I have added the method create_homogeneous_subsets_dataframe to the TukeyHSDResults class, which is the type of object returned by the pairwise_tukeyhsd test. This method summarises the results of Tukey's HSD test by constructing a DataFrame that groups factor levels into homogeneous subsets—sets of groups whose pairwise differences are not statistically significant (i.e., p > alpha). Each group appears only once in the table, with its mean value displayed under each subset it belongs to. A final row, labelled "min p-value", shows the smallest p-value among all comparisons within each subset. This table offers a concise and intuitive visual summary of which groups are statistically similar, making it easier to interpret the results of the post hoc analysis.

I have also included a test for the new method and added an example notebook that demonstrates how to use Tukey’s HSD test, the newly implemented create_homogeneous_subsets_dataframe method, and the existing plot_simultaneous method. This notebook provides a complete workflow for performing post hoc analysis and visualising group differences effectively.
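For reference, a minimal usage sketch of the workflow described above (the toy data and group labels are made up for illustration; `create_homogeneous_subsets_dataframe` is the method proposed in this PR and is shown commented out since it is not yet part of statsmodels):

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# toy data: three groups with different means (illustrative only)
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(m, 1.0, 30) for m in (0.0, 0.5, 2.0)])
groups = np.repeat(["g1", "g2", "g3"], 30)

res = pairwise_tukeyhsd(data, groups, alpha=0.05)
print(res.summary())

# proposed in this PR (not yet part of statsmodels):
# subsets_df = res.create_homogeneous_subsets_dataframe()
```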

  • tests added / passed.
  • code/documentation is well formatted.
  • properly formatted commit message. See NumPy's guide.

Notes:

  • It is essential that you add a test when making code changes. Tests are not
    needed for doc changes.
  • When adding a new function, test values should usually be verified in another package (e.g., R/SAS/Stata).
  • When fixing a bug, you must add a test that would produce the bug in main and
    then show that it is fixed with the new code.
  • New code additions must be well formatted. Changes should pass flake8. If on Linux or OSX, you can
    verify your changes are well formatted by running
    git diff upstream/main -u -- "*.py" | flake8 --diff --isolated
    
    assuming flake8 is installed. This command is also available on Windows
    using the Windows Subsystem for Linux once flake8 is installed in the
    local Linux environment. While passing this test is not required, it is good practice and it helps
    improve code quality in statsmodels.
  • Docstring additions must render correctly, including escapes and LaTeX.

@josef-pkt
Member

Do you have a reference?
What's the basis for create_homogeneous_subsets_dataframe, or where does it come from?

Is this similar to #9493?

Each group appears only once in the table

I need to look at how this works here. But in general, groupings have overlapping groups.
(IIRC, with the homogeneous variance assumption we have overlapping lines. If variances are allowed to differ, then groupings do not necessarily include consecutive units (sorted by mean). In both cases groups can overlap.)

BTW:
looks like one docstring got incorrectly indented.
And don't blacken my modules (those that are only maintained by myself). I don't like dedented closing parenthesis. (It destroys python's block structure.)

Thanks for the PR.

@victormvy
Author

Thank you for the feedback!

Regarding the create_homogeneous_subsets_dataframe method — the idea for this table is inspired by how homogeneous subsets are commonly presented in statistical software like SPSS, where groups that are not significantly different (i.e., p > alpha) are grouped into subsets following a post hoc test after ANOVA. I found this summary format helpful in several past projects (done in SPSS), especially when dealing with a large number of groups, where reading through a long list of pairwise p-values becomes difficult. The goal is to make Tukey HSD results easier to interpret by providing a more visual and concise summary.

You're absolutely right that groupings can overlap, and the method supports that. It creates one row per group and one column per subset. A group may appear in multiple subsets, and in those cases its mean is shown in more than one column. For example, in the table below:

| Group       | S1     | S2    | S3    | S4    | S5    |
|-------------|--------|-------|-------|-------|-------|
| Group 2     | -0.189 |       |       |       |       |
| Group 1     | 0.442  | 0.442 |       |       |       |
| Group 3     |        | 0.866 | 0.866 |       |       |
| Group 5     |        |       | 1.596 | 1.596 |       |
| Group 6     |        |       | 1.632 | 1.632 |       |
| Group 4     |        |       |       | 1.710 |       |
| Group 7     |        |       |       |       | 1.899 |
| Group 8     |        |       |       |       | 2.922 |
| min p-value | 0.198  | 0.701 | 0.053 | 0.931 | 1.000 |

Here, each cell shows the group mean if the group belongs to the subset. The last row, min p-value, shows the smallest p-value among all pairwise comparisons within that subset, giving a sense of how "homogeneous" the subset is (higher values suggest greater similarity among its members).
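The construction described above can be sketched as a brute-force search over all subsets of groups, keeping the maximal ones in which no pairwise difference is rejected. This is a simplified stand-in for the PR's find_subsets, not its actual code; the `reject`/`pvals` inputs are assumed to be derived from the Tukey HSD results:

```python
from itertools import combinations

def homogeneous_subsets(groups, reject, pvals):
    """Brute-force homogeneous subsets from pairwise test decisions.

    groups : sequence of group labels
    reject : dict, frozenset({a, b}) -> True if the pairwise H0 is rejected
    pvals  : dict, frozenset({a, b}) -> adjusted pairwise p-value
    Returns (subsets, min_pvals): the maximal subsets with no significant
    pairwise difference, and the smallest p-value within each subset.
    """
    candidates = []
    for r in range(len(groups), 0, -1):
        for combo in combinations(groups, r):
            pairs = combinations(combo, 2)
            if all(not reject[frozenset(p)] for p in pairs):
                candidates.append(set(combo))
    # drop subsets that are contained in a larger homogeneous subset
    subsets = [s for s in candidates if not any(s < t for t in candidates)]
    min_pvals = [
        min((pvals[frozenset(p)] for p in combinations(s, 2)), default=1.0)
        for s in subsets
    ]
    return subsets, min_pvals
```

Note the exhaustive search is exponential in the number of groups, so it only serves to illustrate the idea.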

Regarding #9493, I agree it uses a similar conceptual idea of homogeneous subsets, but with a different representation. I'll take a closer look in case there's overlap or opportunity for integration.

As for style, I'll fix the mis-indented docstring and will revert any Black formatting in modules you maintain. Thanks for the clarification.

Let me know if you see a better place for this functionality or any needed adjustments to align with statsmodels conventions.

@josef-pkt
Member

good
from this, I think you have the lines grouping (as in SAS, I don't remember looking at an SPSS version)

The letter display in my PR is more general because it also applies to Games-Howell (or similar) with unequal variances across groups.

In terms of interface and result format.
Your table with means looks good standalone.
AFAIR, lines or letters are displayed as additional columns to a table or dataframe that already includes the summary statistics including mean (and confint, ...)

I will look for my notebook with examples for the letter display.
Then, we can look at how to integrate this.

@josef-pkt
Member

looking at parts of your pr code. AFAICS:

find_subsets checks all possible subsets based on the reject decision matrix.
There is no assumption of any prior ordering as in the lines display.
So the outcome need not have consecutive group members (when sorted by means). It should be similar to my PR and the article that I used.

(I guess the all subset algorithm will be slower than the iterative algorithm when we have a large number of groups.)

The example and test case need unbalanced groups, i.e. different group sizes, so that the standard errors of the means are not all the same.
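An unbalanced fixture along these lines could be generated as follows (the sizes, means, and seed here are arbitrary choices for illustration):

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(42)
sizes = {"a": 10, "b": 25, "c": 60}  # deliberately unequal group sizes
data = np.concatenate(
    [rng.normal(0.5 * i, 1.0, n) for i, n in enumerate(sizes.values())]
)
groups = np.concatenate([[g] * n for g, n in sizes.items()])

# with unequal n, the standard errors of the group means differ
res = pairwise_tukeyhsd(data, groups)
```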

@josef-pkt
Member

here is my dirty notebook https://gist.github.com/josef-pkt/183dd4b6cc04429385725f502c578b39
I used it 4 months ago when I worked on letter display again, so it's not cleaned up.
It has test cases from examples in the original article.
https://gist.github.com/josef-pkt/75f0ca778b03c34f70b0a4771bac076c is a notebook with full games howell example.
results dataframe with letters starting at cell 45

I got stuck at the end in finding a sorting option, how to define letters and in creating the full results table by integrating it with tukey-hsd and games-howell.
(The grouping will eventually also need to be connected to the model results test_wald_pairwise.)

@victormvy
Author

I've included a test for unbalanced data (i.e. different sample sizes), and you're right, there are some issues in this case. You can check the notebook here: https://gist.github.com/victormvy/25cbf2dc11d2706ccd3e5e8a6e86e55e (cell 47).

I also ran the same test in SPSS to compare the results, and here's what I got:

[screenshot: SPSS homogeneous subsets output for the same test]

Not only is the ordering messed up, but the p-values are also different.

So it seems SPSS handles different sample sizes just fine. I'll need to look into how it manages that. But yeah, it looks like we're facing the same problem.

@victormvy
Author

I am checking the full Tukey test table from SPSS and I found something weird. I'm not a statistician, so maybe I'm missing something, but...

For example, groups 2 and 4 are both included in subset 2. However, according to the Tukey test table, the p-value for the comparison between groups 2 and 4 is below alpha = 0.05. Meanwhile, the p-value between groups 2 and 5 is 0.051. So shouldn't groups 2 and 4 be in different subsets, and groups 2 and 5 be in the same one?

My other concern is that the significance levels shown in the homogeneous subsets table don't appear anywhere in the Tukey pairwise comparison table. So it doesn't look like they're just the minimum p-values from the pairwise comparisons within each subset. If it were the minimum p-value, it would match the p-values in our table.

I'm a bit lost 😕

[screenshot: SPSS Tukey HSD pairwise comparisons and homogeneous subsets tables]

@josef-pkt
Member

josef-pkt commented May 30, 2025

see footnote b in the SPSS groupings table

I guess SPSS assumes equal variances and equal sample sizes in computing the lines grouping (using the harmonic mean of the actual group sizes).

The point of the article underlying my PR was that their "letter" display is correct even if standard errors of means differ.
In the lines display we have overlapping intervals of means, which breaks if standard errors differ.

Old statistical methods. For example, standard 2-way anova or, IIRC, repeated measures anova only works for balanced samples (i.e. equal variances and equal cell sizes).
(Much of my reading material for statsmodels is to figure out which methods to add that do not rely on very restrictive assumptions.)

update
https://community.ibm.com/community/user/discussion/multiple-comparisons-vs-homogeneous-subsets-in-spss-oneway-anova
sounds like harmonic mean group size is for each pair, and not total harmonic average over all groups.
(I'm not sure that really implies that groupings are intervals. maybe there is also another algorithmic assumption)
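Under that reading, the harmonic mean sample size would be computed per pair of groups. A sketch of the formula as described in the linked discussion (not SPSS's actual code):

```python
def harmonic_mean_pair(n_i, n_j):
    # harmonic mean of two group sizes: 2 / (1/n_i + 1/n_j)
    return 2.0 / (1.0 / n_i + 1.0 / n_j)

harmonic_mean_pair(10, 40)  # -> 16.0
```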

@victormvy
Author

Yes, I think SPSS must be doing something additional under the hood when generating the homogeneous subsets table. In fact, the pairwise comparisons table matches ours, but the homogeneous subsets table, especially the p-values it shows, does not.

Also, is the Tukey test still reliable when dealing with unbalanced data? I wonder whether the discrepancies we're seeing might be due to limitations of the method in those cases.

By the way, does it make sense to sort the groups by the first and last subset they appear in? I've been experimenting with that approach, and it seems to produce visually continuous subsets. Of course, when sample sizes differ, the groups aren't sorted by mean. You can see a somewhat hacky implementation of this idea in cell 13: https://gist.github.com/victormvy/25cbf2dc11d2706ccd3e5e8a6e86e55e

@josef-pkt
Member

Tukey HSD is robust to unequal sample sizes; I think the relevant reference is the Tukey-Kramer method.
However, it is not robust to unequal variances across groups.
If both sample sizes and variances can differ, then Games-Howell and a few others (that we don't have yet) are correct.
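For context, the Tukey-Kramer correction replaces the common standard error with a per-pair one while still assuming a pooled error variance (MSE). A textbook formula sketch, not statsmodels code:

```python
import math

def tukey_kramer_q(mean_i, mean_j, mse, n_i, n_j):
    """Studentized-range statistic for one pair with the Tukey-Kramer
    adjustment for unequal group sizes (equal variances still assumed)."""
    se = math.sqrt(mse / 2.0 * (1.0 / n_i + 1.0 / n_j))
    return abs(mean_i - mean_j) / se
```

The p-value then comes from the studentized range distribution (available as scipy.stats.studentized_range); Games-Howell instead uses per-group variances with Welch-type degrees of freedom.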
