Add pandas type-completeness blog post #2548

Open
MarcoGorelli wants to merge 6 commits into facebook:main from MarcoGorelli:pandas-blogpost

@MarcoGorelli
Contributor

Summary

As discussed, following on from the hackmd document (thanks @javabster for helpful comments!)

Fixes #XXXX

Test Plan

@meta-cla (bot) added the `cla signed` label on Feb 25, 2026

In order to improve the developer experience for pandas' users across the ecosystem, we decided to focus on improving pandas' typing. Why? Because better type hints mean:

- More accurate and useful auto-completions from VSCode / PyCharm / NeoVIM / Positron / other IDEs.
- More robust pipelines, as some categories of bugs can be caught without even needing to execute your code.

Maybe also mention the (alleged) LLM benefits?

Contributor Author

sure, thanks - did you have a reference in mind for this?

@jorenham, Feb 25, 2026

The closest I was able to find is https://www.se.cs.uni-saarland.de/conferences/ASE/ase2023/details/ase-2023/ase-2023-papers/12/Generative-Type-Inference-for-Python.html, but I don't think there's anything yet that tests this for modern LLMs.

There's also https://llm-guidelines.org/study-types/, which suggests that structured outputs (which, if I understand correctly, also include static typing) are indeed helpful.

Contributor Author

thanks - as far as I can tell, that paper's about using llms to do type inference? if so, not sure if we should cite it for the alleged llm benefits of having typed code

@jorenham, Feb 25, 2026

Do we even need a citation? I mean, I'm all for being accurate, but in this case I doubt that anyone would question that static typing helps LLMs write better code, seeing as it also helps humans write better code 🤷‍♂️

But I'm assuming here that the types are correct. Because if not, I wouldn't be surprised that LLMs perform worse than if there are no types at all. The same holds for humans, after all.

Contributor Author

tbh it's not obvious to me that they would perform better, they hallucinate method names all the time and i find that they often suggest code which doesn't satisfy type-checkers even in codebases that are fully typed

i'd prefer to leave this out unless we have a reference if it's ok


pandas is one of the most widely used Python libraries. At the time of writing, it is [downloaded about half a billion times per month from PyPI](https://pypistats.org/packages/pandas), is supported by nearly all Python data science packages, and is generally required learning in data science curriculums. Despite the existence of modern alternatives, pandas' impact cannot be overstated.

In order to improve the developer experience for pandas' users across the ecosystem, we decided to focus on improving pandas' typing. Why? Because better type hints mean:

Contributor

I think we should still be more explicit here about who "we" is at the beginning. Could you add a clarification, even if it's just briefly in brackets? Something like "the team at Quansight" or "the Quansight team with support from the Pyrefly team", whatever you feel is appropriate. My main concern is that people coming to the blog on the Pyrefly website will assume "we" means just the Pyrefly team.


## Beyond Pyright - what about "Pyrefly report"?

Pyright's verifytypes feature takes about 2 and a half minutes to run in pandas-stubs. There's room for improvement here - so much so that the Pyrefly team is working on a [`pyrefly report`](https://pyrefly.org/en/docs/report/) which would work similarly. The `pyrefly report` API is not yet considered stable, so for now pandas-stubs uses Pyright's `--verifytypes` command, but hopefully a faster is on the horizon!

Contributor

formatting: should it be verifytypes? or verify types?

Contributor

typo: "hopefully a faster is on the horizon!" a faster tool?

> formatting: should it be verifytypes? or verify types?

`--verifytypes` is correct; `pyright --help` shows:

Usage: pyright [options] files...
  Options:
  [..]
  --verifytypes <PACKAGE>            Verify type completeness of a py.typed package
  [..]

Contributor

@javabster, Feb 25, 2026

I meant the instance in the first sentence ("Pyright's verifytypes feature..."), not the `--verifytypes` one :)

@@ -0,0 +1,76 @@
---
title: pandas' public API is now type-complete!

Contributor

Suggested change:
- title: pandas' public API is now type-complete!
+ title: Pandas' Public API Is Now Type-Complete!

Please use title case for titles :)

Contributor Author

they ask that it be used lowercase even at the beginning of a sentence: https://pandas.pydata.org/about/citing.html#brand-and-logo

> When using the project name pandas, please use it in lower case, even at the beginning of a sentence.

if we're ok going against that in titles, then sure, will do

Contributor

oh! Thanks for flagging, let's follow their citation guidelines, but the rest of the title should still be title case imho

@MarcoGorelli
Contributor Author

thanks for your reviews! 🙏

Contributor

@javabster left a comment

LGTM 🚀

but let's wait to merge this until early next week, we already published a blog post earlier this week

Comment on lines +43 to +45
- `DataFrame` is reported as "partially unknown" because its method `.index` returns `Index`, which is partially unknown.
- `Index` is reported as "partially unknown" because its method `to_series` returns `Series`, which is partially unknown.
- `Series` is reported as "partially unknown" because its method `to_frame` returns `DataFrame`, which is partially unknown.

Member

This is surprising to me / doesn't make a ton of sense to me. DataFrame is unknown because DataFrame is unknown? I'd expect there to be some "Unknown" or "Any" typed attribute or similar.

Contributor Author

i've reworked the example so it's clearer, thanks for commenting!
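
For readers following this thread, the cycle in lines +43 to +45 may be easier to see as a toy propagation model. This is only a sketch and not Pyright's implementation: it assumes that "partially unknown" spreads backwards through return types, and the `RETURNS` graph, the `Unannotated` placeholder, and the `partially_unknown` helper are all hypothetical names invented for illustration.

```python
from collections import deque

# Hypothetical dependency graph: each type maps to the types returned by its
# public members. "Unannotated" is an invented placeholder standing in for a
# single member whose annotation is missing.
RETURNS = {
    "DataFrame": ["Index"],                   # DataFrame.index -> Index
    "Index": ["Series"],                      # Index.to_series() -> Series
    "Series": ["DataFrame", "Unannotated"],   # Series.to_frame() -> DataFrame
}


def partially_unknown(returns, roots):
    """Return every type that can reach one of `roots` through return types."""
    # Reverse the edges so we can walk from an unknown root to its dependents.
    dependents = {}
    for owner, targets in returns.items():
        for target in targets:
            dependents.setdefault(target, set()).add(owner)

    tainted = set(roots)
    queue = deque(roots)
    while queue:
        current = queue.popleft()
        for owner in dependents.get(current, ()):
            if owner not in tainted:
                tainted.add(owner)
                queue.append(owner)
    # Report only the types affected by the roots, not the roots themselves.
    return tainted - set(roots)


affected = partially_unknown(RETURNS, ["Unannotated"])
print(sorted(affected))  # ['DataFrame', 'Index', 'Series']
```

In this model the cycle alone marks nothing (`partially_unknown(RETURNS, [])` is empty); it takes one genuinely unknown member somewhere in the cycle for all three types to be reported, which matches the expectation raised above that some "Unknown" or "Any" typed attribute should sit at the root.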

4 participants