Description
(I am creating this issue in addition to the already opened PR #31643, because there are many more ways to solve the problem.)
The AI tools many people use to create PRs don't care about our Automated Contributions Policy.
Since GitHub Copilot Coding Agent Has Arrived! and people are building GitHub MCPs that can be integrated with LLM clients, scikit-learn and other open source projects are getting an increasing amount of AI spam. Many people who care about open source are unhappy about this and have requested an option to block AI-generated PRs and issues on their projects (Allow us to block Copilot-generated issues (and PRs) from our own repositories) - so far without success.
You can see the increasing number of partially or fully generated PRs and the decrease in overall PR quality on scikit-learn by looking at the most recently closed PRs (as of June 30th, 2025). It is not a flood yet, but it is bad enough to keep several maintainers busy for a few extra hours a week, and it could become a flood in the future. This is why it is important to find solutions.
Quite a few of the authors of these low-quality PRs on scikit-learn are spamming LLM-based PRs on other open source projects at the same time. I have added repeated cases to @adrinjalali's agents-to-block folder. The spammers' pattern is to open a PR with an unqualified guess at what the project needs or how an issue can be solved, and then not follow up after maintainers review; once the PR is closed, they try again.
These PRs can look like a genuine attempt to address an open issue, and project maintainers start to interact with the "authors" - but the review comments are then fed to an agent rather than being addressed manually, and the PR never improves past a certain point.
Many authors of automated PRs did not invest much time in them and don't have enough skin in the game to care.
This creates a huge burden for reviewers, because we cannot tell whether a PR or issue is AI-generated, and there is no way to foresee how much manual work and thought its "author" is willing to invest after their LLM's first guess wasn't quite helpful.
Reviewers are forced to choose between several options:
a) being helpful to an alleged newbie who makes mistakes but wants to learn (only they are not),
b) writing our review comments like prompts for someone's LLM,
c) closing the PR with a polite-to-humans message explaining why it doesn't measure up quality-wise or is not needed, even though the human author might hardly care or might not even read it.
Reviewers can lose time trying options a and b before closing a PR.
This takes reviewers' attention away from thoughtful PRs, which is unfair to the human contributors who create them, and it is ultimately harmful to the whole project.
---
The issue to solve is how to protect against the effects of AI spam, mark AI-based PRs and issues (so reviewers know what they are dealing with), block spam authors as efficiently as possible, and discourage LLM agents from automatically opening PRs on scikit-learn.
I really think there should be a process-oriented way to deal with this problem. Ideally, we can find technical solutions.
It is not enough to leave reviewers to deal with it on a one-to-one basis.
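To make the "technical solutions" part a bit more concrete: one small building block could be a triage script that cross-checks the authors of open PRs against a shared blocklist such as the agents-to-block folder mentioned above. The sketch below is only an illustration, not existing scikit-learn tooling; the blocklist file name (`agents-to-block.txt`) is hypothetical, and it only flags matches rather than closing anything automatically.

```python
# Illustrative sketch (not existing scikit-learn tooling): list open PRs via the
# GitHub REST API and flag any whose author appears in a maintained blocklist,
# so triagers can label or close them in bulk. Assumes a plain-text blocklist
# file with one GitHub username per line (the file name is hypothetical).
import os

import requests

REPO = "scikit-learn/scikit-learn"
BLOCKLIST_FILE = "agents-to-block.txt"  # hypothetical local file, one username per line
TOKEN = os.environ.get("GITHUB_TOKEN")  # optional; raises the API rate limit


def load_blocklist(path):
    """Read the blocklist file into a set of lowercased usernames."""
    with open(path) as f:
        return {line.strip().lower() for line in f if line.strip()}


def open_pull_requests(repo):
    """Yield all open PRs of the repository, paging through the REST API."""
    headers = {"Accept": "application/vnd.github+json"}
    if TOKEN:
        headers["Authorization"] = f"Bearer {TOKEN}"
    page = 1
    while True:
        resp = requests.get(
            f"https://api.github.com/repos/{repo}/pulls",
            params={"state": "open", "per_page": 100, "page": page},
            headers=headers,
            timeout=30,
        )
        resp.raise_for_status()
        batch = resp.json()
        if not batch:
            break
        yield from batch
        page += 1


if __name__ == "__main__":
    blocklist = load_blocklist(BLOCKLIST_FILE)
    for pr in open_pull_requests(REPO):
        author = pr["user"]["login"].lower()
        if author in blocklist:
            print(f"flag: #{pr['number']} by {pr['user']['login']} - {pr['title']}")
```

The same check could in principle run on a schedule (e.g. in CI) and apply a label instead of printing, so reviewers immediately see which PRs come from already-known spam accounts; whether to auto-close is a policy decision, not a technical one.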