@booxter
Contributor

@booxter booxter commented Feb 6, 2025

I find them not useful and mostly distracting, esp. to new contributors.

Of course, this is not an endorsement to stop caring about egregious
spelling issues, or about spelling where it's important (in user-facing
docs, changelog, etc.) Even then, some of these could be handled at
particular moments in the release schedule (when prepping a new release cut).

Signed-off-by: Ihar Hrachyshka [email protected]

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the
    conventional commits.
  • Changelog updated with breaking and/or notable changes for the next minor release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Functional tests have been added, if necessary.
  • E2E Workflow tests have been added, if necessary.

@mergify mergify bot added the CI/CD, documentation, testing, and dependencies labels Feb 6, 2025
@booxter
Contributor Author

booxter commented Feb 6, 2025

Note: I don't know if there's some consensus building needed for this to happen, yet.

@booxter booxter marked this pull request as draft February 6, 2025 21:50
@booxter
Contributor Author

booxter commented Feb 6, 2025

I couldn't find anything in dev-docs or in the community repo that would require having these checks (though I'm happy to learn otherwise). Hence I think it's for each repo's owners to decide.

@booxter booxter marked this pull request as ready for review February 6, 2025 23:24
Member

@RobotSail RobotSail left a comment

LGTM

@mergify mergify bot added the one-approval label Feb 7, 2025
@nathan-weinberg
Member

Personally I'm against it but I won't block if there's a consensus from others

@booxter
Contributor Author

booxter commented Feb 7, 2025

@nathan-weinberg should I bring this up to some venue to make sure there's consensus? Not sure which it would be.

@booxter booxter added the hold label Feb 7, 2025
@RobotSail
Member

Personally I'm against it

Could you give more reasoning as to why? Currently the spell-checker serves to increase the amount of developer time spent contributing changes as it will fail and then more work is needed to figure out what went wrong, add it to the list of "approved" words, and then retest.

If we're paying a high cost for this, what are we getting in return as a benefit? We will still have typos in our actual codebase, and it doesn't prevent other semantic or grammatical errors from going into the documentation which passed the spellchecker.

@anastasds
Contributor

I also find that this wastes more time than it gives value.

@nathan-weinberg
Member

Personally I'm against it

Could you give more reasoning as to why? Currently the spell-checker serves to increase the amount of developer time spent contributing changes as it will fail and then more work is needed to figure out what went wrong, add it to the list of "approved" words, and then retest.

If we're paying a high cost for this, what are we getting in return as a benefit? We will still have typos in our actual codebase, and it doesn't prevent other semantic or grammatical errors from going into the documentation which passed the spellchecker.

The spellchecker is only a factor if you are changing Markdown files, as it only runs against those. In the instances where I'm doing that, I've never found it difficult to run make spellcheck locally, add something to the dictionary if necessary, and run make spellcheck-sort. It's a one-minute operation. As for value, I see simply having things spelled correctly as a net benefit with no actual drawback, apart from maybe having to take that minute to correct my spelling on something or add it to the dictionary.
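
For illustration, the local step described above (checking text against a word list, adding an unrecognized term, then re-sorting the list) can be sketched in a few lines of Python. This is a toy model, not the project's actual tooling: the function names `unknown_words` and `add_and_sort`, the in-memory word list, and the case-insensitive sort order are all assumptions made for the example.

```python
import re


def unknown_words(text, known):
    """Return words in text that are missing from known (case-insensitive)."""
    lowered = {w.lower() for w in known}
    found = []
    for word in re.findall(r"[A-Za-z]+", text):
        if word.lower() not in lowered and word not in found:
            found.append(word)
    return found


def add_and_sort(words, new_word):
    """Add new_word if missing and return a case-insensitively sorted list."""
    merged = set(words) | {new_word}
    # Hypothetical ordering rule: sort ignoring case, break ties on the raw word.
    return sorted(merged, key=lambda w: (w.lower(), w))


known = ["configure", "the", "endpoint", "with"]
flagged = unknown_words("Configure the endpoint with SSL", known)
print(flagged)  # ['SSL']

for word in flagged:
    known = add_and_sort(known, word)
print(known)  # ['configure', 'endpoint', 'SSL', 'the', 'with']
```

A technical term such as SSL is flagged simply because it is absent from the list, which is the situation both sides of this thread are describing.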

@nathan-weinberg
Member

@nathan-weinberg should I bring this up to some venue to make sure there's consensus? Not sure which it would be.

I think this PR is enough - like I said I won't block on this, just giving my two cents.

@RobotSail
Member

The spellchecker is only a factor if you are changing Markdown files, as it only runs against those. In the instances where I'm doing that, I've never found it difficult to run make spellcheck locally, add something to the dictionary if necessary, and run make spellcheck-sort. It's a one-minute operation. As for value, I see simply having things spelled correctly as a net benefit with no actual drawback, apart from maybe having to take that minute to correct my spelling on something or add it to the dictionary.

But why do we need it? This answer tells me what it is and how it works, but I'm not understanding why it's something we need in CI.

@nathan-weinberg
Member

The spellchecker is only a factor if you are changing Markdown files, as it only runs against those. In the instances where I'm doing that, I've never found it difficult to run make spellcheck locally, add something to the dictionary if necessary, and run make spellcheck-sort. It's a one-minute operation. As for value, I see simply having things spelled correctly as a net benefit with no actual drawback, apart from maybe having to take that minute to correct my spelling on something or add it to the dictionary.

But why do we need it? This answer tells me what it is and how it works, but I'm not understanding why it's something we need in CI.

I don't follow - do we not want words spelled correctly? Like I said, if people find it this difficult/too much time/etc I'm fine to get rid of it, but I feel like it's a pretty straightforward docs check.

You can actually configure it to be more targeted/do spellchecking for code as well (https://github.com/rojopolis/spellcheck-github-actions) - but that wasn't included in the implementation of the action (pre-dated me, it was very early on in the project): #564

@RobotSail
Member

I don't follow - do we not want words spelled correctly? Like I said, if people find it this difficult/too much time/etc I'm fine to get rid of it, but I feel like it's a pretty straightforward docs check.

The question is more about - why do we want this as a CI check? What pain point for development does this solve? It sounds like improving developer velocity greatly outweighs the cost of having thier instead of their occasionally show up in a README.md. And if it shows up and someone notices, it's a fun exercise to go fix it.

And if nobody notices, did it even matter in the first place?

@booxter
Contributor Author

booxter commented Feb 7, 2025

My take is: I think spelling is important, but also we should apply common sense when enforcing it (e.g. be more stringent in release notes / changelog; less stringent in docs for contributors). And the drawbacks - worse experience for new contributors, more fragile CI - may outweigh the benefit.

If this ends up controversial, I'll abandon. Please speak up. Otherwise, I'll remove the hold from the patch in a week.

@nathan-weinberg
Member

I don't follow - do we not want words spelled correctly? Like I said, if people find it this difficult/too much time/etc I'm fine to get rid of it, but I feel like it's a pretty straightforward docs check.

The question is more about - why do we want this as a CI check? What pain point for development does this solve? It sounds like improving developer velocity greatly outweighs the cost of having thier instead of their occasionally show up in a README.md. And if it shows up and someone notices, it's a fun exercise to go fix it.

And if nobody notices, did it even matter in the first place?

CI isn't about solving development pain points - it's about ensuring that when we take a contribution from anyone, from a first-time contributor to a project maintainer, that the codebase - inclusive of documentation - is the highest quality it can be. I already stated why I personally do not find this such a velocity hamperment, given that if you run the local check the CI run is a moot point.

My take is: I think spelling is important, but also we should apply common sense when enforcing it (e.g. be more stringent in release notes / changelog; less stringent in docs for contributors). And the drawbacks - worse experience for new contributors, more fragile CI - may outweigh the benefit.

If this ends up controversial, I'll abandon. Please speak up. Otherwise, I'll remove the hold from the patch in a week.

If I'm the only one who disagrees, please feel free to remove the hold. I think this is a minor enough decision we can go with a simple majority here, no need for unanimous consensus.

@anastasds
Contributor

@nathan-weinberg I have had to add to the dictionary on almost every single PR. Things like SSL and discoverability were most recent. Especially as document velocity has been increasing with the (hopefully continually increasing) adoption of ADRs and routine referencing of named technologies, the friction point doesn't seem like it will go away.

I am certain that removing it will not cause a deluge of unreadable text.

@RobotSail
Member

CI isn't about solving development pain points - it's about ensuring that when we take a contribution from anyone, from a first-time contributor to a project maintainer, that the codebase - inclusive of documentation - is the highest quality it can be. I already stated why I personally do not find this such a velocity hamperment, given that if you run the local check the CI run is a moot point.

If it's a matter of quality - why is it that most popular open source projects lack spell checkers? For instance, this repository largely builds on technology from HuggingFace and PyTorch which are widely known and respected in the AI community, with thousands of contributors. But I don't see their repos making use of spellcheckers.

I already stated why I personally do not find this such a velocity hamperment, given that if you run the local check the CI run is a moot point.

This claim is simply untrue. Any filter you add on contributions requires effort on the contributor's end to:

  1. Wait for CI checks to pass
  2. If the spellchecker fails, they must go into the CI logs to check why
  3. Once they've figured out that something is spelled "incorrectly" (it might be a real word that's simply unregistered with the CI's spellchecker), they must put in work to correct it. This then has the following steps:
    1. First they must add the word to the list of "approved" words
    2. Stage the file for git changes
    3. Commit the file to git (either amending it and needing to then force push, or making another commit that someone might then ask them to later squash)
    4. Push the commit up to git
  4. Wait for the CI checks to re-run

If they want to figure out the tooling and run this locally, this itself involves more steps of:

  1. Figuring out what the command is
  2. Downloading the spellchecker
  3. Again, going through the process I've outlined above

If you are a first-time contributor, these sorts of things are not obvious and often take more time in order to make something go through. Any barrier that you add asks the contributor to do more work to make things go through.

From the perspective of first-time contributors, they are mentally asking themselves: Why would I spend the time contributing to this project, when I can simply go build up another project that will be less restrictive and more respectful of my time?

@courtneypacheco
Contributor

So, I think every CI check here has some value -- there's no question about that. Spell checking is generally a very good thing. However, I think the issue here is that the spell check in our CI seems to lower developer productivity without giving us much benefit in return. Many contributors are regularly forced to update the dictionary with words that theoretically shouldn't even have to be added. So I'm pretty in favor of removing this particular CI check.

One could argue that yes, it's important to have "good spelling" and "no typos", but if we truly want pristine, professional documentation, then we should technically add a grammar checker into the mix as well. if you type like this and you don't use punctuation how does anyone know what you mean to say isn't it confusing. But at that point, we'll be adding so many CI hoops to jump through that many contributors will be left wondering: is it worth making a PR?

@nathan-weinberg
Member

CI isn't about solving development pain points - it's about ensuring that when we take a contribution from anyone, from a first-time contributor to a project maintainer, that the codebase - inclusive of documentation - is the highest quality it can be. I already stated why I personally do not find this such a velocity hamperment, given that if you run the local check the CI run is a moot point.

If it's a matter of quality - why is it that most popular open source projects lack spell checkers? For instance, this repository largely builds on technology from HuggingFace and PyTorch which are widely known and respected in the AI community, with thousands of contributors. But I don't see their repos making use of spellcheckers.

The evidence of absence does not equal evidence that something should be absent.

I already stated why I personally do not find this such a velocity hamperment, given that if you run the local check the CI run is a moot point.

This claim is simply untrue. Any filter you add on contributions requires effort on the contributor's end to:

1. Wait for CI checks to pass

2. If the spellchecker fails, they must go into the CI logs to check why

3. Once they've figured out that something is spelled "incorrectly" (it might be a real word that's simply unregistered with the CI's spellchecker), they must put in work to correct it. This then has the following steps:
   
   1. First they must add the word to the list of "approved" words
   2. Stage the file for git changes
   3. Commit the file to git (either amending it and needing to then force push, or making another commit that someone might then ask them to later squash)
   4. Push the commit up to git

4. Wait for the CI checks to re-run

This entirely ignores the fact you can run this locally. If you choose not to run it locally, as with any CI check, is the project to be faulted that you didn't run local checks? Would we say the same about unit tests?

If they want to figure out the tooling and run this locally, this itself involves more steps of:

1. Figuring out what the command is

2. Downloading the spellchecker

3. Again, going through the process I've outlined above

If you are a first-time contributor, these sorts of things are not obvious and often take more time in order to make something go through. Any barrier that you add asks the contributor to do more work to make things go through.

From the perspective of first-time contributors, they are mentally asking themselves: Why would I spend the time contributing to this project, when I can simply go build up another project that will be less restrictive and more respectful of my time?

This is a problem either of contributors not having access to proper resources to contribute, or contributors choosing to ignore such documentation. Again, would we do the same for unit tests? Why should I as a contributor learn how they work?

I will cede to @courtneypacheco's point and what @RobotSail and others have raised here - people don't want to deal with it, they don't see the value, ergo let's remove it - I've said time and again that folks should feel free to proceed with it. I just find the development velocity argument to be weak personally.

@RobotSail
Member

The evidence of absence does not equal evidence that something should be absent.

I would disagree, there are inherent patterns we converge to for making things work well. Something not being adopted for this long means that people either overlooked it, or they tried it and found it not to add value. By deduction, we have clearly not overlooked it. Therefore we can conclude that most likely others have also tried it and found it not to have added value.

In our case, we have tried it, and most have reported it to provide an overall negative experience without actually helping to maintain the quality of our codebase.

From my personal experience, there's a lot that I'd like to do in a day but only so much I can do. Because I know that dealing with a spellchecker in CI is just going to spend more time, I tend to not want to update any of our markdown documents as much, because it's going to take time away from other tasks. As a result, our documents end up drifting out of date.

This entirely ignores the fact you can run this locally. If you choose not to run it locally, as with any CI check, is the project to be faulted that you didn't run local checks? Would we say the same about unit tests?

This is ignoring the original point being made, which is that new contributors will not be familiar with our tooling. So their choices are to either spend more time learning how a spellchecker can be run locally, or parsing CI logs. Either way, it's not a good use of their time and will decrease the chances of them coming back.

@RobotSail
Member

This is a problem either of contributors not having access to proper resources to contribute, or contributors choosing to ignore such documentation. Again, would we do the same for unit tests? Why should I as a contributor learn how they work?

If you're contributing to a new project, are you really going to take all of the time to read the entirety of their documentation before solving a problem you found? That sounds like a lot of time to invest just to fix or update something small.

@RobotSail
Member

@nathan-weinberg What if, instead of deleting this from the CI completely, we ran a trial for a month where we keep the check disabled in CI? This way, if at the end of the month we have found that lots of spelling errors have gone through and reduced the readability of the documents, we can simply turn the check back on. Otherwise, we can just agree to keep it off. How does that sound?

@courtneypacheco
Contributor

courtneypacheco commented Feb 7, 2025

I agree with @RobotSail - let's try 1 month without.

Also, if tons of typos and grammatical errors get through without this CI spell check in place, I'd argue that perhaps we have a reviewer problem where reviewers aren't paying close enough attention to documentation updates in general. I'm not saying reviewers should be expected to catch 100% of typos or grammatical errors, but if not a single typo or grammatical error gets caught during a review, then that's worrying and a CI spell check will only be a bandaid at best for a much larger, underlying reviewer issue.

@booxter booxter mentioned this pull request Feb 7, 2025
6 tasks
@booxter
Contributor Author

booxter commented Feb 7, 2025

Sent #3133; hope this is a good compromise.

mergify bot added a commit that referenced this pull request Feb 14, 2025
We'd like to see if disabling spell check will make us agregeously
ilitarate. ;) If the experiment doesn't go well, we can revert this
patch later.

This is a follow-up to #3130

Signed-off-by: Ihar Hrachyshka <[email protected]>

**Checklist:**

- [ ] **Commit Message Formatting**: Commit titles and messages follow guidelines in the
  [conventional commits](https://www.conventionalcommits.org/en/v1.0.0/#summary).
- [ ] [Changelog](https://github.com/instructlab/instructlab/blob/main/CHANGELOG.md) updated with breaking and/or notable changes for the next minor release.
- [ ] Documentation has been updated, if necessary.
- [ ] Unit tests have been added, if necessary.
- [ ] Functional tests have been added, if necessary.
- [ ] E2E Workflow tests have been added, if necessary.



Approved-by: RobotSail

Approved-by: courtneypacheco
@booxter
Contributor Author

booxter commented Mar 11, 2025

It's been a month. I think it's time to revisit whether we struggled without the spellcheck job. I rebased the patch.

@mergify mergify bot added the ci-failure label and removed the one-approval label Mar 11, 2025
@mergify mergify bot removed the ci-failure label Mar 19, 2025
@booxter booxter removed the hold label Mar 19, 2025
@mergify mergify bot merged commit cb09d9b into instructlab:main Mar 19, 2025
32 checks passed