@yochem yochem commented Sep 1, 2025

Problem:
scripts/check_urls.vim manually matches URLs in the help pages and then synchronously checks them via curl/wget/powershell. This is extremely slow (~5 minutes for Nvim's runtime on my machine) and prone to errors in how the URLs are matched.

Solution:
Use Treesitter to find the URLs in the help pages, and call curl asynchronously to check each one. This takes around 20s for the same docs (with a timeout of 5s per request).


Current limitations:

  • Uses vim.net.request (curl), not wget or PowerShell
  • Is limited by the URLs found by tree-sitter-vimdoc: it finds 443 URLs, versus 517 for the Vimscript implementation

justinmk commented Sep 1, 2025

> This is extremely slow (~5 minutes for Nvim's runtime on my machine) and prone to errors in how the URLs are matched.

Spawn it in a child job. I don't think vim.async is needed here, this script is not intended for interactive use, it's a build-time step.

Comment on lines 6 to 9
local function curl(url, cb)
  local cmd = {
    'curl',
    '--silent',

@justinmk justinmk Sep 1, 2025

Can vim.net.request be used, or what is missing from it?

Oh, I guess it's missing "async" behavior.

@yochem yochem Sep 1, 2025

The redirect -L flag, and the fact that the error code value is a string. But I think checking err ~= nil might be fine too.

@yochem yochem Sep 2, 2025

Okay, most importantly, it's missing a way to add a timeout (or I'm not seeing it). This could be fixed either by passing --max-time 3 to curl or, for example, with the approach proposed by Lewis: #34140 (comment)
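
As a sketch of the --max-time route, assuming the script shells out to curl directly instead of going through vim.net.request: the hypothetical helpers curl_check_args and is_reachable below build a curl argv that follows redirects (--location, the long form of -L), caps each request with --max-time, discards the body, and prints only the final HTTP status via --write-out. Because that status arrives on stdout as a string, it is converted with tonumber before comparing, which also sidesteps the "error code is a string" issue.

```lua
-- Hypothetical helper (not part of vim.net.request): argv for checking one URL.
local function curl_check_args(url, timeout_s)
  return {
    'curl',
    '--silent', -- no progress meter
    '--location', -- follow redirects (long form of -L)
    '--max-time', tostring(timeout_s), -- abort the transfer after timeout_s seconds
    '--output', '/dev/null', -- discard the response body
    '--write-out', '%{http_code}', -- print only the final HTTP status code
    url,
  }
end

-- Hypothetical helper: curl exits 0 on a completed transfer (28 on timeout),
-- and stdout holds the status as a string such as "200", "404" or "000".
local function is_reachable(exit_code, http_code)
  local status = tonumber(http_code)
  return exit_code == 0 and status ~= nil and status >= 200 and status < 400
end
```

Inside Neovim, such an argv could be passed to vim.system(cmd, { text = true }, on_exit); the on_exit callback receives a result whose code and stdout fields map onto is_reachable(res.code, res.stdout). Without --fail, curl still exits 0 on a 404, which is why the status code is checked separately.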

@justinmk justinmk changed the title Refactor check_urls.vim to Lua + Treesitter refactor(scripts): check_urls.vim to Lua + Treesitter Sep 1, 2025
@justinmk justinmk added the refactor changes that are not features or bugfixes label Sep 1, 2025

yochem commented Sep 1, 2025

> Spawn it in a child job. I don't think vim.async is needed here, this script is not intended for interactive use, it's a build-time step.

I added vim.async so a check for one URL doesn't have to wait for the others. Is this the wrong use case for vim.async, or something vim.async/coroutines don't support? If so, then it's indeed not necessary.

justinmk commented Sep 1, 2025

> I added vim.async so a check for one URL doesn't have to wait for the others.

That makes sense; I'm just wondering whether it matters, since this is mostly a CI job and we can always enhance it later. The vim.async PR might take a bit longer.
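
On whether vim.async is needed: one minimal way to let every check run without waiting for the others, using plain callbacks only, is a pending counter. In the sketch below, check_all and start_check are hypothetical names; start_check(url, cb) stands for whatever launches one asynchronous check (for example curl via vim.system) and later calls cb(ok).

```lua
-- Sketch: start all checks at once and call on_done(failed) after the
-- last callback has fired. No coroutines or vim.async involved.
local function check_all(urls, start_check, on_done)
  local pending = #urls
  local failed = {}
  if pending == 0 then
    on_done(failed)
    return
  end
  for _, url in ipairs(urls) do
    start_check(url, function(ok)
      if not ok then
        failed[#failed + 1] = url
      end
      pending = pending - 1
      if pending == 0 then
        table.sort(failed) -- stable, readable report order
        on_done(failed)
      end
    end)
  end
end
```

Callbacks from vim.system run one at a time on Neovim's event loop, so the plain counter needs no locking. A headless script could then block until on_done has fired with something like vim.wait(timeout_ms, function() return finished end). Note that this launches every check at once; with several hundred URLs, capping the number of concurrent curl processes may be worth considering.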

-- if output is not specified, err will always be nil (seems curl-specific)
vim.net.request(url, { retry = 1, outpath = '/dev/null' }, function(err, _)
  if err then
    vim.print(('Unreachable url in %s: %s'):format(filename, url))
  end
end)

@justinmk Any preference for the output format? (Here, filename is the help file's name.)

Problem:
scripts/check_urls.vim manually matches URLs in the help pages and then
synchronously checks them via curl/wget/powershell. This is extremely
slow (~5 minutes for Nvim's runtime on my machine) and prone to errors in
how the URLs are matched.

Solution:
Use Treesitter to find the URLs in the help pages. Asynchronously call
curl for their response. Takes around 10s for the same docs.
@@ -25,7 +25,7 @@ function M.request(url, opts, on_response)
   local retry = opts.retry or 3

   -- Build curl command
-  local args = { 'curl' }
+  local args = { 'curl', '--max-time', '5' }

Not the final solution, but I need to find a way to get this behaviour: either by allowing extra curl arguments in vim.net.request, or with some callback wizardry.

-- Usage:
-- $ ./scripts/check_urls.lua [DIR...]
--
-- [DIR...] defaults to all 'doc' directories in the runtimepath.

@yochem yochem Sep 4, 2025

The current behaviour also checks runtime/pack/dist/opt (seems okay), but that also includes netrw (booo). I still think it should work like this: $VIMRUNTIME/doc is easy to supply as a script argument, while "all doc directories in the rtp" is not.

@yochem yochem marked this pull request as ready for review September 5, 2025 11:08