Robots.txt parsing fails when one rule line is invalid #111788
Closed
Labels: 3.13, 3.14, 3.15, stdlib (Standard Library Python modules in the Lib/ directory), type-bug (An unexpected behavior, bug, or error)
Bug report
Bug description:
Parsing of a robots.txt file fails if a single line in it is not parsable. I don't think this is valid behavior; ideally, non-parsable/invalid lines should be skipped. The norobots-rfc says the same:

> Implementors should pay particular attention to the robustness in parsing of the /robots.txt file.

I know `[routes.productDetail(product.sku, product.slug)` is clearly not a valid URL, but the whole parse should not error out because of this one line.

CPython versions tested on:
3.11
Operating systems tested on:
Linux
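The robust behavior the RFC asks for can be sketched roughly as follows (a minimal standalone illustration, not CPython's actual `urllib.robotparser` implementation; `parse_rules` and the sample `robots_txt` are hypothetical names): any line that doesn't have the `key: value` shape is skipped instead of aborting the whole parse.

```python
def parse_rules(lines):
    """Collect (key, value) rule pairs, silently skipping invalid lines."""
    rules = []
    for raw in lines:
        line = raw.split('#', 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        key, sep, value = line.partition(':')
        if not sep:
            # No "key: value" structure -- skip the unparsable line
            # instead of failing the entire parse.
            continue
        rules.append((key.strip().lower(), value.strip()))
    return rules

robots_txt = """\
User-agent: *
[routes.productDetail(product.sku, product.slug)
Disallow: /private/
"""
print(parse_rules(robots_txt.splitlines()))
```

With the malformed line dropped, the surrounding `User-agent` and `Disallow` rules still take effect.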
Linked PRs
urllib.robotparser #113231