Loosen csv metadata parsing #3862
Merged
Resolves #3853
Anki's current csv delimiter heuristic works by looking for potential delimiters in ascending expected frequency (`\t`, `|`, `;`, `:`, `,`) in the first 8 KB. While a wrong guess isn't necessarily a problem, since the delimiter can be changed on the import options page, hierarchical tags (`::`) throw a wrench in this, as pointed out by @GithubAnon0000: the `:` is matched before `,` is even considered. It's a common enough use case that Anki probably shouldn't penalise it. I imagine this was also foreseen by Rumov when he left a todo 3 years ago.

This PR proposes pulling in and using the `csv-sniffer` crate, which is able to correctly deduce the delimiter for the samples in #3588 and #3853 in a multiplatform manner.

EDIT: replaced csv-sniffer with qsv-sniffer, a fork that fixes some of the former's issues. However, not only is it more bloated than csv-sniffer, it still (unsurprisingly) has samples that it fails on: jblondin/csv-sniffer#18. Not sure if this is worth pursuing.
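
For anyone skimming, here is a minimal sketch of the kind of first-match heuristic described above, and of why a comma-separated file with hierarchical tags gets misdetected. It is an illustration only, not Anki's actual code: `CANDIDATES` and `guess_delimiter` are hypothetical names, and the candidate order is simply the one listed in the description.

```rust
/// Candidate delimiters, in the order the description lists them.
/// (Hypothetical constant for illustration, not taken from Anki's source.)
const CANDIDATES: [u8; 5] = [b'\t', b'|', b';', b':', b','];

/// Hypothetical helper: return the first candidate that occurs anywhere
/// in the first 8 KiB of `data`, or `None` if no candidate is present.
pub fn guess_delimiter(data: &[u8]) -> Option<u8> {
    // Only inspect the head of the file, mirroring the 8 KB window.
    let head = &data[..data.len().min(8 * 1024)];
    // First match wins, so an earlier candidate shadows a later one
    // regardless of how often each actually appears.
    CANDIDATES.iter().copied().find(|c| head.contains(c))
}
```

With an input such as `front,back,tags\nhello,world,lang::en\n`, this sketch picks `:` because it is checked before `,`, even though `,` is the real delimiter. A sniffer that looks at how consistently a character splits each row avoids that particular trap.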