Improve rel parsing #162
* Parse the rel attribute in accordance with the WHATWG spec: https://infra.spec.whatwg.org/#split-on-ascii-whitespace
* Only list unique rel values in the rel-urls output, fixes microformats#159: microformats/microformats2-parsing#30
* Sort the unique rel values alphabetically: microformats/microformats2-parsing#29
* Correctly merge attribute values into the resulting object.
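
For readers unfamiliar with the WHATWG rule, here is a minimal sketch of splitting on ASCII whitespace and dropping duplicate values. It is written in Python for illustration only and is not the parser's code; the name `split_rel` is hypothetical. Keeping first-seen order is an assumption made for this sketch.

```python
import re

# ASCII whitespace as defined by the WHATWG Infra Standard:
# TAB, LF, FF, CR and SPACE. U+000B (vertical tab) and
# non-ASCII spaces such as U+00A0 are deliberately not included.
ASCII_WHITESPACE = re.compile(r"[\t\n\f\r ]+")


def split_rel(value: str) -> list[str]:
    """Split a rel attribute value into unique tokens (hypothetical helper)."""
    # Leading or trailing whitespace produces empty strings; filter them out.
    tokens = [token for token in ASCII_WHITESPACE.split(value) if token]
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return list(dict.fromkeys(tokens))


print(split_rel(" me  author\tme\n"))
```

A plain `str.split()` would not be equivalent here, because it also splits on non-ASCII whitespace.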
I didn’t touch the alternates parsing because its behaviour isn’t in the parsing specification. I can’t comment on its correctness.
This was sparked by looking into how attributes were parsed for microformats/microformats2-parsing#32. I have not otherwise tried to validate how well the parser implements that part of the rel spec, and it may have to be revisited when the spec issue is resolved.
👍
Parsing of the `rel` attribute has been improved by using WHATWG’s definition of splitting on whitespace. We also treat it as a set, which means we drop duplicate values. This might spare us a loop or two.

The properties (`hreflang`, `media`, `title`, `type`, `text`) we add to the `rel-urls` object should only be added if they weren’t previously set. The overwriting order has been changed to match this.

The test `testRelURLsInfoMergesCorrectly` is a simple check for the correct behaviour.

The final `rels` array within the `rel-urls` object has been changed to contain only unique items, sorted alphabetically. This reflects the latest parsing specification change.

The test `testRelURLsRelsUniqueAndSorted` is a simple check for both uniqueness and order.

The arrays of URLs within the `rels` object should not contain duplicates. The code has been changed to not add any URLs that are already present in the array. This is based on a line in the parsing spec that I had to reread three times as a non-native English speaker.

The test `testRelURLsNoDuplicates` is a simple check to make sure duplicate URLs aren’t added.

Closes #159 if merged.
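
To make the three behaviours above concrete, here is a rough sketch in Python. It is for illustration only and is not the parser's code. The function name `collect_rels` and the input shape (a list of dicts with `href`, `rel` and optional info keys) are assumptions made for this example, as is reading "sorted alphabetically" as a plain string sort.

```python
import re

# ASCII whitespace per the WHATWG Infra Standard: TAB, LF, FF, CR, SPACE.
ASCII_WHITESPACE = re.compile(r"[\t\n\f\r ]+")
INFO_KEYS = ("hreflang", "media", "title", "type", "text")


def collect_rels(links):
    """Build the rels and rel-urls objects from link dicts (hypothetical input shape)."""
    rels = {}
    rel_urls = {}
    for link in links:
        url = link["href"]
        # Treat the rel value as a set: split, drop empties, drop duplicates.
        values = {t for t in ASCII_WHITESPACE.split(link.get("rel", "")) if t}
        if not values:
            continue
        for value in sorted(values):
            urls = rels.setdefault(value, [])
            # Only add the URL if it is not already present for this rel value.
            if url not in urls:
                urls.append(url)
        entry = rel_urls.setdefault(url, {"rels": []})
        # Keep the rels array unique and sorted across repeated links.
        entry["rels"] = sorted(set(entry["rels"]) | values)
        # The first occurrence wins: never overwrite a property already set.
        for key in INFO_KEYS:
            if key in link and key not in entry:
                entry[key] = link[key]
    return rels, rel_urls


url = "https://example.com/a"
rels, rel_urls = collect_rels([
    {"href": url, "rel": "me author me", "title": "First"},
    {"href": url, "rel": "author  nofollow", "title": "Second", "type": "text/html"},
])
print(rels)
print(rel_urls)
```

In this example the second link adds `nofollow` and `type`, but it does not replace the `title` set by the first link, and the URL appears only once under `author`.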