Closed
Requirement:
Enforce consistent numerical order of rows in CoNLL export
Description:
Under circumstances not yet identified, the -conll export falls back to lexicographic order of nif:Words (to be confirmed whether the sort key is the URI or conll:ID), i.e., 1 10 11 ... 2 ... instead of the numerical order 1 2 ... 9 10 11 .... The behavior is reproducible, yet it differs between samples from the same source corpus (i.e., samples with the same structure): some are exported in lexicographic order, others in numerical order.
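A minimal Python sketch of the two orderings, assuming the row IDs are plain integer strings (conll:ID values). Because it is not yet confirmed whether the sort key is the URI or conll:ID, the sketch also shows a key function for URIs; the URI scheme `http://example.org/doc#s1_10` and the name `numeric_key` are hypothetical, not taken from CoNLL-RDF.

```python
import re

# Assumption: row IDs are exported as plain integer strings (conll:ID values).
ids = ["1", "2", "9", "10", "11"]

# String comparison works character by character, so "10" sorts before "2".
lexicographic = sorted(ids)

# Comparing integer values yields the expected CoNLL row order.
numerical = sorted(ids, key=int)


def numeric_key(word_uri: str) -> int:
    """Hypothetical sort key: the trailing integer of a word URI such as
    'http://example.org/doc#s1_10' (this URI scheme is an assumption)."""
    match = re.search(r"(\d+)$", word_uri)
    if match is None:
        raise ValueError(f"no trailing number in {word_uri!r}")
    return int(match.group(1))


uris = ["http://example.org/doc#s1_10", "http://example.org/doc#s1_2"]

print(lexicographic)                  # ['1', '10', '11', '2', '9']
print(numerical)                      # ['1', '2', '9', '10', '11']
print(sorted(uris, key=numeric_key))  # ['http://example.org/doc#s1_2', 'http://example.org/doc#s1_10']
```

Whichever key turns out to be used, a consistent result presumably requires an explicit integer comparison where rows are ordered, rather than relying on string order, which only happens to look correct for sentences with fewer than ten words.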
Samples:
- snippet-lex-order.zip
- snippet-num-order.zip
- Test with:
    cat $MY_FILE | ./run.sh CoNLLRDFFormatter -conll ID WORD
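To check the output of the test command across many samples, a small Python sketch can flag sentences whose ID column is not in ascending numerical order. It assumes that the first whitespace-separated column holds an integer ID (as requested by `-conll ID WORD`), that sentences are separated by blank lines, and that lines starting with `#` are comments; `find_misordered_sentences` is a hypothetical helper, not part of CoNLL-RDF.

```python
import re


def find_misordered_sentences(conll_text: str) -> list[int]:
    """Return the 1-based numbers of sentences whose IDs are not in
    ascending numerical order (e.g. 1 10 11 2 instead of 1 2 10 11)."""
    misordered = []
    sentence_no = 0
    # Assumption: sentences are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", conll_text.strip()):
        # Assumption: '#' lines are comments; the first column is an integer ID.
        ids = [
            int(line.split()[0])
            for line in block.splitlines()
            if line.strip() and not line.startswith("#")
        ]
        if not ids:
            continue  # comment-only block, not a sentence
        sentence_no += 1
        if ids != sorted(ids):
            misordered.append(sentence_no)
    return misordered


sample = "1\tThe\n10\tend\n2\tof\n\n1\tA\n2\tsecond\n"
print(find_misordered_sentences(sample))  # [1]
```

For example, the export of each sample could be saved to a file and its contents passed to the function; an empty result means every sentence came out in numerical order.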
Comments:
- Also note that the CoNLL export introduces additional line breaks; these should occur after the last row only.
- Note that the issue does not apply to -grammar or default (RDF) serializations, which seem to follow numerical order consistently.