Releases: Polifack/CoDeLin
1.25
1.21
1.20
- Replaced the incremental encoding with a flag that enables incremental encoding for the absolute, relative and dynamic encodings
- Added a flag to binarize the constituent trees before encoding them
- Added tetra-tagging (with preorder, inorder and postorder traversal) encoding for constituent parsing
- Added attach-juxtapose encoding for constituent parsing
- Added 4-bit encoding for dependency parsing
- Added 7-bit encoding for dependency parsing
- Added hexatagging encoding for dependency parsing
- Updated function calls and renamed the src folder to codelin
- Updated notebook and readme
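As an illustration of what binarizing a constituent tree before encoding can look like, here is a minimal sketch of right-branching binarization. It is not CoDeLin's implementation: the nested-tuple tree format, the function names `binarize` and `debinarize`, and the `*` marker for intermediate nodes are all assumptions made for this example, and the sketch assumes that no original label ends with the marker.

```python
# Trees are nested tuples (label, child, child, ...); leaves are plain strings.
# The "*" marker for intermediate nodes is a hypothetical convention.

def binarize(tree, marker="*"):
    """Return a copy of the tree in which no node has more than two children."""
    if isinstance(tree, str):
        return tree
    label, *children = tree
    children = [binarize(child, marker) for child in children]
    while len(children) > 2:
        # Fold the two rightmost children under an intermediate node.
        right = children.pop()
        left = children.pop()
        children.append((label + marker, left, right))
    return (label, *children)


def debinarize(tree, marker="*"):
    """Undo binarize by splicing intermediate nodes back into their parent."""
    if isinstance(tree, str):
        return tree
    label, *children = tree
    base = label[:-len(marker)] if label.endswith(marker) else label
    flat = []
    for child in children:
        child = debinarize(child, marker)
        if not isinstance(child, str) and child[0] == base + marker:
            flat.extend(child[1:])  # splice the intermediate node's children
        else:
            flat.append(child)
    return (label, *flat)


np_tree = ("NP", ("DT", "the"), ("JJ", "big"), ("JJ", "red"), ("NN", "ball"))
binary_tree = binarize(np_tree)
restored_tree = debinarize(binary_tree)
```

Marking the intermediate nodes keeps the transformation reversible, so the original n-ary tree can be recovered after decoding.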
1.10
- Added incremental encoding for constituent parsing. This encoding should encode the $nc_i$ field of the label $l_i$ of word $w_i$ as the number of common nodes that it has with $w_{i-1}$.
- Added a relative encoding option for dependency parsing where we encode the root of the tree as a special token instead of computing the distance to 0. This should decrease the output label sparsity.
- Implemented constituent trees, dependency trees and linearized trees from scratch as objects.
- Added a test module that should (i) check that the file obtained from encoding and then decoding the labels matches the original, (ii) check that there are no nulls in the decoded constituent trees and (iii) check that there are no out-of-bounds heads or loops in the decoded dependency trees.
- Restructured the code to ease the use of CoDeLin as a library. Added a notebook with a Colab link (see README) as a tutorial.
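The relative dependency encoding with a special root token, and the checks on decoded dependency trees, can be sketched as follows. This is a minimal illustration under stated assumptions, not the code shipped in this release: the `ROOT` label string, the list-of-heads input format (1-based heads, 0 for the root) and all function names are hypothetical.

```python
ROOT_LABEL = "ROOT"  # hypothetical special token; the actual label may differ


def encode_relative(heads):
    """heads[i] is the 1-based head of word i+1; 0 marks the root word."""
    labels = []
    for position, head in enumerate(heads, start=1):
        if head == 0:
            # One shared label for the root, whatever its position.
            labels.append(ROOT_LABEL)
        else:
            labels.append(str(head - position))
    return labels


def decode_relative(labels):
    """Invert encode_relative and return the list of heads."""
    heads = []
    for position, label in enumerate(labels, start=1):
        heads.append(0 if label == ROOT_LABEL else position + int(label))
    return heads


def heads_in_bounds(heads):
    """True if every head is 0 (root) or a valid word position."""
    return all(0 <= head <= len(heads) for head in heads)


def has_cycle(heads):
    """True if following head links from some word revisits a word."""
    for start in range(1, len(heads) + 1):
        seen = set()
        node = start
        # Stops at the root (0) or at an out-of-bounds head.
        while 1 <= node <= len(heads):
            if node in seen:
                return True
            seen.add(node)
            node = heads[node - 1]
    return False


example_heads = [2, 0, 2, 5, 3]
example_labels = encode_relative(example_heads)
round_trip_ok = decode_relative(example_labels) == example_heads
```

If the root were instead encoded as its distance to position 0, a root at position $p$ would receive the label $-p$, so each root position would produce a different label; the single token avoids that.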
Last version with Stanza Trees
Final tag that employs Stanza Trees. All further releases will have built-in constituent and dependency tree data structures.
Bachelor Thesis
Version of the linearization system as delivered in the bachelor's thesis