Welcome to pyfn, a Python module to process FrameNet annotation.
pyfn can be used to convert data to and from:
- FRAMENET XML: the format of the released FrameNet XML data
- SEMEVAL XML: the format of the SEMEVAL 2007 shared task 19 on frame semantic structure extraction
- SEMAFOR CoNLL: the format used by the SEMAFOR parser
- BIOS: the format used by the OPEN-SESAME parser
- CoNLL-X: the format used by various state-of-the-art POS taggers and dependency parsers (see preprocessing considerations for frame semantic parsing in REPLICATION.md)
pyfn can also generate the .csv hierarchy files used by both the SEMAFOR and
OPEN-SESAME parsers to integrate the hierarchy feature (see (Kshirsagar et al., 2015) for details).
This repository also accompanies the (Kabbach et al., 2018) paper:
@InProceedings{C18-1267,
  author = 	"Kabbach, Alexandre
		and Ribeyre, Corentin
		and Herbelot, Aur{\'e}lie",
  title = 	"Butterfly Effects in Frame Semantic Parsing: impact of data processing on model ranking",
  booktitle = 	"Proceedings of the 27th International Conference on Computational Linguistics",
  year = 	"2018",
  publisher = 	"Association for Computational Linguistics",
  pages = 	"3158--3169",
  location = 	"Santa Fe, New Mexico, USA",
  url = 	"http://aclweb.org/anthology/C18-1267"
}
To use pyfn to replicate frame semantic parsing results for SEMAFOR,
OPEN-SESAME and SIMPLEFRAMEID on a common preprocessing pipeline,
or to replicate results reported in (Kabbach et al., 2018),
check out REPLICATION.md.
On Unix, you may need to install the following packages:
libxml2 libxml2-dev libxslt1-dev python-3.x-dev
To install pyfn, do:
pip3 install pyfn
When using pyfn, your FrameNet splits directory structure should follow:
.
|-- fndata-1.x
|   |-- train
|   |   |-- fulltext
|   |   |-- lu
|   |-- dev
|   |   |-- fulltext
|   |   |-- lu
|   |-- test
|   |   |-- fulltext
|   |   |-- lu
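Before running a conversion, it can help to confirm that a splits directory matches the layout above. The following is a minimal sketch using only the Python standard library. The helper name missing_split_dirs is hypothetical and is not part of pyfn, and the sketch does not assume that pyfn itself requires every directory to exist (for instance lu when exemplars are not processed).

```python
from pathlib import Path

# Split and sub-directory names taken from the layout shown above.
SPLITS = ("train", "dev", "test")
SUBDIRS = ("fulltext", "lu")


def missing_split_dirs(root):
    """Return the expected <split>/<subdir> paths that are absent under root.

    Hypothetical helper for illustration; not part of pyfn.
    """
    root = Path(root)
    return [
        f"{split}/{subdir}"
        for split in SPLITS
        for subdir in SUBDIRS
        if not (root / split / subdir).is_dir()
    ]
```

An empty list means that all six directories are present under the given root.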
For an exhaustive description of all formats, check out FORMAT.md.
The following sections provide examples of commands to convert FN data to and from different formats. All commands can make use of the following options:
- --splits: specify which splits should be converted.
  - --splits train will generate all train/dev/test splits, according to data found under the fndata-1.x/{train/dev/test} directories.
  - --splits dev will generate the dev and test splits according to data found under the fndata-1.x/{dev/test} directories. This option will skip the train splits but generate the same dev/test splits that would have been generated with --splits train.
  - --splits test will generate the test splits according to data found under the fndata-1.x/test directory, and skip the train/dev splits. The test splits generated with --splits test will be the same as those generated with --splits train and --splits dev.
  Defaults to --splits test.
- --output_sentences: if specified, will output a .sentences file in the process, containing all raw annotated sentences, one sentence per line.
- --with_exemplars: if specified, will process the exemplars (data under the lu directory) in addition to fulltext.
- --filter: specify data filtering options (see details below).
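The cascading behavior of --splits can be summarized as a small lookup table. This is a sketch of the option description above, not code taken from pyfn. The names GENERATED_SPLITS and generated_splits are hypothetical.

```python
# Reading of the --splits description above: each value generates the named
# split plus every split that comes after it in train -> dev -> test order.
GENERATED_SPLITS = {
    "train": ["train", "dev", "test"],
    "dev": ["dev", "test"],
    "test": ["test"],
}


def generated_splits(splits="test"):
    """Return the splits generated for a given --splits value.

    The default mirrors the documented default, --splits test.
    Hypothetical helper for illustration; not part of pyfn.
    """
    if splits not in GENERATED_SPLITS:
        raise ValueError(f"unknown --splits value: {splits!r}")
    return list(GENERATED_SPLITS[splits])
```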
For details on pyfn usage, do:
pyfn --help
pyfn generate --help
pyfn convert --help
To convert data from FrameNet XML format to BIOS format, do:
pyfn convert \
  --from fnxml \
  --to bios \
  --source /abs/path/to/fndata-1.x \
  --target /abs/path/to/xp/data/output/dir \
  --splits train \
  --output_sentences \
  --filter overlap_fes
Using --filter overlap_fes will skip all annotationsets with overlapping
frame elements, as those cases are not supported by the BIOS format.
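To make the notion of overlapping frame elements concrete, the sketch below checks whether any two spans share a position. It only illustrates the idea and is not how pyfn implements the filter. The inclusive (start, end) span representation and the function name has_overlapping_spans are assumptions made for the example.

```python
def has_overlapping_spans(spans):
    """Return True if any two (start, end) spans share a position.

    Spans are assumed to be inclusive (start, end) index pairs; this
    representation is hypothetical and chosen for illustration only.
    """
    # After sorting by start, any overlap implies that some span starts
    # at or before the end of the span immediately preceding it.
    ordered = sorted(spans)
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start <= prev_end:
            return True
    return False
```

Under these assumptions, an annotation set with frame elements spanning (0, 3) and (3, 5) would be reported as overlapping, whereas (0, 2) and (3, 5) would not.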
To generate the train.frame.elements file used to train SEMAFOR, and the
{dev,test}.frames file used for decoding, do:
pyfn convert \
  --from fnxml \
  --to semafor \
  --source /abs/path/to/fndata-1.x \
  --target /abs/path/to/xp/data/output/dir \
  --splits train \
  --output_sentences
To generate the {dev,test}.gold.xml gold files in SEMEVAL format for scoring, do:
pyfn convert \
  --from fnxml \
  --to semeval \
  --source /abs/path/to/fndata-1.x \
  --target /abs/path/to/xp/data/output/dir \
  --splits {dev,test}
To convert the decoded BIOS files {dev,test}.bios.semeval.decoded of
OPEN-SESAME to SEMEVAL XML format for scoring, do:
pyfn convert \
  --from bios \
  --to semeval \
  --source /abs/path/to/{dev,test}.bios.semeval.decoded \
  --target /abs/path/to/output/{dev,test}.predicted.xml \
  --sent /abs/path/to/{dev,test}.sentences
To convert the decoded {dev,test}.frame.elements files of SEMAFOR to
SEMEVAL XML format for scoring, do:
pyfn convert \
  --from semafor \
  --to semeval \
  --source /abs/path/to/{dev,test}.frame.elements \
  --target /abs/path/to/output/{dev,test}.predicted.xml \
  --sent /abs/path/to/{dev,test}.sentences
To generate the .csv hierarchy files used by SEMAFOR and OPEN-SESAME, do:
pyfn generate \
  --source /abs/path/to/fndata-1.x \
  --target /abs/path/to/xp/data/output/dir
To also process exemplars, add the --with_exemplars option.
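When the command is launched from a Python script rather than typed in a shell, the argument list can be assembled programmatically. The sketch below uses only the flags shown in this README and assumes pyfn is installed and available on the PATH. The helper name build_generate_cmd is hypothetical and is not part of pyfn.

```python
import shlex


def build_generate_cmd(source, target, with_exemplars=False):
    """Assemble the argument list for the `pyfn generate` command.

    Hypothetical helper for illustration; not part of pyfn.
    """
    cmd = ["pyfn", "generate", "--source", str(source), "--target", str(target)]
    if with_exemplars:
        cmd.append("--with_exemplars")
    return cmd


cmd = build_generate_cmd(
    "/abs/path/to/fndata-1.x",
    "/abs/path/to/xp/data/output/dir",
    with_exemplars=True,
)
# Print the shell-quoted command for inspection.
print(shlex.join(cmd))
# To actually run it (requires pyfn on the PATH):
# import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting issues with paths that contain spaces.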
We created a set of bash scripts to preprocess FrameNet data with various
POS taggers and dependency parsers as well as to run the SIMPLEFRAMEID,
SEMAFOR and OPEN-SESAME frame semantic parsers.
Check out REPLICATION.md for a detailed HowTo manual.