In this work we present a Word Sense Disambiguation (WSD) engine that integrates a Transformer-based neural architecture with knowledge from WordNet, the resource from which the sense inventory is taken.
The architecture is composed of contextualized embeddings, a Transformer encoder on top, and a final dense layer.
The available models all use RoBERTa embeddings as a base and are:

- `rdense`, with only a two-dense-layer encoder;
- `rtransform`, with a Transformer encoder;
- `wsddense`, with a two-dense-layer encoder plus an advanced lemma prediction net;
- `wsdnetx`, the same as above but with a Transformer encoder.
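As a rough sketch of this design (not the repository's actual code; the hidden size, number of senses, and layer counts below are placeholder assumptions), the Transformer-based variants can be pictured in PyTorch as:

```python
import torch
import torch.nn as nn

class TransformerWSD(nn.Module):
    """Sketch of the Transformer-based variants: contextual embeddings ->
    Transformer encoder -> dense layer scoring the sense inventory."""

    def __init__(self, embed_dim=1024, num_senses=117659, num_layers=2, num_heads=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=num_heads)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.output = nn.Linear(embed_dim, num_senses)  # final dense layer

    def forward(self, embeddings, padding_mask=None):
        # embeddings: (seq_len, batch, embed_dim), e.g. RoBERTa hidden states
        h = self.encoder(embeddings, src_key_padding_mask=padding_mask)
        return self.output(h)  # per-token scores over the sense inventory
```

The dense-only variants simply replace the Transformer encoder with two dense layers.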
The advanced net can be represented as:
where h is the final hidden state of the encoder.
The |S|×|V| matrix is built as follows:
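The exact construction lives in the repository; as an illustrative sketch only, such a matrix can be thought of as a binary sense-to-lemma incidence matrix built from WordNet. The snippet below uses NLTK's WordNet interface and assumes `sense_index` and `lemma_index` lookup dictionaries, both hypothetical names rather than this codebase's API:

```python
import torch
from nltk.corpus import wordnet as wn

def build_sense_lemma_matrix(sense_index, lemma_index):
    """Binary |S| x |V| matrix: entry (s, v) is 1 when sense s is one of the
    WordNet senses of lemma v. Both index dicts are hypothetical inputs
    mapping sense keys and lemmas to row/column positions."""
    matrix = torch.zeros(len(sense_index), len(lemma_index))
    for lemma, col in lemma_index.items():
        for wn_lemma in wn.lemmas(lemma):
            row = sense_index.get(wn_lemma.key())
            if row is not None:
                matrix[row, col] = 1.0
    return matrix
```

One plausible reading is that multiplying per-sense scores by such a matrix (or using it as a mask) projects them onto the lemma vocabulary, tying sense predictions to the lemmas that can express them.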
As training data we use both SemCor and the WordNet Gloss Corpus.
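For a quick look at what sense-annotated data of this kind contains, SemCor can be inspected through NLTK (this is not necessarily the pre-processed format the project expects):

```python
# Requires: nltk.download('semcor') and nltk.download('wordnet')
from nltk.corpus import semcor

# Chunks carrying a sense annotation are Trees labelled with a WordNet Lemma.
for sentence in semcor.tagged_sents(tag='sem')[:1]:
    for chunk in sentence:
        if hasattr(chunk, 'label'):
            print(chunk.leaves(), '->', chunk.label())
```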
# Clone the repository and move into it
git clone http://github.com/spallas/wsd.git
cd wsd/ || return 1
# Run training inside a tmux session; check that PyTorch is installed,
# then set up the environment
tmux new -s train
python -c "import torch; print(torch.__version__)"
source setup.sh
Unzip the pre-processed training and test data, which you can download here, into the res/ folder. Also unzip the dictionaries data, which you can download here, into res/.
Please refer to the wiki page in this repository for further details about the implementation.
# Download roberta.large model
cd res/
wget https://dl.fbaipublicfiles.com/fairseq/models/roberta.large.tar.gz
tar -xzvf roberta.large.tar.gz
# Load the model in fairseq
from fairseq.models.roberta import RobertaModel
roberta = RobertaModel.from_pretrained('res/roberta.large', checkpoint_file='model.pt')
roberta.eval() # disable dropout (or leave in train mode to finetune)
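# Once loaded, the model provides the contextual embeddings that feed the WSD
# encoders described above. A minimal usage sketch with the standard fairseq
# interface (the example sentence is arbitrary):
tokens = roberta.encode('The bank can refuse the loan.')  # BPE-encode a sentence
features = roberta.extract_features(tokens)  # hidden states of shape (1, seq_len, 1024)
print(features.shape)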