DeBERTa: Decoding-enhanced BERT with Disentangled Attention

This repository is a fork of the official implementation of DeBERTa: Decoding-enhanced BERT with Disentangled Attention and DeBERTa V3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing.

It extends DeBERTa with a hierarchical character-to-word architecture that combines the advantages of character and word tokenizers: word-level self-attention with an unlimited vocabulary, as described in From Characters to Words: Hierarchical Pre-trained Language Model for Open-vocabulary Language Understanding. To use the hierarchical encoder, pass --token_format char_to_word to the data preparation and training scripts (see the example after the list below).

Supported token formats

  • --token_format char: character-level tokens
  • --token_format subword: standard subword tokens, as in the original DeBERTa
  • --token_format char_to_word: the hierarchical character-to-word encoding described above
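
As a quick illustration, the commands below sketch how the flag might be passed through data preparation and pre-training. The script names, paths, and other arguments are hypothetical placeholders rather than part of this repository; only the --token_format flag and its three values come from this README.

```bash
# Hypothetical script names and arguments; only --token_format is documented above.

# Prepare pre-training data with hierarchical character-to-word tokenization.
python prepare_data.py --input corpus.txt --output data/ctw --token_format char_to_word

# Train with the matching token format so the model sees the same representation.
python train.py --data data/ctw --token_format char_to_word

# Character-level or subword baselines use the same flag with a different value.
python prepare_data.py --input corpus.txt --output data/char --token_format char
```

Whichever format is chosen, the same value should be passed to both the data preparation and training steps so the prepared data and the model configuration agree.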
