ID-CNN-CWS

Source code and corpora for the paper "Iterated Dilated Convolutions for Chinese Word Segmentation", published in the NNW journal.

It implements the following four models for Chinese word segmentation (CWS):

Bi-LSTM
Bi-LSTM-CRF
ID-CNN
ID-CNN-CRF

Dependencies

Python >= 3.6
TensorFlow >= 1.2

Both CPU and GPU are supported. GPU training is about 10 times faster than CPU training.
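The version requirements above can be checked before training. The following is a minimal sketch, not part of this repository: the helper names `version_tuple`, `meets_minimum` and `check_environment` are hypothetical, and the only external fact it relies on is that TensorFlow exposes its version as `tf.__version__`.

```python
import sys


def version_tuple(version):
    """Turn a string such as '1.2.0' or '1.4.0-rc1' into a tuple of ints."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if digits == "":
            break  # stop at the first piece with no leading digits
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(version, minimum):
    """Return True if `version` is at least `minimum`."""
    return version_tuple(version) >= version_tuple(minimum)


def check_environment():
    """Collect human-readable problems; an empty list means the minimums are met."""
    problems = []
    if sys.version_info < (3, 6):
        problems.append("Python >= 3.6 is required")
    try:
        import tensorflow as tf
    except ImportError:
        problems.append("TensorFlow is not installed")
    else:
        if not meets_minimum(tf.__version__, "1.2"):
            problems.append("TensorFlow >= 1.2 is required, found " + tf.__version__)
    return problems
```

Calling `check_environment()` returns a list of messages. It only checks the stated minimums and makes no claim about compatibility with newer TensorFlow releases.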

Preparation

Run the following script to convert the corpora to TensorFlow datasets.

$ ./scripts/make.sh
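To give a feel for what such a conversion involves, here is a minimal sketch of turning one whitespace-segmented corpus line into per-character labels. It assumes the common B/M/E/S tagging scheme (begin, middle, end, single); the actual scripts in this repository may use a different scheme or file layout, and the function names below are hypothetical.

```python
def words_to_tags(words):
    """Map a list of words to parallel lists of characters and B/M/E/S tags."""
    chars, tags = [], []
    for word in words:
        if len(word) == 1:
            tags.append("S")  # single-character word
        else:
            # first char is B, last is E, everything in between is M
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
        chars.extend(word)
    return chars, tags


def line_to_pairs(line):
    """Split a whitespace-segmented line into words, then tag every character."""
    return words_to_tags(line.split())


chars, tags = line_to_pairs("我 喜欢 自然语言")
for ch, tag in zip(chars, tags):
    print(ch + "\t" + tag)
```

Each character ends up with exactly one tag, so the segmentation task becomes a sequence labelling problem that the models can be trained on.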

Train and Test

Quick Start

$ ./scripts/run.sh $dataset $model

$dataset can be pku, msr, asSC or cityuSC.
$model can be cnn or bilstm.

For example:

$ ./scripts/run.sh pku cnn

This trains a cnn model on the pku dataset, then evaluates it on the test set.
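To run every dataset and model combination listed above, the commands can be generated programmatically. The sketch below only builds and prints the command lines; the helper name `build_commands` is hypothetical, while the script path and argument values are the ones given in this section.

```python
import itertools

DATASETS = ["pku", "msr", "asSC", "cityuSC"]
MODELS = ["cnn", "bilstm"]


def build_commands(datasets=DATASETS, models=MODELS):
    """Return one argument list per (dataset, model) pair."""
    return [
        ["./scripts/run.sh", dataset, model]
        for dataset, model in itertools.product(datasets, models)
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in build_commands():
        print(" ".join(cmd))
```

To actually launch the runs one after another, each argument list could be passed to `subprocess.run(cmd, check=True)` from the repository root.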

CRF Layer

To enable the CRF layer, append --viterbi to your command, e.g.

$ ./scripts/run.sh pku cnn --viterbi
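As the flag name suggests, a CRF layer is typically decoded with the Viterbi algorithm, which picks the tag sequence with the highest total of per-character scores plus tag-to-tag transition scores. The following is a generic pure-Python sketch of that algorithm under this assumption; it is not the code used in this repository, and the function name and score layout are illustrative.

```python
def viterbi_decode(emissions, transitions):
    """Return the highest-scoring tag index sequence.

    emissions[t][k]   : score of tag k at position t
    transitions[i][j] : score of moving from tag i to tag j
    """
    if not emissions:
        return []
    num_tags = len(emissions[0])
    scores = list(emissions[0])  # best score of any path ending in each tag
    backpointers = []
    for t in range(1, len(emissions)):
        new_scores, pointers = [], []
        for j in range(num_tags):
            # best previous tag for landing on tag j at position t
            best_prev = max(range(num_tags),
                            key=lambda i: scores[i] + transitions[i][j])
            new_scores.append(scores[best_prev] + transitions[best_prev][j]
                              + emissions[t][j])
            pointers.append(best_prev)
        scores = new_scores
        backpointers.append(pointers)
    # follow the back-pointers from the best final tag
    path = [max(range(num_tags), key=lambda k: scores[k])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


emissions = [[1.0, 0.0], [0.0, 0.6]]
no_penalty = [[0.0, 0.0], [0.0, 0.0]]
penalised = [[0.0, -2.0], [0.0, 0.0]]  # discourage moving from tag 0 to tag 1
print(viterbi_decode(emissions, no_penalty))
print(viterbi_decode(emissions, penalised))
```

With all-zero transitions the result is simply the per-position argmax. The second call shows how a transition penalty can change the decoded sequence, which is the effect a CRF layer adds on top of independent per-character predictions.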

Accuracy

Speed

Acknowledgments

Corpora are from SIGHAN05, converted to Simplified Chinese via HanLP. Note that the SIGHAN datasets should only be used for research purposes.
Model implementations are adapted from https://github.com/iesl/dilated-cnn-ner by Emma Strubell.
