
Source codes and corpora of paper "Iterated Dilated Convolutions for Chinese Word Segmentation"


hankcs/ID-CNN-CWS


ID-CNN-CWS

Source code and corpora for the paper "Iterated Dilated Convolutions for Chinese Word Segmentation", published in the NNW journal.


It implements the following four models for Chinese word segmentation (CWS):

  • Bi-LSTM
  • Bi-LSTM-CRF
  • ID-CNN
  • ID-CNN-CRF

Dependencies

  • Python >= 3.6
  • TensorFlow >= 1.2

Both CPU and GPU are supported; training on a GPU is about 10 times faster than on a CPU.

Preparation

Run the following script to convert the corpora into TensorFlow datasets:

$ ./scripts/make.sh

Train and Test

Quick Start

$ ./scripts/run.sh $dataset $model

where:
  • $dataset is one of pku, msr, asSC or cityuSC;
  • $model is either cnn or bilstm.

For example:

$ ./scripts/run.sh pku cnn

This trains a cnn model on the pku dataset and then evaluates its performance on the test set.
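Because a typo in $dataset or $model is only discovered once the script runs, the documented values can be checked up front. The function below is a minimal sketch and not part of the repository: the name run_cws and the DRY_RUN switch are hypothetical, and it assumes it is sourced from the repository root so that ./scripts/run.sh resolves. It rejects any dataset or model outside the values listed above and forwards extra flags (such as --viterbi) unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the repository: validates the arguments
# documented for scripts/run.sh before invoking it. Run from the repo root.
# Set DRY_RUN=1 to print the command instead of executing it.

run_cws() {
  if [ "$#" -lt 2 ]; then
    echo "usage: run_cws DATASET MODEL [extra flags...]" >&2
    return 2
  fi
  local dataset="$1" model="$2"
  shift 2  # whatever remains (e.g. --viterbi) is forwarded unchanged

  # Only the datasets and models listed in the README are accepted.
  case "$dataset" in
    pku|msr|asSC|cityuSC) ;;
    *) echo "unknown dataset: $dataset" >&2; return 2 ;;
  esac
  case "$model" in
    cnn|bilstm) ;;
    *) echo "unknown model: $model" >&2; return 2 ;;
  esac

  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo ./scripts/run.sh "$dataset" "$model" "$@"
  else
    ./scripts/run.sh "$dataset" "$model" "$@"
  fi
}
```

For example, DRY_RUN=1 run_cws pku cnn only prints ./scripts/run.sh pku cnn, while run_cws pku lstm fails immediately with "unknown model: lstm" instead of starting a run.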

CRF Layer

To enable the CRF layer, append --viterbi to the command, e.g.:

$ ./scripts/run.sh pku cnn --viterbi
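Since the four models differ only in the $model argument and the --viterbi flag, all of them can be trained and evaluated on one dataset with a loop. This is a sketch under the assumption that cnn corresponds to ID-CNN, bilstm to Bi-LSTM, and --viterbi adds the CRF layer as described above; the function name run_all_models and the DRY_RUN switch are hypothetical and not part of the repository. It stops at the first run that fails.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the repository. Assumes cnn = ID-CNN,
# bilstm = Bi-LSTM, and --viterbi adds the CRF layer. Run from the repo root;
# set DRY_RUN=1 to print the commands instead of executing them.

run_all_models() {
  if [ "$#" -ne 1 ]; then
    echo "usage: run_all_models DATASET" >&2
    return 2
  fi
  local dataset="$1" model crf
  # Order: Bi-LSTM, Bi-LSTM-CRF, ID-CNN, ID-CNN-CRF
  for model in bilstm cnn; do
    for crf in "" --viterbi; do
      # $crf is left unquoted on purpose: when empty it adds no argument
      if [ "${DRY_RUN:-0}" = 1 ]; then
        echo ./scripts/run.sh "$dataset" "$model" $crf
      else
        ./scripts/run.sh "$dataset" "$model" $crf || return
      fi
    done
  done
}
```

Running run_all_models pku therefore performs four consecutive train-and-test runs on pku; with GPU training being much faster than CPU training, a full sweep is best done on a GPU machine.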

Accuracy


Speed


Acknowledgments
