- The code has been run on Google Colab, which provides free GPU resources
-
Natural Language Processing (自然语言处理)
-
-
IMDB
-
TF-IDF + Logistic Regression
-
FastText
-
Attention
-
Sliced LSTM
-
-
-
-
SNLI
-
DAM
-
MatchPyramid
-
ESIM
-
RE2
-
-
-
Chatbot (对话机器人)
-
Spoken Language Understanding (对话理解)
-
ATIS
-
Bi-GRU
-
Bi-GRU + CRF
-
Transformer
-
Bi-GRU + Transformer
-
ELMO + Bi-GRU
-
-
-
-
-
Semantic Parsing for Task Oriented Dialog
- RNN Seq2Seq + Attention
-
-
-
bAbI
- Dynamic Memory Network
-
-
-
Knowledge Graph (知识图谱)
-
Knowledge Graph Completion (知识图谱补全)
-
WN18
-
DistMult
-
ComplEx
-
-
-
Knowledge Graph Retrieval (知识图谱检索)
-
WN18
- SPARQL
-
-
-
-
Movielens 1M
-
Fusion
-
Classification
-
Regression
-
-
-
└── finch/tensorflow2/text_classification/imdb
│
├── data
│ └── glove.840B.300d.txt # pretrained embedding, download and put here
│ └── make_data.ipynb # step 1. make data and vocab: train.txt, test.txt, word.txt
│ └── train.txt # incomplete sample, format <label, text> separated by \t
│ └── test.txt # incomplete sample, format <label, text> separated by \t
│ └── train_bt_part1.txt # (back-translated) incomplete sample, format <label, text> separated by \t
│
├── vocab
│ └── word.txt # incomplete sample, list of words in vocabulary
│
└── main
└── attention_linear.ipynb # step 2: train and evaluate model
└── attention_conv.ipynb # step 2: train and evaluate model
└── fasttext_unigram.ipynb # step 2: train and evaluate model
└── fasttext_bigram.ipynb # step 2: train and evaluate model
└── sliced_rnn.ipynb # step 2: train and evaluate model
└── sliced_rnn_bt.ipynb # step 2: train and evaluate model
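A minimal sketch of turning the pretrained glove.840B.300d.txt file and the generated vocab/word.txt into an embedding matrix. The paths follow the tree above; the function name, the random-init range, and the handling of malformed GloVe lines are assumptions, not the notebooks' exact code.

```python
import numpy as np

def load_glove_matrix(glove_path='../data/glove.840B.300d.txt',
                      vocab_path='../vocab/word.txt', dim=300):
    # read the vocabulary produced by make_data.ipynb, one word per line
    words = [w.rstrip('\n') for w in open(vocab_path, encoding='utf-8')]
    lookup = {w: i for i, w in enumerate(words)}
    # words missing from GloVe keep a small random vector
    matrix = np.random.uniform(-0.05, 0.05, (len(words), dim)).astype('float32')
    with open(glove_path, encoding='utf-8') as f:
        for line in f:
            parts = line.rstrip().split(' ')
            word, vec = parts[0], parts[1:]
            if word in lookup and len(vec) == dim:   # skip the few odd lines in glove.840B
                matrix[lookup[word]] = np.asarray(vec, dtype='float32')
    return matrix
```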
-
Task: IMDB
-
Model: TF-IDF + Logistic Regression
-
Model: FastText
-
Model: Feedforward Attention
-
Model: Sliced RNN
-
TensorFlow 2
-
<Notebook> Sliced LSTM + Back-Translation -> 91.7% Testing Accuracy
-
<Notebook> Sliced LSTM + Back-Translation + Char Embedding -> 92.3% Testing Accuracy
This result, obtained without transfer learning, is higher than CoVe, which uses transfer learning
-
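For the TF-IDF + Logistic Regression baseline listed above, a minimal scikit-learn sketch on the `<label \t text>` files described in the tree. The file paths, the n-gram range, and the helper name `read_split` are assumptions; the actual notebook may differ in preprocessing and hyperparameters.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def read_split(path):
    # <label \t text> lines written by make_data.ipynb (path is an assumption)
    pairs = [line.rstrip('\n').split('\t', 1)
             for line in open(path, encoding='utf-8') if line.strip()]
    labels, texts = zip(*pairs)
    return list(texts), list(labels)

x_train, y_train = read_split('../data/train.txt')
x_test, y_test = read_split('../data/test.txt')

vectorizer = TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True)   # unigram + bigram tf-idf
clf = LogisticRegression(max_iter=1000)
clf.fit(vectorizer.fit_transform(x_train), y_train)
print('test accuracy:', accuracy_score(y_test, clf.predict(vectorizer.transform(x_test))))
```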
└── finch/tensorflow2/text_matching/snli
│
├── data
│ └── glove.840B.300d.txt # pretrained embedding, download and put here
│ └── download_data.ipynb # step 1. run this to download snli dataset
│ └── make_data.ipynb # step 2. run this to generate train.txt, test.txt, word.txt
│ └── train.txt # incomplete sample, format <label, text1, text2> separated by \t
│ └── test.txt # incomplete sample, format <label, text1, text2> separated by \t
│
├── vocab
│ └── word.txt # incomplete sample, list of words in vocabulary
│
└── main
└── dam.ipynb # step 3. train and evaluate model
└── esim.ipynb # step 3. train and evaluate model
-
Task: SNLI
-
Model: DAM
-
TensorFlow 2
The accuracy of this implementation is higher than that reported by the UCL MR Group (84.6%)
-
-
Model: Match Pyramid
-
TensorFlow 2
The accuracy of this model is 0.3% below ESIM; however, it runs about twice as fast as ESIM
-
-
Model: ESIM
-
TensorFlow 2
The accuracy of this implementation is slightly higher than that reported by the UCL MR Group (87.2%)
-
TensorFlow 1
-
-
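The core step shared by DAM and ESIM is a soft alignment (cross-attention) between the two sentences. Below is a minimal sketch of just that step, assuming already-embedded sequences `a` and `b`; masking, the feed-forward projections, and the rest of either model are omitted.

```python
import tensorflow as tf

def soft_align(a, b):
    """Soft alignment used by DAM/ESIM-style matchers.
    a: (batch, len_a, dim), b: (batch, len_b, dim) -- already embedded/encoded.
    Returns each token of one sentence paired with a weighted summary of the other."""
    scores = tf.matmul(a, b, transpose_b=True)            # (batch, len_a, len_b)
    attn_a = tf.nn.softmax(scores, axis=2)                # for each token of a, attend over b
    attn_b = tf.nn.softmax(scores, axis=1)                # for each token of b, attend over a
    aligned_a = tf.matmul(attn_a, b)                      # (batch, len_a, dim)
    aligned_b = tf.matmul(attn_b, a, transpose_a=True)    # (batch, len_b, dim)
    return aligned_a, aligned_b

# toy check: ESIM then compares e.g. [a, aligned_a, a - aligned_a, a * aligned_a]
a = tf.random.normal([2, 5, 300])
b = tf.random.normal([2, 7, 300])
aligned_a, aligned_b = soft_align(a, b)   # (2, 5, 300), (2, 7, 300)
```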
Model: RE2
-
-
Data: Some Book Titles
-
Model: TF-IDF + LDA
-
PySpark
-
Sklearn + pyLDAvis
-
-
└── finch/tensorflow2/spoken_language_understanding/atis
│
├── data
│ └── glove.840B.300d.txt # pretrained embedding, download and put here
│ └── make_data.ipynb # step 1. run this to generate vocab: word.txt, intent.txt, slot.txt
│ └── atis.train.w-intent.iob # incomplete sample, format <text, slot, intent>
│ └── atis.test.w-intent.iob # incomplete sample, format <text, slot, intent>
│
├── vocab
│ └── word.txt # list of words in vocabulary
│ └── intent.txt # list of intents in vocabulary
│ └── slot.txt # list of slots in vocabulary
│
└── main
└── bigru.ipynb # step 2. train and evaluate model
└── bigru_self_attn.ipynb # step 2. train and evaluate model
└── transformer.ipynb # step 2. train and evaluate model
└── transformer_elu.ipynb # step 2. train and evaluate model
-
Task: ATIS
-
Model: Bi-directional RNN
-
TensorFlow 2
-
97.8% Intent F1, 95.5% Slot F1 on Testing Data
-
-
TensorFlow 1
-
97.2% Intent F1, 95.7% Slot F1 on Testing Data
-
-
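A minimal tf.keras sketch of the joint architecture behind these numbers: a shared Bi-GRU encoder with a per-token head for slot filling and a per-utterance head for intent detection. The vocabulary and label sizes are placeholders (the real ones come from the vocab files), and padding handling is simplified; this is not the notebooks' exact code.

```python
import tensorflow as tf

VOCAB_SIZE, N_INTENTS, N_SLOTS = 10000, 22, 122   # placeholder sizes, not the real vocab files

words = tf.keras.Input(shape=(None,), dtype='int32')            # padded token ids
x = tf.keras.layers.Embedding(VOCAB_SIZE, 300)(words)           # padding handling simplified here
h, fwd, bwd = tf.keras.layers.Bidirectional(
    tf.keras.layers.GRU(128, return_sequences=True, return_state=True))(x)

slot_logits = tf.keras.layers.Dense(N_SLOTS, name='slot')(h)                    # one tag per token
intent_logits = tf.keras.layers.Dense(N_INTENTS, name='intent')(
    tf.keras.layers.Concatenate()([fwd, bwd]))                                  # one label per utterance

model = tf.keras.Model(words, [intent_logits, slot_logits])
intent_out, slot_out = model(tf.constant([[3, 14, 15, 9, 2, 0, 0]]))
print(intent_out.shape, slot_out.shape)                          # (1, 22) and (1, 7, 122)
```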
-
Model: Transformer
-
TensorFlow 2
-
97.5% Intent F1, 94.9% Slot F1 on Testing Data
-
<Notebook> Transformer + ELU activation
97.2% Intent F1, 95.5% Slot F1 on Testing Data
-
<Notebook> Bi-GRU + Transformer
97.7% Intent F1, 95.8% Slot F1 on Testing Data
-
-
-
Model: ELMO Embedding
-
TensorFlow 1
-
<Notebook> ELMO (the first LSTM hidden state) + Bi-GRU
97.6% Intent F1, 96.2% Slot F1 on Testing Data
-
<Notebook> ELMO (weighted sum of 3 layers) + Bi-GRU
97.6% Intent F1, 96.1% Slot F1 on Testing Data
-
-
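The "weighted sum of 3 layers" variant mixes ELMO's layer outputs with softmax-normalized learned scalars plus a global scale. A sketch of the mixing step only, written in TF2 eager style for brevity even though the notebooks above use TensorFlow 1; obtaining the layer tensors from the ELMO module is omitted, and the variable names are assumptions.

```python
import tensorflow as tf
import numpy as np

def elmo_weighted_sum(layer_outputs, scalars, gamma):
    """layer_outputs: list of 3 tensors, each (batch, time, dim), from the ELMO biLM.
    scalars: 3 trainable logits; gamma: trainable global scale."""
    weights = tf.nn.softmax(scalars)                         # normalize the 3 layer weights
    mixed = sum(w * h for w, h in zip(tf.unstack(weights), layer_outputs))
    return gamma * mixed

# toy check with random stand-ins for the three biLM layers
layers = [tf.constant(np.random.randn(2, 5, 32), tf.float32) for _ in range(3)]
out = elmo_weighted_sum(layers, scalars=tf.Variable([0., 0., 0.]), gamma=tf.Variable(1.0))
print(out.shape)   # (2, 5, 32)
```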
-
Task: A Chinese Dialogue Dataset (Xiaohuangji corpus, 小黄鸡语料)
-
Model: RNN Seq2Seq
-
TensorFlow 1
-
GRU + Bahdanau Attention + Luong Attention + Beam Search
-
-
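The Bahdanau (additive) attention named above scores each encoder step against the current decoder state as v·tanh(W1·h_enc + W2·h_dec). A minimal sketch of that scoring layer only; the dimensions are placeholders and the Luong variant and beam search are not shown.

```python
import tensorflow as tf

class BahdanauAttention(tf.keras.layers.Layer):
    def __init__(self, units):
        super().__init__()
        self.W1 = tf.keras.layers.Dense(units)   # projects encoder outputs
        self.W2 = tf.keras.layers.Dense(units)   # projects decoder state
        self.v = tf.keras.layers.Dense(1)

    def call(self, enc_outputs, dec_state):
        # enc_outputs: (batch, src_len, enc_dim), dec_state: (batch, dec_dim)
        score = self.v(tf.nn.tanh(
            self.W1(enc_outputs) + self.W2(tf.expand_dims(dec_state, 1))))  # (batch, src_len, 1)
        weights = tf.nn.softmax(score, axis=1)
        context = tf.reduce_sum(weights * enc_outputs, axis=1)              # (batch, enc_dim)
        return context, weights

attn = BahdanauAttention(64)
ctx, w = attn(tf.random.normal([2, 10, 128]), tf.random.normal([2, 128]))
print(ctx.shape, w.shape)   # (2, 128) (2, 10, 1)
```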
-
└── finch/tensorflow2/semantic_parsing/tree_slu
│
├── data
│ └── glove.840B.300d.txt # pretrained embedding, download and put here
│ └── make_data.ipynb # step 1. run this to generate vocab: source.txt, target.txt
│ └── train.tsv # incomplete sample, format <text, tokenized_text, tree>
│ └── test.tsv # incomplete sample, format <text, tokenized_text, tree>
│
├── vocab
│ └── source.txt # list of words in vocabulary for source (of seq2seq)
│ └── target.txt # list of words in vocabulary for target (of seq2seq)
│
└── main
└── gru_seq2seq.ipynb # step 2. train and evaluate model
└── lstm_seq2seq.ipynb # step 2. train and evaluate model
-
Task: Semantic Parsing for Task Oriented Dialog
-
Model: RNN Seq2Seq
-
TensorFlow 2
-
<Notebook> GRU + Bahdanau Attention -> 72.9% Exact Match Accuracy on Testing Data
-
<Notebook> LSTM + Bahdanau Attention -> 72.2% Exact Match Accuracy on Testing Data
-
-
TensorFlow 1
-
<Notebook 1> <Notebook 2> ELMO + GRU + Bahdanau Attention + Luong Attention + Beam Search
-> 74.5% Exact Match Accuracy on Testing Data
-
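Exact Match Accuracy here means the decoded target sequence must equal the reference tree string token for token; a small sketch of the metric (the example strings are illustrative placeholders, not real samples from the dataset):

```python
def exact_match_accuracy(predictions, references):
    """predictions, references: lists of decoded strings (or token lists).
    A sample counts only if the whole sequence matches exactly."""
    assert len(predictions) == len(references)
    hits = sum(p == r for p, r in zip(predictions, references))
    return hits / len(references)

preds = ['[IN:A x y ]', '[IN:B z ]']   # illustrative placeholders
golds = ['[IN:A x y ]', '[IN:B w ]']
print(exact_match_accuracy(preds, golds))   # 0.5
```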
-
└── finch/tensorflow2/knowledge_graph_completion/wn18
│
├── data
│ └── download_data.ipynb # step 1. run this to download wn18 dataset
│ └── make_data.ipynb # step 2. run this to generate vocabulary: entity.txt, relation.txt
│ └── wn18 # wn18 folder (will be auto created by download_data.ipynb)
│ └── train.txt # incomplete sample, format <entity1, relation, entity2> separated by \t
│ └── valid.txt # incomplete sample, format <entity1, relation, entity2> separated by \t
│ └── test.txt # incomplete sample, format <entity1, relation, entity2> separated by \t
│
├── vocab
│ └── entity.txt # incomplete sample, list of entities in vocabulary
│ └── relation.txt # incomplete sample, list of relations in vocabulary
│
└── main
└── distmult_1-N.ipynb # step 3. train and evaluate model
-
Task: WN18
-
Model: DistMult + 1-N Fast Evaluation
MRR: Mean Reciprocal Rank
-
TensorFlow 2
-
TensorFlow 1
-
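DistMult scores a triple as the sum over dimensions of the element-wise product of the head, relation, and tail embeddings; "1-N fast evaluation" scores one (head, relation) query against every entity in a single matmul, which makes metrics such as MRR cheap to compute. A minimal sketch with random embeddings; the unfiltered ranking and the sizes are assumptions.

```python
import tensorflow as tf

N_ENTITIES, N_RELATIONS, DIM = 40943, 18, 200      # WN18-sized placeholders
entity_emb = tf.random.normal([N_ENTITIES, DIM])
relation_emb = tf.random.normal([N_RELATIONS, DIM])

def distmult_1_to_n(head_ids, rel_ids):
    """Score each (head, relation) query against all entities at once."""
    h = tf.gather(entity_emb, head_ids)             # (batch, dim)
    r = tf.gather(relation_emb, rel_ids)            # (batch, dim)
    return tf.matmul(h * r, entity_emb, transpose_b=True)   # (batch, N_ENTITIES)

def mean_reciprocal_rank(scores, true_tail_ids):
    """MRR: mean of 1 / rank of the correct tail (no filtering in this sketch)."""
    true_scores = tf.gather(scores, true_tail_ids, batch_dims=1)             # (batch,)
    ranks = tf.reduce_sum(tf.cast(scores > true_scores[:, None], tf.float32), axis=1) + 1.0
    return tf.reduce_mean(1.0 / ranks)

scores = distmult_1_to_n(head_ids=[0, 1], rel_ids=[2, 3])
print(mean_reciprocal_rank(scores, true_tail_ids=[5, 7]).numpy())
```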
-
Model: ComplEx + 1-N Fast Evaluation
-
TensorFlow 2
-
-
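ComplEx replaces DistMult's real-valued embeddings with complex-valued ones and scores a triple as the real part of the trilinear product with the conjugated tail, which lets it model asymmetric relations; DistMult is the special case with zero imaginary parts. A sketch of the scoring function only, with the real and imaginary parts written out:

```python
import tensorflow as tf

def complex_score(h_re, h_im, r_re, r_im, t_re, t_im):
    """Re(<h, r, conj(t)>), expanded into real/imaginary parts.
    Each argument: (batch, dim); returns (batch,) scores."""
    return tf.reduce_sum(
        h_re * r_re * t_re
        + h_im * r_re * t_im
        + h_re * r_im * t_im
        - h_im * r_im * t_re, axis=1)

dim = 4
args = [tf.random.normal([2, dim]) for _ in range(6)]
print(complex_score(*args).shape)   # (2,)
```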
└── finch/tensorflow1/question_answering/babi
│
├── data
│ └── make_data.ipynb # step 1. run this to generate vocabulary: word.txt
│ └── qa5_three-arg-relations_train.txt # one complete example of babi dataset
│ └── qa5_three-arg-relations_test.txt # one complete example of babi dataset
│
├── vocab
│ └── word.txt # complete list of words in vocabulary
│
└── main
└── dmn_train.ipynb
└── dmn_serve.ipynb
└── attn_gru_cell.py
-
Task: bAbI
-
Model: Dynamic Memory Network
-
TensorFlow 1
-
└── finch/tensorflow1/recommender/movielens
│
├── data
│ └── make_data.ipynb # run this to generate vocabulary
│
├── vocab
│ └── user_job.txt
│ └── user_id.txt
│ └── user_gender.txt
│ └── user_age.txt
│ └── movie_types.txt
│ └── movie_title.txt
│ └── movie_id.txt
│
└── main
└── dnn_softmax.ipynb
└── dnn_mse.ipynb
-
Task: Movielens 1M
-
Model: Fusion
-
TensorFlow 1
MAE: Mean Absolute Error
-
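The fusion model embeds the user-side and movie-side categorical features, concatenates ("fuses") them, and passes the result through a small DNN; dnn_mse.ipynb treats the rating as regression (hence the MAE above), while dnn_softmax.ipynb treats it as 5-way classification. A regression-flavoured sketch in TF2 Keras style, even though the notebooks use TensorFlow 1; the cardinalities and layer sizes are placeholders, and the movie title/genre towers are omitted.

```python
import tensorflow as tf

# placeholder cardinalities; the real ones come from the vocab files above
N_USERS, N_JOBS, N_GENDERS, N_AGES, N_MOVIES = 6040, 21, 2, 7, 3952

def embed(n, dim, inp):
    # categorical id -> dense vector, flattened to (batch, dim)
    return tf.keras.layers.Flatten()(tf.keras.layers.Embedding(n, dim)(inp))

inputs = {name: tf.keras.Input(shape=(1,), dtype='int32', name=name)
          for name in ['user_id', 'user_job', 'user_gender', 'user_age', 'movie_id']}
user_vec = tf.keras.layers.Concatenate()([
    embed(N_USERS, 32, inputs['user_id']), embed(N_JOBS, 8, inputs['user_job']),
    embed(N_GENDERS, 4, inputs['user_gender']), embed(N_AGES, 8, inputs['user_age'])])
movie_vec = embed(N_MOVIES, 32, inputs['movie_id'])      # movie title/genre towers omitted

fused = tf.keras.layers.Concatenate()([user_vec, movie_vec])   # the "fusion" step
hidden = tf.keras.layers.Dense(128, activation='relu')(fused)
rating = tf.keras.layers.Dense(1)(hidden)                      # regression head

model = tf.keras.Model(inputs, rating)
model.compile(optimizer='adam', loss='mse', metrics=['mae'])   # MAE: Mean Absolute Error
model.summary()
```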