- The code has been run on Google Colab, which provides free GPU resources
-
Natural Language Processing (自然语言处理)
-
-
IMDB
-
TF-IDF + Logistic Regression
-
FastText
-
Attention
-
Sliced LSTM
-
-
-
-
SNLI
-
DAM
-
MatchPyramid
-
ESIM
-
RE2
-
-
-
Chatbot (对话机器人)
-
Spoken Language Understanding (对话理解)
-
ATIS
-
Bi-GRU
-
Bi-GRU + CRF
-
Transformer
-
Bi-GRU + Transformer
-
ELMO + Bi-GRU
-
-
-
-
-
Semantic Parsing for Task Oriented Dialog
- RNN Seq2Seq + Attention
-
-
-
bAbI
- Dynamic Memory Network
-
-
-
Knowledge Graph (知识图谱)
-
Knowledge Graph Completion (知识图谱补全)
-
WN18
-
DistMult
-
ComplEx
-
-
-
Knowledge Graph Retrieval (知识图谱检索)
-
WN18
- SPARQL
-
-
-
-
Movielens 1M
-
Fusion
-
Classification
-
Regression
-
-
-
└── finch/tensorflow2/text_classification/imdb
│
├── data
│ └── glove.840B.300d.txt # pretrained embedding, download and put here
│ └── make_data.ipynb # step 1. make data and vocab: train.txt, test.txt, word.txt
│ └── train.txt # incomplete sample, format <label, text> separated by \t
│ └── test.txt # incomplete sample, format <label, text> separated by \t
│ └── train_bt_part1.txt # (back-translated) incomplete sample, format <label, text> separated by \t
│
├── vocab
│ └── word.txt # incomplete sample, list of words in vocabulary
│
└── main
└── attention_linear.ipynb # step 2: train and evaluate model
└── attention_conv.ipynb # step 2: train and evaluate model
└── fasttext_unigram.ipynb # step 2: train and evaluate model
└── fasttext_bigram.ipynb # step 2: train and evaluate model
└── sliced_rnn.ipynb # step 2: train and evaluate model
└── sliced_rnn_bt.ipynb # step 2: train and evaluate model
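A minimal sketch of turning the pretrained glove.840B.300d.txt file and the generated vocab/word.txt into an embedding matrix. The paths follow the tree above; the function name, the random-init range, and the handling of malformed GloVe lines are assumptions, not the notebooks' exact code.

```python
import numpy as np

def load_glove_matrix(glove_path='../data/glove.840B.300d.txt',
                      vocab_path='../vocab/word.txt', dim=300):
    # read the vocabulary produced by make_data.ipynb, one word per line
    words = [w.rstrip('\n') for w in open(vocab_path, encoding='utf-8')]
    lookup = {w: i for i, w in enumerate(words)}
    # words missing from GloVe keep a small random vector
    matrix = np.random.uniform(-0.05, 0.05, (len(words), dim)).astype('float32')
    with open(glove_path, encoding='utf-8') as f:
        for line in f:
            parts = line.rstrip().split(' ')
            word, vec = parts[0], parts[1:]
            if word in lookup and len(vec) == dim:   # skip the few odd lines in glove.840B
                matrix[lookup[word]] = np.asarray(vec, dtype='float32')
    return matrix
```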
-
Task: IMDB
-
Model: TF-IDF + Logistic Regression
-
Model: FastText
-
Model: Feedforward Attention
-
Model: Sliced RNN
-
TensorFlow 2
-
<Notebook> Sliced LSTM + Back-Translation -> 91.7% Testing Accuracy
-
<Notebook> Sliced LSTM + Back-Translation + Char Embedding -> 92.3% Testing Accuracy
This result, obtained without transfer learning, is higher than CoVe, which uses transfer learning
-
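For the TF-IDF + Logistic Regression baseline listed above, a minimal scikit-learn sketch on the `<label \t text>` files described in the tree. The file paths, the n-gram range, and the helper name `read_split` are assumptions; the actual notebook may differ in preprocessing and hyperparameters.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def read_split(path):
    # <label \t text> lines written by make_data.ipynb (path is an assumption)
    pairs = [line.rstrip('\n').split('\t', 1)
             for line in open(path, encoding='utf-8') if line.strip()]
    labels, texts = zip(*pairs)
    return list(texts), list(labels)

x_train, y_train = read_split('../data/train.txt')
x_test, y_test = read_split('../data/test.txt')

vectorizer = TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True)   # unigram + bigram tf-idf
clf = LogisticRegression(max_iter=1000)
clf.fit(vectorizer.fit_transform(x_train), y_train)
print('test accuracy:', accuracy_score(y_test, clf.predict(vectorizer.transform(x_test))))
```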
└── finch/tensorflow2/text_matching/snli
│
├── data
│ └── glove.840B.300d.txt # pretrained embedding, download and put here
│ └── download_data.ipynb # step 1. run this to download snli dataset
│ └── make_data.ipynb # step 2. run this to generate train.txt, test.txt, word.txt
│ └── train.txt # incomplete sample, format <label, text1, text2> separated by \t
│ └── test.txt # incomplete sample, format <label, text1, text2> separated by \t
│
├── vocab
│ └── word.txt # incomplete sample, list of words in vocabulary
│
└── main
└── dam.ipynb # step 3. train and evaluate model
└── esim.ipynb # step 3. train and evaluate model
-
Task: SNLI
-
Model: DAM
-
TensorFlow 2
The accuracy of this implementation is higher than that reported by the UCL MR Group (84.6%)
-
-
Model: Match Pyramid
-
TensorFlow 2
The accuracy of this model is 0.3% below ESIM; however, it runs about twice as fast as ESIM
-
-
Model: ESIM
-
TensorFlow 2
The accuracy of this implementation is slightly higher than that reported by the UCL MR Group (87.2%)
-
TensorFlow 1
-
-
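The core step shared by DAM and ESIM is a soft alignment (cross-attention) between the two sentences. Below is a minimal sketch of just that step, assuming already-embedded sequences `a` and `b`; masking, the feed-forward projections, and the rest of either model are omitted.

```python
import tensorflow as tf

def soft_align(a, b):
    """Soft alignment used by DAM/ESIM-style matchers.
    a: (batch, len_a, dim), b: (batch, len_b, dim) -- already embedded/encoded.
    Returns each token of one sentence paired with a weighted summary of the other."""
    scores = tf.matmul(a, b, transpose_b=True)            # (batch, len_a, len_b)
    attn_a = tf.nn.softmax(scores, axis=2)                # for each token of a, attend over b
    attn_b = tf.nn.softmax(scores, axis=1)                # for each token of b, attend over a
    aligned_a = tf.matmul(attn_a, b)                      # (batch, len_a, dim)
    aligned_b = tf.matmul(attn_b, a, transpose_a=True)    # (batch, len_b, dim)
    return aligned_a, aligned_b

# toy check: ESIM then compares e.g. [a, aligned_a, a - aligned_a, a * aligned_a]
a = tf.random.normal([2, 5, 300])
b = tf.random.normal([2, 7, 300])
aligned_a, aligned_b = soft_align(a, b)   # (2, 5, 300), (2, 7, 300)
```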
Model: RE2
-
-
Data: Some Book Titles
-
Model: TF-IDF + LDA
-
PySpark
-
Sklearn + pyLDAvis
-
-
└── finch/tensorflow2/spoken_language_understanding/atis
│
├── data
│ └── glove.840B.300d.txt # pretrained embedding, download and put here
│ └── make_data.ipynb # step 1. run this to generate vocab: word.txt, intent.txt, slot.txt
│ └── atis.train.w-intent.iob # incomplete sample, format <text, slot, intent>
│ └── atis.test.w-intent.iob # incomplete sample, format <text, slot, intent>
│
├── vocab
│ └── word.txt # list of words in vocabulary
│ └── intent.txt # list of intents in vocabulary
│ └── slot.txt # list of slots in vocabulary
│
└── main
└── bigru.ipynb # step 2. train and evaluate model
└── bigru_self_attn.ipynb # step 2. train and evaluate model
└── transformer.ipynb # step 2. train and evaluate model
└── transformer_elu.ipynb # step 2. train and evaluate model
-
Task: ATIS
-
Model: Bi-directional RNN
-
TensorFlow 2
-
97.8% Intent F1, 95.5% Slot F1 on Testing Data
-
-
TensorFlow 1
-
97.2% Intent F1, 95.7% Slot F1 on Testing Data
-
-
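A minimal tf.keras sketch of the joint architecture behind these numbers: a shared Bi-GRU encoder with a per-token head for slot filling and a per-utterance head for intent detection. The vocabulary and label sizes are placeholders (the real ones come from the vocab files), and padding handling is simplified; this is not the notebooks' exact code.

```python
import tensorflow as tf

VOCAB_SIZE, N_INTENTS, N_SLOTS = 10000, 22, 122   # placeholder sizes, not the real vocab files

words = tf.keras.Input(shape=(None,), dtype='int32')            # padded token ids
x = tf.keras.layers.Embedding(VOCAB_SIZE, 300)(words)           # padding handling simplified here
h, fwd, bwd = tf.keras.layers.Bidirectional(
    tf.keras.layers.GRU(128, return_sequences=True, return_state=True))(x)

slot_logits = tf.keras.layers.Dense(N_SLOTS, name='slot')(h)                    # one tag per token
intent_logits = tf.keras.layers.Dense(N_INTENTS, name='intent')(
    tf.keras.layers.Concatenate()([fwd, bwd]))                                  # one label per utterance

model = tf.keras.Model(words, [intent_logits, slot_logits])
intent_out, slot_out = model(tf.constant([[3, 14, 15, 9, 2, 0, 0]]))
print(intent_out.shape, slot_out.shape)                          # (1, 22) and (1, 7, 122)
```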
-
Model: Transformer
-
TensorFlow 2
-
97.5% Intent F1, 94.9% Slot F1 on Testing Data
-
<Notebook> Transformer + ELU activation
97.2% Intent F1, 95.5% Slot F1 on Testing Data
-
<Notebook> Bi-GRU + Transformer
97.7% Intent F1, 95.8% Slot F1 on Testing Data
-
-
-
Model: ELMO Embedding
-
TensorFlow 1
-
<Notebook> ELMO (the first LSTM hidden state) + Bi-GRU
97.6% Intent F1, 96.2% Slot F1 on Testing Data
-
<Notebook> ELMO (weighted sum of 3 layers) + Bi-GRU
97.6% Intent F1, 96.1% Slot F1 on Testing Data
-
-
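The "weighted sum of 3 layers" variant mixes ELMO's layer outputs with softmax-normalized learned scalars plus a global scale. A sketch of the mixing step only, written in TF2 eager style for brevity even though the notebooks above use TensorFlow 1; obtaining the layer tensors from the ELMO module is omitted, and the variable names are assumptions.

```python
import tensorflow as tf
import numpy as np

def elmo_weighted_sum(layer_outputs, scalars, gamma):
    """layer_outputs: list of 3 tensors, each (batch, time, dim), from the ELMO biLM.
    scalars: 3 trainable logits; gamma: trainable global scale."""
    weights = tf.nn.softmax(scalars)                         # normalize the 3 layer weights
    mixed = sum(w * h for w, h in zip(tf.unstack(weights), layer_outputs))
    return gamma * mixed

# toy check with random stand-ins for the three biLM layers
layers = [tf.constant(np.random.randn(2, 5, 32), tf.float32) for _ in range(3)]
out = elmo_weighted_sum(layers, scalars=tf.Variable([0., 0., 0.]), gamma=tf.Variable(1.0))
print(out.shape)   # (2, 5, 32)
```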
-
Task: A Chinese Dialogue Dataset (Xiaohuangji corpus, 小黄鸡语料)
-
Model: RNN Seq2Seq
-
TensorFlow 1
-
GRU + Bahdanau Attention + Luong Attention + Beam Search
-
-
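The Bahdanau (additive) attention named above scores each encoder step against the current decoder state as v·tanh(W1·h_enc + W2·h_dec). A minimal sketch of that scoring layer only; the dimensions are placeholders and the Luong variant and beam search are not shown.

```python
import tensorflow as tf

class BahdanauAttention(tf.keras.layers.Layer):
    def __init__(self, units):
        super().__init__()
        self.W1 = tf.keras.layers.Dense(units)   # projects encoder outputs
        self.W2 = tf.keras.layers.Dense(units)   # projects decoder state
        self.v = tf.keras.layers.Dense(1)

    def call(self, enc_outputs, dec_state):
        # enc_outputs: (batch, src_len, enc_dim), dec_state: (batch, dec_dim)
        score = self.v(tf.nn.tanh(
            self.W1(enc_outputs) + self.W2(tf.expand_dims(dec_state, 1))))  # (batch, src_len, 1)
        weights = tf.nn.softmax(score, axis=1)
        context = tf.reduce_sum(weights * enc_outputs, axis=1)              # (batch, enc_dim)
        return context, weights

attn = BahdanauAttention(64)
ctx, w = attn(tf.random.normal([2, 10, 128]), tf.random.normal([2, 128]))
print(ctx.shape, w.shape)   # (2, 128) (2, 10, 1)
```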
-
└── finch/tensorflow2/semantic_parsing/tree_slu
│
├── data
│ └── glove.840B.300d.txt # pretrained embedding, download and put here
│ └── make_data.ipynb # step 1. run this to generate vocab: source.txt, target.txt
│ └── train.tsv # incomplete sample, format <text, tokenized_text, tree>
│ └── test.tsv # incomplete sample, format <text, tokenized_text, tree>
│
├── vocab
│ └── source.txt # list of words in vocabulary for source (of seq2seq)
│ └── target.txt # list of words in vocabulary for target (of seq2seq)
│
└── main
└── gru_seq2seq.ipynb # step 2. train and evaluate model
└── lstm_seq2seq.ipynb # step 2. train and evaluate model
-
Task: Semantic Parsing for Task Oriented Dialog
-
Model: RNN Seq2Seq
-
TensorFlow 2
-
<Notebook> GRU + Bahdanau Attention -> 72.9% Exact Match Accuracy on Testing Data
-
<Notebook> LSTM + Bahdanau Attention -> 72.2% Exact Match Accuracy on Testing Data
-
-
TensorFlow 1
-
<Notebook 1> <Notebook 2> ELMO + GRU + Bahdanau Attention + Luong Attention + Beam Search
-> 74.5% Exact Match Accuracy on Testing Data
-
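Exact Match Accuracy here means the decoded target sequence must equal the reference tree string token for token; a small sketch of the metric (the example strings are illustrative placeholders, not real samples from the dataset):

```python
def exact_match_accuracy(predictions, references):
    """predictions, references: lists of decoded strings (or token lists).
    A sample counts only if the whole sequence matches exactly."""
    assert len(predictions) == len(references)
    hits = sum(p == r for p, r in zip(predictions, references))
    return hits / len(references)

preds = ['[IN:A x y ]', '[IN:B z ]']   # illustrative placeholders
golds = ['[IN:A x y ]', '[IN:B w ]']
print(exact_match_accuracy(preds, golds))   # 0.5
```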
-
└── finch/tensorflow2/knowledge_graph_completion/wn18
│
├── data
│ └── download_data.ipynb # step 1. run this to download wn18 dataset
│ └── make_data.ipynb # step 2. run this to generate vocabulary: entity.txt, relation.txt
│ └── wn18 # wn18 folder (will be auto created by download_data.ipynb)
│ └── train.txt # incomplete sample, format <entity1, relation, entity2> separated by \t
│ └── valid.txt # incomplete sample, format <entity1, relation, entity2> separated by \t
│ └── test.txt # incomplete sample, format <entity1, relation, entity2> separated by \t
│
├── vocab
│ └── entity.txt # incomplete sample, list of entities in vocabulary
│ └── relation.txt # incomplete sample, list of relations in vocabulary
│
└── main
└── distmult_1-N.ipynb # step 3. train and evaluate model
-
Task: WN18
-
Model: DistMult + 1-N Fast Evaluation
MRR: Mean Reciprocal Rank
-
TensorFlow 2
-
TensorFlow 1
-
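DistMult scores a triple as the sum over dimensions of the element-wise product of the head, relation, and tail embeddings; "1-N fast evaluation" scores one (head, relation) query against every entity in a single matmul, which makes metrics such as MRR cheap to compute. A minimal sketch with random embeddings; the unfiltered ranking and the sizes are assumptions.

```python
import tensorflow as tf

N_ENTITIES, N_RELATIONS, DIM = 40943, 18, 200      # WN18-sized placeholders
entity_emb = tf.random.normal([N_ENTITIES, DIM])
relation_emb = tf.random.normal([N_RELATIONS, DIM])

def distmult_1_to_n(head_ids, rel_ids):
    """Score each (head, relation) query against all entities at once."""
    h = tf.gather(entity_emb, head_ids)             # (batch, dim)
    r = tf.gather(relation_emb, rel_ids)            # (batch, dim)
    return tf.matmul(h * r, entity_emb, transpose_b=True)   # (batch, N_ENTITIES)

def mean_reciprocal_rank(scores, true_tail_ids):
    """MRR: mean of 1 / rank of the correct tail (no filtering in this sketch)."""
    true_scores = tf.gather(scores, true_tail_ids, batch_dims=1)             # (batch,)
    ranks = tf.reduce_sum(tf.cast(scores > true_scores[:, None], tf.float32), axis=1) + 1.0
    return tf.reduce_mean(1.0 / ranks)

scores = distmult_1_to_n(head_ids=[0, 1], rel_ids=[2, 3])
print(mean_reciprocal_rank(scores, true_tail_ids=[5, 7]).numpy())
```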
-
Model: ComplEx + 1-N Fast Evaluation
-
TensorFlow 2
-
-
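ComplEx replaces DistMult's real-valued embeddings with complex-valued ones and scores a triple as the real part of the trilinear product with the conjugated tail, which lets it model asymmetric relations; DistMult is the special case with zero imaginary parts. A sketch of the scoring function only, with the real and imaginary parts written out:

```python
import tensorflow as tf

def complex_score(h_re, h_im, r_re, r_im, t_re, t_im):
    """Re(<h, r, conj(t)>), expanded into real/imaginary parts.
    Each argument: (batch, dim); returns (batch,) scores."""
    return tf.reduce_sum(
        h_re * r_re * t_re
        + h_im * r_re * t_im
        + h_re * r_im * t_im
        - h_im * r_im * t_re, axis=1)

dim = 4
args = [tf.random.normal([2, dim]) for _ in range(6)]
print(complex_score(*args).shape)   # (2,)
```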
└── finch/tensorflow1/question_answering/babi
│
├── data
│ └── make_data.ipynb # step 1. run this to generate vocabulary: word.txt
│ └── qa5_three-arg-relations_train.txt # one complete example of babi dataset
│ └── qa5_three-arg-relations_test.txt # one complete example of babi dataset
│
├── vocab
│ └── word.txt # complete list of words in vocabulary
│
└── main
└── dmn_train.ipynb
└── dmn_serve.ipynb
└── attn_gru_cell.py
-
Task: bAbI
-
Model: Dynamic Memory Network
-
TensorFlow 1
-
└── finch/tensorflow1/recommender/movielens
│
├── data
│ └── make_data.ipynb # run this to generate vocabulary
│
├── vocab
│ └── user_job.txt
│ └── user_id.txt
│ └── user_gender.txt
│ └── user_age.txt
│ └── movie_types.txt
│ └── movie_title.txt
│ └── movie_id.txt
│
└── main
└── dnn_softmax.ipynb
└── dnn_mse.ipynb
-
Task: Movielens 1M
-
Model: Fusion
-
TensorFlow 1
MAE: Mean Absolute Error
-
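The fusion model embeds the user-side and movie-side categorical features, concatenates ("fuses") them, and passes the result through a small DNN; dnn_mse.ipynb treats the rating as regression (hence the MAE above), while dnn_softmax.ipynb treats it as 5-way classification. A regression-flavoured sketch in TF2 Keras style, even though the notebooks use TensorFlow 1; the cardinalities and layer sizes are placeholders, and the movie title/genre towers are omitted.

```python
import tensorflow as tf

# placeholder cardinalities; the real ones come from the vocab files above
N_USERS, N_JOBS, N_GENDERS, N_AGES, N_MOVIES = 6040, 21, 2, 7, 3952

def embed(n, dim, inp):
    # categorical id -> dense vector, flattened to (batch, dim)
    return tf.keras.layers.Flatten()(tf.keras.layers.Embedding(n, dim)(inp))

inputs = {name: tf.keras.Input(shape=(1,), dtype='int32', name=name)
          for name in ['user_id', 'user_job', 'user_gender', 'user_age', 'movie_id']}
user_vec = tf.keras.layers.Concatenate()([
    embed(N_USERS, 32, inputs['user_id']), embed(N_JOBS, 8, inputs['user_job']),
    embed(N_GENDERS, 4, inputs['user_gender']), embed(N_AGES, 8, inputs['user_age'])])
movie_vec = embed(N_MOVIES, 32, inputs['movie_id'])      # movie title/genre towers omitted

fused = tf.keras.layers.Concatenate()([user_vec, movie_vec])   # the "fusion" step
hidden = tf.keras.layers.Dense(128, activation='relu')(fused)
rating = tf.keras.layers.Dense(1)(hidden)                      # regression head

model = tf.keras.Model(inputs, rating)
model.compile(optimizer='adam', loss='mse', metrics=['mae'])   # MAE: Mean Absolute Error
model.summary()
```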