Commit 5655c70

Merge pull request tensorflow#300 from panyx0718/models-textsum
Add text summarization model to tensorflow/models.
2 parents: c711dc7 + 56a05f6

File tree

11 files changed: +11657 -0 lines

textsum/BUILD

Lines changed: 64 additions & 0 deletions
```
package(default_visibility = [":internal"])

licenses(["notice"])  # Apache 2.0

exports_files(["LICENSE"])

package_group(
    name = "internal",
    packages = [
        "//textsum/...",
    ],
)

py_library(
    name = "seq2seq_attention_model",
    srcs = ["seq2seq_attention_model.py"],
    deps = [
        ":seq2seq_lib",
    ],
)

py_library(
    name = "seq2seq_lib",
    srcs = ["seq2seq_lib.py"],
)

py_binary(
    name = "seq2seq_attention",
    srcs = ["seq2seq_attention.py"],
    deps = [
        ":batch_reader",
        ":data",
        ":seq2seq_attention_decode",
        ":seq2seq_attention_model",
    ],
)

py_library(
    name = "batch_reader",
    srcs = ["batch_reader.py"],
    deps = [
        ":data",
        ":seq2seq_attention_model",
    ],
)

py_library(
    name = "beam_search",
    srcs = ["beam_search.py"],
)

py_library(
    name = "seq2seq_attention_decode",
    srcs = ["seq2seq_attention_decode.py"],
    deps = [
        ":beam_search",
        ":data",
    ],
)

py_library(
    name = "data",
    srcs = ["data.py"],
)
```

textsum/README.md

Lines changed: 162 additions & 0 deletions
Sequence-to-Sequence with Attention Model for Text Summarization.

Authors:

Xin Pan ([email protected], github:panyx0718), Peter Liu ([email protected])

<b>Introduction</b>

The core model is the traditional sequence-to-sequence model with attention,
customized (mostly in its inputs/outputs) for the text summarization task. The
model was trained on the Gigaword dataset and achieved state-of-the-art
results (as of June 2016).

The results described below are based on a model trained in a multi-GPU,
multi-machine setting. It has been simplified to run on a single machine
for the open-source release.
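At each decoding step, the attention mechanism computes a weighted read over
the encoder's outputs. As a framework-free toy illustration of that read (using
dot-product scoring as a stand-in; the model's actual scoring function lives in
seq2seq_attention_model.py), consider:

```python
import numpy as np

def attention_read(decoder_state, encoder_states):
    """Toy dot-product attention read: one context vector per decode step.

    decoder_state: shape (hidden,), the query for the current step.
    encoder_states: shape (timesteps, hidden), the encoder outputs.
    """
    scores = encoder_states @ decoder_state   # (timesteps,) alignment scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                  # softmax over input positions
    return weights @ encoder_states           # (hidden,) context vector

# A single decode step attending over a 120-step encoding (cf. the
# article-length setting in the configuration below).
context = attention_read(np.random.randn(256), np.random.randn(120, 256))
```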
<b>DataSet</b>

We used the Gigaword dataset described in
https://arxiv.org/pdf/1602.06023.pdf

We cannot provide the dataset due to its license. See ExampleGen in data.py
for the data format. data/data contains a toy example, and data/vocab shows
the vocabulary format. In <b>How To Run</b> below, you can train on the toy
data and vocab provided in the data/ directory by pointing the data directory
flags at them.
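If you convert your own corpus, the sketch below shows one way the records
might be written. It assumes (check ExampleGen in data.py before relying on
this) that each file is a sequence of 8-byte length-prefixed serialized
tf.Example protos whose article and abstract features match the
--article_key/--abstract_key flags used in <b>How To Run</b>; the vocab file
appears to be plain text with one `token count` pair per line.

```python
import struct

from tensorflow.core.example import example_pb2

def write_record(out, article, abstract):
    """Append one length-prefixed tf.Example record to an open binary file."""
    ex = example_pb2.Example()
    ex.features.feature['article'].bytes_list.value.append(article.encode())
    ex.features.feature['abstract'].bytes_list.value.append(abstract.encode())
    data = ex.SerializeToString()
    out.write(struct.pack('q', len(data)))  # assumed 8-byte length prefix
    out.write(data)

with open('data/training-0', 'wb') as f:
    write_record(f, 'article text here .', 'headline here .')
```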
<b>Experiment Result</b>

8000 examples sampled from the test set are used to generate summaries, and
ROUGE scores are computed for them. Here are the best ROUGE scores on the
Gigaword dataset (a toy sketch of the ROUGE-1 computation follows the numbers):

ROUGE-1 Average_R: 0.38272 (95%-conf.int. 0.37774 - 0.38755)

ROUGE-1 Average_P: 0.50154 (95%-conf.int. 0.49509 - 0.50780)

ROUGE-1 Average_F: 0.42568 (95%-conf.int. 0.42016 - 0.43099)

ROUGE-2 Average_R: 0.20576 (95%-conf.int. 0.20060 - 0.21112)

ROUGE-2 Average_P: 0.27565 (95%-conf.int. 0.26851 - 0.28257)

ROUGE-2 Average_F: 0.23126 (95%-conf.int. 0.22539 - 0.23708)
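The numbers above come from the standard ROUGE toolkit, which also produces
the bootstrap confidence intervals. As a minimal sketch of the ROUGE-1
definition itself (clipped unigram overlap; no stemming, no resampling):

```python
from collections import Counter

def rouge_1(candidate, reference):
    """Toy ROUGE-1: recall, precision, F1 from clipped unigram counts."""
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    overlap = sum(min(c, ref[w]) for w, c in cand.items())
    r = overlap / max(sum(ref.values()), 1)
    p = overlap / max(sum(cand.values()), 1)
    f = 2 * p * r / (p + r) if p + r else 0.0
    return r, p, f

# e.g. the first machine/human pair from the Examples section below.
print(rouge_1('novell chief executive named to head internet company',
              'novell ceo named google chairman'))
```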
<b>Configuration:</b>

The following is the configuration of the best model trained on Gigaword (a
sketch of how these settings might look in code follows the list):

batch_size: 64

bidirectional encoding layers: 4

article length: first 2 sentences, at most 120 words in total.

summary length: at most 30 words in total.

word embedding size: 128

LSTM hidden units: 256

sampled softmax size: 4096

vocabulary size: most frequent 200k words from the dataset's articles and summaries.
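Collected in code, this configuration is just a bundle of hyperparameters. A
minimal sketch is below; the field names are illustrative stand-ins for the
HParams bundle that seq2seq_attention.py builds, so check the source for the
exact names:

```python
import collections

# Illustrative only: mirrors the list above, not necessarily the exact
# fields of seq2seq_attention_model.HParams.
HParams = collections.namedtuple(
    'HParams',
    'batch_size enc_layers enc_timesteps dec_timesteps '
    'emb_dim num_hidden num_softmax_samples')

best_gigaword = HParams(
    batch_size=64,             # batch_size: 64
    enc_layers=4,              # bidirectional encoding layers: 4
    enc_timesteps=120,         # article: at most 120 words
    dec_timesteps=30,          # summary: at most 30 words
    emb_dim=128,               # word embedding size: 128
    num_hidden=256,            # LSTM hidden units: 256
    num_softmax_samples=4096)  # sampled softmax size: 4096
```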
<b>How To Run</b>

Prerequisites:

Install TensorFlow and Bazel.

```shell
# cd to your workspace
# clone the code to your workspace and create an empty WORKSPACE file.
# move the data to your workspace. If you don't have the full dataset yet,
# copy the toy data from the code's data/ directory and rename the files.
ls -R
.:
data textsum WORKSPACE

./data:
vocab test-0 training-0 training-1 validation-0 ...(omitted)

./textsum:
batch_reader.py beam_search.py BUILD README.md seq2seq_attention_model.py data
data.py seq2seq_attention_decode.py seq2seq_attention.py seq2seq_lib.py

./textsum/data:
data vocab

bazel build -c opt --config=cuda textsum/...

# Run the training.
bazel-bin/textsum/seq2seq_attention \
    --mode=train \
    --article_key=article \
    --abstract_key=abstract \
    --data_path=data/training-* \
    --vocab_path=data/vocab \
    --log_root=textsum/log_root \
    --train_dir=textsum/log_root/train

# Run the eval. Try to avoid running it on the same machine as training.
bazel-bin/textsum/seq2seq_attention \
    --mode=eval \
    --article_key=article \
    --abstract_key=abstract \
    --data_path=data/validation-* \
    --vocab_path=data/vocab \
    --log_root=textsum/log_root \
    --eval_dir=textsum/log_root/eval

# Run the decode. Run it when the model is mostly converged.
bazel-bin/textsum/seq2seq_attention \
    --mode=decode \
    --article_key=article \
    --abstract_key=abstract \
    --data_path=data/test-* \
    --vocab_path=data/vocab \
    --log_root=textsum/log_root \
    --decode_dir=textsum/log_root/decode \
    --beam_size=8
```
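Decoding uses beam search (see beam_search.py); --beam_size sets how many
partial summaries are kept alive at each step, trading quality for speed. The
repo's implementation runs batched inside the TensorFlow graph, but as a
generic toy sketch of what the flag controls:

```python
import heapq

def beam_search(step_fn, start_token, end_token, beam_size, max_len):
    """Generic beam search over a next-token scorer.

    step_fn(prefix) -> list of (log_prob, token) continuations.
    Returns the highest-scoring sequence found.
    """
    beams = [(0.0, [start_token])]  # (total log prob, token sequence)
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            for lp, tok in step_fn(seq):
                cand = (score + lp, seq + [tok])
                (finished if tok == end_token else candidates).append(cand)
        if not candidates:
            break
        # Keep only the beam_size best partial hypotheses.
        beams = heapq.nlargest(beam_size, candidates, key=lambda c: c[0])
    return max(finished + beams, key=lambda c: c[0])[1]
```

With beam_size=1 this degenerates to greedy decoding; the --beam_size=8 above
keeps eight hypotheses per step.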
<b>Examples:</b>

The following are some text summarization examples, including experiments
using datasets other than Gigaword.

article: novell inc. chief executive officer eric schmidt has been named chairman of the internet search-engine company google .

human: novell ceo named google chairman

machine: novell chief executive named to head internet company

======================================

article: gulf newspapers voiced skepticism thursday over whether newly re - elected us president bill clinton could help revive the troubled middle east peace process but saw a glimmer of hope .

human: gulf skeptical about whether clinton will revive peace process

machine: gulf press skeptical over clinton 's prospects for peace process

======================================

article: the european court of justice ( ecj ) recently ruled in lock v british gas trading ltd that eu law requires a worker 's statutory holiday pay to take commission payments into account - it should not be based solely on basic salary . the case is not over yet , but its outcome could potentially be costly for employers with workers who are entitled to commission . mr lock , an energy salesman for british gas , was paid a basic salary and sales commission on a monthly basis . his sales commission made up around 60 % of his remuneration package . when he took two weeks ' annual leave in december 2012 , he was paid his basic salary and also received commission from previous sales that fell due during that period . lock obviously did not generate new sales while he was on holiday , which meant that in the following period he suffered a reduced income through lack of commission . he brought an employment tribunal claim asserting that this amounted to a breach of the working time regulations 1998 .....deleted rest for readability...

abstract: will british gas ecj ruling fuel holiday pay hike ?

decode: eu law requires worker 's statutory holiday pay

======================================

article: the junior all whites have been eliminated from the fifa u - 20 world cup in colombia with results on the final day of pool play confirming their exit . sitting on two points , new zealand needed results in one of the final two groups to go their way to join the last 16 as one of the four best third place teams . but while spain helped the kiwis ' cause with a 5 - 1 thrashing of australia , a 3 - 0 win for ecuador over costa rica saw the south americans climb to second in group c with costa rica 's three points also good enough to progress in third place . that left the junior all whites hopes hanging on the group d encounter between croatia and honduras finishing in a draw . a stalemate - and a place in the knockout stages for new zealand - appeared on the cards until midfielder marvin ceballos netted an 81st minute winner that sent guatemala through to the second round and left the junior all whites packing their bags . new zealand finishes the 24 - nation tournament in 17th place , having claimed their first ever points at this level in just their second appearance at the finals .

abstract: junior all whites exit world cup

decoded: junior all whites eliminated from u- 20 world cup
