Probe Task

  1. BERT Rediscovers the Classical NLP Pipeline [ACL 2019] Ian Tenney, Dipanjan Das, Ellie Pavlick.
    • Scalar mixing weights: which layers matter most for a given task?
    • Cumulative scoring: how many layers does a given task need?
  2. Language Models as Knowledge Bases? [EMNLP 2019] Fabio Petroni, Tim Rocktäschel, Patrick Lewis, Anton Bakhtin, Yuxiang Wu, Alexander H. Miller, Sebastian Riedel.
    • BERT contains relational knowledge, even without fine-tuning.
    • But the experiments cannot fully verify this, because Google-RE and T-REx are both derived from Wikipedia, which is part of BERT's training set.
    • The results may instead reflect co-occurrence patterns.
    • The higher BERT's output probability for a prediction, the more likely the prediction is to be correct.
    • The Pearson correlation coefficient is used to examine the co-occurrence explanation.
    • ELMo behaves similarly to BERT, even when its training set contains no Wikipedia.
  3. Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference [ACL 2019] R. Thomas McCoy, Ellie Pavlick, Tal Linzen.
    • BERT performs poorly on some anti-heuristic examples, such as those targeting:
      • lexical overlap
      • subsequence
      • constituent
    • Proposes HANS, a dataset that contains many anti-heuristic examples.
  4. Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data [ACL 2020] Emily M. Bender, Alexander Koller
    • They argue that pre-trained models cannot understand the meaning of language.
    • Understanding language involves two parts: meaning and linguistic form.
    • In my opinion, memory is one part of understanding (it needs a large capacity, like our daily experience), and co-occurrence only ensures grammaticality.
    • For domain-specific text, co-occurrence is very important, especially for entity phrases.
    • On the other hand, memory of the specific topic may be more important.
      • Can it work?
  5. A Primer in BERTology: What we know about how BERT works [-] Anna Rogers, Olga Kovaleva, Anna Rumshisky.
    • What knowledge does BERT have?
      • BERT representations are hierarchical rather than linear.
      • BERT embeddings encode information about parts of speech, syntactic chunks and roles.
      • Syntactic structure is not directly encoded in self-attention weights, but they can be transformed to reflect it.
      • BERT takes subject-predicate agreement into account when performing the cloze task.
      • BERT is better able to detect the presence of NPIs (e.g. “ever”) and the words that allow their use (e.g. “whether”) than scope violations.
      • BERT does not “understand” negation and is insensitive to malformed input.
      • BERT’s encoding of syntactic structure does not indicate that it actually relies on that knowledge.
    • Semantic knowledge
      • BERT has some knowledge of semantic roles.
      • BERT encodes information about entity types, relations, semantic roles, and proto-roles.
      • BERT struggles with representations of numbers
      • for some relation types, vanilla BERT is competitive with methods relying on knowledge bases
      • BERT cannot reason based on its world knowledge.
    • Localizing linguistic knowledge
      • Most self-attention heads do not directly encode any nontrivial linguistic information.
      • Some BERT heads seem to specialize in certain types of syntactic relations.
      • no single head has the complete syntactic tree information.
      • Attention weights are weak indicators of subject-verb agreement and reflexive anaphora.
      • even when attention heads specialize in tracking semantic relations, they do not necessarily contribute to BERT’s performance on relevant tasks.
      • lower layers have the most linear word order information.
      • Syntactic information is most prominent in the middle layers of BERT.
      • conflicting evidence about syntactic chunks.
      • The final layers of BERT are the most task-specific.
      • semantics is spread across the entire model
    • Training BERT
      • alternative training objectives
    • Future
      • Benchmarks that require verbal reasoning.
      • Developing methods to “teach” reasoning.
      • Learning what happens at inference time.
  6. Intermediate-Task Transfer Learning with Pretrained Models for Natural Language Understanding: When and Why Does It Work? [ACL 2020] Yada Pruksachatkun, Jason Phang, Haokun Liu, Phu Mon Htut, Xiaoyi Zhang, Richard Yuanzhe Pang, Clara Vania, Katharina Kann, Samuel R. Bowman.
    • Tests which intermediate tasks help which downstream tasks.
    • Runs 10 intermediate × 11 downstream task combinations, covering both fine-tuning and probing.
    • Computes the correlation matrix between the resulting task performances.
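
The correlation step in the last entry can be sketched as follows. This is a minimal illustration, not the paper's code: it assumes a score table with one row per intermediate task and one column per evaluated task (downstream or probing), and it uses the Pearson coefficient for simplicity, although a rank correlation may be a better fit. The function name `task_correlation_matrix` and the numbers in `toy_scores` are hypothetical placeholders, not results from any paper.

```python
import numpy as np


def task_correlation_matrix(scores):
    """Pearson correlation between the columns of a score table.

    scores: array of shape (n_intermediate_tasks, n_evaluated_tasks).
    Returns an (n_evaluated_tasks, n_evaluated_tasks) matrix whose entry
    [i, j] says how similarly tasks i and j respond to the choice of
    intermediate task.
    """
    scores = np.asarray(scores, dtype=float)
    if scores.ndim != 2 or scores.shape[0] < 2:
        raise ValueError("need a 2-D table with at least two intermediate tasks")
    # rowvar=False treats each column (evaluated task) as one variable.
    return np.corrcoef(scores, rowvar=False)


# Hypothetical placeholder scores: 3 intermediate tasks x 3 evaluated tasks.
toy_scores = np.array([
    [0.70, 0.60, 0.30],
    [0.80, 0.70, 0.20],
    [0.90, 0.80, 0.10],
])

corr = task_correlation_matrix(toy_scores)
print(np.round(corr, 2))
```

In this toy table the first two columns rise together and the third falls as they rise, so the matrix shows a strong positive correlation between the first pair and a strong negative correlation with the third. With real scores, high off-diagonal values would suggest that a probing task and a downstream task benefit from the same intermediate tasks.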