Probe

Probe Task

BERT Rediscovers the Classical NLP Pipeline [ACL 2019] Ian Tenney, Dipanjan Das, Ellie Pavlick.
- Scalar mixing weights: which layers matter most for each task?
- Cumulative scoring: how many layers does each task need?
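A minimal NumPy sketch of the two layer-wise metrics. It assumes per-layer token representations have already been extracted into an array of shape (L+1, T, d). The scalar mix follows the ELMo-style form h = gamma * sum_l softmax(a)_l * h^(l). The "center of gravity" sum_l l * s_l summarizes where the mixing weights concentrate. The cumulative version trains a probe on layers 0..l and weights each layer by its score gain Delta_l = score(l) - score(l-1). The function names are hypothetical, and probe training itself is omitted.

```python
import numpy as np

def scalar_mix(layer_reprs, logits, gamma=1.0):
    """Softmax-normalized weighted sum over layers.

    layer_reprs: array of shape (L+1, T, d), one slice per layer.
    logits:      array of shape (L+1,), learned jointly with the probe.
    """
    logits = np.asarray(logits, dtype=float)
    weights = np.exp(logits - logits.max())   # numerically stable softmax
    weights /= weights.sum()
    mixed = gamma * np.tensordot(weights, layer_reprs, axes=1)  # -> (T, d)
    return mixed, weights

def center_of_gravity(weights):
    """Expected layer index under the mixing weights: sum_l l * s_l."""
    return float(np.dot(np.arange(len(weights)), weights))

def expected_layer_cumulative(scores):
    """scores[l] = probe score when using layers 0..l.

    Each layer l >= 1 is weighted by its gain Delta_l = scores[l] - scores[l-1].
    Assumes the total gain is non-zero.
    """
    deltas = np.diff(np.asarray(scores, dtype=float))
    layers = np.arange(1, len(scores))
    return float(np.dot(layers, deltas) / deltas.sum())
```

A lower expected layer means the task is mostly resolved early in the network; a higher one means later layers still add information.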
Language Models as Knowledge Bases? [EMNLP 2019] Fabio Petroni, Tim Rocktäschel, Patrick Lewis, Anton Bakhtin, Yuxiang Wu, Alexander H. Miller, Sebastian Riedel.
- BERT contains relational knowledge, even without fine-tuning.
- However, the experiments cannot fully verify this, because Google-RE and T-REx are both derived from Wikipedia, which is part of BERT's training data.
- The knowledge may instead come from co-occurrence patterns.
- The higher BERT's output probability for a prediction, the more likely the prediction is correct.
- The Pearson correlation coefficient is used to analyze the effect of co-occurrence.
- ELMo behaves similarly to BERT, even though its training set contains no Wikipedia.
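A small sketch of the two measurements these notes rely on: precision at 1 (P@1) over cloze queries, and the Pearson correlation used to relate a per-query statistic (for instance, how often subject and object co-occur in the training corpus) to the model's correctness or confidence. Inputs are assumed to be plain Python lists; obtaining BERT's top prediction for each query is out of scope, and the function names are illustrative.

```python
import numpy as np

def precision_at_1(top_predictions, gold_objects):
    """Fraction of cloze queries whose top-ranked token equals the gold object."""
    hits = [pred == gold for pred, gold in zip(top_predictions, gold_objects)]
    return sum(hits) / len(hits)

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()          # center both variables
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))
```

In practice one would pass, for example, a log co-occurrence count per query as `x` and 0/1 correctness (or the predicted probability) as `y`.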
Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference [ACL 2019] R. Thomas McCoy, Ellie Pavlick, Tal Linzen.
- BERT performs poorly on anti-heuristic examples, i.e. examples where these heuristics give the wrong answer:
  - lexical overlap
  - subsequence
  - constituent
- Proposes a dataset containing many such anti-heuristic examples, called HANS.
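To make the heuristics concrete, here is a sketch of the first two as simple string checks, assuming whitespace tokenization and lowercase matching. A model relying on them labels any premise/hypothesis pair that passes a check as entailment. The constituent heuristic needs a parse tree and is omitted. The example sentences are in the style of HANS, chosen so that a heuristic fires but the hypothesis is not entailed.

```python
def tokenize(sentence):
    # Lowercase, drop a trailing period, split on whitespace.
    return sentence.lower().rstrip(".").split()

def lexical_overlap(premise, hypothesis):
    # True if every hypothesis word also occurs in the premise.
    premise_words = set(tokenize(premise))
    return all(w in premise_words for w in tokenize(hypothesis))

def subsequence(premise, hypothesis):
    # True if the hypothesis is a contiguous token span of the premise.
    p, h = tokenize(premise), tokenize(hypothesis)
    n = len(h)
    return any(p[i:i + n] == h for i in range(len(p) - n + 1))

def heuristic_label(premise, hypothesis):
    # What a model that only follows these heuristics would predict.
    if lexical_overlap(premise, hypothesis) or subsequence(premise, hypothesis):
        return "entailment"
    return "non-entailment"

# Both pairs trigger a heuristic, but neither hypothesis is entailed.
print(heuristic_label("The doctor was paid by the actor.", "The doctor paid the actor."))
print(heuristic_label("The doctor near the actor danced.", "The actor danced."))
```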
Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data [ACL 2020] Emily M. Bender, Alexander Koller.
- They argue that pre-trained models, trained only on linguistic form, cannot learn the meaning of language.
- Understanding language involves two parts: meaning + linguistic form.
- In my opinion, memory is one part of understanding (it needs a large capacity, like our daily experiences), while co-occurrence only ensures grammaticality.
- For domain-specific text, co-occurrence is very important, especially for entity phrases.
- On the other hand, memory of a specific topic may be more important.
  - Can this work?
A Primer in BERTology: What we know about how BERT works [-] Anna Rogers, Olga Kovaleva, Anna Rumshisky.
- What knowledge does BERT have?
  - BERT representations are hierarchical rather than linear.
  - BERT embeddings encode information about parts of speech, syntactic chunks and roles.
  - Syntactic structure is not directly encoded in self-attention weights, but they can be transformed to reflect it.
  - BERT takes subject-predicate agreement into account when performing the cloze task.
  - BERT is better able to detect the presence of NPIs (e.g. "ever") and the words that allow their use (e.g. "whether") than scope violations.
  - BERT does not “understand” negation and is insensitive to malformed input.
  - BERT’s encoding of syntactic structure does not indicate that it actually relies on that knowledge.
- Semantic knowledge
  - BERT has some knowledge of semantic roles.
  - BERT encodes information about entity types, relations, semantic roles, and proto-roles.
  - BERT struggles with representations of numbers.
  - For some relation types, vanilla BERT is competitive with methods relying on knowledge bases.
  - BERT cannot reason based on its world knowledge.
- Localizing linguistic knowledge
  - Most self-attention heads do not directly encode any nontrivial linguistic information.
  - Some BERT heads seem to specialize in certain types of syntactic relations.
  - No single head has the complete syntactic tree information.
  - Attention weights are weak indicators of subject-verb agreement and reflexive anaphora.
  - Even when attention heads specialize in tracking semantic relations, they do not necessarily contribute to BERT's performance on relevant tasks.
  - Lower layers have the most linear word order information.
  - Syntactic information is most prominent in the middle layers of BERT.
  - There is conflicting evidence about syntactic chunks.
  - The final layers of BERT are the most task-specific.
  - Semantics is spread across the entire model.
- Training BERT
  - alternative training objectives
- Future
  - Benchmarks that require verbal reasoning.
  - Developing methods to “teach” reasoning.
  - Learning what happens at inference time.
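The layer-wise findings above come from probing classifiers. Below is a generic sketch, not a method taken from the survey: train one linear (logistic-regression) probe per layer on frozen features and compare held-out accuracy. The feature arrays are assumed to be per-layer hidden states extracted beforehand; here they are replaced by synthetic data, and the probe is plain NumPy gradient descent so the example stays self-contained. As the survey stresses, high probe accuracy shows that a property is linearly decodable from a layer, not that BERT relies on it.

```python
import numpy as np

def train_linear_probe(X, y, lr=0.5, steps=500):
    """Binary logistic-regression probe fitted with full-batch gradient descent."""
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])      # append a bias column
    w = np.zeros(d + 1)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))   # predicted P(y = 1)
        w -= lr * Xb.T @ (p - y) / n          # gradient step on mean log-loss
    return w

def probe_accuracy(w, X, y):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return float(np.mean((Xb @ w > 0) == y))

def probe_each_layer(train_feats, y_train, test_feats, y_test):
    """train_feats / test_feats: one (n, d) feature array per layer."""
    return [probe_accuracy(train_linear_probe(Xtr, y_train), Xte, y_test)
            for Xtr, Xte in zip(train_feats, test_feats)]

# Synthetic stand-in for per-layer features: layer 0 is pure noise,
# layer 1 carries the label in its first dimension.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
noise = rng.normal(size=(200, 8))
signal = noise.copy()
signal[:, 0] += 4.0 * y - 2.0
layers = [noise, signal]
acc = probe_each_layer([f[:100] for f in layers], y[:100],
                       [f[100:] for f in layers], y[100:])
print(acc)
```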
Intermediate-Task Transfer Learning with Pretrained Models for Natural Language Understanding: When and Why Does It Work? [ACL 2020] Yada Pruksachatkun, Jason Phang, Haokun Liu, Phu Mon Htut, Xiaoyi Zhang, Richard Yuanzhe Pang, Clara Vania, Katharina Kann, Samuel R. Bowman.
- Tests which intermediate tasks help which downstream tasks.
- Runs 10 intermediate × 11 downstream task combinations, covering both fine-tuning and probing tasks.
- Then computes the correlation matrix between the results.
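A sketch of the correlation step, assuming results are arranged as matrices with one row per intermediate task: `target_scores` holds downstream (target-task) results and `probe_scores` holds probing-task results. Each entry of the returned matrix is the Pearson correlation, across intermediate tasks, between one target task and one probing task. The function name and data layout are illustrative assumptions, not the paper's code.

```python
import numpy as np

def cross_correlation(target_scores, probe_scores):
    """Rows = intermediate tasks. Returns an (n_target, n_probe) matrix of
    Pearson correlations computed across the intermediate tasks."""
    t = np.asarray(target_scores, dtype=float)
    p = np.asarray(probe_scores, dtype=float)
    full = np.corrcoef(np.hstack([t, p]), rowvar=False)  # columns are variables
    k = t.shape[1]
    return full[:k, k:]  # keep only the target-vs-probe block

# Toy example: 3 intermediate tasks, 1 target task, 2 probing tasks.
print(cross_correlation([[1], [2], [3]], [[2, 3], [4, 1], [6, 2]]))
```

Using `rowvar=False` treats each column (task) as a variable and each row (intermediate task) as an observation.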
