Reliable LLM: A Framework of Mitigating Hallucination regarding Knowledge and Uncertainty



This project introduces the background of hallucination and collects research works on LLM uncertainty, confidence, and calibration, systematically clustering them into directions and methods for reliable AI development.

You are welcome to participate in this project, share interesting papers, and exchange your ideas!

Outline

  • 👻 Hallucination
  • 📓 Knowledge
  • 🤔 Uncertainty
  • 🔭 Future Directions

👻 Hallucination

Definition

The definitions of hallucination vary and depend on the specific task. This project focuses on hallucination issues in knowledge-intensive tasks (closed-book QA, dialogue, RAG, commonsense reasoning, translation, etc.), where hallucinations refer to non-factual, incorrect knowledge in generations that is unfaithful to world knowledge.

Causes

The causes of hallucinations range from unfiltered incorrect statements in pretraining data and the limited input length of model architectures to the maximum likelihood training objective and diverse decoding strategies.

For released LLMs, the architecture, input length, pretraining data, and training strategy are fixed, and tracing incorrect texts within massive pretraining corpora is challenging. This project therefore mainly focuses on detecting hallucinations by probing what LLMs learn during pretraining, and on mitigating hallucinations during fine-tuning and decoding.

Address

Compared with open-ended generation tasks, knowledge-intensive tasks have a specific ground-truth reference: world knowledge. Therefore, we can estimate the knowledge boundary of an LLM to specify what it knows. For hallucination detection, it is crucial to assess the certainty level, or honesty, of an LLM with respect to a piece of factual knowledge (moving it from the grey area to the green area in the figure).

📓 Knowledge

The diagram above is a rough, simplified picture of the knowledge boundary. In reality, however, much knowledge is held (by LLMs as by humans) with some degree of uncertainty rather than being simply known or unknown. Moreover, maximum likelihood prediction during pretraining makes LLMs prone to generating over-confident responses. Even when an LLM knows a fact, making it accurately tell what it knows is also important.

This adds complexity to determining the knowledge boundary, which leads to two challenging questions:

  1. How to accurately perceive (Perception) the knowledge boundary?

    (Example: Given a question, such as "What is the capital of France?", the model is required to provide its confidence level for this question.)

  2. How to accurately express (Expression) knowledge whose boundary is somewhat vague? (The previous work U2Align is a method for enhancing expression. Current interest in this second “expression” stage also lies in “alignment” methods.)

    (Example: If the model's confidence in answering "Paris" to the above question is 40%, should it refuse to answer or provide a response? A minimal sketch of such a threshold-based policy follows this list.)
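To make the expression question concrete, here is a minimal sketch of a threshold-based answer-or-abstain policy. The `answer_or_abstain` helper, the 0.5 threshold, and the confidence values are illustrative assumptions, not a method taken from any of the works listed below.

```python
# Hypothetical answer-or-abstain policy illustrating the "expression" question.
# The threshold and confidence values are illustrative assumptions only.

def answer_or_abstain(answer: str, confidence: float, threshold: float = 0.5) -> str:
    """Return the answer if the model is confident enough, otherwise abstain."""
    if confidence >= threshold:
        # Optionally verbalize the confidence alongside the answer.
        return f"{answer} (confidence: {confidence:.0%})"
    return "I'm not sure; I'd rather not guess."

# Example from the text: the model answers "Paris" with 40% confidence.
print(answer_or_abstain("Paris", confidence=0.40))  # abstains under a 0.5 threshold
print(answer_or_abstain("Paris", confidence=0.90))  # answers and states its confidence
```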

Related Works of LLM Knowledge

Known-Unknown

| Title | Conference/Journal | Notes |
| --- | --- | --- |
| Knowledge of Knowledge: Exploring Known-Unknowns Uncertainty with Large Language Models | Preprint | [Link] |
| Can AI Assistants Know What They Don’t Know? | Preprint | [Link] |

🤔 Uncertainty

Traditional Model Calibration

  • Models trained with maximum likelihood estimation (MLE) are prone to over-confident predictions, so estimating a confidence score or uncertainty is crucial for reliable AI applications.
  • A model is considered well-calibrated if the confidence scores of its predictions (e.g., softmax probabilities) are well aligned with the actual probability of the answers being correct.
  • Expected Calibration Error (ECE) and reliability diagrams are used to measure calibration performance (a minimal ECE sketch follows the figure below).

Figure: uncalibrated (left), over-confident (middle), and well-calibrated (right) models.
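As a rough illustration of how ECE is computed, the sketch below bins predictions by confidence and sums the weighted gap between average confidence and accuracy per bin. The ten equal-width bins and the toy inputs are assumptions for illustration.

```python
# Minimal sketch of Expected Calibration Error (ECE) with equal-width bins.
# Inputs: per-example confidence scores (max softmax probability) and correctness flags.
import numpy as np

def expected_calibration_error(confidences, correct, n_bins: int = 10) -> float:
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            avg_conf = confidences[mask].mean()   # mean confidence in the bin
            accuracy = correct[mask].mean()       # empirical accuracy in the bin
            ece += mask.mean() * abs(avg_conf - accuracy)  # weighted |conf - acc| gap
    return ece

# Toy example: an over-confident model (high confidence, mediocre accuracy).
confs = [0.95, 0.9, 0.92, 0.88, 0.97]
hits  = [1, 0, 1, 0, 1]
print(f"ECE = {expected_calibration_error(confs, hits):.3f}")
```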

Uncertainty Estimation of Generative Models

  • To calibrate generative LLMs, we need to quantify the confidence and uncertainty of generated sentences.
  • Uncertainty: categorized into aleatoric (data) and epistemic (model) uncertainty; frequently measured by the entropy of the prediction, which indicates how dispersed the model's prediction is (see the sketch after this list).
  • Confidence: Generally associated with both the input and the prediction.
  • The terms uncertainty and confidence are often used interchangeably.
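As a hedged sketch of the entropy-style measure mentioned above, the snippet below scores a generated sequence by the (length-normalized) negative log-likelihood of its tokens, a common one-sample proxy for predictive entropy. The token probabilities are made-up values for illustration.

```python
# Minimal sketch: score a generated sequence from its token-level probabilities.
# Higher values indicate a more dispersed (more uncertain) prediction.
import math

def sequence_entropy(token_probs, length_normalize: bool = True) -> float:
    """Negative log-likelihood of the sampled tokens, optionally averaged over length.

    token_probs: probability the model assigned to each generated token.
    """
    nll = -sum(math.log(p) for p in token_probs)
    return nll / len(token_probs) if length_normalize else nll

# Made-up token probabilities for two answers to the same question.
confident_answer = [0.95, 0.9, 0.97]   # peaked distribution -> low score (~0.06)
hesitant_answer  = [0.4, 0.35, 0.5]    # flat distribution   -> high score (~0.89)
print(sequence_entropy(confident_answer))
print(sequence_entropy(hesitant_answer))
```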

Although the knowledge boundary is important for knowledge-intensive tasks, previous works give no specific definition or formalization of it. Current methods for estimating knowledge boundaries draw on confidence/uncertainty estimation methods, including ① logit-based methods using token-level probabilities; ② prompt-based methods that make LLMs express confidence in words; ③ sampling-based methods that measure the consistency of sampled answers; and ④ training-based methods that teach models to express uncertainty. A minimal sketch of the sampling-based approach follows.
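To illustrate the sampling-based family (③), here is a minimal sketch that treats the agreement rate among several sampled answers as a confidence estimate. The `sample_answer` call mentioned in the comment is a hypothetical stand-in for an actual LLM call, not a real API.

```python
# Minimal sketch of sampling-based consistency (method ③ above):
# sample several answers at non-zero temperature and use agreement as confidence.
from collections import Counter

def consistency_confidence(answers):
    """Return the majority answer and the fraction of samples that agree with it."""
    counts = Counter(a.strip().lower() for a in answers)
    majority_answer, majority_count = counts.most_common(1)[0]
    return majority_answer, majority_count / len(answers)

# `answers` would come from a hypothetical sample_answer(query) LLM call, e.g.
# answers = [sample_answer("What is the capital of France?") for _ in range(5)].
answers = ["Paris", "Paris", "Lyon", "Paris", "paris"]
best, confidence = consistency_confidence(answers)
print(best, confidence)  # paris 0.8
```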

Related Works of Uncertainty & Confidence & Calibration

Survey & Investigation

| Title | Conference/Journal | Notes |
| --- | --- | --- |
| A Survey of Confidence Estimation and Calibration in Large Language Models | Preprint | [Link] |
| Uncertainty Quantification with Pre-trained Language Models: A Large-Scale Empirical Analysis | EMNLP 2022 | [Link] |
| Uncertainty Estimation and Quantification for LLMs: A Simple Supervised Approach | Preprint | [Link] |
| Confidence Under the Hood: An Investigation into the Confidence-Probability Alignment in Large Language Models | Preprint | [Link] |
| Large Language Models Must Be Taught to Know What They Don’t Know | Preprint | [Link] |

Uncertainty Quantification

| Title | Conference/Journal | Notes |
| --- | --- | --- |
| Language Models (Mostly) Know What They Know | Preprint | [Link] |
| Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation | ICLR 2023 | [Link] |
| Generating with Confidence: Uncertainty Quantification for Black-box Large Language Models | Preprint | [Link] |
| When Quantization Affects Confidence of Large Language Models? | Preprint | [Link] |
| Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs | ICLR 2024 | [Link] |
| Kernel Language Entropy: Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities | Preprint | [Link] |
| Semantically Diverse Language Generation for Uncertainty Estimation in Language Models | Preprint | [Link] |
| Uncertainty is Fragile: Manipulating Uncertainty in Large Language Models | Preprint | [Link] |

Linguistic Uncertainty Expressions

| Title | Conference/Journal | Notes |
| --- | --- | --- |
| Navigating the Grey Area: Expressions of Overconfidence and Uncertainty in Language Models | EMNLP 2023 | [Link] |
| Teaching Models to Express Their Uncertainty in Words | TMLR 2022 | [Link] |
| Relying on the Unreliable: The Impact of Language Models’ Reluctance to Express Uncertainty | Preprint | [Link] |
| "I'm Not Sure, But...": Examining the Impact of Large Language Models' Uncertainty Expression on User Reliance and Trust | FAccT 2024 | [Link] |
| Can Large Language Models Faithfully Express Their Intrinsic Uncertainty in Words? | Preprint | [Link] |

Confidence Expression Improvements

These works focus on improving the confidence expression of LLMs in a two-stage manner: 1) self-prompting the LLM to generate responses to queries and collecting the samples to construct a dataset with specific features, and 2) fine-tuning the LLM on the collected dataset to strengthen that capability. A minimal sketch of this two-stage pipeline is given below.
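Below is a hedged sketch of stage 1 of such a pipeline under stated assumptions: `generate` is a hypothetical stand-in for sampling an LLM at non-zero temperature, and agreement among samples plus an exact-match check against a gold answer are used as simple features. Stage 2 would fine-tune the model on the resulting records with any standard supervised fine-tuning setup.

```python
# Hedged sketch of stage 1: build a confidence-annotated fine-tuning set by self-prompting.
# `generate(query)` is a hypothetical stand-in for sampling an LLM at temperature > 0.
from collections import Counter

def build_confidence_dataset(qa_pairs, generate, n_samples: int = 5):
    records = []
    for query, gold in qa_pairs:
        samples = [generate(query) for _ in range(n_samples)]
        counts = Counter(s.strip().lower() for s in samples)
        answer, count = counts.most_common(1)[0]
        confidence = count / n_samples            # agreement rate as a confidence label
        correct = answer == gold.strip().lower()  # optional correctness feature
        # Stage 2 would fine-tune on (query, answer, verbalized confidence) records.
        records.append({
            "query": query,
            "answer": answer,
            "confidence": round(confidence, 2),
            "correct": correct,
        })
    return records

# Toy usage with a fake generator; a real pipeline would call the LLM here.
def fake_generate(query):
    return "Paris"

print(build_confidence_dataset([("What is the capital of France?", "Paris")], fake_generate))
```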

| Title | Conference/Journal | Notes |
| --- | --- | --- |
| Enhancing Confidence Expression in Large Language Models Through Learning from Past Experience | Preprint | [Link] |
| Improving the Reliability of Large Language Models by Leveraging Uncertainty-Aware In-Context Learning | Preprint | [Link] |
| Uncertainty in Language Models: Assessment through Rank-Calibration | Preprint | [Link] |
| SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales | Preprint | [Link] |
| Linguistic Calibration of Language Models | Preprint | [Link] |
| R-Tuning: Instructing Large Language Models to Say ‘I Don’t Know’ | Preprint | [Link] |

Hallucination Detection by Uncertainty

| Title | Conference/Journal | Notes |
| --- | --- | --- |
| On Hallucination and Predictive Uncertainty in Conditional Language Generation | EACL 2021 | [Link] |
| Learning Confidence for Transformer-based Neural Machine Translation | ACL 2022 | [Link] |
| Towards Reliable Misinformation Mitigation: Generalization, Uncertainty, and GPT-4 | EMNLP 2023 | [Note] |
| SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models | EMNLP 2023 | [Note] |
| Detecting Hallucinations in Large Language Models using Semantic Entropy | Nature | [Link] |
| LLM Internal States Reveal Hallucination Risk Faced With a Query | Preprint | [Link] |

Factuality Improvements by Confidence

| Title | Conference/Journal | Notes |
| --- | --- | --- |
| Inference-Time Intervention: Eliciting Truthful Answers from a Language Model | NeurIPS 2023 | [Link] |
| When to Trust LLMs: Aligning Confidence with Response Quality | Preprint | [Link] |
| Uncertainty Aware Learning for Language Model Alignment | ACL 2024 | [Link] |

Generative Model Calibration

| Title | Conference/Journal | Notes |
| --- | --- | --- |
| Reducing Conversational Agents’ Overconfidence Through Linguistic Calibration | TACL 2022 | [Link] |
| Preserving Pre-trained Features Helps Calibrate Fine-tuned Language Models | ICLR 2023 | [Link] |
| Calibrating the Confidence of Large Language Models by Eliciting Fidelity | Preprint | [Link] |
| Few-Shot Recalibration of Language Models | Preprint | [Link] |
| How Can We Know When Language Models Know? On the Calibration of Language Models for Question Answering | TACL 2022 | [Link] |
| Knowing More About Questions Can Help: Improving Calibration in Question Answering | ACL 2021 Findings | [Link] |
| Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback | EMNLP 2023 | [Link] |
| Re-Examining Calibration: The Case of Question Answering | TACL 2021 | [Link] |
| Calibrating Large Language Models Using Their Generations Only | Preprint | [Link] |
| Calibrating Large Language Models with Sample Consistency | Preprint | [Link] |
| Linguistic Calibration of Language Models | Preprint | [Link] |

🔭 Future Directions

  1. More advanced methods to assist LLM hallucination detection and human decision-making. (A new paradigm)
  2. Confidence estimation for long-form generations such as code, novels, etc. (Benchmarks)
  3. Learning to explain and clarify confidence estimates and calibration. (Natural language)
  4. Calibration on human variation. (Misalignment between LM measures and human disagreement)
  5. Confidence estimation and calibration for multi-modal LLMs.
