Reliable LLM: A Framework of Mitigating Hallucination regarding Knowledge and Uncertainty



This project introduces the background of hallucination and collects research works on LLM uncertainty, confidence, and calibration, systematically clustering them into directions and methods for reliable AI development.

You are welcome to participate in this project, share interesting papers, and exchange your ideas!

Outline

  • 👻 Hallucination
  • 📓 Knowledge
  • 🤔 Uncertainty
  • 🔭 Future Directions

👻 Hallucination

Definition

The definitions of hallucination vary and depend on the specific task. This project focuses on hallucination issues in knowledge-intensive tasks (closed-book QA, dialogue, RAG, commonsense reasoning, translation, etc.), where hallucinations refer to non-factual, incorrect knowledge in generations that is unfaithful to world knowledge.

Causes

The causes of hallucinations range from unfiltered incorrect statements in pretraining data and the limited input length of model architectures to the maximum likelihood training objective and diverse decoding strategies.

For released LLMs, the architecture, input length, pretraining data, and training strategy are fixed, and tracing incorrect texts within massive pretraining corpora is challenging. This project therefore mainly focuses on detecting hallucinations by probing what LLMs learn during pretraining, and on mitigating hallucinations during fine-tuning and decoding.

Address

Compared with open-ended generation tasks, knowledge-intensive tasks have a specific ground-truth reference: world knowledge. Therefore, we can estimate the knowledge boundary of an LLM to specify what it knows. For hallucination detection, it is crucial to assess the certainty level, or honesty, of an LLM with respect to a piece of factual knowledge (moving it from the grey area to the green area in the figure).

📓 Knowledge

The diagram above is a rough, simplified picture of the knowledge boundary. In reality, however, much knowledge is held (by LLMs as by humans) with some degree of uncertainty rather than being simply known or unknown. Moreover, maximum likelihood prediction during pretraining makes LLMs prone to generating over-confident responses. Even when an LLM knows a fact, making it accurately tell what it knows is also important.

This adds complexity to determining the knowledge boundary, which leads to two challenging questions:

  1. How to accurately perceive (Perception) the knowledge boundary?

    (Example: Given a question, such as "What is the capital of France?", the model is required to provide its confidence level for this question.)

  2. How to accurately express (Expression) knowledge whose boundary is somewhat vague? (The previous work U2Align is a method for enhancing expression. Current interest in this second “expression” stage also lies in “alignment” methods.)

    (Example: If the model's confidence in answering "Paris" to the above question is 40%, should it refuse to answer or provide a response? A minimal sketch of such a threshold-based policy follows this list.)
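To make the expression question concrete, here is a minimal sketch of a threshold-based answer-or-abstain policy. The `answer_or_abstain` helper, the 0.5 threshold, and the confidence values are illustrative assumptions, not a method taken from any of the works listed below.

```python
# Hypothetical answer-or-abstain policy illustrating the "expression" question.
# The threshold and confidence values are illustrative assumptions only.

def answer_or_abstain(answer: str, confidence: float, threshold: float = 0.5) -> str:
    """Return the answer if the model is confident enough, otherwise abstain."""
    if confidence >= threshold:
        # Optionally verbalize the confidence alongside the answer.
        return f"{answer} (confidence: {confidence:.0%})"
    return "I'm not sure; I'd rather not guess."

# Example from the text: the model answers "Paris" with 40% confidence.
print(answer_or_abstain("Paris", confidence=0.40))  # abstains under a 0.5 threshold
print(answer_or_abstain("Paris", confidence=0.90))  # answers and states its confidence
```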

Related Works of LLM Knowledge

Known-Unknown

| Title | Conference/Journal | Notes |
| --- | --- | --- |
| Knowledge of Knowledge: Exploring Known-Unknowns Uncertainty with Large Language Models | Preprint | [Link] |
| Can AI Assistants Know What They Don’t Know? | Preprint | [Link] |

🤔 Uncertainty

Traditional Model Calibration

  • Models trained with maximum likelihood estimation (MLE) are prone to over-confident predictions, so estimating a confidence score or uncertainty is crucial for reliable AI applications.
  • A model is considered well-calibrated if the confidence scores of its predictions (e.g., softmax probabilities) are well aligned with the actual probability of the answers being correct.
  • Expected Calibration Error (ECE) and reliability diagrams are used to measure calibration performance (a minimal ECE sketch follows the figure below).

Figure: uncalibrated (left), over-confident (middle), and well-calibrated (right) models.
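As a rough illustration of how ECE is computed, the sketch below bins predictions by confidence and sums the weighted gap between average confidence and accuracy per bin. The ten equal-width bins and the toy inputs are assumptions for illustration.

```python
# Minimal sketch of Expected Calibration Error (ECE) with equal-width bins.
# Inputs: per-example confidence scores (max softmax probability) and correctness flags.
import numpy as np

def expected_calibration_error(confidences, correct, n_bins: int = 10) -> float:
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            avg_conf = confidences[mask].mean()   # mean confidence in the bin
            accuracy = correct[mask].mean()       # empirical accuracy in the bin
            ece += mask.mean() * abs(avg_conf - accuracy)  # weighted |conf - acc| gap
    return ece

# Toy example: an over-confident model (high confidence, mediocre accuracy).
confs = [0.95, 0.9, 0.92, 0.88, 0.97]
hits  = [1, 0, 1, 0, 1]
print(f"ECE = {expected_calibration_error(confs, hits):.3f}")
```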

Uncertainty Estimation of Generative Models

  • To calibrate generative LLMs, we need to quantify the confidence and uncertainty of generated sentences.
  • Uncertainty: categorized into aleatoric (data) and epistemic (model) uncertainty; frequently measured by the entropy of the prediction, which indicates how dispersed the model's prediction is (see the sketch after this list).
  • Confidence: Generally associated with both the input and the prediction.
  • The terms uncertainty and confidence are often used interchangeably.
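As a hedged sketch of the entropy-style measure mentioned above, the snippet below scores a generated sequence by the (length-normalized) negative log-likelihood of its tokens, a common one-sample proxy for predictive entropy. The token probabilities are made-up values for illustration.

```python
# Minimal sketch: score a generated sequence from its token-level probabilities.
# Higher values indicate a more dispersed (more uncertain) prediction.
import math

def sequence_entropy(token_probs, length_normalize: bool = True) -> float:
    """Negative log-likelihood of the sampled tokens, optionally averaged over length.

    token_probs: probability the model assigned to each generated token.
    """
    nll = -sum(math.log(p) for p in token_probs)
    return nll / len(token_probs) if length_normalize else nll

# Made-up token probabilities for two answers to the same question.
confident_answer = [0.95, 0.9, 0.97]   # peaked distribution -> low score (~0.06)
hesitant_answer  = [0.4, 0.35, 0.5]    # flat distribution   -> high score (~0.89)
print(sequence_entropy(confident_answer))
print(sequence_entropy(hesitant_answer))
```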

Although the knowledge boundary is important for knowledge-intensive tasks, previous works give no specific definition or formalization of it. Current methods for estimating knowledge boundaries draw on confidence/uncertainty estimation methods, including ① logit-based methods using token-level probabilities; ② prompt-based methods that make LLMs express confidence in words; ③ sampling-based methods that measure the consistency of sampled answers; and ④ training-based methods that teach models to express uncertainty. A minimal sketch of the sampling-based approach follows.
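To illustrate the sampling-based family (③), here is a minimal sketch that treats the agreement rate among several sampled answers as a confidence estimate. The `sample_answer` call mentioned in the comment is a hypothetical stand-in for an actual LLM call, not a real API.

```python
# Minimal sketch of sampling-based consistency (method ③ above):
# sample several answers at non-zero temperature and use agreement as confidence.
from collections import Counter

def consistency_confidence(answers):
    """Return the majority answer and the fraction of samples that agree with it."""
    counts = Counter(a.strip().lower() for a in answers)
    majority_answer, majority_count = counts.most_common(1)[0]
    return majority_answer, majority_count / len(answers)

# `answers` would come from a hypothetical sample_answer(query) LLM call, e.g.
# answers = [sample_answer("What is the capital of France?") for _ in range(5)].
answers = ["Paris", "Paris", "Lyon", "Paris", "paris"]
best, confidence = consistency_confidence(answers)
print(best, confidence)  # paris 0.8
```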

Related Works of Uncertainty & Confidence & Calibration

Survey & Investigation

| Title | Conference/Journal | Notes |
| --- | --- | --- |
| A Survey of Confidence Estimation and Calibration in Large Language Models | Preprint | [Link] |
| Uncertainty Quantification with Pre-trained Language Models: A Large-Scale Empirical Analysis | EMNLP 2022 | [Link] |
| Uncertainty Estimation and Quantification for LLMs: A Simple Supervised Approach | Preprint | [Link] |
| Confidence Under the Hood: An Investigation into the Confidence-Probability Alignment in Large Language Models | Preprint | [Link] |
| Large Language Models Must Be Taught to Know What They Don’t Know | Preprint | [Link] |

Uncertainty Quantification

| Title | Conference/Journal | Notes |
| --- | --- | --- |
| Language Models (Mostly) Know What They Know | Preprint | [Link] |
| Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation | ICLR 2023 | [Link] |
| Generating with Confidence: Uncertainty Quantification for Black-box Large Language Models | Preprint | [Link] |
| When Quantization Affects Confidence of Large Language Models? | Preprint | [Link] |
| Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs | ICLR 2024 | [Link] |
| Kernel Language Entropy: Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities | Preprint | [Link] |
| Semantically Diverse Language Generation for Uncertainty Estimation in Language Models | Preprint | [Link] |
| Uncertainty is Fragile: Manipulating Uncertainty in Large Language Models | Preprint | [Link] |

Linguistic Uncertainty Expressions

| Title | Conference/Journal | Notes |
| --- | --- | --- |
| Navigating the Grey Area: Expressions of Overconfidence and Uncertainty in Language Models | EMNLP 2023 | [Link] |
| Teaching Models to Express Their Uncertainty in Words | TMLR 2022 | [Link] |
| Relying on the Unreliable: The Impact of Language Models’ Reluctance to Express Uncertainty | Preprint | [Link] |
| "I'm Not Sure, But...": Examining the Impact of Large Language Models' Uncertainty Expression on User Reliance and Trust | FAccT 2024 | [Link] |
| Can Large Language Models Faithfully Express Their Intrinsic Uncertainty in Words? | Preprint | [Link] |

Confidence Expression Improvements

These works focus on improving the confidence expression of LLMs in a two-stage manner: 1) self-prompting the LLM to generate responses to queries and collecting the samples to construct a dataset with specific features, and 2) fine-tuning the LLM on the collected dataset to strengthen that capability. A minimal sketch of this two-stage pipeline is given below.
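Below is a hedged sketch of stage 1 of such a pipeline under stated assumptions: `generate` is a hypothetical stand-in for sampling an LLM at non-zero temperature, and agreement among samples plus an exact-match check against a gold answer are used as simple features. Stage 2 would fine-tune the model on the resulting records with any standard supervised fine-tuning setup.

```python
# Hedged sketch of stage 1: build a confidence-annotated fine-tuning set by self-prompting.
# `generate(query)` is a hypothetical stand-in for sampling an LLM at temperature > 0.
from collections import Counter

def build_confidence_dataset(qa_pairs, generate, n_samples: int = 5):
    records = []
    for query, gold in qa_pairs:
        samples = [generate(query) for _ in range(n_samples)]
        counts = Counter(s.strip().lower() for s in samples)
        answer, count = counts.most_common(1)[0]
        confidence = count / n_samples            # agreement rate as a confidence label
        correct = answer == gold.strip().lower()  # optional correctness feature
        # Stage 2 would fine-tune on (query, answer, verbalized confidence) records.
        records.append({
            "query": query,
            "answer": answer,
            "confidence": round(confidence, 2),
            "correct": correct,
        })
    return records

# Toy usage with a fake generator; a real pipeline would call the LLM here.
def fake_generate(query):
    return "Paris"

print(build_confidence_dataset([("What is the capital of France?", "Paris")], fake_generate))
```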

| Title | Conference/Journal | Notes |
| --- | --- | --- |
| Enhancing Confidence Expression in Large Language Models Through Learning from Past Experience | Preprint | [Link] |
| Improving the Reliability of Large Language Models by Leveraging Uncertainty-Aware In-Context Learning | Preprint | [Link] |
| Uncertainty in Language Models: Assessment through Rank-Calibration | Preprint | [Link] |
| SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales | Preprint | [Link] |
| Linguistic Calibration of Language Models | Preprint | [Link] |
| R-Tuning: Instructing Large Language Models to Say ‘I Don’t Know’ | Preprint | [Link] |

Hallucination Detection by Uncertainty

| Title | Conference/Journal | Notes |
| --- | --- | --- |
| On Hallucination and Predictive Uncertainty in Conditional Language Generation | EACL 2021 | [Link] |
| Learning Confidence for Transformer-based Neural Machine Translation | ACL 2022 | [Link] |
| Towards Reliable Misinformation Mitigation: Generalization, Uncertainty, and GPT-4 | EMNLP 2023 | [Note] |
| SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models | EMNLP 2023 | [Note] |
| Detecting Hallucinations in Large Language Models using Semantic Entropy | Nature | [Link] |
| LLM Internal States Reveal Hallucination Risk Faced With a Query | Preprint | [Link] |

Factuality Improvements by Confidence

| Title | Conference/Journal | Notes |
| --- | --- | --- |
| Inference-Time Intervention: Eliciting Truthful Answers from a Language Model | NeurIPS 2023 | [Link] |
| When to Trust LLMs: Aligning Confidence with Response Quality | Preprint | [Link] |
| Uncertainty Aware Learning for Language Model Alignment | ACL 2024 | [Link] |

Generative Model Calibration

| Title | Conference/Journal | Notes |
| --- | --- | --- |
| Reducing Conversational Agents’ Overconfidence Through Linguistic Calibration | TACL 2022 | [Link] |
| Preserving Pre-trained Features Helps Calibrate Fine-tuned Language Models | ICLR 2023 | [Link] |
| Calibrating the Confidence of Large Language Models by Eliciting Fidelity | Preprint | [Link] |
| Few-Shot Recalibration of Language Models | Preprint | [Link] |
| How Can We Know When Language Models Know? On the Calibration of Language Models for Question Answering | TACL 2022 | [Link] |
| Knowing More About Questions Can Help: Improving Calibration in Question Answering | ACL 2021 Findings | [Link] |
| Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback | EMNLP 2023 | [Link] |
| Re-Examining Calibration: The Case of Question Answering | TACL 2021 | [Link] |
| Calibrating Large Language Models Using Their Generations Only | Preprint | [Link] |
| Calibrating Large Language Models with Sample Consistency | Preprint | [Link] |
| Linguistic Calibration of Language Models | Preprint | [Link] |

🔭 Future Directions

  1. More advanced methods to assist LLM hallucination detection and human decision-making. (A new paradigm)
  2. Confidence estimation for long-form generations such as code, novels, etc. (Benchmarks)
  3. Learning to explain and clarify confidence estimates and calibration. (Natural language)
  4. Calibration on human variation. (Misalignment between LM measures and human disagreement)
  5. Confidence estimation and calibration for multi-modal LLMs.
