onestardao · 2026-02-20T13:23:04Z

Follow-up to #20702 and #20721.

This PR keeps the existing RAG Failure Mode Checklist and extends it with a small set of system-level failure families that often show up in production, without changing any of the current recommendations.

Summary of changes

- Keep sections 1–9 as-is (single-query failures: retrieval, chunking, embeddings, query understanding, synthesis).
- Add section 10 “Embedding Metric Mismatch (Cosine Score ≠ True Meaning)” to cover cases where the distance metric or normalization does not match how meaning is distributed in the data.
- Add section 11 “Session and Cache Memory Breaks” for cross-session instability caused by stateless indices, cache keys, or environment changes.
- Add section 12 “Observability Gaps ("Black-Box Debugging")” to highlight that many issues cannot be fixed before basic traces and logs are in place.
- Add section 13 “Index Lifecycle and Deployment Ordering” to capture failures caused by empty or half-built indices, wrong snapshot routing, or deployment ordering bugs.
- Slightly update the introduction and the Quick Diagnostic Flowchart so they point to the new sections when issues appear only in production or after deploys.
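
To make the section 10 failure family concrete, here is a minimal sketch of how the choice of metric can change which document ranks first when vectors are not normalized. The vectors and document names are made up for illustration and are not taken from the checklist itself.

```python
import math


def dot(a, b):
    """Plain inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity: inner product divided by both vector lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Hypothetical, unnormalized embeddings (illustrative values only).
query = [1.0, 0.0]
docs = {
    "aligned_short": [1.0, 0.0],  # same direction as the query, small norm
    "off_axis_long": [3.0, 4.0],  # different direction, large norm
}

# Rank the same documents under two different metrics.
by_dot = max(docs, key=lambda name: dot(query, docs[name]))
by_cosine = max(docs, key=lambda name: cosine(query, docs[name]))

print("top by dot product:", by_dot)
print("top by cosine:     ", by_cosine)
```

The two metrics pick different top documents here because the inner product rewards vector length while cosine ignores it. If an index is built with one metric and queried or evaluated with the other, rankings can shift in this way.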

All new content is written in a project-native way (no external dependencies or naming schemes) and is based on recurring failure patterns seen in real-world RAG deployments.

Happy to adjust wording, scope, or numbering if you would prefer a slimmer version or a separate “advanced” doc instead of extending this page.

Description

This is a documentation-only change that expands the existing RAG Failure Mode Checklist with several additional failure families that commonly appear in production systems (embedding metric issues, cross-session instability, observability gaps, and index lifecycle / deployment ordering problems).
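
As one illustration of the cross-session instability family, the sketch below builds a cache key that includes the embedding model name and an index version, so cached results are not reused after either one changes. The function name `cache_key` and its parameters are hypothetical and are not part of LlamaIndex or of the checklist text.

```python
import hashlib
import json


def cache_key(query: str, embed_model: str, index_version: str) -> str:
    """Hypothetical helper: derive a cache key from everything that affects the answer."""
    payload = json.dumps(
        {"query": query, "embed_model": embed_model, "index_version": index_version},
        sort_keys=True,  # stable field order so equal inputs give equal keys
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Same query, but the index was rebuilt: the key changes, so stale entries are skipped.
key_before = cache_key("what is RAG?", "model-a", "index-v1")
key_after = cache_key("what is RAG?", "model-a", "index-v2")
print(key_before != key_after)
```

A key built from the query text alone would keep returning results computed against the old index or the old embedding model, which is one way answers can differ between sessions or after a deploy.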

Related issues: #20702, #20721 (docs follow-up; does not close new issues).

New Package?

Did I fill in the tool.llamahub section in the pyproject.toml and provide a detailed README.md for my new integration or package?

Yes
No

Version Bump?

Did I bump the version in the pyproject.toml file of the package I am updating? (Except for the llama-index-core package)

Yes
No

Type of Change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

How Has This Been Tested?

This is a documentation-only change; no code paths were modified, so no additional tests were added.

I added new unit tests to cover this change
I believe this change is already covered by existing unit tests

Suggested Checklist

I have performed a self-review of my own changes
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have added Google Colab support for the newly added notebooks
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
I ran uv run make format; uv run make lint to appease the lint gods

onestardao · 2026-02-21T01:04:20Z

I pushed a follow-up commit to fix the trailing whitespace that the linter reported.
From my side the checklist doc should now be lint-clean.
Happy to tweak any wording if you’d like.

AstraBert · 2026-02-23T10:35:31Z

Linting is failing because of prettier. To make sure everything is linted correctly, please run the following from the root folder of the llama_index repo:

uv pip install pre-commit
pre-commit install
pre-commit run -a

AstraBert · 2026-02-23T12:01:19Z

Please do not merge main into this branch until the current CI is finished, so that we can merge this PR without problems. Otherwise I have to re-approve the workflows for every commit pushed.

onestardao · 2026-02-23T12:18:59Z

Thanks a lot for the review and merge.
And got it on the CI process — will keep that in mind next time.

dosubot bot added the size:L label (This PR changes 100-499 lines, ignoring generated files.) Feb 20, 2026

logan-markewich approved these changes Feb 20, 2026

dosubot bot added the lgtm label (This PR has been approved by a maintainer) Feb 20, 2026

logan-markewich enabled auto-merge (squash) February 20, 2026 23:15

onestardao added 2 commits February 21, 2026 08:44

Merge branch 'main' into main

8a811f1

fix: clean whitespace in RAG failure checklist

86f6238

auto-merge was automatically disabled February 21, 2026 00:55
Head branch was pushed to by a user without write access

chore: format RAG failure checklist with prettier

a19a832

AstraBert approved these changes Feb 23, 2026

AstraBert enabled auto-merge (squash) February 23, 2026 11:57

Merge branch 'main' into main

641cb3f

AstraBert merged commit a281640 into run-llama:main Feb 23, 2026
12 checks passed

onestardao mentioned this pull request Feb 23, 2026

docs suggestion, add a vector search failure mode checklist for RAG builders weaviate/docs#361

Open

docs: extend RAG Failure Mode Checklist with advanced failures #20760

AstraBert merged 5 commits into run-llama:main from onestardao:main

3 participants