Add LLM-based fallback for series matching via OpenAI #8307

sydrvxd · 2025-12-30T14:30:09Z

Introduces a new LLM (GPT) fallback mechanism for series matching when traditional parsing fails. Adds ILlmSeriesMatchingService with OpenAI, caching, and rate-limiting implementations. Integrates LLM matching into ParsingService, adds new config options, and exposes a REST API for configuration and testing. Includes unit and integration tests, and updates DI and solution files. This improves matching for ambiguous, foreign, or scene releases in a safe and configurable way.

Description

This PR adds an optional LLM-based fallback for series identification when traditional parsing fails. It uses OpenAI-compatible APIs to match release titles against the series in the user's library, with the aim of reducing manual import interventions.

Key Features:

- Fallback-only: LLM matching only triggers after all traditional matching methods (scene mapping, TVDB/IMDb/TVRage ID, title-based, alternate titles) have failed
- Confidence threshold: Configurable threshold (default 70%) - matches below threshold require manual confirmation
- Cost control: Built-in rate limiting (default 60 calls/hour) and response caching (24h) to minimize API costs
- Encoding resilience: Handles mis-encoded umlauts (e.g., Ã¼ → ü), foreign scripts (Japanese, Chinese, Korean, Cyrillic, Arabic), and various encoding issues common in release titles
- Anime support: Recognizes alternate/localized titles (e.g., "Shingeki no Kyojin" → "Attack on Titan")
- Provider flexibility: Supports OpenAI, Azure OpenAI, and local LLM endpoints (Ollama, LM Studio)
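
To make the fallback-only and confidence-threshold behaviour concrete, here is a minimal Python sketch. It is not code from this PR: `match_series`, `LlmMatch`, and `MatchResult` are hypothetical names, and the matchers are plain callables standing in for the real parsing steps.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class LlmMatch:
    series_title: str
    confidence: float  # assumed to be in the range 0.0 to 1.0


@dataclass
class MatchResult:
    series_title: Optional[str]
    source: str  # "traditional", "llm", or "none"
    needs_manual_confirmation: bool = False


def match_series(
    release_title: str,
    traditional_matchers: Sequence[Callable[[str], Optional[str]]],
    llm_matcher: Optional[Callable[[str], Optional[LlmMatch]]] = None,
    confidence_threshold: float = 0.7,
) -> MatchResult:
    # Traditional matchers run first, in order; the first hit wins.
    for matcher in traditional_matchers:
        title = matcher(release_title)
        if title is not None:
            return MatchResult(title, "traditional")

    # The LLM is consulted only after every traditional matcher has failed,
    # and only if one is configured (the feature is disabled by default).
    if llm_matcher is None:
        return MatchResult(None, "none")

    guess = llm_matcher(release_title)
    if guess is None:
        return MatchResult(None, "none")

    # Matches below the threshold are kept but flagged for manual confirmation.
    return MatchResult(
        guess.series_title,
        "llm",
        needs_manual_confirmation=guess.confidence < confidence_threshold,
    )
```

With the default threshold of 0.7, an LLM answer at 0.9 confidence is accepted as is, while one at 0.5 comes back with `needs_manual_confirmation=True`.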

Architecture:

- Decorator pattern: Rate Limiting → Caching → OpenAI API
- New `SeriesMatchType.Llm` for tracking match source
- Fully optional - disabled by default, requires API key configuration
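
The decorator chain above can be sketched in a few lines of Python. This is an illustration under assumptions, not the PR's C# implementation: `with_cache`, `with_rate_limit`, and `fake_api` are hypothetical names, and `fake_api` stands in for the real OpenAI call.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

Matcher = Callable[[str], Optional[str]]


def with_cache(inner: Matcher, ttl_seconds: float,
               clock: Callable[[], float] = time.monotonic) -> Matcher:
    cache: Dict[str, Tuple[float, Optional[str]]] = {}

    def cached(title: str) -> Optional[str]:
        now = clock()
        hit = cache.get(title)
        if hit is not None and now - hit[0] < ttl_seconds:
            return hit[1]  # fresh entry: skip the inner call
        result = inner(title)
        cache[title] = (now, result)
        return result

    return cached


def with_rate_limit(inner: Matcher, max_calls: int,
                    window_seconds: float = 3600.0,
                    clock: Callable[[], float] = time.monotonic) -> Matcher:
    calls: List[float] = []

    def limited(title: str) -> Optional[str]:
        now = clock()
        # Forget calls that have fallen out of the sliding window.
        calls[:] = [t for t in calls if now - t < window_seconds]
        if len(calls) >= max_calls:
            return None  # over budget: behave like "no match"
        calls.append(now)
        return inner(title)

    return limited


api_calls: List[str] = []


def fake_api(title: str) -> Optional[str]:
    # Hypothetical stand-in for the remote LLM request.
    api_calls.append(title)
    return "Attack on Titan" if "Shingeki" in title else None


# Same order as the description: Rate Limiting -> Caching -> API.
service = with_rate_limit(with_cache(fake_api, ttl_seconds=24 * 3600),
                          max_calls=60)
```

One consequence of this ordering, at least in this sketch: the rate limiter sits outermost, so cache hits also count against the hourly budget. Swapping the two wrappers would charge only real API calls.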

Database Migration

YES - Adds the following configuration properties:

- `LlmMatchingEnabled` (bool, default: false)
- `OpenAiApiKey` (string, encrypted)
- `OpenAiApiEndpoint` (string, default: OpenAI API)
- `OpenAiModel` (string, default: gpt-4o-mini)
- `LlmConfidenceThreshold` (double, default: 0.7)
- `LlmMaxCallsPerHour` (int, default: 60)
- `LlmCacheEnabled` (bool, default: true)
- `LlmCacheDurationHours` (int, default: 24)
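
As a quick reference, the defaults listed above can be mirrored in a small Python sketch. `LlmMatchingConfig` and `validation_errors` are hypothetical; the snake_case fields map one-to-one onto the properties above, and the validation rules are assumptions rather than behaviour confirmed by the PR. The default endpoint URL is not stated here, so it is left as `None`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LlmMatchingConfig:
    llm_matching_enabled: bool = False
    open_ai_api_key: str = ""  # stored encrypted per the PR; plain text in this sketch
    open_ai_api_endpoint: Optional[str] = None  # None = provider default (URL not given)
    open_ai_model: str = "gpt-4o-mini"
    llm_confidence_threshold: float = 0.7
    llm_max_calls_per_hour: int = 60
    llm_cache_enabled: bool = True
    llm_cache_duration_hours: int = 24

    def validation_errors(self) -> List[str]:
        """Return human-readable problems; an empty list means the config is usable."""
        errors: List[str] = []
        if not 0.0 <= self.llm_confidence_threshold <= 1.0:
            errors.append("confidence threshold must be between 0.0 and 1.0")
        if self.llm_max_calls_per_hour <= 0:
            errors.append("max calls per hour must be positive")
        if self.llm_cache_enabled and self.llm_cache_duration_hours <= 0:
            errors.append("cache duration must be positive when caching is enabled")
        # Assumption: enabling the feature without a key is a configuration error.
        if self.llm_matching_enabled and not self.open_ai_api_key:
            errors.append("an API key is required when LLM matching is enabled")
        return errors
```

A freshly constructed `LlmMatchingConfig()` is valid and leaves the feature off, matching the "disabled by default" behaviour described above.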

augustuen · 2025-12-30T14:54:22Z

Please don't. There's no point in wasting natural resources on trying to parse crappy releases.

> Encoding resilience: Handles miscoded umlauts (e.g., Ã¼ → ü), foreign scripts (Japanese, Chinese, Korean, Cyrillic, Arabic), and various encoding issues common in release titles

This should rather be fixed in the existing code than just dumping the issue on OpenAI.
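
To illustrate what a local fix could look like: the Ã¼ → ü case is UTF-8 text that was decoded as Latin-1/Windows-1252, and it can be reversed without any external service. A minimal Python sketch, where `repair_mojibake` is a hypothetical helper and not existing code in the project:

```python
def repair_mojibake(text: str) -> str:
    """Undo UTF-8 text that was wrongly decoded as Windows-1252 or Latin-1.

    If the round trip fails, the text was not this kind of mojibake and is
    returned unchanged.
    """
    for codec in ("cp1252", "latin-1"):
        try:
            # Recover the original bytes, then decode them as UTF-8.
            return text.encode(codec).decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            continue
    return text


print(repair_mojibake("MÃ¼nchen"))    # München
print(repair_mojibake("München"))     # München (already correct, left alone)
print(repair_mojibake("進撃の巨人"))    # 進撃の巨人 (not Latin-1 encodable, left alone)
```

This only covers the single-layer UTF-8-as-Latin-1 case; double-encoded or otherwise damaged titles would need more than this.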

> Anime support: Recognizes alternate/localized titles (e.g., "Shingeki no Kyojin" → "Attack on Titan")

XEM/Aliases already do this.

> Confidence threshold: Configurable threshold (default 70%) - matches below threshold require manual confirmation

Does the confidence matter when the LLM is liable to hallucinate?

And even if this was a decent feature, I can say even without being a maintainer that this AI slop won't get merged. If you really want this to get merged, you'd at least have to put some of your own work into it.

markus101 · 2025-12-30T16:10:56Z

Don't waste resources on a PR for a feature no one has requested; talk to us first. For such an ambitious idea, it's probably best to talk about it on Discord before filing an issue, especially if AI is going to be involved.
Don't dump your slop-coded "work" into a PR and expect us to review it.

The AI hallucination is real with this PR.

github-actions bot added the parsing label Dec 30, 2025

markus101 closed this Dec 30, 2025
