feat: add IGNORE_CACHES_FOR_LIVE_SCRAPING option #360

deejay189393 · 2025-10-08T14:29:15Z

Summary

Adds a new environment variable IGNORE_CACHES_FOR_LIVE_SCRAPING that allows forcing fresh scraping from configured scrapers during live requests, bypassing cached metadata and torrents in the database.

Changes

- Added IGNORE_CACHES_FOR_LIVE_SCRAPING boolean setting to AppSettings model in comet/utils/models.py (defaults to False)
- Updated .env-sample with documentation for the new parameter
- Modified fast path condition in comet/api/stream.py to check the flag before using cached data
- Added logging to indicate when caches are being ignored during live scraping
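
As a rough illustration of how a boolean environment flag with a False default can be read, here is a minimal standard-library sketch. It is not the project's code: the real field lives on AppSettings in comet/utils/models.py, and how that model parses the environment is not shown in this PR. `SketchSettings`, `env_bool`, `load_settings`, and the set of accepted truthy strings are all assumptions made for this example.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumption: which strings count as "true" is a choice made for this sketch.
_TRUTHY = {"1", "true", "yes", "on"}


def env_bool(name: str, default: bool = False,
             environ: Optional[Mapping[str, str]] = None) -> bool:
    """Read a boolean flag from an environment mapping, falling back to default."""
    source = os.environ if environ is None else environ
    raw = source.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in _TRUTHY


@dataclass(frozen=True)
class SketchSettings:
    """Hypothetical stand-in for the AppSettings model named in the PR."""
    IGNORE_CACHES_FOR_LIVE_SCRAPING: bool = False


def load_settings(environ: Optional[Mapping[str, str]] = None) -> SketchSettings:
    """Build settings from the given mapping (defaults to os.environ)."""
    return SketchSettings(
        IGNORE_CACHES_FOR_LIVE_SCRAPING=env_bool(
            "IGNORE_CACHES_FOR_LIVE_SCRAPING", False, environ
        )
    )
```

Passing an explicit mapping, for example `load_settings({"IGNORE_CACHES_FOR_LIVE_SCRAPING": "True"})`, keeps the sketch easy to exercise without touching the real process environment.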

Use Case

When IGNORE_CACHES_FOR_LIVE_SCRAPING=True, the system will:

- Skip the fast path that uses cached metadata and torrents
- Always acquire a scrape lock and fetch fresh data from scrapers
- Log when caches are being bypassed

This is useful when you want to force-fetch the latest torrents without relying on potentially stale cached data.
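
The behavior described above boils down to one extra term in the fast-path condition. The sketch below shows that decision in isolation. It assumes the fast path needs both cached metadata and cached torrents, as the PR text states. The function names and the use of the standard `logging` module are hypothetical: the project's own logger call differs.

```python
import logging

logger = logging.getLogger("sketch.stream")


def use_fast_path(ignore_caches: bool, has_metadata: bool, has_torrents: bool) -> bool:
    """The fast path is allowed only when the flag is off and the cache is complete."""
    return (not ignore_caches) and has_metadata and has_torrents


def choose_path(ignore_caches: bool, has_metadata: bool,
                has_torrents: bool, media_id: str) -> str:
    """Return "cache" for the fast path, otherwise "scrape"."""
    if use_fast_path(ignore_caches, has_metadata, has_torrents):
        return "cache"
    if ignore_caches:
        # Logged only for a forced bypass, not for an ordinary cache miss.
        logger.info("Ignoring caches for live scraping: %s", media_id)
    return "scrape"
```

With the flag off and a complete cache, `choose_path` returns `"cache"`. Every other combination falls through to `"scrape"`, which is why the default of False preserves the existing behavior.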

Test Plan

- Set IGNORE_CACHES_FOR_LIVE_SCRAPING=False (default) and verify normal caching behavior works
- Set IGNORE_CACHES_FOR_LIVE_SCRAPING=True and verify fresh scraping occurs even when cached data exists
- Check logs to confirm appropriate messages are shown in both cases
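
The first two steps of this plan could be automated along the following lines. This is only a sketch with an in-memory dict as the cache and a counting fake in place of real scrapers. `get_torrents`, `CountingScraper`, and `run_test_plan` are hypothetical names and do not exist in the repository.

```python
from typing import Callable, Dict, List, Tuple


def get_torrents(media_id: str, cache: Dict[str, List[str]],
                 scrape: Callable[[str], List[str]],
                 ignore_caches: bool) -> List[str]:
    """Serve from cache unless the flag forces a scrape or the entry is missing."""
    if not ignore_caches and media_id in cache:
        return cache[media_id]
    fresh = scrape(media_id)
    cache[media_id] = fresh  # assumption: fresh results replace the cached entry
    return fresh


class CountingScraper:
    """Fake scraper that records how many times it was called."""

    def __init__(self, results: List[str]) -> None:
        self.results = results
        self.calls = 0

    def __call__(self, media_id: str) -> List[str]:
        self.calls += 1
        return list(self.results)


def run_test_plan() -> Tuple[List[str], int, List[str], int]:
    """Run the default case, then the forced case, against a pre-filled cache."""
    cache = {"tt0000001": ["stale"]}
    scraper = CountingScraper(["fresh"])
    default_result = get_torrents("tt0000001", cache, scraper, ignore_caches=False)
    calls_after_default = scraper.calls
    forced_result = get_torrents("tt0000001", cache, scraper, ignore_caches=True)
    return default_result, calls_after_default, forced_result, scraper.calls
```

The returned tuple lets a test assert that the scraper is untouched in the default case and called exactly once when the flag is set, even though cached data exists.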

Summary by CodeRabbit

New Features
- Added an optional environment setting to force live scraping to bypass caches and fetch fresh data. Disabled by default to preserve current behavior.
Documentation
- Updated sample environment configuration to include the new setting and its default value.
Chores
- Improved logging to clearly indicate when cache bypass is active during live scraping.

Add new environment variable to force live scraping and bypass cached metadata and torrents. When enabled, the system will always fetch fresh data from configured scrapers instead of using cached results.

Changes:
- Add IGNORE_CACHES_FOR_LIVE_SCRAPING setting to AppSettings model
- Update .env-sample with documentation for new flag
- Modify fast path condition in stream endpoint to check the flag
- Add logging when cache is being ignored during live scraping

coderabbitai · 2025-10-08T14:30:00Z

Walkthrough

Adds a new IGNORE_CACHES_FOR_LIVE_SCRAPING setting (default False), surfaces it in .env-sample, and updates live scraping logic to bypass cached metadata/torrents when the flag is true. The stream API now conditionally avoids the cache fast path and proceeds to lock and scrape, with added logs reflecting the behavior.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Environment configuration `./.env-sample` | Documents new env var IGNORE_CACHES_FOR_LIVE_SCRAPING with default False. |
| Application settings `comet/utils/models.py` | Adds AppSettings field: IGNORE_CACHES_FOR_LIVE_SCRAPING: Optional[bool] = False (public attribute). |
| Live scraping control flow `comet/api/stream.py` | Fast-path cache read now requires flag=false plus cached metadata and torrents present; when flag=true, logs that caches are ignored and proceeds to acquire scrape lock and fetch via scrapers. Additional explicit logging in else branch. |

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant C as Client
  participant A as Stream API
  participant DB as Cache/DB
  participant L as Dist. Lock
  participant S as Scrapers

  C->>A: Request LIVE stream
  A->>A: Read setting IGNORE_CACHES_FOR_LIVE_SCRAPING
  A->>DB: Lookup cached metadata + torrents
  alt Flag=false AND cache complete
    A-->>C: Return cached response (fast path)
  else Flag=true OR cache incomplete
    A->>A: Log "ignoring caches for live scraping" (if flag=true)
    A->>L: Acquire scrape lock
    alt Lock acquired
      A->>S: Fetch metadata/torrents
      S-->>A: Results
      A->>DB: Update caches
      A-->>C: Return fresh results
    else Lock not acquired
      A-->>C: Wait/fallback to eventual cached results
    end
  end
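
The flow in the diagram above can be sketched in a few lines of async Python. This is an illustration under stated assumptions, not the code in comet/api/stream.py. `InMemoryLock` is a single-process stand-in for the distributed lock, `handle_live_request` is a hypothetical name, and the "lock not acquired" branch simply returns whatever is cached rather than waiting.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple


class InMemoryLock:
    """Single-process stand-in for a distributed lock; try_acquire never blocks."""

    def __init__(self) -> None:
        self._held = False

    def try_acquire(self) -> bool:
        if self._held:
            return False
        self._held = True
        return True

    def release(self) -> None:
        self._held = False


async def handle_live_request(
    media_id: str,
    cache: Dict[str, List[str]],
    lock: InMemoryLock,
    scrape: Callable[[str], Awaitable[List[str]]],
    ignore_caches: bool,
) -> Tuple[str, List[str]]:
    """Return (path_taken, results) following the diagram's three outcomes."""
    cached = cache.get(media_id)
    # Fast path: flag is off and a cached entry exists.
    if not ignore_caches and cached is not None:
        return "cache", cached
    # Scrape path: forced by the flag or caused by a cache miss.
    if lock.try_acquire():
        try:
            fresh = await scrape(media_id)
            cache[media_id] = fresh  # update caches with the fresh results
            return "scraped", fresh
        finally:
            lock.release()
    # Lock held elsewhere: fall back to whatever is cached (possibly nothing).
    return "fallback", cache.get(media_id, [])
```

Releasing the lock in a `finally` block ensures a failed scrape does not leave the lock held, which matters more once the stand-in is replaced by a real distributed lock.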

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

I twitch my whiskers, flip a flag—oh my!
No stale carrots from the cache to try.
I bound to fields where fresh bits grow,
Locks in place, the scrapers flow.
Back I hop with data bright—
A harvest warm from live-bite night. 🥕✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The pull request title clearly states the primary change by indicating a new feature flag addition, uses concise phrasing, and directly reflects the main objective of adding the IGNORE_CACHES_FOR_LIVE_SCRAPING option. |


coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

comet/api/stream.py (1)
169-172: Consider restructuring comments for clarity.

The comment on lines 171-172 is placed immediately after an if statement that checks when the flag is True, which creates ambiguity about which scenario it documents. The comment attempts to explain the case when the flag is False, but its placement makes this unclear.

Consider restructuring the comments like this:
+        # Scraping path: either forced (flag=True) or due to missing data (flag=False)
         if settings.IGNORE_CACHES_FOR_LIVE_SCRAPING:
             logger.log("SCRAPER", f"🔄 Ignoring caches for live scraping: {media_id}")
-        # If flag is False, something is missing (metadata or torrents), acquire lock for scraping
-
+        
+        # Acquire distributed lock for scraping
         scrape_lock = DistributedLock(media_id)

This makes it clearer that the else block handles both scenarios: forced re-scraping (flag=True) and cache misses (flag=False).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f29ec84 and 95f5794.

📒 Files selected for processing (3)

.env-sample (1 hunks)
comet/api/stream.py (2 hunks)
comet/utils/models.py (1 hunks)

🔇 Additional comments (3)

.env-sample (1)

103-105: Clear documentation for the new cache-bypassing flag.

The documentation clearly explains when and why to use IGNORE_CACHES_FOR_LIVE_SCRAPING, and the default value of False maintains backward compatibility while allowing operators to force fresh scraping when needed.

comet/utils/models.py (1)

98-98: Field correctly added to AppSettings.

The field type, name, and default value are consistent with the environment variable definition and other boolean settings in the class.

comet/api/stream.py (1)

160-160: Fast-path condition correctly guards cache usage.

The modified condition properly ensures that when IGNORE_CACHES_FOR_LIVE_SCRAPING is True, the fast path is bypassed regardless of cache availability, forcing fresh scraping as intended.

deejay189393 · 2025-10-21T15:48:58Z

Heya @g0ldyy, any updates on testing this out on your end and merging it if all looks good? Thanks a ton!

coderabbitai bot reviewed Oct 8, 2025
