
@deejay189393 commented Oct 8, 2025

Summary

Adds a new environment variable, IGNORE_CACHES_FOR_LIVE_SCRAPING, that forces fresh scraping from the configured scrapers during live requests, bypassing the metadata and torrents cached in the database.

Changes

  • Added an IGNORE_CACHES_FOR_LIVE_SCRAPING boolean setting to the AppSettings model in comet/utils/models.py (defaults to False)
  • Updated .env-sample with documentation for the new parameter
  • Modified the fast-path condition in comet/api/stream.py to check the flag before using cached data
  • Added logging to indicate when caches are being ignored during live scraping
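The PR only shows the field declaration (IGNORE_CACHES_FOR_LIVE_SCRAPING: Optional[bool] = False); AppSettings is presumably a pydantic-settings model that reads it from the environment, but that is an assumption here. As a rough, stdlib-only illustration of how the environment string becomes a boolean, the hypothetical read_bool_env helper below roughly mirrors the lenient true/false spellings pydantic accepts and falls back to False when the variable is unset, which keeps current behavior by default.

```python
import os

# Hypothetical helper, not Comet code: a stdlib stand-in for the boolean
# parsing that a pydantic-settings AppSettings model is assumed to perform.
_TRUE = {"1", "true", "t", "yes", "y", "on"}
_FALSE = {"0", "false", "f", "no", "n", "off"}


def read_bool_env(name: str, default: bool = False) -> bool:
    """Read a boolean flag from the environment, falling back to default."""
    raw = os.environ.get(name)
    if raw is None:
        return default  # unset: keep the default (False preserves caching)
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"{name} must be a boolean, got {raw!r}")
```

Usage: read_bool_env("IGNORE_CACHES_FOR_LIVE_SCRAPING") returns True after exporting IGNORE_CACHES_FOR_LIVE_SCRAPING=True, and False when the variable is absent. Raising on unrecognized values, rather than silently treating them as False, surfaces typos in the .env file early.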

Use Case

When IGNORE_CACHES_FOR_LIVE_SCRAPING=True, the system will:

  1. Skip the fast path that uses cached metadata and torrents
  2. Always acquire a scrape lock and fetch fresh data from scrapers
  3. Log when caches are being bypassed
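Step 1 comes down to a single predicate guarding the fast path. A minimal sketch, assuming the cached metadata and torrent list are passed in directly and that an empty torrent list counts as a cache miss; use_cache_fast_path is a hypothetical name, not the code in comet/api/stream.py:

```python
from typing import Any, Optional, Sequence


def use_cache_fast_path(
    ignore_caches: bool,
    cached_metadata: Optional[Any],
    cached_torrents: Optional[Sequence[Any]],
) -> bool:
    """Return True when a cached response can be served without scraping.

    Hypothetical sketch: the fast path is taken only when the flag is off
    AND both metadata and torrents are cached (empty torrents = miss).
    """
    if ignore_caches:
        # Flag on: always skip the fast path, even with a complete cache.
        return False
    return cached_metadata is not None and bool(cached_torrents)
```

Checking the flag first means the cache contents are irrelevant when it is set, which is exactly the "regardless of cache availability" behavior the change aims for.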

This is useful when you want to force-fetch the latest torrents without relying on potentially stale cached data.

Test Plan

  • Set IGNORE_CACHES_FOR_LIVE_SCRAPING=False (default) and verify that normal caching behavior is unchanged
  • Set IGNORE_CACHES_FOR_LIVE_SCRAPING=True and verify that fresh scraping occurs even when cached data exists
  • Check the logs to confirm the expected messages appear in each case
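The first two test-plan items could be automated roughly as below. This is a self-contained sketch, not Comet's test suite: serve_live is a hypothetical stand-in for the endpoint, a dict replaces the database cache, a mock replaces the scrapers, and a stdlib logger stands in for the loguru "SCRAPER" log call used in the real code.

```python
import logging
import unittest
from unittest import mock

# Stdlib logger standing in for Comet's loguru "SCRAPER" level (assumption).
logger = logging.getLogger("comet.sketch")


def serve_live(media_id, ignore_caches, cache, scrape):
    """Hypothetical stand-in for the stream endpoint's cache-or-scrape choice."""
    cached = cache.get(media_id)
    if not ignore_caches and cached:
        return cached  # fast path: serve cached torrents, no scraping
    if ignore_caches:
        logger.info("Ignoring caches for live scraping: %s", media_id)
    fresh = scrape(media_id)  # forced re-scrape or cache miss
    cache[media_id] = fresh  # refresh the cache for later requests
    return fresh


class IgnoreCachesForLiveScrapingTest(unittest.TestCase):
    def test_flag_off_serves_cache(self):
        scrape = mock.Mock(return_value=["fresh"])
        cache = {"tt123": ["cached"]}
        self.assertEqual(serve_live("tt123", False, cache, scrape), ["cached"])
        scrape.assert_not_called()

    def test_flag_on_forces_scrape_and_logs(self):
        scrape = mock.Mock(return_value=["fresh"])
        cache = {"tt123": ["cached"]}
        with self.assertLogs("comet.sketch", level="INFO") as logs:
            result = serve_live("tt123", True, cache, scrape)
        self.assertEqual(result, ["fresh"])
        scrape.assert_called_once_with("tt123")
        self.assertEqual(cache["tt123"], ["fresh"])
        self.assertIn("Ignoring caches for live scraping: tt123", logs.output[0])
```

Saved as a module, this runs with python -m unittest; the mock's call assertions replace manual log inspection for the "did it scrape?" question.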

Summary by CodeRabbit

  • New Features

    • Added an optional environment setting to force live scraping to bypass caches and fetch fresh data. Disabled by default to preserve current behavior.
  • Documentation

    • Updated sample environment configuration to include the new setting and its default value.
  • Chores

    • Improved logging to clearly indicate when cache bypass is active during live scraping.

Add new environment variable to force live scraping and bypass cached
metadata and torrents. When enabled, the system will always fetch fresh
data from configured scrapers instead of using cached results.

Changes:
- Add IGNORE_CACHES_FOR_LIVE_SCRAPING setting to AppSettings model
- Update .env-sample with documentation for new flag
- Modify fast path condition in stream endpoint to check the flag
- Add logging when cache is being ignored during live scraping

coderabbitai bot commented Oct 8, 2025

Walkthrough

Adds a new IGNORE_CACHES_FOR_LIVE_SCRAPING setting (default False), surfaces it in .env-sample, and updates live scraping logic to bypass cached metadata/torrents when the flag is true. The stream API now conditionally avoids the cache fast path and proceeds to lock and scrape, with added logs reflecting the behavior.

Changes

  • Environment configuration (./.env-sample): Documents the new env var IGNORE_CACHES_FOR_LIVE_SCRAPING with default False.
  • Application settings (comet/utils/models.py): Adds the AppSettings field IGNORE_CACHES_FOR_LIVE_SCRAPING: Optional[bool] = False (public attribute).
  • Live scraping control flow (comet/api/stream.py): The fast-path cache read now requires the flag to be false and both cached metadata and torrents to be present. When the flag is true, the endpoint logs that caches are ignored, then acquires the scrape lock and fetches via the scrapers. Additional explicit logging in the else branch.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant C as Client
  participant A as Stream API
  participant DB as Cache/DB
  participant L as Dist. Lock
  participant S as Scrapers

  C->>A: Request LIVE stream
  A->>A: Read setting IGNORE_CACHES_FOR_LIVE_SCRAPING
  A->>DB: Lookup cached metadata + torrents
  alt Flag=false AND cache complete
    A-->>C: Return cached response (fast path)
  else Flag=true OR cache incomplete
    A->>A: Log "ignoring caches for live scraping" (if flag=true)
    A->>L: Acquire scrape lock
    alt Lock acquired
      A->>S: Fetch metadata/torrents
      S-->>A: Results
      A->>DB: Update caches
      A-->>C: Return fresh results
    else Lock not acquired
      A-->>C: Wait/fallback to eventual cached results
    end
  end

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

I twitch my whiskers, flip a flag—oh my!
No stale carrots from the cache to try.
I bound to fields where fresh bits grow,
Locks in place, the scrapers flow.
Back I hop with data bright—
A harvest warm from live-bite night. 🥕✨

Pre-merge checks

❌ Failed checks (1 warning)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The pull request title clearly states the primary change by indicating a new feature flag addition, uses concise phrasing, and directly reflects the main objective of adding the IGNORE_CACHES_FOR_LIVE_SCRAPING option.

@coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
comet/api/stream.py (1)

169-172: Consider restructuring comments for clarity.

The comment on lines 171-172 sits directly after an if statement that handles the flag=True case, which makes it ambiguous which scenario the comment documents. It is meant to explain the flag=False case, but its placement obscures that.

Consider restructuring the comments like this:

+        # Scraping path: either forced (flag=True) or due to missing data (flag=False)
         if settings.IGNORE_CACHES_FOR_LIVE_SCRAPING:
             logger.log("SCRAPER", f"🔄 Ignoring caches for live scraping: {media_id}")
-        # If flag is False, something is missing (metadata or torrents), acquire lock for scraping
-
+
+        # Acquire distributed lock for scraping
         scrape_lock = DistributedLock(media_id)

This makes it clearer that the else block handles both scenarios: forced re-scraping (flag=True) and cache misses (flag=False).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f29ec84 and 95f5794.

📒 Files selected for processing (3)
  • .env-sample (1 hunks)
  • comet/api/stream.py (2 hunks)
  • comet/utils/models.py (1 hunks)
🔇 Additional comments (3)
.env-sample (1)

103-105: Clear documentation for the new cache-bypassing flag.

The documentation clearly explains when and why to use IGNORE_CACHES_FOR_LIVE_SCRAPING, and the default value of False maintains backward compatibility while allowing operators to force fresh scraping when needed.

comet/utils/models.py (1)

98-98: Field correctly added to AppSettings.

The field type, name, and default value are consistent with the environment variable definition and other boolean settings in the class.

comet/api/stream.py (1)

160-160: Fast-path condition correctly guards cache usage.

The modified condition properly ensures that when IGNORE_CACHES_FOR_LIVE_SCRAPING is True, the fast path is bypassed regardless of cache availability, forcing fresh scraping as intended.

@deejay189393 (author) commented

Heya @g0ldyy, any updates on testing this out on your end and merging it if all looks good? Thanks a ton!
