feat: add IGNORE_CACHES_FOR_LIVE_SCRAPING option #360
base: main
Add new environment variable to force live scraping and bypass cached metadata and torrents. When enabled, the system will always fetch fresh data from configured scrapers instead of using cached results.
Changes:
- Add IGNORE_CACHES_FOR_LIVE_SCRAPING setting to AppSettings model
- Update .env-sample with documentation for new flag
- Modify fast path condition in stream endpoint to check the flag
- Add logging when cache is being ignored during live scraping
Walkthrough
Adds a new IGNORE_CACHES_FOR_LIVE_SCRAPING setting (default False), surfaces it in .env-sample, and updates live scraping logic to bypass cached metadata/torrents when the flag is true. The stream API now conditionally skips the cache fast path and proceeds to lock and scrape, with added logs reflecting the behavior.
Sequence Diagram(s)
sequenceDiagram
autonumber
participant C as Client
participant A as Stream API
participant DB as Cache/DB
participant L as Dist. Lock
participant S as Scrapers
C->>A: Request LIVE stream
A->>A: Read setting IGNORE_CACHES_FOR_LIVE_SCRAPING
A->>DB: Lookup cached metadata + torrents
alt Flag=false AND cache complete
A-->>C: Return cached response (fast path)
else Flag=true OR cache incomplete
A->>A: Log "ignoring caches for live scraping" (if flag=true)
A->>L: Acquire scrape lock
alt Lock acquired
A->>S: Fetch metadata/torrents
S-->>A: Results
A->>DB: Update caches
A-->>C: Return fresh results
else Lock not acquired
A-->>C: Wait/fallback to eventual cached results
end
end
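The flow in the diagram can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's code: `scrape`, `handle_stream`, and `demo` are hypothetical names, an in-memory dict stands in for the cache/DB, and `asyncio.Lock` stands in for the distributed lock. When the lock is already held, the sketch simply falls back to whatever is cached rather than waiting.

```python
import asyncio


async def scrape(media_id: str) -> list[str]:
    """Hypothetical scraper call; returns placeholder results."""
    await asyncio.sleep(0)
    return [f"{media_id}-fresh"]


async def handle_stream(
    media_id: str,
    cache: dict[str, list[str]],
    lock: asyncio.Lock,
    ignore_caches: bool,
) -> list[str]:
    cached = cache.get(media_id)

    # Fast path: only taken when the flag is off and the cache has data.
    if cached is not None and not ignore_caches:
        return cached

    # Lock not acquired: another request is scraping, so fall back to
    # whatever is cached (assumption: the real code may wait instead).
    if lock.locked():
        return cached or []

    # Lock acquired: scrape, update the cache, return fresh results.
    async with lock:
        results = await scrape(media_id)
        cache[media_id] = results
        return results


async def demo() -> tuple[list[str], list[str]]:
    lock = asyncio.Lock()
    cache = {"tt0000001": ["stale"]}
    cached = await handle_stream("tt0000001", cache, lock, ignore_caches=False)
    fresh = await handle_stream("tt0000001", cache, lock, ignore_caches=True)
    return cached, fresh


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With the flag off, the first call returns the cached entry untouched; with the flag on, the second call scrapes and overwrites the cache even though an entry already exists.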
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 0
🧹 Nitpick comments (1)
comet/api/stream.py (1)
169-172: Consider restructuring comments for clarity.

The comment on lines 171-172 is placed immediately after an if statement that checks when the flag is True, which creates ambiguity about which scenario it documents. The comment attempts to explain the case when the flag is False, but its placement makes this unclear.

Consider restructuring the comments like this:

+ # Scraping path: either forced (flag=True) or due to missing data (flag=False)
  if settings.IGNORE_CACHES_FOR_LIVE_SCRAPING:
      logger.log("SCRAPER", f"🔄 Ignoring caches for live scraping: {media_id}")
- # If flag is False, something is missing (metadata or torrents), acquire lock for scraping
-
+
+ # Acquire distributed lock for scraping
  scrape_lock = DistributedLock(media_id)

This makes it clearer that the else block handles both scenarios: forced re-scraping (flag=True) and cache misses (flag=False).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- .env-sample (1 hunks)
- comet/api/stream.py (2 hunks)
- comet/utils/models.py (1 hunks)
🔇 Additional comments (3)
.env-sample (1)
103-105: Clear documentation for the new cache-bypassing flag.
The documentation clearly explains when and why to use IGNORE_CACHES_FOR_LIVE_SCRAPING, and the default value of False maintains backward compatibility while allowing operators to force fresh scraping when needed.
comet/utils/models.py (1)
98-98: Field correctly added to AppSettings.
The field type, name, and default value are consistent with the environment variable definition and other boolean settings in the class.
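To illustrate how a boolean flag like this might travel from the environment into a settings object, here is a standard-library sketch. It is an assumption-laden stand-in: the project's actual AppSettings class is not shown on this page, and `SketchSettings`, `env_bool`, and the accepted truthy strings are hypothetical choices made for the example.

```python
import os
from dataclasses import dataclass

# Assumption: which strings count as "true" is a choice made for this sketch.
_TRUTHY = {"1", "true", "yes", "on"}


def env_bool(name: str, default: bool = False) -> bool:
    """Read an environment variable and interpret it as a boolean."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in _TRUTHY


@dataclass(frozen=True)
class SketchSettings:
    # Defaults to False so existing deployments keep their caching behavior.
    IGNORE_CACHES_FOR_LIVE_SCRAPING: bool = False

    @classmethod
    def from_env(cls) -> "SketchSettings":
        return cls(
            IGNORE_CACHES_FOR_LIVE_SCRAPING=env_bool(
                "IGNORE_CACHES_FOR_LIVE_SCRAPING"
            )
        )


if __name__ == "__main__":
    os.environ["IGNORE_CACHES_FOR_LIVE_SCRAPING"] = "True"
    print(SketchSettings.from_env())
```

Keeping the default at False means the flag is strictly opt-in, which matches the backward-compatibility point made in the review.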
comet/api/stream.py (1)
160-160: Fast-path condition correctly guards cache usage.
The modified condition properly ensures that when IGNORE_CACHES_FOR_LIVE_SCRAPING is True, the fast path is bypassed regardless of cache availability, forcing fresh scraping as intended.
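The guard described above can be expressed as a small predicate. The actual condition on line 160 is not reproduced on this page, so the following is only a sketch of the behavior the review describes; `use_fast_path` and its parameter names are hypothetical.

```python
from itertools import product


def use_fast_path(has_metadata: bool, has_torrents: bool, ignore_caches: bool) -> bool:
    """Return True only when cached data is complete and the flag is off."""
    return has_metadata and has_torrents and not ignore_caches


if __name__ == "__main__":
    # Print the full truth table for the three inputs.
    for meta, torrents, ignore in product([True, False], repeat=3):
        print(meta, torrents, ignore, "->", use_fast_path(meta, torrents, ignore))
```

Of the eight input combinations, only one takes the fast path: metadata cached, torrents cached, and the flag off. Every other combination falls through to the lock-and-scrape branch.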
Heya @g0ldyy, any updates on testing this out on your end and merging it if all looks good? Thanks a ton!
Summary
Adds a new environment variable IGNORE_CACHES_FOR_LIVE_SCRAPING that allows forcing fresh scraping from configured scrapers during live requests, bypassing cached metadata and torrents in the database.
Changes
- Add IGNORE_CACHES_FOR_LIVE_SCRAPING boolean setting to AppSettings model in comet/utils/models.py (defaults to False)
- Update .env-sample with documentation for the new parameter
- Modify comet/api/stream.py to check the flag before using cached data
Use Case
When IGNORE_CACHES_FOR_LIVE_SCRAPING=True, the system will always fetch fresh data from configured scrapers instead of using cached results.
This is useful when you want to force-fetch the latest torrents without relying on potentially stale cached data.
Test Plan
- IGNORE_CACHES_FOR_LIVE_SCRAPING=False (default): verify normal caching behavior works
- IGNORE_CACHES_FOR_LIVE_SCRAPING=True: verify fresh scraping occurs even when cached data exists
Summary by CodeRabbit
New Features
Documentation
Chores