Add DB-backed project freshness score using OWASP-Bumper #5330
base: main
Conversation
👋 Hi @Nachiket-Roy! This pull request needs a peer review before it can be merged. Please request a review from a team member who is not:
Once a valid peer review is submitted, this check will pass automatically. Thank you!
Walkthrough
Adds a stored Project.freshness field and a calculate_freshness() method with a migration, a management command to recalculate and persist freshness (invoked from the daily runner), API validation/filtering for a freshness query param, a serializer change to expose the stored freshness, removal of the legacy ProjectViewSet.filter, and updated tests.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Client
    participant API as Django API (ProjectViewSet)
    participant DB as Database (Project, Repo)
    Note over API,DB: API request flow for filtering projects by stored freshness
    Client->>API: GET /api/v1/projects/?freshness=50
    API->>API: parse & validate freshness param (float, 0–100)
    alt invalid
        API-->>Client: 400 Bad Request (error message)
    else valid
        API->>DB: Query Projects with freshness__gte=50 (plus other filters)
        DB-->>API: matching Project rows (includes stored freshness)
        API-->>Client: 200 OK (serialized projects with read-only freshness)
    end
```

```mermaid
sequenceDiagram
    autonumber
    participant Scheduler
    participant Cmd as update_project_freshness
    participant DB as Database (Project, Repo)
    participant Logger
    Note over Cmd,DB: Daily freshness update batch flow
    Scheduler->>Cmd: call_command("update_project_freshness")
    Cmd->>DB: fetch Project ids (batched)
    loop per project in batch
        Cmd->>DB: load project with row lock (select_for_update)
        Cmd->>Cmd: freshness = project.calculate_freshness()
        alt success
            Cmd->>DB: save(project, update_fields=["freshness"])
        else error
            Cmd->>Logger: log exception for project id
        end
    end
    Cmd-->>Scheduler: print summary (processed, errors, elapsed time)
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 1
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (7)
- website/api/views.py (1 hunks)
- website/management/commands/run_daily.py (1 hunks)
- website/management/commands/update_project_freshness.py (1 hunks)
- website/migrations/0264_project_freshness.py (1 hunks)
- website/models.py (1 hunks)
- website/serializers.py (1 hunks)
- website/tests/test_project_aggregation.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
website/models.py (1)
website/api/views.py (2)
filter (375-414), filter (844-922)
website/management/commands/update_project_freshness.py (1)
website/models.py (12)
Project (1366-1477), calculate_freshness (1401-1440), save (74-77), save (277-289), save (1442-1469), save (1577-1580), save (1770-1773), save (1888-1902), save (1993-2015), save (2814-2817), save (3346-3356), save (3654-3663)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Run Tests
- GitHub Check: docker-test
🔇 Additional comments (6)
website/management/commands/run_daily.py (1)
48-51: New daily freshness task wiring looks consistent. The new update_project_freshness invocation matches the existing error-handling pattern for other daily commands and won't break the overall job on failures. No changes needed.

website/api/views.py (1)
856-869: Freshness filter validation and semantics are sound. Range checking, type validation, and freshness__gte behavior all look correct and consistent with existing stars/forks filters.

website/tests/test_project_aggregation.py (1)
16-24: Test setup now correctly relies on real freshness behavior. Keeping only the prefetch_related patch and dropping any freshness mocking is appropriate given the new DB-backed field. The updated comment still accurately describes the remaining patch.

website/serializers.py (1)
129-141: Serializer exposure of freshness matches the model and requirements. Using a read-only DecimalField(max_digits=5, decimal_places=2) on ProjectSerializer cleanly reflects the stored freshness value while preventing client writes. Looks good.

website/migrations/0264_project_freshness.py (1)
1-17: Migration correctly adds indexed freshness field. The migration cleanly adds Project.freshness with the expected DecimalField shape and index, aligned with the model definition and PR intent.

website/management/commands/update_project_freshness.py (1)
1-40: Freshness recalculation command is correct and robust. The command correctly:
- Iterates all projects,
- Uses project.calculate_freshness(),
- Saves only the freshness field,
- Isolates per-project failures while continuing,
- Reports progress and timing.
This is a solid implementation for a daily batch job.
@coderabbitai help add tests for the changes made in this pr
@Nachiket-Roy: I'll help you create comprehensive tests for the freshness scoring feature. Here are test files covering all the changes: 1. Model Tests -
Actionable comments posted: 1
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (6)
- website/api/views.py (1 hunks)
- website/models.py (1 hunks)
- website/tests/test_api.py (2 hunks)
- website/tests/test_project_aggregation.py (3 hunks)
- website/tests/test_project_freshness.py (1 hunks)
- website/tests/test_update_project_freshness_command.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- website/models.py
🧰 Additional context used
🧬 Code graph analysis (3)
website/tests/test_project_aggregation.py (1)
website/models.py (3)
Project (1366-1479), Repo (1948-2025), calculate_freshness (1401-1442)
website/tests/test_api.py (1)
website/models.py (2)
Project (1366-1479), Repo (1948-2025)
website/tests/test_project_freshness.py (1)
website/models.py (4)
Organization (181-289), Project (1366-1479), Repo (1948-2025), calculate_freshness (1401-1442)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: auto-assign
- GitHub Check: remove_last_active_label
- GitHub Check: add_changes_requested_label
- GitHub Check: check_peer_review
- GitHub Check: remove_last_active_label
- GitHub Check: Run Tests
🔇 Additional comments (4)
website/tests/test_project_aggregation.py (1)
160-215: Excellent integration test with correct freshness calculation. The test properly validates the end-to-end freshness calculation and persistence flow. The expected value of 9.5 is correct:
- very-active (2 days ago) → active_7 = 1 → weight 1.0
- somewhat-active (20 days ago) → active_30 = 1 → weight 0.6
- old-active (60 days ago) → active_90 = 1 → weight 0.3
- archived (1 day ago) → correctly excluded
- raw_score = 1.9, normalized = (1.9/20)*100 = 9.5
website/api/views.py (1)
760-775: LGTM! Clean implementation of freshness filtering. The validation logic is consistent with the existing stars/forks parameter handling, uses appropriate >= semantics with freshness__gte, and provides clear error messages for invalid input.

website/tests/test_update_project_freshness_command.py (1)
15-136: LGTM! Comprehensive command test coverage. The test suite thoroughly exercises the management command with good coverage of:
- Batch updates across multiple projects with varying activity levels
- Graceful per-project error handling using proper mocking
- Execution time reporting
- Zero-projects edge case
The expected freshness value of 5.0 at Line 105 is correctly calculated (1 repo active 5 days ago → raw_score 1.0 → 5.0 freshness).

website/tests/test_api.py (1)
299-427: LGTM! Comprehensive API filtering test coverage. The test suite thoroughly validates freshness filtering including:
- Threshold filtering with correct >= semantics
- Input validation for negative, >100, and non-numeric values
- Decimal freshness values
- Combined filtering with other parameters (stars)
- Freshness field presence in API responses
The use of hardcoded freshness values in setUp (Lines 307-321) is appropriate for API-level testing. The actual freshness calculation logic is validated by the integration test in test_project_aggregation.py (lines 160-215).
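The validation semantics exercised by these tests (numeric parse, 0–100 range check, clear error on bad input) can be sketched framework-free. The function name and error messages below are illustrative, not the actual code in website/api/views.py:

```python
def parse_freshness_param(raw):
    """Validate a ?freshness= query value.

    Returns (value, None) on success or (None, error_message) on failure;
    the caller would then apply queryset.filter(freshness__gte=value).
    """
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return None, "freshness must be a number"
    if not 0 <= value <= 100:
        return None, "freshness must be between 0 and 100"
    return value, None
```

This mirrors the tested cases: negative, >100, and non-numeric inputs all produce an error, while decimal thresholds pass through unchanged.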
Actionable comments posted: 1
🧹 Nitpick comments (1)
website/management/commands/update_project_freshness.py (1)
25-29: Add row-level locking to prevent concurrent update races.
As noted in the PR comments, concurrent executions of this command (a scheduled run overlapping with a manual trigger, or a long-running job still active when the next scheduled run starts) can cause race conditions: both instances read and update the same projects, and the last write wins. The "losing" update is wasted work.
Recommended fix: use select_for_update() to lock rows. Refactor to select and lock the batch inside the transaction:

```diff
 for offset in range(0, total, BATCH_SIZE):
-    batch = qs[offset : offset + BATCH_SIZE]
-    with transaction.atomic():
+    batch = list(qs.select_for_update()[offset : offset + BATCH_SIZE])
     for project in batch:
```

Note: This assumes you adopt per-project transactions (from the previous comment). If you keep batch-level transactions, select_for_update() will lock all 500 rows for the duration of the batch, which is acceptable but reduces concurrency. If you move to per-project transactions, you'd need a different approach (e.g., select_for_update(skip_locked=True) to skip locked rows, or a redesigned locking strategy).
Based on learnings from PR comments: "Review identified concurrency/race concerns and lack of transaction/locking around updates. Recommendation: wrap updates in transactions and consider select_for_update()."
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (1)
website/management/commands/update_project_freshness.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
website/management/commands/update_project_freshness.py (1)
website/models.py (2)
Project(1366-1481)calculate_freshness(1401-1444)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Run pre-commit
- GitHub Check: Run Tests
- GitHub Check: docker-test
Actionable comments posted: 1
♻️ Duplicate comments (1)
website/management/commands/update_project_freshness.py (1)
26-32: Batch-level transaction causes entire batch rollback on single failure.
The current implementation wraps 500 projects in a single transaction.atomic() block. If any project's calculate_freshness() or save() fails (e.g., due to an IntegrityError, database deadlock, or calculation error), Django marks the transaction as aborted and rolls back all 500 projects in that batch, even those that were processed successfully before the failure.
Impact: While select_for_update() prevents concurrent modifications (addressing the "concurrency fixed" commit), it doesn't provide isolation from rollbacks within the batch. A single failing project can cause 499 successful updates to be lost.
🔎 Alternative: Per-project transactions for better isolation
As recommended in the previous review, move transaction.atomic() inside the per-project loop to isolate failures:

```diff
 for offset in range(0, total, BATCH_SIZE):
-    try:
-        with transaction.atomic():
-            batch = list(qs.select_for_update()[offset : offset + BATCH_SIZE])
-            for project in batch:
-                project.freshness = project.calculate_freshness()
-                project.save(update_fields=["freshness"])
-                processed += 1
-    except Exception as e:
-        errors += 1
-        self.stderr.write(f"[ERROR] Project ID {project.id}: {str(e)}")
+    batch = list(qs[offset : offset + BATCH_SIZE])
+    for project in batch:
+        try:
+            with transaction.atomic():
+                freshness = project.calculate_freshness()
+                project.freshness = freshness
+                project.save(update_fields=["freshness"])
+            processed += 1
+        except Exception as e:
+            errors += 1
+            self.stderr.write(f"[ERROR] Project ID {project.id}: {str(e)}")
```

Trade-offs:
- ✅ One failure doesn't affect other projects
- ✅ Accurate error reporting
- ❌ Removes batch-level select_for_update() (loses row-level locking for the batch)
- ❌ Slightly higher transaction overhead (500 small transactions vs. 1 large)
Alternative hybrid approach: Use select_for_update(skip_locked=True) with per-project transactions to skip locked rows instead of waiting, or add retry logic for transient deadlocks.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (1)
website/management/commands/update_project_freshness.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
website/management/commands/update_project_freshness.py (3)
website/models.py (12)
Project (1366-1481), calculate_freshness (1401-1444), save (74-77), save (277-289), save (1446-1473), save (1581-1584), save (1774-1777), save (1892-1906), save (1997-2019), save (2818-2821), save (3350-3360), save (3658-3667)
website/management/commands/run_daily.py (2)
Command (10-54), handle (13-54)
website/static/js/debug-panel.js (1)
options (399-406)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Run Tests
- GitHub Check: docker-test
Actionable comments posted: 0
🧹 Nitpick comments (1)
website/management/commands/update_project_freshness.py (1)
41-41: Minor: Progress message could be clearer.
The progress message reports "Processed X/Y projects..." but this count includes both successful and failed updates (it's actually "attempted"). The final summary (lines 45-48) correctly separates processed vs. errors, so the information is available at the end.
Optional: Make progress message more precise
Consider rephrasing to reflect that this is the attempted count:

```diff
- self.stdout.write(f"Processed {min(offset + BATCH_SIZE, total)}/{total} projects...")
+ self.stdout.write(f"Attempted {min(offset + BATCH_SIZE, total)}/{total} projects...")
```

Or, for more detail, include the current processed/error counts in the progress message:

```diff
- self.stdout.write(f"Processed {min(offset + BATCH_SIZE, total)}/{total} projects...")
+ self.stdout.write(
+     f"Progress: {min(offset + BATCH_SIZE, total)}/{total} attempted "
+     f"({processed} successful, {errors} errors)"
+ )
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (1)
website/management/commands/update_project_freshness.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
website/management/commands/update_project_freshness.py (1)
website/models.py (12)
Project (1366-1481), calculate_freshness (1401-1444), save (74-77), save (277-289), save (1446-1473), save (1581-1584), save (1774-1777), save (1892-1906), save (1997-2019), save (2818-2821), save (3350-3360), save (3658-3667)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Run Tests
- GitHub Check: docker-test
🔇 Additional comments (4)
website/management/commands/update_project_freshness.py (4)
1-8: LGTM: Imports and batch size configuration are appropriate. The imports are necessary for the command functionality, and a batch size of 500 provides a good balance between memory usage and the number of ID-fetching queries.

11-23: LGTM: Command initialization is well-structured. The use of only("id") optimizes memory usage by fetching only the necessary field. The counter initialization and progress messaging provide good visibility into the command's execution.

25-39: Excellent fix: Per-project transactions correctly address all past critical issues. The refactored implementation successfully resolves the previously identified problems:
- Per-project isolation (line 30): Each project's update is wrapped in its own transaction.atomic(), so one failure no longer rolls back the entire batch.
- No UnboundLocalError (line 39): The exception handler references project_id (always defined in the loop) rather than project.id.
- Accurate metrics (line 35): The processed counter increments only after a successful save, ensuring counts reflect actual persisted updates.
The row-level lock via select_for_update() (line 31) is correctly placed inside the transaction, preventing race conditions during the freshness calculation and save.
Trade-off note: Holding the lock during calculate_freshness() (line 33) ensures consistency but increases lock duration, since the method queries related repos. This is acceptable for correctness, though it does mean slightly longer lock hold times per project.

43-48: LGTM: Summary output is clear and informative. The final summary correctly reports the processed count, errors, and execution time separately, providing clear visibility into the command's results.
Good catch! I have resolved the issues; feel free to recheck.
Actionable comments posted: 0
🧹 Nitpick comments (1)
website/management/commands/update_project_freshness.py (1)
25-44: Excellent fix! Previous critical issues resolved. The per-project transaction approach correctly addresses all the concerns raised in previous reviews:
- UnboundLocalError fixed: Line 39 now references project_id (always defined) instead of project (which might be unassigned if an exception occurs during get()).
- Transaction isolation: Each project is processed in its own atomic() block (line 30), so one failure won't roll back others in the batch.
- Accurate metrics: The processed counter is incremented inside the atomic block (line 35), ensuring it only counts successfully committed updates.
- Correct locking: select_for_update() (line 31) prevents concurrent updates to the same project.
The two-level loop structure (batching IDs for memory management, then per-project transactions) strikes a good balance between efficiency and robustness.
Optional: Consider adding a timeout to select_for_update(). If two instances of this command run concurrently, one will wait indefinitely for locks held by the other. You could add a timeout to fail fast:

```python
project = Project.objects.select_for_update(nowait=True).get(pk=project_id)
```

or

```python
project = Project.objects.select_for_update(skip_locked=True).get(pk=project_id)
```

nowait=True raises an exception immediately if the row is locked; skip_locked=True would skip locked rows entirely. This is a nice-to-have enhancement, not required for this PR.
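The two-level loop structure praised above (outer ID batches for memory, inner per-item failure isolation) can be illustrated with a small ORM-free sketch; the names and batch size here are illustrative, not the command's actual code:

```python
def process_in_batches(ids, handler, batch_size=500):
    """Outer loop walks IDs in fixed-size batches; inner loop isolates failures.

    Each item is handled independently, so one exception is counted as an
    error without affecting the remaining items in the batch.
    """
    processed, errors = 0, 0
    for offset in range(0, len(ids), batch_size):
        for item_id in ids[offset:offset + batch_size]:
            try:
                handler(item_id)  # in the real command: lock row, recalc, save
                processed += 1
            except Exception:
                errors += 1  # logged in the real command; loop continues
    return processed, errors
```

With a handler that fails on exactly one ID out of ten, the result is nine processed and one error, matching the isolation behavior the review describes.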
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (1)
website/management/commands/update_project_freshness.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
website/management/commands/update_project_freshness.py (1)
website/models.py (12)
Project (1366-1481), calculate_freshness (1401-1444), save (74-77), save (277-289), save (1446-1473), save (1581-1584), save (1774-1777), save (1892-1906), save (1997-2019), save (2818-2821), save (3350-3360), save (3658-3667)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Run Tests
- GitHub Check: docker-test
🔇 Additional comments (3)
website/management/commands/update_project_freshness.py (3)
1-8: LGTM! Clean imports and reasonable batch size. The imports are appropriate for a management command that processes records in batched transactions. BATCH_SIZE of 500 provides a good balance between memory usage and database round-trips.

14-24: LGTM! Efficient initialization. Using only("id") when building the queryset is a good optimization that minimizes memory overhead when fetching batch IDs. The initialization logic is clean and includes helpful user feedback.

46-51: LGTM! Comprehensive reporting. The final summary provides all the key metrics (processed count, error count, execution time) that operators need to assess the command's success. Using self.style.SUCCESS follows Django management command conventions.
Jayant2908
left a comment
LGTM!!
Closes: #5226
Summary
This PR introduces a stored freshness score for projects and computes it using a Bumper-style activity decay model, based on repository update recency. The goal is to make project freshness queryable, filterable, and consistent across the platform.
Background: In the Projects section, there is a “Freshness” filter that calculates how active a repository is. However, this field is not stored in the database, which causes a FieldError.
Solution:
In this PR, I fix that issue by backing the filter with a stored database field. To calculate repository freshness, I use the same algorithm as blt-bumper:
Last commit within 7 days → 1.0 point
Last commit within 30 days → 0.6 points
Last commit within 90 days → 0.3 points
Score is capped at 20, so highly active repositories do not outweigh less active ones by a large margin.
The daily management command populates freshness for all existing projects.
Key Changes
Algorithm Notes
The freshness score is calculated using a Bumper-style time-decay activity model. Repositories are grouped into rolling 7-day, 30-day, and 90-day windows based on their most recent activity. Each window contributes with decreasing weights (1.0, 0.6, 0.3 respectively) to reflect diminishing relevance over time. The weighted score is normalized to a 0–100 range with an upper cap to prevent dominance by large projects. This approach mirrors Bumper’s activity graph logic by prioritizing recent activity while naturally discounting stale repositories.
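Under the stated assumptions (7/30/90-day windows, weights 1.0/0.6/0.3, raw cap of 20, normalization to 0–100), the scoring reduces to a small function. This is an illustrative sketch, not the actual Project.calculate_freshness() implementation, and the names are hypothetical:

```python
from datetime import datetime, timedelta, timezone

WINDOWS = [(7, 1.0), (30, 0.6), (90, 0.3)]  # (window in days, weight), newest first
MAX_RAW = 20.0  # cap so very large projects don't dominate

def freshness(last_activity_dates, now=None):
    """Return a 0-100 freshness score from repo last-activity timestamps."""
    now = now or datetime.now(timezone.utc)
    raw = 0.0
    for ts in last_activity_dates:
        age_days = (now - ts).days
        for days, weight in WINDOWS:
            if age_days <= days:
                raw += weight  # each repo counts once, in its most recent window
                break
    return round(min(raw, MAX_RAW) / MAX_RAW * 100, 2)
```

With repos last active 2, 20, and 60 days ago this yields (1.0 + 0.6 + 0.3) / 20 × 100 = 9.5, the same value the review's integration-test comment works out above.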
Why this approach
Enables efficient sorting and filtering by freshness at the database level
Keeps scope limited to existing data and infrastructure
Summary by CodeRabbit
New Features
Bug Fixes
Chores
Tests