
Conversation

@Nachiket-Roy
Contributor

@Nachiket-Roy Nachiket-Roy commented Dec 17, 2025

CLOSES: #5277

Summary

This PR addresses multiple performance issues caused by N+1 query patterns and per-row database updates in contributor stats, hackathon pages, team challenges, and GSoC project refresh logic. The changes reduce query counts, improve scalability, and keep existing behavior unchanged.

Key Changes

  • Eliminated N+1 Queries
    • Replaced per-object lookups with select_related, prefetch_related, and in_bulk where appropriate.
    • Optimized follower/following lookups and team challenge participant checks.
    • Improved contributor stats and leaderboard queries by batching user and profile access.
  • Bulk Updates for GSoC PR Mapping (see the sketch after this list)
    • Replaced per-PR .save() calls with bulk_update() when linking merged PRs to UserProfile.
    • Reduced database writes from O(N) to O(1) per repository.
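
The bulk-mapping change can be pictured with a short sketch. This is illustrative only: model and field names (GitHubIssue, UserProfile, user_profile, github_url, repo) are taken from the summary above and may not match the repository exactly.

from website.models import GitHubIssue, UserProfile

def link_merged_prs(repo, since_date):
    # Merged PRs that have a contributor but no linked profile yet.
    prs = GitHubIssue.objects.filter(
        repo=repo,
        type="pull_request",
        is_merged=True,
        merged_at__gte=since_date,
        contributor__isnull=False,
        user_profile__isnull=True,
    ).select_related("contributor")

    # One query fetches every candidate profile, keyed by GitHub URL.
    urls = {pr.contributor.github_url for pr in prs if pr.contributor.github_url}
    profiles_map = {p.github_url: p for p in UserProfile.objects.filter(github_url__in=urls)}

    to_update = []
    for pr in prs:
        profile = profiles_map.get(pr.contributor.github_url)
        if profile is not None:
            pr.user_profile = profile
            to_update.append(pr)

    if to_update:
        # A single batched write replaces one .save() per PR.
        GitHubIssue.objects.bulk_update(to_update, ["user_profile"])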

Query Efficiency Improvements

  • Used annotated counts instead of per-repository queries in hackathon dashboards (a sketch follows this list).
  • Avoided repeated .count() calls inside loops.
  • Prefetched related objects (tags, participants, repositories) to minimize redundant queries.
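
The annotated-count idea, in miniature. This is a hedged sketch assuming hackathon is in scope; the merged_pr_count name and the github_issues relation follow the hackathon fix discussed later in this thread:

from django.db.models import Count, Q

# Before: one COUNT query per repository inside a loop.
# After: a single annotated queryset computes every count in SQL.
repositories = hackathon.repositories.annotate(
    merged_pr_count=Count(
        "github_issues",
        filter=Q(github_issues__type="pull_request", github_issues__is_merged=True),
    )
)
for repo in repositories:
    print(repo.name, repo.merged_pr_count)  # no extra queries here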

Code Cleanup & Safety

  • Removed unnecessary try/except blocks after switching to in_bulk.
  • Preserved all existing functionality and user-facing behavior.

Impact

  • Significantly fewer database queries on high-traffic pages
  • Better performance for large datasets (contributors, PRs, challenges)
  • No schema changes or breaking behavior

Summary by CodeRabbit

  • Refactor

    • Bulk-fetching and annotations added across hackathons, issues, organizations, OSS channels, teams, projects, and user pages to reduce queries and speed up pages.
    • Recommendation and tag-normalization logic simplified; team progress rendering streamlined.
  • Bug Fixes

    • Voting flow simplified: emails sent only on new upvotes; like/dislike toggles return concise success responses.
  • Chores

    • Weekly reports batched per-domain with per-domain failure tracking; GSOC refresh and admin refresh tightened to authenticated/admin POST and switched to bulk updates.

✏️ Tip: You can customize this high-level summary in your review settings.

@github-actions
Contributor

👋 Hi @Nachiket-Roy!

This pull request needs a peer review before it can be merged. Please request a review from a team member who is not:

  • The PR author
  • DonnieBLT
  • coderabbitai
  • copilot

Once a valid peer review is submitted, this check will pass automatically. Thank you!

@github-actions github-actions bot added the files-changed: 6 (PR changes 6 files) and needs-peer-review (PR needs peer review) labels Dec 17, 2025
@coderabbitai
Contributor

coderabbitai bot commented Dec 17, 2025

Warning

Rate limit exceeded

@Nachiket-Roy has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 5 minutes and 41 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between ef7c19c and 97f509a.

📒 Files selected for processing (1)
  • website/views/organization.py (4 hunks)

Walkthrough

Replaces many per-item DB queries with bulk ORM operations (annotations, prefetch_related, select_related, in_bulk, bulk_update, bulk_create) across views; refactors vote handling, bulk-maps PRs to profiles, updates tag/recommender logic, per-domain reporting, challenge progress rendering, and wallet/leaderboard access to reduce N+1 queries.

Changes

  • Hackathon — merged PR annotation (website/views/hackathon.py): Replaced per-repo merged-PR counting loops with a single Count annotation (merged_pr_count) on hackathon.repositories (filters: type=pull_request, is_merged=True, merged_at window, exclude bots); callers now use repo.merged_pr_count.
  • Issue voting & GSOC PR→Profile bulk mapping (website/views/issue.py): Simplified like_issue / dislike_issue to fetch the profile once, simplified toggle logic, and send email only on a new upvote; refresh_gsoc_project changed from per-PR updates to batch processing: skip missing repos, collect contributor GitHub URLs, build profiles_map, assign user_profile from the map, and perform bulk updates; added access-control decorators (@login_required, @user_passes_test(admin_required), @require_http_methods(["POST"])).
  • Organization weekly reports & bulk issue updates (website/views/organization.py): weekly_report now annotates domains with open_count/closed_count, prefetches filtered Issues, builds per-domain report data, sends one email per domain.email (skips/logs missing addresses), catches SMTPException per domain without aborting, and is protected by @staff_member_required. Replaced per-issue save() loops with bulk_update where applicable.
  • OSSH tag handling & recommender refactor (website/views/ossh.py): Added ALLOWED_NORMALIZED_TAGS and switched checks to it; discussion_channel_recommender now prefetches tags, builds tag_weight_map and channel_tag_names, computes tag/language matches from pre-fetched sets, and removes the channel-count factor from relevance scoring.
  • Teams — challenge progress rendering (website/views/teams.py): Prefetches team_participants; introduces a circumference constant; simplifies per-challenge progress calculation to set progress only when the user's team participates, updating stroke_dasharray/stroke_dashoffset accordingly; streamlines the render context.
  • User view: screenshots, wallets, contributors, leaderboard, wallet creation (website/views/user.py): Bulk-fetches issue screenshots and maps them to activities; changes the wallet lookup to filter(...).first(); consolidates bug-type aggregation into a single queryset; followers/following use select_related; contributors are fetched via in_bulk (skipping missing ones), with impact/level computed from pre-fetched contributors; the leaderboard uses in_bulk for users/profiles; wallet creation switched from per-user get_or_create to bulk_create for missing wallets.
  • Project — contributor bulk fetch (website/views/project.py): Collects contributor IDs from stats_query, loads contributors via in_bulk, uses contributors.get(...) when building stats and skips stats whose contributor is missing, removing per-item Contributor.objects.get calls.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Focus areas requiring extra attention:
    • website/views/issue.py — correctness of profile mapping, decorator effects on access, and fields included in bulk_update.
    • website/views/organization.py — per-domain email formatting/selection, handling when domain.email is missing, and scope of SMTPException handling.
    • website/views/user.py — in_bulk usage correctness, the behavior change from .get() to .filter(...).first(), and bulk_create semantics with unique constraints.
    • website/views/ossh.py — correctness of ALLOWED_NORMALIZED_TAGS membership checks and revised relevance/tag-language weight calculations.
    • website/views/hackathon.py — verify annotation filter excludes bots correctly and that merged-at window boundaries match previous logic.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 20.00%, which is insufficient; the required threshold is 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
  • Linked Issues check — ❓ Inconclusive: PR implements most critical and high-priority optimizations from #5277: bulk wallet creation, select_related for followers, in_bulk leaderboard fetching, annotated counts for hackathons/organizations, bulk issue updates, and prefetching. However, a CI failure due to a field-name typo (githubissue vs github_issues) in hackathon.py prevents full verification. Resolution: fix the field-name typo in the hackathon.py annotation (githubissue → github_issues) and related lookups to resolve the test_hackathon_leaderboard failure before merge.
✅ Passed checks (3 passed)
  • Description Check — ✅ Passed: check skipped; CodeRabbit's high-level summary is enabled.
  • Title check — ✅ Passed: the title accurately describes the main objective: fixing N+1 queries and optimizing bulk updates across multiple modules (contributor, hackathon, etc.).
  • Out of Scope Changes check — ✅ Passed: changes are focused on performance optimizations targeting #5277 objectives: N+1 query elimination, bulk updates, prefetching, and annotations. All modifications align with the stated scope across the user, organization, issue, hackathon, teams, ossh, and project modules.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Contributor

📊 Monthly Leaderboard

Hi @Nachiket-Roy! Here's how you rank for December 2025:

Rank | User | PRs | Reviews | Comments | Total
🥇 #1 | @Nachiket-Roy | 18 | 24 | 54 | 384
#2 | @DonnieBLT | 9 | 25 | 29 | 248

Leaderboard based on contributions in December 2025. Keep up the great work! 🚀

@github-actions github-actions bot added the pre-commit: passed (Pre-commit checks passed) label Dec 17, 2025
@Nachiket-Roy Nachiket-Roy marked this pull request as draft December 17, 2025 18:37
@github-actions github-actions bot added the tests: failed (Django tests failed) label Dec 17, 2025
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
website/views/ossh.py (1)

278-305: Bug: discussion_channel_recommender treats tag counts as languages

user_tags is a list of (tag, count) pairs (as produced by preprocess_user_data and used elsewhere), but here it’s unpacked as (tag, lang):

  • language_weight = sum(language_weights.get(lang, 0) for tag, lang in user_tags if tag in channel_tag_names)
  • matching_languages = [lang for tag, lang in user_tags if tag in channel_tag_names and lang in language_weights]

Since the second element is an integer count, not a language string, language weights and matches will almost always be zero/empty, degrading recommendations.

A safer approach is to intersect channel tag names with language_weights keys.

 def discussion_channel_recommender(user_tags, language_weights, top_n=5):
-    matching_channels = (
-        OsshDiscussionChannel.objects.filter(Q(tags__name__in=[tag[0] for tag in user_tags]))
-        .distinct()
-        .prefetch_related("tags")
-    )
+    tag_names = [tag for tag, _ in user_tags]
+    matching_channels = (
+        OsshDiscussionChannel.objects.filter(Q(tags__name__in=tag_names))
+        .distinct()
+        .prefetch_related("tags")
+    )
@@
-    recommended_channels = []
-    for channel in matching_channels:
-        channel_tag_names = {tag.name for tag in channel.tags.all()}
-        tag_matches = sum(1 for tag, _ in user_tags if tag in channel_tag_names)
-
-        language_weight = sum(language_weights.get(lang, 0) for tag, lang in user_tags if tag in channel_tag_names)
+    recommended_channels = []
+    tag_weight_map = dict(user_tags)
+
+    for channel in matching_channels:
+        channel_tag_names = {tag.name for tag in channel.tags.all()}
+
+        # Number of user tags present on this channel
+        tag_matches = sum(1 for tag in channel_tag_names if tag in tag_weight_map)
+
+        # Sum weights for languages that appear as tags on this channel
+        language_weight = sum(language_weights.get(tag, 0) for tag in channel_tag_names)
@@
-        if relevance_score > 0:
-            matching_tags = [tag.name for tag in channel.tags.all() if tag.name in dict(user_tags)]
-            matching_languages = [
-                lang for tag, lang in user_tags if tag in channel_tag_names and lang in language_weights
-            ]
+        if relevance_score > 0:
+            matching_tags = [tag for tag in channel_tag_names if tag in tag_weight_map]
+            matching_languages = [lang for lang in channel_tag_names if lang in language_weights]
🧹 Nitpick comments (4)
website/views/teams.py (1)

269-299: TeamChallenges progress logic and prefetch look solid

The prefetch on team_participants plus the per-challenge progress calculation is correct and removes the obvious N+1. Using a single circumference constant also simplifies the stroke calculations. If you ever need to squeeze more perf, you could avoid repeated challenge.team_participants.all() membership checks by building a set of participant IDs per challenge first, but that’s optional here.
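
A minimal sketch of that optional tweak, assuming challenges is the queryset with team_participants prefetched and user_team_id is already known in the view:

for challenge in challenges:
    # Reads the prefetched cache; no extra query per challenge.
    participant_ids = {team.pk for team in challenge.team_participants.all()}
    team_participates = user_team_id in participant_ids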

website/views/hackathon.py (1)

228-255: Efficient merged PR counting via annotation looks correct

The merged_pr_count annotation on hackathon.repositories correctly applies the merged‑window and bot filters and removes the per‑repo counting query. The subsequent use of repo.merged_pr_count in repos_with_pr_counts is consistent with that change. The earlier repositories = hackathon.repositories.all() assignment is now redundant and can be dropped for cleanliness.

website/views/ossh.py (1)

93-108: Normalized tag set usage is fine

Deriving ALLOWED_NORMALIZED_TAGS from TAG_NORMALIZATION.values() and checking membership against it is a reasonable way to keep tag handling consistent and efficient within preprocess_user_data. You could hoist this set to module scope to avoid rebuilding it on every call, but that’s just a micro‑optimization.
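
A minimal sketch of that micro-optimization; the TAG_NORMALIZATION entries shown are placeholders, not the project's real mapping:

TAG_NORMALIZATION = {"js": "javascript", "py": "python"}  # illustrative entries only

# Built once at import time instead of on every preprocess_user_data() call.
ALLOWED_NORMALIZED_TAGS = frozenset(TAG_NORMALIZATION.values())

def is_allowed_tag(tag):
    return tag in ALLOWED_NORMALIZED_TAGS  # O(1) membership check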

website/views/organization.py (1)

237-281: Weekly report prefetch + in‑memory aggregation are appropriate

Prefetching issue_set with a restricted field set and then computing open/closed/total counts plus per‑issue lines in Python is a good way to remove the per‑domain issue queries. One thing to consider is skipping or logging domains that have no valid domain.email before calling send_mail, to avoid attempting deliveries to empty/invalid recipients.
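
A minimal sketch of that shape, combining the restricted prefetch with the suggested email guard; the logger setup is an assumption:

import logging

from django.db.models import Prefetch
from website.models import Domain, Issue

logger = logging.getLogger(__name__)

domains = Domain.objects.prefetch_related(
    Prefetch(
        "issue_set",
        queryset=Issue.objects.only("description", "views", "label", "status", "domain_id"),
    )
)
for domain in domains:
    if not domain.email:
        # Skip and log instead of handing send_mail an empty recipient.
        logger.warning("Weekly report: domain %s has no email; skipping", domain.pk)
        continue
    issues = domain.issue_set.all()  # served from the prefetch cache
    # ...aggregate open/closed counts and send the per-domain report here...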

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between cd7312d and 92f28df.

📒 Files selected for processing (6)
  • website/views/hackathon.py (1 hunks)
  • website/views/issue.py (2 hunks)
  • website/views/organization.py (1 hunks)
  • website/views/ossh.py (2 hunks)
  • website/views/teams.py (1 hunks)
  • website/views/user.py (5 hunks)
🧰 Additional context used
🧬 Code graph analysis (6)
website/views/teams.py (1)
website/models.py (1)
  • Challenge (2038-2060)
website/views/hackathon.py (1)
website/api/views.py (1)
  • repositories (706-716)
website/views/organization.py (1)
website/models.py (4)
  • Domain (386-484)
  • Issue (590-739)
  • open_issues (419-420)
  • closed_issues (423-424)
website/views/ossh.py (1)
website/models.py (1)
  • OsshDiscussionChannel (2460-2473)
website/views/issue.py (2)
website/models.py (3)
  • Issue (590-739)
  • UserProfile (896-1109)
  • GitHubIssue (2090-2328)
website/api/views.py (9)
  • get (282-287)
  • get (341-346)
  • get (365-368)
  • get (462-475)
  • get (488-494)
  • get (550-576)
  • get (609-633)
  • filter (375-414)
  • filter (844-911)
website/views/user.py (1)
website/models.py (6)
  • IssueScreenshot (780-783)
  • Issue (590-739)
  • Wallet (1152-1173)
  • Contributor (1351-1363)
  • Points (849-859)
  • UserProfile (896-1109)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Run Tests
  • GitHub Check: docker-test
🔇 Additional comments (7)
website/views/issue.py (1)

2636-2678: Excellent bulk update optimization for PR-profile mapping.

This refactoring effectively eliminates N+1 queries by:

  1. Collecting all GitHub URLs from PRs without profiles (lines 2647-2658)
  2. Bulk fetching matching UserProfiles using in_bulk pattern (line 2661)
  3. Building a list of PRs to update in memory (lines 2663-2676)
  4. Applying all updates in a single bulk_update call (line 2678)

This is a significant performance improvement over the previous per-PR save pattern and aligns well with the PR objectives.

Minor observation: The code safely handles cases where user_profile is None (line 2671 check), ensuring robustness when profiles don't exist for all GitHub URLs.

website/views/user.py (6)

440-441: LGTM: Efficient screenshot bulk fetch and mapping.

Lines 440-441 replace per-activity screenshot queries with a single bulk fetch using select_related("issue") and dictionary comprehension. This eliminates N+1 queries when displaying activity screenshots and aligns perfectly with the PR's performance objectives.
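
A minimal sketch of that pattern; activities and its issue_id attribute are assumptions standing in for the view's real locals:

from website.models import IssueScreenshot

issue_ids = [a.issue_id for a in activities]
screenshots = IssueScreenshot.objects.filter(issue_id__in=issue_ids).select_related("issue")

# One dictionary lookup per activity replaces one query per activity.
screenshot_map = {s.issue_id: s for s in screenshots}
for activity in activities:
    activity.screenshot = screenshot_map.get(activity.issue_id)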


460-475: Excellent consolidation of profile-related queries.

This section implements multiple optimizations:

  • Lines 460-462: Build bug_qs once and reuse it in the loop, avoiding repeated base query construction
  • Lines 465-466: Use select_related("user") to fetch follower users in one query, then extract to list
  • Lines 469-470: Same optimization for following relationships
  • Line 472: Derive emails from already-fetched follower data instead of querying each profile
  • Line 475: Simplified tag access using direct relationship

These changes collectively eliminate multiple N+1 query patterns and significantly reduce database round-trips for profile pages with many followers/following.


1000-1030: Excellent leaderboard optimization using in_bulk pattern.

Lines 1000-1030 transform the leaderboard query from N queries to just 3:

  1. Lines 1003-1008: Single annotated query for points aggregation
  2. Line 1012: Bulk fetch users with in_bulk(user_ids)
  3. Line 1013: Bulk fetch profiles as a dictionary keyed by user_id

The loop (lines 1015-1030) then assembles leaderboard entries entirely from pre-fetched data, with safe handling for missing users or profiles (line 1021 check).

This pattern is consistent with the contributor_stats optimization and represents a significant performance improvement for leaderboard rendering.
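
The three-query shape, sketched; the Points score field and the page size are assumptions consistent with this thread:

from django.contrib.auth.models import User
from django.db.models import Sum
from website.models import Points, UserProfile

rows = (
    Points.objects.values("user_id")
    .annotate(total_score=Sum("score"))
    .order_by("-total_score")[:100]
)
user_ids = [row["user_id"] for row in rows]

users = User.objects.in_bulk(user_ids)  # {pk: User} in a single query
profiles = {p.user_id: p for p in UserProfile.objects.filter(user_id__in=user_ids)}

leaderboard = []
for row in rows:
    user = users.get(row["user_id"])
    if user is None:  # mirror the safe handling of missing users
        continue
    leaderboard.append(
        {"user": user, "profile": profiles.get(user.pk), "score": row["total_score"]}
    )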


1063-1067: Optimal wallet creation using bulk_create.

This function replaces a looped get_or_create pattern with a single bulk_create operation:

  1. Line 1063: Efficiently identify existing wallet user IDs using values_list
  2. Line 1064: Filter users without wallets using exclusion
  3. Line 1065: Build list of Wallet instances in memory
  4. Line 1066: Create all wallets in a single database operation

This is a textbook example of bulk operation optimization and aligns perfectly with the PR's stated objective to "replace per-user get_or_create() wallet loop with bulk_create."
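
The same pattern in sketch form; any Wallet constructor arguments beyond the user FK are assumptions:

from django.contrib.auth.models import User
from website.models import Wallet

existing_ids = Wallet.objects.values_list("user_id", flat=True)
missing_users = User.objects.exclude(id__in=existing_ids)

# One INSERT statement replaces a get_or_create round-trip per user.
Wallet.objects.bulk_create([Wallet(user=u) for u in missing_users])

If Wallet.user carries a unique constraint, passing ignore_conflicts=True to bulk_create is a common guard against concurrent-creation races, at the cost of silently skipping duplicates.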


946-983: Strong in_bulk optimization with proper defensive handling.

This refactoring replaces per-contributor queries with a bulk fetch pattern:

  1. Line 946: Extract contributor IDs from aggregated stats
  2. Line 948: Fetch all contributors at once using in_bulk(contributor_ids)
  3. Lines 950-983: Loop through stats and retrieve pre-fetched contributors from the map
  4. Lines 952-953: Skip stats for contributors not found in the map

The safe handling of missing contributors (lines 952-953) is good defensive programming. In practice, orphaned stats should not occur since the ContributorStats model defines the contributor FK with on_delete=models.CASCADE, meaning contributors cannot be deleted without their stats being automatically removed. The .get() check safeguards against edge cases like data corruption or race conditions.

The impact score calculation and level assignment logic remains unchanged and correct.


447-447: The wallet context variable set in this view is never used in any template. No template accesses it via {{ wallet }} or conditional checks, so returning None from filter().first() poses no risk to template rendering. The change from .get() to .filter().first() is safe and improves error handling by avoiding DoesNotExist exceptions.

@github-project-automation github-project-automation bot moved this from Backlog to Ready in 📌 OWASP BLT Project Board Dec 17, 2025
@github-actions github-actions bot added the changes-requested (PR has requested changes from a reviewer) label Dec 17, 2025
@github-actions github-actions bot removed the changes-requested (PR has requested changes from a reviewer) label Dec 17, 2025
@Nachiket-Roy Nachiket-Roy marked this pull request as ready for review December 17, 2025 19:11
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (4)
website/views/organization.py (2)

238-243: Prefetch optimization looks good.

The use of prefetch_related with a restricted field set effectively addresses the N+1 query issue mentioned in the PR objectives. The selected fields match those used in the report generation loop.

Minor optimization: domain_id in the only() clause may be redundant since you're already iterating over domain.issue_set.all(), which implicitly filters by domain.


247-260: Consider the memory trade-off of Python-side filtering.

The current approach loads all issues for each domain into memory, then filters by status in Python. For domains with thousands of issues, this could be memory-intensive.

Consider using database-level aggregation instead:

-    domains = Domain.objects.prefetch_related(
-        Prefetch(
-            "issue_set",
-            queryset=Issue.objects.only("description", "views", "label", "status", "domain_id"),
-        )
-    )
+    domains = Domain.objects.prefetch_related(
+        Prefetch(
+            "issue_set",
+            queryset=Issue.objects.only("description", "views", "label", "status"),
+        )
+    ).annotate(
+        open_count=Count("issue", filter=Q(issue__status="open")),
+        closed_count=Count("issue", filter=Q(issue__status="closed"))
+    )

Then use the annotated counts in the report:

open_issues = [i for i in issues if i.status == "open"]
closed_issues = [i for i in issues if i.status == "closed"]
# Use domain.open_count and domain.closed_count for the counts

This way you get both efficient counts and the actual issue objects for iteration.

website/views/issue.py (2)

95-138: LGTM with minor suggestions.

The prefetch_related optimization and toggle logic are well-implemented. The past review concerns about issue.userprofile_set have been correctly resolved (line 137 now uses UserProfile.objects.filter()).

However, two minor points:

  1. Line 137: total_votes is computed but never used or returned. Consider removing this line or returning the value if needed by the caller.
  2. Lines 128-134: Email is sent synchronously, which may slow down the response. For better performance, consider using Django's async email sending or a task queue (Celery) for notifications.

142-166: LGTM, but clean up unused variables.

The prefetch_related optimization mirrors like_issue correctly, addressing the past review concern. However:

  1. Lines 158, 161: is_disliked is assigned but never used.
  2. Line 164: total_votes is computed but never returned or used.

Consider removing these unused variables to keep the code clean.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 92f28df and eac5e70.

📒 Files selected for processing (4)
  • website/views/hackathon.py (1 hunks)
  • website/views/issue.py (2 hunks)
  • website/views/organization.py (1 hunks)
  • website/views/ossh.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • website/views/hackathon.py
🧰 Additional context used
🧬 Code graph analysis (3)
website/views/organization.py (1)
website/models.py (4)
  • Domain (386-484)
  • Issue (590-739)
  • open_issues (419-420)
  • closed_issues (423-424)
website/views/issue.py (2)
website/models.py (3)
  • Issue (590-739)
  • UserProfile (896-1109)
  • GitHubIssue (2090-2328)
website/api/views.py (9)
  • get (282-287)
  • get (341-346)
  • get (365-368)
  • get (462-475)
  • get (488-494)
  • get (550-576)
  • get (609-633)
  • filter (375-414)
  • filter (844-911)
website/views/ossh.py (1)
website/models.py (1)
  • OsshDiscussionChannel (2460-2473)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Run Tests
  • GitHub Check: docker-test
🔇 Additional comments (4)
website/views/organization.py (1)

270-272: Good fix for the None email bug.

The email validation guard correctly addresses the issue identified in the past review comments where domain.email could be None, causing a TypeError in send_mail.

The implementation properly:

  • Checks for missing email before attempting to send
  • Logs a warning for visibility
  • Continues processing other domains instead of failing entirely
website/views/ossh.py (3)

94-94: Minor optimization: pre-compute allowed tags set.

Creating ALLOWED_NORMALIZED_TAGS once per function call avoids repeated .values() calls and provides O(1) membership testing.


279-282: Effective query optimization with prefetch_related.

The addition of prefetch_related("tags") eliminates N+1 queries when accessing channel tags in the loop. The filter by tag_names and use of distinct() are both appropriate for M2M relationships.


297-312: Improved matching logic using set operations.

The refactored logic using channel_tag_names as a set improves clarity and efficiency. The derivation of matching_tags and matching_languages through intersection is clean and correct.

However, this code will only work after fixing the critical relevance_score issue flagged in the previous comment.

@Nachiket-Roy Nachiket-Roy marked this pull request as draft December 17, 2025 19:17
@Nachiket-Roy Nachiket-Roy marked this pull request as ready for review December 17, 2025 20:17
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
website/views/issue.py (2)

95-157: Like/dislike toggle logic is sound; prefetch usage could be simplified

The new like/dislike implementation looks correct:

  • Fetches UserProfile once per request.
  • Ensures upvotes and downvotes are mutually exclusive.
  • Sends the notification email only on a new upvote and only when issue.user exists, which avoids noisy repeats.

A small performance nit: UserProfile.objects.prefetch_related("issue_upvoted", "issue_downvoted") plus userprof.issue_*.filter(pk=issue.pk).exists() still hits the DB for the .filter() checks, so the prefetch doesn’t buy you much here. If this endpoint is on a hot path, you could either:

  • Drop prefetch_related and just use UserProfile.objects.get(user=request.user) (simpler, fewer queries), or
  • Leverage the prefetched collections directly (e.g., check membership on userprof.issue_upvoted.all() / issue_downvoted.all()), accepting the in‑memory iteration trade‑off.

Not blocking, but worth considering given the PR’s performance focus.
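
A minimal sketch of the first option (drop the prefetch, keep the .exists() checks); the email subject, body, and sender address are illustrative only:

from django.core.mail import send_mail
from django.http import JsonResponse
from django.shortcuts import get_object_or_404
from website.models import Issue, UserProfile

def like_issue(request, issue_pk):
    issue = get_object_or_404(Issue, pk=issue_pk)
    userprof = UserProfile.objects.get(user=request.user)  # one query, no prefetch

    if userprof.issue_upvoted.filter(pk=issue.pk).exists():
        userprof.issue_upvoted.remove(issue)  # toggle an existing upvote off
    else:
        userprof.issue_downvoted.remove(issue)  # keep up/down mutually exclusive
        userprof.issue_upvoted.add(issue)
        if issue.user and issue.user.email:
            send_mail(
                "New upvote on your issue",  # illustrative subject
                "Someone upvoted your issue.",  # illustrative body
                "noreply@example.com",  # assumed sender address
                [issue.user.email],
            )
    return JsonResponse({"success": True})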


2635-2671: Bulk linking PRs to UserProfiles is correct; minor reuse and normalization opportunities

This block cleanly switches from per‑PR updates to a per‑repo bulk update:

  • Only processes repos that actually exist in Repo.
  • Restricts to merged PRs since since_date that already have a contributor but lack user_profile.
  • Collects contributor GitHub URLs, fetches all matching UserProfile rows in one query, and then bulk_updates only PRs with a mapped profile.

Two minor follow‑ups you might consider:

  • You already compute repo_objs = Repo.objects.filter(name__in=[...]) earlier; using a {r.name: r for r in repo_objs} map here would avoid one extra Repo.objects.filter(name=repo_name).first() query per repo.
  • Matching is done on raw contributor.github_url; if any profiles store GitHub URLs with different normalization (trailing slash, case), a simple normalization step (e.g., lowercasing and stripping trailing / on both sides) before building github_urls / profiles_map would make the linking more robust.

Functionally this achieves the PR objective of batching PR→profile linkage.
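
A minimal sketch of that normalization step; whether stored URLs actually vary in case or trailing slashes is an assumption to verify against real data:

def normalize_github_url(url):
    """Lowercase and strip a trailing slash so minor URL variants still match."""
    if not url:
        return url
    return url.strip().lower().rstrip("/")

# Applied to both sides before building the lookup, e.g.:
# profiles_map = {normalize_github_url(p.github_url): p for p in profiles}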

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between eac5e70 and b620e9d.

📒 Files selected for processing (3)
  • website/views/issue.py (2 hunks)
  • website/views/organization.py (1 hunks)
  • website/views/ossh.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
website/views/ossh.py (1)
website/models.py (1)
  • OsshDiscussionChannel (2460-2473)
website/views/organization.py (1)
website/models.py (4)
  • Domain (386-484)
  • Issue (590-739)
  • open_issues (419-420)
  • closed_issues (423-424)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Run Tests
  • GitHub Check: docker-test
🔇 Additional comments (4)
website/views/ossh.py (4)

279-282: Good optimization to eliminate N+1 queries.

The query now uses prefetch_related("tags") to fetch all channel tags in a single query, avoiding N+1 when accessing channel.tags.all() in the loop. This aligns well with the PR's performance objectives.


285-286: Good optimization for fast lookups.

Converting user_tags to a dictionary enables O(1) lookups in the subsequent loop instead of O(n) list scans.


288-296: Correctly fixes the critical relevance_score bug.

The relevance_score variable is now properly calculated at line 296 before being used in the conditional at line 298. This resolves the NameError that previous reviewers flagged.

The logic correctly:

  • Extracts channel tag names from prefetched tags
  • Counts matching tags between channel and user
  • Sums language weights for channel tags that match user languages
  • Combines both into a single relevance score

299-300: Efficient extraction of matching tags and languages.

The list comprehensions efficiently identify which specific tags and languages matched, using the optimized tag_weight_map and language_weights dictionaries for fast lookups.

@Nachiket-Roy
Contributor Author

@coderabbitai could you also clarify why the tests failed?

@coderabbitai
Contributor

coderabbitai bot commented Dec 17, 2025

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Fetch PR check runs to see which tests failed
gh pr checks 5333 --repo OWASP-BLT/BLT

echo "---"
echo "Fetching detailed check run logs..."
echo "---"

# Get the PR check runs with more details
gh api repos/OWASP-BLT/BLT/pulls/5333/checks --jq '.check_runs[] | {name: .name, status: .status, conclusion: .conclusion, started_at: .started_at, completed_at: .completed_at, html_url: .html_url}'

Length of output: 2665


🏁 Script executed:

#!/bin/bash
# Get logs from the failed test run
gh run view 20316051934 --repo OWASP-BLT/BLT --log --job 58360238056 | tail -500

Length of output: 59013


🏁 Script executed:

#!/bin/bash
# Fetch the hackathon.py file to see the problematic annotation at line 227
cat website/views/hackathon.py | head -250 | tail -40

Length of output: 1482


@Nachiket-Roy: The test failed due to a typo in the field name in website/views/hackathon.py at line 227.

Root Cause

The code is using "githubissue" (singular, no underscore) but the correct field name is "github_issues" (plural, with underscore).

Error:

django.core.exceptions.FieldError: Cannot resolve keyword 'githubissue' into field.
Choices are: ... github_issues, ...

The Problem

In website/views/hackathon.py starting at line 227:

repositories = hackathon.repositories.annotate(
    merged_pr_count=Count(
        "githubissue",  # ❌ Wrong - should be "github_issues"
        filter=Q(
            githubissue__type="pull_request",  # ❌ Wrong
            githubissue__is_merged=True,  # ❌ Wrong
            githubissue__merged_at__gte=hackathon.start_time,  # ❌ Wrong
            githubissue__merged_at__lte=hackathon.end_time,  # ❌ Wrong
        )
        & ~Q(githubissue__contributor__contributor_type="Bot")  # ❌ Wrong
        & ~Q(githubissue__contributor__name__endswith="[bot]")  # ❌ Wrong
        & ~Q(githubissue__contributor__name__icontains="bot"),  # ❌ Wrong
    )
)

Fix Required

Replace all occurrences of githubissue with github_issues in that annotation:

repositories = hackathon.repositories.annotate(
    merged_pr_count=Count(
        "github_issues",  # ✅ Correct
        filter=Q(
            github_issues__type="pull_request",  # ✅ Correct
            github_issues__is_merged=True,  # ✅ Correct
            github_issues__merged_at__gte=hackathon.start_time,  # ✅ Correct
            github_issues__merged_at__lte=hackathon.end_time,  # ✅ Correct
        )
        & ~Q(github_issues__contributor__contributor_type="Bot")  # ✅ Correct
        & ~Q(github_issues__contributor__name__endswith="[bot]")  # ✅ Correct
        & ~Q(github_issues__contributor__name__icontains="bot"),  # ✅ Correct
    )
)

Once you fix this typo, the test_hackathon_leaderboard test should pass.

@github-actions github-actions bot added the changes-requested (PR has requested changes from a reviewer) and files-changed: 7 (PR changes 7 files) labels and removed the files-changed: 6 (PR changes 6 files) label Dec 17, 2025
@github-actions github-actions bot removed the changes-requested (PR has requested changes from a reviewer) label Dec 17, 2025
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 871f8d1 and 128d2b8.

📒 Files selected for processing (3)
  • website/views/issue.py (2 hunks)
  • website/views/organization.py (4 hunks)
  • website/views/project.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
website/views/project.py (1)
website/models.py (1)
  • Contributor (1351-1363)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Run Tests
  • GitHub Check: docker-test
🔇 Additional comments (7)
website/views/project.py (1)

1266-1273: LGTM! Efficient bulk contributor fetch eliminates N+1 queries.

The refactor replaces per-stat Contributor.objects.get() calls with a single bulk fetch via in_bulk(), then uses dictionary lookup. Skipping stats with missing contributors is appropriate and prevents errors.

website/views/issue.py (4)

103-107: LGTM! Efficient M2M membership checks using .exists().

The refactor from issue in userprof.issue_downvoted.all() to userprof.issue_downvoted.filter(pk=issue.pk).exists() avoids loading entire querysets just to check membership, eliminating unnecessary database overhead. This pattern is correctly applied at both lines 103 and 107.


112-129: LGTM! Email notification sent only on new upvotes with proper validation.

The email logic correctly:

  • Sends only when adding a new upvote (line 110 else branch), not when removing
  • Validates both issue.user and issue.user.email exist (line 113) before calling send_mail

142-146: LGTM! Logic corrected and M2M checks optimized.

The function now correctly:

  • Removes upvote if present (line 142-143)
  • Toggles downvote (line 146-149)

Both membership checks use the efficient .exists() pattern consistent with like_issue.


2629-2672: LGTM! Efficient bulk update pattern replaces per-PR saves.

The refactor correctly:

  • Uses pr.contributor.github_url directly (lines 2641-2649), addressing past review comments about incorrect URL parsing
  • Filters out bot accounts with safe null checks (lines 2644-2648)
  • Bulk-fetches UserProfile objects once (line 2652) instead of per-PR queries
  • Uses dictionary lookup profiles_map.get() for O(1) profile retrieval (line 2658)
  • Applies updates via bulk_update() (line 2665) instead of per-PR .save() calls

This eliminates N+1 queries and reduces database writes, consistent with the PR's performance objectives.

website/views/organization.py (2)

239-296: LGTM! All critical issues from previous reviews have been resolved.

The weekly report function now correctly:

  • Requires staff authentication via @staff_member_required
  • Imports and handles SMTPException properly
  • Uses per-domain try/except blocks to prevent one failure from stopping the entire batch
  • Validates domain.email before sending
  • Optimizes queries with annotate() and prefetch_related()

The optimization effectively eliminates N+1 queries by computing counts in SQL and prefetching filtered issues in a single query per batch.


349-355: Excellent use of in_bulk to eliminate N+1 queries.

The code now fetches all required users in a single query using in_bulk() and retrieves them from an in-memory dictionary during iteration, avoiding per-user database hits. The graceful handling of missing users with continue ensures robustness.

@github-actions github-actions bot added the last-active: 0d (PR last updated 0 days ago) label Dec 18, 2025
@github-actions github-actions bot removed the last-active: 0d (PR last updated 0 days ago) label Dec 18, 2025
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 128d2b8 and 3dbd715.

📒 Files selected for processing (1)
  • website/views/organization.py (4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Run Tests
  • GitHub Check: docker-test
🔇 Additional comments (4)
website/views/organization.py (4)

10-10: LGTM: Required imports added.

The addition of SMTPException and staff_member_required imports addresses critical issues from previous reviews.

Also applies to: 17-17


239-296: Good refactoring: N+1 queries eliminated and previous concerns addressed.

The weekly_report refactoring successfully addresses multiple performance and security issues from previous reviews:

  • ✅ Authentication required via @staff_member_required (line 239)
  • ✅ Per-domain SMTP error handling prevents cascade failures (lines 292-294)
  • ✅ Skips domains without email addresses (lines 257-259)
  • ✅ Uses annotations (open_count, closed_count) to avoid Python-side counting (lines 241-243, 263-265)
  • ✅ Uses prefetch_related with filtered queryset to reduce queries (lines 244-250)

The only remaining concern is the issue_set access pattern at line 261 (see separate comment).


314-317: LGTM: bulk_update moved outside loop.

The bulk_update call is now correctly placed outside the loop (line 317), eliminating N database calls and addressing the critical issue from previous reviews. This changes the update pattern from one query per issue to a single batched query for all issues.


349-356: Good optimization: in_bulk eliminates per-user queries.

The code now fetches winner/runner users in a single query using in_bulk (line 350), replacing the previous pattern of repeated User.objects.get() calls. This reduces the query count from N to 1 for user retrieval and addresses the N+1 concern from previous reviews.

The logic correctly fetches only the first 3 user IDs (line 349) since the loop only processes winners at indices 1-3 and breaks at index 4.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
website/views/organization.py (1)

340-341: Potential bug: Orphaned issue.save() in publish branch.

Line 341 calls issue.save() on the issue variable from the loop that ended at line 336. This saves whichever issue happened to be last in the iteration over request.POST.items(), which doesn't logically relate to the "publish" action or winner creation that follows.

This appears to be leftover code. Either remove it, or if a specific issue needs to be saved here, clarify which one and why.

🔎 Apply this diff to remove the orphaned save:
     if request.POST["submit"] == "save":
         pass
     elif request.POST["submit"] == "publish":
-        issue.save()
         winner = Winner()
🧹 Nitpick comments (1)
website/views/organization.py (1)

314-318: Good: bulk_update moved outside loop, fixing N+1 writes.

The bulk_update at line 318 is now correctly placed outside the loop, addressing the critical issue flagged in previous reviews. This eliminates N database writes and replaces them with a single batch operation.

Note: Lines 320-336 still call issue.save() individually for each updated issue. This is better than before (the initial reset is batched), but a fully optimal approach would collect all changes in memory during the loop and perform a single bulk_update at the end covering both the reset and the POST updates.


Based on previous review identifying bulk_update inside the loop as a critical performance issue.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 3dbd715 and 700ee3a.

📒 Files selected for processing (1)
  • website/views/organization.py (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
website/views/organization.py (1)
website/models.py (2)
  • Domain (386-484)
  • Issue (590-739)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Run Tests
  • GitHub Check: docker-test
🔇 Additional comments (3)
website/views/organization.py (3)

10-10: LGTM: Required imports added.

These imports correctly support the security and error-handling improvements in weekly_report:

  • SMTPException is now properly imported and caught at line 293
  • staff_member_required decorator protects the endpoint at line 239

Both address critical issues flagged in previous reviews.

Also applies to: 17-17


239-297: Excellent refactor: All critical issues from previous reviews resolved.

This implementation comprehensively addresses every concern raised in past reviews:

Security (Critical): @staff_member_required decorator added (line 239), preventing unauthorized mass email sending

Error handling (Critical): SMTPException is now imported (line 10) and caught inside the per-domain loop (line 293), so SMTP failures no longer abort the entire run

Data consistency (Major): Prefetch uses to_attr="filtered_issues" (line 250) and is accessed correctly (line 262), ensuring the report's issue list matches the header counts

Optimization (Major): Annotated open_count and closed_count (lines 242-243) are now used (lines 264-265) instead of redundant Python list comprehensions

Validation (High): Email check (line 258) now skips domains without email before attempting to send

The refactor eliminates N+1 queries via annotations and prefetching while fixing all previously identified bugs. Well done!


Based on previous review feedback indicating multiple critical and major issues in this function.


350-356: Excellent: Batch user fetch eliminates N+1 queries.

Replacing per-row User.objects.get() calls with in_bulk() (line 351) eliminates N queries. The .get(obj["user"]) pattern (line 354) with the None check (lines 355-356) safely handles missing users without raising exceptions.

This directly addresses the N+1 issue in organization.py:348 mentioned in the PR objectives and previous review comments.


Based on PR objectives noting "per-row User.objects.get() in organization.py:348; recommended in_bulk() for users."

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
website/views/organization.py (1)

316-319: Critical fix: bulk_update now correctly placed outside the loop.

The bulk_update() call is now properly positioned after the loop completes, ensuring all issues are modified in memory first and then updated in a single database operation. This addresses the previous critical N+1 issue.

Optional micro-optimization:

🔎 Guard against empty list
     for issue in issues:
         issue.verified = False
         issue.score = 0
-    Issue.objects.bulk_update(issues, ["verified", "score"])
+    if issues:
+        Issue.objects.bulk_update(issues, ["verified", "score"])

While Django's bulk_update() handles empty lists gracefully, explicitly checking avoids an unnecessary database call.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between deb5785 and ef7c19c.

📒 Files selected for processing (1)
  • website/views/organization.py (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
website/views/organization.py (1)
website/models.py (3)
  • Domain (386-484)
  • Issue (590-739)
  • Winner (832-846)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Run Tests
  • GitHub Check: docker-test
  • GitHub Check: remove_last_active_label
🔇 Additional comments (2)
website/views/organization.py (2)

240-298: Excellent work addressing all critical security and correctness issues!

This refactor successfully resolves all previously identified critical problems:

Authentication: Added @staff_member_required to prevent unauthorized mass email sending
Exception handling: SMTPException is now imported and caught per-domain (not aborting the entire batch)
Prefetch usage: Correctly using to_attr="filtered_issues" and accessing the cached queryset
Email validation: Skipping domains without email addresses
Annotations: Using open_count and closed_count efficiently
Consistency: Counts and listed issues both use the same open/closed filter

The implementation is clean, efficient, and correctly handles errors without stopping the batch process.


342-368: Excellent optimization using in_bulk() to eliminate N+1 queries.

The refactor correctly:

  • Wraps winner assignment in transaction.atomic() for data consistency
  • Aggregates and orders users by total score in SQL
  • Fetches exactly the top 3 users in a single in_bulk() call (previously was User.objects.get() per winner)
  • Guards each winner assignment with len(top_users) > checks

This eliminates the N+1 query pattern and batches user access as intended by the PR objectives.

Note: If a user is deleted between the aggregation query and the in_bulk() call, user_map.get() will return None and the corresponding winner field will be set to None. This is acceptable behavior for a rare edge case.
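
The winner-selection shape, sketched under assumptions: the Points aggregation fields and the Winner attribute names (winner, runner, second_runner) are illustrative, not confirmed against models.py:

from django.contrib.auth.models import User
from django.db import transaction
from django.db.models import Sum
from website.models import Points, Winner

with transaction.atomic():
    top = (
        Points.objects.values("user_id")
        .annotate(total_score=Sum("score"))
        .order_by("-total_score")[:3]
    )
    user_map = User.objects.in_bulk([row["user_id"] for row in top])  # single fetch
    top_users = [user_map.get(row["user_id"]) for row in top]

    winner = Winner()
    if len(top_users) > 0:
        winner.winner = top_users[0]
    if len(top_users) > 1:
        winner.runner = top_users[1]
    if len(top_users) > 2:
        winner.second_runner = top_users[2]
    winner.save()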

@Nachiket-Roy
Contributor Author

@coderabbitai why did the migration label test fail?

@coderabbitai
Contributor

coderabbitai bot commented Dec 18, 2025

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Fetch all PR checks to see the migration label test status
echo "=== PR Check Runs ==="
gh pr checks 5333 --repo OWASP-BLT/BLT

echo ""
echo "=== Detailed check runs with conclusions ==="
gh api repos/OWASP-BLT/BLT/pulls/5333/checks --jq '.check_runs[] | {name: .name, status: .status, conclusion: .conclusion, html_url: .html_url}'

echo ""
echo "=== Checking for migration-related checks ==="
gh api repos/OWASP-BLT/BLT/pulls/5333/checks --jq '.check_runs[] | select(.name | contains("migration") or contains("Migration") or contains("label")) | {name: .name, status: .status, conclusion: .conclusion, output: .output}'

Length of output: 2350


🏁 Script executed:

#!/bin/bash
# Get logs from the failed add_migrations_label job
echo "=== Fetching logs for add_migrations_label job ==="
gh run view 20342691116 --repo OWASP-BLT/BLT --log --job 58446222163

echo ""
echo "=== Looking for the workflow file that defines add_migrations_label ==="
fd -e yml -e yaml . .github/workflows/ --exec grep -l "add_migrations_label" {}

Length of output: 26183


@Nachiket-Roy: The migration label test failed due to a GitHub API issue, not a problem with your code.

Root Cause

The test tried to fetch the list of changed files from GitHub's API but received a 500 Internal Server Error:

GET /repos/OWASP-BLT/BLT/pulls/5333/files?per_page=100 - 500 in 10040ms
RequestError [HttpError]: fetch failed
  status: 500,

What This Test Does

The add_migrations_label workflow:

  1. Fetches all files changed in your PR
  2. Checks if any migration files exist (matching pattern: website/migrations/*.py or comments/migrations/*.py)
  3. Adds a migrations label if found, removes it if not

Why It Failed

GitHub's API experienced a temporary server-side failure when trying to list your PR's files. The request timed out after 10 seconds and returned HTTP 500 (server error).

Solution

Re-run the workflow - this is a transient infrastructure issue that should resolve on retry:

  1. Go to the Actions tab
  2. Click "Re-run failed jobs" or "Re-run all jobs"

The workflow should pass on the next attempt once GitHub's API is responding normally.


Note: Your PR doesn't modify any migration files, so the label won't be added anyway. This test just validates that the labeling automation is working correctly.

@github-actions github-actions bot added the last-active: 0d (PR last updated 0 days ago) label Dec 19, 2025

Labels

  • files-changed: 7 (PR changes 7 files)
  • last-active: 0d (PR last updated 0 days ago)
  • needs-peer-review (PR needs peer review)
  • pre-commit: passed (Pre-commit checks passed)
  • quality: high
  • quality: medium
  • tests: passed (Django tests passed)

Projects

Status: Ready

Development

Successfully merging this pull request may close these issues.

Multiple N+1 Query Performance Issues Across User, Organization, Issue & Hackathon Modules

1 participant