@rinkitadhana (Contributor) commented Apr 9, 2025

fixes #4122

Changes introduced by this PR:

This PR fixes an issue where raw HTML tags were displayed as text in blog posts instead of being rendered. Two main changes were made:

  1. Added Markdown extensions in PostDetailView:
  • Added the fenced_code, tables, and nl2br (newline-to-<br>) extensions
  • This lets post content use fenced code blocks, tables, and single-newline line breaks
  2. Added Django's safe filter to the post_detail template:
  • Stops Django from auto-escaping the HTML produced by the Markdown renderer
  • The content now renders as HTML instead of showing raw tags
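The "Before" behaviour comes from Django's auto-escaping: the view hands the template an HTML string, and {{ post.content }} escapes it, so the tags appear as text. The toy below mimics that decision using only the standard library. SafeText and render_variable are hypothetical names; the __html__ check mirrors how Django's conditional_escape() recognises strings marked with mark_safe() or |safe, but this is a sketch, not Django's code.

```python
import html


class SafeText(str):
    """Toy stand-in for Django's SafeString: a str flagged as already safe."""

    def __html__(self) -> str:
        return self


def render_variable(value) -> str:
    """Roughly what {{ value }} does with autoescaping on (toy, not Django's code)."""
    if hasattr(value, "__html__"):
        # Marked safe (e.g. via |safe or mark_safe()): emitted verbatim.
        return value.__html__()
    # Anything else gets &, <, >, " and ' escaped, so tags show up as text.
    return html.escape(str(value))


print(render_variable("<p>Hello</p>"))            # &lt;p&gt;Hello&lt;/p&gt;
print(render_variable(SafeText("<p>Hello</p>")))  # <p>Hello</p>
```

Note that a SafeText wrapping '<img src=x onerror="alert(1)">' is emitted verbatim too: marking content safe does not make it safe, which is the XSS concern raised later in this thread.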

Before:

[screenshot]

After:

[screenshot]

Summary by CodeRabbit

  • Bug Fixes

    • Improved security for blog post content by sanitizing rendered Markdown, preventing unsafe HTML from being displayed.
  • New Features

    • Enhanced Markdown support with features like fenced code blocks, tables, and line breaks in blog posts.

coderabbitai bot (Contributor) commented Apr 9, 2025

Walkthrough

The get_object method in the PostDetailView class within website/views/blog.py has been updated to improve Markdown rendering and HTML sanitization. The new implementation renders Markdown content with additional extensions for fenced code blocks, tables, and automatic line breaks, then sanitizes the resulting HTML using the bleach library with a specified whitelist of allowed tags, attributes, and protocols. The sanitized HTML replaces the original post content before returning the post object. Additionally, the blog post template was modified to apply the safe filter to the post content, allowing the sanitized HTML to be rendered correctly. The django-bleach dependency was added to the project.
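The sanitization step above relies on bleach.clean. To make the whitelist idea concrete, here is a toy sanitizer built only on the standard library's html.parser, reusing the allowed_tags and allowed_attributes lists from this PR's view. It is an illustration under simplifying assumptions (no protocol filtering of href/src values, no handling of malformed-markup edge cases), and WhitelistSanitizer and sanitize are hypothetical names; it is not a replacement for bleach, which handles far more parsing edge cases.

```python
import html
from html.parser import HTMLParser

# Whitelists copied from the allowed_tags / allowed_attributes in this PR's view.
ALLOWED_TAGS = {
    "p", "br", "b", "strong", "i", "em", "h1", "h2", "h3", "h4", "h5", "h6",
    "blockquote", "ul", "ol", "li", "a", "img", "code", "pre",
}
ALLOWED_ATTRS = {
    "a": {"href", "title", "rel", "target"},
    "img": {"src", "alt", "title", "width", "height"},
    "blockquote": {"cite"},
    "code": {"class"},
    "pre": {"class"},
}
VOID_TAGS = {"br", "img"}  # elements that never get a closing tag


class WhitelistSanitizer(HTMLParser):
    """Keeps whitelisted tags/attributes; drops other tags but keeps their text
    (roughly like bleach.clean(..., strip=True))."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return  # drop the tag; its text still arrives via handle_data
        allowed = ALLOWED_ATTRS.get(tag, set())
        parts = [f"<{tag}"]
        for name, value in attrs:
            if name in allowed:  # onclick, onerror, style... are never whitelisted
                escaped = html.escape(value or "", quote=True)
                parts.append(f' {name}="{escaped}"')
        parts.append(">")
        self.out.append("".join(parts))

    def handle_startendtag(self, tag, attrs):
        # Treat <br/> like <br> and avoid emitting a stray closing tag.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text arrives with entities decoded, so escape it again on output.
        self.out.append(html.escape(data, quote=False))


def sanitize(fragment: str) -> str:
    parser = WhitelistSanitizer()
    parser.feed(fragment)
    parser.close()  # flush any buffered text
    return "".join(parser.out)


print(sanitize('<p onclick="steal()">Hi <script>alert(1)</script></p>'))  # <p>Hi alert(1)</p>
```

Comments are dropped because handle_comment is not overridden; the script's text survives only as harmless escaped text.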

Changes

| File(s) | Change Summary |
|---|---|
| website/views/blog.py | Enhanced Markdown rendering with extensions and added HTML sanitization using bleach for blog posts. |
| website/templates/blog/post_detail.html | Modified template to render post.content as safe HTML using the safe filter. |
| pyproject.toml | Added django-bleach dependency version ^3.1.0 to project dependencies. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant PostDetailView
    participant Markdown
    participant Bleach
    participant DjangoTemplate

    User->>PostDetailView: Request blog post
    PostDetailView->>Markdown: Render post.content with extensions
    Markdown-->>PostDetailView: Return rendered HTML
    PostDetailView->>Bleach: Sanitize HTML with allowed tags/attributes/protocols
    Bleach-->>PostDetailView: Return sanitized HTML
    PostDetailView->>DjangoTemplate: Pass sanitized HTML marked as safe
    DjangoTemplate-->>User: Display formatted blog post

Assessment against linked issues

Objective Addressed Explanation
Render raw HTML tags (e.g., <p>, <strong>) as stylized HTML in blog posts instead of plain text (#4122)

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d71ecc2 and a25d7f8.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (3)
  • pyproject.toml (1 hunks)
  • website/templates/blog/post_detail.html (1 hunks)
  • website/views/blog.py (2 hunks)
✅ Files skipped from review due to trivial changes (1)
  • pyproject.toml
🚧 Files skipped from review as they are similar to previous changes (1)
  • website/views/blog.py
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: Run Tests
  • GitHub Check: docker-test
🔇 Additional comments (1)
website/templates/blog/post_detail.html (1)

25-25:


✅ Verification successful

XSS Risk: Unsafe safe filter usage without sanitization.

Applying {{ post.content | safe }} will render any embedded HTML—including malicious scripts—directly in the page. Unless the view layer sanitizes post.content (e.g., via bleach.clean with a strict whitelist), this opens a cross-site scripting (XSS) vulnerability.

Please ensure you sanitize the Markdown-rendered HTML before marking it safe. For example:

  1. Sanitize in the view:

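    import bleach
    import markdown
    from django.views.generic import DetailView
    
    ALLOWED_TAGS = [...]   # fill in the tags you want to permit
    ALLOWED_ATTRS = {...}  # fill in the per-tag attribute whitelist
    
    class PostDetailView(DetailView):
        ...
        def get_object(self, queryset=None):
            post = super().get_object(queryset)
            html = markdown.markdown(post.content, extensions=[...])
            # Remove everything not on the whitelist before the template marks it safe.
            post.content = bleach.clean(html, tags=ALLOWED_TAGS, attributes=ALLOWED_ATTRS)
            return post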

    Then keep your template unchanged:

    {{ post.content | safe }}
  2. Or use django-bleach’s template filter directly:

    {% load bleach_tags %}
    - {{ post.content | safe }}
    + {{ post.content | bleach }}
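For the second option, django-bleach takes its whitelist from Django settings rather than from call arguments. A sketch of the relevant settings.py fragment, reusing the tag and attribute whitelist from the view in this PR; the setting names follow the django-bleach README, but they (and whether a per-tag dict is accepted for attributes, which bleach itself supports) are assumptions to check against the installed version (^3.1.0 per pyproject.toml).

```python
# settings.py (fragment): whitelist consumed by django-bleach's template filter.
# Also add "django_bleach" to INSTALLED_APPS so {% load bleach_tags %} works.

BLEACH_ALLOWED_TAGS = [
    "p", "br", "b", "strong", "i", "em",
    "h1", "h2", "h3", "h4", "h5", "h6",
    "blockquote", "ul", "ol", "li",
    "a", "img", "code", "pre",
]
BLEACH_ALLOWED_ATTRIBUTES = {
    "a": ["href", "title", "rel", "target"],
    "img": ["src", "alt", "title", "width", "height"],
    "blockquote": ["cite"],
    "code": ["class"],
    "pre": ["class"],
}
BLEACH_ALLOWED_PROTOCOLS = ["http", "https", "mailto"]
# Remove disallowed tags (keeping their text) instead of escaping them.
BLEACH_STRIP_TAGS = True
# Drop HTML comments from user-supplied content.
BLEACH_STRIP_COMMENTS = True
```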

Could you confirm that all HTML is being sanitized in the view (or via a bleach filter) before this safe filter is applied?


🏁 Script executed:

#!/bin/bash
echo "Searching for bleach usage..."
rg -n "import bleach" .
rg -n "bleach.clean" .
rg -n "bleach_" .
echo "Searching for bleach_tags in templates..."
rg -n "{% load bleach_tags" website/templates
echo "Searching for PostDetailView definition..."
rg -n "class PostDetailView" .
echo "Searching for markdown usage..."
rg -n "markdown" .

Length of output: 13324


🏁 Script executed:

#!/bin/bash
echo "=== website/views/blog.py (lines 1-100) ==="
sed -n '1,100p' website/views/blog.py
echo "=== website/views/blog.py (lines 70-110) ==="
sed -n '70,110p' website/views/blog.py

Length of output: 3996


Sanitization Confirmed in PostDetailView.get_object

The markdown-rendered HTML is properly sanitized using bleach.clean with a strict whitelist in PostDetailView.get_object (website/views/blog.py), so applying {{ post.content | safe }} in the template does not introduce XSS.

No further changes are required.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
website/views/blog.py (1)

23-26: Enhancement of Markdown processing with additional extensions.

The addition of the three Markdown extensions (fenced_code, tables, nl2br) greatly improves the rendering capabilities for blog posts. This allows for:

  1. Rendering of fenced code blocks delimited by triple backticks (syntax highlighting would additionally require the codehilite extension and Pygments)
  2. Support for Markdown tables
  3. Automatic conversion of newlines to line breaks for better text formatting

These extensions work well with the safe filter added in the template to fix the HTML rendering issue.
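Of the three extensions, nl2br is the one that changes how ordinary prose renders. The toy function below (a hypothetical helper, not Python-Markdown's implementation, and ignoring every other Markdown rule) shows the effect: blank lines still separate paragraphs, while single newlines inside a paragraph become <br /> tags instead of plain newlines that a browser would render as spaces.

```python
def toy_nl2br(text: str) -> str:
    """Toy illustration of the nl2br effect (hypothetical helper; not
    Python-Markdown's code and ignoring all other Markdown syntax)."""
    # Blank lines still separate paragraphs...
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    # ...but each single newline inside a paragraph becomes an explicit <br />.
    return "\n".join("<p>" + "<br />\n".join(p.splitlines()) + "</p>" for p in paragraphs)


print(toy_nl2br("first line\nsecond line\n\nnew paragraph"))
```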

For additional security, consider using an HTML sanitization library like bleach to ensure that even with the safe filter, only a whitelist of allowed HTML tags and attributes can be rendered. This would protect against potential XSS attacks while still allowing intended HTML formatting.

 def get_object(self):
     post = super().get_object()
     post.content = markdown.markdown(
         post.content,
         extensions=["markdown.extensions.fenced_code", "markdown.extensions.tables", "markdown.extensions.nl2br"],
     )
+    # Optional: Sanitize HTML content to prevent XSS
+    # import bleach
+    # allowed_tags = ['p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'a', 'abbr', 'acronym', 'b', 'blockquote', 
+    #                 'code', 'em', 'i', 'li', 'ol', 'ul', 'strong', 'span', 'table', 'tbody', 'td', 
+    #                 'th', 'tr', 'thead', 'br', 'pre']
+    # allowed_attrs = {'a': ['href', 'title'], 'abbr': ['title'], 'acronym': ['title']}
+    # post.content = bleach.clean(post.content, tags=allowed_tags, attributes=allowed_attrs)
     return post
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 30a52a6 and 8983f4d.

📒 Files selected for processing (2)
  • website/templates/blog/post_detail.html (1 hunks)
  • website/views/blog.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Run Tests
🔇 Additional comments (1)
website/templates/blog/post_detail.html (1)

25-25:


❓ Verification inconclusive

Addition of the safe filter addresses the HTML rendering issue.

The | safe filter correctly prevents Django from auto-escaping HTML entities, allowing HTML tags in the blog post content to be rendered properly rather than displayed as raw text. This change directly addresses the issue mentioned in the PR description.

However, be aware that the safe filter bypasses Django's auto-escaping, which is its main XSS protection. Since this content appears to come from authenticated blog authors (the edit/delete controls appear only for the author), the risk is reduced, but the view still needs an explicit sanitization step: Python-Markdown passes raw HTML in the source through to its output unchanged, so Markdown rendering alone does not remove harmful HTML.


🏁 Script executed:

#!/bin/bash
# Check if there are input validation or content sanitization measures elsewhere in the codebase
echo "Checking for content validation in Post model:"
rg -A 3 "def clean|def save" --glob "website/models.py" || echo "No explicit content validation found in models"

echo -e "\nChecking for any HTML sanitization libraries:"
rg -l "bleach|sanitize|clean_html" --type py || echo "No common HTML sanitization libraries found"

echo -e "\nChecking model definition for Post:"
rg -A 10 "class Post" --glob "website/models.py" || echo "Post model definition not found"

Length of output: 2803


Action: Validate HTML Sanitization During Markdown Conversion

The added | safe filter indeed fixes the HTML rendering issue by allowing blog post content to display its HTML properly. However, using the safe filter bypasses Django’s default escaping and can open up XSS risks. Our investigation shows that while there isn’t explicit content sanitization at the model level (in website/models.py), there is evidence of HTML sanitization measures in website/views/core.py. Please ensure that the markdown conversion step (which processes post.content before it reaches the template) effectively sanitizes the HTML to mitigate any security vulnerabilities.

  • Verify that the markdown processing in the view (or a related module) sanitizes potentially dangerous HTML (e.g., via libraries like bleach).
  • Confirm that only trusted users can submit content when bypassing Django’s auto-escaping, so that the application remains secure.

coderabbitai bot previously approved these changes Apr 9, 2025
@DonnieBLT (Collaborator)

Will this prevent XSS attacks?

@rinkitadhana (Contributor, Author)

> Will this prevent XSS attacks?

Oh, I just checked: it does not prevent XSS attacks. IDK how I missed it. Sorry, fixing it right now.

@DonnieBLT DonnieBLT enabled auto-merge May 2, 2025 19:51
@DonnieBLT DonnieBLT disabled auto-merge May 2, 2025 19:51
coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (5)
website/views/blog.py (5)

31-52: Consider moving the allowed tags list to settings.

The comprehensive list of allowed HTML tags is appropriate for blog content, but consider moving this to a constants file or Django settings for better maintainability and reusability.

- allowed_tags = [
-     "p",
-     "br",
-     "b",
-     "strong",
-     "i",
-     "em",
-     "h1",
-     "h2",
-     "h3",
-     "h4",
-     "h5",
-     "h6",
-     "blockquote",
-     "ul",
-     "ol",
-     "li",
-     "a",
-     "img",
-     "code",
-     "pre",
- ]
+ from django.conf import settings
+ allowed_tags = settings.ALLOWED_HTML_TAGS

Then in settings.py:

ALLOWED_HTML_TAGS = [
    "p", "br", "b", "strong", "i", "em", 
    "h1", "h2", "h3", "h4", "h5", "h6", 
    "blockquote", "ul", "ol", "li", 
    "a", "img", "code", "pre",
]

53-59: Consider moving allowed attributes to settings.

Similar to the allowed tags, consider moving the attributes dictionary to settings for better maintainability and reusability.

- allowed_attributes = {
-     "a": ["href", "title", "rel", "target"],
-     "img": ["src", "alt", "title", "width", "height"],
-     "blockquote": ["cite"],
-     "code": ["class"],
-     "pre": ["class"],
- }
+ from django.conf import settings
+ allowed_attributes = settings.ALLOWED_HTML_ATTRIBUTES

23-65: Consider separating concerns in the get_object method.

The method is currently handling multiple responsibilities: fetching the object, rendering markdown, and sanitizing HTML. Consider separating these concerns for better maintainability.

def get_object(self):
    post = super().get_object()
-
-    html_content = markdown.markdown(
-        post.content,
-        extensions=["markdown.extensions.fenced_code", "markdown.extensions.tables", "markdown.extensions.nl2br"],
-    )
-
-    allowed_tags = [
-        "p",
-        "br",
-        "b",
-        "strong",
-        "i",
-        "em",
-        "h1",
-        "h2",
-        "h3",
-        "h4",
-        "h5",
-        "h6",
-        "blockquote",
-        "ul",
-        "ol",
-        "li",
-        "a",
-        "img",
-        "code",
-        "pre",
-    ]
-    allowed_attributes = {
-        "a": ["href", "title", "rel", "target"],
-        "img": ["src", "alt", "title", "width", "height"],
-        "blockquote": ["cite"],
-        "code": ["class"],
-        "pre": ["class"],
-    }
-
-    clean_html = bleach.clean(html_content, tags=allowed_tags, attributes=allowed_attributes, strip=True)
-
-    post.content = mark_safe(clean_html)
+    post.content = self._render_and_sanitize_content(post.content)

    return post

+ def _render_and_sanitize_content(self, content):
+     """Render markdown content and sanitize the resulting HTML."""
+     from django.conf import settings
+
+     html_content = markdown.markdown(
+         content,
+         extensions=["markdown.extensions.fenced_code", "markdown.extensions.tables", "markdown.extensions.nl2br"],
+     )
+
+     allowed_tags = getattr(settings, 'ALLOWED_HTML_TAGS', [
+         "p", "br", "b", "strong", "i", "em", 
+         "h1", "h2", "h3", "h4", "h5", "h6", 
+         "blockquote", "ul", "ol", "li", 
+         "a", "img", "code", "pre",
+     ])
+     
+     allowed_attributes = getattr(settings, 'ALLOWED_HTML_ATTRIBUTES', {
+         "a": ["href", "title", "rel", "target"],
+         "img": ["src", "alt", "title", "width", "height"],
+         "blockquote": ["cite"],
+         "code": ["class"],
+         "pre": ["class"],
+     })
+
+     clean_html = bleach.clean(html_content, tags=allowed_tags, attributes=allowed_attributes, strip=True)
+
+     return mark_safe(clean_html)

61-61: Consider adding link protocols whitelist for enhanced security.

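To make the allowed link protocols explicit, consider passing protocols to the bleach.clean() call. bleach's default protocol whitelist is already http, https, and mailto, so this mainly documents the intent in the view.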

- clean_html = bleach.clean(html_content, tags=allowed_tags, attributes=allowed_attributes, strip=True)
+ allowed_protocols = ['http', 'https', 'mailto']
+ clean_html = bleach.clean(html_content, tags=allowed_tags, attributes=allowed_attributes, protocols=allowed_protocols, strip=True)
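Conceptually, a protocol whitelist inspects the scheme of each href or src value, rejects anything not listed, and lets relative URLs through. A standard-library sketch of that check follows; is_allowed_url is a hypothetical helper, not bleach's implementation, and its whitespace handling is deliberately stricter than a browser's.

```python
from urllib.parse import urlsplit

ALLOWED_PROTOCOLS = frozenset({"http", "https", "mailto"})


def is_allowed_url(url: str, protocols: frozenset = ALLOWED_PROTOCOLS) -> bool:
    """Toy href/src check (hypothetical helper, not bleach's implementation)."""
    # Browsers drop tabs/newlines inside URLs and trim leading control chars,
    # so "java\tscript:" acts like "javascript:". Removing every char <= U+0020
    # is a stricter superset of that normalisation.
    compact = "".join(ch for ch in url if ord(ch) > 0x20)
    try:
        scheme = urlsplit(compact).scheme  # urlsplit lower-cases the scheme
    except ValueError:
        return False  # e.g. a malformed IPv6 host; reject rather than guess
    # An empty scheme means a relative URL such as "/blog/1/" or "#comments".
    return scheme == "" or scheme in protocols


for candidate in ["https://example.com", "/blog/1/", "JavaScript:alert(1)"]:
    print(candidate, is_allowed_url(candidate))
```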

26-29: Consider caching the rendered content for performance.

For long blog posts, the markdown rendering and sanitization process could impact performance. Consider implementing caching for rendered content.

+ from django.core.cache import cache
+ 
def get_object(self):
    post = super().get_object()

+    # Use cache key based on post ID and last modification time
+    cache_key = f"post_content_{post.id}_{post.modified_at.timestamp()}"
+    cached_content = cache.get(cache_key)
+    
+    if cached_content:
+        post.content = cached_content
+        return post
+
    html_content = markdown.markdown(
        post.content,
        extensions=["markdown.extensions.fenced_code", "markdown.extensions.tables", "markdown.extensions.nl2br"],
    )

    # [existing code for sanitization]

    post.content = mark_safe(clean_html)
+    # Cache the content for future requests (set a reasonable timeout)
+    cache.set(cache_key, post.content, timeout=86400)  # 24 hours

    return post
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8983f4d and d71ecc2.

📒 Files selected for processing (1)
  • website/views/blog.py (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: docker-test
🔇 Additional comments (5)
website/views/blog.py (5)

1-5: Appropriate imports for HTML sanitization and safe strings.

The addition of bleach for HTML sanitization and mark_safe for marking content as safe for Django templates aligns with the PR objective to properly render HTML content in blog posts.


26-29: Good use of markdown extensions.

Incorporating these extensions enhances the Markdown rendering capabilities:

  • fenced_code properly formats code blocks
  • tables enables proper table rendering
  • nl2br converts newlines to line breaks

This addresses the issue of HTML not rendering correctly in blog posts.


61-61: Good implementation of HTML sanitization.

Using bleach.clean() with whitelisted tags and attributes is an effective defense against XSS attacks, addressing the security concern raised in the PR comments.


63-63: Ensure safe HTML output with mark_safe.

Correctly marking sanitized content as safe for Django templates prevents double-escaping while maintaining security.


53-59:


✅ Verification successful

Consider additional attribute restrictions for enhanced security.

The current attribute whitelist is a good start, but for enhanced security, consider restricting more attack vectors:


🏁 Script executed:

#!/bin/bash
# Check if there are any unhandled security attributes in the codebase

# Check for potential event handler attributes in HTML
echo "Checking for potential event handler attributes in templates:"
rg -i 'on\w+=' --glob "*.html" --stats

# Check for JavaScript protocol in href attributes
echo "Checking for JavaScript protocol in href attributes:"
rg -i 'href="javascript:' --glob "*.html" --stats

# Check for instances of directly using user input in HTML
echo "Checking for potential XSS vectors with user input:"
rg '{{ .+\|safe }}' --glob "*.html" --stats

Length of output: 52974


No additional attribute restrictions needed – bleach whitelist already blocks event handlers and JS protocols.

User‐supplied HTML is sanitized with bleach.clean() using the explicit allowed_attributes whitelist, which does not include any on* event handlers or javascript: protocols. All inline handlers found are in static templates and aren’t user‐controlled. Closing this concern.

coderabbitai bot previously approved these changes May 2, 2025
@DonnieBLT DonnieBLT added this pull request to the merge queue May 2, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks May 2, 2025
@DonnieBLT DonnieBLT added this pull request to the merge queue May 10, 2025
Merged via the queue into OWASP-BLT:main with commit 5dccb42 May 10, 2025
12 checks passed
@rinkitadhana rinkitadhana deleted the fix/blog branch May 11, 2025 00:15
Development

Successfully merging this pull request may close these issues.

Raw HTML Tags Rendered as Text in Blog Post

2 participants