Fix: HTML rendering in blog posts. #4126
Sequence Diagram(s)
sequenceDiagram
participant User
participant PostDetailView
participant Markdown
participant Bleach
participant DjangoTemplate
User->>PostDetailView: Request blog post
PostDetailView->>Markdown: Render post.content with extensions
Markdown-->>PostDetailView: Return rendered HTML
PostDetailView->>Bleach: Sanitize HTML with allowed tags/attributes/protocols
Bleach-->>PostDetailView: Return sanitized HTML
PostDetailView->>DjangoTemplate: Pass sanitized HTML marked as safe
DjangoTemplate-->>User: Display formatted blog post
Actionable comments posted: 0
🧹 Nitpick comments (1)
website/views/blog.py (1)
23-26: Enhancement of Markdown processing with additional extensions.
The addition of the three Markdown extensions (fenced_code, tables, nl2br) greatly improves the rendering capabilities for blog posts. This allows for:
- Proper rendering of fenced code blocks written with triple backticks (syntax highlighting itself would need an additional extension such as codehilite)
- Support for Markdown tables
- Automatic conversion of newlines to line breaks for better text formatting
These extensions work well with the safe filter added in the template to fix the HTML rendering issue.
For additional security, consider using an HTML sanitization library like bleach to ensure that, even with the safe filter, only a whitelist of allowed HTML tags and attributes can be rendered. This would protect against potential XSS attacks while still allowing intended HTML formatting.
     def get_object(self):
         post = super().get_object()
         post.content = markdown.markdown(
             post.content,
             extensions=["markdown.extensions.fenced_code", "markdown.extensions.tables", "markdown.extensions.nl2br"],
         )
+        # Optional: Sanitize HTML content to prevent XSS
+        # import bleach
+        # allowed_tags = ['p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'a', 'abbr', 'acronym', 'b', 'blockquote',
+        #                 'code', 'em', 'i', 'li', 'ol', 'ul', 'strong', 'span', 'table', 'tbody', 'td',
+        #                 'th', 'tr', 'thead', 'br', 'pre']
+        # allowed_attrs = {'a': ['href', 'title'], 'abbr': ['title'], 'acronym': ['title']}
+        # post.content = bleach.clean(post.content, tags=allowed_tags, attributes=allowed_attrs)
         return post
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- website/templates/blog/post_detail.html (1 hunks)
- website/views/blog.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Run Tests
🔇 Additional comments (1)
website/templates/blog/post_detail.html (1)
25-25: ❓ Verification inconclusive
Addition of the safe filter addresses the HTML rendering issue.
The | safe filter stops Django from auto-escaping HTML special characters, allowing HTML tags in the blog post content to be rendered properly rather than displayed as raw text. This change directly addresses the issue mentioned in the PR description.
However, be aware that using the safe filter bypasses Django's XSS protection. Since this content appears to come from authenticated blog authors (as seen in the edit/delete controls that only appear for the author), the risk is reduced, but it's still important to ensure that the markdown processing in the view properly sanitizes any potentially harmful HTML.
🏁 Script executed:
#!/bin/bash
# Check if there are input validation or content sanitization measures elsewhere in the codebase
echo "Checking for content validation in Post model:"
rg -A 3 "def clean|def save" --glob "website/models.py" || echo "No explicit content validation found in models"
echo -e "\nChecking for any HTML sanitization libraries:"
rg -l "bleach|sanitize|clean_html" --type py || echo "No common HTML sanitization libraries found"
echo -e "\nChecking model definition for Post:"
rg -A 10 "class Post" --glob "website/models.py" || echo "Post model definition not found"
Length of output: 2803
Action: Validate HTML Sanitization During Markdown Conversion
The added | safe filter indeed fixes the HTML rendering issue by allowing blog post content to display its HTML properly. However, using the safe filter bypasses Django’s default escaping and can open up XSS risks. Our investigation shows that while there isn’t explicit content sanitization at the model level (in website/models.py), there is evidence of HTML sanitization measures in website/views/core.py. Please ensure that the markdown conversion step (which processes post.content before it reaches the template) effectively sanitizes the HTML to mitigate any security vulnerabilities.
- Verify that the markdown processing in the view (or a related module) sanitizes potentially dangerous HTML (e.g., via libraries like bleach).
- Confirm that only trusted users can submit content when bypassing Django’s auto-escaping, so that the application remains secure.
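To make the risk concrete, here is a minimal sketch of what auto-escaping does and what marking content as safe skips. It uses Python's standard-library html.escape as a stand-in for Django's template auto-escaping (an assumption for illustration; Django's own escaping may differ in small details), and the render helper is hypothetical.

```python
import html


def render(value: str, safe: bool = False) -> str:
    """Hypothetical stand-in for a template variable: escape unless marked safe."""
    return value if safe else html.escape(value)


payload = '<script>alert("xss")</script>'

# Auto-escaped: the browser shows the markup as text and does not run it.
escaped = render(payload)

# Marked safe: the markup is emitted verbatim, so it must be sanitized beforehand.
unescaped = render(payload, safe=True)

print(escaped)
print(unescaped)
```

The second call is the situation the safe filter creates, which is why the review asks for sanitization before the content reaches the template.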
|
Will this prevent XSS attacks?
|
Oh, I just checked, it does not prevent XSS attacks. IDK how I missed it. Sorry, fixing it right now.
|
Actionable comments posted: 0
🧹 Nitpick comments (5)
website/views/blog.py (5)
31-52: Consider moving the allowed tags list to settings.
The comprehensive list of allowed HTML tags is appropriate for blog content, but consider moving this to a constants file or Django settings for better maintainability and reusability.
-        allowed_tags = [
-            "p",
-            "br",
-            "b",
-            "strong",
-            "i",
-            "em",
-            "h1",
-            "h2",
-            "h3",
-            "h4",
-            "h5",
-            "h6",
-            "blockquote",
-            "ul",
-            "ol",
-            "li",
-            "a",
-            "img",
-            "code",
-            "pre",
-        ]
+        from django.conf import settings
+        allowed_tags = settings.ALLOWED_HTML_TAGS
Then in settings.py:
ALLOWED_HTML_TAGS = [
    "p", "br", "b", "strong", "i", "em",
    "h1", "h2", "h3", "h4", "h5", "h6",
    "blockquote", "ul", "ol", "li",
    "a", "img", "code", "pre",
]
53-59: Consider moving allowed attributes to settings.
Similar to the allowed tags, consider moving the attributes dictionary to settings for better maintainability and reusability.
-        allowed_attributes = {
-            "a": ["href", "title", "rel", "target"],
-            "img": ["src", "alt", "title", "width", "height"],
-            "blockquote": ["cite"],
-            "code": ["class"],
-            "pre": ["class"],
-        }
+        from django.conf import settings
+        allowed_attributes = settings.ALLOWED_HTML_ATTRIBUTES
23-65: Consider separating concerns in the get_object method.
The method is currently handling multiple responsibilities: fetching the object, rendering markdown, and sanitizing HTML. Consider separating these concerns for better maintainability.
     def get_object(self):
         post = super().get_object()
-
-        html_content = markdown.markdown(
-            post.content,
-            extensions=["markdown.extensions.fenced_code", "markdown.extensions.tables", "markdown.extensions.nl2br"],
-        )
-
-        allowed_tags = [
-            "p",
-            "br",
-            "b",
-            "strong",
-            "i",
-            "em",
-            "h1",
-            "h2",
-            "h3",
-            "h4",
-            "h5",
-            "h6",
-            "blockquote",
-            "ul",
-            "ol",
-            "li",
-            "a",
-            "img",
-            "code",
-            "pre",
-        ]
-        allowed_attributes = {
-            "a": ["href", "title", "rel", "target"],
-            "img": ["src", "alt", "title", "width", "height"],
-            "blockquote": ["cite"],
-            "code": ["class"],
-            "pre": ["class"],
-        }
-
-        clean_html = bleach.clean(html_content, tags=allowed_tags, attributes=allowed_attributes, strip=True)
-
-        post.content = mark_safe(clean_html)
+        post.content = self._render_and_sanitize_content(post.content)
         return post
+    def _render_and_sanitize_content(self, content):
+        """Render markdown content and sanitize the resulting HTML."""
+        from django.conf import settings
+
+        html_content = markdown.markdown(
+            content,
+            extensions=["markdown.extensions.fenced_code", "markdown.extensions.tables", "markdown.extensions.nl2br"],
+        )
+
+        allowed_tags = getattr(settings, 'ALLOWED_HTML_TAGS', [
+            "p", "br", "b", "strong", "i", "em",
+            "h1", "h2", "h3", "h4", "h5", "h6",
+            "blockquote", "ul", "ol", "li",
+            "a", "img", "code", "pre",
+        ])
+
+        allowed_attributes = getattr(settings, 'ALLOWED_HTML_ATTRIBUTES', {
+            "a": ["href", "title", "rel", "target"],
+            "img": ["src", "alt", "title", "width", "height"],
+            "blockquote": ["cite"],
+            "code": ["class"],
+            "pre": ["class"],
+        })
+
+        clean_html = bleach.clean(html_content, tags=allowed_tags, attributes=allowed_attributes, strip=True)
+
+        return mark_safe(clean_html)
61-61: Consider adding link protocols whitelist for enhanced security.
To further strengthen XSS protection, consider adding allowed protocols to the bleach.clean() call.
-        clean_html = bleach.clean(html_content, tags=allowed_tags, attributes=allowed_attributes, strip=True)
+        allowed_protocols = ['http', 'https', 'mailto']
+        clean_html = bleach.clean(html_content, tags=allowed_tags, attributes=allowed_attributes, protocols=allowed_protocols, strip=True)
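As an illustration of what a protocol allowlist checks, here is a stdlib-only sketch. It is not how bleach implements its protocols argument; the is_allowed_url helper and the choice to accept scheme-less (relative) URLs are assumptions made for this example, and it should not be used as a sanitizer on its own.

```python
from urllib.parse import urlsplit

# Same schemes as the allowed_protocols list suggested in the review.
ALLOWED_PROTOCOLS = {"http", "https", "mailto"}


def is_allowed_url(url: str) -> bool:
    """Hypothetical helper: accept relative URLs and allowlisted schemes only."""
    scheme = urlsplit(url.strip()).scheme.lower()
    # An empty scheme means a relative URL such as "/blog/42/".
    return scheme == "" or scheme in ALLOWED_PROTOCOLS


for candidate in (
    "https://example.com/post",
    "mailto:dev@example.com",
    "/blog/42/",
    "javascript:alert(1)",
    "data:text/html,hi",
):
    print(candidate, is_allowed_url(candidate))
```

The point of the allowlist is that anything not explicitly named, including javascript: and data: URLs, is rejected by default.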
26-29: Consider caching the rendered content for performance.
For long blog posts, the markdown rendering and sanitization process could impact performance. Consider implementing caching for rendered content.
+    from django.core.cache import cache
+
     def get_object(self):
         post = super().get_object()
+        # Use cache key based on post ID and last modification time
+        cache_key = f"post_content_{post.id}_{post.modified_at.timestamp()}"
+        cached_content = cache.get(cache_key)
+
+        if cached_content:
+            post.content = cached_content
+            return post
+
         html_content = markdown.markdown(
             post.content,
             extensions=["markdown.extensions.fenced_code", "markdown.extensions.tables", "markdown.extensions.nl2br"],
         )
         # [existing code for sanitization]
         post.content = mark_safe(clean_html)
+        # Cache the content for future requests (set a reasonable timeout)
+        cache.set(cache_key, post.content, timeout=86400)  # 24 hours
         return post
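The caching idea can be sketched without Django as follows. A plain dict stands in for django.core.cache, render_fn stands in for the Markdown-plus-bleach step, and the Post dataclass with a modified_at field is hypothetical (nothing in this thread confirms the model has that field). The sketch checks `is not None` rather than truthiness, so an empty rendered body still counts as a cache hit.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict


@dataclass
class Post:
    """Hypothetical minimal post; the real model's fields may differ."""
    id: int
    content: str
    modified_at: datetime


# A dict standing in for a real cache backend (no timeout or eviction here).
_cache: Dict[str, str] = {}


def cached_render(post: Post, render_fn: Callable[[str], str]) -> str:
    """Return rendered content, re-rendering only when the post has changed."""
    key = f"post_content_{post.id}_{post.modified_at.timestamp()}"
    cached = _cache.get(key)
    if cached is not None:
        return cached
    rendered = render_fn(post.content)
    _cache[key] = rendered
    return rendered
```

Because the key includes the modification time, editing a post produces a new key; in a real cache backend the stale entry would simply expire with its timeout.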
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
website/views/blog.py(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: docker-test
🔇 Additional comments (5)
website/views/blog.py (5)
1-5: Appropriate imports for HTML sanitization and safe strings.
The addition of bleach for HTML sanitization and mark_safe for marking content as safe for Django templates aligns with the PR objective to properly render HTML content in blog posts.
26-29: Good use of markdown extensions.
Incorporating these extensions enhances the Markdown rendering capabilities:
- fenced_code properly formats code blocks
- tables enables proper table rendering
- nl2br converts newlines to line breaks
This addresses the issue of HTML not rendering correctly in blog posts.
61-61: Good implementation of HTML sanitization.
Using bleach.clean() with whitelisted tags and attributes is an effective defense against XSS attacks, addressing the security concern raised in the PR comments.
63-63: Ensure safe HTML output with mark_safe.
Correctly marking sanitized content as safe for Django templates prevents double-escaping while maintaining security.
53-59: ✅ Verification successful
Consider additional attribute restrictions for enhanced security.
The current attribute whitelist is a good start, but for enhanced security, consider restricting more attack vectors:
🏁 Script executed:
#!/bin/bash
# Check if there are any unhandled security attributes in the codebase
# Check for potential event handler attributes in HTML
echo "Checking for potential event handler attributes in templates:"
rg -i 'on\w+=' --glob "*.html" --stats
# Check for JavaScript protocol in href attributes
echo "Checking for JavaScript protocol in href attributes:"
rg -i 'href="javascript:' --glob "*.html" --stats
# Check for instances of directly using user input in HTML
echo "Checking for potential XSS vectors with user input:"
rg '{{ .+\|safe }}' --glob "*.html" --stats
Length of output: 52974
No additional attribute restrictions needed – bleach whitelist already blocks event handlers and JS protocols.
User‐supplied HTML is sanitized with bleach.clean() using the explicit allowed_attributes whitelist, which does not include any on* event handlers, and javascript: URLs are rejected by bleach's protocol allowlist (http, https and mailto by default). All inline handlers found are in static templates and aren’t user‐controlled. Closing this concern.
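A quick way to check this conclusion locally is a small sketch like the one below. It assumes the bleach package is installed; the tag and attribute sets are a reduced subset chosen for this example rather than the PR's full lists, and the dirty input string is made up.

```python
import bleach

# Reduced allowlists for illustration; the PR's view uses longer lists.
allowed_tags = {"p", "a"}
allowed_attributes = {"a": ["href", "title"]}

# Made-up hostile input: an inline handler, a javascript: link, a script tag.
dirty = (
    '<p onclick="alert(1)">hello</p>'
    '<a href="javascript:alert(2)" title="t">link</a>'
    "<script>alert(3)</script>"
)

cleaned = bleach.clean(
    dirty,
    tags=allowed_tags,
    attributes=allowed_attributes,
    protocols={"http", "https", "mailto"},
    strip=True,
)
print(cleaned)
```

With strip=True, disallowed tags are dropped rather than escaped, so the text that was inside the script element should remain as inert plain text while the onclick attribute and the javascript: href are removed.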
fixes #4122
Changes this PR Introduced:
This PR addresses an issue where raw HTML tags were displaying in blog posts instead of being properly rendered. Two main changes were made:
Before:
After:
Summary by CodeRabbit
Bug Fixes
New Features