cython magic: extract style, body, not full HTML document #5760

minrk · 2023-10-10T13:31:48Z

Displaying the full HTML document inline can cause rendering problems and can produce invalid HTML documents after export via nbconvert, mystnb, etc. In particular, my jupyter-book-built pages with %%cython --annotate output end up applying .cython to the document body because of the duplicated <body> tags, which causes the whole page to render in Courier.

This is easier and cleaner with a full HTML parser, but I imagine you don't want that as a dependency just for this. My in-production workaround is here:

from functools import partial

from bs4 import BeautifulSoup


def _clean_annotated_html(original_clean, html):
    """Substitute inline Cython annotated output

    extracts body and style contents to avoid conflicts with page-level body/head elements.
    """
    html = original_clean(html)
    page = BeautifulSoup(html, "html.parser")
    chunks = []
    # could do this only once, but then clearing output and re-running would lose style.
    for style in page.find_all("style"):
        chunks.append(str(style))
    # add css to fix cython line padding in jupyter-book output
    chunks.append('<style type="text/css">.cython.line { padding: 0px; }</style>')
    chunks.append("".join(str(element) for element in page.find("body").contents))
    return "\n".join(chunks)


def load_ipython_extension(ip):
    cython_magics = ip.magics_manager.magics["cell"]["cython"].__self__
    original_clean = cython_magics.clean_annotated_html
    cython_magics.clean_annotated_html = partial(_clean_annotated_html, original_clean)

In the absence of proper HTML parsing, though, the regular expression splitting in this PR seems to work well enough, given how basic Cython-generated HTML is and that we only want the whole body and any style tag(s).
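For illustration, the regex approach can be sketched roughly as follows. This is a minimal sketch and not the code in this PR's diff: the function name `extract_style_and_body` and the exact patterns are assumptions, and it relies on the annotated output being a simple document with at most one `<body>` element.

```python
import re

# Hypothetical helper for illustration only; not the function added by this PR.
_STYLE_RE = re.compile(r"<style\b[^>]*>.*?</style>", re.DOTALL | re.IGNORECASE)
_BODY_RE = re.compile(r"<body\b[^>]*>(.*)</body>", re.DOTALL | re.IGNORECASE)


def extract_style_and_body(html):
    """Return the <style> elements plus the contents of <body>.

    If no <body> element is found, the input is returned unchanged.
    """
    body = _BODY_RE.search(html)
    if body is None:
        # Nothing to split: leave the document as it was.
        return html
    # Keep every <style>...</style> element whole, in document order.
    chunks = _STYLE_RE.findall(html)
    # Keep only what sits between the <body> tags, dropping the tags themselves.
    chunks.append(body.group(1).strip())
    return "\n".join(chunks)
```

As with the BeautifulSoup workaround above, the `<body>` tag itself (and any class on it) is dropped, so page-level styling is no longer affected by the inlined output.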

Another (probably better) way would be to request this subset of HTML from the annotation process, but I couldn't see a simple way to do that.

da-woods · 2023-11-19T11:05:53Z

This looks reasonable to me. I'm going to target this to the next major release, rather than applying it to a point release.

Thanks.

cython magic: extract style, body, not full HTML document

f7990b5

displaying the full HTML document inline can result in problems rendering and produces invalid HTML documents after export

da-woods added this to the 3.1 milestone Nov 19, 2023

da-woods added the Tools label Nov 19, 2023

da-woods merged commit d5e0f3b into cython:master Nov 19, 2023
