Nested clozes and increased cloze meta data #2141

triaeiou · 2022-10-23T21:36:55Z

As per https://forums.ankiweb.net/t/ability-to-generate-nested-cloze/9743/6: implementation to allow nested clozes. Also included some additional cloze meta data as per #1968.

kleinerpirat · 2022-10-23T23:18:38Z

As Damien reminded me in a recent PR, you can run the tests locally with the command

bazel test ...

from the root folder of the project. Further info: https://github.com/ankitects/anki/blob/main/docs/development.md#running-tests

triaeiou · 2022-10-23T23:32:40Z

> As Damien reminded me in a recent PR, you can run the tests locally with the command
>
> bazel test ...
>
> from the root folder of the project. Further info: https://github.com/ankitects/anki/blob/main/docs/development.md#running-tests

Thanks, yeah, I had a look at that, but it fails to "determine workspace status", so I figured I wouldn't go down the rabbit hole of trying to figure out what the problem was (I am building on a Windows machine and guess the scripts are really made for a *nix/shell environment).

Cheers

dae

My initial impression is that your Rust code looks pretty good; some comments below.

dae · 2022-10-24T04:06:34Z

rslib/src/cloze.rs

+}
+
+/// Minimal encoding of string for storage in attribute (", &, \n, <, >)
+pub fn encode_attribute(text: &str) -> Cow<str> {


I think we should be using htmlescape::encode_attribute() here. See the review on #1968
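For comparison, here is a std-only sketch of the "minimal" escaping that the diff's doc comment describes (`"`, `&`, `\n`, `<`, `>`), written to return `Cow<str>` so that text needing no escaping is passed through without allocating. The name `encode_attribute_minimal` is hypothetical, and encoding newlines as `&#10;` is an assumption. `htmlescape::encode_attribute()` returns an owned `String` and escapes a broader, deliberately conservative set of characters, so its output will not match this sketch byte for byte.

```rust
use std::borrow::Cow;

/// Hypothetical sketch: escape only `"`, `&`, `\n`, `<` and `>`.
/// Returns the input unchanged (borrowed) when nothing needs escaping,
/// so the common case does not allocate.
fn encode_attribute_minimal(text: &str) -> Cow<str> {
    if !text.contains(|c: char| matches!(c, '"' | '&' | '\n' | '<' | '>')) {
        return Cow::Borrowed(text);
    }
    let mut out = String::with_capacity(text.len() + 8);
    for c in text.chars() {
        match c {
            '"' => out.push_str("&quot;"),
            '&' => out.push_str("&amp;"),
            '\n' => out.push_str("&#10;"), // assumed numeric entity for newline
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            other => out.push(other),
        }
    }
    Cow::Owned(out)
}
```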

dae · 2022-10-24T04:13:07Z

rslib/src/cloze.rs

+    use self::State::*;
+    match (state, c) {
+        (Root, '{') => Open,
+        (Open, '{') => Open2,


Processing a character at a time feels a bit complicated. I wonder whether this could be simplified by tokenizing the string with nom instead? We already use it for parsing card templates.

https://docs.rs/nom/latest/nom/

Possibly. I don't feel the state machine is particularly complicated (although it is probably a bit more verbose than using a library). I am unfamiliar with Rust and its libraries, so it was simpler to write a state machine. Are you set on using nom?

> I am unfamiliar with Rust

You've done a good job - I can't really tell.

I'd appreciate it if you'd give nom a try - I've done the bulk of the parsing work for you, so it should mostly be a case of dealing with the tokens now.

use std::error::Error;

use nom::branch::alt;
use nom::bytes::complete::{take_until, take_while};
use nom::multi::many0;
use nom::{bytes::complete::tag, combinator::map, IResult};

#[derive(Debug)]
enum Token<'a> {
    OpenCloze(u16),
    Text(&'a str),
    Hint(&'a str),
    CloseCloze,
}

fn open_cloze(text: &str) -> IResult<&str, Token> {
    // opening brackets and 'c'
    let (text, _opening_brackets_and_c) = tag("{{c")(text)?;
    // following number
    let (text, digits) = take_while(|c: char| c.is_ascii_digit())(text)?;
    let digits: u16 = match digits.parse() {
        Ok(digits) => digits,
        Err(_) => {
            // not a valid number; fail to recognize
            return Err(nom::Err::Error(nom::error::make_error(
                text,
                nom::error::ErrorKind::Digit,
            )));
        }
    };
    // ::
    let (text, _colons) = tag("::")(text)?;
    Ok((text, Token::OpenCloze(digits)))
}

fn close_cloze(text: &str) -> IResult<&str, Token> {
    map(tag("}}"), |_| Token::CloseCloze)(text)
}

fn hint(text: &str) -> IResult<&str, Token> {
    let (text, _separating_colons) = tag("::")(text)?;
    let (text, hint) = take_until("}}")(text)?;
    Ok((text, Token::Hint(hint)))
}

/// Match a run of text until an open/close or hint is encountered.
/// This will stop on a hint marker even outside a cloze, so the processing
/// should handle hint tokens outside of an active cloze.
fn normal_text(text: &str) -> IResult<&str, Token> {
    if text.is_empty() {
        return Err(nom::Err::Error(nom::error::make_error(
            text,
            nom::error::ErrorKind::Eof,
        )));
    }
    let mut index = 0;
    let mut other_token = alt((open_cloze, close_cloze, hint));
    while other_token(&text[index..]).is_err() && index + 1 < text.len() {
        index += 1;
    }
    Ok((&text[index..], Token::Text(&text[0..index])))
}

// todo: error handling
fn tokenize(text: &str) -> Vec<Token> {
    let (remaining, tokens) =
        many0(alt((open_cloze, hint, close_cloze, normal_text)))(text).unwrap();
    assert!(remaining.is_empty());
    tokens
}

fn main() -> Result<(), Box<dyn Error>> {
    dbg!(tokenize("foo {{c1::bar {{c2::baz}}}}"));
    dbg!(tokenize("foo {{c1::bar {{c2::baz}}::qux}}"));
    Ok(())
}

Example output:

[src/main.rs:74] tokenize("foo {{c1::bar {{c2::baz}}::qux}}") = [
    Text(
        "foo ",
    ),
    OpenCloze(
        1,
    ),
    Text(
        "bar ",
    ),
    OpenCloze(
        2,
    ),
    Text(
        "baz",
    ),
    CloseCloze,
    Hint(
        "qux",
    ),
    CloseCloze,
]

(that should give two advantages: it refers to the original strings instead of constructing a bunch of new ones a character at a time, and it allows the parsing code to deal with only 4 token kinds instead of having to ignore a bunch of other ones like HintClose1)

Found a bug; this line should have the + 1 removed from it:

while other_token(&text[index..]).is_err() && index < text.len() {
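For readers without nom set up, the std-only sketch below approximates the tokenizer above: it yields the same four token kinds, borrows every `Text`/`Hint` slice from the input, and tries markers in the same order (open, hint, close). It is an illustration, not the nom code from the comment, and `special_token` is a hypothetical helper. Scanning with `char_indices()` also sidesteps the off-by-one just mentioned and keeps every slice on a UTF-8 character boundary.

```rust
/// Token kinds mirroring the nom sketch; text and hints borrow from the input.
#[derive(Debug, PartialEq)]
enum Token<'a> {
    OpenCloze(u16),
    Text(&'a str),
    Hint(&'a str),
    CloseCloze,
}

/// Hypothetical helper: if `text` starts with a marker, return the token and
/// how many bytes it consumed. Markers are tried in the order open, hint, close.
fn special_token(text: &str) -> Option<(Token<'_>, usize)> {
    // "{{cN::" opens a cloze with ordinal N (N must fit in a u16)
    if let Some(rest) = text.strip_prefix("{{c") {
        let digits = rest.bytes().take_while(u8::is_ascii_digit).count();
        if let Ok(ordinal) = rest[..digits].parse::<u16>() {
            if rest[digits..].starts_with("::") {
                return Some((Token::OpenCloze(ordinal), 3 + digits + 2));
            }
        }
    }
    // "::hint" runs up to, but not including, the next "}}"
    if let Some(rest) = text.strip_prefix("::") {
        if let Some(end) = rest.find("}}") {
            return Some((Token::Hint(&rest[..end]), 2 + end));
        }
    }
    if text.starts_with("}}") {
        return Some((Token::CloseCloze, 2));
    }
    None
}

fn tokenize(mut text: &str) -> Vec<Token<'_>> {
    let mut tokens = Vec::new();
    while !text.is_empty() {
        if let Some((token, len)) = special_token(text) {
            tokens.push(token);
            text = &text[len..];
            continue;
        }
        // Plain text runs until the next character boundary where a marker
        // starts, or to the end of the input.
        let end = text
            .char_indices()
            .skip(1)
            .map(|(i, _)| i)
            .find(|&i| special_token(&text[i..]).is_some())
            .unwrap_or(text.len());
        tokens.push(Token::Text(&text[..end]));
        text = &text[end..];
    }
    tokens
}
```

With this scheme, an input like `foo {{{c1::bar}}` tokenizes to `Text("foo {")` followed by the cloze, because the open marker only matches from the second brace.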

I'm not sure I understand, but as mentioned, I don't see a clean solution for tokenize()/parse_tokens() without some state during tokenizing (such as in my earlier tokenize() example). If you are set on stateless tokenizing, then I suggest going with your solution above and declaring "cloze open" invalid inside hints? I.e. I would look at writing a resolver on top of your tokenize()/parse_tokens()?

I don't understand why tokenize would need changing - couldn't it be solved using something like this?

fn parse_tokens<'a>(tokens: &'a [Token]) -> Vec<ParseOutput<'a>> {
    let mut open_clozes: Vec<ExtractedCloze> = vec![];
    // nested clozes are placed here until the stack is empty, so they are gathered in the correct order
    let mut nested_clozes: Vec<ExtractedCloze> = vec![];
    let mut output = vec![];
    let mut current_cloze_has_hint = false;
    for token in tokens {
        match token {
            Token::OpenCloze(number) => {
                if current_cloze_has_hint {
                    // text_fragments would need to become a Cow for us to encode the original number
                    open_clozes
                        .last_mut()
                        .unwrap()
                        .text_fragments
                        .push("{{cx::")
                } else {
                    open_clozes.push(ExtractedCloze {
                        number: *number,
                        text_fragments: Vec::with_capacity(1), // common case
                    });
                }
            }
            Token::Text(text) => {
                if let Some(cloze) = open_clozes.last_mut() {
                    if text.contains("::") {
                        current_cloze_has_hint = true;
                    }
                    cloze.text_fragments.push(text);
                } else {
                    output.push(ParseOutput::Text(text));
                }
            }
            Token::CloseCloze => {
                let nested = open_clozes.len() > 1;
                if let Some(cloze) = open_clozes.pop() {
                    current_cloze_has_hint = false;
                    if nested {
                        nested_clozes.push(cloze);
                    } else {
                        output.push(ParseOutput::Cloze(cloze));
                        while let Some(cloze) = nested_clozes.pop() {
                            output.push(ParseOutput::Cloze(cloze))
                        }
                    }
                } else {
                    output.push(ParseOutput::Text("}}"))
                }
            }
        }
    }
    output
}

But that said, I'd lean towards just declaring this case as unsupported for now, as I suspect we don't actually need it.

> I don't understand why tokenize would need changing - couldn't it be solved using something like this?

It can definitely be solved by something like your solution; I just meant I have a hard time coming up with a solution that I find "clean" (not several layers of parsing etc.), but that is just my personal taste. If you are OK with declaring {{c1:: unsupported in the hint, I will update the PR with your earlier suggestion (and address the other issues) and add the "rendering" code.

Yes please, let's declare it unsupported until we know we need it.

Ok, I've updated the PR. I wasn't sure whether you wanted another layer of functions that take the result of parse_tokens(), or the rendering logic implemented "inside" it. I went with the latter and moved parse_tokens() inside reveal_cloze_text() and reveal_cloze_text_only(), as that requires fewer steps. I also put "local" functions and types inside the only function that uses them (e.g. the different tokenizing functions) to make it clearer where they are and aren't used.

Sorry for all the back and forth on the "cloze open" inside a hint; it is now supported because it made sense in the way the logic turned out.

As I am no Rust expert, someone who is good with Rust memory management might want to look at the string handling in reveal_cloze_text(); there is possibly some unnecessary string copying going on there (I had to create a String from a &str earlier than I would like). I can see some solutions, but they involve growing the code, so I don't know if a slight performance gain is worth a larger code base.
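One common way to limit that kind of copying in Rust is to return `Cow<str>` and only allocate when fragments actually have to be joined. The sketch below (the helper name `join_fragments` is hypothetical) shows the pattern; it is the same "happy path" idea that clozed_text() uses further down this thread.

```rust
use std::borrow::Cow;

/// Hypothetical helper: concatenate cloze text fragments, borrowing from the
/// original note text when there is at most one fragment (the common case)
/// and allocating a new String only when several must be combined.
fn join_fragments<'a>(fragments: &[&'a str]) -> Cow<'a, str> {
    match fragments {
        [] => Cow::Borrowed(""),
        [only] => Cow::Borrowed(*only),
        many => Cow::Owned(many.concat()),
    }
}
```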

rslib/src/cloze.rs

dae · 2022-10-24T04:18:42Z

rslib/src/cloze.rs

+            )),
+            // Answer - active cloze
+            (false, false, true, _) => last.text.push_str(&format!(
+                r#"<span class="cloze active" data-ordinal="{}">{}</span>"#,


While a separate active class on only the active clozes may make sense in a new implementation, I'm not sure we can do that here - most users will have lines like .cloze { color: blue; } in their card templates, and this change will cause the inactive clozes to be colored as well, which I'm not sure we want. Perhaps an approach like #2140 might better preserve the existing behaviour?

Ok, so how about class="cloze" for active and class="cloze-inactive" for inactive?

Yep, that's less likely to break things
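A minimal sketch of the agreed naming, assuming the wrapper is a `<span>` carrying `data-ordinal` as in the diff above; `wrap_cloze` is a hypothetical helper, not the merged code. Because CSS class selectors match whole class tokens, an existing `.cloze { ... }` rule keeps matching only the active cloze.

```rust
use std::fmt::Write;

/// Hypothetical helper: wrap rendered cloze content in a span whose class is
/// "cloze" for the active cloze and "cloze-inactive" otherwise.
fn wrap_cloze(buf: &mut String, ordinal: u16, active: bool, content: &str) {
    let class = if active { "cloze" } else { "cloze-inactive" };
    // String's fmt::Write impl never returns an error, and these argument
    // types format infallibly, so unwrap() will not panic here.
    write!(buf, r#"<span class="{}" data-ordinal="{}">{}</span>"#, class, ordinal, content).unwrap();
}
```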

RumovZ · 2022-10-24T07:03:59Z

> Thanks, yeah, I had a look at that, but it fails to "determine workspace status", so I figured I wouldn't go down the rabbit hole of trying to figure out what the problem was (I am building on a Windows machine and guess the scripts are really made for a *nix/shell environment).

Sounds like you haven't configured PATH correctly: https://github.com/ankitects/anki/blob/main/docs/windows.md#more
I'm on Windows, too, and have everything working. Saves an immense amount of time in the long run. 🙂

triaeiou · 2022-10-24T07:40:02Z

> Sounds like you haven't configured PATH correctly: https://github.com/ankitects/anki/blob/main/docs/windows.md#more
> I'm on Windows, too, and have everything working. Saves an immense amount of time in the long run. 🙂

Thanks, but I already have bazel and msys on the PATH and still get:

<builtin>: BazelWorkspaceStatusAction stable-status.txt failed: Failed to determine workspace status: Process exited with status 127
/bin/bash: .toolsstatus.sh: No such file or directory

You are of course right that it will save a lot of time in the long run; I'll look into it at some point. (I don't run bazel as admin for various reasons, but I guessed the real issue was allowing bazel to create symlinks, which can be granted to the build user via a GPO.)

triaeiou · 2022-10-24T10:45:36Z

Sorry, I was referring to the comment.

Ah, ok, it is there to emulate the original regex functionality, i.e. foo {{{c1::bar}} should result in foo {<span class="cloze" data-cloze="bar" data-ordinal="1">bar</span>.

dae · 2022-10-24T11:08:40Z

Ah, I see. We should get that for free with the nom parser.

dae · 2022-10-31T05:25:35Z

Ok, firstly, I need to apologize for my previous example. It is indeed difficult to achieve the desired outcomes from a flat list of parsed tokens, as I discovered when I tried to implement routines on top of it. What I should have been doing is creating a tree of parsed tokens instead.

I'm afraid I'm not super-fond of your current approach, as while the low-level tokenizing is now handled separately, the logic that maps the flat tokens into logical structures has crept into each of our routines like reveal_cloze_text(), reveal_cloze_text_only(), contains_cloze() and add_cloze_numbers_in_string(), making them somewhat harder to follow. If they instead were based on a parsed token tree, they can focus on the structure of the data, without having to determine it themselves.

I've created an example that shows how the routines above could be implemented with a token tree. It's not complete: it doesn't properly wrap the text with HTML yet, and I have not tested it thoroughly. And I tested it in a separate Rust project, so it includes some copy+pasted routines from the Anki codebase that aren't relevant. But hopefully it better illustrates the direction I was hoping to go in than my previous example. What do you think?

use std::borrow::Cow;
use std::collections::HashSet;
use std::error::Error;

use lazy_static::lazy_static;
use nom::branch::alt;
use nom::bytes::complete::take_while;
use nom::{bytes::complete::tag, combinator::map, IResult};
use regex::Regex;

#[derive(Debug)]
enum Token<'a> {
    OpenCloze(u16),
    Text(&'a str),
    CloseCloze,
}

// todo: error handling
fn tokenize(mut text: &str) -> impl Iterator<Item = Token> {
    fn open_cloze(text: &str) -> IResult<&str, Token> {
        // opening brackets and 'c'
        let (text, _opening_brackets_and_c) = tag("{{c")(text)?;
        // following number
        let (text, digits) = take_while(|c: char| c.is_ascii_digit())(text)?;
        let digits: u16 = match digits.parse() {
            Ok(digits) => digits,
            Err(_) => {
                // not a valid number; fail to recognize
                return Err(nom::Err::Error(nom::error::make_error(
                    text,
                    nom::error::ErrorKind::Digit,
                )));
            }
        };
        // ::
        let (text, _colons) = tag("::")(text)?;
        Ok((text, Token::OpenCloze(digits)))
    }

    fn close_cloze(text: &str) -> IResult<&str, Token> {
        map(tag("}}"), |_| Token::CloseCloze)(text)
    }

    /// Match a run of text until an open/close marker is encountered.
    fn normal_text(text: &str) -> IResult<&str, Token> {
        if text.is_empty() {
            return Err(nom::Err::Error(nom::error::make_error(
                text,
                nom::error::ErrorKind::Eof,
            )));
        }
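        let mut other_token = alt((open_cloze, close_cloze));
        // stop at the first character boundary where an open/close marker
        // begins; stepping by char (not byte) keeps the slices valid for
        // non-ASCII text, which would otherwise panic on a multi-byte char
        let index = text
            .char_indices()
            .map(|(idx, _)| idx)
            .find(|&idx| other_token(&text[idx..]).is_ok())
            .unwrap_or(text.len());
        Ok((&text[index..], Token::Text(&text[0..index])))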
    }

    std::iter::from_fn(move || {
        if text.is_empty() {
            None
        } else {
            let (remaining_text, token) =
                alt((open_cloze, close_cloze, normal_text))(text).unwrap();
            text = remaining_text;
            Some(token)
        }
    })
}

#[derive(Debug)]
enum TextOrCloze<'a> {
    Text(&'a str),
    Cloze(ExtractedCloze<'a>),
}

#[derive(Debug)]
struct ExtractedCloze<'a> {
    ordinal: u16,
    nodes: Vec<TextOrCloze<'a>>,
    hint: Option<&'a str>,
}

fn parse_text_with_clozes(text: &str) -> Vec<TextOrCloze<'_>> {
    let mut open_clozes: Vec<ExtractedCloze> = vec![];
    let mut output = vec![];
    for token in tokenize(text) {
        match token {
            Token::OpenCloze(ordinal) => open_clozes.push(ExtractedCloze {
                ordinal,
                nodes: Vec::with_capacity(1), // common case
                hint: None,
            }),
            Token::Text(mut text) => {
                if let Some(cloze) = open_clozes.last_mut() {
                    // extract hint if found
                    if let Some((head, tail)) = text.split_once("::") {
                        text = head;
                        cloze.hint = Some(tail);
                    }
                    cloze.nodes.push(TextOrCloze::Text(text));
                } else {
                    output.push(TextOrCloze::Text(text));
                }
            }
            Token::CloseCloze => {
                // take the currently active cloze
                if let Some(cloze) = open_clozes.pop() {
                    let target = if let Some(outer_cloze) = open_clozes.last_mut() {
                        // and place it into the cloze layer above
                        &mut outer_cloze.nodes
                    } else {
                        // or the top level if no other clozes active
                        &mut output
                    };
                    target.push(TextOrCloze::Cloze(cloze));
                } else {
                    // closing marker outside of any clozes
                    output.push(TextOrCloze::Text("}}"))
                }
            }
        }
    }
    output
}

impl ExtractedCloze<'_> {
    /// Return the cloze's hint, or "..." if none was provided.
    fn hint(&self) -> &str {
        self.hint.unwrap_or("...")
    }

    fn clozed_text(&self) -> Cow<str> {
        // happy efficient path?
        if self.nodes.len() == 1 {
            if let TextOrCloze::Text(text) = self.nodes.last().unwrap() {
                return (*text).into();
            }
        }

        let mut buf = String::new();
        for node in &self.nodes {
            match node {
                TextOrCloze::Text(text) => buf.push_str(text),
                TextOrCloze::Cloze(cloze) => buf.push_str(&cloze.clozed_text()),
            }
        }

        buf.into()
    }
}

pub fn cloze_numbers_in_string(html: &str) -> HashSet<u16> {
    let mut set = HashSet::with_capacity(4);
    add_cloze_numbers_in_string(html, &mut set);
    set
}

#[allow(clippy::implicit_hasher)]
pub fn add_cloze_numbers_in_string(html: &str, set: &mut HashSet<u16>) {
    add_cloze_numbers_in_text_with_clozes(&parse_text_with_clozes(html), set)
}

fn add_cloze_numbers_in_text_with_clozes(nodes: &[TextOrCloze], set: &mut HashSet<u16>) {
    for node in nodes {
        if let TextOrCloze::Cloze(cloze) = node {
            set.insert(cloze.ordinal);
            add_cloze_numbers_in_text_with_clozes(&cloze.nodes, set);
        }
    }
}

pub fn reveal_cloze_text_only(text: &str, cloze_ord: u16, question: bool) -> String {
    let mut output = Vec::new();
    for node in &parse_text_with_clozes(text) {
        reveal_cloze_text_in_nodes(node, cloze_ord, question, &mut output);
    }
    output.join(", ")
}

fn reveal_cloze_text_in_nodes(
    node: &TextOrCloze,
    cloze_ord: u16,
    question: bool,
    output: &mut Vec<String>,
) {
    if let TextOrCloze::Cloze(cloze) = node {
        if cloze.ordinal == cloze_ord {
            if question {
                output.push(cloze.hint().into())
            } else {
                output.push(cloze.clozed_text().into())
            }
        }
        for node in &cloze.nodes {
            reveal_cloze_text_in_nodes(&node, cloze_ord, question, output);
        }
    }
}

pub fn reveal_cloze_text(text: &str, cloze_ord: u16, question: bool) -> String {
    let mut buf = String::new();
    let mut active_cloze_found_in_text = false;
    for node in &parse_text_with_clozes(text) {
        match node {
            // top-level text is indiscriminately added
            TextOrCloze::Text(text) => buf.push_str(text),
            TextOrCloze::Cloze(cloze) => reveal_cloze(
                cloze,
                cloze_ord,
                question,
                &mut active_cloze_found_in_text,
                &mut buf,
            ),
        }
    }
    if active_cloze_found_in_text {
        buf
    } else {
        String::new()
    }
}

fn reveal_cloze(
    cloze: &ExtractedCloze,
    cloze_ord: u16,
    question: bool,
    active_cloze_found_in_text: &mut bool,
    buf: &mut String,
) {
    let active = cloze.ordinal == cloze_ord;
    *active_cloze_found_in_text |= active;
    match (question, active) {
        (true, true) => {
            // question side with active cloze; all inner content is elided
            // (but will probably need to recurse to gather the text to include in data-cloze)
            buf.push_str("<div class=active>");
            buf.push_str(&format!("[{}]", cloze.hint()));
            buf.push_str("</div>");
            return;
        }
        (false, true) => {
            // answer side with active cloze; content from inner clozes is included
            buf.push_str("<div class=active>");
            buf.push_str(&cloze.clozed_text());
            buf.push_str("</div>");
        }
        (true, false) => {
            // question side with inactive cloze; text shown normally, but child clozes may be elided
            buf.push_str("<div class=inactive>");
            for node in &cloze.nodes {
                match node {
                    TextOrCloze::Text(text) => buf.push_str(text),
                    TextOrCloze::Cloze(cloze) => {
                        reveal_cloze(cloze, cloze_ord, question, active_cloze_found_in_text, buf)
                    }
                }
            }
            buf.push_str("</div>")
        }
        (false, false) => {
            // answer side with inactive cloze; inner clozes may be active
            buf.push_str("<div class=inactive>");
            for node in &cloze.nodes {
                match node {
                    TextOrCloze::Text(text) => buf.push_str(text),
                    TextOrCloze::Cloze(cloze) => {
                        reveal_cloze(cloze, cloze_ord, question, active_cloze_found_in_text, buf)
                    }
                }
            }
            buf.push_str("</div>");
        }
    }
}

fn contains_cloze(text: &str) -> bool {
    parse_text_with_clozes(text)
        .iter()
        .any(|node| matches!(node, TextOrCloze::Cloze(_)))
}

pub(crate) trait CowMapping<'a, B: ?Sized + 'a + ToOwned> {
    /// Returns [self]
    /// - unchanged, if the given function returns [Cow::Borrowed]
    /// - with the new value, if the given function returns [Cow::Owned]
    fn map_cow(self, f: impl FnOnce(&B) -> Cow<B>) -> Self;
    fn get_owned(self) -> Option<B::Owned>;
}

impl<'a, B: ?Sized + 'a + ToOwned> CowMapping<'a, B> for Cow<'a, B> {
    fn map_cow(self, f: impl FnOnce(&B) -> Cow<B>) -> Self {
        if let Cow::Owned(o) = f(&self) {
            Cow::Owned(o)
        } else {
            self
        }
    }

    fn get_owned(self) -> Option<B::Owned> {
        match self {
            Cow::Borrowed(_) => None,
            Cow::Owned(s) => Some(s),
        }
    }
}

pub fn decode_entities(html: &str) -> Cow<str> {
    if html.contains('&') {
        match htmlescape::decode_html(html) {
            Ok(text) => text.replace('\u{a0}', " ").into(),
            Err(_) => html.into(),
        }
    } else {
        // nothing to do
        html.into()
    }
}

lazy_static! {
    static ref HTML: Regex = Regex::new(concat!(
        "(?si)",
        // wrapped text
        r"(<!--.*?-->)|(<style.*?>.*?</style>)|(<script.*?>.*?</script>)",
        // html tags
        r"|(<.*?>)",
    ))
    .unwrap();
}

pub fn strip_html(html: &str) -> Cow<str> {
    strip_html_preserving_entities(html).map_cow(decode_entities)
}

pub fn strip_html_preserving_entities(html: &str) -> Cow<str> {
    HTML.replace_all(html, "")
}

fn main() -> Result<(), Box<dyn Error>> {
    dbg!(parse_text_with_clozes("foo {{c1::bar {{c2::baz}}}}"));
    dbg!(parse_text_with_clozes(
        "foo {{{c1::bar {{c2::baz}}::qux}} hello ::"
    ));
    dbg!(parse_text_with_clozes("foo }} :: {{c1::bar}} baz"));

    assert_eq!(
        cloze_numbers_in_string("test"),
        vec![].into_iter().collect::<HashSet<u16>>()
    );
    assert_eq!(
        cloze_numbers_in_string("{{c2::te}}{{c1::s}}t{{"),
        vec![1, 2].into_iter().collect::<HashSet<u16>>()
    );
    // new addition
    assert_eq!(
        cloze_numbers_in_string("foo {{c1::bar {{c2::baz}}}}"),
        vec![1, 2].into_iter().collect::<HashSet<u16>>()
    );

    // cloze only
    assert_eq!(reveal_cloze_text_only("foo", 1, true), "");
    assert_eq!(reveal_cloze_text_only("foo {{c1::bar}}", 1, true), "...");
    assert_eq!(
        reveal_cloze_text_only("foo {{c1::bar::baz}}", 1, true),
        "baz"
    );
    assert_eq!(reveal_cloze_text_only("foo {{c1::bar}}", 1, false), "bar");
    assert_eq!(reveal_cloze_text_only("foo {{c1::bar}}", 2, false), "");
    assert_eq!(
        reveal_cloze_text_only("{{c1::foo}} {{c1::bar}}", 1, false),
        "foo, bar"
    );
    // new additions
    assert_eq!(
        reveal_cloze_text_only("foo {{c1::bar {{c2::baz}}}}", 1, true),
        "..."
    );
    assert_eq!(
        reveal_cloze_text_only("foo {{c1::bar {{c2::baz}}}}", 1, false),
        "bar baz"
    );
    assert_eq!(
        reveal_cloze_text_only("foo {{c1::bar {{c2::baz}}}}", 2, true),
        "..."
    );
    assert_eq!(
        reveal_cloze_text_only("foo {{c1::bar {{c2::baz}}}}", 2, false),
        "baz"
    );
    assert_eq!(
        reveal_cloze_text_only("foo {{c1::bar {{c1::baz::hint2}}::hint}}", 1, true),
        "hint, hint2"
    );
    assert_eq!(
        reveal_cloze_text_only("foo {{c1::bar {{c2::baz::hint2}}::hint}}", 2, true),
        "hint2"
    );
    assert_eq!(
        reveal_cloze_text_only("foo {{c1::bar {{c1::baz}}}}", 1, false),
        "bar baz, baz"
    );

    assert_eq!(
        strip_html(reveal_cloze_text("foo {{c1::bar {{c2::baz}}}}", 1, true).as_ref()),
        "foo [...]"
    );
    assert_eq!(
        strip_html(reveal_cloze_text("foo {{c1::bar {{c2::baz}}}}", 1, false).as_ref()),
        "foo bar baz"
    );
    assert_eq!(
        strip_html(reveal_cloze_text("foo {{c1::bar {{c2::baz}}::qux}}", 2, true).as_ref()),
        "foo bar [...]"
    );
    assert_eq!(
        strip_html(reveal_cloze_text("foo {{c1::bar {{c2::baz}}::qux}}", 2, false).as_ref()),
        "foo bar baz"
    );
    assert_eq!(
        strip_html(reveal_cloze_text("foo {{c1::bar {{c2::baz}}::qux}}", 1, true).as_ref()),
        "foo [qux]"
    );
    assert_eq!(
        strip_html(reveal_cloze_text("foo {{c1::bar {{c2::baz}}::qux}}", 1, false).as_ref()),
        "foo bar baz"
    );

    Ok(())
}

triaeiou · 2022-11-05T18:53:42Z

> What do you think?

If state is acceptable (if let Some(cloze) = open_clozes.last_mut() { and if let Some(cloze) = open_clozes.pop() { in the example above) I can see a "clean" solution to the "rendering". I think that if we use state we should also consider parsing the hint part "correctly" (i.e. correctly parse {{c1:: inside a hint) since it should be doable without making the code harder to follow.

As I understand it you have a fairly specific design in mind so if you want to keep your sample code verbatim please let me know, otherwise I would like to change some of it to include hint tokenizing/parsing.

As for the inclusion of "parsing logic" into reveal_cloze_text() et al., that was intentional, to avoid unnecessary processing in functions that don't need a "full parse" of a note (e.g. contains_cloze()). I chose not to split the parsing part out into separate functions because the end result would have been "every function has its own parser". I haven't analyzed which functions get called often enough to merit optimizing, so if you are happy with the overhead of always doing a "full" parse, we can leave it as is. If not, I would implement different versions of the "parser" (e.g. contains_cloze() only needs to parse until a valid Token::Close is found, add_cloze_numbers_in_string() can skip a lot of string copying, etc.), or possibly have the functions that don't produce "html output" contain their own specific parsing logic?

Thoughts?

dae · 2022-11-07T04:30:27Z

> If state is acceptable

Please note I was not suggesting state can be avoided everywhere; I just wanted to keep it focused at the convert-tokens-to-logical-structure level, and not at the tokenizing level, or the upper level that consumes the logical structure that has been pulled from the text.

> I think that if we use state we should also consider parsing the hint part "correctly" (i.e. correctly parse {{c1:: inside a hint)

I would lean towards not trying to handle this at the moment. For one, I'm not sure we really need to - I suspect it's not something most users would even notice/encounter. Do you have particular cases in mind that I'm overlooking, or are you just trying to be cautious? If my assumption proves incorrect, then we'd be able to add the extra logic to the code in the future, with the knowledge that it's actually required. If we add it now "just in case", we may end up doing extra work for no benefit, and we lose the ability to assume that a hint will only occur in the final fragment of a cloze.

> As for the inclusion of "parsing logic" into reveal_cloze_text() et al - that was intentional to avoid unnecessary processing in functions that don't need "full parsing" of a note (e.g. contains_cloze()).

While that's a valid approach, I'd prefer to start simple and avoid making readability/DRY trade-offs unless benchmarks show a big difference. contains_cloze() is actually never run in bulk. add_cloze_numbers_in_string() is, but I'm not sure a manual implementation would be a big enough performance difference to justify the extra code.
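For what it's worth, a specialised scan for cloze numbers would look roughly like the std-only sketch below (`add_cloze_numbers_scan` is hypothetical and untuned). Even this small version has to replicate the tree parser's edge cases: in the tree-based example above, a cloze that is never closed is discarded together with anything nested inside it, so the scan must hold nested ordinals back until the outermost cloze closes. That duplication is the readability/DRY cost being weighed here, and it would only be worth keeping if a benchmark showed a clear win.

```rust
use std::collections::HashSet;

/// Hypothetical specialised scan for cloze ordinals, for benchmarking only.
/// Mimics the tree parser: markers are checked at every character, a cloze
/// counts only once its "}}" is seen, and clozes nested inside a cloze that
/// is never closed are dropped.
fn add_cloze_numbers_scan(text: &str, set: &mut HashSet<u16>) {
    let mut open: Vec<u16> = Vec::new(); // ordinals of currently open clozes
    let mut pending: Vec<u16> = Vec::new(); // closed, awaiting the outermost close
    let mut rest = text;
    while !rest.is_empty() {
        // "{{cN::" opens a cloze
        if let Some(after) = rest.strip_prefix("{{c") {
            let digits = after.bytes().take_while(u8::is_ascii_digit).count();
            if let Ok(ordinal) = after[..digits].parse::<u16>() {
                if let Some(after_colons) = after[digits..].strip_prefix("::") {
                    open.push(ordinal);
                    rest = after_colons;
                    continue;
                }
            }
        }
        // "}}" closes the innermost open cloze; outside a cloze it is plain text
        if let Some(after) = rest.strip_prefix("}}") {
            if let Some(ordinal) = open.pop() {
                pending.push(ordinal);
                if open.is_empty() {
                    // outermost cloze closed: everything inside it is kept
                    set.extend(pending.drain(..));
                }
            }
            rest = after;
            continue;
        }
        // not a marker: advance by one whole character
        let width = rest.chars().next().map_or(1, char::len_utf8);
        rest = &rest[width..];
    }
}
```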


triaeiou · 2022-11-08T19:40:28Z

Updated the PR with your code.

dae

Thanks @triaeiou, just one minor point I noticed. Otherwise looks good, and should be able to merge this in once 2.1.55 is out the door.

dae · 2022-11-09T01:43:40Z

rslib/src/cloze.rs

+            ));
+        }
+        (false, true) => {
+            buf.push_str(&format!(


It's a little more efficient here to do

writeln!(buf, "...", ...).unwrap()

Same with other lines which call push_str(&format...)
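A small illustration of the suggestion, with hypothetical function names: both functions append the same HTML, but `write!` (which needs `std::fmt::Write` in scope) formats directly into the existing buffer instead of allocating an intermediate `String`. Note that `writeln!` additionally appends a `\n`, so `write!` is the drop-in equivalent of `push_str(&format!(...))`; the later commit "Use write! instead of .push_str(&format)." takes that route.

```rust
use std::fmt::Write;

// Builds a temporary String with format!, then copies it into `buf`.
fn render_with_format(buf: &mut String, ordinal: u16, text: &str) {
    buf.push_str(&format!(r#"<span data-ordinal="{}">{}</span>"#, ordinal, text));
}

// Formats straight into `buf`, avoiding the temporary allocation.
// String's fmt::Write impl never fails and these arguments format
// infallibly, so unwrap() will not panic here.
fn render_with_write(buf: &mut String, ordinal: u16, text: &str) {
    write!(buf, r#"<span data-ordinal="{}">{}</span>"#, ordinal, text).unwrap();
}
```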

Ok, updated.

Did you forget to push your changes?

Hmm, seems like I missed that, and just saw the rebuild triggered by the merge from main.


dae

Thanks for all your work on this @triaeiou!

dae · 2022-12-16T11:44:55Z

If you could address the conflict, I'll merge this into main. If .56 needs to be pushed out quickly this may not get cherry-picked, but it'll make it into a stable release once it's had some time in testing.

triaeiou · 2022-12-16T12:44:59Z

> If you could address the conflict, I'll merge this into main. If .56 needs to be pushed out quickly this may not get cherry-picked, but it'll make it into a stable release once it's had some time in testing.

Ok, resolved (was only a merge conflict in CONTRIBUTORS).

dae · 2022-12-19T02:02:42Z

Thanks!

triaeiou added 13 commits October 23, 2022 23:32

Nested clozes and increased cloze meta data

fa97e95

Update contributors

3423df7

This reverts commit 3423df7.

1b57fd5

Update CONTRIBUTORS

ba28161

Formating

adcf00e

Formating

63bc187

Formating

5ec58c5

Formating

e7a2103

Formating

b519af3

Formating

af3438b

Formating

633ac67

Formating

b003e19

Formating

b81eff2

Merge branch 'main' into Nested-clozes

eed4882

dae reviewed Oct 24, 2022


dae mentioned this pull request Oct 24, 2022

Wrap inactive clozes in CSS class #2140

Closed

triaeiou added 8 commits October 28, 2022 14:22

Merge branch 'ankitects:main' into Nested-clozes

7a83d0a

Code refactor

e7a0d81

Formating

8a5aa8c

Formating

177ff07

Formating

f37f7de

Formating and dead code

69a7a4a

Correct test case

146588f

Remove Hint and Close storage of token string

7a30a35

Merge branch 'main' into Nested-clozes

f8366e0

triaeiou added 6 commits November 7, 2022 23:27

Update

d745032

Merge branch 'main' into Nested-clozes

70b5bbc

Formating

8eff106

Merge branch 'Nested-clozes' of https://github.com/TRIAEIOU/anki into…

b5ca14e

… Nested-clozes

Formating

44b133a

Formating

65ba265

dae reviewed Nov 9, 2022


triaeiou added 4 commits November 9, 2022 20:45

Use write! instead of .push_str(&format).

db55ac0

Merge branch 'main' into Nested-clozes

939473e

Merge branch 'Nested-clozes' of https://github.com/TRIAEIOU/anki into…

4fa88eb

… Nested-clozes

Formating

5cb0d49

dae approved these changes Nov 10, 2022


dae added this to the Next feature update milestone Nov 10, 2022

dae marked this pull request as ready for review November 10, 2022 23:27

dae mentioned this pull request Nov 30, 2022

Update to Anki 2.1.55 ankidroid/Anki-Android#12897

Merged

triaeiou and others added 2 commits December 16, 2022 13:39

Merge branch 'main' into Nested-clozes

91e0573

Merge branch 'ankitects:main' into Nested-clozes

e9cc2a0

dae merged commit 9901ae4 into ankitects:main Dec 19, 2022

galantra mentioned this pull request Jul 27, 2023

[BUG] Exaggerated ranges after optimizing open-spaced-repetition/fsrs-optimizer#5

Closed

iamllama mentioned this pull request Mar 26, 2025

Improve performance of card rendering parser #3886

Merged
