@triaeiou
Contributor

As per https://forums.ankiweb.net/t/ability-to-generate-nested-cloze/9743/6, this is an implementation that allows nested clozes. It also includes some additional cloze metadata, as per #1968.

@kleinerpirat
Contributor

As Damien reminded me in a recent PR, you can run the tests locally with the command

bazel test ...

from the root folder of the project. Further info: https://github.com/ankitects/anki/blob/main/docs/development.md#running-tests

@triaeiou
Contributor Author

As Damien reminded me in a recent PR, you can run the tests locally with the command

bazel test ...

from the root folder of the project. Further info: https://github.com/ankitects/anki/blob/main/docs/development.md#running-tests

Thanks, yeah, I had a look at that, but it fails to "determine workspace status", so I figured I wouldn't go down the rabbit hole of trying to figure out what the problem was (I am building on a Windows machine and guess the scripts are really made for a *nix/shell machine).

Cheers

Member

@dae dae left a comment

My initial impression is that your Rust code looks pretty good; some comments below.

}

/// Minimal encoding of string for storage in attribute (", &, \n, <, >)
pub fn encode_attribute(text: &str) -> Cow<str> {
Member

I think we should be using htmlescape::encode_attribute() here. See the review on #1968

Contributor Author

Ok

use self::State::*;
match (state, c) {
(Root, '{') => Open,
(Open, '{') => Open2,
Member

Processing a character at a time feels a bit complicated. I wonder whether this could be simplified by tokenizing the string with nom instead? We already use it for parsing card templates.

https://docs.rs/nom/latest/nom/

Contributor Author

Possibly. I don't feel the state machine is particularly complicated (although probably a bit more verbose compared to a library). I am unfamiliar with Rust and its libraries so it was simpler to write a state machine. Are you adamant on using nom?

Member

I am unfamiliar with Rust

You've done a good job - I can't really tell.

I'd appreciate it if you'd give nom a try - I've done the bulk of the parsing work for you, so it should mostly be a case of dealing with the tokens now.

use std::error::Error;

use nom::branch::alt;
use nom::bytes::complete::{take_until, take_while};
use nom::multi::many0;
use nom::{bytes::complete::tag, combinator::map, IResult};

#[derive(Debug)]
enum Token<'a> {
    OpenCloze(u16),
    Text(&'a str),
    Hint(&'a str),
    CloseCloze,
}

fn open_cloze(text: &str) -> IResult<&str, Token> {
    // opening brackets and 'c'
    let (text, _opening_brackets_and_c) = tag("{{c")(text)?;
    // following number
    let (text, digits) = take_while(|c: char| c.is_ascii_digit())(text)?;
    let digits: u16 = match digits.parse() {
        Ok(digits) => digits,
        Err(_) => {
            // not a valid number; fail to recognize
            return Err(nom::Err::Error(nom::error::make_error(
                text,
                nom::error::ErrorKind::Digit,
            )));
        }
    };
    // ::
    let (text, _colons) = tag("::")(text)?;
    Ok((text, Token::OpenCloze(digits)))
}

fn close_cloze(text: &str) -> IResult<&str, Token> {
    map(tag("}}"), |_| Token::CloseCloze)(text)
}

fn hint(text: &str) -> IResult<&str, Token> {
    let (text, _separating_colons) = tag("::")(text)?;
    let (text, hint) = take_until("}}")(text)?;
    Ok((text, Token::Hint(hint)))
}

/// Match a run of text until an open/close or hint is encountered.
/// This will stop on a hint marker even outside a cloze, so the processing
/// should handle hint tokens outside of an active cloze.
fn normal_text(text: &str) -> IResult<&str, Token> {
    if text.is_empty() {
        return Err(nom::Err::Error(nom::error::make_error(
            text,
            nom::error::ErrorKind::Eof,
        )));
    }
    let mut index = 0;
    let mut other_token = alt((open_cloze, close_cloze, hint));
    while other_token(&text[index..]).is_err() && index + 1 < text.len() {
        index += 1;
    }
    Ok((&text[index..], Token::Text(&text[0..index])))
}

// todo: error handling
fn tokenize(text: &str) -> Vec<Token> {
    let (remaining, tokens) =
        many0(alt((open_cloze, hint, close_cloze, normal_text)))(text).unwrap();
    assert!(remaining.is_empty());
    tokens
}

fn main() -> Result<(), Box<dyn Error>> {
    dbg!(tokenize("foo {{c1::bar {{c2::baz}}}}"));
    dbg!(tokenize("foo {{c1::bar {{c2::baz}}::qux}}"));
    Ok(())
}

Example output:

[src/main.rs:74] tokenize("foo {{c1::bar {{c2::baz}}::qux}}") = [
    Text(
        "foo ",
    ),
    OpenCloze(
        1,
    ),
    Text(
        "bar ",
    ),
    OpenCloze(
        2,
    ),
    Text(
        "baz",
    ),
    CloseCloze,
    Hint(
        "qux",
    ),
    CloseCloze,
]

Member

@dae dae Oct 24, 2022

(that should give two advantages - it refers to the original strings instead of constructing a bunch of new ones a character at a time, and it allows the parsing code to deal with only 4 token kinds instead of having to ignore a bunch of other ones like HintClose1)
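
To illustrate the first point, here is a minimal std-only sketch. It is not the nom code above: `open_cloze` and `tokenize` are simplified stand-ins written for this example, and hint tokens are left out. The point it tries to show is that every Text token is a slice borrowed from the input, so no new strings are built while tokenizing.

```rust
#[derive(Debug, PartialEq)]
enum Token<'a> {
    OpenCloze(u16),
    Text(&'a str),
    CloseCloze,
}

/// Try to read `{{c<digits>::` at the start of `text` (hypothetical helper).
/// Returns the token and the remaining input on success.
fn open_cloze(text: &str) -> Option<(Token<'_>, &str)> {
    let rest = text.strip_prefix("{{c")?;
    let digits_len = rest.bytes().take_while(u8::is_ascii_digit).count();
    // an empty or out-of-range number is not recognized as a cloze opening
    let ordinal: u16 = rest[..digits_len].parse().ok()?;
    let rest = rest[digits_len..].strip_prefix("::")?;
    Some((Token::OpenCloze(ordinal), rest))
}

/// Split `text` into tokens that borrow from the original string.
fn tokenize(mut text: &str) -> Vec<Token<'_>> {
    let mut tokens = Vec::new();
    while !text.is_empty() {
        if let Some((token, rest)) = open_cloze(text) {
            tokens.push(token);
            text = rest;
        } else if let Some(rest) = text.strip_prefix("}}") {
            tokens.push(Token::CloseCloze);
            text = rest;
        } else {
            // plain text: take at least one character, then stop at the
            // first position where an open/close marker begins
            let end = text
                .char_indices()
                .map(|(i, ch)| i + ch.len_utf8())
                .find(|&next| {
                    open_cloze(&text[next..]).is_some() || text[next..].starts_with("}}")
                })
                .unwrap_or(text.len());
            tokens.push(Token::Text(&text[..end]));
            text = &text[end..];
        }
    }
    tokens
}
```

Because the text branch always consumes at least one character, the loop makes progress even on input such as a stray `{{c` that never turns into a valid opening.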

Member

Found a bug; this line should have the + 1 removed from it:

    while other_token(&text[index..]).is_err() && index < text.len() {
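
To make the off-by-one concrete, here is a small std-only sketch. It is not the nom code itself: `is_marker` and `text_run_len` are hypothetical stand-ins, and the marker check is reduced to a simple prefix test. With the `+ 1` in place, the scan stops one character early whenever the text ends without a marker.

```rust
/// Simplified stand-in for `other_token`: does a cloze marker start here?
fn is_marker(text: &str) -> bool {
    text.starts_with("{{c") || text.starts_with("}}")
}

/// Length in bytes of the leading run of plain text.
/// `off_by_one` selects the original bound (`index + 1 < len`)
/// instead of the corrected one (`index < len`).
fn text_run_len(text: &str, off_by_one: bool) -> usize {
    let mut index = 0;
    loop {
        let in_bounds = if off_by_one {
            index + 1 < text.len()
        } else {
            index < text.len()
        };
        if is_marker(&text[index..]) || !in_bounds {
            return index;
        }
        // step over one whole character so slicing stays valid
        index += text[index..].chars().next().map_or(1, char::len_utf8);
    }
}
```

When a marker follows the text, both bounds give the same result; the difference only shows up for trailing text, where the original bound leaves the last character unconsumed.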

Contributor Author

I am not sure I understand, but as mentioned, I don't see a clean solution for tokenize()/parse_tokens() without some state during tokenizing (such as in my earlier tokenize() example). If you are set on stateless tokenizing, then I suggest going with your solution above and declaring "cloze open" invalid inside hints. I.e. should I look at a resolver for your tokenize()/parse_tokens()?

Member

I don't understand why tokenize would need changing - couldn't it be solved using something like this?

fn parse_tokens<'a>(tokens: &'a [Token]) -> Vec<ParseOutput<'a>> {
    let mut open_clozes: Vec<ExtractedCloze> = vec![];
    // nested clozes are placed here until the stack is empty, so they are gathered in the correct order
    let mut nested_clozes: Vec<ExtractedCloze> = vec![];
    let mut output = vec![];
    let mut current_cloze_has_hint = false;
    for token in tokens {
        match token {
            Token::OpenCloze(number) => {
                if current_cloze_has_hint {
                    // text_fragments would need to become a Cow for us to encode the original number
                    open_clozes
                        .last_mut()
                        .unwrap()
                        .text_fragments
                        .push("{{cx::")
                } else {
                    open_clozes.push(ExtractedCloze {
                        number: *number,
                        text_fragments: Vec::with_capacity(1), // common case
                    });
                }
            }
            Token::Text(text) => {
                if let Some(cloze) = open_clozes.last_mut() {
                    if text.contains("::") {
                        current_cloze_has_hint = true;
                    }
                    cloze.text_fragments.push(text);
                } else {
                    output.push(ParseOutput::Text(text));
                }
            }
            Token::CloseCloze => {
                let nested = open_clozes.len() > 1;
                if let Some(cloze) = open_clozes.pop() {
                    current_cloze_has_hint = false;
                    if nested {
                        nested_clozes.push(cloze);
                    } else {
                        output.push(ParseOutput::Cloze(cloze));
                        while let Some(cloze) = nested_clozes.pop() {
                            output.push(ParseOutput::Cloze(cloze))
                        }
                    }
                } else {
                    output.push(ParseOutput::Text("}}"))
                }
            }
        }
    }
    output
}

But that said, I'd lean towards just declaring this case as unsupported for now, as I suspect we don't actually need it.

Contributor Author

I don't understand why tokenize would need changing - couldn't it be solved using something like this?

It can definitely be solved by something like your solution, I just meant I have a hard time coming up with a solution that I find "clean" (not several layers of parsing etc.), but that is just my personal taste. If you are ok with declaring {{c1:: unsupported in the hint I will update the PR with your earlier suggestion (and the other issues) and add the "rendering" code.

Member

Yes please, let's declare it unsupported until we know we need it.

Contributor Author

Ok, I've updated the PR. I didn't know whether you wanted another layer of functions to take the result of parse_tokens() or to implement the rendering logic "inside" it. I went with the latter and moved parse_tokens() inside reveal_cloze_text() and reveal_cloze_text_only(), as fewer steps are required in the latter. I also put "local" functions and types inside their only user function (e.g. the different tokenizing functions) to make it clearer where they are and aren't used.

Sorry for all the back and forth on the "cloze open" inside a hint, it is now supported because it made sense in the way the logic turned out.

As I am no Rust expert: someone who is good with Rust memory management might want to look at the string handling in reveal_cloze_text(); there may be some unnecessary string copying going on there (I had to create a String from &str earlier than I would like). I can see some solutions to it, but they involve growing the code, so I don't know if a slight performance gain is worth a larger code base.

)),
// Answer - active cloze
(false, false, true, _) => last.text.push_str(&format!(
r#"<span class="cloze active" data-ordinal="{}">{}</span>"#,
Member

While a separate active class on only the active clozes may make sense in a new implementation, I'm not sure we can do that here - most users will have lines like .cloze { color: blue; } in their card templates, and this change will lead to the inactive clozes being colored as well, which I'm not sure we want. Perhaps an approach like #2140 might better preserve the existing behaviour?

Contributor Author

Ok, so how about class="cloze" for active and class="cloze-inactive" for inactive?

Member

Yep, that's less likely to break things
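
A rough sketch of that naming, assuming a hypothetical helper `cloze_span` (not a function in the PR) and the data-ordinal attribute from the snippet above. With this split, existing `.cloze { ... }` rules in card templates would keep matching only the active cloze.

```rust
/// Wrap `inner` in a span whose class depends on whether the cloze is active.
/// Hypothetical helper for illustration; `inner` is assumed to be already
/// escaped for HTML.
fn cloze_span(ordinal: u16, active: bool, inner: &str) -> String {
    // active clozes keep the long-standing "cloze" class so existing
    // template CSS still applies; inactive ones get a distinct class
    let class = if active { "cloze" } else { "cloze-inactive" };
    format!(
        r#"<span class="{}" data-ordinal="{}">{}</span>"#,
        class, ordinal, inner
    )
}
```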

@RumovZ
Collaborator

RumovZ commented Oct 24, 2022

Thanks, yeah, I had a look at that, but it fails to "determine workspace status", so I figured I wouldn't go down the rabbit hole of trying to figure out what the problem was (I am building on a Windows machine and guess the scripts are really made for a *nix/shell machine).

Sounds like you haven't configured PATH correctly: https://github.com/ankitects/anki/blob/main/docs/windows.md#more
I'm on Windows, too, and have everything working. Saves an immense amount of time in the long run. 🙂

@triaeiou
Contributor Author

Sounds like you haven't configured PATH correctly: https://github.com/ankitects/anki/blob/main/docs/windows.md#more I'm on Windows, too, and have everything working. Saves an immense amount of time in the long run. 🙂

Thanks but I already have bazel and msys on the path and still get <builtin>: BazelWorkspaceStatusAction stable-status.txt failed: Failed to determine workspace status: Process exited with status 127 /bin/bash: .toolsstatus.sh: No such file or directory - You are of course right that it will save a lot of time in the long run - I'll look into it at some point (I don't run bazel as admin because reasons but I guessed the issue was really allowing bazel to create symlinks, which can be set for the build user in a GPO).

@triaeiou
Contributor Author

Sorry, I was referring to the comment.

Ah, ok, it is there to emulate the original regex functionality, i.e. foo {{{c1::bar}} should result in foo {<span class="cloze" data-cloze="bar" data-ordinal="1">bar</span>.

@dae
Member

dae commented Oct 24, 2022

Ah, I see. We should get that for free with the nom parser.

@dae
Member

dae commented Oct 31, 2022

Ok, firstly, I need to apologize for my previous example. It is indeed difficult to achieve the desired outcomes from a flat list of parsed tokens, as I discovered when I tried to implement routines on top of it. What I should have been doing is creating a tree of parsed tokens instead.

I'm afraid I'm not super-fond of your current approach, as while the low-level tokenizing is now handled separately, the logic that maps the flat tokens into logical structures has crept into each of our routines like reveal_cloze_text(), reveal_cloze_text_only(), contains_cloze() and add_cloze_numbers_in_string(), making them somewhat harder to follow. If they were instead based on a parsed token tree, they could focus on the structure of the data, without having to determine it themselves.

I've created an example that shows how the routines above could be implemented with a token tree. It's not complete: it doesn't properly wrap the text with HTML yet, and I have not tested it thoroughly. And I tested it in a separate Rust project, so it includes some copy+pasted routines from the Anki codebase that aren't relevant. But hopefully it better illustrates the direction I was hoping to go in than my previous example. What do you think?

use std::borrow::Cow;
use std::collections::HashSet;
use std::error::Error;

use lazy_static::lazy_static;
use nom::branch::alt;
use nom::bytes::complete::take_while;
use nom::{bytes::complete::tag, combinator::map, IResult};
use regex::Regex;

#[derive(Debug)]
enum Token<'a> {
    OpenCloze(u16),
    Text(&'a str),
    CloseCloze,
}

// todo: error handling
fn tokenize(mut text: &str) -> impl Iterator<Item = Token> {
    fn open_cloze(text: &str) -> IResult<&str, Token> {
        // opening brackets and 'c'
        let (text, _opening_brackets_and_c) = tag("{{c")(text)?;
        // following number
        let (text, digits) = take_while(|c: char| c.is_ascii_digit())(text)?;
        let digits: u16 = match digits.parse() {
            Ok(digits) => digits,
            Err(_) => {
                // not a valid number; fail to recognize
                return Err(nom::Err::Error(nom::error::make_error(
                    text,
                    nom::error::ErrorKind::Digit,
                )));
            }
        };
        // ::
        let (text, _colons) = tag("::")(text)?;
        Ok((text, Token::OpenCloze(digits)))
    }

    fn close_cloze(text: &str) -> IResult<&str, Token> {
        map(tag("}}"), |_| Token::CloseCloze)(text)
    }

    /// Match a run of text until an open/close marker is encountered.
    fn normal_text(text: &str) -> IResult<&str, Token> {
        if text.is_empty() {
            return Err(nom::Err::Error(nom::error::make_error(
                text,
                nom::error::ErrorKind::Eof,
            )));
        }
        let mut index = 0;
        let mut other_token = alt((open_cloze, close_cloze));
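        while other_token(&text[index..]).is_err() && index < text.len() {
            // advance by a whole character so the slice stays on a UTF-8 boundary
            index += text[index..].chars().next().map_or(1, char::len_utf8);
        }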
        Ok((&text[index..], Token::Text(&text[0..index])))
    }

    std::iter::from_fn(move || {
        if text.is_empty() {
            None
        } else {
            let (remaining_text, token) =
                alt((open_cloze, close_cloze, normal_text))(text).unwrap();
            text = remaining_text;
            Some(token)
        }
    })
}

#[derive(Debug)]
enum TextOrCloze<'a> {
    Text(&'a str),
    Cloze(ExtractedCloze<'a>),
}

#[derive(Debug)]
struct ExtractedCloze<'a> {
    ordinal: u16,
    nodes: Vec<TextOrCloze<'a>>,
    hint: Option<&'a str>,
}

fn parse_text_with_clozes(text: &str) -> Vec<TextOrCloze<'_>> {
    let mut open_clozes: Vec<ExtractedCloze> = vec![];
    let mut output = vec![];
    for token in tokenize(text) {
        match token {
            Token::OpenCloze(ordinal) => open_clozes.push(ExtractedCloze {
                ordinal,
                nodes: Vec::with_capacity(1), // common case
                hint: None,
            }),
            Token::Text(mut text) => {
                if let Some(cloze) = open_clozes.last_mut() {
                    // extract hint if found
                    if let Some((head, tail)) = text.split_once("::") {
                        text = head;
                        cloze.hint = Some(tail);
                    }
                    cloze.nodes.push(TextOrCloze::Text(text));
                } else {
                    output.push(TextOrCloze::Text(text));
                }
            }
            Token::CloseCloze => {
                // take the currently active cloze
                if let Some(cloze) = open_clozes.pop() {
                    let target = if let Some(outer_cloze) = open_clozes.last_mut() {
                        // and place it into the cloze layer above
                        &mut outer_cloze.nodes
                    } else {
                        // or the top level if no other clozes active
                        &mut output
                    };
                    target.push(TextOrCloze::Cloze(cloze));
                } else {
                    // closing marker outside of any clozes
                    output.push(TextOrCloze::Text("}}"))
                }
            }
        }
    }
    output
}

impl ExtractedCloze<'_> {
    /// Return the cloze's hint, or "..." if none was provided.
    fn hint(&self) -> &str {
        self.hint.unwrap_or("...")
    }

    fn clozed_text(&self) -> Cow<str> {
        // happy efficient path?
        if self.nodes.len() == 1 {
            if let TextOrCloze::Text(text) = self.nodes.last().unwrap() {
                return (*text).into();
            }
        }

        let mut buf = String::new();
        for node in &self.nodes {
            match node {
                TextOrCloze::Text(text) => buf.push_str(text),
                TextOrCloze::Cloze(cloze) => buf.push_str(&cloze.clozed_text()),
            }
        }

        buf.into()
    }
}

pub fn cloze_numbers_in_string(html: &str) -> HashSet<u16> {
    let mut set = HashSet::with_capacity(4);
    add_cloze_numbers_in_string(html, &mut set);
    set
}

#[allow(clippy::implicit_hasher)]
pub fn add_cloze_numbers_in_string(html: &str, set: &mut HashSet<u16>) {
    add_cloze_numbers_in_text_with_clozes(&parse_text_with_clozes(html), set)
}

fn add_cloze_numbers_in_text_with_clozes(nodes: &[TextOrCloze], set: &mut HashSet<u16>) {
    for node in nodes {
        if let TextOrCloze::Cloze(cloze) = node {
            set.insert(cloze.ordinal);
            add_cloze_numbers_in_text_with_clozes(&cloze.nodes, set);
        }
    }
}

pub fn reveal_cloze_text_only(text: &str, cloze_ord: u16, question: bool) -> String {
    let mut output = Vec::new();
    for node in &parse_text_with_clozes(text) {
        reveal_cloze_text_in_nodes(node, cloze_ord, question, &mut output);
    }
    output.join(", ")
}

fn reveal_cloze_text_in_nodes(
    node: &TextOrCloze,
    cloze_ord: u16,
    question: bool,
    output: &mut Vec<String>,
) {
    if let TextOrCloze::Cloze(cloze) = node {
        if cloze.ordinal == cloze_ord {
            if question {
                output.push(cloze.hint().into())
            } else {
                output.push(cloze.clozed_text().into())
            }
        }
        for node in &cloze.nodes {
            reveal_cloze_text_in_nodes(node, cloze_ord, question, output);
        }
    }
}

pub fn reveal_cloze_text(text: &str, cloze_ord: u16, question: bool) -> String {
    let mut buf = String::new();
    let mut active_cloze_found_in_text = false;
    for node in &parse_text_with_clozes(text) {
        match node {
            // top-level text is indiscriminately added
            TextOrCloze::Text(text) => buf.push_str(text),
            TextOrCloze::Cloze(cloze) => reveal_cloze(
                cloze,
                cloze_ord,
                question,
                &mut active_cloze_found_in_text,
                &mut buf,
            ),
        }
    }
    if active_cloze_found_in_text {
        buf
    } else {
        String::new()
    }
}

fn reveal_cloze(
    cloze: &ExtractedCloze,
    cloze_ord: u16,
    question: bool,
    active_cloze_found_in_text: &mut bool,
    buf: &mut String,
) {
    let active = cloze.ordinal == cloze_ord;
    *active_cloze_found_in_text |= active;
    match (question, active) {
        (true, true) => {
            // question side with active cloze; all inner content is elided
            // (but will probably need to recurse to gather the text to include in data-cloze)
            buf.push_str("<div class=active>");
            buf.push_str(&format!("[{}]", cloze.hint()));
            buf.push_str("</div>");
            return;
        }
        (false, true) => {
            // answer side with active cloze; content from inner clozes is included
            buf.push_str("<div class=active>");
            buf.push_str(&cloze.clozed_text());
            buf.push_str("</div>");
        }
        (true, false) => {
            // question side with inactive cloze; text shown normally, but child clozes may be elided
            buf.push_str("<div class=inactive>");
            for node in &cloze.nodes {
                match node {
                    TextOrCloze::Text(text) => buf.push_str(text),
                    TextOrCloze::Cloze(cloze) => {
                        reveal_cloze(cloze, cloze_ord, question, active_cloze_found_in_text, buf)
                    }
                }
            }
            buf.push_str("</div>")
        }
        (false, false) => {
            // answer side with inactive cloze; inner clozes may be active
            buf.push_str("<div class=inactive>");
            for node in &cloze.nodes {
                match node {
                    TextOrCloze::Text(text) => buf.push_str(text),
                    TextOrCloze::Cloze(cloze) => {
                        reveal_cloze(cloze, cloze_ord, question, active_cloze_found_in_text, buf)
                    }
                }
            }
            buf.push_str("</div>");
        }
    }
}

fn contains_cloze(text: &str) -> bool {
    parse_text_with_clozes(text)
        .iter()
        .any(|node| matches!(node, TextOrCloze::Cloze(_)))
}

pub(crate) trait CowMapping<'a, B: ?Sized + 'a + ToOwned> {
    /// Returns [self]
    /// - unchanged, if the given function returns [Cow::Borrowed]
    /// - with the new value, if the given function returns [Cow::Owned]
    fn map_cow(self, f: impl FnOnce(&B) -> Cow<B>) -> Self;
    fn get_owned(self) -> Option<B::Owned>;
}

impl<'a, B: ?Sized + 'a + ToOwned> CowMapping<'a, B> for Cow<'a, B> {
    fn map_cow(self, f: impl FnOnce(&B) -> Cow<B>) -> Self {
        if let Cow::Owned(o) = f(&self) {
            Cow::Owned(o)
        } else {
            self
        }
    }

    fn get_owned(self) -> Option<B::Owned> {
        match self {
            Cow::Borrowed(_) => None,
            Cow::Owned(s) => Some(s),
        }
    }
}

pub fn decode_entities(html: &str) -> Cow<str> {
    if html.contains('&') {
        match htmlescape::decode_html(html) {
            Ok(text) => text.replace('\u{a0}', " ").into(),
            Err(_) => html.into(),
        }
    } else {
        // nothing to do
        html.into()
    }
}

lazy_static! {
    static ref HTML: Regex = Regex::new(concat!(
        "(?si)",
        // wrapped text
        r"(<!--.*?-->)|(<style.*?>.*?</style>)|(<script.*?>.*?</script>)",
        // html tags
        r"|(<.*?>)",
    ))
    .unwrap();
}

pub fn strip_html(html: &str) -> Cow<str> {
    strip_html_preserving_entities(html).map_cow(decode_entities)
}

pub fn strip_html_preserving_entities(html: &str) -> Cow<str> {
    HTML.replace_all(html, "")
}

fn main() -> Result<(), Box<dyn Error>> {
    dbg!(parse_text_with_clozes("foo {{c1::bar {{c2::baz}}}}"));
    dbg!(parse_text_with_clozes(
        "foo {{{c1::bar {{c2::baz}}::qux}} hello ::"
    ));
    dbg!(parse_text_with_clozes("foo }} :: {{c1::bar}} baz"));

    assert_eq!(
        cloze_numbers_in_string("test"),
        vec![].into_iter().collect::<HashSet<u16>>()
    );
    assert_eq!(
        cloze_numbers_in_string("{{c2::te}}{{c1::s}}t{{"),
        vec![1, 2].into_iter().collect::<HashSet<u16>>()
    );
    // new addition
    assert_eq!(
        cloze_numbers_in_string("foo {{c1::bar {{c2::baz}}}}"),
        vec![1, 2].into_iter().collect::<HashSet<u16>>()
    );

    // cloze only
    assert_eq!(reveal_cloze_text_only("foo", 1, true), "");
    assert_eq!(reveal_cloze_text_only("foo {{c1::bar}}", 1, true), "...");
    assert_eq!(
        reveal_cloze_text_only("foo {{c1::bar::baz}}", 1, true),
        "baz"
    );
    assert_eq!(reveal_cloze_text_only("foo {{c1::bar}}", 1, false), "bar");
    assert_eq!(reveal_cloze_text_only("foo {{c1::bar}}", 2, false), "");
    assert_eq!(
        reveal_cloze_text_only("{{c1::foo}} {{c1::bar}}", 1, false),
        "foo, bar"
    );
    // new additions
    assert_eq!(
        reveal_cloze_text_only("foo {{c1::bar {{c2::baz}}}}", 1, true),
        "..."
    );
    assert_eq!(
        reveal_cloze_text_only("foo {{c1::bar {{c2::baz}}}}", 1, false),
        "bar baz"
    );
    assert_eq!(
        reveal_cloze_text_only("foo {{c1::bar {{c2::baz}}}}", 2, true),
        "..."
    );
    assert_eq!(
        reveal_cloze_text_only("foo {{c1::bar {{c2::baz}}}}", 2, false),
        "baz"
    );
    assert_eq!(
        reveal_cloze_text_only("foo {{c1::bar {{c1::baz::hint2}}::hint}}", 1, true),
        "hint, hint2"
    );
    assert_eq!(
        reveal_cloze_text_only("foo {{c1::bar {{c2::baz::hint2}}::hint}}", 2, true),
        "hint2"
    );
    assert_eq!(
        reveal_cloze_text_only("foo {{c1::bar {{c1::baz}}}}", 1, false),
        "bar baz, baz"
    );

    assert_eq!(
        strip_html(reveal_cloze_text("foo {{c1::bar {{c2::baz}}}}", 1, true).as_ref()),
        "foo [...]"
    );
    assert_eq!(
        strip_html(reveal_cloze_text("foo {{c1::bar {{c2::baz}}}}", 1, false).as_ref()),
        "foo bar baz"
    );
    assert_eq!(
        strip_html(reveal_cloze_text("foo {{c1::bar {{c2::baz}}::qux}}", 2, true).as_ref()),
        "foo bar [...]"
    );
    assert_eq!(
        strip_html(reveal_cloze_text("foo {{c1::bar {{c2::baz}}::qux}}", 2, false).as_ref()),
        "foo bar baz"
    );
    assert_eq!(
        strip_html(reveal_cloze_text("foo {{c1::bar {{c2::baz}}::qux}}", 1, true).as_ref()),
        "foo [qux]"
    );
    assert_eq!(
        strip_html(reveal_cloze_text("foo {{c1::bar {{c2::baz}}::qux}}", 1, false).as_ref()),
        "foo bar baz"
    );

    Ok(())
}

@triaeiou
Contributor Author

triaeiou commented Nov 5, 2022

What do you think?

If state is acceptable (if let Some(cloze) = open_clozes.last_mut() { and if let Some(cloze) = open_clozes.pop() { in the example above) I can see a "clean" solution to the "rendering". I think that if we use state we should also consider parsing the hint part "correctly" (i.e. correctly parse {{c1:: inside a hint) since it should be doable without making the code harder to follow.

As I understand it you have a fairly specific design in mind so if you want to keep your sample code verbatim please let me know, otherwise I would like to change some of it to include hint tokenizing/parsing.

As for the inclusion of "parsing logic" into reveal_cloze_text() et al - that was intentional to avoid unnecessary processing in functions that don't need "full parsing" of a note (e.g. contains_cloze()). I chose not to split the parsing part out into a separate function because the end result would have been "every function has its own parser". I haven't analyzed the code for which functions get called often enough to merit optimizing, so if you are happy with the overhead of always doing a "full" parse, we can leave it as is. If not, I would implement different versions of "parsers" (e.g. contains_cloze() only needs to parse until a valid Token::Close is found, add_cloze_numbers_in_string() can skip a lot of string copying, etc.), or possibly have those functions that don't produce "html output" contain their own specific parsing logic?

Thoughts?

@dae
Member

dae commented Nov 7, 2022

If state is acceptable

Please note I was not suggesting state can be avoided everywhere; I just wanted to keep it focused at the convert-tokens-to-logical-structure level, and not at the tokenizing level, or the upper level that consumes the logical structure that has been pulled from the text.

I think that if we use state we should also consider parsing the hint part "correctly" (i.e. correctly parse {{c1:: inside a hint)

I would lean towards not trying to handle this at the moment. For one, I'm not sure we really need to - I suspect it's not something most users would even notice/encounter. Do you have particular cases in mind that I'm overlooking, or are you just trying to be cautious? If my assumption proves incorrect, then we'd be able to add the extra logic to the code in the future, with the knowledge that it's actually required. If we add it now "just in case", we may end up doing extra work for no benefit, and we lose the ability to assume that a hint will only occur in the final fragment of a cloze.
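
The assumption that a hint only appears in the final text fragment of a cloze is what lets the example above pull it out with a single split. A minimal sketch of that step, using a hypothetical helper name `split_hint` (the example does the same thing inline with split_once):

```rust
/// Split a cloze's text fragment into its visible text and optional hint.
/// Everything after the first "::" is treated as the hint, so a nested
/// cloze opening inside the hint is not interpreted specially.
fn split_hint(fragment: &str) -> (&str, Option<&str>) {
    match fragment.split_once("::") {
        Some((text, hint)) => (text, Some(hint)),
        None => (fragment, None),
    }
}
```

If hints containing `{{c1::` ever need to be supported, this is the step that would have to become aware of nesting.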

As for the inclusion of "parsing logic" into reveal_cloze_text() et al - that was intentional to avoid unnecessary processing in functions that don't need "full parsing" of a note (e.g. contains_cloze()).

While that's a valid approach, I'd prefer to start simple and avoid making readability/DRY trade-offs unless benchmarks show a big difference. contains_cloze() is actually never run in bulk. add_cloze_numbers_in_string() is, but I'm not sure a manual implementation would be a big enough performance difference to justify the extra code.

@triaeiou
Contributor Author

triaeiou commented Nov 8, 2022

Updated the PR with your code.

Member

@dae dae left a comment

Thanks @triaeiou, just one minor point I noticed. Otherwise looks good, and should be able to merge this in once 2.1.55 is out the door.

));
}
(false, true) => {
buf.push_str(&format!(
Member

It's a little more efficient here to do

writeln!(buf, "...", ...).unwrap()

Same with other lines which call push_str(&format...)
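
A small sketch of the difference, with hypothetical function names and a shortened span for brevity. It uses write! rather than writeln!, since writeln! would also append a newline, which push_str(&format!(...)) does not do.

```rust
use std::fmt::Write; // lets `write!` target a `String`

/// Builds a temporary String with `format!`, then copies it into `buf`.
fn push_with_format(buf: &mut String, ordinal: u16, text: &str) {
    buf.push_str(&format!(
        r#"<span data-ordinal="{}">{}</span>"#,
        ordinal, text
    ));
}

/// Formats directly into `buf`, skipping the intermediate allocation.
/// Writing into a String does not fail, so the unwrap is not expected to panic.
fn push_with_write(buf: &mut String, ordinal: u16, text: &str) {
    write!(buf, r#"<span data-ordinal="{}">{}</span>"#, ordinal, text).unwrap();
}
```

Both functions append the same text; the second just avoids the throwaway String.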

Contributor Author

Ok, updated.

Member

Did you forget to push your changes?

Contributor Author

Hmm, it seems I missed that and only saw the rebuild triggered by the merge from main.

Member

@dae dae left a comment

Thanks for all your work on this @triaeiou!

@dae dae added this to the Next feature update milestone Nov 10, 2022
@dae dae marked this pull request as ready for review November 10, 2022 23:27
@dae
Member

dae commented Dec 16, 2022

If you could address the conflict, I'll merge this into main. If .56 needs to be pushed out quickly this may not get cherry-picked, but it'll make it into a stable release once it's had some time in testing.

@triaeiou
Contributor Author

If you could address the conflict, I'll merge this into main. If .56 needs to be pushed out quickly this may not get cherry-picked, but it'll make it into a stable release once it's had some time in testing.

Ok, resolved (was only a merge conflict in CONTRIBUTORS).

@dae
Member

dae commented Dec 19, 2022

Thanks!
