Nested clozes and increased cloze meta data #2141
Conversation
As Damien reminded me in a recent PR, you can run the tests locally with the command from the root folder of the project. Further info: https://github.com/ankitects/anki/blob/main/docs/development.md#running-tests
Thanks, yeah I had a look at that but it fails to "determine workspace status", so I figured I wouldn't go down the rabbit hole of trying to figure out what the problem was (I am building on a Windows machine and guess the scripts are really made for a *nix/shell machine). Cheers
dae
left a comment
My initial impression is that your Rust code looks pretty good; some comments below.
rslib/src/cloze.rs
Outdated
}

/// Minimal encoding of string for storage in attribute (", &, \n, <, >)
pub fn encode_attribute(text: &str) -> Cow<str> {
I think we should be using htmlescape::encode_attribute() here. See the review on #1968
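For reference, a minimal stand-in sketch of what an attribute encoder in the spirit of htmlescape::encode_attribute() does; this is illustrative only (the real crate function escapes more characters than the few shown here):

```rust
// Illustrative stand-in, NOT htmlescape::encode_attribute() itself:
// escapes the characters that would break an HTML attribute value.
fn encode_attribute(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for c in text.chars() {
        match c {
            '"' => out.push_str("&quot;"),
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '\'' => out.push_str("&#x27;"),
            _ => out.push(c),
        }
    }
    out
}

fn main() {
    assert_eq!(
        encode_attribute(r#"a "b" & <c>"#),
        "a &quot;b&quot; &amp; &lt;c&gt;"
    );
}
```

Using the crate function instead of hand-rolling this keeps the escaping rules consistent with the rest of the codebase.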
Ok
rslib/src/cloze.rs
Outdated
use self::State::*;
match (state, c) {
    (Root, '{') => Open,
    (Open, '{') => Open2,
Processing a character at a time feels a bit complicated. I wonder whether this could be simplified by tokenizing the string with nom instead? We already use it for parsing card templates.
Possibly. I don't feel the state machine is particularly complicated (although probably a bit more verbose compared to a library). I am unfamiliar with Rust and its libraries so it was simpler to write a state machine. Are you adamant on using nom?
I am unfamiliar with Rust
You've done a good job - I can't really tell.
I'd appreciate it if you'd give nom a try - I've done the bulk of the parsing work for you, so it should mostly be a case of dealing with the tokens now.
use std::error::Error;
use nom::branch::alt;
use nom::bytes::complete::{take_until, take_while};
use nom::multi::many0;
use nom::{bytes::complete::tag, combinator::map, IResult};
#[derive(Debug)]
enum Token<'a> {
OpenCloze(u16),
Text(&'a str),
Hint(&'a str),
CloseCloze,
}
fn open_cloze(text: &str) -> IResult<&str, Token> {
// opening brackets and 'c'
let (text, _opening_brackets_and_c) = tag("{{c")(text)?;
// following number
let (text, digits) = take_while(|c: char| c.is_ascii_digit())(text)?;
let digits: u16 = match digits.parse() {
Ok(digits) => digits,
Err(_) => {
// not a valid number; fail to recognize
return Err(nom::Err::Error(nom::error::make_error(
text,
nom::error::ErrorKind::Digit,
)));
}
};
// ::
let (text, _colons) = tag("::")(text)?;
Ok((text, Token::OpenCloze(digits)))
}
fn close_cloze(text: &str) -> IResult<&str, Token> {
map(tag("}}"), |_| Token::CloseCloze)(text)
}
fn hint(text: &str) -> IResult<&str, Token> {
let (text, _separating_colons) = tag("::")(text)?;
let (text, hint) = take_until("}}")(text)?;
Ok((text, Token::Hint(hint)))
}
/// Match a run of text until an open/close or hint is encountered.
/// This will stop on a hint marker even outside a cloze, so the processing
/// should handle hint tokens outside of an active cloze.
fn normal_text(text: &str) -> IResult<&str, Token> {
if text.is_empty() {
return Err(nom::Err::Error(nom::error::make_error(
text,
nom::error::ErrorKind::Eof,
)));
}
let mut index = 0;
let mut other_token = alt((open_cloze, close_cloze, hint));
while other_token(&text[index..]).is_err() && index + 1 < text.len() {
index += 1;
}
Ok((&text[index..], Token::Text(&text[0..index])))
}
// todo: error handling
fn tokenize(text: &str) -> Vec<Token> {
let (remaining, tokens) =
many0(alt((open_cloze, hint, close_cloze, normal_text)))(text).unwrap();
assert!(remaining.is_empty());
tokens
}
fn main() -> Result<(), Box<dyn Error>> {
dbg!(tokenize("foo {{c1::bar {{c2::baz}}}}"));
dbg!(tokenize("foo {{c1::bar {{c2::baz}}::qux}}"));
Ok(())
}

Example output:
[src/main.rs:74] tokenize("foo {{c1::bar {{c2::baz}}::qux}}") = [
Text(
"foo ",
),
OpenCloze(
1,
),
Text(
"bar ",
),
OpenCloze(
2,
),
Text(
"baz",
),
CloseCloze,
Hint(
"qux",
),
CloseCloze,
]
(that should give two advantages - it refers to the original strings instead of constructing a bunch of new ones a character at a time, and it allows the parsing code to deal with only 4 token kinds instead of having to ignore a bunch of other ones like HintClose1)
Found a bug; this line should have the + 1 removed from it:
while other_token(&text[index..]).is_err() && index < text.len() {
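To make the off-by-one concrete, here is a stripped-down, nom-free sketch of that scanning loop (scan() and the boundary check are stand-ins, not the PR's code): with the `+ 1`, the loop can never advance past the second-to-last byte, so the final character of a trailing text run is left behind in the remainder.

```rust
/// Stand-in for normal_text()'s scan loop. `buggy` selects the original
/// condition (`index + 1 < text.len()`) vs the fixed one (`index < text.len()`).
/// Returns (remaining input, matched text run), like a nom parser would.
fn scan(text: &str, buggy: bool) -> (&str, &str) {
    // Stub for "another token matches here": cloze open/close markers.
    let boundary = |s: &str| s.starts_with("{{") || s.starts_with("}}");
    let mut index = 0;
    while !boundary(&text[index..])
        && if buggy { index + 1 < text.len() } else { index < text.len() }
    {
        index += 1;
    }
    (&text[index..], &text[..index])
}

fn main() {
    // Buggy version stops one short, leaving "c" unconsumed.
    assert_eq!(scan("abc", true), ("c", "ab"));
    // Fixed version consumes the whole trailing run.
    assert_eq!(scan("abc", false), ("", "abc"));
}
```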
I am not sure I understand, but as mentioned I don't see a clean solution for tokenize()/parse_tokens() without some state during tokenizing (such as my earlier tokenize() example). If you are set on stateless tokenizing then I suggest going with your solution above and declaring "cloze open" invalid inside hints? I.e. I will look at a resolver for your tokenize()/parse_tokens().
I don't understand why tokenize would need changing - couldn't it be solved using something like this?
fn parse_tokens<'a>(tokens: &'a [Token]) -> Vec<ParseOutput<'a>> {
let mut open_clozes: Vec<ExtractedCloze> = vec![];
// nested clozes are placed here until the stack is empty, so they are gathered in the correct order
let mut nested_clozes: Vec<ExtractedCloze> = vec![];
let mut output = vec![];
let mut current_cloze_has_hint = false;
for token in tokens {
match token {
Token::OpenCloze(number) => {
if current_cloze_has_hint {
// text_fragments would need to become a Cow for us to encode the original number
open_clozes
.last_mut()
.unwrap()
.text_fragments
.push("{{cx::")
} else {
open_clozes.push(ExtractedCloze {
number: *number,
text_fragments: Vec::with_capacity(1), // common case
});
}
}
Token::Text(text) => {
if let Some(cloze) = open_clozes.last_mut() {
if text.contains("::") {
current_cloze_has_hint = true;
}
cloze.text_fragments.push(text);
} else {
output.push(ParseOutput::Text(text));
}
}
Token::CloseCloze => {
let nested = open_clozes.len() > 1;
if let Some(cloze) = open_clozes.pop() {
current_cloze_has_hint = false;
if nested {
nested_clozes.push(cloze);
} else {
output.push(ParseOutput::Cloze(cloze));
while let Some(cloze) = nested_clozes.pop() {
output.push(ParseOutput::Cloze(cloze))
}
}
} else {
output.push(ParseOutput::Text("}}"))
}
}
}
}
output
}

But that said, I'd lean towards just declaring this case as unsupported for now, as I suspect we don't actually need it.
I don't understand why tokenize would need changing - couldn't it be solved using something like this?
It can definitely be solved by something like your solution; I just meant I have a hard time coming up with a solution that I find "clean" (not several layers of parsing, etc.), but that is just my personal taste. If you are OK with declaring {{c1:: unsupported in the hint, I will update the PR with your earlier suggestion (and the other issues) and add the "rendering" code.
Yes please, let's declare it unsupported until we know we need it.
Ok, I've updated the PR. I didn't know whether you wanted another layer of functions to take the result of parse_tokens() or to implement the rendering logic "inside" it; I went with the latter and moved parse_tokens() inside reveal_cloze_text() and reveal_cloze_text_only(), as the required steps are fewer in the latter. I also put "local" functions and types inside their only user function (e.g. the different tokenizing functions) to make it clearer where they are and aren't used.
Sorry for all the back and forth on the "cloze open" inside a hint, it is now supported because it made sense in the way the logic turned out.
As I am no Rust expert: someone who is good with Rust memory management might want to look at the string handling in reveal_cloze_text(); possibly there is some unnecessary string copying going on there (I had to create a String from a &str earlier than I would like). I can see some solutions, but they involve growing the code, so I don't know if a slight performance gain is worth a larger code base.
rslib/src/cloze.rs
Outdated
)),
// Answer - active cloze
(false, false, true, _) => last.text.push_str(&format!(
    r#"<span class="cloze active" data-ordinal="{}">{}</span>"#,
While a separate active class on only the active clozes may make sense in a new implementation, I'm not sure we can do that here - most users will have lines like .cloze { color: blue; } in their card templates, and this change would lead to the inactive clozes being colored as well, which I'm not sure we want. Perhaps an approach like #2140 might better preserve the existing behaviour?
Ok, so how about class="cloze" for active and class="cloze-inactive" for inactive?
Yep, that's less likely to break things
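A sketch of what the agreed-on markup could look like (render_span() is an illustrative helper, not the PR's actual rendering code): the active cloze keeps the historical "cloze" class so existing template CSS like `.cloze { color: blue; }` is unaffected, while inactive clozes get the new "cloze-inactive" class.

```rust
// Illustrative only: "cloze" stays reserved for the active cloze so
// existing user styling keeps its old meaning; inactive clozes get
// the new, separately-stylable "cloze-inactive" class.
fn render_span(active: bool, ordinal: u16, text: &str) -> String {
    let class = if active { "cloze" } else { "cloze-inactive" };
    format!(
        r#"<span class="{}" data-ordinal="{}">{}</span>"#,
        class, ordinal, text
    )
}

fn main() {
    assert_eq!(
        render_span(true, 1, "bar"),
        r#"<span class="cloze" data-ordinal="1">bar</span>"#
    );
    assert_eq!(
        render_span(false, 2, "baz"),
        r#"<span class="cloze-inactive" data-ordinal="2">baz</span>"#
    );
}
```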
Sounds like you haven't configured PATH correctly: https://github.com/ankitects/anki/blob/main/docs/windows.md#more
Thanks, but I already have bazel and msys on the path and still get the error.
Ah, ok, it is there to emulate the original regex functionality.
Ah, I see. We should get that for free with the nom parser.
Ok, firstly, I need to apologize for my previous example. It is indeed difficult to achieve the desired outcomes from a flat list of parsed tokens, as I discovered when I went to try implementing routines on top of it. What I should have been doing is creating a tree of parsed tokens instead. I'm afraid I'm not super-fond of your current approach, as while the low-level tokenizing is now handled separately, the logic that maps the flat tokens into logical structures has crept into each of our routines. I've created an example that shows how the routines above could be implemented with a token tree. It's not complete: it doesn't properly wrap the text with HTML yet, and I have not tested it thoroughly. And I tested it in a separate Rust project, so it includes some copy+pasted routines from the Anki codebase that aren't relevant. But hopefully it better illustrates the direction I was hoping to go in than my previous example. What do you think?
use std::borrow::Cow;
use std::collections::HashSet;
use std::error::Error;
use lazy_static::lazy_static;
use nom::branch::alt;
use nom::bytes::complete::take_while;
use nom::{bytes::complete::tag, combinator::map, IResult};
use regex::Regex;
#[derive(Debug)]
enum Token<'a> {
OpenCloze(u16),
Text(&'a str),
CloseCloze,
}
// todo: error handling
fn tokenize(mut text: &str) -> impl Iterator<Item = Token> {
fn open_cloze(text: &str) -> IResult<&str, Token> {
// opening brackets and 'c'
let (text, _opening_brackets_and_c) = tag("{{c")(text)?;
// following number
let (text, digits) = take_while(|c: char| c.is_ascii_digit())(text)?;
let digits: u16 = match digits.parse() {
Ok(digits) => digits,
Err(_) => {
// not a valid number; fail to recognize
return Err(nom::Err::Error(nom::error::make_error(
text,
nom::error::ErrorKind::Digit,
)));
}
};
// ::
let (text, _colons) = tag("::")(text)?;
Ok((text, Token::OpenCloze(digits)))
}
fn close_cloze(text: &str) -> IResult<&str, Token> {
map(tag("}}"), |_| Token::CloseCloze)(text)
}
/// Match a run of text until an open/close marker is encountered.
fn normal_text(text: &str) -> IResult<&str, Token> {
if text.is_empty() {
return Err(nom::Err::Error(nom::error::make_error(
text,
nom::error::ErrorKind::Eof,
)));
}
let mut index = 0;
let mut other_token = alt((open_cloze, close_cloze));
while other_token(&text[index..]).is_err() && index < text.len() {
index += 1;
}
Ok((&text[index..], Token::Text(&text[0..index])))
}
std::iter::from_fn(move || {
if text.is_empty() {
None
} else {
let (remaining_text, token) =
alt((open_cloze, close_cloze, normal_text))(text).unwrap();
text = remaining_text;
Some(token)
}
})
}
#[derive(Debug)]
enum TextOrCloze<'a> {
Text(&'a str),
Cloze(ExtractedCloze<'a>),
}
#[derive(Debug)]
struct ExtractedCloze<'a> {
ordinal: u16,
nodes: Vec<TextOrCloze<'a>>,
hint: Option<&'a str>,
}
fn parse_text_with_clozes(text: &str) -> Vec<TextOrCloze<'_>> {
let mut open_clozes: Vec<ExtractedCloze> = vec![];
let mut output = vec![];
for token in tokenize(text) {
match token {
Token::OpenCloze(ordinal) => open_clozes.push(ExtractedCloze {
ordinal,
nodes: Vec::with_capacity(1), // common case
hint: None,
}),
Token::Text(mut text) => {
if let Some(cloze) = open_clozes.last_mut() {
// extract hint if found
if let Some((head, tail)) = text.split_once("::") {
text = head;
cloze.hint = Some(tail);
}
cloze.nodes.push(TextOrCloze::Text(text));
} else {
output.push(TextOrCloze::Text(text));
}
}
Token::CloseCloze => {
// take the currently active cloze
if let Some(cloze) = open_clozes.pop() {
let target = if let Some(outer_cloze) = open_clozes.last_mut() {
// and place it into the cloze layer above
&mut outer_cloze.nodes
} else {
// or the top level if no other clozes active
&mut output
};
target.push(TextOrCloze::Cloze(cloze));
} else {
// closing marker outside of any clozes
output.push(TextOrCloze::Text("}}"))
}
}
}
}
output
}
impl ExtractedCloze<'_> {
/// Return the cloze's hint, or "..." if none was provided.
fn hint(&self) -> &str {
self.hint.unwrap_or("...")
}
fn clozed_text(&self) -> Cow<str> {
// happy efficient path?
if self.nodes.len() == 1 {
if let TextOrCloze::Text(text) = self.nodes.last().unwrap() {
return (*text).into();
}
}
let mut buf = String::new();
for node in &self.nodes {
match node {
TextOrCloze::Text(text) => buf.push_str(text),
TextOrCloze::Cloze(cloze) => buf.push_str(&cloze.clozed_text()),
}
}
buf.into()
}
}
pub fn cloze_numbers_in_string(html: &str) -> HashSet<u16> {
let mut set = HashSet::with_capacity(4);
add_cloze_numbers_in_string(html, &mut set);
set
}
#[allow(clippy::implicit_hasher)]
pub fn add_cloze_numbers_in_string(html: &str, set: &mut HashSet<u16>) {
add_cloze_numbers_in_text_with_clozes(&parse_text_with_clozes(html), set)
}
fn add_cloze_numbers_in_text_with_clozes(nodes: &[TextOrCloze], set: &mut HashSet<u16>) {
for node in nodes {
if let TextOrCloze::Cloze(cloze) = node {
set.insert(cloze.ordinal);
add_cloze_numbers_in_text_with_clozes(&cloze.nodes, set);
}
}
}
pub fn reveal_cloze_text_only(text: &str, cloze_ord: u16, question: bool) -> String {
let mut output = Vec::new();
for node in &parse_text_with_clozes(text) {
reveal_cloze_text_in_nodes(node, cloze_ord, question, &mut output);
}
output.join(", ")
}
fn reveal_cloze_text_in_nodes(
node: &TextOrCloze,
cloze_ord: u16,
question: bool,
output: &mut Vec<String>,
) {
if let TextOrCloze::Cloze(cloze) = node {
if cloze.ordinal == cloze_ord {
if question {
output.push(cloze.hint().into())
} else {
output.push(cloze.clozed_text().into())
}
}
for node in &cloze.nodes {
reveal_cloze_text_in_nodes(&node, cloze_ord, question, output);
}
}
}
pub fn reveal_cloze_text(text: &str, cloze_ord: u16, question: bool) -> String {
let mut buf = String::new();
let mut active_cloze_found_in_text = false;
for node in &parse_text_with_clozes(text) {
match node {
// top-level text is indiscriminately added
TextOrCloze::Text(text) => buf.push_str(text),
TextOrCloze::Cloze(cloze) => reveal_cloze(
cloze,
cloze_ord,
question,
&mut active_cloze_found_in_text,
&mut buf,
),
}
}
if active_cloze_found_in_text {
buf
} else {
String::new()
}
}
fn reveal_cloze(
cloze: &ExtractedCloze,
cloze_ord: u16,
question: bool,
active_cloze_found_in_text: &mut bool,
buf: &mut String,
) {
let active = cloze.ordinal == cloze_ord;
*active_cloze_found_in_text |= active;
match (question, active) {
(true, true) => {
// question side with active cloze; all inner content is elided
// (but will probably need to recurse to gather the text to include in data-cloze)
buf.push_str("<div class=active>");
buf.push_str(&format!("[{}]", cloze.hint()));
buf.push_str("</div>");
return;
}
(false, true) => {
// answer side with active cloze; content from inner clozes is included
buf.push_str("<div class=active>");
buf.push_str(&cloze.clozed_text());
buf.push_str("</div>");
}
(true, false) => {
// question side with inactive cloze; text shown normally, but child clozes may be elided
buf.push_str("<div class=inactive>");
for node in &cloze.nodes {
match node {
TextOrCloze::Text(text) => buf.push_str(text),
TextOrCloze::Cloze(cloze) => {
reveal_cloze(cloze, cloze_ord, question, active_cloze_found_in_text, buf)
}
}
}
buf.push_str("</div>")
}
(false, false) => {
// answer side with inactive cloze; inner clozes may be active
buf.push_str("<div class=inactive>");
for node in &cloze.nodes {
match node {
TextOrCloze::Text(text) => buf.push_str(text),
TextOrCloze::Cloze(cloze) => {
reveal_cloze(cloze, cloze_ord, question, active_cloze_found_in_text, buf)
}
}
}
buf.push_str("</div>");
}
}
}
fn contains_cloze(text: &str) -> bool {
parse_text_with_clozes(text)
.iter()
.any(|node| matches!(node, TextOrCloze::Cloze(_)))
}
pub(crate) trait CowMapping<'a, B: ?Sized + 'a + ToOwned> {
/// Returns [self]
/// - unchanged, if the given function returns [Cow::Borrowed]
/// - with the new value, if the given function returns [Cow::Owned]
fn map_cow(self, f: impl FnOnce(&B) -> Cow<B>) -> Self;
fn get_owned(self) -> Option<B::Owned>;
}
impl<'a, B: ?Sized + 'a + ToOwned> CowMapping<'a, B> for Cow<'a, B> {
fn map_cow(self, f: impl FnOnce(&B) -> Cow<B>) -> Self {
if let Cow::Owned(o) = f(&self) {
Cow::Owned(o)
} else {
self
}
}
fn get_owned(self) -> Option<B::Owned> {
match self {
Cow::Borrowed(_) => None,
Cow::Owned(s) => Some(s),
}
}
}
pub fn decode_entities(html: &str) -> Cow<str> {
if html.contains('&') {
match htmlescape::decode_html(html) {
Ok(text) => text.replace('\u{a0}', " ").into(),
Err(_) => html.into(),
}
} else {
// nothing to do
html.into()
}
}
lazy_static! {
static ref HTML: Regex = Regex::new(concat!(
"(?si)",
// wrapped text
r"(<!--.*?-->)|(<style.*?>.*?</style>)|(<script.*?>.*?</script>)",
// html tags
r"|(<.*?>)",
))
.unwrap();
}
pub fn strip_html(html: &str) -> Cow<str> {
strip_html_preserving_entities(html).map_cow(decode_entities)
}
pub fn strip_html_preserving_entities(html: &str) -> Cow<str> {
HTML.replace_all(html, "")
}
fn main() -> Result<(), Box<dyn Error>> {
dbg!(parse_text_with_clozes("foo {{c1::bar {{c2::baz}}}}"));
dbg!(parse_text_with_clozes(
"foo {{{c1::bar {{c2::baz}}::qux}} hello ::"
));
dbg!(parse_text_with_clozes("foo }} :: {{c1::bar}} baz"));
assert_eq!(
cloze_numbers_in_string("test"),
vec![].into_iter().collect::<HashSet<u16>>()
);
assert_eq!(
cloze_numbers_in_string("{{c2::te}}{{c1::s}}t{{"),
vec![1, 2].into_iter().collect::<HashSet<u16>>()
);
// new addition
assert_eq!(
cloze_numbers_in_string("foo {{c1::bar {{c2::baz}}}}"),
vec![1, 2].into_iter().collect::<HashSet<u16>>()
);
// cloze only
assert_eq!(reveal_cloze_text_only("foo", 1, true), "");
assert_eq!(reveal_cloze_text_only("foo {{c1::bar}}", 1, true), "...");
assert_eq!(
reveal_cloze_text_only("foo {{c1::bar::baz}}", 1, true),
"baz"
);
assert_eq!(reveal_cloze_text_only("foo {{c1::bar}}", 1, false), "bar");
assert_eq!(reveal_cloze_text_only("foo {{c1::bar}}", 2, false), "");
assert_eq!(
reveal_cloze_text_only("{{c1::foo}} {{c1::bar}}", 1, false),
"foo, bar"
);
// new additions
assert_eq!(
reveal_cloze_text_only("foo {{c1::bar {{c2::baz}}}}", 1, true),
"..."
);
assert_eq!(
reveal_cloze_text_only("foo {{c1::bar {{c2::baz}}}}", 1, false),
"bar baz"
);
assert_eq!(
reveal_cloze_text_only("foo {{c1::bar {{c2::baz}}}}", 2, true),
"..."
);
assert_eq!(
reveal_cloze_text_only("foo {{c1::bar {{c2::baz}}}}", 2, false),
"baz"
);
assert_eq!(
reveal_cloze_text_only("foo {{c1::bar {{c1::baz::hint2}}::hint}}", 1, true),
"hint, hint2"
);
assert_eq!(
reveal_cloze_text_only("foo {{c1::bar {{c2::baz::hint2}}::hint}}", 2, true),
"hint2"
);
assert_eq!(
reveal_cloze_text_only("foo {{c1::bar {{c1::baz}}}}", 1, false),
"bar baz, baz"
);
assert_eq!(
strip_html(reveal_cloze_text("foo {{c1::bar {{c2::baz}}}}", 1, true).as_ref()),
"foo [...]"
);
assert_eq!(
strip_html(reveal_cloze_text("foo {{c1::bar {{c2::baz}}}}", 1, false).as_ref()),
"foo bar baz"
);
assert_eq!(
strip_html(reveal_cloze_text("foo {{c1::bar {{c2::baz}}::qux}}", 2, true).as_ref()),
"foo bar [...]"
);
assert_eq!(
strip_html(reveal_cloze_text("foo {{c1::bar {{c2::baz}}::qux}}", 2, false).as_ref()),
"foo bar baz"
);
assert_eq!(
strip_html(reveal_cloze_text("foo {{c1::bar {{c2::baz}}::qux}}", 1, true).as_ref()),
"foo [qux]"
);
assert_eq!(
strip_html(reveal_cloze_text("foo {{c1::bar {{c2::baz}}::qux}}", 1, false).as_ref()),
"foo bar baz"
);
Ok(())
}
If state is acceptable: as I understand it you have a fairly specific design in mind, so if you want to keep your sample code verbatim please let me know; otherwise I would like to change some of it to include hint tokenizing/parsing. As for the inclusion of "parsing logic" into the consuming routines: thoughts?
Please note I was not suggesting state can be avoided everywhere; I just wanted to keep it focused at the convert-tokens-to-logical-structure level, and not at the tokenizing level, or the upper level that consumes the logical structure that has been pulled from the text.
I would lean towards not trying to handle this at the moment. For one, I'm not sure we really need to - I suspect it's not something most users would even notice/encounter. Do you have particular cases in mind that I'm overlooking, or are you just trying to be cautious? If my assumption proves incorrect, then we'd be able to add the extra logic to the code in the future, with the knowledge that it's actually required. If we add it now "just in case", we may end up doing extra work for no benefit, and we lose the ability to assume that a hint will only occur in the final fragment of a cloze.
While that's a valid approach, I'd prefer to start simple and avoid making readability/DRY trade-offs unless benchmarks show a big difference.
Updated the PR with your code.
dae
left a comment
Thanks @triaeiou, just one minor point I noticed. Otherwise looks good, and should be able to merge this in once 2.1.55 is out the door.
rslib/src/cloze.rs
Outdated
));
}
(false, true) => {
    buf.push_str(&format!(
It's a little more efficient here to do
writeln!(buf, "...", ...).unwrap()
Same with other lines which call push_str(&format...)
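The suggested pattern in a self-contained sketch (render_hint() is a made-up name for illustration): the std::fmt::Write trait lets write!/writeln! format directly into the existing String, skipping the temporary String that push_str(&format!(...)) allocates. Writing into a String is infallible, so the unwrap() never fires.

```rust
use std::fmt::Write;

// Hypothetical helper: appends formatted markup straight into `buf`
// instead of allocating a temporary via push_str(&format!(...)).
fn render_hint(buf: &mut String, ordinal: u16, hint: &str) {
    // fmt::Write for String cannot fail, hence the unwrap().
    write!(buf, r#"<span data-ordinal="{}">[{}]</span>"#, ordinal, hint).unwrap();
}

fn main() {
    let mut buf = String::from("foo ");
    render_hint(&mut buf, 1, "...");
    assert_eq!(buf, r#"foo <span data-ordinal="1">[...]</span>"#);
}
```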
Ok, updated.
Did you forget to push your changes?
Hmm, seems like I missed that and just saw the rebuild triggered by the update merge from main.
dae
left a comment
Thanks for all your work on this @triaeiou!
If you could address the conflict, I'll merge this into main. If .56 needs to be pushed out quickly this may not get cherry-picked, but it'll make it into a stable release once it's had some time in testing.
Ok, resolved (was only a merge conflict in CONTRIBUTORS).
Thanks!
As per https://forums.ankiweb.net/t/ability-to-generate-nested-cloze/9743/6: implementation to allow nested clozes. Also included some additional cloze meta data as per #1968.