tats-u · 2025-01-22T13:54:57Z

Initial checklist

I read the support docs
I read the contributing guide
I agree to follow the code of conduct
I searched issues and discussions and couldn’t find anything or linked relevant results below
I made sure the docs are up to date
I included tests (or that’s not needed)

Description of changes

Fixes #189
We might need more tests to deal with abnormal surrogate pair patterns.
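
To illustrate why supplementary characters need special handling, here is a minimal sketch; it is not micromark's actual implementation. JavaScript strings are sequences of UTF-16 code units, so a character outside the BMP occupies two code units, and classifying each code unit on its own misses it. The helper name `isPunctuationOrSymbolAt` is hypothetical, and the sketch assumes a runtime that supports Unicode property escapes in regular expressions.

```javascript
// Matches exactly one code point in a Unicode P (punctuation) or S (symbol) category.
const punctuationOrSymbol = /^[\p{P}\p{S}]$/u

/**
 * Hypothetical helper: classify the code point that starts at `index`.
 * @param {string} text
 * @param {number} index
 * @returns {boolean}
 */
function isPunctuationOrSymbolAt(text, index) {
  const codePoint = text.codePointAt(index)
  if (codePoint === undefined) return false
  return punctuationOrSymbol.test(String.fromCodePoint(codePoint))
}

// U+10100 AEGEAN WORD SEPARATOR LINE, general category Po, stored as two code units.
const aegeanSeparator = '\u{10100}'

// Looking only at the first code unit sees an isolated high surrogate (category Cs).
const perCodeUnit = punctuationOrSymbol.test(aegeanSeparator.charAt(0))
// Reading the whole code point sees the punctuation character.
const perCodePoint = isPunctuationOrSymbolAt(aegeanSeparator, 0)

console.log(perCodeUnit, perCodePoint) // false true
```

Reading with `codePointAt` keeps the pair together, which is the property a code-unit-based check lacks.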

codecov · 2025-01-22T13:56:16Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (2edb5e7) to head (ba3ae01).
Report is 46 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##              main      #190    +/-   ##
==========================================
  Coverage   100.00%   100.00%            
==========================================
  Files           57        58     +1     
  Lines        11932     12496   +564     
==========================================
+ Hits         11932     12496   +564

wooorm · 2025-01-22T15:30:06Z

Thanks for working on this! Appreciate it!

I would like to see a test case for this in the next version of CommonMark. Then things can be changed here.

tats-u · 2025-01-26T13:11:57Z

Do you know which repository I should submit a PR to, https://github.com/commonmark/commonmark-spec-web or https://github.com/commonmark/commonmark-spec?

Also, 'Should handle lonely surrogate pair around emphasis' is not suitable for common tests shared with implementations in languages other than JS/Java/C#, because it contains lonely surrogates, which are not valid in UTF-8 (and possibly UTF-32).
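
The point about UTF-8 can be seen from JavaScript itself. The following is a minimal sketch, assuming a runtime that provides the standard `TextEncoder` (Node.js and browsers do); the sample strings are made up for illustration. A lone surrogate has no UTF-8 encoding, so the encoder substitutes U+FFFD, which means the original input cannot be stored faithfully in a UTF-8 test file.

```javascript
const encoder = new TextEncoder()

// A lone high surrogate: representable in a UTF-16 string, but not in UTF-8.
const lone = '*a\uD800*'
// The same position filled by a complete surrogate pair (U+1F600).
const paired = '*a\uD83D\uDE00*'

const loneBytes = Array.from(encoder.encode(lone))
const pairedBytes = Array.from(encoder.encode(paired))

// Render bytes as space-separated hex for display.
const hex = (bytes) =>
  bytes.map((byte) => byte.toString(16).padStart(2, '0')).join(' ')

console.log(hex(loneBytes)) // 2a 61 ef bf bd 2a (ef bf bd is U+FFFD)
console.log(hex(pairedBytes)) // 2a 61 f0 9f 98 80 2a
```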

wooorm · 2025-01-27T08:31:22Z

PR goes to commonmark/commonmark-spec.

Maintainers are conservative with breaking changes like this.
I recommend making it as easy as possible to merge. As short and clear as possible.
You do not have to do everything in one big PR: if someone finds something controversial, that would block the whole PR.
If things are blocked, try and get something in there first, and improve on it later.

The algorithm in the appendix (https://spec.commonmark.org/0.31.2/#phase-2-inline-structure) is very complex, I think, so you might want to ask John to do that.
From what I heard, John already has a (local?) branch for CJK+emphasis in cmark. So perhaps John can develop/merge that together with a PR to the spec to change the appendix?

wooorm · 2025-01-27T08:32:44Z

I’d recommend not testing lonely surrogates for now then. You can also ask John how best to test that. The CM spec does not mention UTF-8, so perhaps this is out of scope for CM.

tats-u · 2025-02-16T15:43:00Z

> I’d recommend not testing lonely surrogates for now then.

Indeed, it may be better left as implementation-defined. ~~I will remove it from the test case in this repository later.~~ Update: should I just delete assert.equal, so that the test only ensures micromark does not throw a runtime exception for such ill-formed inputs?
Also, the official term seems to be "isolated surrogate code unit", and strings containing one return false from .isWellFormed().
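
As a small illustration of the well-formedness check mentioned above: `String.prototype.isWellFormed()` is a standard method, but only in newer runtimes (ES2024), so this sketch falls back to a regular expression where it is missing. The wrapper name `isWellFormed` and the sample inputs are illustrative only.

```javascript
/**
 * Report whether `text` contains no isolated surrogate code units.
 * @param {string} text
 * @returns {boolean}
 */
function isWellFormed(text) {
  if (typeof text.isWellFormed === 'function') return text.isWellFormed()
  // Fallback: in `u` mode a complete surrogate pair is read as one
  // supplementary code point, so `\p{Cs}` only matches isolated surrogates.
  return !/\p{Cs}/u.test(text)
}

const samples = [
  '*a*', // ASCII only
  '*\uD83D\uDE00*', // complete pair (U+1F600)
  '*\uD83D*', // isolated high surrogate
  '*\uDE00\uD83D*' // pair in the wrong order: two isolated surrogates
]
const results = samples.map((sample) => isWellFormed(sample))

console.log(results) // [ true, true, false, false ]
```

A test could use such a check to decide whether to compare exact output or only assert that parsing completes without throwing.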

wooorm · 2025-02-17T09:16:21Z

I was talking about the CM spec. If it is impossible to add a test there, then I do not recommend trying to add a test there.
It is possible to have a test here, so we can have a test here.

tats-u · 2025-02-17T09:37:00Z

I was wrong; I think the spec currently treats isolated surrogate code units as non-punctuation, because a character there is a Unicode code point, which includes surrogate code points (U+D800 to U+DFFF), and the category of surrogate code points is Cs (not P or S).
However, I think the spec should be revised so that implementations do not have to overthink surrogate code units or other ill-formed code unit sequences (e.g., by allowing implementations to replace them in advance, at their discretion, with U+FFFD, which counts as punctuation).
I think the test cases for isolated surrogate code units may remain, but they are not particularly recommended.
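
The two behaviors discussed above can be compared with Unicode property escapes. This is a sketch under the assumption that "punctuation" means general categories P and S, as in this discussion; the function names are made up for illustration and are not part of micromark or the spec.

```javascript
const isolatedSurrogate = /\p{Cs}/gu
const punctuationOrSymbol = /^[\p{P}\p{S}]$/u

// Option 1: keep the isolated surrogate. Its category is Cs, so it is
// not treated as punctuation.
function classifyAsIs(character) {
  return punctuationOrSymbol.test(character)
}

// Option 2: replace isolated surrogates with U+FFFD (category So) before
// classifying, so the position is treated as punctuation.
function classifyAfterReplacement(character) {
  return punctuationOrSymbol.test(
    character.replace(isolatedSurrogate, '\uFFFD')
  )
}

console.log(classifyAsIs('\uDC00')) // false
console.log(classifyAfterReplacement('\uDC00')) // true
```

Because flanking rules for emphasis depend on whether the neighboring character is punctuation, the two options can produce different output for the same ill-formed input.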

tats-u · 2025-03-23T10:38:02Z

I rebased this PR onto main, but CI reported:

Error: test/util/slow-stream.js(23,49): error TS2345: Argument of type 'BufferEncoding | undefined' is not assignable to parameter of type 'BufferEncoding'.

I have not modified this file. It is strange.
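
For readers unfamiliar with TS2345, here is a generic illustration of this class of error in a type-checked JavaScript file. It is not the code from test/util/slow-stream.js; the function names are hypothetical, and defaulting with `??` is just one common way to satisfy the checker.

```javascript
/**
 * Hypothetical callee that requires a concrete encoding.
 * @param {string} text
 * @param {BufferEncoding} encoding
 * @returns {Buffer}
 */
function toBytes(text, encoding) {
  return Buffer.from(text, encoding)
}

/**
 * Hypothetical caller whose encoding may be missing.
 * @param {string} text
 * @param {BufferEncoding | undefined} [encoding]
 * @returns {Buffer}
 */
function write(text, encoding) {
  // Passing `encoding` straight through would be reported as TS2345 when the
  // file is type-checked, because `undefined` is not a `BufferEncoding`.
  // Supplying a default narrows the type to `BufferEncoding`.
  return toBytes(text, encoding ?? 'utf8')
}

console.log(write('é').length) // 2 (UTF-8)
console.log(write('é', 'latin1').length) // 1
```

An error like this can appear in an unmodified file when the type definitions it depends on become stricter.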

tats-u · 2025-03-30T14:43:07Z

> I have not modified this file. It is strange.

This will probably be fixed by #196.

tats-u · 2025-04-02T14:32:21Z

Rebased to main. Please tell me if I need to squash commits into one.

github-actions bot added the 👋 phase/new (Post is being triaged automatically) and 🤞 phase/open (Post is being triaged manually) labels and removed the 👋 phase/new label on Jan 22, 2025

tats-u mentioned this pull request Jan 22, 2025

Recognize non-BMP punctuation & symbols (to prepare for CJK support in the future) #189

Open

4 tasks

tats-u changed the title ~~Recognize non-BMP punctuation & symbols~~ Recognize supplementary (non-BMP) punctuation & symbols Feb 16, 2025

tats-u mentioned this pull request Feb 24, 2025

Make output for emphasis (_, __, *, or **) adjacent to ill-formed code unit sequence (e.g. isolated surrogate code unit) unspecified behavior commonmark/commonmark-spec#791

Open

tats-u force-pushed the non-bmp branch from e942807 to 0275667 on March 23, 2025 08:37

tats-u mentioned this pull request Mar 23, 2025

Add supplementary (non-BMP) currency symbol in Unicode symbol example commonmark/commonmark-spec#794

Merged

tats-u added 5 commits April 2, 2025 23:16

Recognize non-BMP punctuation & symbols

9f46233

Improve tests

28d7c4c

Test return type of invalid input

18ea7e9

Fix comment cases

2eb6f7c

Fix terms in comments

ba3ae01

tats-u force-pushed the non-bmp branch from 0275667 to ba3ae01 on April 2, 2025 14:16

Recognize supplementary (non-BMP) punctuation & symbols #190

tats-u wants to merge 5 commits into micromark:main from tats-u:non-bmp
