Fixed non utf-8 strings recognition #7442

Closed

dinitrol
Contributor

No description provided.

@pborreli
Contributor

This has already been fixed by #7392. Now the tests are failing, which is no surprise. What is your failing use case?

@jfsimon
Contributor

jfsimon commented Mar 22, 2013

I made the following tests:

php > var_dump(preg_match('#[^\p{L}\p{N} ]#u', "\x7F\xFF"));
bool(false)
php > var_dump(preg_match('#[^\p{L}\p{N} ]#u', "abc\x7F\xFF"));
bool(false)
php > var_dump(preg_match('#[^\p{L}\p{N} ]#u', "abc"));
int(0)
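
The session above mixes two different outcomes: `bool(false)` signals an error, while `int(0)` means the pattern ran and found nothing. A minimal sketch of how the two could be told apart, assuming the same pattern; `classifySubject()` is a hypothetical helper for illustration and is not part of this PR:

```php
<?php
// Hypothetical helper (not from the PR): classify a subject string against
// the pattern used in the session above.
function classifySubject(string $subject): string
{
    $result = preg_match('#[^\p{L}\p{N} ]#u', $subject);

    if ($result === false) {
        // With the /u modifier, PCRE rejects subjects that are not valid UTF-8.
        return preg_last_error() === PREG_BAD_UTF8_ERROR ? 'invalid utf-8' : 'other error';
    }

    // 1: a character other than a letter, digit, or space was found; 0: none found.
    return $result === 1 ? 'match' : 'no match';
}

echo classifySubject("\x7F\xFF"), "\n";    // invalid utf-8
echo classifySubject("abc\x7F\xFF"), "\n"; // invalid utf-8
echo classifySubject("abc"), "\n";         // no match
echo classifySubject("abc!"), "\n";        // match
```

Comparing with `=== false` matters here, because a loose check such as `!preg_match(...)` treats the error case and the "no match" case the same way.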

This can't work. What was your initial goal?

@mvrhov

mvrhov commented Mar 22, 2013

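afaik \x7F\xFF is not a valid UTF-8 sequence. Let me reiterate: \xFF can never appear in UTF-8 (\x7F on its own is a valid single-byte character, so it is the \xFF that breaks the sequence). The maximum value of the first byte of a two-byte sequence is 0xDF (0b110xxxxx), and of a continuation byte 0xBF (0b10xxxxxx).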
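
The claims about individual bytes can be checked directly. The sketch below assumes the common idiom of running an empty pattern with the `/u` modifier, which fails on malformed input; `isValidUtf8()` is a hypothetical helper name, not something from this PR. Note that `\x7F` by itself passes, since it is a plain ASCII byte, while `\xFF` fails wherever it appears.

```php
<?php
// Hypothetical helper: an empty pattern with /u matches any valid UTF-8
// subject, and preg_match() returns false when the subject is malformed.
function isValidUtf8(string $bytes): bool
{
    return preg_match('//u', $bytes) === 1;
}

var_dump(isValidUtf8("\x7F"));     // bool(true): DEL, a single-byte ASCII character
var_dump(isValidUtf8("\xFF"));     // bool(false): 0xFF never appears in UTF-8
var_dump(isValidUtf8("\x7F\xFF")); // bool(false): spoiled by the trailing 0xFF
var_dump(isValidUtf8("\xC3\xA9")); // bool(true): "é", lead 0b110xxxxx + continuation 0b10xxxxxx
var_dump(isValidUtf8("\xC3"));     // bool(false): lead byte without its continuation byte
var_dump(isValidUtf8("\xA9"));     // bool(false): continuation byte without a lead byte
```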

@dinitrol
Contributor Author

Sorry, this is the wrong PR. Please delete it.

dinitrol closed this Mar 22, 2013

@jfsimon
Contributor

jfsimon commented Mar 22, 2013

@mvrhov this is the point: to test both valid and invalid UTF-8 sequences.

4 participants