
Umlauts are incorrectly separated by the OutputFormatter when creating a table for the console #42034


Closed
Moskito89 opened this issue Jul 8, 2021 · 14 comments · Fixed by #51223

Comments

@Moskito89

Moskito89 commented Jul 8, 2021

Symfony version(s) affected: 5.3

Description
Umlauts break table rendering in the Symfony console. When a cell wraps onto several lines and the break falls on an umlaut, the character is corrupted, and the affected table row ends up a few characters too short.

How to reproduce

$table = new \Symfony\Component\Console\Helper\Table($output);
$table
    ->setColumnMaxWidth(1, 10)
    ->setHeaders(['ISBN', 'Title'])
    ->setRows([
        ['99921-58-10-7', 'A really long title that could need multiple lines'],
        new \Symfony\Component\Console\Helper\TableSeparator(),
        ['99921-58-10-7', 'Â rèälly löng tîtlè thät cöüld nèêd múltîplê línès']
    ])
;
$table->render();

Sample code with a command class and several tests can be found here: https://gist.github.com/Moskito89/7774c29805248a2d2568a01997ad1bfb

Possible Solution
The problem originates in the Symfony\Component\Console\Formatter\OutputFormatter class, in the method applyCurrentStyle to be precise: the preg_replace call on line 259 is not UTF-8 aware. (https://github.com/symfony/console/blob/d927f5564049730e2589d4d53c403ede528d6967/Formatter/OutputFormatter.php#L259)
This can be solved by decoding $text and $prefix beforehand with utf8_decode and later encoding them again with utf8_encode.
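
To make the failure concrete, here is a minimal sketch that applies the same pattern quoted later in this thread to a short string. The width of 2 and the sample text are assumptions chosen so that the break lands inside the two-byte "ö"; the real method receives its width from the formatter.

```php
<?php
// Byte-based wrapping: without the u modifier, {2} counts bytes, not characters.
$width = 2;      // illustrative value, chosen to force a break inside "ö"
$text = 'löng';  // "ö" is two bytes in UTF-8: 0xC3 0xB6

$wrapped = preg_replace('~([^\\n]{'.$width.'})\\ *~', "\$1\n", $text);
$lines = explode("\n", $wrapped);

foreach ($lines as $line) {
    // preg_match with the u modifier returns false when the subject is invalid UTF-8.
    $valid = false !== preg_match('//u', $line);
    printf("%s => %s\n", bin2hex($line), $valid ? 'valid UTF-8' : 'broken');
}
```

The first two lines each contain half of the "ö", so neither is valid UTF-8 on its own.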

Please tell me if you can confirm that behaviour. Also please let me know if you want me to change that and create a PR to solve it. Thanks! 👍

@stof
Member

stof commented Jul 9, 2021

> This can be solved by decoding $text and $prefix beforehand with utf8_decode and later encoding them again with utf8_encode.

This solution is likely wrong: utf8_decode and utf8_encode are confusingly named (they only convert between UTF-8 and ISO-8859-1), and converting UTF-8 to ISO-8859-1 cannot be done for all inputs, as ISO-8859-1 cannot represent all of Unicode.
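
A small sketch of why the round trip is lossy. It uses mb_convert_encoding as a stand-in for utf8_decode and utf8_encode (an assumption made here because those two functions are deprecated as of PHP 8.2), so it requires ext-mbstring. The sample string is made up for illustration.

```php
<?php
// Round trip UTF-8 -> ISO-8859-1 -> UTF-8.
$original = 'löng € 日本';

// "ö" exists in ISO-8859-1, but "€" and the CJK characters do not.
$latin1 = mb_convert_encoding($original, 'ISO-8859-1', 'UTF-8');
$roundTrip = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

var_dump($roundTrip === $original); // bool(false): the unrepresentable characters are lost
```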

@nicolas-grekas
Member

Indeed, this preg_replace is wrong. Not only does it fail to handle UTF-8, it also ignores character width.
@Moskito89 would you like to give this issue a try?
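
The difference between bytes, characters, and display columns can be illustrated with a short sketch. It assumes ext-mbstring is available, and the sample strings are chosen only for illustration.

```php
<?php
// Three different ways to measure the "length" of a string.
$samples = ['long', 'löng', '日本語'];

foreach ($samples as $s) {
    printf(
        "%s: bytes=%d chars=%d columns=%d\n",
        $s,
        strlen($s),                // bytes
        mb_strlen($s, 'UTF-8'),    // code points
        mb_strwidth($s, 'UTF-8')   // terminal columns (wide characters count as 2)
    );
}
```

A wrapping routine that targets a terminal has to count columns, which is why handling UTF-8 alone would not be enough for wide characters.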

@Moskito89
Author

Thanks for pointing this out @stof, I actually overlooked the fact that not all characters can be converted. So it seems like preg_replace needs to be replaced.
Yes @nicolas-grekas, you can assign the issue to me. I will look for a solution.

@Moskito89
Author

I have now looked into the problem and found two possible solutions.

The simplest approach would be to replace preg_replace with mb_ereg_replace_callback. It can process UTF-8 and otherwise works almost identically. So instead of writing:

$text = $prefix.preg_replace('~([^\\n]{'.$width.'})\\ *~', "\$1\n", $text);

we could write:

$text = mb_ereg_replace_callback(
    '([^\\n]{'.$width.'})\\ *', 
    function ($matches) {
        $line = rtrim($matches[0]);
        return $line.PHP_EOL;
    }, 
    $text
);

The problem is that mb_ereg_replace_callback is very slow: in my tests it was ten times slower than preg_replace, which I find very unsatisfactory. In addition, mb_ereg_replace_callback requires, as far as I know, ext-mbstring.

That's why I wrote a second version that changes the implementation completely and inserts the breaks manually. The commit with the changes and the second approach can be found here: BitAndBlack@63ac82d

I have added some tests; both the existing and the new ones pass. This version is also slower than the original preg_replace, but not as much as mb_ereg_replace_callback: in my test (with 1 million iterations) it was about three to four times slower, which I would still consider acceptable.
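
For readers who want a feel for what creating the breaks manually can look like, here is a simplified sketch. It is not the code from the referenced commit: the function name wrapByCharacters is hypothetical, it counts code points rather than display columns, and it leaves non-UTF-8 lines untouched.

```php
<?php
// Hypothetical helper: wrap each line after $width characters (code points).
// Assumes $width >= 1.
function wrapByCharacters(string $text, int $width): string
{
    $lines = [];
    foreach (explode("\n", $text) as $line) {
        // Split into code points; preg_split returns false on invalid UTF-8.
        $chars = preg_split('//u', $line, -1, PREG_SPLIT_NO_EMPTY);
        if (false === $chars || [] === $chars) {
            $lines[] = $line; // keep empty or non-UTF-8 lines as they are
            continue;
        }
        foreach (array_chunk($chars, $width) as $chunk) {
            // Drop spaces that end up at the start of a wrapped line.
            $lines[] = ltrim(implode('', $chunk), ' ');
        }
    }

    return implode("\n", $lines);
}

echo wrapByCharacters('löng tîtlè', 4), "\n";
```

Because the split happens on code points, no multibyte character is ever cut in half, at the cost of building an intermediate array for every line.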

Please write me your thoughts and let me know how we can move on. Thanks!

@stof
Member

stof commented Jul 15, 2021

Well, if we want to work with UTF-8, using preg_replace with the u modifier on the regexp might work. But this would then cause issues if the string is not valid UTF-8.
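
Both halves of that trade-off can be sketched as follows. The wrapper function wrapUtf8 is a hypothetical name used only for this illustration; the pattern is the one quoted earlier in the thread with the u modifier added.

```php
<?php
// Hypothetical wrapper around the quoted pattern, with the u modifier added.
function wrapUtf8(string $text, int $width): ?string
{
    // With u, {N} counts characters instead of bytes.
    return preg_replace('~([^\\n]{'.$width.'})\\ *~u', "\$1\n", $text);
}

// Valid UTF-8: the break no longer lands inside a multibyte character.
$ok = wrapUtf8('löng tîtlè', 4);

// Non-UTF-8 input (here "löng" encoded as ISO-8859-1): preg_replace gives up.
$bad = wrapUtf8("l\xF6ng", 2);

var_dump($bad);                                      // NULL
var_dump(preg_last_error() === PREG_BAD_UTF8_ERROR); // bool(true)
```

So a caller would need a fallback for the null case, for example retrying without the u modifier.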

@Moskito89
Author

Yes, that's true, unfortunately. Which encodings do we have to deal with? Could you give me some characters or strings for testing?

@carsonbot

Hey, thanks for your report!
There has not been a lot of activity here for a while. Is this bug still relevant? Have you managed to find a workaround?

@Moskito89
Author

This bug still exists. I'm currently waiting for @stof's reply.

carsonbot removed the Stalled label Jan 24, 2022
@stof
Member

stof commented Jan 24, 2022

Well, we don't know which encodings are used by apps using the component. But I suspect that windows-1252 might be used for instance, as that's the encoding used by cmd.exe by default for its output

@Moskito89
Author

I'm sorry, but I can't solve that. I know too little about character encoding...

@carsonbot

Hey, thanks for your report!
There has not been a lot of activity here for a while. Is this bug still relevant? Have you managed to find a workaround?

@Moskito89
Author

Bug still exists.

carsonbot removed the Stalled label Oct 30, 2022
@carsonbot

Hey, thanks for your report!
There has not been a lot of activity here for a while. Is this bug still relevant? Have you managed to find a workaround?

@Moskito89
Author

Bug still exists.

carsonbot removed the Stalled label May 3, 2023
nicolas-grekas added a commit that referenced this issue Aug 14, 2023
This PR was merged into the 5.4 branch.

Discussion
----------

[Console] Fix linewraps in `OutputFormatter`

| Q             | A
| ------------- | ---
| Branch?       | 5.4
| Bug fix?      | yes
| New feature?  | no
| Deprecations? | no
| Tickets       | Fix #42034
| License       | MIT
| Doc PR        | n/a

Fix output for tables with line breaks and special characters:
```php
$table = new \Symfony\Component\Console\Helper\Table($output);
$table
    ->setColumnMaxWidth(1, 10)
    ->setHeaders(['ISBN', 'Title'])
    ->setRows([
        ['99921-58-10-7', 'A really long title that could need multiple lines'],
        new \Symfony\Component\Console\Helper\TableSeparator(),
        ['99921-58-10-7', 'Â rèälly löng tîtlè thät cöüld nèêd múltîplê línès']
    ])
;
$table->render();
```

**Before**
```
+---------------+------------+
| ISBN          | Title      |
+---------------+------------+
| 99921-58-10-7 | A really l |
|               | ong title  |
|               | that could |
|               | need multi |
|               | ple lines  |
+---------------+------------+
| 99921-58-10-7 | Â rèäll    |
|               | y löng t |
|               | tlè thä |
|               | t cöüld    |
|               | nèêd mú    |
|               | ltîplê l   |
|               | ínès       |
+---------------+------------+
```

**After**
```
+---------------+------------+
| ISBN          | Title      |
+---------------+------------+
| 99921-58-10-7 | A really   |
|               | long title |
|               | that could |
|               | need       |
|               | multiple   |
|               | lines      |
+---------------+------------+
| 99921-58-10-7 | Â rèälly   |
|               | löng tîtlè |
|               | thät cöüld |
|               | nèêd       |
|               | múltîplê   |
|               | línès      |
+---------------+------------+

```

Commits
-------

fcf86b3 [Console] Fix linewraps in OutputFormatter