Symfony DomCrawler doesn't allow get html5Parser's errors #42255

Closed
Kvizer opened this issue Jul 26, 2021 · 1 comment

Kvizer commented Jul 26, 2021

Symfony version(s) affected:
symfony/dom-crawler 5.2.10

Description
<meta http-equiv="Content-Type" content="text/html; charset=unicode">
Parsing an HTML document that contains the meta http-equiv tag above with Crawler and then retrieving the result yields only empty nodes. The private html5Parser property of Symfony\Component\DomCrawler\Crawler does hold an error (Line 0, Col 0: Unexpected text. Ignoring.), but Crawler offers no way to read it.

How to reproduce

use Symfony\Component\DomCrawler\Crawler;

$html = <<<'HTML'
<!doctype html>
<html>
 <head> 
  <meta http-equiv="Content-Type" content="text/html; charset=unicode"> 
  <meta name="ProgId" content="Word.Document"> 
  <meta name="Generator" content="Microsoft Word 14"> 
  <meta name="Originator" content="Microsoft Word 14"> 
 </head> 
  <body>
        <p class="message">Hello World!</p>
        <p>Hello Crawler!</p>
  </body>
</html>
HTML;

$crawler = new Crawler($html);

// Reported result: the nodes come back empty, and the parser error cannot be read.
$htmlResult = $crawler->outerHtml();

Possible Solution

Add a getter for $html5Parser, or make the property protected.
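
Until something like that exists, one generic PHP workaround is to read the private property through a closure bound to the declaring class's scope. The sketch below is not Symfony code: StubCrawler and StubParser are hypothetical stand-ins, and readPrivateProperty is a helper name invented for illustration. It only assumes, as the report describes, that the parser lives in a private property and exposes its errors through a getErrors() method.

```php
<?php

// Hypothetical stand-in for the parser: it only records an error list.
final class StubParser
{
    private array $errors = ['Line 0, Col 0: Unexpected text. Ignoring.'];

    public function getErrors(): array
    {
        return $this->errors;
    }
}

// Hypothetical stand-in for a class that keeps its parser private.
class StubCrawler
{
    private $html5Parser;

    public function __construct()
    {
        $this->html5Parser = new StubParser();
    }
}

/**
 * Reads a private property by running a closure inside the scope of
 * the class that declares the property.
 */
function readPrivateProperty(object $object, string $class, string $property)
{
    $reader = function () use ($property) {
        return $this->$property;
    };

    return \Closure::bind($reader, $object, $class)();
}

$parser = readPrivateProperty(new StubCrawler(), StubCrawler::class, 'html5Parser');
$errors = $parser->getErrors();
```

Against a real Crawler instance the same helper would presumably be called with Crawler::class and 'html5Parser', but that relies on a private implementation detail that can change between releases, so it is a debugging aid rather than a fix.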

Additional context
[Screenshot: image_2021_07_26T09_37_36_181Z]

@carsonbot

Hey, thanks for your report!
There has not been a lot of activity here for a while. Is this bug still relevant? Have you managed to find a workaround?

nicolas-grekas added a commit that referenced this issue Jan 31, 2022
This PR was merged into the 4.4 branch.

Discussion
----------

[DomCrawler] ignore bad charsets

| Q             | A
| ------------- | ---
| Branch?       | 4.4
| Bug fix?      | yes
| New feature?  | no
| Deprecations? | no
| Tickets       | Fix #42255
| License       | MIT
| Doc PR        | -

Commits
-------

7802c1f [DomCrawler] ignore bad charsets