[DomCrawler] filterXPath limited usage since 2.3.14 #11503
Comments
ping @stof |
The issue isn't limited to the |
…erXPath() (xabbuh)

This PR was merged into the 2.3 branch.

Discussion
----------

[Component][DomCrawler] fix axes handling in Crawler::filterXPath()

| Q             | A    |
| ------------- | ---- |
| Bug fix?      | yes  |
| New feature?  | no   |
| BC breaks?    | no   |
| Deprecations? | no   |
| Tests pass?   | yes  |
| Fixed tickets | #11503 |
| License       | MIT  |
| Doc PR        |      |

Due to some limitations in the ``relativize()`` method, it was not possible to use XPath axes other than ``descendant`` or ``descendant-or-self`` in the ``filterXPath()`` method of the ``Crawler`` class. This commit adds support for the ``ancestor``, ``ancestor-or-self``, ``attribute``, ``child``, ``following``, ``following-sibling``, ``parent``, ``preceding``, ``preceding-sibling`` and ``self`` axes.

The only axis missing after this is the ``namespace`` axis. Filtering for namespace nodes returns ``DOMNameSpaceNode`` instances which can't be passed to the ``add()`` method.

Commits
-------

8dc322b fix axes handling in Crawler::filterXPath()
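For readers less familiar with the axes named in that commit message, here is a minimal sketch of what a few of them select relative to a context node. It uses PHP's built-in `DOMXPath` directly rather than the `Crawler`, so it says nothing about how `filterXPath()` is implemented; the sample markup, the `describe()` helper and the `urn:example` namespace are made up for illustration.

```php
<?php
// Hypothetical sample document, only used for this illustration.
$doc = new DOMDocument();
$doc->loadXML(
    '<ul xmlns:x="urn:example">'
    . '<li id="one"/><li id="two" class="current"/><li id="three"/>'
    . '</ul>'
);
$xpath = new DOMXPath($doc);

// Context node: the middle <li>.
$current = $xpath->query('//li[@id="two"]')->item(0);

// Hypothetical helper: turn an element or attribute into a short label.
function describe(DOMNode $node): string
{
    if ($node instanceof DOMAttr) {
        return '@' . $node->name;
    }
    $id = $node instanceof DOMElement ? $node->getAttribute('id') : '';

    return $node->nodeName . ('' !== $id ? '#' . $id : '');
}

// Evaluate several axes relative to the context node.
$results = [];
foreach (['parent::*', 'preceding-sibling::li', 'following-sibling::li', 'attribute::class', 'self::li'] as $expression) {
    $results[$expression] = [];
    foreach ($xpath->query($expression, $current) as $node) {
        $results[$expression][] = describe($node);
    }
}

// The namespace axis yields DOMNameSpaceNode objects, which are not DOMNode instances.
$namespaceNode = $xpath->query('namespace::*', $current)->item(0);

print_r($results);
echo get_class($namespaceNode), "\n";
```

The last two lines hint at why the `namespace` axis is treated separately in the commit message: the objects it returns are of a different class than ordinary elements and attributes.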
@fabpot + @xabbuh Thank you for fixing the axes. Can someone explain the idea behind relativizing the XPath expressions? To get more test cases, I extracted the XPath expressions from this test suite and removed the XSLT-specific ones. My test case file can be found at this gist. All the XPath expressions in 01_xpath_test_suite.txt are valid when used with A good example of the complexity is the XPath expression With the extended test case file it's IMHO possible to fix all broken cases. But I'm not convinced that the benefit of relativizing an XPath expression is worth the complexity. |
@stof Can you explain the reasons that made it necessary to introduce the |
@xabbuh the issue is that
And depending on how you write the XPath, it actually mixes both. For instance, when using a CSS query the previous API was written this way because it was copying nodes to a separate document with a special root tag. So the collection elements were actually children of the root where the XPath was applied. This means that if you query for the The way to avoid the need for relativize would be to change the API of the Crawler to have 2 different methods:
If you look at jQuery, this is the equivalent of |
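The "special root tag" behaviour described in this comment can be sketched with plain `DOMDocument`/`DOMXPath` calls. This is only an illustration of the idea, not the Crawler's actual code: the tag name `_root`, the sample markup and the variable names are assumptions. It shows why the same expression gives different results depending on whether it is evaluated from a fake root above the collection or from each collection node itself.

```php
<?php
// Hypothetical source document; the two top-level <div> elements form the "collection".
$source = new DOMDocument();
$source->loadXML('<body><div id="a"><div id="inner"/></div><div id="b"/></body>');

$collection = [];
foreach ($source->documentElement->childNodes as $child) {
    if ($child instanceof DOMElement) {
        $collection[] = $child;
    }
}

// Old-style evaluation (assumed shape): copy the nodes into a separate
// document under a fake root element and apply the XPath to that root.
$fakeDocument = new DOMDocument();
$fakeRoot = $fakeDocument->appendChild($fakeDocument->createElement('_root'));
foreach ($collection as $node) {
    $fakeRoot->appendChild($fakeDocument->importNode($node, true));
}

$fromFakeRoot = [];
foreach ((new DOMXPath($fakeDocument))->query('child::div', $fakeRoot) as $match) {
    $fromFakeRoot[] = $match->getAttribute('id');
}

// Node-relative evaluation: apply the same XPath to each collection node.
$sourceXPath = new DOMXPath($source);
$fromNodes = [];
foreach ($collection as $node) {
    foreach ($sourceXPath->query('child::div', $node) as $match) {
        $fromNodes[] = $match->getAttribute('id');
    }
}

print_r($fromFakeRoot); // matches found when the collection hangs under the fake root
print_r($fromNodes);    // matches found when each node is its own context
```

With the fake root, `child::div` matches the collection elements themselves; evaluated per node, it matches their children instead. Any rewriting step that wants to stay backward compatible has to reproduce the first behaviour.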
@stof We could remove the But, the question is then, how you would handle invalid values passed to |
@xabbuh Yes. And for other axes, we need to relativize them to match the behavior of the previous implementation. For other axes, we need to turn the query into a query matching nothing, to be BC with the way it worked in the previous version (using |
@stof Obviously I am missing something, but I don't see why the other axes have to be relativized. Do you have an example in mind? (We should at least add one or more tests for them to avoid future regressions.) |
@xabbuh read again how the behavior of XPath is working. |
This PR was merged into the 2.3 branch.

Discussion
----------

[DomCrawler] fix the axes handling in a bc way

| Q             | A    |
| ------------- | ---- |
| Bug fix?      | yes  |
| New feature?  | no   |
| BC breaks?    | no   |
| Deprecations? | no   |
| Tests pass?   | yes  |
| Fixed tickets | #11503 |
| License       | MIT  |
| Doc PR        |      |

The previous fix in #11548 for handling XPath axes was not backward compatible. In previous Symfony versions the Crawler handled nodes by holding a "fake root node". This must be taken into account when evaluating (relativizing) XPath expressions.

Commits
-------

d26040f [DomCrawler] fix the axes handling in a bc way
@stof Thank you for the BC fix. 🎁 |
I used the DomCrawler::filterXPath method to do some filtering. Since this update 80438c2, the filterXPath method can't handle simple queries like 'child::div'. The error is the same as in #10986.
Most of the listed XPath axes have a problem with this change.
The main cause of this problem is that the method 'relativize' prepends 'self::' to every expression that isn't matched in here.
As a result, the XPath expression is invalid.
E.g:
Input
relativized Xpath
The W3C spec doesn't allow XPath expressions like the relativized one.
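The effect can be reproduced without the Crawler. The sketch below uses PHP's built-in `DOMXPath` and assumes that the relativized form of `'child::div'` is `'self::child::div'`, which is inferred from the description above and not copied from the Symfony source; the sample markup is made up. An XPath 1.0 location step allows only one axis specifier, so the second expression should be rejected by libxml.

```php
<?php
// Hypothetical sample document for the demonstration.
$doc = new DOMDocument();
$doc->loadXML('<root><div>a</div><p>b</p><div>c</div></root>');

$xpath = new DOMXPath($doc);
$context = $doc->documentElement;

// The original query: a single step using the child axis.
$valid = $xpath->query('child::div', $context);
$validCount = $valid->length;

// Assumed relativized form: "self::" glued in front of another axis.
// The @ operator silences the "Invalid expression" warning; for a
// malformed expression, query() returns false instead of a DOMNodeList.
$invalid = @$xpath->query('self::child::div', $context);

var_dump($validCount);
var_dump($invalid);
```

If an expression has to be anchored at the current node, the step separator is needed, as in `self::*/child::div`, which is again a valid location path.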