-
-
Notifications
You must be signed in to change notification settings - Fork 9.6k
[DomCrawler] return empty string on Crawler::text()
and Crawler::html()
instead of an exception
#28581
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DomCrawler] return empty string on Crawler::text()
and Crawler::html()
instead of an exception
#28581
Conversation
I am not sure if the BC is allowed or if we should add an argument to control the behavior and deprecate (but this kinda defeats this change as goal seems to make this more developer friendly) |
null
on text
and html
methods when empty in…null
on text
and html
methods when empty instead of an exception
We shouldn't break BC, so a constructor argument to enable this behavior would be needed. |
19c9818
to
fc5effd
Compare
Here is a better proposal @nicolas-grekas to avoid BC as you suggested (way better), not sure about the new argument name. Now, for the |
We still have a bunch of other methods which fail for an empty node list. And btw, making return types nullable will not save you for doing a check to handle that case. Instead of checking whether the crawler is empty or no, you will have to check whether the value is |
and regarding BC, this is still changing the contract of the class. Some code receiving a Crawler as argument will not control this boolean, and so would have to be updated to support the new signature as well. So this approach is still a BC break. |
@stof looking at the original issue, it could also be empty, let's say an empty string, in this way it keeps the signature. |
Well, returning the empty string is possible for these 2 methods. But then, we still cannot remove the exception from other methods (the ones returning an object for instance) |
571d6bc
to
e0d0842
Compare
A test fails but seems to be a travis problem with composer? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm fine with this PR (just one minor comment)
->text() & ->html() not throwing is good for DX (but throwing for the other methods is OK also)
@respinoza how would you leverage this new argument? Do you create your crawlers manually? |
With |
@jakzal the |
@nicolas-grekas we will have to pass it to the BrowserKit Client constructor and then pass it around in https://github.com/symfony/browser-kit/blob/68f233bc3a0fb6b64f48ff6318038087b49a1639/Client.php#L516` Currently there is no way to inject a DomCrawler instance. This can also be solved by improving the DomCrawler/BrowserKit as follows
|
so let's allow it via a new constructor arg to BrowserKit's Client? |
hum, looks like a crawler factory would be needed, so now I get your proposal. |
@nicolas-grekas having a factory will be way better. I will get on that. |
I'm not sure passing a Crawler factory just to control this setting (which looks transitional) makes sense. Injecting a Crawler indeed does not make sense, as the Crawler holds some content, so it does not make sense to inject one that the Client would not know what is already in it. |
@stof the current code in BrowserKit creates a new instance or null if the class doesn't exist. The factory will be called instead, behavior is the same (new instance on method call), we will just show to return a configured instance. |
Well, the question is whether we need to create a concept of factory when the only meaningful configuration to be customized by the extension point is this boolean. Currently, this PR makes that setting a permanent option. It does not deprecate the old behavior. Is it intended ? Variable signatures based on an option are harder to use (as any code not creating the crawler could not rely on whether an exception could be thrown or no for empty crawlers, and so should keep handling the case throwing exceptions too, defeating this PR).
Thus, if the goal is to deprecate the old behavior, I think a constructor argument is the wrong way to opt-in for the new behavior: the code needing to deal with the new behavior (and so knowing whether it can opt-in or no) may not be the same than the code instantiating the crawler (and when using BrowserKit, that would mean the code instantiating the client, to configure it to configure the crawler creation...). |
I don't believe we want to get rid of the old behavior so there won't be any deprecation. I agree with your point on the variable signature. We could always have two new methods and get rid of the parameter. (No idea how to make those methods) |
@jakzal creating a decorator will be a pain:
At that point, it is much better to have this text extraction utility in your own code rather than in a crawler decorator. And regarding the discussion about BC, a configurable signature is a bigger BC break than a variable signature based on a new optional argument. A variable signature creates a BC break for child class (and the migration path is clear, with the possibility to trigger a deprecation). A configurable signature creates a BC break for consuming code (and with no way to warn them) due to the signature change (as they might not be the one controlling the contrustor-based opt-in). |
Option 1. is possible, using func_get_arg() to prevent any BC break. We already did so in other places. |
a6858eb
to
b3505ef
Compare
I added the argument to the function through Also again not sure about the parameter name (naming things is hard), ideas welcome 😅 |
null
on text
and html
methods when empty instead of an exceptionCrawler::text()
and Crawler::html()
instead of an exception
@respinoza what about |
Passing true for every method is also a hectic approach. It would have been better if we could somehow transfer this approach from the Crawler constructor. |
A default value, rather than a boolean flag, would be more flexible. |
I would say no to a default value as we don't want to change the signature (return single), we could check it is a string but in practice is someone going to have another value than empty string? Maybe the idea of having new methods to handle this might be better. Any more ideas? |
Yes, two :)
|
b3505ef
to
3c399cf
Compare
@nicolas-grekas sorry for the delay, changes have been done and hopefully it can be included for 4.3 |
…`html()` instead of throwing an exception when node is empty.
3c399cf
to
6a4ce38
Compare
Thank you @respinoza. |
…and `Crawler::html()` instead of an exception (respinoza) This PR was merged into the 4.3-dev branch. Discussion ---------- [DomCrawler] return empty string on `Crawler::text()` and `Crawler::html()` instead of an exception …stead of an exception | Q | A | ------------- | --- | Branch? | master | Bug fix? | no | New feature? | yes | BC breaks? | yes | Deprecations? | no | Tests pass? | yes | Fixed tickets | #28313 | License | MIT | Doc PR | symfony/symfony-docs#... <!-- required for new features --> This changes the behavior when using text and html methods from the Crawler by returning null instead of throwing an exception. Commits ------- 6a4ce38 [DomCrawler] Added ability to return a default value in `text()` and `html()` instead of throwing an exception when node is empty.
…ereguiluz) This PR was merged into the master branch. Discussion ---------- Documented the default values of text() and html() Documents symfony/symfony#28581 See also https://symfony.com/blog/new-in-symfony-4-3-domcrawler-improvements Commits ------- 851956d Documented the default values of text() and html()
This changes the behavior when using text and html methods from the Crawler by returning null instead of throwing an exception.