Thanks to visit codestin.com
Credit goes to github.com

Skip to content

[DomCrawler] return empty string on Crawler::text() and Crawler::html() instead of an exception #28581

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

respinoza
Copy link
Contributor

@respinoza respinoza commented Sep 24, 2018

Q A
Branch? master
Bug fix? no
New feature? yes
BC breaks? yes
Deprecations? no
Tests pass? yes
Fixed tickets #28313
License MIT
Doc PR symfony/symfony-docs#...

This changes the behavior when using text and html methods from the Crawler by returning null instead of throwing an exception.

@respinoza
Copy link
Contributor Author

I am not sure if the BC is allowed or if we should add an argument to control the behavior and deprecate (but this kinda defeats this change as goal seems to make this more developer friendly)

@nicolas-grekas nicolas-grekas changed the title [DomCrawler] Return null on text and html methods when empty in… [DomCrawler] Return null on text and html methods when empty instead of an exception Sep 24, 2018
@nicolas-grekas
Copy link
Member

nicolas-grekas commented Sep 24, 2018

We shouldn't break BC, so a constructor argument to enable this behavior would be needed.
Then there should be a similar argument to BrowserKit\Client so that this could be also enabled there. Before doing so, do we all agree returning null is the best for DX and we should move out of the exception?

@nicolas-grekas nicolas-grekas added this to the next milestone Sep 24, 2018
@respinoza respinoza force-pushed the domcrawler-null-on-empty-text-html branch 2 times, most recently from 19c9818 to fc5effd Compare September 24, 2018 13:01
@respinoza
Copy link
Contributor Author

Here is a better proposal @nicolas-grekas to avoid BC as you suggested (way better), not sure about the new argument name.

Now, for the BrowserKit, new PR or two commits same PR ?

@stof
Copy link
Member

stof commented Sep 24, 2018

We still have a bunch of other methods which fail for an empty node list.

And btw, making return types nullable will not save you for doing a check to handle that case. Instead of checking whether the crawler is empty or no, you will have to check whether the value is null.

@stof
Copy link
Member

stof commented Sep 24, 2018

and regarding BC, this is still changing the contract of the class. Some code receiving a Crawler as argument will not control this boolean, and so would have to be updated to support the new signature as well. So this approach is still a BC break.

@respinoza
Copy link
Contributor Author

@stof looking at the original issue, it could also be empty, let's say an empty string, in this way it keeps the signature.
Do you have any suggestions? Or do you propose to drop this change?

@stof
Copy link
Member

stof commented Sep 24, 2018

Well, returning the empty string is possible for these 2 methods. But then, we still cannot remove the exception from other methods (the ones returning an object for instance)

@respinoza respinoza force-pushed the domcrawler-null-on-empty-text-html branch 2 times, most recently from 571d6bc to e0d0842 Compare September 24, 2018 14:41
@respinoza
Copy link
Contributor Author

A test fails but seems to be a travis problem with composer?

Copy link
Member

@nicolas-grekas nicolas-grekas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm fine with this PR (just one minor comment)
->text() & ->html() not throwing is good for DX (but throwing for the other methods is OK also)

@nicolas-grekas
Copy link
Member

Now, for the BrowserKit, new PR or two commits same PR ?

@respinoza how would you leverage this new argument? Do you create your crawlers manually?

@jakzal
Copy link
Contributor

jakzal commented Sep 25, 2018

With text() and html() methods returning an empty string how can I tell the difference between an empty text and element not found?

@respinoza
Copy link
Contributor Author

@jakzal the count() method still return the matches, so 0 for not found.

@respinoza
Copy link
Contributor Author

@nicolas-grekas we will have to pass it to the BrowserKit Client constructor and then pass it around in https://github.com/symfony/browser-kit/blob/68f233bc3a0fb6b64f48ff6318038087b49a1639/Client.php#L516`

Currently there is no way to inject a DomCrawler instance. This can also be solved by improving the DomCrawler/BrowserKit as follows

  • Add method to DomCrawler to create a new instance (similar like the current createSubCrawlermethod)
  • Add setter to pass the DomCrawler to the BrowserKit client
  • Get a fresh instance if we set a custom DomCrawler instance or create a new like we do right now

@nicolas-grekas
Copy link
Member

Currently there is no way to inject a DomCrawler instance

so let's allow it via a new constructor arg to BrowserKit's Client?
should be in the same PR to make it complete (including the DI configuration)

@nicolas-grekas
Copy link
Member

so let's allow it via a new constructor arg to BrowserKit's Client?

hum, looks like a crawler factory would be needed, so now I get your proposal.
I'd still prefer passing this factory as a constructor arg instead of a setter.

@respinoza
Copy link
Contributor Author

@nicolas-grekas having a factory will be way better. I will get on that.

@stof
Copy link
Member

stof commented Sep 25, 2018

I'm not sure passing a Crawler factory just to control this setting (which looks transitional) makes sense.

Injecting a Crawler indeed does not make sense, as the Crawler holds some content, so it does not make sense to inject one that the Client would not know what is already in it.

@respinoza
Copy link
Contributor Author

@stof the current code in BrowserKit creates a new instance or null if the class doesn't exist. The factory will be called instead, behavior is the same (new instance on method call), we will just show to return a configured instance.
Sub crawlers are created as usual.
We won't reuse the crawler.
I don't see a problem

@stof
Copy link
Member

stof commented Sep 25, 2018

Well, the question is whether we need to create a concept of factory when the only meaningful configuration to be customized by the extension point is this boolean.

Currently, this PR makes that setting a permanent option. It does not deprecate the old behavior. Is it intended ? Variable signatures based on an option are harder to use (as any code not creating the crawler could not rely on whether an exception could be thrown or no for empty crawlers, and so should keep handling the case throwing exceptions too, defeating this PR).
If this becomes a deprecated API with a migration path, making people replace the whole instantiation of the crawler is too much:

  • it makes the migration harder as they have to duplicate the core logic in 99% of cases
  • it introduces an extension point that we then have to maintain even if we don't need it for the BC layer. We cannot remove it in 5.0 (as we cannot deprecate it in 4.x if using it is mandatory to avoid another deprecation).

Thus, if the goal is to deprecate the old behavior, I think a constructor argument is the wrong way to opt-in for the new behavior: the code needing to deal with the new behavior (and so knowing whether it can opt-in or no) may not be the same than the code instantiating the crawler (and when using BrowserKit, that would mean the code instantiating the client, to configure it to configure the crawler creation...).

@respinoza
Copy link
Contributor Author

I don't believe we want to get rid of the old behavior so there won't be any deprecation.

I agree with your point on the variable signature. We could always have two new methods and get rid of the parameter.

(No idea how to make those methods)

@stof
Copy link
Member

stof commented Sep 26, 2018

@jakzal creating a decorator will be a pain:

  • adding any method is a BC break, unless we also provide a base class delegating all methods in DomCrawler itself and consider that a decorator must extend our own base class to be subject to BC. Otherwise, the newly introduced method would get called on the parent class rather than on the inner object
  • things become even more complex due to returning sub-crawlers from many methods
  • changing the signature in a decorator class is still breaking the contract for any code consuming a Crawler.

At that point, it is much better to have this text extraction utility in your own code rather than in a crawler decorator.

And regarding the discussion about BC, a configurable signature is a bigger BC break than a variable signature based on a new optional argument. A variable signature creates a BC break for child class (and the migration path is clear, with the possibility to trigger a deprecation). A configurable signature creates a BC break for consuming code (and with no way to warn them) due to the signature change (as they might not be the one controlling the contrustor-based opt-in).

@nicolas-grekas
Copy link
Member

Option 1. is possible, using func_get_arg() to prevent any BC break. We already did so in other places.

@respinoza
Copy link
Contributor Author

I added the argument to the function through func_args as discussed for option one.
Looking at the related issues, I guess it will be a good idea to extend this to the attr() method?
That one already has an argument, should still do a func_args and check for the second one?

Also again not sure about the parameter name (naming things is hard), ideas welcome 😅

@respinoza respinoza changed the title [DomCrawler] Return null on text and html methods when empty instead of an exception [DomCrawler] return empty string on Crawler::text() and Crawler::html() instead of an exception Sep 29, 2018
@ro0NL
Copy link
Contributor

ro0NL commented Sep 29, 2018

@respinoza what about $throw = true :) then we can also apply it on parents() etc. maybe, returning an empty list in case throw=false.

@amer-flix
Copy link

Passing true for every method is also a hectic approach. It would have been better if we could somehow transfer this approach from the Crawler constructor.

@jakzal
Copy link
Contributor

jakzal commented Sep 29, 2018

A default value, rather than a boolean flag, would be more flexible.

@respinoza
Copy link
Contributor Author

I would say no to a default value as we don't want to change the signature (return single), we could check it is a string but in practice is someone going to have another value than empty string?

Maybe the idea of having new methods to handle this might be better.

Any more ideas?

@nicolas-grekas
Copy link
Member

Yes, two :)

  1. name the new argument "$default" and return it if provided (and throw when not provided)
  2. do nothing and keep things as is

@respinoza respinoza force-pushed the domcrawler-null-on-empty-text-html branch from b3505ef to 3c399cf Compare December 16, 2018 10:57
@respinoza
Copy link
Contributor Author

@nicolas-grekas sorry for the delay, changes have been done and hopefully it can be included for 4.3

…`html()` instead of throwing an exception when node is empty.
@nicolas-grekas nicolas-grekas force-pushed the domcrawler-null-on-empty-text-html branch from 3c399cf to 6a4ce38 Compare December 17, 2018 14:57
@nicolas-grekas
Copy link
Member

Thank you @respinoza.

@nicolas-grekas nicolas-grekas merged commit 6a4ce38 into symfony:master Dec 17, 2018
nicolas-grekas added a commit that referenced this pull request Dec 17, 2018
…and `Crawler::html()` instead of an exception (respinoza)

This PR was merged into the 4.3-dev branch.

Discussion
----------

[DomCrawler] return empty string on `Crawler::text()` and `Crawler::html()` instead of an exception

…stead of an exception

| Q             | A
| ------------- | ---
| Branch?       | master
| Bug fix?      | no
| New feature?  | yes
| BC breaks?    | yes
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets | #28313
| License       | MIT
| Doc PR        | symfony/symfony-docs#... <!-- required for new features -->

This changes the behavior when using text and html methods from the Crawler by returning null instead of throwing an exception.

Commits
-------

6a4ce38 [DomCrawler] Added ability to return a default value in `text()` and `html()` instead of throwing an exception when node is empty.
javiereguiluz added a commit to symfony/symfony-docs that referenced this pull request Jan 22, 2019
…ereguiluz)

This PR was merged into the master branch.

Discussion
----------

Documented the default values of text() and html()

Documents symfony/symfony#28581

See also https://symfony.com/blog/new-in-symfony-4-3-domcrawler-improvements

Commits
-------

851956d Documented the default values of text() and html()
@nicolas-grekas nicolas-grekas modified the milestones: next, 4.3 Apr 30, 2019
@fabpot fabpot mentioned this pull request May 9, 2019
@elgandoz
Copy link

elgandoz commented Nov 26, 2023

I'm following after #28614, in the end this has not been applied to attr() which still throws the exception.
Apologies, It was not available in my local version, but it has been added in 7.0.0-RC1
94af00e

@stloyd

This comment was marked as outdated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants