[DomCrawler] Abstract URI logic and crawl images #17585
@@ -1,6 +1,12 @@
CHANGELOG
=========

2.7.0
should be 3.1.0 :)
About the tests, the only error I can see is in the framework bundle. Any idea what I can do?
👍
{
    public function __construct(\DOMElement $node, $currentUri)
    {
        parent::__construct($node, $currentUri);
Why is this constructor overridden?
@jakzal the goal is to forbid passing the third argument. An Image is always a GET request
@valeriangalliat yes, I think it would be more clear. I'm not gonna block this PR if you don't do this though ;)
@jakzal I got notifications for two other comments that you seem to have removed; I assume everything's okay? They were about existing code that I just moved around, but if we can still improve things here, I can update the PR. :)
@valeriangalliat yeah, sorry for the noise. I noticed the code was moved and removed those comments.
        throw new \InvalidArgumentException('The current node list is empty.');
    }

    $node = $this->getNode(0);
We should make sure $node is an instance of \DOMElement, just like we do in the link() method.
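The check requested here can be sketched as a standalone guard. This is a minimal sketch, and firstElement() is a hypothetical helper, not a Symfony method: in the PR the logic would live inside Crawler::image(). It follows the same pattern as link(): reject an empty node list, then reject a first node that is not a \DOMElement (for example a text node), since only elements carry the attribute the URI is read from.

```php
<?php

// Hypothetical helper illustrating the guard discussed in the review;
// not part of Symfony's API.
function firstElement(\DOMNodeList $nodes)
{
    // Same precondition as the existing code: nothing to build from.
    if (0 === $nodes->length) {
        throw new \InvalidArgumentException('The current node list is empty.');
    }

    $node = $nodes->item(0);

    // Text or comment nodes have no attributes, so no URI can be read from them.
    if (!$node instanceof \DOMElement) {
        throw new \InvalidArgumentException(sprintf(
            'The selected node should be an instance of DOMElement, got "%s".',
            get_class($node)
        ));
    }

    return $node;
}
```

For example, with `<p>text<img src="/a.png"/></p>`, passing the `<p>` element's childNodes fails because the first child is a DOMText node, while passing its getElementsByTagName('img') result returns the image element.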
Final remark: why do we need the
Indeed, I would remove the interface.
    protected $currentUri;

    /**
     * Constructor.
Useless comment IMO.
Same opinion, but I kept consistency with
     * Constructor.
Can be removed safely :)
@fabpot sure. I see it's also present in Crawler.php, Form.php and Field/FormField.php; should I remove it there too? Or should I just edit AbstractUriElement.php and Crawler.php, which are the only ones I changed in this PR?
We could remove them in the existing class, but not in this PR. It should be done in older branches too.
Just remove them in your own code for now.
Sure, done!
@jakzal @fabpot Not sure why I added the interface, so I just removed it. I also updated the PR according to the comments. Just the
All the URI parsing logic is moved into the AbstractUriElement class. This class has two abstract methods:

* setNode: validate the DOMElement node according to the concrete class rules, and set $this->node.
* getRawUri: get the raw URI from $this->node.

The Link class now extends AbstractUriElement. This refactor is desirable for #12429.

A new Image class, extending AbstractUriElement, is added to reuse the URI methods for the src attribute of HTML img elements. Two methods, image() and images(), are added to the Crawler class; they are the image equivalents of link() and links().
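The class layout described above can be illustrated with a simplified, standalone sketch. It is not the actual Symfony code: the body of getUri() only handles absolute, protocol-relative, root-relative and plain relative references (no dot segments, query-only or fragment-only references), and the set of tags accepted by Link and the exception messages are assumptions made for illustration. What it does show is the split: shared resolution logic in the abstract class, with setNode() and getRawUri() supplied by each concrete class, and Image pinning the method to GET by overriding the constructor.

```php
<?php

// Simplified sketch of the AbstractUriElement design; not Symfony's implementation.
abstract class AbstractUriElement
{
    protected $node;
    protected $method;
    protected $currentUri;

    public function __construct(\DOMElement $node, $currentUri, $method = 'GET')
    {
        $this->setNode($node);
        $this->method = $method ? strtoupper($method) : 'GET';
        $this->currentUri = $currentUri;
    }

    public function getMethod()
    {
        return $this->method;
    }

    // Shared logic: resolve the raw URI against the current URI (minimal cases only).
    public function getUri()
    {
        $uri = trim($this->getRawUri());
        if ('' === $uri) {
            return $this->currentUri;
        }
        if (null !== parse_url($uri, PHP_URL_SCHEME)) {
            return $uri; // already absolute
        }

        $parts = parse_url($this->currentUri);
        if (0 === strpos($uri, '//')) {
            return $parts['scheme'].':'.$uri; // protocol-relative
        }

        $origin = $parts['scheme'].'://'.$parts['host'].(isset($parts['port']) ? ':'.$parts['port'] : '');
        if ('/' === $uri[0]) {
            return $origin.$uri; // root-relative
        }

        // Plain relative path: resolve against the current URI's directory.
        $path = isset($parts['path']) ? $parts['path'] : '/';

        return $origin.substr($path, 0, strrpos($path, '/') + 1).$uri;
    }

    // Validate the node according to the concrete class rules, then set $this->node.
    abstract protected function setNode(\DOMElement $node);

    // Return the raw, unresolved URI read from $this->node.
    abstract protected function getRawUri();
}

class Link extends AbstractUriElement
{
    protected function setNode(\DOMElement $node)
    {
        if (!in_array($node->nodeName, array('a', 'area', 'link'), true)) {
            throw new \LogicException(sprintf('Unable to navigate from a "%s" tag.', $node->nodeName));
        }
        $this->node = $node;
    }

    protected function getRawUri()
    {
        return $this->node->getAttribute('href');
    }
}

class Image extends AbstractUriElement
{
    // No $method parameter: loading an image is always a GET request.
    public function __construct(\DOMElement $node, $currentUri)
    {
        parent::__construct($node, $currentUri, 'GET');
    }

    protected function setNode(\DOMElement $node)
    {
        if ('img' !== $node->nodeName) {
            throw new \LogicException(sprintf('Unable to read an image URI from a "%s" tag.', $node->nodeName));
        }
        $this->node = $node;
    }

    protected function getRawUri()
    {
        return $this->node->getAttribute('src');
    }
}
```

With a current URI of http://localhost/foo/bar, an img whose src is /logo.png resolves to http://localhost/logo.png, and one whose src is pic.png resolves to http://localhost/foo/pic.png. Keeping the resolution in one place means a new URI-bearing element only has to say which node it accepts and which attribute holds the URI.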
Thanks @valeriangalliat for working on this feature, this is much appreciated.
…e a merge (jakzal)

This PR was merged into the 3.1-dev branch.

Discussion
----------

[Crawler] Remove a mention of an interface removed before a merge

| Q             | A
| ------------- | ---
| Bug fix?      | no
| New feature?  | no
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets | -
| License       | MIT
| Doc PR        | -

Missed it while merging #17585.

Commits
-------

79a6a27 [Crawler] Remove a mention of an interface removed before a merge
This is a backward-compatible version of #13620, and a rebase of #13649 on the current master.