[DomCrawler] Abstract URI logic and crawl images #17585

Closed

valeriangalliat
Contributor

| Q             | A
| ------------- | ---
| Bug fix?      | no
| New feature?  | yes
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets | #12429
| License       | MIT
| Doc PR        | symfony/symfony-docs#4971

This is a backward-compatible version of #13620, and a rebase of #13649 on current master.

@@ -1,6 +1,12 @@
CHANGELOG
=========

2.7.0
Member

should be 3.1.0 :)

@valeriangalliat
Contributor Author

About the tests, the only error I can see is about the framework bundle, any idea what I can do?

@fabpot
Member

fabpot commented Jan 29, 2016

👍

{
    public function __construct(\DOMElement $node, $currentUri)
    {
        parent::__construct($node, $currentUri);
Contributor

Why is this constructor overridden?

Member

@jakzal the goal is to forbid passing the third argument. An Image is always a GET request

Contributor Author

@jakzal, @stof, maybe I could explicitly pass GET as the third argument to make it more obvious?

Contributor

@valeriangalliat yes, I think it would be more clear. I'm not gonna block this PR if you don't do this though ;)

@valeriangalliat
Contributor Author

@jakzal I got notifications for two other comments that you seem to have removed; I assume everything's okay? They were about existing code that I just moved around, but still, if we can improve things here, I can update the PR. :)

@jakzal
Contributor

jakzal commented Feb 2, 2016

@valeriangalliat yeah, sorry for the noise. I noticed the code was moved and removed those comments.

            throw new \InvalidArgumentException('The current node list is empty.');
        }

        $node = $this->getNode(0);
Contributor

We should make sure $node is an instance of \DOMElement, just like we do in the link() method.

@jakzal
Contributor

jakzal commented Feb 3, 2016

Final remark: why do we need the UriElementInterface?

@fabpot
Member

fabpot commented Feb 3, 2016

Indeed, I would remove the interface.

    protected $currentUri;

    /**
     * Constructor.
Contributor

Useless comment IMO.

Contributor Author

Same opinion, but I kept consistency with

Member

Can be removed safely :)

Contributor Author

@fabpot sure. I see it's also present in Crawler.php, Form.php and Field/FormField.php; should I remove it there too? Or should I just edit AbstractUriElement.php and Crawler.php, which are the only ones I changed in this PR?

Member

We could remove them in the existing class, but not in this PR. It should be done in older branches too.

Member

Just remove them in your own code for now.

Contributor Author

Sure, done!

@valeriangalliat
Contributor Author

@jakzal @fabpot Not sure why I added the interface, I just removed it.

I also updated the PR according to the comments. Only the "Constructor." comment remains to be removed, but since it's present in files not affected by this PR, I agree with @stof that this should be done in another PR.

All the URI parsing logic is externalized in the AbstractUriElement
class. This class has two abstract methods:

* setNode: validate the DOMElement node according to the concrete class
  rules, and set $this->node.
* getRawUri: get the raw URI from $this->node.

The Link class now extends AbstractUriElement.

This refactor is desirable for #12429.

A new Image class is added, extending AbstractUriElement, to leverage
URI methods for the HTML img src attribute.

Two methods are added to the Crawler class, image and images, that are
the equivalent of link and links for images.
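
For readers skimming the thread, the design described in the two commit messages above can be sketched roughly as follows. This is a simplified illustration and not the code merged into Symfony: the `Sketch*` class names and the naive `getUri()` resolution are assumptions made to keep the example self-contained. Only the `setNode`/`getRawUri` split and the fixed GET method for images come from the discussion.

```php
<?php

// Hypothetical stand-in for AbstractUriElement (names and bodies are assumptions).
abstract class SketchUriElement
{
    protected $node;
    protected $method;
    protected $currentUri;

    public function __construct(\DOMElement $node, $currentUri, $method = 'GET')
    {
        $this->setNode($node); // the concrete class validates the tag
        $this->method = strtoupper($method);
        $this->currentUri = $currentUri;
    }

    public function getMethod()
    {
        return $this->method;
    }

    // Deliberately naive resolution; assumes $currentUri is absolute and has a path.
    public function getUri()
    {
        $uri = trim($this->getRawUri());

        // Already absolute: return as-is.
        if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $uri)) {
            return $uri;
        }

        // Root-relative: keep only the scheme and host of the current URI.
        if (0 === strpos($uri, '/') && preg_match('#^[a-z][a-z0-9+.-]*://[^/]+#i', $this->currentUri, $m)) {
            return $m[0].$uri;
        }

        // Otherwise resolve against the directory of the current URI.
        return substr($this->currentUri, 0, strrpos($this->currentUri, '/') + 1).$uri;
    }

    // Validate the node according to the concrete class rules and set $this->node.
    abstract protected function setNode(\DOMElement $node);

    // Return the raw URI read from $this->node.
    abstract protected function getRawUri();
}

// Hypothetical stand-in for the new Image class: always a GET request.
class SketchImage extends SketchUriElement
{
    public function __construct(\DOMElement $node, $currentUri)
    {
        parent::__construct($node, $currentUri, 'GET');
    }

    protected function setNode(\DOMElement $node)
    {
        if ('img' !== $node->nodeName) {
            throw new \LogicException(sprintf('Expected an "img" tag, got "%s".', $node->nodeName));
        }

        $this->node = $node;
    }

    protected function getRawUri()
    {
        return $this->node->getAttribute('src');
    }
}

$document = new \DOMDocument();
$document->loadHTML('<html><body><img src="img/logo.png" alt="Logo"></body></html>');

$image = new SketchImage($document->getElementsByTagName('img')->item(0), 'http://example.com/blog/post.html');

echo $image->getUri(), "\n";
echo $image->getMethod(), "\n";
```

With the sample markup, the relative `src` is resolved against the directory of the current URI, and the method is always `GET` because the subclass constructor does not accept a third argument.
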
@jakzal
Contributor

jakzal commented Feb 4, 2016

Thanks @valeriangalliat for working on this feature, this is much appreciated.

@jakzal jakzal closed this in 1183aca Feb 4, 2016
fabpot added a commit that referenced this pull request Feb 4, 2016
…e a merge (jakzal)

This PR was merged into the 3.1-dev branch.

Discussion
----------

[Crawler] Remove a mention of an interface removed before a merge

| Q             | A
| ------------- | ---
| Bug fix?      | no
| New feature?  | no
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets | -
| License       | MIT
| Doc PR        | -

Missed it while merging #17585.

Commits
-------

79a6a27 [Crawler] Remove a mention of an interface removed before a merge
@valeriangalliat valeriangalliat deleted the dom-crawler-image branch February 4, 2016 14:09
@fabpot fabpot mentioned this pull request May 13, 2016

6 participants