44The DomCrawler Component
55========================
66
7- DomCrawler Component eases DOM navigation for HTML and XML documents.
7+ The DomCrawler Component eases DOM navigation for HTML and XML documents.
88
99Installation
1010------------
2121The :class:`Symfony\\Component\\DomCrawler\\Crawler` class provides methods
2222to query and manipulate HTML and XML documents.
2323
24- Instance of the Crawler represents a set (:phpclass:`SplObjectStorage`)
25- of :phpclass:`DOMElement` objects:
26-
27- .. code-block:: php
24+ An instance of the Crawler represents a set (:phpclass:`SplObjectStorage`)
25+ of :phpclass:`DOMElement` objects, which are basically nodes that you can
26+ traverse easily::
2827
2928 use Symfony\Component\DomCrawler\Crawler;
3029
@@ -43,16 +42,14 @@ of :phpclass:`DOMElement` objects:
4342 print $domElement->nodeName;
4443 }
4544
46- More specialized :class:`Symfony\\Component\\DomCrawler\\Link` and
45+ Specialized :class:`Symfony\\Component\\DomCrawler\\Link` and
4746:class:`Symfony\\Component\\DomCrawler\\Form` classes are useful for
48- interacting with html links and forms.
47+ interacting with HTML links and forms as you traverse through the HTML tree.
4948
5049Node Filtering
5150~~~~~~~~~~~~~~
5251
53- Using XPath expressions is really simplified:
54-
55- .. code-block:: php
52+ Using XPath expressions is really easy::
5653
5754 $crawler = $crawler->filterXPath('descendant-or-self::body/p');
5855
@@ -61,15 +58,12 @@ Using XPath expressions is really simplified:
6158 :phpmethod:`DOMXPath::query` is used internally to actually perform
6259 an XPath query.
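
As a rough illustration of what such a query looks like at the DOM level, here is a minimal sketch that uses only PHP's DOM extension. It is not the Crawler's actual source; the sample HTML and the variable names are assumptions made for the example.

```php
<?php
// Plain DOM extension only; this is an illustration, not the Crawler's code.
$html = '<html><body><p class="message">Hello</p><p>World</p></body></html>';

$document = new DOMDocument();
$document->loadHTML($html);

// DOMXPath::query() returns a DOMNodeList of the matching nodes.
$xpath = new DOMXPath($document);
$nodes = $xpath->query('//body/p');

$texts = [];
foreach ($nodes as $node) {
    $texts[] = $node->nodeValue;
}

print implode(', ', $texts) . "\n"; // prints "Hello, World"
```

The Crawler wraps this kind of call and hands back a new Crawler for the matched nodes instead of a raw ``DOMNodeList``.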
6360
64- Filtering is even easier if you have CssSelector Component installed:
65-
66- .. code-block:: php
61+ Filtering is even easier if you have the ``CssSelector`` Component installed.
62+ This allows you to use jQuery-like selectors to traverse::
6763
6864 $crawler = $crawler->filter('body > p');
6965
70- Anonymous function can be used to filter with more complex criteria:
71-
72- .. code-block:: php
66+ An anonymous function can be used to filter with more complex criteria::
7367
7468 $crawler = $crawler->filter('body > p')->reduce(function ($node, $i) {
7569 // filter even nodes
@@ -86,35 +80,25 @@ To remove a node the anonymous function must return false.
8680Node Traversing
8781~~~~~~~~~~~~~~~
8882
89- Access node by its position on the list:
90-
91- .. code-block:: php
83+ Access a node by its position in the list::
9284
9385 $crawler->filter('body > p')->eq(0);
9486
95- Get the first or last node of the current selection:
96-
97- .. code-block:: php
87+ Get the first or last node of the current selection::
9888
9989 $crawler->filter('body > p')->first();
10090 $crawler->filter('body > p')->last();
10191
102- Get the nodes of the same level as the current selection:
103-
104- .. code-block:: php
92+ Get the nodes of the same level as the current selection::
10593
10694 $crawler->filter('body > p')->siblings();
10795
108- Get the same level nodes after or before the current selection:
109-
110- .. code-block:: php
96+ Get the same level nodes after or before the current selection::
11197
11298 $crawler->filter('body > p')->nextAll();
11399 $crawler->filter('body > p')->previousAll();
114100
115- Get all the child or parent nodes:
116-
117- .. code-block:: php
101+ Get all the child or parent nodes::
118102
119103 $crawler->filter('body')->children();
120104 $crawler->filter('body > p')->parents();
@@ -127,43 +111,35 @@ Get all the child or parent nodes:
127111Accessing Node Values
128112~~~~~~~~~~~~~~~~~~~~~
129113
130- Access the value of the first node of the current selection:
131-
132- .. code-block:: php
114+ Access the value of the first node of the current selection::
133115
134116 $message = $crawler->filterXPath('//body/p')->text();
135117
136- Access the attribute value of the first node of the current selection:
137-
138- .. code-block:: php
118+ Access the attribute value of the first node of the current selection::
139119
140120 $class = $crawler->filterXPath('//body/p')->attr('class');
141121
142- Extract attribute and/or node values from the list of nodes:
143-
144- .. code-block:: php
122+ Extract attribute and/or node values from the list of nodes::
145123
146124 $attributes = $crawler->filterXPath('//body/p')->extract(array('_text', 'class'));
147125
148- .. note:: Special attribute ``_text`` represents a node value.
126+ .. note::
149127
150- Call an anonymous function on each node of the list:
128+ The special attribute ``_text`` represents a node value.
151129
152- .. code-block:: php
130+ Call an anonymous function on each node of the list::
153131
154132 $nodeValues = $crawler->filter('p')->each(function ($node, $i) {
155133 return $node->nodeValue;
156134 });
157135
158136The anonymous function receives the node and its position as arguments.
159- Result is an array of values returned by anonymous function calls.
137+ The result is an array of values returned by the anonymous function calls.
160138
161139Adding the Content
162140~~~~~~~~~~~~~~~~~~
163141
164- Crawler supports multiple ways of adding the content:
165-
166- .. code-block:: php
142+ The crawler supports multiple ways of adding the content::
167143
168144 $crawler = new Crawler('<html><body /></html>');
169145
@@ -176,7 +152,7 @@ Crawler supports multiple ways of adding the content:
176152 $crawler->add('<html><body /></html>');
177153 $crawler->add('<root><node /></root>');
178154
179- As Crawler's implementation is based on the DOM extension it is also able
155+ As the Crawler's implementation is based on the DOM extension, it is also able
180156to interact with native :phpclass:`DOMDocument`, :phpclass:`DOMNodeList`
181157and :phpclass:`DOMNode` objects:
182158
@@ -196,10 +172,132 @@ and :phpclass:`DOMNode` objects:
196172 Form and Link support
197173~~~~~~~~~~~~~~~~~~~~~
198174
199- todo:
175+ Special treatment is given to links and forms inside the DOM tree.
176+
177+ Links
178+ .....
179+
180+ To find a link by name (or a clickable image by its ``alt`` attribute), use
181+ the ``selectLink`` method on an existing crawler. This returns a Crawler
182+ instance with just the selected link(s). Calling ``link()`` gives you a special
183+ :class:`Symfony\\Component\\DomCrawler\\Link` object::
184+
185+ $linksCrawler = $crawler->selectLink('Go elsewhere...');
186+ $link = $linksCrawler->link();
187+
188+ // or do this all at once
189+ $link = $crawler->selectLink('Go elsewhere...')->link();
190+
191+ The :class:`Symfony\\Component\\DomCrawler\\Link` object has several useful
192+ methods to get more information about the selected link itself::
193+
194+ // return the raw href value
195+ $href = $link->getRawUri();
196+
197+ // return the proper URI that can be used to make another request
198+ $uri = $link->getUri();
199+
200+ The ``getUri()`` method is especially useful as it cleans the ``href`` value and
201+ transforms it into how it should really be processed. For example, for a
202+ link with ``href="#foo"``, this would return the full URI of the current
203+ page suffixed with ``#foo``. The return from ``getUri()`` is always a full
204+ URI that you can act on.
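
The ``#foo`` case described above can be sketched in plain PHP. The helper name ``resolveFragmentHref`` and the example URI are hypothetical and exist only for illustration; this covers fragment-only links and is not how the ``Link`` class is implemented.

```php
<?php
// Hypothetical helper for illustration only; not part of the Link class.
// It handles just the case of an href that consists of a fragment ("#...").
function resolveFragmentHref(string $currentUri, string $href): string
{
    // Drop any fragment already present on the current page URI.
    $base = explode('#', $currentUri, 2)[0];

    return $base . $href;
}

$uri = resolveFragmentHref('http://example.com/articles?page=2#top', '#foo');

print $uri . "\n"; // prints "http://example.com/articles?page=2#foo"
```

Relative paths, absolute paths and other URI forms need more work than this, which is the reason to rely on ``getUri()`` rather than the raw ``href``.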
205+
206+ Forms
207+ .....
208+
209+ Special treatment is also given to forms. A ``selectButton()`` method is
210+ available on the Crawler which returns another Crawler that matches a button
211+ (``input[type=submit]``, ``input[type=image]``, or a ``button``) with the
212+ given text. This method is especially useful because you can use it to return
213+ a :class:`Symfony\\Component\\DomCrawler\\Form` object that represents the
214+ form that the button lives in::
215+
216+ $form = $crawler->selectButton('validate')->form();
217+
218+ // or "fill" the form fields with data
219+ $form = $crawler->selectButton('validate')->form(array(
220+ 'name' => 'Ryan',
221+ ));
222+
223+ The :class:`Symfony\\Component\\DomCrawler\\Form` object has lots of very
224+ useful methods for working with forms::
225+
226+ $uri = $form->getUri();
227+
228+ $method = $form->getMethod();
229+
230+ The :method:`Symfony\\Component\\DomCrawler\\Form::getUri` method does more
231+ than just return the ``action`` attribute of the form. If the form method
232+ is GET, then it mimics the browser's behavior and returns the ``action``
233+ attribute followed by a query string of all of the form's values.
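
To make the GET behavior concrete, here is a minimal sketch in plain PHP. The helper ``buildGetUri`` and the example URL are hypothetical, and the code only approximates the idea; it is not the ``Form`` class's implementation.

```php
<?php
// Hypothetical helper for illustration; not the Form class's implementation.
function buildGetUri(string $action, array $values): string
{
    $query = http_build_query($values, '', '&');

    if ($query === '') {
        return $action;
    }

    // Use "&" when the action already carries a query string.
    $separator = strpos($action, '?') === false ? '?' : '&';

    return $action . $separator . $query;
}

$uri = buildGetUri('http://example.com/search', ['q' => 'symfony', 'page' => 1]);

print $uri . "\n"; // prints "http://example.com/search?q=symfony&page=1"
```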
234+
235+ You can virtually set and get values on the form::
236+
237+ // set values on the form internally
238+ $form->setValues(array(
239+ 'registration[username]' => 'symfonyfan',
240+ 'registration[terms]' => 1,
241+ ));
242+
243+ // get back an array of values - in the "flat" array like above
244+ $values = $form->getValues();
245+
246+ // returns the values like PHP would see them, where "registration" is its own array
247+ $values = $form->getPhpValues();
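
The difference between the "flat" array and the PHP-style array can be shown without the component at all. This sketch uses only ``http_build_query()`` and ``parse_str()`` to mimic how PHP expands bracketed field names; the variable names are assumptions, and note that the round trip turns every value into a string.

```php
<?php
// Flat form values, keyed by the raw field names.
$flat = [
    'registration[username]' => 'symfonyfan',
    'registration[terms]' => 1,
];

// Encode as a query string, then let PHP parse the bracketed names
// the same way it would for a submitted form.
parse_str(http_build_query($flat, '', '&'), $nested);

var_export($nested);
```

Here ``$nested`` holds a single ``registration`` key whose value is an array with ``username`` and ``terms`` entries.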
248+
249+ This is great, but it gets better! The ``Form`` object allows you to interact
250+ with your form like a browser, selecting radio values, ticking checkboxes,
251+ and uploading files::
252+
253+ $form['registration[username]']->setValue('symfonyfan');
254+
255+ // check or uncheck a checkbox
256+ $form['registration[terms]']->tick();
257+ $form['registration[terms]']->untick();
258+
259+ // select an option
260+ $form['registration[birthday][year]']->select(1984);
261+
262+ // select many options from a "multiple" select or checkboxes
263+ $form['registration[interests]']->select(array('symfony', 'cookies'));
264+
265+ // even fake a file upload
266+ $form['registration[photo]']->upload('/path/to/lucas.jpg');
267+
268+ What's the point of doing all of this? If you're testing internally, you
269+ can grab the information off of your form as if it had just been submitted
270+ by using the PHP values::
271+
272+ $values = $form->getPhpValues();
273+ $files = $form->getPhpFiles();
274+
275+ If you're using an external HTTP client, you can use the form to grab all
276+ of the information you need to create a POST request for the form::
277+
278+ $uri = $form->getUri();
279+ $method = $form->getMethod();
280+ $values = $form->getValues();
281+ $files = $form->getFiles();
282+
283+ // now use some HTTP client and post using this information
284+
285+ One great example of an integrated system that uses all of this is `Goutte`_.
286+ Goutte understands the Symfony Crawler object and can use it to submit forms
287+ directly::
288+
289+ use Goutte\Client;
290+
291+ // make a real request to an external site
292+ $client = new Client();
293+ $crawler = $client->request('GET', 'https://github.com/login');
294+
295+ // select the form and fill in some values
296+ $form = $crawler->selectButton('Log in')->form();
297+ $form['login'] = 'symfonyfan';
298+ $form['password'] = 'anypass';
299+
300+ // submit that form
301+ $crawler = $client->submit($form);
200302
201- * selectLink()
202- * selectButton()
203- * link()
204- * links()
205- * form()
303+ .. _`Goutte`: https://github.com/fabpot/goutte