MBIStaffing/Data-Importer

 
 


Simple PHP Web Scraper

Goutte is a screen scraping and web crawling library for PHP.

Requirements

Goutte depends on PHP 5.4+ and Guzzle 4+.

Tip

If you need support for PHP 5.3 or Guzzle 3, use Goutte 1.0.6.

Installation

Add fabpot/goutte as a require dependency in your composer.json file:

php composer.phar require fabpot/goutte:~2.0

Tip

You can also download the Goutte.phar file:

require_once '/path/to/goutte.phar';

Usage

Create a Goutte Client instance (which extends Symfony\Component\BrowserKit\Client):

use Goutte\Client;

$client = new Client();

Make requests with the request() method:

// Go to the symfony.com website
$crawler = $client->request('GET', 'http://www.symfony.com/blog/');

The method returns a Crawler object (Symfony\Component\DomCrawler\Crawler).

Click on links:

// Click on the "Security Advisories" link
$link = $crawler->selectLink('Security Advisories')->link();
$crawler = $client->click($link);

Extract data:

// Get the latest posts in this category and display their titles
$crawler->filter('h2.post > a')->each(function ($node) {
    print $node->text()."\n";
});
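
To make it clearer what the `h2.post > a` selector matches, here is a minimal sketch that performs a comparable extraction without Goutte, using only PHP's built-in DOM extension. It is not how Goutte works internally. The sample markup, the `$titles` variable, and the hand-written XPath query are assumptions made for illustration.

```php
<?php
// Hypothetical sample markup standing in for a fetched blog page.
$html = <<<'HTML'
<html><body>
  <h2 class="post"><a href="/blog/one">First post</a></h2>
  <h2 class="post"><a href="/blog/two">Second post</a></h2>
  <h2 class="other"><a href="/about">About</a></h2>
</body></html>
HTML;

$doc = new DOMDocument();
// Collect libxml warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Rough XPath counterpart of the CSS selector "h2.post > a":
// anchors that are direct children of an h2 carrying the class "post".
$query = "//h2[contains(concat(' ', normalize-space(@class), ' '), ' post ')]/a";

$titles = array();
foreach ($xpath->query($query) as $anchor) {
    $titles[] = trim($anchor->textContent);
}

print implode("\n", $titles) . "\n";
```

With Goutte, the Crawler's filter() method accepts the CSS selector directly, so the XPath expression does not have to be written by hand.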

Submit forms:

$crawler = $client->request('GET', 'http://github.com/');
$crawler = $client->click($crawler->selectLink('Sign in')->link());
$form = $crawler->selectButton('Sign in')->form();
$crawler = $client->submit($form, array('login' => 'fabpot', 'password' => 'xxxxxx'));
$crawler->filter('.flash-error')->each(function ($node) {
    print $node->text()."\n";
});

More Information

Read the documentation of the BrowserKit and DomCrawler Symfony Components for more information about what you can do with Goutte.

Technical Information

Goutte is a thin wrapper around the following fine PHP libraries:

  • Symfony Components: BrowserKit, ClassLoader, CssSelector, DomCrawler, Finder, and Process;
  • Guzzle HTTP Component.

License

Goutte is licensed under the MIT license.
