
[DomCrawler] Terrible performance #12298

Closed
@icaine

While trying to parse information from a website, I discovered that DomCrawler is terribly slow and unoptimized. With a few small changes I improved performance more than 150 times (11500 ms -> 65 ms).

The main culprit is
https://github.com/symfony/symfony/blob/master/src/Symfony/Component/DomCrawler/Crawler.php#L862
because creating a DOMXPath object is very expensive and it happens on every call.

I quick-fixed it with the following (I know it's very, very dirty):

    private function createDOMXPath(\DOMDocument $document, array $prefixes = [])
    {
        // Reuse one DOMXPath across calls instead of constructing it every time.
        // Caveat: the static instance stays bound to the first $document and $prefixes seen.
        static $domxpath;
        if (empty($domxpath)) {
            $domxpath = new \DOMXPath($document);
            foreach ($prefixes as $prefix) {
                $namespace = $this->discoverNamespace($domxpath, $prefix);
                if (null !== $namespace) {
                    $domxpath->registerNamespace($prefix, $namespace);
                }
            }
        }

        return $domxpath;
    }
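
A slightly less dirty variant of the same idea might keep one DOMXPath per document rather than a single static instance, so a second document does not get the first document's XPath object. The sketch below is only an illustration: `XPathCache` is a hypothetical helper, not part of Symfony, and it leaves out the namespace prefix registration from the quick fix above.

```php
<?php

// Hypothetical helper (not part of Symfony): keeps one DOMXPath per DOMDocument.
final class XPathCache
{
    /** @var \SplObjectStorage maps DOMDocument => DOMXPath */
    private $cache;

    public function __construct()
    {
        $this->cache = new \SplObjectStorage();
    }

    public function get(\DOMDocument $document): \DOMXPath
    {
        // Construct the DOMXPath only the first time this document is seen.
        if (!isset($this->cache[$document])) {
            $this->cache[$document] = new \DOMXPath($document);
        }

        return $this->cache[$document];
    }
}
```

Note that SplObjectStorage holds strong references, so cached documents stay in memory for as long as the cache object lives. On PHP 8+ a WeakMap would probably be a better fit.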

A further performance improvement (about 200 ms in my environment) can be gained if the results of CssSelector::toXPath($selector) at
https://github.com/symfony/symfony/blob/master/src/Symfony/Component/DomCrawler/Crawler.php#L675
are cached.
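
The selector caching could be sketched as a plain memoizer keyed by the selector string. This is a minimal sketch under assumptions: `SelectorCache` is a hypothetical class, not something Symfony provides, and the converter is passed in as a callable so the example runs without the CssSelector component installed.

```php
<?php

// Hypothetical helper (not part of Symfony): memoizes selector-to-XPath conversions.
final class SelectorCache
{
    /** @var callable turns a CSS selector string into an XPath string */
    private $converter;

    /** @var array<string, string> selector => XPath expression */
    private $cache = [];

    public function __construct(callable $converter)
    {
        $this->converter = $converter;
    }

    public function toXPath(string $selector): string
    {
        // Run the (expensive) conversion only once per distinct selector.
        if (!isset($this->cache[$selector])) {
            $this->cache[$selector] = ($this->converter)($selector);
        }

        return $this->cache[$selector];
    }
}
```

In the context of this issue, the callable would presumably wrap the CssSelector::toXPath call mentioned above. The cache grows with the number of distinct selectors, which should be small for typical scraping code.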
