[DomCrawler] Terrible performance with filterXPath and new Crawler #39067

Closed
@benito103e

Symfony version(s) affected: 3.4 // 4.4

Description
I hit a severe performance problem when following the documentation to parse a fairly large XML file (3 MB) with nested queries.

How to reproduce

$crawler = new Crawler();
$crawler->addXmlContent(file_get_contents(__DIR__.'/rfc1767_33000.xml'));

foreach ($crawler->filterXPath('//transaction') as $nodeTransaction) {
    $crawlerTransaction = new Crawler($nodeTransaction);
    $actionCode = $crawlerTransaction->filterXPath('transaction/documentCommand/documentCommandHeader/@type')->text();
    $transactionId = $crawlerTransaction->filterXPath('transaction/transactionIdentification/entityIdentification')->text();

    foreach ($crawlerTransaction->filterXPath('//catalogue_item_notification:catalogueItemNotification') as $nodeItem) {
        $crawlerItem = new Crawler($nodeItem);
        $documentId = $crawlerItem->filterXPath('//catalogueItemNotificationIdentification/entityIdentification')->text();
        $recipientGln = $crawlerItem->filterXPath('//catalogue_item_notification:catalogueItemNotification/catalogueItem/dataRecipient')->text();
        $highestLevelGtin = $crawlerItem->filterXPath('//catalogue_item_notification:catalogueItemNotification/catalogueItem/tradeItem/gtin')->text();
        $targetMarket = $crawlerItem->filterXPath('//catalogue_item_notification:catalogueItemNotification/catalogueItem/tradeItem/targetMarket/targetMarketCountryCode')->text();
        $documentStatusCode = strtolower($crawlerItem->filterXPath('//catalogue_item_notification:catalogueItemNotification/documentStatusCode')->text());

        foreach ($crawlerItem->filterXPath('catalogue_item_notification:catalogueItemNotification//catalogueItem') as $nodeCatalogueItem) {
            $crawlerCatalogueItem = new Crawler($nodeCatalogueItem);
            $gtin = $crawlerCatalogueItem->filterXPath('//catalogueItem/tradeItem/gtin')->text();
            dump($gtin);
        }
    }
}

=> Memory used: 1.2 GB, execution time: 30 s

With PHP's native DOM extension only

$dom = new \DOMDocument('1.0');
$dom->validateOnParse = true;
@$dom->loadXML(file_get_contents(__DIR__.'/rfc1767_33000.xml'), \LIBXML_NONET);
$xpath = new \DOMXPath($dom);

foreach ($xpath->query('transaction') as $nodeTransaction) {
    $actionCode = $xpath->query('documentCommand/documentCommandHeader/@type', $nodeTransaction)->item(0)->nodeValue;
    $transactionId = $xpath->query('transactionIdentification/entityIdentification', $nodeTransaction)->item(0)->nodeValue;

    /** @var \DOMNode $nodeItem */
    foreach ($xpath->query('.//catalogue_item_notification:catalogueItemNotification', $nodeTransaction) as $nodeItem) {
        $documentId = $xpath->query('//catalogueItemNotificationIdentification/entityIdentification', $nodeItem)->item(0)->nodeValue;
        $recipientGln = $xpath->query('//catalogue_item_notification:catalogueItemNotification/catalogueItem/dataRecipient', $nodeItem)->item(0)->nodeValue;
        $highestLevelGtin = $xpath->query('//catalogue_item_notification:catalogueItemNotification/catalogueItem/tradeItem/gtin', $nodeItem)->item(0)->nodeValue;
        $targetMarket = $xpath->query('//catalogue_item_notification:catalogueItemNotification/catalogueItem/tradeItem/targetMarket/targetMarketCountryCode', $nodeItem)->item(0)->nodeValue;
        $documentStatusCode = $xpath->query('//catalogue_item_notification:catalogueItemNotification/documentStatusCode', $nodeItem)->item(0)->nodeValue;

        /** @var \DOMNode $nodeCatalogueItem */
        foreach ($xpath->query('.//catalogueItem', $nodeItem) as $nodeCatalogueItem) {
            $gtin = $xpath->query('tradeItem/gtin', $nodeCatalogueItem)->item(0)->nodeValue;
            dump($gtin);
        }
    }
}

=> Memory used: 20 MB, execution time: 500 ms
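
For anyone who wants to reproduce the comparison, memory and timing figures like those reported for the two versions can be collected with a small helper. This is only a sketch: `measure()` is a hypothetical function, not part of Symfony or PHP, and the workload inside the closure is a placeholder to be replaced with one of the parsing loops. Since `memory_get_peak_usage()` reports the peak for the whole process, each version should be measured in a separate run.

```php
<?php
// Hypothetical helper (not part of Symfony): runs a callable and reports
// the elapsed wall-clock time and the peak memory seen by the PHP process.
function measure(callable $task): array
{
    $start = microtime(true); // seconds, as a float
    $task();
    $elapsedMs = (microtime(true) - $start) * 1000;

    return [
        'elapsed_ms' => $elapsedMs,
        // Peak is process-wide, so measure each variant in its own run.
        'peak_memory_mb' => memory_get_peak_usage(true) / (1024 * 1024),
    ];
}

$stats = measure(function (): void {
    // Placeholder workload (an assumption for illustration);
    // replace it with the Crawler loop or the DOMXPath loop.
    $data = range(1, 100000);
    array_sum($data);
});

printf("%.1f ms, peak %.1f MB\n", $stats['elapsed_ms'], $stats['peak_memory_mb']);
```

Running the script once per variant keeps the peak-memory reading of one loop from masking the other.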

Possible Solution
I don't know where the memory leak is. It seems to come from the Crawler initializations.

Additional context
Did I miss something when using Crawler?

Thanks in advance
