DomCrawler memory leak with filter method #10879

Closed
fliespl opened this issue May 9, 2014 · 5 comments

@fliespl
Contributor

fliespl commented May 9, 2014

Hi,
Recently I have noticed that DomCrawler might leak memory when using the filter method (in contrast to the filterXPath method).

Here is the memory usage for both methods, in bytes as reported by memory_get_usage(true), after 100 loop iterations:

Before $crawler->filterXPath("//tr/td"): 9175040
elements:800
After $crawler->filterXPath("//tr/td"): 9175040

Before $crawler->filter("tr > td"): 9175040
elements:800
After $crawler->filter("tr > td"): 12320768

Example code for both methods:

<?php

// Assumes symfony/dom-crawler and symfony/css-selector are installed via Composer
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$html = '<tr><td>abc</td><td>dce</td><td>abc</td><td>dce</td><td>abc</td><td>dce</td><td>abc</td><td>dce</td></tr>';

$elements = 0;

print 'Before $crawler->filterXPath("//tr/td"): ' . memory_get_usage(true) . "\n";

for ($i = 0; $i < 100; $i++) {
    $crawler = new Crawler($html);
    $columns = $crawler->filterXPath('//tr/td');

    $elements += count($columns);

    // Drop the nodes held by both crawlers before the next iteration
    $columns->clear();
    $crawler->clear();
}

print "elements:" . $elements . "\n";

print 'After $crawler->filterXPath("//tr/td"): ' . memory_get_usage(true) . "\n";

print "\n\n";

$elements = 0;

print 'Before $crawler->filter("tr > td"): ' . memory_get_usage(true) . "\n";

for ($i = 0; $i < 100; $i++) {
    $crawler = new Crawler($html);
    // filter() turns the CSS selector into XPath through the CssSelector component
    $columns = $crawler->filter('tr > td');

    $elements += count($columns);

    $columns->clear();
    $crawler->clear();
}

print "elements:" . $elements . "\n";

print 'After $crawler->filter("tr > td"): ' . memory_get_usage(true) . "\n";

@stof
Member

stof commented May 9, 2014

This is weird, because filter() uses filterXPath() internally.

stof added the DomCrawler label May 9, 2014
@fliespl
Contributor Author

fliespl commented May 9, 2014

In that case, it would appear that the CssSelector component is leaking somewhere inside the toXPath method.

@geoffrey-brier
Contributor

I have taken a look at this issue. I added a "removeAll" method to the Translator class, which looks like this:

function removeAll()
{
    unset($this->mainParser);
    unset($this->shortcutParsers);
    unset($this->extensions);
    unset($this->nodeTranslators);
    unset($this->combinationTranslators);
    unset($this->functionTranslators);
    unset($this->pseudoClassTranslators);
    unset($this->attributeMatchingTranslators);
}

I also updated the toXPath method of the CssSelector class to call this method, and I managed to reduce the memory leak quite well but it still leaks. Hope it helps!

NB: I think that @jpauli could be of great help here.

@stof
Member

stof commented Jun 19, 2014

@geoffrey-brier you are unsetting the declared properties themselves here, not just their values. This means that later usages of the class would increase memory usage (on PHP 5.4+ at least). unset($this->foo) is not equivalent to $this->foo = null.
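
The difference between the two forms can be observed with get_object_vars(). This is a minimal sketch using a hypothetical Holder class rather than the real Translator: assigning null keeps the declared property with a null value, while unset() removes the property from the instance altogether. It does not measure memory, so the PHP 5.4+ memory cost mentioned above is not demonstrated here.

```php
<?php

// Hypothetical class standing in for the Translator, for illustration only.
class Holder
{
    public $parser = 'some parser';
    public $extensions = ['a', 'b'];
}

// Assigning null drops the value but keeps the declared property.
$nulled = new Holder();
$nulled->parser = null;

// unset() removes the declared property from this instance entirely.
$unset = new Holder();
unset($unset->parser);

$nulledVars = get_object_vars($nulled);
$unsetVars = get_object_vars($unset);

var_dump(array_key_exists('parser', $nulledVars)); // bool(true): property still there, value is null
var_dump(array_key_exists('parser', $unsetVars));  // bool(false): property is gone
```

Reading $unset->parser afterwards would raise an "Undefined property" notice or warning (or call __get() if the class defines one), which is another sign that the object no longer has the property.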

I think one of the issues is that the CssSelector object graph contains circular references in many places, so reference counting alone cannot free the objects once the CSS has been turned into XPath. The garbage collector that handles circular references likely runs less often.
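
A minimal sketch of the mechanism described above, using hypothetical CycleParent/CycleChild classes rather than the real CssSelector classes: a two-object cycle survives after its last external variable goes out of scope, and is only freed once PHP's cycle collector runs, forced here with gc_collect_cycles().

```php
<?php

// Hypothetical classes: a parent that owns a child which points back at the parent.
class CycleParent
{
    public static $destroyed = 0;
    public $child;

    public function __destruct()
    {
        self::$destroyed++;
    }
}

class CycleChild
{
    public $parent;
}

function buildCycle()
{
    $parent = new CycleParent();
    $parent->child = new CycleChild();
    // Circular reference: parent -> child -> parent
    $parent->child->parent = $parent;
    // When $parent goes out of scope, the child's back-reference keeps the refcount above zero.
}

buildCycle();
// Refcounting alone did not free the cycle, so no destructor has run yet.
$afterScope = CycleParent::$destroyed;

// Explicitly run the cycle collector, which detects and frees the unreachable cycle.
$collected = gc_collect_cycles();
$afterGc = CycleParent::$destroyed;

echo "destroyed after scope exit: $afterScope\n";
echo "gc_collect_cycles() returned: $collected\n";
echo "destroyed after gc_collect_cycles(): $afterGc\n";
```

The automatic collector would only kick in once its root buffer fills up, which is why code that creates such cycles in a loop can see memory grow for a while before anything is reclaimed.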

@jderusse
Member

@geoffrey-brier I implemented the same behavior in pull request #11221, but in my case it fully fixes this leak (see https://github.com/symfony/symfony/pull/11221/files#diff-116347d1689bd54d19e1f9a6901ef8a7R401).

What do you mean by "the memory leak quite well but it still leaks"? Do you have a test case?

fabpot added a commit that referenced this issue Jun 27, 2014
…cular object graph (stof)

This PR was merged into the 2.3 branch.

Discussion
----------

[CssSelector] Refactored the CssSelector to remove the circular object graph

| Q             | A
| ------------- | ---
| Bug fix?      | yes
| New feature?  | no
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets | #10879, replaces #11221
| License       | MIT
| Doc PR        | n/a

This allows the translator and its extensions to be garbage collected based on the refcount rather than requiring a garbage collector run, making it much more likely to happen at the end of the ``CssSelector::toXPath`` call.

Node translators now receive the Translator as second argument, instead of requiring it to be injected into the extension, which then kept a reference to it. This way, the Translator is not referenced anywhere within its own object graph, only by the caller, and so it will be destructed at the end of its usage (and the extensions will then be destructed after it once they are no longer used).

Commits
-------

994f81f Refactored the CssSelector to remove the circular object graph
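
The refactoring described in the commit message can be sketched in miniature. This is a hypothetical, simplified model: the class names, the registerNodeTranslator() and escape() methods and the callback signature are assumptions for illustration, not the actual Symfony API. The translator passes itself to each node-translator callback as the second argument, so the extension never stores a reference to it and the translator is destructed by plain refcounting as soon as toXPath() returns.

```php
<?php

// Hypothetical, simplified translator; the real CssSelector classes differ.
class Translator
{
    public static $destroyed = 0;
    private $nodeTranslators = [];

    // Hypothetical registration method: the translator holds the callbacks (and thus the extension).
    public function registerNodeTranslator($name, callable $callback)
    {
        $this->nodeTranslators[$name] = $callback;
    }

    public function translate($name, $node)
    {
        // The translator hands itself to the callback, so callbacks never need to keep it.
        return call_user_func($this->nodeTranslators[$name], $node, $this);
    }

    // Hypothetical helper used by the extension.
    public function escape($value)
    {
        return "'" . $value . "'";
    }

    public function __destruct()
    {
        self::$destroyed++;
    }
}

// Hypothetical extension: receives the translator per call instead of storing it in a property.
class ElementExtension
{
    public function translateElement($node, Translator $translator)
    {
        return '//' . $node . '[@id=' . $translator->escape('main') . ']';
    }
}

function toXPath($node)
{
    $translator = new Translator();
    $extension = new ElementExtension();
    $translator->registerNodeTranslator('Element', [$extension, 'translateElement']);

    // No cycle: only this local variable references $translator,
    // so it is destructed by refcounting when the function returns.
    return $translator->translate('Element', $node);
}

$xpath = toXPath('div');
$destroyedWithoutGc = Translator::$destroyed;

echo $xpath . "\n";
echo "translators destroyed without running the cycle collector: $destroyedWithoutGc\n";
```

The references only point one way (translator to extension), which is what lets the refcount reach zero without waiting for the cycle collector.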
fabpot closed this as completed Jun 27, 2014

5 participants