Garbage collector not cleaning roots / leaking memory #9239

Closed
olsavmic opened this issue Aug 3, 2022 · 5 comments

olsavmic commented Aug 3, 2022

Description

The following code:

<?php declare(strict_types = 1);

gc_enable();

class Bar
{
    public ?Foo $foo = null;
}

class Foo
{
    public Bar $bar;
}

function ff()
{
    $arr = [];
    for ($i = 0; $i < 10000000; $i++) {
        $b = new Bar();
        $a = new Foo();
        $a->bar = $b;
        $b->foo = $a;

        $arr[] = $a;
    }
}

ff();

function memory_usage_mb(): float
{
    return memory_get_usage() / 1024 / 1024;
}

echo "GC Status after first run:\n";
var_dump(gc_status());
echo 'Memory usage: ' . memory_usage_mb() . "MB\n";

ff();

echo "GC Status after second run:\n";
var_dump(gc_status());
echo 'Memory usage: ' . memory_usage_mb() . "MB\n";

gc_collect_cycles();
echo "GC Status after gc_collect_cycles:\n";
var_dump(gc_status());
echo 'Memory usage: ' . memory_usage_mb() . "MB\n";

ff();
echo "GC Status after third run:\n";
var_dump(gc_status());
echo 'Memory usage: ' . memory_usage_mb() . "MB\n";

Resulted in this output:

GC Status after first run:
array(4) {
  ["runs"]=>
  int(81)
  ["collected"]=>
  int(19580004)
  ["threshold"]=>
  int(440001)
  ["roots"]=>
  int(209999)
}
Memory usage: 278.76216888428MB
GC Status after second run:
array(4) {
  ["runs"]=>
  int(129)
  ["collected"]=>
  int(38860006)
  ["threshold"]=>
  int(620001)
  ["roots"]=>
  int(569998)
}
Memory usage: 317.21421051025MB
GC Status after gc_collect_cycles:
array(4) {
  ["runs"]=>
  int(130)
  ["collected"]=>
  int(40000000)
  ["threshold"]=>
  int(620001)
  ["roots"]=>
  int(0)
}
Memory usage: 256.33196258545MB
GC Status after third run:
array(4) {
  ["runs"]=>
  int(168)
  ["collected"]=>
  int(58540004)
  ["threshold"]=>
  int(760001)
  ["roots"]=>
  int(729999)
}
Memory usage: 334.30416107178MB

As you can see, neither returning from the first ff() call nor calling ff() a second time cleared the GC roots, even though the GC apparently ran (the number of runs increased) and the buffer already held 209999 roots after the first call.

Manually calling gc_collect_cycles() seems to clear the root buffer, but it does not appear to have any lasting effect: after calling ff() a third time, the buffer reports a total of 729999 roots.

Perhaps this is somehow related to the issue mentioned in https://www.php.net/manual/en/features.gc.collecting-cycles.php:

"If the root buffer becomes full with possible roots while the garbage collection mechanism is turned off, further possible roots will simply not be recorded. Those possible roots that are not recorded will never be analyzed by the algorithm. If they were part of a circular reference cycle, they would never be cleaned up and would create a memory leak."

In the case presented here, though, GC is enabled the whole time and the roots do seem to be counted, yet memory still appears to leak.


After removing the assignment $arr[] = $a; from the ff function, the garbage collector is triggered every 10 000 roots (as documented) and the roots are gone as expected:

GC Status after first run:
array(4) {
  ["runs"]=>
  int(1999)
  ["collected"]=>
  int(19990000)
  ["threshold"]=>
  int(10001)
  ["roots"]=>
  int(10000)
}
Memory usage: 0.99087524414062MB
GC Status after second run:
array(4) {
  ["runs"]=>
  int(3999)
  ["collected"]=>
  int(39990000)
  ["threshold"]=>
  int(10001)
  ["roots"]=>
  int(10000)
}
Memory usage: 0.99087524414062MB
GC Status after gc_collect_cycles:
array(4) {
  ["runs"]=>
  int(4000)
  ["collected"]=>
  int(40000000)
  ["threshold"]=>
  int(10001)
  ["roots"]=>
  int(0)
}
Memory usage: 0.45681762695312MB

I got the same output for PHP 7.4, PHP 8.0, and PHP 8.1.

PHP Version

PHP 8.1.7

Operating System

Debian 10

olsavmic changed the title from "Garbage collector not triggered on large number of roots" to "Garbage collector not cleaning roots / leaking memory" Aug 3, 2022

cmb69 commented Aug 3, 2022

I don't see a memory leak here. While the script is executing ff() (which is most of the time), nothing can be freed, since $arr holds all the $a values (so these objects are still referenced from outside their cycles, and the cyclic GC won't release them). Only when the GC kicks in between calls to ff() can the garbage be collected (and it will be).
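
A scaled-down sketch of this point, using the classes from the report. The function name build_cycles() and the count of 1000 are made up for illustration; the count is kept small so the automatic GC is unlikely to interfere. While $arr is alive, a forced collection finds nothing to free; once the function has returned, the same call reclaims the cycles.

```php
<?php declare(strict_types = 1);

class Bar
{
    public ?Foo $foo = null;
}

class Foo
{
    public Bar $bar;
}

// Hypothetical helper: builds $n Foo<->Bar cycles, keeps every Foo in a
// local array, and forces a collection while that array is still alive.
function build_cycles(int $n): int
{
    $arr = [];
    for ($i = 0; $i < $n; $i++) {
        $b = new Bar();
        $a = new Foo();
        $a->bar = $b;
        $b->foo = $a;
        $arr[] = $a; // keeps the cycle reachable from outside
    }

    // Every cycle is still reachable through $arr, so nothing is garbage.
    return gc_collect_cycles();
}

gc_enable();
gc_collect_cycles(); // start from a clean root buffer

$collectedInside = build_cycles(1000);
// $arr is gone now, so the cycles are unreachable and can be collected.
$collectedAfter = gc_collect_cycles();

echo "collected inside ff-like function: $collectedInside\n";
echo "collected after it returned: $collectedAfter\n";
```

The first number should be 0 and the second greater than 0, which matches the pattern in the reported output: "collected" only catches up once ff() has returned.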

olsavmic commented Aug 3, 2022

@cmb69

Perhaps I'm missing something obvious but that's not what's going on if you look carefully at the output.

The root buffer count keeps increasing between ff() calls and the total memory as well.

What you're saying is exactly what I'd expect to happen but that's NOT THE CASE.

If the total number of objects marked as root stored in the $arr is lower than 10 000, the memory is properly freed. I'll provide an example output for that as well.

Please take a look at the issue once again.

cmb69 commented Aug 3, 2022

"The root buffer count keeps increasing between ff() calls and the total memory as well."

Yes, of course, since you are creating 10,000,000 objects in ff() and keeping them from being garbage collected. This causes the root buffer to grow (the info about GC in the PHP manual is grossly outdated) so it can keep track of that many objects. Memory is freed and the root buffer emptied only if you call gc_collect_cycles() outside of ff() (or don't prevent the objects from being GC'd).
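
The "threshold" values in the gc_status() output above (440001, 620001, 760001) suggest the trigger point is no longer a fixed 10 000 roots. A minimal sketch to observe that, assuming PHP 7.3 or later, where the threshold is adjusted at runtime as far as I know; the function name threshold_while_holding() and the count of 50000 are made up for illustration.

```php
<?php declare(strict_types = 1);

class Bar
{
    public ?Foo $foo = null;
}

class Foo
{
    public Bar $bar;
}

// Hypothetical helper: builds $n Foo<->Bar cycles, keeps them alive in a
// local array, and reports the GC threshold as seen from inside the function.
function threshold_while_holding(int $n): int
{
    $arr = [];
    for ($i = 0; $i < $n; $i++) {
        $b = new Bar();
        $a = new Foo();
        $a->bar = $b;
        $b->foo = $a;
        $arr[] = $a; // nothing can be collected while $arr is alive
    }

    // Automatic runs that free (almost) nothing are expected to raise
    // the threshold; this is an assumption about PHP 7.3+ behaviour.
    return gc_status()['threshold'];
}

gc_enable();

$thresholdBefore = gc_status()['threshold'];
$thresholdInside = threshold_while_holding(50000);

echo "threshold before: $thresholdBefore\n";
echo "threshold inside: $thresholdInside\n";
```

If the assumption holds, the second value is larger than the first, which would explain why "roots" can climb far past 10 000 between automatic runs.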

olsavmic commented Aug 3, 2022

I understand that the root buffer grows in size, but the gc_status()["roots"] number is supposed to represent the current number of possible roots, not the size of the buffer. And if you look at the third (after gc_collect) and fourth outputs, the number of reported roots grew from 0 to 700 000+, which corresponds to the fact that nothing was actually freed.

The total memory usage also decreased only slightly after gc_collect and is at its highest in the fourth output.

olsavmic commented Aug 3, 2022

Ah, I see. If the documentation is outdated and it's no longer true that the GC is triggered for every 10 000 roots, then it makes sense that most of the memory is just allocated for the buffer :)

Thank you for the hints! I'll investigate it further myself, just to understand the logic behind it.
