Britney’s Memory Management
One of the things about the whole money thing and free software is the question of whether it’ll take all the fun and spontaneity out of hax0ring. As it turns out, that doesn’t even work when you try; so instead of doing dak work last weekend, like I’d planned and like the market was indicating, I ended up hacking on britney until about 5am Sunday morning, leaving the dak stuff to be spread over the week, instead. (After the last post, dak might seem cursed, but trust me it’s not. More on that RSN, but not just yet)
For those playing along at home, britney is the name of the scripts used to update Debian’s testing distribution; basically it automates some of the hard problems of Debian release management. Naturally, this is a pretty tricky job — there’re over 200,000 packages in Debian unstable (spread across eleven architectures), with fairly complicated sets of interdependencies amongst them. It gets complicated enough, in fact, that the core analysis britney undertakes isn’t just too processor intensive to do in perl or python, it’s too memory intensive to use standard malloc for its data structures.
(Cynical and Coding, and the UQ CS228 class of ’98 will probably have some idea of the fun to come at this point…)
The problem is that britney uses an absurd number of linked lists to resolve dependency issues, with each entry in a list being an independent block of a dozen or so bytes; adding the usual malloc overhead to each of these can effectively double the memory usage. Worse, britney also likes to reuse memory quite a lot, does a fair bit of backtracking, and generally acts like the prima donna she is. Up until last week, britney’s memory was managed via a parallel bitfield indicating whether memory was free or not: britney would first allocate a 128kB block using standard malloc and note that all of it was free; every request for 12 or 52 bytes would then result in some bits being twiddled, and every free would twiddle the bits back so the memory could be reused later. This means two things: one is that the free invocation has to include the size of the memory being freed as well as its address, and the other is that finding an appropriately sized block of free memory can be awkward, leading to some degree of fragmentation.
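To make that concrete, here’s a minimal sketch of a bitfield allocator along those lines; the names and layout are hypothetical rather than britney’s actual code, and a real implementation would manage a whole list of such blocks:

```c
#include <stddef.h>

#define BLOCK_SIZE   (128 * 1024)           /* one 128kB block, as above */
#define WORD         sizeof(void *)
#define BLOCK_WORDS  (BLOCK_SIZE / WORD)

/* a fresh block should come from calloc(), so all bits start out clear */
struct block {
    unsigned char used[BLOCK_WORDS / 8];    /* parallel bitfield, 1 bit per word */
    unsigned char mem[BLOCK_SIZE];
};

/* find a run of clear bits long enough for `size` bytes, set them,
 * and hand back the corresponding memory */
void *block_malloc(struct block *b, size_t size) {
    size_t words = (size + WORD - 1) / WORD;
    for (size_t i = 0; i + words <= BLOCK_WORDS; i++) {
        size_t j;
        for (j = 0; j < words; j++)
            if (b->used[(i + j) / 8] & (1u << ((i + j) % 8)))
                break;
        if (j < words) { i += j; continue; } /* run too short, skip past it */
        for (j = 0; j < words; j++)          /* claim the run */
            b->used[(i + j) / 8] |= 1u << ((i + j) % 8);
        return &b->mem[i * WORD];
    }
    return NULL;  /* no suitable run: time to malloc() another 128kB block */
}

/* freeing needs the size as well as the address, to know how many bits to clear */
void block_free(struct block *b, void *p, size_t size) {
    size_t i = ((unsigned char *)p - b->mem) / WORD;
    size_t words = (size + WORD - 1) / WORD;
    for (size_t j = 0; j < words; j++)
        b->used[(i + j) / 8] &= ~(1u << ((i + j) % 8));
}
```

Both of those problems are visible here: block_free has to be handed the size, and block_malloc has to hunt for a suitably long run of clear bits among whatever holes earlier frees left behind, which is exactly where the fragmentation comes from.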
This fragmentation seemed to give britney a gradual memory leak, which meant that if the release managers were doing some complicated transitions, gigabytes of memory would end up being used and the OOM killer invoked; or rather, since britney runs on ftp-master and it’s probably not a good idea to have the OOM killer loose on the machine that ensures the Debian archive’s integrity, the ulimit would be reached and britney would abort unhappily. Unfortunately, the use of a roll-your-own malloc means you can’t use the standard memory checkers to see what’s going on, so you’re left with guessing and hoping. As a consequence the memory leak had been around for ages, causing fairly regular minor annoyance.
This changed last week when Micha Nelissen (neli) got into the act:
<neli> why does it need so much memory ?
<aba> neli: because it needs to check how the changes affect the installability count
<neli> aba: ok…but still does not seem like a big deal
<neli> how large trees are we talking then?
<aj> it has to do a lot of backtracking
<aj> how large is the longest dependency tree in debian?
<neli> backtracking only takes time, not space, right?
<aj> you need a list of things to backtrack over
<aba> neli: it also consumes memory, at least with the current dpkg-implementation
<aba> (well, there are ideas how to avoid this memory fragmentation, but that’s something else)
<aj> (it’s not actually memory usage, it’s memory fragmentation)
<neli> hmm, bad memory manager ?
<aj> imperfect, *shrug*
After poking around at getting britney to run locally, and doing some basic analysis on how the memory was used, neli decided to rewrite the memory manager to use chunk-size pools, so that a separate pool would be used for 4 byte allocations, for 8 byte allocations, for 12 byte allocations, and so on. That’s possible due to two factors: the special memory allocation is only needed for data structures, which all come in a handful of fairly small sizes; and those sizes are all multiples of the word size. The approach also means that fragmentation can be avoided completely, since the “free” elements themselves can be used to store a free list, taking up no extra space. Happily, the change also made the memory allocation much simpler.
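Roughly, the new scheme works like the following minimal sketch; the names are my own, not neli’s actual implementation:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define WORD      sizeof(void *)
#define MAX_SIZE  (16 * WORD)      /* largest chunk size we pool */
#define NPOOLS    (MAX_SIZE / WORD)
#define CHUNK     (128 * 1024)     /* grow a pool 128kB at a time */

static void *free_list[NPOOLS];    /* one free list per chunk size */

void *pool_malloc(size_t size) {
    assert(size > 0 && size <= MAX_SIZE);
    size_t i = (size + WORD - 1) / WORD - 1;  /* which pool? */
    if (free_list[i] == NULL) {
        /* pool exhausted: carve a fresh chunk into equal-sized blocks,
         * threading the free list through the blocks themselves */
        size_t bsize = (i + 1) * WORD;
        size_t n = CHUNK / bsize;
        char *mem = malloc(CHUNK);
        if (mem == NULL) return NULL;
        for (size_t k = 0; k + 1 < n; k++)
            *(void **)(mem + k * bsize) = mem + (k + 1) * bsize;
        *(void **)(mem + (n - 1) * bsize) = NULL;
        free_list[i] = mem;
    }
    void *p = free_list[i];
    free_list[i] = *(void **)p;    /* pop the head of the free list */
    return p;
}

/* a freed block just becomes the new head of its pool's free list:
 * no bitmap to scan, no searching, no per-block overhead */
void pool_free(void *p, size_t size) {
    size_t i = (size + WORD - 1) / WORD - 1;
    *(void **)p = free_list[i];
    free_list[i] = p;
}
```

Note that free still has to be told the size of the block, just as with the old bitfield scheme; the difference is that both allocating and freeing are now constant-time pointer operations.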
(It also shows the benefits of a fresh look at code; I’d been thinking of doing something similar for ages, but had always been stopped because I couldn’t figure out how to make it cope with the various different sized blocks I’d need. As it turned out, though, I’d solved that problem ages ago when I wrote a separate allocator to deal with strings, which would’ve required my malloc to cope with a much larger variety of sizes.)
What it didn’t do was actually solve the memory leak. That was pretty confusing: the old memory checker had included some tests to ensure that memory wasn’t really being leaked, so if the memory wasn’t being lost to a missing free, and wasn’t being lost to fragmentation, where was it going?
At this point, the real benefit of having a simpler malloc implementation became apparent: it was suddenly plausible to add some debugging code that allocated a separate pool for every allocation site, and reported how much memory was in use at any point, per line of code. That gave me output like:
pool 8B/1:751; 1962683 used 1965960 allocated (99.8% of 22 MiB)
So I knew that line 751 of dpkg.c was allocating 8B chunks, had used up to 22MiB in the past, and was currently using 99.8% of that. (Actually, I made an off-by-one error initially, so the “8B” above is really 12B, as you can see if you divide 22MiB by 1965960.)
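To give an idea of how that sort of per-call-site accounting can be wired up, here’s a rough sketch. The __FILE__/__LINE__ macro trick is the essential part; the names, the per-block header, and the output format are all hypothetical rather than britney’s actual debug code:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* one set of counters per allocation site (file:line) */
struct site {
    const char *file;
    int line;
    long used;       /* incremented on malloc, decremented on free */
    long allocated;  /* incremented on malloc, never decremented */
};

#define MAX_SITES 1024
static struct site sites[MAX_SITES];
static int nsites;

static int site_id(const char *file, int line) {
    for (int i = 0; i < nsites; i++)
        if (sites[i].line == line && strcmp(sites[i].file, file) == 0)
            return i;
    assert(nsites < MAX_SITES);
    sites[nsites] = (struct site){ file, line, 0, 0 };
    return nsites++;
}

/* each block gets a word-sized header recording its site, so the free
 * can be credited back to the site that made the allocation */
void *debug_malloc(size_t size, const char *file, int line) {
    size_t *p = malloc(sizeof(size_t) + size);  /* stand-in for the pool allocator */
    if (p == NULL) return NULL;
    int id = site_id(file, line);
    sites[id].used++;
    sites[id].allocated++;
    *p = (size_t)id;
    return p + 1;
}

void debug_free(void *ptr) {
    size_t *p = (size_t *)ptr - 1;
    sites[*p].used--;        /* allocated deliberately stays put */
    free(p);
}

/* every call site identifies itself automatically */
#define MALLOC(sz)  debug_malloc((sz), __FILE__, __LINE__)

/* print one line per site, loosely in the style quoted above */
void dump_sites(void) {
    for (int i = 0; i < nsites; i++)
        printf("%s:%d; %ld used %ld allocated (%.2f%% never freed)\n",
               sites[i].file, sites[i].line,
               sites[i].used, sites[i].allocated,
               100.0 * sites[i].used / sites[i].allocated);
}
```

With counters like these, the interesting ratio is used/allocated: for a call site that frees everything it allocates, it tends towards zero the longer the program runs, while a site that never frees sits at 100%.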
That let me see two “leaks”. One turned out to be fairly easy to diagnose: as an optimisation, when I say “package foo is installable”, I note that “we used bar to see that, so if we later update or remove bar, we’ll have to check whether foo is still installable”. So the data structure for bar ends up with a linked list of all the packages, like foo, that relied on it. But I didn’t check that list for dupes; so if I note that both bar and baz were required for foo to be installable, then update bar, then when I recheck foo I’ll add it to both bar’s and baz’s lists. That’s fine for bar, whose list I cleared when I updated it, but not so fine for baz, which now has two entries for foo. Having already added a check for dupes while trying to see if my guess was right, extending that check to skip the addition was pretty easy.
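The fix, roughly, looked like the following; the structure and function names here are hypothetical stand-ins rather than britney’s real types:

```c
#include <stdlib.h>

struct package;

/* linked list of packages whose installability relied on this one */
struct revdep {
    struct package *pkg;
    struct revdep *next;
};

struct package {
    const char *name;
    struct revdep *rdeps;  /* packages to recheck if we're updated/removed */
};

/* record that foo's installability relied on bar */
void note_reverse_dep(struct package *bar, struct package *foo) {
    /* the extended dupe check: if foo is already on bar's list,
     * adding it again would leave an entry that never gets freed,
     * so skip the addition entirely */
    for (struct revdep *r = bar->rdeps; r != NULL; r = r->next)
        if (r->pkg == foo)
            return;
    struct revdep *r = malloc(sizeof *r);  /* pool_malloc in britney proper */
    r->pkg = foo;
    r->next = bar->rdeps;
    bar->rdeps = r;
}
```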
The other leak required a little more investigation. It was in a fairly tight loop that could reasonably use a fair amount of memory, but by the end of the loop it should all have been freed. Adding some more debug code to the allocation functions confirmed the problem:
<aj> pool 8B/11:1231; 1618157 used 1703936 allocated (95.0% of 13 MiB, 9.05% current)
<aj> so it looks like ~9% of those allocations are never freed to me
<aj> 9.05% = used/alloc
<aj> where used is incremented on b_m, decremented on b_f
<aj> alloc is incremented on b_m, never decremented
The code was meant to either add the allocated block to a larger list that was being traversed, or to free it immediately; the problem turned out to be that there were some nested ifs, and an inner if was missing an else that should have freed the block (a distilled sketch of the pattern follows below). Fixing that changed the results to:
pool 8B/11:1228; 160 used 131072 allocated (0.1% of 1 MiB, 0.01% current)
That both removed the leak, with a maximum of around 130,000 nodes allocated instead of almost 2,000,000, and meant that once britney had been running for even a fairly short while, almost all the allocations that had ever been made had been freed.
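Distilled down, the bug pattern looked something like this; the node structure and predicates are stand-ins, not the actual britney code:

```c
#include <stdlib.h>

struct node { int val; struct node *next; };

/* stand-in predicates; the real conditions were britney-specific */
static int interesting(const struct node *n) { return n->val > 0; }
static int fits(const struct node *n)        { return n->val % 2 == 0; }

/* either link n into the list being traversed, or free it immediately */
static struct node *consider(struct node *list, struct node *n) {
    if (interesting(n)) {
        if (fits(n)) {
            n->next = list;  /* kept: the list now owns n */
            return n;
        } else {
            free(n);         /* the missing else: without this branch,
                              * roughly 9% of allocations were never freed */
        }
    } else {
        free(n);
    }
    return list;
}
```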
As it turned out, though, the cure for the first leak was worse than the disease: checking a linked list for duplicates on every insertion took too long, and the memory wasted in that section wasn’t that big a deal anyway. Using a red-black tree or similar instead would be a solution, but so far that doesn’t seem necessary. There haven’t been any OOMs since, happily; and even better, the reduced bloat seems to have made the overall checking a bit faster. Sweet.