Britney’s Memory Management

One of the things about the whole money thing and free software is the question of whether it’ll take all the fun and spontaneity out of hax0ring. As it turns out, that doesn’t even work when you try; so instead of doing dak work last weekend, like I’d planned and like the market was indicating, I ended up hacking on britney until about 5am Sunday morning, leaving the dak stuff to be spread over the week instead. (After the last post, dak might seem cursed, but trust me it’s not. More on that RSN, but not just yet.)

For those playing along at home, britney is the name of the scripts used to update Debian’s testing distribution; basically it automates some of the hard problems of Debian release management. Naturally, this is a pretty tricky job — there’re over 200,000 packages in Debian unstable (spread across eleven architectures), with fairly complicated sets of interdependencies amongst them. It gets complicated enough, in fact, that the core analysis britney undertakes isn’t just too processor intensive to do in perl or python, it’s too memory intensive to use standard malloc for its data structures.

(Cynical and Coding, and the UQ CS228 class of ’98 will probably have some idea of the fun to come at this point…)

The problem is that britney uses an absurd number of linked lists to resolve dependency issues, with each entry in the list being an independent block of a dozen or so bytes; and adding the usual malloc overhead to this can effectively double the memory usage. Worse, britney also likes to reuse memory quite a lot, does a fair bit of backtracking, and generally acts like the prima donna she is. Up until last week, britney’s memory was managed by the use of a parallel bitfield indicating whether memory was free or not: so britney would first allocate a 128kB block using standard malloc, note it’s all free, then every request for 12 or 52 bytes would result in some bits being twiddled; and every free would twiddle the bits back, so they could be reused later. This means two things: one is that the free invocation has to include the size of the memory to be freed as well as the address, and the other is that finding an appropriate sized block of freed memory can be awkward, leading to some degree of fragmentation.
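For those who’d like the idea in code, here’s a minimal sketch of that old scheme in python (the names, the 4-byte word size and the bit-per-word granularity are my assumptions; the real thing was C inside britney):

```python
WORD = 4
BLOCK = 128 * 1024  # britney grabbed 128kB at a time from standard malloc

class BitmapPool:
    """One big block plus a parallel bitfield: bit set = word in use."""
    def __init__(self):
        self.nwords = BLOCK // WORD
        self.used = [False] * self.nwords   # the parallel bitfield

    def alloc(self, size):
        words = size // WORD
        run = 0
        for i in range(self.nwords):
            run = run + 1 if not self.used[i] else 0
            if run == words:                # found enough consecutive free words
                start = i - words + 1
                for j in range(start, i + 1):
                    self.used[j] = True
                return start * WORD         # offset into the block
        raise MemoryError("pool exhausted")

    def free(self, offset, size):
        # note: the caller has to remember the size as well as the address
        for j in range(offset // WORD, (offset + size) // WORD):
            self.used[j] = False

pool = BitmapPool()
a = pool.alloc(12)
b = pool.alloc(52)
pool.free(a, 12)    # twiddle the bits back...
c = pool.alloc(12)  # ...so the words can be reused
```

The linear scan for a suitably-sized run is exactly where the awkwardness comes from: freed 12-byte holes between live 52-byte blocks can end up unusable for larger requests, which is the fragmentation in question.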

This fragmentation seemed to lead to britney having a gradual memory leak, which meant that if the release managers were doing some complicated transitions, gigabytes of memory would end up being used, leading to OOM conditions; or rather, since britney runs on ftp-master and it’s probably not a good idea to have the OOM killer loose on the machine that ensures the Debian archive’s integrity, the ulimit would be reached and britney would abort unhappily. Unfortunately, though, the use of a “roll your own” malloc means you can’t use the standard memory checkers to see what’s going on, so you’re left with guessing and hoping. And as a consequence the memory leak’s been around for ages, causing fairly regular minor annoyance.

This changed last week when Micha Nelissen (neli) got into the act:

<neli> why does it need so much memory ?
<aba> neli: because it needs to check how the changes affect the installability count
<neli> aba: ok…but still does not seem like a big deal
<neli> how large trees are we talking then?
<aj> it has to do a lot of backtracking
<aj> how large is the longest dependency tree in debian?
<neli> backtracking only takes time, not space, right?
<aj> you need a list of things to backtrack over
&lt;aba&gt; neli: it also consumes memory, at least with the current dpkg-implementation
<aba> (well, there are ideas how to avoid this memory fragmentation, but that’s something else)
<aj> (it’s not actually memory usage, it’s memory fragmentation)
<neli> hmm, bad memory manager ?
<aj> imperfect, *shrug*

After poking around at getting britney to run locally, and doing some basic analysis on how it was used, neli decided to rewrite the memory manager to use chunk-size pools — so that a separate pool would be used for 4 byte allocations, for 8 byte allocations, for 12 byte allocations, etc. That’s possible because the special memory allocation is only needed for data structures which are all made from fairly small sizes, each a multiple of the word size. It also means that fragmentation can be avoided completely — since you can use the “free” elements themselves to store a free list, taking up no extra space. Happily, that change also made the memory allocation much simpler.
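A toy model of that scheme, again in python rather than the real C, with invented names: each pool hands out a single fixed chunk size, and a freed chunk’s own first word stores the link to the next free chunk, so the free list costs nothing extra:

```python
import struct

CHUNK_COUNT = 1024  # chunks per pool; invented for this sketch

class FixedPool:
    """Every chunk in a pool is the same size, so any free chunk can
    satisfy any request: fragmentation is impossible by construction."""
    def __init__(self, size):
        self.size = size
        self.mem = bytearray(size * CHUNK_COUNT)
        self.next_unused = 0        # high-water mark for never-used chunks
        self.free_head = None       # offset of first chunk on the free list

    def alloc(self):
        if self.free_head is not None:
            off = self.free_head
            # a freed chunk's first word holds the next free offset
            nxt, = struct.unpack_from("<i", self.mem, off)
            self.free_head = None if nxt < 0 else nxt
            return off
        off = self.next_unused
        self.next_unused += self.size
        return off

    def free(self, off):
        # stash the old list head *inside* the freed chunk: zero overhead
        struct.pack_into("<i", self.mem, off,
                         -1 if self.free_head is None else self.free_head)
        self.free_head = off

pools = {}                          # one pool per chunk size: 4, 8, 12, ...
def palloc(size):
    return pools.setdefault(size, FixedPool(size)).alloc()

p = FixedPool(12)
x, y = p.alloc(), p.alloc()         # fresh chunks at offsets 0 and 12
p.free(x)
z = p.alloc()                       # reuses x's chunk straight off the free list
```

Note that free() no longer needs to be told the size: any chunk handed out by a given pool is, by definition, that pool’s size.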

(It also shows the benefits of a fresh look at code; I’d been thinking of doing something similar for ages, but had always been stopped because I couldn’t figure out how to make it cope with the various different sized blocks I’d need. As it turned out though, I’d solved that problem ages ago when I wrote a separate allocator to deal with strings, which would’ve required my malloc to cope with a much larger variety of sizes.)

What it didn’t do was actually solve the memory leak. Which was pretty confusing, since the old memory checker had some tests to ensure that memory wasn’t really being leaked: if the memory’s not being lost to a missing free, and not being lost to fragmentation, where is it going?

At this point, the real benefit of having a simpler malloc implementation became apparent: suddenly it was plausible to add some debugging code to allocate a separate pool for every memory allocation, and to report how much memory was being used at any point, per line of code. That gave me output like:

pool 8B/1:751; 1962683 used 1965960 allocated (99.8% of 22 MiB)

So I knew that line 751 (of dpkg.c) was allocating 8B chunks, had used up to 22MiB in the past, and was currently using 99.8% of that. (Actually, I made an off-by-one error initially, so the “8B” above is actually 12B, as you can see if you divide 22MiB by 1965960.)
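Roughly how that per-callsite accounting can work (a python imitation with invented names; the real debug code was C, keyed on the allocating line of dpkg.c, and “allocated” is modelled here as a high-water mark):

```python
import inspect
from collections import defaultdict

# stats[(size, line)] -> [currently_used, high_water_allocated]
stats = defaultdict(lambda: [0, 0])

def debug_alloc(size):
    line = inspect.currentframe().f_back.f_lineno   # which line asked for it
    s = stats[(size, line)]
    s[0] += 1                     # used: up on alloc, down on free
    s[1] = max(s[1], s[0])        # allocated: never goes down
    return (size, line)           # token the caller hands back to free

def debug_free(token):
    stats[token][0] -= 1

def report():
    for (size, line), (used, alloc) in sorted(stats.items()):
        print("pool %dB/%d; %d used %d allocated (%.1f%%)"
              % (size, line, used, alloc, 100.0 * used / alloc))

toks = [debug_alloc(8) for _ in range(3)]
debug_free(toks[0])
report()                          # one line per (size, call site) pair
```

A call site whose “used” count stays near its “allocated” count forever is exactly the signature of a leak, which is what the output above was designed to expose.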

That let me see two “leaks”. One turned out to be fairly easy to diagnose; as an optimisation, when I say “package foo is installable”, I note that “but we used bar to see that, so if we later update/remove bar, note that we’ll have to see if foo is still installable”. So the datastructure for bar ends up with a linked list of all the packages, foo, that relied on it. But I don’t check that list for dupes, so if I note that both bar and baz were required for foo to be installable, then update bar, when I recheck foo, I’ll add it to both bar’s and baz’s lists; which is fine for bar, whose list I cleared when I updated it; but not so fine for baz, because it’s now got two entries for foo. Having already added a check for dupes when trying to see if my guess was right, extending that check to skip the addition was pretty easy.
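In miniature, that fix looks like this (hypothetical python structures standing in for britney’s C linked lists):

```python
# each package records which packages' installability relied on it
reverse_deps = {"bar": [], "baz": []}

def note_dependency(used_pkg, checked_pkg):
    lst = reverse_deps[used_pkg]
    if checked_pkg in lst:       # the fix: don't append a second "foo"
        return
    lst.append(checked_pkg)

# checking foo's installability used both bar and baz
note_dependency("bar", "foo")
note_dependency("baz", "foo")
# bar gets updated: its list is cleared, and foo is rechecked
reverse_deps["bar"] = []
note_dependency("bar", "foo")
note_dependency("baz", "foo")    # without the check, baz would now list foo twice
```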

The other leak required a little more investigation. It was in a fairly tight loop which could reasonably be using a fair amount of memory; but by the end of the loop it should all be freed. Adding some more debug code to the allocation functions confirmed the problem:

<aj> pool 8B/11:1231; 1618157 used 1703936 allocated (95.0% of 13 MiB, 9.05% current)
<aj> so it looks like ~9% of those allocations are never freed to me
<aj> 9.05% = used/alloc
&lt;aj&gt; where used is incremented on b_m, decremented on b_f
&lt;aj&gt; alloc is incremented on b_m, never decremented

The code was meant to either add the allocated block to a larger list that’s being traversed, or to free it immediately — the problem turned out to be that there were some nested ifs, and an internal if was lacking an else that should have freed the block. Fixing that changed the results to:

pool 8B/11:1228; 160 used 131072 allocated (0.1% of 1 MiB, 0.01% current)

Which both dropped the leak — only allocating a maximum of 130,000 nodes instead of almost 2,000,000 — and meant that almost all the allocations ever made had been freed once britney had been running for even a fairly short while.

As it turned out, though, the cure for the first leak was worse than the disease — checking for duplicates in a linked list turned out to take too long, and the memory wasted in that section wasn’t that big a deal. Using a red-black tree or similar instead would’ve been a solution, but so far doesn’t seem necessary. There haven’t been any OOMs since, happily; and even better, the reduced bloat seems to have made the overall checking a bit faster. Sweet.

Debugging Debootstrap

Contrary to expectations, last week’s AJ Market project turned out to be debootstrap, not dak. Just goes to show a single person can make a difference in today’s world: debootstrap popped into the lead from nearly the bottom thanks to a single contribution. (I wonder if it makes more sense to make contributions anonymous or not?)

For those playing along at home, debootstrap is a tool to build a Debian system from (almost) nothing — it just requires some basic POSIX shell functionality, things like sed, grep, sort, ar, tar, gzip, and wget. So it’s useful when you’re initially installing Debian (or derived distros like Ubuntu), or if you’re creating a chroot environment for dedicated build environments.

Anyway, the debootstrap changes were pretty much bugfixes only, so there’s no fancy new features (okay, there’s one: a --make-tarball option) but a few of the fixes are worth a quick look.

  * Don't create empty available files, since old dpkg and new kernels can't
    deal with them. (Closes: Bug#308169, Bug#329468)

Reasons why POSIX is no fun: it standardises things that don’t match what programs actually do, then people recode their software to match POSIX, and programs that relied on the old behaviour break. In this case, the Linux kernel’s mmap() behaviour on empty files changed — it’s now invalid to try to mmap an empty file, rather than just getting an empty buffer. Unfortunately dpkg tries to mmap its available file, and debootstrap hands dpkg an empty available file, leading to unhappiness. Reportedly dpkg was the only app relying on the old mmap behaviour…
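Python’s mmap wrapper enforces the same rule, so the failure mode is easy to reproduce (a temporary file standing in for dpkg’s available file):

```python
import mmap, os, tempfile

fd, path = tempfile.mkstemp()    # an empty file, like the empty available file
try:
    try:
        mmap.mmap(fd, 0)         # length 0 means "map the whole file"
        result = "mapped"
    except ValueError:           # "cannot mmap an empty file"
        result = "refused"
finally:
    os.close(fd)
    os.unlink(path)
```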

  * Turn on --resolve-deps by default. Add --no-resolve-deps as an option.

An alternative to debootstrap is cdebootstrap, which is written in C instead of shell, and whose main claim to fame is the ability to work out which packages to download entirely dynamically, rather than needing to be updated when the base system changes. debootstrap finally got that feature too in 0.3.0, but the default was to do it in only a limited way, which was to expect the Priority: field in the Packages file to correctly tell you which packages you need. That’s turned out to be a bit of a nuisance, though, so we’ll see what happens with doing it the cdebootstrap way instead.

  * Catch failures in "dpkg --status-fd" (Closes: Bug#317447, Bug#323661)

One of the problems of writing debootstrap in sh is that it’s a horribly kludgy language. So when you try doing something a little bit intricate, in this case trying to interface dpkg and debootstrap so debootstrap can summarise dpkg’s progress at installing packages, you get into all sorts of problems. We had the problem here that we need to (a) get the regular output of postinsts and so forth for the user, (b) get the --status-fd output, parse it, and paraphrase it in a form suitable for either the user or another tool like debian-installer, and (c) pass through the error status returned by dpkg and possibly immediately abort the install. To do (b) you need to pipe dpkg somewhere, but that can only be done with stdout, which is where the output for (a) was going to go, so you have to reroute those two things, then route them back: that means you need two spare FDs, and you need to say: (dpkg --status-fd N N>&1 >&M | parser) M>&1. But that’s not enough, since the pipe loses dpkg’s exit code and the subshell stops you from automatically aborting, either of which violates (c). The solution ended up being to instead say (dpkg --status-fd 8 8>&1 1>&7 || echo EC $?) | parser having made sure &7 goes to stdout earlier (exec 7>&1) and making sure the parser would note the EC string and use that for its exit status.
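The exit-status smuggling can be seen in isolation (a sketch, with `exit 7` standing in for a failing dpkg and `sed` for the parser; this isn’t debootstrap’s actual code):

```python
import subprocess

# a pipeline's exit status is the last command's, so the left-hand
# side's failure is normally thrown away:
lost = subprocess.run(["sh", "-c", "(exit 7) | cat"],
                      capture_output=True).returncode

# instead the status travels through the pipe as data, an "EC 7" line,
# which the parser side can pick out and adopt as its own exit status
script = '( (exit 7) || echo "EC $?" ) | sed -n "s/^EC //p"'
got = subprocess.run(["sh", "-c", script],
                     capture_output=True, text=True).stdout.strip()
```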

  * Use partial/ directory when downloading. (Closes: Bug#109176)

One thing that was particularly pleasing was to be able to fix the second oldest open debootstrap bug (well, the oldest non-wishlist bug), filed just over four years ago by apt author Jason Gunthorpe, about how I was misusing apt’s cache, by not coping with multiple versions of packages being present, and not using the partial directory. The latter actually has some (fairly mild) security implications, in that if debootstrap downloads a hacked deb or Packages file, but is aborted before it has time to see that the md5sum doesn’t match what it should, it’ll leave the partial download somewhere apt will blithely trust. That got mostly fixed in the 0.3.0 upload, and the final bit — using the partial directory — is now done too. Sweet.

In any event, that’s what last week’s contribution bought. Now’s the time for you to decide what this week will bring!

AJ Market Update

Hrm, I’m going a bit single issue; I should fix that. But not right now.

So it’s been a couple of weeks since I first posted about my little market experiment, which seems as good a time as any to take a look at how it’s working out. On the one hand it’s going fairly well; it’s pleasing to finally have tiffani done, and I’m pretty satisfied with how usercategories have turned out, and Joey Hess has started using them for the copious bugs the debian-boot team have to deal with, providing both an archfirst ordering and a categorised ordering for the by-maintainer views.

Of course, creating those views didn’t turn out as straightforward as it should have, and ended up involving fixing quite a few bugs in my initial implementation of usercategories (and, happily, coming up with a more pleasant algorithm for handling the ordering specification in request@ mails). The implementation also introduced a few bugs for people who didn’t use usercategories, which needed fixing too.

To me, though, the striking difference between taking a “volunteer” attitude to Debian and a “professional” one was in how long it took usercategories and tiffani to go from conception to implementation. I don’t think they’re terribly different in complexity; there’s a mental leap in working out what you want to do (people have been thinking about making Packages files faster to download for years, and using ed diffs isn’t particularly obvious; likewise I’ve been thinking about generalising the categories the BTS shows you since I started playing with debbugs in ’99 or so, but usercategories didn’t really come together even as an idea until debconf this year), but given that, the actual implementation of both ideas requires a little care without all that many twists and turns. The result? tiffani, the amateur project, took three and a half years to do; usercategories, the amateur project treated professionally, took two months.

So on that score, this still seems completely worth doing. On the other side of the scale, there’s people’s interest in actually contributing. Which seems as good a place as any for a fold.

I’ve added a list of both contributions and deductions to the market page, and as of the time of writing there have been six contributions. The first was my initial “virtual” $50 to try and provide some scale to the donations, and so I wouldn’t have to worry about division by zero, either literally or figuratively. The first couple of days saw a $2 donation (devalued to $1.63 thanks to Paypal’s handling fees), and the next week saw a $5 donation (devalued to $4.53 thanks again to Paypal). So about $3.08 a week, unless I want to actually get at the money, which would attract another $1 in fees. Mmm, the world of high low finance. (My understanding is that moneybookers is supposed to do a better job than paypal; the fees above would’ve been 7c instead of 84c for the transfers, though, and $2.89 instead of $1 for the withdrawal, so for amounts in that range, it’s not that much better.)

That wasn’t entirely surprising, given I’d already seen the three eurocents offering in relation to Martin Krafft’s Debian book, and that there’s probably more interest in having me be a camchicken than work on backend Debian stuff (and that’s just continuing the fine tradition begun by the very same Michael’s dunk tank last year). The free beer element of free software tends to be more fundamental than people like to admit, and even in essays about how to put money into free software there’s usually a strong focus on making it clear we’re not talking about too much money.

Given that sort of expectation, those results aren’t too disappointing, but they’re not enough to make the concept self-sustaining either; so on the weekend I decided I’d set a floor of $15, and if that wasn’t reached, just work on darcs hacking, which (the way I do it, anyway) is a fun mix of free software and proofs involving set-theory. Sadly, or not, that floor’s already been beaten for this week, with a couple of donations of $10, appropriately modified by currency conversions and Paypal fees.

I’m not quite sure what “self-sustaining” actually is. I’m particularly interested in both markets and Debian, so putting the two together makes me overly inclined to be like the Playboy photographer of jest: “$500 a day, you say? How about $450 and I bring my own camera?” My guess is that $100 a week for a day/week’s work is probably the point at which there’d be some evidence of viability; though comparing that to either the $3 per week so far, or the amount “real” IT work pays, that might be a bit off — either being implausibly high, or still too low to really say anything.

Either way, we’ll see what it looks like in another couple of weeks, especially since this week’ll lift the average quite significantly. By that point, by the looks of things, I’ll have done a bit more work on dak. My top three items there are (a) get our revision control working again, (b) do up a prototype “dak” super-script, so that you can say things like “dak ls” instead of “madison”, “dak process-unchecked” instead of “jennifer”, “dak gen-diffs” instead of “tiffani” and so on, and (c) implement SCC, ie the mirror split that’ll make it plausible to substantially increase the size of the complete Debian archive. The first two should be fairly easy, but the third is a fair bit more complicated. In particular, while we’ve got a fairly good general idea of what SCC involves, the particulars haven’t ever really been nutted out, and once that’s done and the appropriate scripts are finished and integrated, there’s intended to be a few weeks to allow mirror operators to work out what they want to mirror and, if necessary, reconfigure their scripts so they don’t end up running out of disk.

What that likely means is that once cvs and dak(1) are fixed up, SCC will remain on the dak list on an ongoing basis as it gradually gets put in place, rather than being crossed off as soon as I get to it. That’s something I’ve been trying to avoid to date, and it’s putting a little bit more pressure on the whole “market dynamic” thing than I’m entirely comfortable with while the market’s so small. Hence, I’m not sure how that’ll actually work out, but hey, the whole point of experiments is to find that out, no? So we’ll see what happens.

It would, of course, be particularly helpful if more folks would contribute. But then, I would think that, wouldn’t I? Feedback would probably be helpful too (and there’re even comment fields for both Moneybookers and Paypal in which to provide it! ;)

Tiffani

So this week’s project was working on dak, in particular getting the tiffani implementation included. What’s tiffani, you ask? It lets you just download the changes to Sources.gz and Packages.gz files, instead of the whole damn thing — if you have main, contrib, and non-free for unstable in your sources.list, this means your apt-get update only needs to download 27.5 KiB instead of 2.85 MiB (for bz2) or 3.75 MiB (for gz). Two orders of magnitude saving’s not bad.

I’ve mentioned this previously a few times, if you want to see the history: in January I noted some quick hacks which have now been obsoleted by Michael Vogt’s integration into apt (which is in a personal repository, not uploaded to unstable yet…); back in August last year I did a little promotion of the concept, referring back to my post in December 2003 that summarised the analysis done previously, which takes us back to March and April 2002.

The server-side implementation, tiffani, is thus a little python script that groks through the list of suites and components dak knows about, looks for Packages, Sources and Contents files, and creates a directory named something like Packages.diff which in turn contains an Index file and a number of gzipped patches, named after the time they were created. The process for updating an existing Packages file is to look for the matching SHA1-History entry in the Index file, download the corresponding patch, apply that patch to your existing Packages file, and repeat until your Packages file matches the one you were looking for (which you can determine by looking at the checksum in either the Release or Index file).
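On the client side, that update process can be modelled like this (python; the tiny Packages contents, the patch, and the history dict are invented stand-ins for real downloads and the Index file’s SHA1-History section):

```python
import hashlib

def apply_ed_patch(text, patch):
    """Apply an ed-style diff (as produced by `diff --ed`) to text.
    Commands come bottom-to-top, so earlier line numbers stay valid."""
    lines = text.splitlines(True)
    cmds = iter(patch.splitlines(True))
    for cmd in cmds:
        cmd = cmd.rstrip("\n")
        addr, op = cmd[:-1], cmd[-1]
        lo, _, hi = addr.partition(",")
        lo, hi = int(lo), int(hi) if hi else int(lo)
        new = []
        if op in "ac":                     # gather lines up to the lone "."
            for ln in cmds:
                if ln.rstrip("\n") == ".":
                    break
                new.append(ln)
        if op == "a":
            lines[lo:lo] = new             # append after line lo
        else:                              # "c"hange or "d"elete a range
            lines[lo - 1:hi] = new
    return "".join(lines)

def sha1(t):
    return hashlib.sha1(t.encode()).hexdigest()

old = "pkg-a 1.0\npkg-b 1.0\n"
new = "pkg-a 1.1\npkg-b 1.0\npkg-c 1.0\n"
patch = "2a\npkg-c 1.0\n.\n1c\npkg-a 1.1\n.\n"

# mimicking the Index file's SHA1-History: checksum -> patch to apply next
history = {sha1(old): patch}

current = old
while sha1(current) != sha1(new):          # target checksum from Release/Index
    current = apply_ed_patch(current, history[sha1(current)])
```

Each application moves the file one step along the checksum chain, which is why a client several updates behind just loops until its checksum matches the target.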

Anyway, tiffani’s been co-developed with Andreas Barth (aka Andi) after the following exchange back in November 2003 on the RM channel:

<aba> elmo: If I find some time to take a look at katies TODO-file, what would be most appreciated? Deleting udebs with melanie? “Right” handling of orig.tar.gz when jumping out of NEW / changing of the archive section? Something else?
<elmo> aba: the TODO list is a bit of a mess to be honest
<aba> elmo: So, what should I do if I want to help the ftp-masters with katie?
<elmo> aba: I wouldn’t necessarily do anything based off that unless it’s obviously correct/wanted. none of the really urgent stuff is even on there (e.g. testing-security changes)
<aba> elmo: If you tell me something urgent, I can put some time into it.
<aj> hrm, surely i have some wishlists
<aba> urgent wishlist? :)
<aj> actually, you could critique something for me if you’ve got half an hour free now?
<aba> “critique” means?
<elmo> tell him how crap his ideas are. as brutually as possible
<elmo> ;)

<aj> ooo!
<aj> aba: around?
<aj> aba: one useful thing you could work on that’s kinda katie related is implementing pdiffs into katie/apt-ftparchive/apt/whatever
<Kamion> aj: mvo’s been working on that over the last couple of days, see #128818 logs
<aj> apt side or katie side?
<Kamion> apt side
<aj> cool
<aj> aba: hacking up a katie-side implementation for that would probably be interesting, worthwhile and straightforward

(aj = me, aba = Andi, elmo = James Troup, Kamion = Colin Watson)

That ended up with me finding an old script I made up called update.py (from Dec 2003 apparently) that did the core patch and Index generation, Andi updating that to some more sane pythonic style and making it work on an actual archive, then both of us cleaning it up so it would actually work sensibly without handholding.

(For those playing along at home, the idea needing critiquing was trying to fix up dak’s idea of what should happen if a new version of a package is uploaded to stable or testing, when it hasn’t changed in unstable. At the time, dak would REJECT it, because it violated the requirement that you can’t downgrade packages between stable and testing or testing and unstable. That’s still somewhat broken, unfortunately)

TTBOMK, Tiffani’s named after Tiffani Amber Thiessen, apparently of Saved by the Bell and Beverly Hills 90210 fame. I’m not sure which of aba or elmo is more to blame for that one, though I guess it is at least a little more distinctive than update.

Usercategories and other miscellanea

So, this week’s AJ Market project was the first couple of items on my debbugs TODO list, viz:

  1. Finish off usertag support
  2. Implement usercategory support

Both of these are essentially followup for the initial usertags announcement from last month. The usertags cleanup amounted to adding some basic documentation which will hopefully make it to the website soon, adding some users to various views by default (currently the @packages.debian.org address for package and source queries, and the maintainer and submitter address for maintainer and submitter queries), limiting the characters that can be set in usertags so they don’t conflict with the characters the CGIs are willing to display, and suchlike.

Some other minor niceties, such as docs for the block command, summaries of blocking issues on the pkgreport pages, and a cute mrtg graph for monitoring the BTS’s spam related delays, also made it in.

The bulk of the work though went into the usercategory feature. The idea of usercategories is to let you sort bugs into more suitable categories than the defaults, and to leave working out what those are to individual maintainers and users. The previous possibilities were to stick them into the URL as CGI parameters (possibly via a tinyurl redirect, or a link on some other page), or to use cookies locally after filling in a form. Neither worked really well, so the solution instead is to allow you to create a named usercategory via the email request/control interface, and reference it by name in the URL. The URL syntax is just to add an

;ordering=development-view

option to the pkgreport.cgi URL; and the request bot syntax is:

To: request@bugs.debian.org

user bugs.debian.org@packages.debian.org
usercategory dev-priorities [hidden]
 * Development Priorities [tag=]
  + Sorting Features (should close) [sorting-close]
  + Usertag Related Features (should close) [usertags-close]
  + CGI Related Features [cgi]
  + Changes to rewrite rules [rewrite-rules]
  + Bug Subscriptions [bug-subscriptions]
  + Bugscan Problems [bugscan]
  + Index problems [indexes]
  + Spam Control [spam]
  + Documentation [doc]
  + Re-education Camp Required [inflexible-view-of-the-world]
  + Random Features [random]
  + Can be closed? [should-close]
  + Unprioritised bugs []

usercategory development-view
 * dev-priorities
 * status

The leading spaces are optional; the asterisks and plusses aren’t. Asterisks denote a new level, and plusses denote a section within the level. You can define a prefix for the level as a syntactic convenience. You can either define each level in full, or by reference to a different category, in which case that category’s level(s) will be inserted. If you later change “dev-priorities”, “development-view” will be correspondingly changed. Usercategories are listed in the drop down box in the Options form, though only as long as they’re not marked “[hidden]”.

The above lists the tags in the order they should be looked at, so if a bug has both “random” and “should-close” as usertags, it’ll be listed under “Random Features” not “Can be closed?”. That’s the order those sections are listed in too — but that can be changed by adding an explicit ordering, such as:

  + Random Features [7:random]
  + Can be closed? [5:should-close]

A nifty trick, which won’t necessarily continue to work, is the possibility of creating a usercategory called “normal”, which will overwrite your default view of the bugs, particularly if it’s assigned to a user whose tags and categories are automatically available, such as the packages.debian.org, maintainer and submitter addresses.

In theory that should be enough to make usercategories actually usable. There are certainly some bugs remaining, but they don’t seem to get in the way too much.

UPDATE 2005/10/10:

Hrm, request@ changes actually updated on bugs.debian.org now. I knew committing to debbugs CVS promptly would come at a cost…

The AJ Market

Where to begin?

One of the things that’s most struck me about Ubuntu is how far it’s progressed with little more than Debian as a base, some reasonable cash to cover a professional level of work, and some dedication to promoting itself and community building.

As an experiment, in July, August and September I tried doing something similar with debbugs — ie, actually committing myself to spend some real time on it as a professional (which I guess ended up being a day or two a week on average, but was still fairly irregular unfortunately), and promoting it both by giving a talk about it at debconf, and involving more people in its development and trying to get some of the feature requests that’d been hanging around finished with so we could move on to new stuff.

I think that’s actually had pretty impressive results — there’s a lot more interest, some fairly serious improvements in both its look and functionality, and for a project that’s been essentially moribund for half a decade, it’s even gained a little momentum. If I hadn’t already been amazed by how well Ubuntu’s done with relatively little effort, I’d’ve been utterly shocked; heck, maybe I am anyway.

Of course, the problem with this is that it really does rely on some real, professional-level commitment; it’s hard to be enthusiastic and active if you’ve just had a stressful day doing paying work, and it’s hard to be responsive if your Debian time doesn’t have any set schedule, and ends up competing against other hobbies, like sleep. But on the other hand, dedicating 40% of your potential income to free software isn’t really something that’s that easy to justify on an ongoing basis, unless perhaps you’re already ridiculously wealthy, or comfortably retired. Even Richard Stallman has a couple of awards worth a few hundred thousand each, to justify his time spent.

Adding this to my relatively recent fascination with market dynamics, I’ve been pondering over the last few weeks whether it’s not worth taking my longstanding amenability to bribes a little more seriously, and trying to construct a real justification for treating Debian work as a professional venture rather than an entertaining hobby that lets me see the world, both virtually and occasionally for real.

Hence, the AJ market.

The idea is I dedicate some real time to work on free software, and you contribute money to tell me what’s worth working on.

I think it makes sense from both a “free software” point of view, and an “economics” point of view. On the free software side, it avoids getting entangled with proprietary software, promotes development, and provides an easy way to ensure my “users” are actually my priority without giving up my judgement on what’s actually a sensible way of doing things. On the economics side of things, for the time being at least the supply side’s okay, since at worst, I’m willing to throw away some time to see how this works out; and on the demand side, there seem to be enough people who think I should be doing more work on one area or another that some of them might think that’s worth more than just talking about it. In theory, one or two hundred folks liked what I do for Debian enough to vote for me as DPL; I guess it’ll be interesting to see if that translates to cash rewards. :)

Anyway, that’s the theory. There’re a reasonable number of links from the market page to explanatory stuff, but if that’s too complicated I guess the simple summary is something like this: work on debbugs makes fixing bugs in Debian easier; work on dak makes organising Debian easier; work on britney makes releasing Debian easier; work on debootstrap makes installing Debian easier; work on ifupdown makes networking Debian machines easier.

I’ve also added a little chart on my blog, for those of you who don’t get this via RSS. No more Google ads or paypal buttons.

#debian-tech

From my irclogs of last month:

<aj> vorlon: sounds like you should write up a OFTC #dd code of conduct :)

In the tradition of all good free software hackers, Steve naturally managed to palm that back off onto me. In the end, we’ve decided to put together a new channel for Debian development discussion, called #debian-tech on OFTC. It’ll probably be quite a bit different from #debian-devel on either OFTC or FreeNode; hopefully that’ll turn out to be in a good way. We’ve got (I think) a pretty good variety of ops, who have (I think) some pretty good ideas on encouraging good productive activities on Debian. We’ll see what happens!

There’s a wiki page about it, including the charter/conduct guidelines up at http://wiki.debian.org/IRC/debian-tech. Please do read the charter before joining.

UPDATE 2005/09/23:

From the #debian-tech charter:

Motivation

It’s often difficult to have a civilised discussion about improving Debian, since invariably someone will be annoyed by any change, and will either deliberately or unintentionally seek to shut down discussion by attacking the participants, or acting in a way that is seen as an attack by the participants. […]

First comment on #debian-tech, from Wouter, via Planet Debian:

Playing police will not work

So there’s now a Debian Tech channel. Apparently, some people thought there was need for a channel where we’re all nice and friendly rather than start attacking eachother; and if you’re not like that, you might get kicked out of there.

This isn’t the first proposal from aj that involves forcefully being nice to eachother and policing those who’re not doing so, but I don’t think it’s going to work. […]

The question you should ask before implementing police states is “is this police going to add value to whatever we do,” not “is this police going to make us happier?”

Watch Out For The White Male

Pia asks:

Why is it that older, heterosexual, Christian, married, white males, who probably only make up ~16% of our totaly population are making the decisions for all of us? So much for representative politics :)

Let’s consider the alternatives. Younger rather than older means less life experience, which means you’re probably electing a party hack who knows how to stack a branch and mouth a few cliches, but doesn’t really have any idea what it’s like to actually do something productive, which makes it difficult to do the job of actually helping people be productive. Being non-heterosexual usually comes with a feeling of being discriminated against, and a not-unreasonable desire to do something about that; according to some statistics, about 2.5% of people in Australia identify as homosexual or bisexual, and same sex couples apparently account for only 0.46% of couples.

Going on from that point, according to the 2001 census, about 51% of Australians are married, another 17% separated, divorced or widowed, and 32% have never married; I don’t think it’s much of a leap to exclude a fair chunk of the latter group as “not married yet, but will be” (the average age to get married is around 30, apparently, and the 15-24 age group is about half the size of the unmarried group), or much of a stretch to think that the married folks are better able to deal with the highs and lows of politics.

As far as “white” goes, according to the 2001 census, around 80% of Australians considered their ancestry to be Australian, English or Irish; the other 20% isn’t stated at all, but presumably includes at least a few Europeans. As far as Christian goes, 70% of Australians identify as Christians, followed by 16% as non-religious, and 10% who didn’t answer the question. The next most popular religions are Islam and Buddhism on a little over 1% each. (There are five times more pagans in Australia than scientologists, apparently, though there are fewer Rastafarians)

And then, if you look through the current Federal ministry, there are six women ministers (of 30, for a 20% contribution, plus an additional five of the eleven parliamentary secretaries, for a 27% total contribution).

Which is to say, I don’t think what we’ve got is particularly unrepresentative. But then, I don’t think the point of representative politics is about representing your skin colour, religion, gender, sexual proclivities, or favourite sporting team anyway; why should they even be a consideration?

Bush and Brown

So it appears Michael Brown’s been moved aside from managing the Katrina response, and will probably be leaving FEMA entirely soon. No big surprise there; what’s interesting (to me) is this quote:

Asked ahead of the announcement if he was being made a scapegoat, Brown told The Associated Press after a long pause: “By the press, yes. By the president, no.”

“I’m anxious to get back to D.C. to correct all the inaccuracies and lies that are being said,” Brown said.

Asked if the move was a demotion, Brown said: “No. No. I’m still the director of FEMA.”

That squares pretty well with my post from just over a year ago about Bush’s policy on moving people on, summarised (by Dick Cheney in 2000) as:

You will never see him pointing the finger of blame for failure…you will only see him sharing the credit for success.

I’m still not sure whether I find it more surprising that that policy could work for the leader of the free world, or that random praise from your sidekick in an acceptance speech could actually be useful information.

UPDATE 2005/09/28:

Shortly after I posted the above, Mike Brown resigned, saying

“As I told the president, it is important that I leave now to avoid further distraction from the ongoing mission of FEMA,” Brown said in a news release.

“It has been an honor and a privilege to serve this president and to work shoulder to shoulder with the hard working men and women of FEMA. […]”

Pretty hard to tell from that whether or not he really did just decide that on his own, or whether he was pushed; after all if he was pushed, he’d be saying much the same thing, but on the other hand, would he still be emphasising the privilege of serving under “this” president, rather than the presidency in general?

Lending weight to the “wasn’t pushed” line, is the following from the ABC/Agence France-Presse/Reuters:

His resignation as FEMA chief embarrassed under-fire President George W Bush, who had stood up for Mr Brown in the immediate aftermath of the hurricane, telling him: “Brownie, you’re doing a heck of a job.”

If he was pushed, you wouldn’t expect his resignation to be an embarrassment. That’s still not terribly convincing, though — if you assume they were going to add some slur against Bush no matter what, claiming they’re embarrassed by the resignation is probably more effective than the only real alternative, claiming they’re turning him into a scapegoat, since making the latter into a bad thing would require making Brown a sympathetic character, when the article’s focussing on how “his defense failed to impress”.

More Bad News on the Security Front

Today’s issue of Linux Weekly News includes a security response time comparison amongst major distros. Debian comes last on all the vulnerabilities examined bar one; here’s a summary of response times:

                           Debian  Fedora  Gentoo  Red Hat  SuSE  Ubuntu
Average days                 19.8     5.8     7.4     12.0   12.7     5.0
Maximum days                   35      16      14       28     16      12
Minimum days                    9       0       3        4      7       1
Number n/a                      0       0       0        2      0       1
Number apparently unfixed       3       3       2        1      6       2

Read the article for the details (subscribers only for a couple of weeks); the above summary’s my own.

Debian’s security support has been in the press a fair bit recently, from the snafu in the installer at sarge’s release, to the failure to be ready to support the sarge release and ongoing problems with the availability of people on the security team, and a brief article in German magazine Heise late last month about security.debian.org briefly being unavailable.

Some discussion on an IRC channel following the Heise article, concerning the manpower issues of the security team (of the 195 posts to Debian’s security advisory list this year, 176 have been from Martin Schulze and the remaining 19 from Michael Stone; and of the five security team members who are able to do updates, two haven’t even logged into the security archive host in over six months), included the following comments:

<aj> has there been any call for help with the security team?
<Overfiend> aj: Put out by whom?
<aj> anyone at all?
<Overfiend> aj: No. Joey won't answer my questions about delegation, so I'm not sure I have the power.
<pitti> aj: I offered to help, but Joey told me to just continue to send patches; that's fine for me
<stockholm> joey tells me there are no problems and everything goes as planned. right. :-(

(I’m aj, Branden Robinson (the Debian Project Leader) is Overfiend, Martin Pitt (who does a lot of security work for Ubuntu, which has some of the best totals above) is pitti, and Andreas Schuldei is stockholm. The Joey referred to above is Martin Schulze)

More recently, we’ve seen the establishment of the “testing-security” infrastructure entirely separate from the regular Debian security infrastructure. This is in spite of security.debian.org having had some degree of support for handling updates to testing since its creation in 2002 (just before the woody release, when it became apparent that, in spite of their work on “rbuilder”, the security team simply was not able to maintain their own infrastructure).

Then there’s been the internal bickering, such as Joey’s assertions in March that the number of ports isn’t an issue in handling security support, or his public complaints when it turned out to take a while to get the various buildds updated for sarge’s release, and again and again and again afterwards.

Then, of course, there was the disavowal of security support for popular packages such as Mozilla, Firefox, and Thunderbird.

What fun.

UPDATE 2005/09/09:

There are three security bugs that LWN lists as unfixed for Debian. One’s the vim modelines bug, filed as Bug#320017, fixed in unstable on the 28th of July, with a fix uploaded to proposed-updates for the next stable release on the 30th of July, which makes for a response time of three days (it was announced on the 25th of July). The second is an evolution bug, filed as Bug#322535, fixed in unstable as of the 25th of August, 15 days after it was announced on the 10th of August (also of interest may be Bug#295548, about a security update to woody in February removing evolution’s SSL functionality). The third is an issue in apache-ssl, which was announced on the 2nd of September and was filed today as Bug#327210, but is as yet unfixed.

Isn’t America Exciting?

Some interesting notes from American news. Below the fold, because politics isn’t what’s important.

It seems the current meme is that President Bush isn’t doing enough to help resolve the chaos left in Katrina’s wake. Interestingly, one thing that apparently he did do was ensure the evacuation of New Orleans was a mandatory event rather than a voluntary one, which presumably would’ve led to even more horrendous results — and this was, afaict, at the point when it was known Katrina was a category four or five hurricane that would hit New Orleans, and that its levees were only built to cope with a category three cyclone. It’s probably unfortunate that Dubya didn’t also insist on actually using the school busses to evacuate people who didn’t have their own transportation (Jabbor Gibson in 2008, perhaps — unless he ends up with a conviction for looting that disqualifies him, of course). Given both the New Orleans mayor and Louisiana governor are Democrats, I wonder if folks like Joseph Cannon are being entirely sane with the whole “Katrina: Yes, You CAN Blame Bush” and “But let us make one thing clear: We WILL politicize this issue” spin — a screed second only in its vileness to Jesse Jackson’s claims that the insufficient response is because everyone hates black people, or Fred Phelps’ claims that the hurricane was God’s way of punishing fags. (Both men claim the title “Reverend”, which I’m certainly not going to grant them, and both crave attention, which they don’t deserve either as far as I’m concerned, so no links)

But back to the lesser vileness, partisan politics. Both the governor and mayor do seem to be fairly conservative Democrats — Mayor Nagin apparently switched parties in order to get elected; so maybe that’ll just get incorporated into the spin in order to get a “true” liberal as the Democrat nominee for the next position that opens up. That strikes me as more likely to encourage voters (in the Deep South at that) to switch from a conservative Democrat to an out and out Republican, but hey, what do I know.

Mirroring that strategy on the right seems to be Rick Santorum, at least according to Jonathan Rauch. He reports that Senator Santorum’s new book, It Takes a Family, attempts to redefine conservative thought in terms of the loving family instead of the rugged individualist, and sets himself up in competition with conservatives in the Reagan (or Friedman, or Schwarzenegger) mould. I could probably have bought into the “family” idea, except that Santorum has pretty strict ideas on what is a “valid” family and what isn’t — he’s probably most loved for his man on child, man on dog comments on homosexual relationships, after all. Families are important, but if you’re going to use that as an excuse to tell people how to live their lives, get out of government and become a minister instead; “Rev Santorum” has a ring to it, don’t you think?

The political implications could be interesting: there are potentially two simultaneous “wedge” issues there, driving both the Democrats and the Republicans apart internally; I wonder if it won’t end up driving the non-raving-lunatic left and the non-peeping-tom division of the right into some form of coalition of the sensible. Hey, it worked in California.

LaunchPad

Back in June, I noted that LaunchPad isn’t free software, and because of that concluded:

And that’s pretty much the point where Canonical’s not a free software company, but a vendor providing proprietary services for the free software community.

I got a couple of private comments on that post (reflected in an update to it), giving some fairly non-specific assertions that “the plan” was for LaunchPad to become free software eventually, and that complaints would prove pretty redundant.

Since then, my referer logs pointed me at Joachim Breitner’s post on LaunchPad from July, that expressed some similar concerns. More interesting is the follow-on comment on the post from Mark Shuttleworth:

I can only say that I hope, in the fullness of time, you’ll be very happy with the way we handle Launchpad. Over time, it will be open sourced. Right now we compete with Progeny and Red Hat and other companies, so we need to have a unique offering to do so effectively, and that’s Launchpad.

I’d encourage you to read the whole thing to get the context, but to me, the only logical conclusion to that is that LaunchPad will be free when Canonical/Ubuntu are the only players in the market, or when Canonical’s current business model fails and they switch to a different one. Which is fine: if you write some software from scratch, it’s your choice what you do with it; but unless you’re an underpants gnome or a slashdot commenter, the above doesn’t qualify as a “plan” to free LaunchPad.

Birthdays!

It’s Debian’s 12th birthday, and bubbles’ alphabetical birthday, and what better way to celebrate than with some recipe blogging?

Bubbles Bread

Ingredients:
1 slice of bread
mustard
mayonnaise

Cover the bread with the mustard and mayonnaise. Be artistic! Serves 1.

Debbugs Pops The Trunk

Mikal writes:

Why do I use Debian? Well, one of the reasons is the bug reporting.

People in Mikal’s shoes might’ve noticed a few changes in the Debian bug tracking system (BTS) lately, such as the long awaited roll out of version tracking to help us deal with tracking bugs amongst the multiple versions of packages across stable, testing, unstable and experimental, and bug subscriptions, allowing you to track an individual bug by email, should you so desire. We’ve also added both Don Armstrong and Pascal Hakim to the BTS team (though at the time of writing the Organisation page hasn’t been updated).

These changes are, of course, just part of a vicious plot by my fellow team members to make my debbugs talk at dc5 completely out of date as soon as possible; so the value of the paper and slides and the video are depreciating pretty rapidly, but not so much I won’t link them.

One nice thing is that all the features mentioned in my talk are now implemented (although all of them could do with some improvement). What’s this mean?

Well, not all that much. It means you can add &mindays=10 and &maxdays=20 to pkgreport.cgi urls to only see bugs filed between 10 and 20 days ago, which lets you see things like the serious bugs filed in the last week. Bug dependencies are also implemented, with due thanks to Joey Hess. This means that by using commands like “block 1234 with 1235 1236” you can keep track of which bugs are blocking you from fixing Bug#1234. (I actually cheated a little in my talk: all the other features I mentioned had already been rolled out)
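The bookkeeping behind blocking is simple enough to sketch. This is purely an illustration of the semantics, with invented bug numbers and a dict standing in for whatever debbugs actually stores; it isn’t debbugs’ real code:

```python
# Sketch of "block N with M ..." bookkeeping. Bug numbers are made up,
# and the real BTS keeps this state in its own database, not a dict.
blocked_by = {}  # bug -> set of bugs that must be fixed first

def block(bug, *blockers):
    """Record that `bug` can't be closed until `blockers` are fixed."""
    blocked_by.setdefault(bug, set()).update(blockers)

def mark_fixed(bug):
    """A fixed bug stops blocking everything it was blocking."""
    for blockers in blocked_by.values():
        blockers.discard(bug)

block(1234, 1235, 1236)          # i.e. "block 1234 with 1235 1236"
mark_fixed(1235)
print(sorted(blocked_by[1234]))  # 1234 is still waiting on 1236
```

Once the set for Bug#1234 is empty, nothing’s in the way of fixing it any more.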

But hey, when you’re on a roll, why stop? There’ve been a few other things done recently that weren’t (entirely) pre-meditated in my talk too.

Of somewhat more limited interest is that bug indexing’s now happening again, so in theory when you look at the bugs for a particular package it’ll be a little faster, because it doesn’t have to look through a list of every bug to work out which ones apply to the package first. Not sure how much this actually helps given how beefy the bugs.debian.org host is, but there seems to have been a noticeable drop in the load average, so here’s hoping.
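The win from indexing is the standard one: build a package-to-bugs map once, and each page view becomes a lookup rather than a scan over every bug. A toy sketch with made-up bug numbers (nothing to do with debbugs’ actual on-disk index format):

```python
from collections import defaultdict

# Made-up (bug number, package) pairs standing in for the bug spool.
bugs = [(300001, "dpkg"), (300002, "apt"), (300003, "dpkg")]

# Unindexed: every query walks the whole list -- O(total bugs).
def bugs_for(pkg):
    return [n for n, p in bugs if p == pkg]

# Indexed: pay the scan cost once, then each query is one lookup.
index = defaultdict(list)
for n, p in bugs:
    index[p].append(n)

print(bugs_for("dpkg"))   # [300001, 300003]
print(index["dpkg"])      # the same answer, without the scan
```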

By contrast, most people who use the BTS will probably appreciate the newfound prettiness with which debbugs displays bug logs. All due kudos to the ever-stylish Erinn Clark for the CSS hacking and putting up with repeated “but, what if…?” bikeshedding. Next up, making the package indices look smashing too.

There are two other features I’ve been poking at, that aren’t quite so ready for prime time. One’s support for cookies, so that things like &reverse=yes can be set by BTS users in one place, and not have to show up on every URL. Likewise for the &repeatmerged=no option, and some other features that’re being thought about. At the moment you can set the cookies by, eg, adding &reverse=yes to bugs.debian.org/cgi-bin/cookies.cgi. This will obviously need to be cleaned up before it’s ready; there’re some internal implementation details with how cookies are handled that aren’t entirely satisfactory yet too.
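The intended behaviour (defaults stored in a cookie, overridden by anything explicit in the URL) boils down to a dictionary merge. A sketch of that logic; the parameter names are the ones mentioned above, but the merge itself is my guess at the behaviour, not debbugs’ actual implementation:

```python
def effective_params(cookie_defaults, url_params):
    # Anything explicit in the URL wins over the cookie-stored defaults.
    merged = dict(cookie_defaults)
    merged.update(url_params)
    return merged

cookie = {"reverse": "yes", "repeatmerged": "no"}  # set via cookies.cgi
query = {"pkg": "dpkg", "repeatmerged": "yes"}     # explicit in this URL
print(effective_params(cookie, query))
# {'reverse': 'yes', 'repeatmerged': 'yes', 'pkg': 'dpkg'}
```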

The other forthcoming improvement is cleaning up the URLs debbugs presents, so that you can just say bugs.debian.org/package/dpkg instead of seeing cgi-bin and punctuation all over the place. You can currently try poking around urls like bugs.debian.org/x/300000 to see what that will end up looking like. Obviously once it’s done the x/ will disappear forever. That’s somewhat dependent on both better handling of cookies and some CSS (and possibly some javascript) so users can actually tell debbugs what they want to see by some mechanism other than adding more garbage to the URLs, so the progress on both those fronts is a good sign for this, too.

And, of course, with my old wishlist now complete, I’ve naturally come up with a new one. I wonder if it’ll take as long to get through.

Code Comments Hate My Freedom

Stewart and Michael have chimed in on whether comments in code are evil or not. Michael reiterates the industry wisdom:

Getting the level of commenting right is hard, especially if you haven’t written much code, or if you are unfamiliar with the domain or the implementation language. But commenting done right can greatly assist yourself and others when you revisit that chunk of code – whether that be to find that heisenbug, or to add new functionality, or even just understand what you were trying to achieve back 3 weeks ago.

I’ve written a reasonable amount of code, and I still find getting the “right” amount of commenting pretty hard. I’ve even tried a few weird and wacky “commenting” styles, like programming from formal specifications, where you specify what you want to achieve and then work from that to the actual code, and literate programming, where your primary work is the commentary and the code is scattered about that. Admittedly, I only tried the formal specification stuff at uni, and only for non-real-world problems. In any case, the impression I got from the former was that programming from maths tends to be both harder and similarly likely to result in mistakes (although hopefully at a more detectable level, of course), and from the latter that explaining things in English doesn’t tend to be much easier than writing the code (so doing both is twice as much work), and probably causes as much hassle in maintenance as it solves.

On the other hand a benefit from both styles is that when you write code that way, and stick to the formula, you really do feel like you’ve done a good job when you’ve finished. It’s not a quick hack, it’s a professionally prepared piece of software. And beyond a feeling, with both techniques you end up with some assurance that you’ve actually been fairly thorough in dealing with possible gotchas.

In the end though, that’s not enough to overcome the flaws in both techniques, and generally I just end up trying to write clear code, and adding comments when I get confused (either while writing it, or when I come back to it later and find I have trouble remembering what’s going on, or worse, find myself thinking “this is buggy”, changing it, and finding out that while it wasn’t buggy before, it sure is now).

The main issue I have with comments, though, tends to be scoping. Literate programming in particular lets you restructure your code pretty arbitrarily to match your description — if you’ve got a few bits of code scattered through your program that need to be in sync (possibly in spite of good design practices, whatever), you can just put them next to each other along with some comments on what’s going on, and instruct the literate programming tool to put them in the right place later.

There are two other “commenting” variants I’m growing to like. The first is “revision history”. Thanks to the wonders of darcs I’ve become quite enamoured with making a new revision for pretty much every atomic change — what “atomic” means is perhaps hard to explain, but pretty much when I’ve implemented something that compiles, and doesn’t break functionality, I record my changes with a brief comment, and move on. I find that really pretty satisfying, and I’m hopeful that it should also provide a good record for when I get completely confused about why I made some change, which is a good mix. The especially nice thing is that the comments are historical by nature, so it’s not really possible for them to get out of date.

The other variant I like is just writing down thoughts completely separate from the code, whether in blog form, on mailing lists, on a wiki or otherwise. That’s nice in that it helps you cover the complexities of the ideas and put them down in a form that helps you think about things, but that’s easy — since it doesn’t have to deal with all the complexities code does, nor does it have to follow any particular process like “design” usually does. I guess it’s not fair to say I’m increasingly liking that; so much as increasingly leaning towards trying to keep it around and accessible, with blogs and wikis. I’m not sure what will come of that.