There's been a lot of discussion, and alas, some bad feeling, I think, about trying to balance updates versus testing in Fedora.
I believe there are many areas where we can mitigate risk for the users of Fedora without imposing extra work on package maintainers.
I don't think "one size fits all" - I believe that one of the problems we face is that no two packages or updates are alike, and that the discussion tends to lump things together without recognizing those differences.
In the hope that it's helpful, I've tried to gather some of the variables that I think are meaningful in the context of "how likely is it that a proposed update might break something?" (and there's some of my opinion in here too)
Built-in test suite
There's great variability here between different src rpms:
- Does the upstream code have a test suite?
- Do we run it during %check?
- If something started failing, do we actually know about it? (e.g. does an error here kill the build? is there a list of known good/known bad tests?)
I think that a package that's passed hundreds of selftests during the build and not failed any is doing better than one that has no built-in test suite, and should be in some way privileged in our update system. (It's possible to auto-detect the presence of a %check section as well)
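As a concrete (if simplified) illustration: a specfile fragment along these lines is all it takes to make the build depend on the upstream test suite passing, and detecting which specfiles carry such a section is a one-liner. This is a sketch; it assumes upstream exposes its tests via "make check", which is common for autotools-based projects but by no means universal.

    # hypothetical specfile fragment: run the upstream test suite at build time;
    # a test failure aborts the build, so a package that reaches the repos has
    # at least passed its own selftests in the buildroot
    %check
    make check

    # a rough way to auto-detect which packages run tests at all:
    grep -l '^%check' */*.spec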
External test suite
How much testing does the package get via autoqa? How much testing has this proposed update had via an automated system?
Manual testing
Yes, having a human actually try the software is often good, and it finds different types of problems from those found by automated testing. Having said that, I feel we have far too little automated testing in Fedora today, and that the current way we do manual testing has flaws: people test for the bugs they wanted to see fixed, and report whether they are fixed or not. From a formal coverage perspective, we've no idea if they're hitting the important use-cases for other users of the package. But presumably we do hit a lot of coverage with the current approach.
Can multiple versions be installed?
The kernel gets to install multiple copies of itself, and this is the ultimate escape hatch for when a kernel update hoses your system: you still have a known-good version installed (I hope) and get to reboot with that.
To what extent are other packages allowed to do that? (or would the maintainers want to?) Would extending it be a way to mitigate risk on those packages that want to rebase more often?
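For reference, the kernel's behaviour falls out of yum's "installonly" handling, which isn't kernel-specific; here's a sketch of opting another package into it, assuming the stock yum configuration options ("some-risky-package" is purely hypothetical):

    # /etc/yum.conf - packages listed in installonlypkgs are installed
    # side-by-side rather than upgraded in place
    [main]
    installonlypkgs=kernel kernel-devel some-risky-package
    # keep at most this many versions installed at once
    installonly_limit=3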
Meaningful test coverage
With my former professional QA hat on, I think the ideal for software testing is:
- The "functional" side: to have a set of personas describing who will be using the software, what they will be using it to do, and to use that to come up with a set of test cases that cover that functionality
- The "non-functional" side: to know about the types of flaw expected based on the technology (e.g. I expect buffer overruns in C), and to use this to come up with appropriate test cases to prevent these
This should give an idea of what test coverage is appropriate, and you can _then_ think about automating them.
So I think that a package that has some test cases on the wiki is "in better shape" for doing updates than one that doesn't, and I hope that's a lightweight way of guiding testing. I hope there's a way of streamlining this within our processes so that we do smarter testing without needing extra work from package maintainers. (I don't expect anyone wants to adopt IEEE 829 in Fedora QA; see p133-136 of "Lessons Learned in Software Testing", Kaner et al (2002), for excellent arguments against using it; a great book, BTW.)
Lines of code overall
Some packages are small, some are huge. I did some stats on this for RHEL when I worked on RHEL QA, using "sloccount". I believe the largest by SLOC was openoffice, closely followed by the kernel (in the millions of SLOC), then a big dropoff to the 100k SLOC packages, then a long tail.
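(If anyone wants to repeat that kind of sizing, sloccount is packaged in Fedora and is happy to be pointed at an unpacked source tree; "foo" below is hypothetical:)

    # unpack the source from the srpm and total up the lines of code
    rpm2cpio foo-1.2-3.fc14.src.rpm | cpio -idm
    tar xf foo-1.2.tar.gz
    sloccount foo-1.2/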
Amount of code touched
What is the build-time difference between old and new versions of the src.rpm? This isn't the whole story (a one-line bug can still kill you), but it's part of the story. A rebase might contain a fix for bugs you care about, but might also touch 50 other subsystems.
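One crude way to put a number on this, for a hypothetical package "foo", assuming the usual ~/rpmbuild layout and that diffstat is installed: run only the %prep stage for each version so the patches get applied, then diff the prepared trees.

    # apply each version's patches without building anything
    rpmbuild -bp foo-old.spec    # unpacks/patches into ~/rpmbuild/BUILD/foo-1.2
    rpmbuild -bp foo-new.spec    # unpacks/patches into ~/rpmbuild/BUILD/foo-1.3

    # summarize how much of the prepared source actually changed
    diff -ru ~/rpmbuild/BUILD/foo-1.2 ~/rpmbuild/BUILD/foo-1.3 | diffstat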
Amount of testing elsewhere
One advantage of a rebase is that you are sharing source code changes with other people, and so if there is a problem, someone else might have already run into it. This isn't a panacea: yes, there are plenty of ways in which we can have Fedora-specific bugs, but it is one difference between a tarball rebase and cherry-picking patches.
(random thought: could Bodhi have integration with other distributions' update systems and warn us about analogous updates that are breaking other people? or is Fedora always the first into the minefield, finding the bugs for other distributions?)
Noarch vs architecture-specific
The former are typically much simpler than the latter, and the latter have specific risks (e.g. word-size assumptions). To what extent can we mitigate these risks with automated testing?
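(One cheap way to shake out some of those word-size assumptions, assuming the package's test suite runs in %check: rebuild the same srpm in both a 32-bit and a 64-bit mock chroot and compare the results; the srpm name is hypothetical and the mock config names vary by release:)

    # build, and run the %check suite, in each word size
    mock -r fedora-14-i386   --rebuild foo-1.0-1.fc14.src.rpm
    mock -r fedora-14-x86_64 --rebuild foo-1.0-1.fc14.src.rpm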
Programming Language
Each programming language exhibits its own set of emergent behaviors. For example (and this is grossly oversimplifying):
- C code tends to exhibit buffer-overflow bugs, poor unit testing, poor error-handling
- C++ code can be more prone to compiler, static-linkage and dynamic-linkage bugs than C code
etc. I don't want to populate this list too much as this kind of thing is prone to unhelpful programming language flamewars.
Problems inherent to packaging
Each software delivery system exhibits its own set of flaws, and our RPM/yum stack is no exception. To what extent does, say, rpmlint cover the types of things that go wrong, and to what extent can we extend it to help us?
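(For concreteness, rpmlint can already be pointed at a specfile, an srpm, or the built rpms; "foo" is hypothetical here:)

    # static checks on the spec and on the build results
    rpmlint foo.spec
    rpmlint foo-1.0-1.fc14.src.rpm foo-1.0-1.fc14.x86_64.rpm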
Build system
Although the Fedora packaging guidelines manage to impose some sanity on this, there are many ways in which packages get configured and built.
Some examples:
- the GNU autotools: configure.in, Makefile.am, leading to a "configure" script used during the build to generate a Makefile. This can be prone to "silently" dropping functionality when the buildroot changes. It's sometimes possible to detect such breakage by looking at the "Requires" metadata of the built packages (can we automate this in Bodhi? see the sketch after this list)
- a hand-written, one-of-a-kind Makefile from upstream. Given that each is unique, each will have unique problems
- python setup.py, using distutils/setuptools.
- cmake
etc
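On the autotools point above: here's a crude sketch of the "Requires" comparison I have in mind, for a hypothetical package "foo" (the real thing would want to live in AutoQA or Bodhi rather than be run by hand). A feature that silently failed to configure often shows up here as a library dependency that vanished between builds:

    # compare the automatically-generated dependency metadata of old and new builds
    rpm -qp --requires foo-1.2-3.fc14.x86_64.rpm | sort -u > old-requires.txt
    rpm -qp --requires foo-1.3-1.fc14.x86_64.rpm | sort -u > new-requires.txt
    diff -u old-requires.txt new-requires.txt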
Security fixes
Security fixes probably should be treated differently from non-security fixes: many people expect that the former should happen as a matter of course - that if someone has distributed software, they should also promptly distribute security fixes. This seems to be regarded as a kind of natural entitlement within software in a way that other kinds of update aren't, and so our update process probably should reflect this special quality ascribed to security flaws. (I suspect I'm getting grumpy and middle-aged in my attitudes here, sorry.)
Critical versus Speciality packages
Is this a package that essential Fedora functionality needs in order to work, or is it more of a "leaf" within the dependency graph? For example, if this package breaks, could it prevent a user from running yum, or from running a graphical browser to search for information on the bug?
I like our "critical path" approach: some packages are definitely more critical than others. The exact details might need tuning, of course.
Paid versus Volunteer
I'm in the very fortunate position of being paid to work on Fedora, and thus I'm professionally responsible for doing some tasks that aren't fun. Others volunteer their time and effort on Fedora, and I think it's important that their time should be fun, or at least satisfying for some of the higher levels of Maslow's hierarchy of needs. (I happen to enjoy most of what I do in Fedora, and I do spend evenings and weekends on it too.)
I hope this is a constructive addition to the debate. What other variables are meaningful in the context of "candidate updates"? I've probably missed some.