Given Enough Eyeballs, Not All Bugs Are Shallow

I have been asked to write a chapter for a book about the experiences of people involved in Open Source, built around the idea of “If I knew what I know today”. I asked if I could reprint my contribution here. I hope it is interesting to people who care about Open Source testing.

Dogfooding Is Not Enough

I have been involved with Open Source since my early days at university in Granada, where, with some friends, I founded the local Linux User Group and organized several activities to promote Free Software. But from the time I left university until I started working at Canonical, my professional career was in the proprietary software industry, first as a developer and later as a tester.

In a proprietary software project, testing resources are very limited. A small testing team continues the work that developers started with unit testing, using their expertise to find as many bugs as possible, so the product can be released in good shape for end users. In the free software world, however, everything changes.

When I was hired at Canonical, apart from fulfilling the dream of having a paid job in a free software project, I was amazed by the possibilities that testing a free software project brought. Development happens in the open, and users can access the software in its early stages, test it and file bugs as they encounter them. For a person passionate about testing, this is a new world full of possibilities, and I wanted to make the most of it.

Like many people, I thought that dogfooding, or using the software that you are aiming to release, was the most important testing activity we could do in open source. But if “given enough eyeballs all the bugs are shallow” (one of the key lessons of Raymond’s “The Cathedral and the Bazaar”) and Ubuntu had millions of users, why were very important bugs still slipping into the release?

The first thing I found when I started working at Canonical was that organized testing activities were few or nonexistent. The only testing activities that were somehow organized took the form of emails sent to a particular mailing list, calling for testing of a package in the development version of Ubuntu. I don’t believe this can be considered a proper testing activity; it is just another form of dogfooding. This kind of testing generates a lot of duplicated bugs, as a really easy-to-spot bug will be filed by hundreds of people. Unfortunately, the really hard-to-spot but potentially critical bug, if someone files it at all, is likely to go unnoticed in the noise created by those hundreds of other reports.

Looking Better

Is this situation improving? Are we getting better at testing in FLOSS projects? Yes, I really believe so.

During the last few Ubuntu development cycles we have started several organized testing activities. The range of topics is wide, covering areas like new desktop features, regression testing, X.org driver testing and laptop hardware testing. The results of these activities are always tracked, and they have proved really useful for developers, who can now know whether new features are working correctly instead of assuming they work simply because no bugs have been filed.

Regarding tools that help testing, many improvements have been made:

  • Apport has helped increase the level of detail of the bugs reported against Ubuntu: crash reports include all the relevant debugging information, duplicates are found and marked as such, people can report bugs based on symptoms, etc.
  • Launchpad, with its upstream connections, gives a full view of each bug: bugs occurring in Ubuntu are often bugs in the upstream projects, and developers can see whether they are being solved there.
  • Firefox, with its Test Pilot extension and program, drives testing without making testers leave the browser. This is, I believe, a much better way to reach testers than a mailing list or an IRC channel.
  • The Ubuntu QA team tests the desktop in an automated fashion and reports the results every week, giving developers a very quick way to check that no major regressions have crept in during development.

Although testing FLOSS projects is getting better, there is still a lot to be done.

Looking Ahead

Testing is a skilled activity that requires a lot of expertise, but in the FLOSS community it is still seen as an activity that doesn’t require much effort. One reason could be that the way we do testing is still very old-fashioned and does not reflect the increase in complexity of the free software world over the last decade. How is it possible that, with the amount of innovation we are generating in open source communities, testing is still done the way it was in the 80s? Let’s face it: fixed testcases are boring and easily get outdated. How are we going to grow a testing community that is supposed to find meaningful bugs if its main required activity is updating testcases?

So how do we improve testing? Of course, we cannot completely get rid of testcases, but we need to change the way we create and maintain them. Our testers and users are intelligent, so why write step-by-step scripts? Those could easily be replaced by an automated testing tool. Instead, let’s just have a list of activities you perform with the application and some properties it should have, for example, “Shortcuts in the launcher can be rearranged” or “Starting up LibreOffice is fast”. Testers will figure out how to do it, and will create their testcases as they test.
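
To make this concrete, here is a minimal sketch, in Python, of what such an activity-and-property testcase list could look like. Everything in it (the Charter class, the example entries) is invented for illustration; it is not an existing Ubuntu QA format.

    # Hypothetical sketch: testcases as open-ended activities and expected
    # properties, rather than step-by-step scripts.
    from dataclasses import dataclass

    @dataclass
    class Charter:
        """An exploratory testing charter: what to try, not how to try it."""
        application: str
        activity: str      # something the tester does, in their own way
        expectation: str   # a property that should hold afterwards

    CHARTERS = [
        Charter("unity", "Rearrange the shortcuts in the launcher",
                "Shortcuts can be rearranged and keep their new order"),
        Charter("libreoffice", "Start LibreOffice on a freshly booted machine",
                "Starting up LibreOffice is fast"),
    ]

    for c in CHARTERS:
        print(f"[{c.application}] Try: {c.activity} -> Expect: {c.expectation}")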

But this is not enough; we need better tools to help testers know what to test, and when and how to test it. What about an API that lets developers send messages to testers about updates or new features that need testing? What about an application that tells us which parts of our system need testing, based on testing activity? In the case of Ubuntu we have the data in Launchpad (we would need testing data as well, but at least we have bug data). If I want to start a testing session against a particular component, I would love to see which areas haven’t been tested yet, plus a list of the five bugs with the most duplicates for that particular version, so I avoid filing those again. And I would love to have all this information without leaving the desktop I am testing. This is something Firefox has started with Test Pilot, although for now they are mainly gathering browser activity. Google is also doing some research in this area.
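
Part of this already looks feasible with launchpadlib, Launchpad’s Python client library. The rough sketch below ranks open bugs of a source package by duplicate count; the package name, the status filter and the batch size are my own choices for illustration, not part of any existing tool.

    # Rough sketch: list the most-duplicated open bugs for a source package,
    # so a tester knows what not to file again. launchpadlib is real; the
    # filtering and ranking choices here are assumptions for illustration.
    from launchpadlib.launchpad import Launchpad

    lp = Launchpad.login_anonymously("testing-helper", "production")
    ubuntu = lp.distributions["ubuntu"]
    package = ubuntu.getSourcePackage(name="firefox")

    # Take a batch of open bug tasks; note that each .bug access below is
    # a separate web service call, so keep the batch small.
    tasks = package.searchTasks(status=["New", "Confirmed", "Triaged"])
    ranked = sorted(tasks[:50],
                    key=lambda t: t.bug.number_of_duplicates,
                    reverse=True)

    print("Most duplicated open bugs (please don't file these again):")
    for task in ranked[:5]:
        print(f"  LP #{task.bug.id} "
              f"({task.bug.number_of_duplicates} duplicates): {task.bug.title}")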

Communication between downstream and upstream, in both directions, also needs to improve. During the development of a distribution, many of the upstream versions are under development as well, and they already have lists of known bugs. If I am testing Firefox through Ubuntu, I would love to have a list of its known bugs as soon as the new package reaches the archive. This could be done by having an agreed syntax for release notes, which could then be parsed easily so that bugs are filed automatically and linked to the upstream bug reports. Again, all of this information should be easily available to the tester, without leaving the desktop.
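
Such an agreed syntax would not need to be complicated. As a sketch, imagine upstream release notes carrying a “Known-Bug:” field that the distribution parses when the package lands; the field name, the format and the sample notes below are all invented for illustration.

    # Hypothetical sketch: extract a "Known-Bug:" field from upstream
    # release notes so downstream bugs could be pre-filed and linked to
    # their upstream counterparts. The syntax is invented; no such
    # convention exists today.
    import re

    SAMPLE_RELEASE_NOTES = """\
    SomeApp 2.0 release notes

    Known-Bug: #11111 Crashes when the network goes down during startup
    Known-Bug: #22222 High CPU usage with very large documents
    """

    # Tolerate leading whitespace so indented notes still match.
    KNOWN_BUG = re.compile(r"^\s*Known-Bug:\s*#(\d+)\s+(.+)$", re.MULTILINE)

    for bug_id, summary in KNOWN_BUG.findall(SAMPLE_RELEASE_NOTES):
        # A real tool would file or link a downstream bug here (e.g. via
        # the Launchpad API) instead of just printing.
        print(f"Upstream knows about bug #{bug_id}: {summary}")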

Testing, done this way, would allow the tester to concentrate on the things that really matter, the things that make testing a skilled activity: the hidden bugs that haven’t been found yet, the special configurations and environments, new ways to break the software. And on having fun while testing.

Wrapping Up

From what I have seen over the last three years, testing has improved a lot in Ubuntu and in the other FLOSS projects I am somehow involved with, but this is not enough. If we really want to increase the quality of open source software, we need to start investing in testing and innovating in the way we do it, the same way we invest in development. We cannot test 21st century software with 20th century testing techniques. We need to react. “Open Source is good because it is open source” is not enough anymore. Open Source will be good because it is open source and has the best quality that we can offer.


One comment

  1. One thing that Fedora does is test days. Basically, we say “this Tuesday is the ‘XYZ feature’ test day”, and at least one QA person and one of the developers responsible for the feature will sit the whole day in IRC in #fedora-qa.

    Volunteers will come, follow the testing instructions and discuss their issues in the IRC channel. This enables some kind of proactive bug triage: the developers can help to get more information about the issues, and sometimes will even be able to fix them in (almost) real time.

    As it’s a special day, the atmosphere is usually friendly and relaxed, so that people can enjoy it. Live images are made ahead of time (well, not always) and a detailed plan is formalized on a wiki page by the QA team.

    Last (but not least), the test days are open to everyone, not only Fedora users. Since Fedora is usually slightly ahead of the other distributions in terms of newer versions of software, and since we have quite a lot of upstream developers among our ranks, it becomes very interesting for non-Fedora users to help in those test days, so that the software is fixed by the time it lands in their own distros.

    You can find more information below, or just ask our awesome QA team in #fedora-qa:
    http://fedoraproject.org/wiki/QA/Test_Days

    I think it would be great if Ubuntu could champion such test days, and not only for Ubuntu-specific projects. Given how close our release schedules are, it would probably even be possible to hold test days together, giving the events an even bigger impact (or not; perhaps the small differences between the distributions would result in a lot of time being lost, I don’t know).

    Anyway, hope you find that useful. :)

    (Note: I am not part of the Fedora QA team, I’m only a Fedora user/contributor)
