Thursday, January 24, 2008

Stories about Quality. 2. Why does software ship with bugs?

Bugs are an integral part of any software product. In my 20+ years of playing with computers, I have yet to encounter a single software product that did not have them.

This is markedly different from the rest of the world - even from something as close to home as computer hardware. Your monitor may occasionally have "bugs", but that is an aberration rather than the norm. Yet the rules are somehow different for software. How come?

There are several roots to the problem.

The economics

We ship buggy code because we can. There are two reasons. First, the software industry is relatively young. A certain amount of forgiveness can still be expected from the users - as long as the product is indispensable and solves their essential problem, they are willing to cope with the lack of quality, as long as there is no alternative.

Second, the software industry is not terribly competitive (which more or less guarantees that there is no viable alternative to a lot of software products). Unlike most physical goods, software has virtually no manufacturing and physical distribution costs - making and shipping 10 million copies is as easy as shipping 10 thousand. This leads to the natural development of monopolies and oligopolies (I am omitting a whole lot of discussion on how exactly this happens - there are volumes and volumes written on the subject).

Having little competition overall means that there is even less competition based on the quality of the product. So there is little incentive to improve a product beyond the point where it is usable.

The process

Most projects follow a relatively rigid model of product development: planning, followed by design, followed by development, followed by testing and stabilization.

The final dates are often inflexible, and even when they are not, the initial stages of the project tend to expand to fill the extra time.

The result is that when it comes time to ship, the last stage - which happens to be stabilization - gets cut short. Bang! The product ships with a bunch of buggy features.

In Windows Home Server we modified this process to work per feature. Every feature had its own design-implement-stabilize pass, and a developer did not get to work on the next feature before the previous one was all done and stabilized.

This worked wonders for smaller features, but of course every release has a few features on the critical path - they are big enough to fill all the time allotted for the entire version. They also often define the product itself. For these features this model does not help at all - they still end up having their testing phase cut short.

The sheer magnitude of the problem

"The department's motto was, "Comprehending the infinity requires infinite time", from which they derived a curious result - why work?" - A & B Strugatsky, "Monday Starts on Saturday"

Vista has on the order of 50 million lines of code, which in turn depend on an amazing number of variations of the hardware on which it runs, and of the software that runs on it.

While running QA for Windows Home Server at the beginning of the shipping cycle, I quickly learned (and my experience at Google, where almost all tests are written by engineers, corroborated it) that developing just the unit-test code takes approximately twice the resources of developing the code under test.

If you tack on the costs of all other programmatic testing (stress, long-term, environmental testing, integration testing, etc.), that adds another 300% on top. So really, truly exhaustively testing an application programmatically costs at least 5 times as much as writing it in the first place.

This, of course, is in the cases where you can do it programmatically - manual testing is cheaper upfront, but you have to repeat it over, and over, and over again, so it adds up quickly - probably to about the same grand total. Since the long-standing argument between the proponents of automated vs. manual testing remains unresolved, I suspect that there is no statistically proven cost difference between the two methods.

For the bigger projects like Windows this cost becomes truly monumental!

Why is testing so expensive?

It's because of what is called a test matrix - the set of tests that need to be written and run, which is a Cartesian product of all potential code paths (code coverage), all potential data inputs (data coverage), and all potential environments. The code paths problem alone is combinatorially explosive, and the input set is, for all practical purposes, infinite for all but the simplest applications.
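To see how quickly the Cartesian product gets out of hand, here is a minimal sketch. The axes and their values are invented for illustration - they do not come from any real test plan:

```python
from itertools import product

# Hypothetical dimensions for one small feature: each list is one axis
# of the test matrix (illustrative values, not from a real product).
code_paths = ["happy path", "error path", "timeout", "retry"]
inputs = ["empty", "ascii", "unicode", "1MB blob", "malformed"]
environments = ["WinXP", "Vista", "Server 2003"]
locales = ["en-US", "ja-JP", "de-DE"]

# The full matrix is the Cartesian product of every axis.
matrix = list(product(code_paths, inputs, environments, locales))
print(len(matrix))  # 4 * 5 * 3 * 3 = 180 cases for one small feature
```

Add one more axis - say, five hardware configurations - and the count jumps to 900. Multiply this by thousands of features and you get a sense of the scale.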

Most test organizations survive by heroic efforts to establish equivalence relationships between test cases, which allow them to prune the test matrix. Then they prioritize what's left, filling the time they have with the most important of the test cases that survived the equivalence pruning.

This guarantees that the common case is relatively bug free, but when you step out of the common usage scenario - welcome to the bug farm!
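The equivalence pruning can be sketched as follows. The class boundaries here (empty/short/long strings) are invented for illustration - real equivalence classes are derived from the spec and the code under test:

```python
# Hypothetical: instead of testing every possible string input,
# partition the domain into equivalence classes and keep one
# representative per class (class boundaries are invented here).
def length_class(s: str) -> str:
    """Map a string input to its (illustrative) equivalence class."""
    if len(s) == 0:
        return "empty"
    if len(s) <= 255:
        return "short"
    return "long"

# A large pool of candidate inputs...
candidates = ["", "a", "hello", "x" * 10, "y" * 300, "z" * 100_000]

# ...pruned to one representative per class.
representatives = {}
for s in candidates:
    representatives.setdefault(length_class(s), s)

print(sorted(representatives))  # ['empty', 'long', 'short']
```

Six candidates collapse into three test cases - and therein lies the risk: if two inputs that were declared "equivalent" actually take different code paths, the pruned case is exactly where the bug farm starts.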


Anonymous said...

Bugs in software are job security... Of course this is a feature! Testers love devs for that reason :-)

Anonymous said...

Imagine quality management the Yakuza way: one bug == one finger. Once you're down to using a voice interface, it will be your head or proof of gender: you choose. That might improve quality.

Alex Efros said...

Yeah, this is the current situation... So what do you think about it - is it OK with you? Personally, for you, as a developer and a user? If not - do you have any ideas how to change this situation? :)

Sergey Solyanik said...

Well... it depends on the application, doesn't it? I would really like the software that's flying my plane to have no (or very few) bugs. It is crucially important. To achieve that, people jump through enormous hoops, such as writing everything in Ada and having a lot of people develop very little code over a long time - but it's very reliable code.

On the other hand, I personally would not want to work on such a project (or even own the company that does it), because it is really, really boring.

On the other hand, take commercially successful products such as Windows and the iPhone. Both of them are buggy. They have just enough quality to do the work that people want them to do - most of the time. Would I want to own a company/manage the team that owned either as a product? Yes, of course!

So I guess personally I am on the lax side of things - I like my job to be fun, and producing code that has zero bugs is not fun, because it just takes too much time. On the other hand, of course, I do not enjoy producing garbage either, and historically the code that I wrote has not been branded as such...

Alex Efros said...

I like my job to be fun too! But it looks like 'fun' for me includes doing my best to produce defect-free software. As for 'too much time to do it' - part of the 'fun' for me is looking for ways to optimize that time... and that is VERY interesting research. :)

Anyway, I think you've answered my question about quality, thanks!

Anonymous said...

I want to share some insights about a recent survey results about software defects

1. Finding and fixing a problem during development takes about 100 times less effort than fixing one reported by a customer.
2. Half of software project work is wasted on unnecessary rework.
3. Twenty percent of the defects account for 80% of the rework.
4. Twenty percent of modules account for 80% of the defects and half the modules have no defects.
5. Ninety percent of the downtime comes from 10% of the defects.
6. Peer reviews catch 60% of the defects.
7. Directed reviews are 35% more effective than nondirected ones.
8. Discipline can reduce defects by 75%.
9. High-dependability modules cost twice as much to produce as low-dependability ones.
10. Half of all user programs contain nontrivial defects.

Software testing and analyzing source code with static tools help developers minimize the risks arising from software defects. Companies like Symbian, Juniper Networks, Research In Motion (BlackBerry), and Cisco are using Coverity Prevent, a static analysis code inspection tool, to analyze source code and fix defects.
Coverity Prevent is also used by the Department of Homeland Security to scan many open source projects.