Thursday, January 17, 2008

Stories about Quality. 1. Who owns it?

A while ago I was asked to post my thoughts about quality. I mulled over this request for quite a while, and realized that I had not spent enough time on the subject to produce anything really deep.

However, I have been working in this industry for almost 20 years now (next year will be exactly 20 years since I first made an appreciable amount of money on software, writing a piece of code for an artillery guidance system in 8080 assembly, but that's quite another story).

Not surprisingly, I have accumulated quite a few stories that might help someone learn from somebody else's experience, which is always the best way to learn. So I will be posting them here, periodically, as they come to mind.

And so it goes...

My first real experience interacting with testers was at Microsoft. The company I had worked for before did have a test team, but they were all black-box testers. The company produced CAD software, so the testers were part former drafters, part computer enthusiasts, but not engineers. They knew something about the product, but there was no process, no test plans, and no accounting whatsoever. All testing was done ad hoc by just banging on the keyboard. And they clearly could not be considered the owners of the quality of the product.

On my first day at Microsoft, I was presented with the book "Writing Solid Code" by Steve Maguire. Steve was a tester during a tumultuous period at Microsoft in the late '80s, when the company was mired in bugs, and it looked like every new fix introduced two more regressions. But he was also a developer. And judging from the book, a very good one!

Unlike my previous company, Microsoft employed real devs to do testing. This was a big change.

In fact, the mantra at Microsoft was:
(1) SDETs (software developers in test) should be just as qualified as developers as SDEs (software design engineers) are.
(2) Testing is an equal participant in product development: it has the same competency and career ladder, and testers at level X are paid the same as devs at the same level.
(3) As a discipline, testers own an equal share of responsibility for a product's success. They do not report to developers, and are the equals of developers in every respect.
(4) There should be 3 testers for every 2 developers.

This was 1998, the stock was dancing around $120, Microsoft was swimming in money and could hire the best of the best - so we could almost afford this in its full form, unabridged by reality!

The test manager in Windows CE, which was my first team at Microsoft, was Pat Copeland (he's now at Google as well, although on the Mountain View campus). He believed that the test team's job was TO PREVENT THE PRODUCT FROM SHIPPING. That is, the product ships when the test team can no longer prove that it is not shippable. Wow! Not only was the test team at Microsoft a co-equal branch, it held the keys to the ultimate goal of any software company - shipping the product.

This lasted for a few years. The test team was very strong, they kept the quality gates high, and I came to believe that this is how it really should be. To this day I think it was actually a good system, a system of perfect checks and balances. If there is an org that builds the product, there should be a different org to verify it - otherwise there are plenty of opportunities for conflicts of interest.

Then Pat left for greener pastures, and right about the same time Windows CE went through a reorganization. Windows Me had just shipped, and was to be the last in the Windows 9x code line, so they needed to do something with the massive number of people who were now free to do something else. I have no idea what happened to the Windows PMs and devs, but we got all of their testers - lock, stock, and barrel.

Now, Windows CE at the time was strictly an arms vendor. Not to be confused with the Pocket PC line, which was handled by a different arm of the company, the Windows CE org produced the OS kernel (in the form of libraries to be linked together in various combinations) and dev tools that could then be used, like a LEGO set, to produce more or less anything embedded. Pocket PC was one of our client orgs, as were Auto PC, Web TV, and a few other projects. The product was also available to 3rd party companies, of course.

The point is, we produced a dev kit, which had practically no UI (just a bunch of samples), and all testing was done at the API level. Which meant that all testers were SDETs.

Windows 9x was a consumer product. Almost all of its testing was UI-based, and so there was an enormous lab, and a lot of people who could bang on a keyboard, but could not program. The new test team was 3-4 times bigger than what we had before.

As it turned out, the new test manager also had a radically different approach to testing. He believed that in every release cycle he was given a fixed time to test the product. He would do his best to test whatever he could during this time, and then whatever was left shipped.

This was a major culture clash. Most of the original testers left, and the dev team was left to deal with the change. Most of us took this as a disaster - nobody owned the quality of the product any more!

I remember having yelling matches with the new test manager in the hallways. But to no avail - the new PUM (product unit manager) was also from Win9x, this was life as they knew it, and that's how it was going to be. The shipping date was set for January 4th, and it shipped right on the clock.

This was the worst release of Windows CE ever. Half of it didn't work, and the half that did was twice as slow as the previous version. That was version 4.0, so 4.1 shipped 6 months later and was basically a gigantic service pack. But its quality was actually OK.

And then 4.2 shipped another 6 months later, and had more bug fixes and a new feature now and then, and it was very good - higher quality than any other release. Suddenly, life was pretty good.

What happened was that devs stopped thinking about quality as strictly the test team's business. They realized that the product would work IFF they produced code that had few bugs, and wrote their own tests (or samples) to ensure it. Using the QA team as a safety net no longer worked, but people got it, and started working around it.

Of course, part of the test development team was still there - it was smaller, and it mostly handled the tasks that could not be handled by the developers - long-term testing, stress, certain kinds of integration testing, and automation.

So what did I learn from this experience?

  1. Quality is a feature. If nobody owns it, it won't ship.

  2. In a great team, an owner will emerge and fill the void if it exists.

  3. The sky may be falling, but in a well-functioning team things will work themselves out. "As if by magic..."

To be continued...


Anonymous said...

Apple shipped the last firmware for the iPhone with HUGE bugs. For the first time it died completely on me, and it freezes. Luckily there is a secret handshake to do a h/w reboot. I tracked these down, I believe, to the map location feature, even though I am not certain. I do not know who owns quality at Apple, but Jobs had to show this feature, so it shipped ;-)

John Spaith said...

This is good stuff. (I was Sergey's first minion way back in the day.) I was there for the Win98/ME merge into our group, but I was just out of college at the time and clueless about the politics (uhm, I mean "quality discussion"). A lot of this back story I never knew till today! I'm still in the CE group, and we're much better on quality now and more aggressive about cutting features if that's what it takes. We're definitely still date driven.

Sergey - I need more gossip/back-story stuff about stuff I was too clueless to realize as I wait for MSN to tell me about Obama girl's next video :)


Alex Efros said...

Thanks, this was interesting reading. But when I asked you about quality I was speaking about something else - about so-called 'overquality'. It looks like what I think of as quality, many other people tend to call overquality.

Another well-known example of 'overquality' is DJB's software (qmail, djbdns, etc.). Here is a quote about this that I like very much:

"Bloody DJB fanboys and their useless rants! You guys pop up every now and then on any programming-related mailing list or forum, spew forth about how nobody understands "secure" programming except for you and the celestial DJB and then vanish again because it takes you six months to code a secure way of opening a file." (c) anonymous :)

From this point of view, it would be very interesting to hear what you think about 'enough' quality and overquality.

Sergey Solyanik said...

To Alex:

I have never been in a situation where the quality was "excessive". Every release that I have been a part of in my almost 20 years in the industry was punctuated by very painful discussions (and concessions) about the bugs we would have to ship with.

Alex Efros said...

To Sergey:

I'm not sure if this is a good place for discussion, but I have thought a lot about quality and overquality in recent years. In short, the average quality of today's software is unacceptable IMO. This happens because for most software companies quality is just one of the features, not critical and in many cases not even important. They can make money using buggy software (thanks to "AS IS")... so it's ok for them to ship software with a lot of known bugs.

A few people think differently - they think software shouldn't be released with known bugs. Yeah, sometimes it's not clear whether something is a bug or not... sometimes it's not clear how to fix a bug... sometimes it makes sense to declare a bug a feature :) and document it... some bugs can't be fixed at all because they are actually bugs in the environment (the OS, for example). But all other bugs must be fixed before release. As examples of such high-quality software I usually cite OS Inferno, DJB's software, and software developed by a few "DJB fanboys :)".

I believe it's possible to develop commercial software with "no known bugs", and I'm trying to do this (that's why I prefer to work as a freelancer - I'm able to work only with customers who choose better quality over less time or money). I think the key to defect-free software is simplicity. KISS. Yeah, it's very hard to "keep it simple", but it is possible (and OS Inferno is the best example known to me!). And I'm sure it is not possible to have defect-free complex software (at least while software is developed by humans :)).

Most people don't think about this. They just work to make some money. These people call such things "overquality" or "perfectionism" - and they think of this as a sort of dirty word. But I know several musical bands that are surely perfectionists. I don't know how rich they are, but I surely prefer listening to their music over music created just to make some money (98% of the music on TV/radio)... So, as a user, I prefer things created by perfectionists; as a developer, I prefer to develop software that works without bugs (and I am able to earn more than enough to live working this way) - so I can't understand what's wrong with this and why nearly all people think it's acceptable to release buggy software.

MEC said...

I enjoy your writing. I'm not an engineer but I read your blog from start to finish.

Sergey Solyanik said...

To Alex:

Aha! I DO understand what you mean now! This definitely merits a separate post, and I'll try to do it today.