Software reliability

Posted by cos on March 29, 2007 at 4:53 PM PDT
Seems like the process of bringing Java under an open source license has
made the question of Java platform quality even more pressing. In
particular, the question of reliability has been discussed more and
more widely among my peers over the last few months. Thus, I decided to
share a couple of thoughts on the topic. Hopefully, you'll like what
you're about to see.

What reliability means for us.

According to the IEEE definition, reliability is "The ability of a system
or component to perform its required functions under stated conditions
for a specified period of time." [1]

First of all, I'd like to emphasize the words "required functions",
"stated conditions", and "specified period". I also want to add
"repetitively". I believe it will become clear later why I've focused
on these.

The majority of software reliability studies pay a good deal of
attention to the amount of time a system or component can perform
without a failure. Most of them deal with different kinds of
fault/time distributions, estimations of failure intensity, failure
likelihood probabilities, failure intervals, and such. I beg your
pardon for a rather long quotation: "...There is a lot of lore about
system testing, but it all boils down to guesswork. That is, it is
guesswork unless you can structure the problem and perform the testing
so that you can apply mathematical statistics. If you can do this,
you can say something like 'No, we cannot be absolutely certain that
the software will never fail, but relative to a theoretically sound
and experimentally validated statistical model, we have done
sufficient testing to say with 95-percent confidence that the
probability of 1,000 CPU hours of failure-free operation in a
probabilistically defined environment is at least 0.995.' When you do
this, you are applying software-reliability measurement." [2]
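To make the quoted claim a bit more concrete, here is a minimal sketch of the arithmetic behind such a statement. It assumes the simplest textbook model, a constant failure rate (exponentially distributed times between failures), and a test during which no failure is observed at all; the class and method names are made up for illustration. Under those assumptions, R(t) = exp(-lambda*t), and surviving T failure-free hours supports "R(t) >= R with confidence C" when T >= t * ln(1 - C) / ln(R).

```java
/**
 * Zero-failure reliability demonstration under a constant failure rate
 * (exponential) model. Illustrative sketch; names are hypothetical.
 */
public class ZeroFailureDemo {

    /**
     * Failure-free test hours needed to claim, with the given confidence,
     * that the probability of surviving missionHours without failure is
     * at least targetReliability.
     *
     * Model assumption: R(t) = exp(-lambda * t). With zero failures in T
     * hours, the one-sided upper confidence bound on lambda is
     * -ln(1 - confidence) / T.
     */
    static double requiredTestHours(double missionHours,
                                    double targetReliability,
                                    double confidence) {
        if (missionHours <= 0
                || targetReliability <= 0 || targetReliability >= 1
                || confidence <= 0 || confidence >= 1) {
            throw new IllegalArgumentException("parameters out of range");
        }
        // Largest failure rate that still meets the reliability target.
        double maxFailureRate = -Math.log(targetReliability) / missionHours;
        // Test long enough that seeing zero failures would be unlikely
        // (probability <= 1 - confidence) if the true rate were that high.
        return -Math.log(1.0 - confidence) / maxFailureRate;
    }

    public static void main(String[] args) {
        // The numbers from the quotation: 1,000 CPU hours, 0.995, 95 percent.
        double hours = requiredTestHours(1_000, 0.995, 0.95);
        System.out.printf("Failure-free CPU hours needed: %.0f%n", hours);
    }
}
```

With the quoted numbers, this sketch asks for roughly 598,000 failure-free CPU hours. Allowing some failures during the test, or choosing a different reliability model, changes the figure, which is exactly why the statistical model has to be stated up front.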

Without any attempt to underestimate or undermine such studies ([3]),
and in full agreement with the absolute necessity of statistical
modeling and verification in software testing, I want to talk about a
wider approach. It isn't perhaps a brand new one, but it might be
slightly different from what you've seen so far.

Different takes on quality.

I love to talk about quality, mostly because it is a very vast topic
and one can always sell some nonsense :-)

As I see it, there are two main approaches to quality. I'll call them
the hardware and software types. The main differences between them
come from the specifics of the production cycles of devices and
applications, namely:
  - hardware production has much higher costs because of complex
    factory processes, complicated and costly equipment, et
    cetera. Thus, you'd better be careful with how a device's
    components are designed, produced, assembled together, and
    tested. It might cost a fortune to make changes to a silicon chip,
    a motherboard design, or a car once it's out.

    As a result, hardware development is approached with more "respect"
    and precise planning because of the high up-front investment.

  - software, on the other hand, usually has a more flexible life span:
    the targets are sometimes easily moved during the development
    process, requirements change, design documents might be
    somewhat informal, spec changes might not be well traced to
    the actual application defects, the quality process falls behind,
    and on, and on... At the end of the day a software application
    reaches its customers and they start finding bugs in it. Then an
    escalation arises, and the product's sustaining team has
    to spend time mirroring the customer's setup, repeating all the
    steps to reproduce a defect, etc. And consider yourself lucky if all
    of this can be done in just one interaction. However, if the defect
    report wasn't detailed enough or the setup was way too
    sophisticated, you might spend months nailing down a particular
    problem. We've all seen this many times, right?

Our take on the problem is a mix of the two. I wanted to take the
best parts of the hardware approach to reliability and bring them over
to software wherever possible. Here's what I see as the necessary
steps:
  1) Design and architectural reviews (many teams already do this)
  1a) Tracking correlations between architecture decisions, changes, and
     discovered defects
  2) Mean-Time-To-Failure (MTTF) testing. A quality department can run
     some, preferably standardized, applications for a prolonged period
     of time to demonstrate the stability of the software
     platform. However, a better approach would be to run scenario-based
     MTTF tests.
  3) Employing statistical analysis of quality trends
  4) Enforcing static analysis evaluations on a periodic basis
  5a) Scenario-based MTTF testing. Normally, one can gather a few (maybe
     a hundred or so) typical usage scenarios for a software
     application. The number is likely to be much higher for a software
     platform like Java. These scenarios might be simulated or
     replicated with a test harness of choice and a specific set of
     existing or newly developed tests. Of course, you might not be
     able to simulate these real-life scenarios with 100%
     accuracy, but that's not always necessary. These scenarios then
     should be executed repeatedly, and their pass/fail rate has to be
     tracked over time.
  5b) Scenario completeness. Using the list of features exercised
     during a scenario's execution, together with static analysis
     results, one can tell which parts of a software application will be
     touched during a particular scenario's run. Using code coverage
     methods, you can find out which parts of the scenario's
     functionality are covered and which are not. With something similar
     to BSP you can direct the improvement efforts, but this is another
     story and it's been covered already.
  6) Quality trends monitoring. The results of #5a should be
     included here.
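Steps 2) and 5a) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not an existing harness: each scenario is modeled as a hypothetical `BooleanSupplier` that returns `true` on success, all scenarios run in the same fixed order every round so runs are repeatable, and an unexpected exception counts as a failure. Execution time per scenario is accumulated so a crude scenario-based MTTF (total execution time divided by the number of failures) can be reported next to the pass rate.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;

/**
 * Hypothetical scenario harness: runs each usage scenario repeatedly, in a
 * fixed order, and tracks its pass rate and a crude MTTF.
 */
public class ScenarioRunner {

    /** Outcome counters for one scenario. */
    static final class Stats {
        int runs;
        int failures;
        long elapsedNanos;

        double passRate() {
            return runs == 0 ? 0.0 : (double) (runs - failures) / runs;
        }

        /** Total execution time divided by failures; infinite if none failed. */
        double mttfSeconds() {
            return failures == 0
                    ? Double.POSITIVE_INFINITY
                    : elapsedNanos / 1e9 / failures;
        }
    }

    // LinkedHashMap keeps registration order, so every round is identical.
    private final Map<String, BooleanSupplier> scenarios = new LinkedHashMap<>();
    private final Map<String, Stats> stats = new LinkedHashMap<>();

    void register(String name, BooleanSupplier scenario) {
        scenarios.put(name, scenario);
        stats.put(name, new Stats());
    }

    /** Executes every registered scenario {@code rounds} times. */
    void runRounds(int rounds) {
        for (int i = 0; i < rounds; i++) {
            for (Map.Entry<String, BooleanSupplier> e : scenarios.entrySet()) {
                Stats s = stats.get(e.getKey());
                long start = System.nanoTime();
                boolean passed;
                try {
                    passed = e.getValue().getAsBoolean();
                } catch (RuntimeException ex) {
                    passed = false; // an unexpected exception counts as a failure
                }
                s.elapsedNanos += System.nanoTime() - start;
                s.runs++;
                if (!passed) {
                    s.failures++;
                }
            }
        }
    }

    Stats statsFor(String name) {
        return stats.get(name);
    }

    public static void main(String[] args) {
        ScenarioRunner runner = new ScenarioRunner();
        int[] calls = {0};
        // Stand-ins for real scenarios; a real harness would drive the application.
        runner.register("open-and-save-document", () -> true);
        runner.register("bulk-import", () -> ++calls[0] % 10 != 0); // fails every 10th run
        runner.runRounds(100);
        for (String name : new String[] {"open-and-save-document", "bulk-import"}) {
            Stats s = runner.statsFor(name);
            System.out.printf("%s: %d runs, %d failures, pass rate %.2f%n",
                    name, s.runs, s.failures, s.passRate());
        }
    }
}
```

The per-scenario pass rates are the numbers that step 6) would then track from build to build.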

When I communicate these steps to my peers and colleagues, I hear a
number of concerns. Typically, these are:
  - how #2 is connected to reliability
  - #4 seems to be an overstretch
  - how you can be sure that #5a is equivalent to running heavyweight
    applications to verify your platform's stability/reliability

Hopefully, I'll be able to answer these and any other questions you
might send me as comments.

  1) Why design and architectural reviews? Long story short, you can
     keep bad solutions out of your system. Proven practices
     usually lead to fewer last-minute changes at the
     development stage. Thus, the testing burden will be lower, as
     will the number of regressions, customer escalations, etc.
     What about 1a)? I don't know - it just sounds cool, I guess :-)

  2) Everybody seems to be doing this, so why don't we? Seriously,
     this is one of the aspects of reliability you want to measure,
     because it is backed up by well-developed theory and years of
     practice, and it is a meaningful quantitative metric.

  3) Not sure why? Just read some of those books, will ya? ;-)

  4) Static analysis is capable of finding types of defects that
     aren't likely to be discovered at runtime. That happens
     because for complex systems you can't guarantee coverage of the
     Cartesian product of the sets of input and output states. However,
     some of the nastiest bugs tend to hide right in those dusty
     corners, which you or one of your customers might hit only once
     in a while. Thus, if designated static analyzers run cleanly on
     every build, you can at least demonstrate that it doesn't leak
     memory or run out of file handles. Consider that reliable also
     means trustworthy.

     One might say that you can track memory leaks with runtime
     monitoring. True. But how are you going to find and fix them then?

  5a) Gives you the determinism of repeatable testing, which is
      likely to be missing from the BigApps approach discussed below.
      5b) is complementary to this one.

  6) You want to know if your development/quality processes are
     convergent, right?
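To illustrate the kind of defect item 4) is about, here is a small sketch; it is not taken from any particular analyzer's documentation. The leaky method closes its reader only on the happy path. A runtime test that reads well-formed files will pass forever, but if `readLine()` ever throws, the handle is not released. Static analyzers for Java typically flag this unclosed-resource pattern on every build, whether or not a test ever takes the exceptional path. The fix uses try-with-resources, which assumes Java 7 or later; the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Illustrative only: the kind of file-handle leak static analyzers report. */
public class HandleLeak {

    // Leaky version: if readLine() throws, close() is never reached and the
    // file handle stays open for an unpredictable time. Repeated in a
    // long-running process, this is how it runs out of file handles.
    static int countLinesLeaky(Path file) throws IOException {
        BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8);
        int lines = 0;
        while (reader.readLine() != null) {
            lines++;
        }
        reader.close();
        return lines;
    }

    // Fixed version: try-with-resources closes the reader on every path,
    // including exceptional ones.
    static int countLines(Path file) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            int lines = 0;
            while (reader.readLine() != null) {
                lines++;
            }
            return lines;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("lines", ".txt");
        Files.write(tmp, "a\nb\nc\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(countLines(tmp)); // prints 3
        Files.delete(tmp);
    }
}
```

Both methods return the same result on well-formed input, which is the point: only an analysis of all code paths, not a passing test, tells them apart.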
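One simple way to put a number on items 3) and 6) is to fit a straight line through the scenario failure rates collected per build (for example, 1 minus the pass rates from 5a) and look at the sign of the slope: a negative slope suggests the processes are converging. This is a minimal sketch that assumes an ordinary least-squares slope is a good enough trend indicator; a real quality-trend analysis would also consider noise and statistical significance. The class name and the sample data are illustrative only.

```java
/** Hypothetical quality-trend check: is the per-build failure rate going down? */
public class QualityTrend {

    /** Ordinary least-squares slope of y against x = 0, 1, 2, ... (build index). */
    static double slope(double[] y) {
        int n = y.length;
        if (n < 2) {
            throw new IllegalArgumentException("need at least two builds");
        }
        double meanX = (n - 1) / 2.0;
        double meanY = 0;
        for (double v : y) {
            meanY += v;
        }
        meanY /= n;
        double num = 0;
        double den = 0;
        for (int x = 0; x < n; x++) {
            num += (x - meanX) * (y[x] - meanY);
            den += (x - meanX) * (x - meanX);
        }
        return num / den;
    }

    public static void main(String[] args) {
        // Made-up scenario failure rates for six consecutive builds.
        double[] failureRate = {0.12, 0.10, 0.11, 0.08, 0.06, 0.05};
        double s = slope(failureRate);
        System.out.printf("trend: %+.4f per build -> %s%n", s,
                s < 0 ? "converging" : "not converging");
    }
}
```

A single slope is deliberately crude; plotting the series alongside it keeps one noisy build from hiding a real regression.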

Finally, I'd like to mention several common approaches to
reliability. I'm also going to explain why I see these as

1) Stress testing
   This one is most often confused with the concept of reliability.
   The reason for this is perhaps clear: one might expect a reliable
   system to work in a wide variety of conditions and perform its
   functions well.

   You can hear on the streets that "...Microsoft Windows is
   unreliable." Hell, yes. It sure isn't reliable if you try to debug a
   huge C++ project, process some statistical data, receive a bunch of
   spam emails, and install 20+ security fixes from their update
   center, all at the same time. It will likely crash and destroy some
   of your files, or it might hang nicely. Or you'll suffer some
   critical performance degradation. I can't tell for sure, 'cause I'm
   not one of those lucky Windows users. And I'm not trying to make fun
   of Windows - people do that with their computers on a daily basis
   much better than I could ever dream of :-) My point is that
   the scenario above is a bit extreme and well beyond an average
   Windows user's capabilities or, perhaps, desire.

   Normally, though, your Visual C++ debug session will go
   smoothly in probably 95% of cases (although I once worked on a
   C# project that crashed my development machine to a BSoD on every
   load, yet every attempt to load it again after the crash was a
   success. Weird...). Have you ever counted how many times your email
   client worked well when you were sending your emails? Perhaps not,
   but I'm sure almost everyone has a story to tell about how badly
   their address book was corrupted the last time Outlook crashed,
   right?

   Correct processing of data series and gathering of feature usage
   information can relatively easily demonstrate that Outlook is a
   reliable application: it has, say, 93.5% failure-free behavior over
   every 10 hours of execution. But it is hard to guarantee that this
   application will survive some monstrous load conditions.

2) BigApps testing
   The concept of BigApps testing consists of running some bulky
   commercial applications to derive the MTTF of, usually, a software
   platform. Well, I see a threefold problem here (I'm sure there are
   more, but I'll let you deduce them on your own :)
      1. Any BigApp run is only as good as the typical utilization (or
         usage scenario) of your platform's features by that application
      2. The correctness of the exercised application itself might be
      3. The results you see at the completion of a run should be
         attributed to that particular application. If you ran a
         PeopleSoft system for a week and demonstrated an MTTF of
         140 hours, that is great... for the PeopleSoft marketing and
         PR team, but not that useful for your development
         organization. It gives them little handy info. However, if a
         crash does occur, the engineering team can discover some
         really bad problem in the code and fix it, which is the rare
         case of a non-zero-sum game!

It might be a cool marketing or sales tool to use on customers,
but it might not be as great for engineers.
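A figure like the illustrative Outlook one from the stress-testing discussion (93.5% failure-free behavior over every 10 hours of execution) can be turned into an MTTF, but only under a model assumption. Assuming a constant failure rate, so that the probability of surviving t hours is R(t) = e^{-\lambda t}, the conversion is:

```latex
R(10\,\mathrm{h}) = e^{-10\lambda} = 0.935
\;\Longrightarrow\;
\lambda = -\frac{\ln 0.935}{10\,\mathrm{h}} \approx 6.72 \times 10^{-3}\,\mathrm{h}^{-1},
\qquad
\mathrm{MTTF} = \frac{1}{\lambda} \approx 149\,\mathrm{h}.
```

Note that this number holds only for the usage profile it was measured under ("stated conditions"); it says nothing about survival under the monstrous loads of a stress test.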

I hope this article has helped to scratch the surface of this
problem. Please let me know what you think, pinpoint the gaps in my
logic, or just yell at me if you think I'm wrong. Let's talk
about this. Maybe we'll work out something that we all can use later
for the applications and products we develop for life or for an

[1] IEEE Std 982.2-1988. Withdrawn in 2002.
[2] John D. Musa, A. Frank Ackerman. "Quantifying Software Validation:
    When to Stop Testing?"

This post is also published at my permanent blog spot


We are planning to set up a formal Software Reliability function in our organization and I like your information. Actually, I have an enterprise software company providing automated source code analysis software products that automate security vulnerability and quality risk assessment, remediation, and measurement for C, C++ and Java software, and Java static analysis.