The Source for Java Technology Collaboration



Konstantin I. Boudnik's Blog

Testing Archives


Software reliability

Posted by cos on March 29, 2007 at 04:53 PM | Permalink | Comments (3)

Seems like the process of bringing Java under an open source license
has raised the question of Java platform quality even higher. In
particular, the question of reliability has been discussed more and
more widely among my peers over the last few months. Thus, I decided
to share a couple of thoughts on the topic. Hopefully, you'll like
what you're about to see.

What reliability means for us.

According to the IEEE definition, reliability is "The ability of a system
or component to perform its required functions under stated conditions
for a specified period of time." [1]

First of all, I'd like to emphasize the words "required functions",
"stated conditions", and "specified period". I also want to add
"repeatedly". I believe it will become clear later on why I've focused
on those.

The majority of software reliability studies pay a good deal of
attention to the amount of time a system or component can perform
without a failure. Most of them deal with different kinds of
fault/time distributions, estimations of failure intensity, failure
probabilities, failure intervals, and such. I beg your pardon for a
rather long citation: "...There is a lot of lore about system testing,
but it all boils down to guesswork. That is, it is guesswork unless
you can structure the problem and perform the testing so that you can
apply mathematical statistics. If you can do this, you can say
something like "No, we cannot be absolutely certain that the software
will never fail, but relative to a theoretically sound and
experimentally validated statistical model, we have done sufficient
testing to say with 95-percent confidence that the probability of
1,000 CPU hours of failure-free operation in a probabilistically
defined environment is at least 0.995." When you do this, you are
applying software-reliability measurement." [2]
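
To get a feeling for what a statement like the one quoted above costs,
here is a minimal Python sketch. It assumes the simplest possible
model: a constant failure rate (exponentially distributed failures)
and a test campaign in which no failure was observed. That model and
the function names are assumptions made for illustration; they are not
taken from [2].

```python
import math


def max_failure_rate(reliability, mission_hours):
    """Largest constant failure rate (per hour) that still gives the
    requested probability of running mission_hours without a failure."""
    return -math.log(reliability) / mission_hours


def required_test_hours(reliability, mission_hours, confidence):
    """Failure-free test time needed to claim the target reliability
    with the given confidence, assuming exponential failures and zero
    failures observed during the test."""
    lam = max_failure_rate(reliability, mission_hours)
    # With zero failures in T hours, the one-sided upper bound on the
    # failure rate at the given confidence is -ln(1 - confidence) / T.
    return -math.log(1.0 - confidence) / lam


hours = required_test_hours(reliability=0.995, mission_hours=1000,
                            confidence=0.95)
print(f"failure-free test hours needed: {hours:,.0f}")
```

Under these assumptions the answer comes out at roughly 600 thousand
failure-free CPU hours, which hints at why such claims need a lot of
parallel, automated test execution.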

With no attempt to underestimate or undermine such studies ([3]), and
fully agreeing that statistical modeling and verification of software
testing are absolutely necessary, I want to talk about a wider
approach. It perhaps isn't brand new, but it might be slightly
different from what you've seen so far.

Different takes on quality.

I love to talk about quality, mostly because this is a very vast topic
and one can sell some nonsense :-)

As I see it, there are two main quality approaches. I'll call them
the hardware and software types. The main differences between them
come from the production cycle specifics of devices and
applications. Namely:
  - hardware production has much higher costs because of complex
    factory processes, complicated and costly equipment involved, et
    cetera. Thus, you'd better be careful with how a device's
    components are designed, produced, assembled together, and
    tested. It might cost a fortune to make changes in a silicon chip,
    a motherboard design, or a car once it's out.

    As a result, hardware development is treated with more "respect"
    and more precise planning because of the high up-front investment.

  - software, on the other hand, usually has a more flexible life
    span: the targets are sometimes easily moved during the
    development process, requirements change, design documents might
    be somewhat informal, spec changes might not be traced well to the
    real application defects, the quality process lags behind, and on,
    and on... At the end of the day a software application reaches its
    customers and they start finding bugs in it. Then an escalation
    arises, and the product's sustaining team has to spend time
    mirroring the customer's setup, repeating all the steps to
    reproduce a defect, etc. And consider yourself lucky if all of
    this can be done in just one interaction. However, if the defect
    report wasn't detailed enough or the setup was way too
    sophisticated, you might spend months nailing down a particular
    problem. We've all seen this many times, right?

Our take on the problem is a mix of the two above. I want to take the
best parts of hardware reliability and bring them over to the software
side wherever possible. Here's what I see as the necessary steps:
  1) Design and architectural reviews (many teams are doing this
     already)
  1a) Tracking correlations between architecture decisions, changes,
     and discovered defects
  2) Mean-Time-To-Failure (MTTF) testing. A quality department can run
     some, preferably standardized, applications for a prolonged
     period of time to demonstrate the stability of the software
     platform. However, a better approach would be to run scenario
     based MTTF tests (see 5a).
  3) Employing statistical analysis of quality trends
  4) Enforcing static analysis evaluations on a periodic basis
  5a) Scenario based MTTF testing. Normally, one can gather a few
     (maybe a hundred or so) typical usage scenarios for a software
     application. The number is likely to be much higher for a
     software platform like Java. These scenarios might be simulated
     or replicated with a test harness of choice and a specific set of
     existing or newly developed tests. Of course, you might not be
     able to simulate any of these real-life scenarios with 100%
     accuracy, but that's not always necessary. These scenarios should
     then be executed repeatedly and their pass/fail rate has to be
     tracked over time.
  5b) Scenario completeness. Using the list of features utilized
     during a scenario execution, together with static analysis
     results, one can tell which parts of a software application will
     be touched during a particular scenario's run. Using code
     coverage methods you can find out which parts of the scenario's
     functionality are covered and which are not. With something
     similar to BSP
     http://weblogs.java.net/blog/cos/archive/2005/12/java_quality_me_6.html
     you can focus the improvement efforts, but this is another story
     and it's been covered already.
  6) Quality trends monitoring. The results of #5a should be
     included here.
When I communicate these steps to my peers and colleagues, I hear a
number of concerns. Typically, these are:
  - how is #2 connected to reliability?
  - #4 seems to be a stretch
  - how can you be sure that #5a is the same as running heavyweight
    applications to verify your platform's stability/reliability?

Hopefully, I'll be able to answer these and any other questions you
might send me as comments.

  1) Why design and architectural review? Long story short, you can
     keep bad solutions away from your system. Proven practices
     usually mean fewer last-minute changes at the development
     stage. Thus, the testing burden will be lower, as well as the
     number of regressions, customer escalations, etc.
     What about 1a)? I don't know - it just sounds cool, I guess :-)

  2) Everybody seems to be doing this, so why don't we..? Seriously,
     this is one of the aspects of reliability you want to count,
     because it is backed up by well-developed theory and years of
     practice, and it is a meaningful quantitative metric.

  3) Not sure why? Just read some of those books, will ya? ;-)

  4) Static analysis is capable of finding types of defects which
     aren't likely to be discovered at runtime. That happens because
     for complex systems you can't guarantee coverage of the
     Cartesian product of the input and output state sets. However,
     some of the nasty bugs tend to hide right in those dusty
     corners, which you or one of your customers might hit only once
     in a while. Thus, if you run designated static analyzers cleanly
     on every build of yours, you can at least demonstrate that it
     doesn't leak memory or run out of file handles. Consider that
     reliable also means trustworthy.

     One might say that you can track memory leaks with runtime
     monitoring. True. But how are you going to find and fix them
     then?

  5a) gives you the determinism of repeatable testing, which is
      likely to be missing from the BigApps approach discussed later.
      5b) is complementary to this one.

  6) You want to know if your development/quality processes are
     convergent, right?

And finally I'd like to mention several common reliability
approaches. Also I'm going to explain why I see these as
misconceptions.

1) Stress testing
   This one is most often confused with the concept of
   reliability. The reason is perhaps clear: one might expect a
   reliable system to work in a wide variety of conditions and still
   perform its functions well.

   You can hear word on the street that "...Microsoft Windows is
   unreliable." Hell, yes. It sure isn't reliable if you try to debug
   a huge C++ project, process some statistical data, get a bunch of
   spam emails, and install 20+ security fixes from their update
   center at the same time. It will likely crash and destroy some of
   your files, or it might hang nicely. Or you'll suffer some critical
   performance degradation. I can't tell for sure, 'cause I'm not one
   of those lucky Windows users. And I'm not trying to make fun of
   Windows - people do so with their computers on a daily basis
   much better than I can even dream of :-) My point is that
   the scenario above is a bit extreme and well beyond an average
   Windows user's capabilities or, perhaps, desire.

   However, normally your Visual C++ project debug session will go
   smoothly in probably 95% of cases (although I once worked on a
   C# project which crashed my development machine to a BSoD on every
   load, while an attempt to load it again after the crash was always
   a success. Weird...). Did you ever count how many times your email
   client worked well when you were sending your emails? Perhaps not,
   but I'm sure almost everyone has a story to tell about how badly
   the address book was corrupted the last time Outlook crashed,
   right?

   Correct processing of data series and gathering of feature usage
   information can relatively easily demonstrate that Outlook is a
   reliable application. It has, say, 93.5% failure-free behavior over
   every 10 hours of execution. But it is hard to guarantee that this
   application will survive under some monstrous load conditions.
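
   A number like "failure-free behavior over every 10 hours of
   execution" could be derived from a failure log roughly as in the
   Python sketch below. The function name and the sample data are
   made up for illustration, and splitting the run into consecutive,
   non-overlapping windows is an assumption about how such a figure
   might be counted.

```python
def failure_free_fraction(failure_times, total_hours, window_hours=10):
    """Fraction of consecutive, non-overlapping windows of
    window_hours that contain no failure at all."""
    windows = int(total_hours // window_hours)
    if windows == 0:
        raise ValueError("observation period is shorter than one window")
    # Index of every window that saw at least one failure
    failed = {int(t // window_hours) for t in failure_times
              if 0 <= t < windows * window_hours}
    return (windows - len(failed)) / windows


# Hypothetical data: 2000 hours of use, failures logged at these hours
failures = [12.5, 340.0, 341.2, 1010.9, 1999.0]
fraction = failure_free_fraction(failures, 2000)
print(f"{fraction:.1%}")  # prints 98.0%
```

   Two failures that land in the same window count once here, which
   matches the "did this 10-hour stretch go by without a failure"
   reading of the metric.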

2) BigApps testing
   The concept of BigApps testing consists of running some bulky
   commercial applications to derive the MTTF of, usually, a software
   platform. Well, I see a threefold problem here (I'm sure there are
   more of these, but I'll let you deduce them on your own :)
      1. Any BigApp run is only as good as that application's typical
         utilization (or usage scenario) of your platform's features.
      2. The correctness of the exercised application itself might be
         questionable.
      3. Results you'll see at the completion of a run should be
         attributed to that particular application. If you were
         running a PeopleSoft system for a week and have demonstrated
         an MTTF=140 hours, that is great... for the PeopleSoft
         marketing and PR team, but not that useful for your
         development organization. It gives them little handy
         info. Although, if a crash occurs, the engineering team can
         discover some really bad problem in the code and fix it,
         which is a rare case of a non-zero-sum game!

         It might be a cool marketing or sales tool to use on
         customers, but it might not be as great for engineers.

I hope this article helped to scratch the surface of this
problem. Please let me know what you think, point out the gaps in
logic, or just yell at me if you think I'm wrong. Let's talk about
this. Maybe we'll work something out that we all can use later for
the applications and products we develop for a living or for
enjoyment.

Cheers,
  Cos

--
[1] IEEE 982.2-1988 standard. Withdrawn in 2002.
[2] John D. Musa, A. Frank Ackerman. "Quantifying Software Validation:
    When to Stop Testing?"
[3] http://portal.acm.org/citation.cfm?id=22980&dl=#


Open quality metrics and processes

Posted by cos on September 18, 2006 at 04:44 PM | Permalink | Comments (1)

Despite the differences between business models, development processes, functional areas of the applications, and the languages these applications are written in, the quality approaches for them are quite similar and the final goal is the same: the engineering team has to deliver a quality application to the end users.

Let's look back at how this gets addressed by different teams: what techniques and tools are used by a few well-adopted systems in both the open and closed source communities (to gather this information I used different sources, mostly those available on the Web, such as the applications' web sites, podcasts, etc.). I have to apologize in advance: the information you'll see below isn't fully normalized and I ought to have spent more time on pretty formatting, but I guess it will illustrate my point. Also, the choice of applications might seem a bit awkward to you, but I might have made it intentionally.

Open SuSe Linux:

  • stress testing
  • reliability testing (similar to the above + some boundary testing)
  • scalability testing (behavior on NUMA, etc.)
  • Tools: home grown automation tools, bugzilla for defects tracking
  • source code is available through participation in OpenSuse community
  • Number of currently reported bugs: 1238 or so it appears
Mozilla:
  • has some smoke tests http://www.mozilla.org/quality/smoketests/
  • overall Mozilla has a number of QA teams (per functional area, actually) and usually they have some tests and testing specs being published, so contributors can participate. More information http://www.mozilla.org/quality/
  • Tools: mostly manual testing; obviously Bugzilla for defects tracking
  • Number of currently reported bugs (I've looked into Firefox only): 9237
LTP: Community quality project for POSIX UNIX's
  • a few commercial vendors are contributing (SuSe, IBM, etc.)
  • participants from all over the place (23+ by now)
  • Tools: lcov, gcov
  • Number of currently reported bugs: the web site doesn't have much information on it. Or perhaps it's my inability to find the relevant information.
  • also you might want to check this article with more detail about Linux reliability testing http://www-128.ibm.com/developerworks/linux/library/l-rel/
JetBrains (Intellij IDEA):
despite the tool being commercial, it seems to use quite a smart model for ensuring the quality of the product. It is manifold:
  • developers use new features immediately; thus, as soon as a new feature is developed it gets put to work. So, the programmers are eating their own dog food and polishing their product as they go. This also lets them polish the workflow, which is quite noticeable, as IDEA's user interface is very intuitive and flawless
  • Intellij IDEA has an EAP (Early Access Program), which allows IDEA's enthusiasts to get a new build every week, start using fresh builds, and find new bugs without delays. All developers communicate with the EAP's participants directly, which eliminates any communication hurdles
  • as for pure QA, they simply didn't have any designated QA forces until a couple of weeks ago and all quality activities were covered by development
  • Tools: JUnit, however most of the tests aren't classic unit, but rather functional tests; home grown continuous integration system (http://www.jetbrains.com/teamcity); defects tracking is done through JIRA (information is gathered from an insider. Thanks to Max Shafirov for his help)
Sun JDK:
  • has a well developed set of test suites for both JDK and virtual machine
  • has designated QA team of 100+ people
  • Tools: home grown tools for test harnesses, task dispatching, results analysis
  • internal defects tracking with external gateway and Webbugs system, which accepts bugs submission along with some test cases being sent in from outside
  • Source code is available through dev.java.net portal. Bugs could be submitted externally, however code is under Sun's own source license
  • overall number of bugs submitted so far:
Kaffe:
  • free project with free time contributions from around the world. No designated QA forces. Most of the quality activities are done by the developers themselves. However, they have a few smart ways of proving their product to be stable and sound. Here are the main approaches:
    1. continuous integration: Tinderbox on a regular basis
    2. peeking into others' builds to see if there are problems: http://buildd.debian.org/build.php?pkg=kaffe
    3. some in-house configuration scripts around HP's testdrive site (http://www.testdrive.hp.com/current.shtml). One can ftp his stuff in, log in and run something, then ftp the results back.
    4. Kaffe cross-compilations to CPUs supported by Debian, which helps to weed out the odd breakage every now and then, and in general is a good test for intrusive patches, as they tend to trip up a lot of compiler warnings on non-mainstream platforms
    5. There is also an archive of weekly snapshots created by Kiyo Inaba at ftp://ricohgwy.ricoh.co.jp/pub/Lang/Java/Kaffe/ , so he regularly notices and catches any breakage that occurs on NetBSD.
  • on the side of industrial type of testing, Kaffe runs regression testing and Mauve test suite. Also EMMA based coverage is enforced for Mauve (http://builder.classpath.org/~cpdev/coverage/)
  • Kaffe folks mentioned that they would love to use some of Sun's test suites, and I didn't have any comments at the time of our email conversation
  • Dalibor.Topic made quite valuable observation in his email (cited below) " Besides changes to the core VM, it involves picking the right projects to cooperate with, and driving public attention towards them, so that all boats are lifted, rather then everyone having to develop their own set of insular functionality. So, we are in touch with distributions, and in close touch with various projects we use code from, and participate in their (QA) efforts, as well, and encourage people to do the same. A lot of the QA work comes from the integrators in distribution using Kaffe to deliver packaged applications and libraries to their users."
  • the information above has been provided by robilad@kaffe.org and jim@kaffe.org. Thanks a bunch, folks!
Eclipse:
  • I'm expecting to add more information about Eclipse as soon as I'll get it from the team.
  • For now I was able to find 22,000+ open bugs filed against all components and sub-projects
As we can see from the above, all these teams mostly use quite traditional approaches to ensure the quality of their software systems and applications. I would call these traditional ways of delivering quality extensive, or brute force, as they involve more tests to be written and executed and/or more developers' eye-rolling.

However, technologies such as BSP and some other concepts of impact analysis allow a typical quality organization to sharply increase its ROI. Implementations of impact analysis might differ, but the general idea can be expressed as a couple of rules of thumb:

  1. find pieces of source code/modules/subsystems critical for your application
  2. focus your quality efforts on those
There is yet another way to gain some assurance about the quality of an application. You can do constant static analysis of your code, which will give you some sense of security too. I'll be writing about different static analyzers next time. Stay tuned to see info about Coverity, Klocwork, FindBugs and some others. There was an article recently on one of the online technology news sites comparing Klocwork and Coverity. Frankly speaking, I read it twice and didn't get a clue what the difference between those two was :-)

Another lesson I learned from the observations above is that, despite the popularity of a platform, its quality processes might be equally or less successful, especially in the world of open source software.

Anyway, it's been rather a long post, and I'd like to draw a conclusion now.

Sun JDK is going open source quite soon. Considering all the examples above, I can tell that Sun JDK's quality process might be implemented along one of the already known models. However, "already known" doesn't mean "perfect" or "sufficient".

Thus, I would love to hear your ideas, suggestions, comments, outbursts, etc. Please post them here or just send me an email to kboudnik@gmail.com

Java is getting open. What about Java quality?

Posted by cos on May 20, 2006 at 12:13 AM | Permalink | Comments (2)

Hello there!

Last week happened to be quite busy for many Java developers, activists, and supporters. The JavaOne 2006 conference had a lot of interesting pods, booths, talks and other kinds of presentations. Leading development companies brought their innovations to share their knowledge and expertise in the field.

I won't reiterate the same things you've perhaps heard already: you can find them here or custom one from our SPB team (in Russian)

Among other great things there was one which first hit the crowd of attendees of the Netbeans conference (there were a few very interesting talks, especially on my latest favorite topic - Java ME development and tools; I'll talk about this more next time) and was then brought to the wider audience of JavaOne Day One's keynotes. Right, I'm talking about bringing Java to open source. I think this has been expected for a long time, and I guess it will bring some new blood into the Java platform around the world.

Ok, it's all hunky dory and rosy then. But it also brings some concerns. The Java platform is a big and complicated application. It has core things, like the VM, libraries, platform-dependent code, backward compatibility promises, etc. Besides, the Java community has to make sure that the quality of the platform won't suffer from this move.

I'm a strong believer in the following approach: if a standardized technology becomes available to a community, then the testing methodology has to accompany it as well. Only one requirement ought to be attached: the test suites should pass. This will help keep a proper level of compatibility, avoid unnecessary branching, and let participants keep a better grip on the development process. Doesn't sound too open to you? Great, share your thoughts with us.

However, any testing methodology assumes some frameworks to execute the tests. E.g. JUnit is required for unit tests; the JCK suite has to be passed under JavaTest for a JDK's certification, etc. So, the same is true in our case: we need to supply the testing frameworks.

It sounds like Sun has to bring some chunks of the quality infrastructure along. Is anyone surprised that I'm talking about Java quality again :-)? Back then I mentioned some choices of testing tools. David Herron added more details on that topic. As the major contributor to the DTF for the last few years, I want to see the project moving ahead and not getting accidentally abandoned. This scheduling framework can save a lot of valuable engineering time (and thus money) for any Java development/testing team.

Another great tool which goes nicely along with DTF is our internal test harness called Tonga. DTF and Tonga get along very well. In fact, they were designed and implemented with quite an awareness of each other. This couple makes a great yet very flexible and efficient testing infrastructure.

Obviously, we have a lot of other interesting in-house solutions which, I'm sure, would be a real value-add for the community. For instance, our Java coverage framework, creatively called jcov, test results processing and reporting tools, et cetera.

I'm not a lawyer or any of the high-flying Java execs, so don't take my word to a court :-) However, I'll be pushing enthusiastically for open sourcing these tools and will keep you posted on the progress.

Take care,
Cos

Java APIs comparison

Posted by cos on March 28, 2006 at 09:57 PM | Permalink | Comments (9)

Hello folks

We recently thought about an interesting approach to finding the difference between the APIs (public methods) of two versions of a piece of Java software. I was doing it for JDK1.5 (Tiger) and JDK1.6 (Mustang b76).

Sometimes you might want to take a quick look and find out what exactly has changed in your (or somebody else's) code between two versions. Why? Well, if you want to focus your test development on the new API only, it might be a very sound idea. I've spoken before about some other methods to help you achieve better software quality.
Well, there are some ways to do so. E.g. you can use "@since" tags from Javadoc. Well, good if you have any...

Or you can store a snapshot of your product's API like RefactorIt does. Boy, what if you didn't create this snapshot in the first place and now you badly need one? "Sorry partner - no can do for ya..."

Anyways, the idea was on the surface as usual:
In one way or another build a list of all public (protected/private/static/etc.) methods in your software. You can use grep or javap along with grep for these purposes. Don't forget to do this for both versions of your application - it's about the difference, right? :-)
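
One possible way to build such lists, sketched in Python. It assumes you have already saved the text printed by javap for each class, and that the text follows the usual javap layout (a type header line ending in "{", one member per line ending in ";"). The function names and the demo.Greeter sample are made up for illustration.

```python
import re

# Matches the type name in a javap header line such as
# "public class demo.Greeter {"
CLASS_RE = re.compile(r"\b(?:class|interface|enum)\s+([\w.$]+)")


def api_lines(javap_text):
    """Turn javap output into 'TypeName:member signature' lines."""
    lines, current = [], None
    for raw in javap_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("Compiled from") or line == "}":
            continue
        if line.endswith("{"):
            match = CLASS_RE.search(line)
            current = match.group(1) if match else None
        elif current and line.endswith(";"):
            lines.append(f"{current}:{line.rstrip(';')}")
    return lines


def added_api(old_lines, new_lines):
    """Members present in the newer list but not in the older one."""
    return sorted(set(new_lines) - set(old_lines))


old = api_lines('''Compiled from "Greeter.java"
public class demo.Greeter {
  public demo.Greeter();
  public java.lang.String greet(java.lang.String);
}''')
new = api_lines('''Compiled from "Greeter.java"
public class demo.Greeter {
  public demo.Greeter();
  public java.lang.String greet(java.lang.String);
  public java.lang.String greet(java.lang.String, java.util.Locale);
}''')
print(added_api(old, new))
```

Each produced line has the "name:method signature" shape, so the lists can also be written to files and compared with any diff tool.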

Now that you have these API lists, you just need to compare them. And again it's up to you: you can use the Unix diff command, Emacs ediff, or vimdiff. I wrote this self-explanatory Perl script (feel free to modify it, criticize it, or merely throw it away). Fix open_file()'s loop if your API lists weren't created in the form of "filename:method signature" per line...

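#!/usr/bin/perl
# Compare two API lists and print what is new in the second one.
# Each input line is expected in the form "filename:method signature".
use strict;
use warnings;

if (@ARGV != 2) {
    print "Please supply two lists of APIs you want to compare\n";
    exit 1;
}

my %diff  = ();
my $left  = open_file($ARGV[0]);  # Presumably, the earlier version
my $right = open_file($ARGV[1]);  # Presumably, the latest version

foreach my $file (keys %$right) {
    if (!exists $left->{$file}) {
        # The whole file is new: report all of its methods
        $diff{$file} = $right->{$file};
    } else {
        my @l_stuff = @{ $left->{$file} };
        foreach my $r (@{ $right->{$file} }) {
            # Keep only the signatures missing from the earlier version
            push @{ $diff{$file} }, $r unless is_there($r, @l_stuff);
        }
    }
}

foreach my $d (sort keys %diff) {
    print "$d\n";
    print "   $_\n" for @{ $diff{$d} };
}

#=========== Some subs
# Returns 1 if $search is an element of @list, 0 otherwise
sub is_there {
    my ($search, @list) = @_;

    for (@list) {
        return 1 if $_ eq $search;
    }
    return 0;
}

# Reads an API list and returns a reference to a hash that maps
# each file name to an array of its method signatures
sub open_file {
    my ($name) = @_;
    open(my $fh, '<', $name)
        or die "Can't open specified file $name: $!\n";
    my %ret = ();

    while (my $line = <$fh>) {
        chomp $line;
        # Split on the first ':' only
        my ($key, $value) = split(/:/, $line, 2);
        next unless defined $value;
        push @{ $ret{$key} }, $value;
    }
    close $fh;
    return \%ret;
}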

Of course, there's always an expert opinion. Or you can ask the author of an application, or eye-roll it yourself. But isn't it much simpler to load your computer with the stuff it can do better than a human being?

So, I'll let you carry on from here...

CU,
Cos

Java. Quality. Metrics (part 6)

Posted by cos on March 15, 2006 at 05:20 PM | Permalink | Comments (0)

Hi there.

In this short article I'll try to summarize what I was discussing for the last couple of months.

So, let's briefly list the key factors that are likely to affect our judgment of software quality.
  • our code quality expectations (good enough quality, remember?)
  • coverage isn't everything
  • code complexity and the frequency of changes
  • number of bugs filed against source code modules/files
  • testing methodologies

Although the last one doesn't sound like a beast, it can reduce the defect discovery rate a lot. Above all, it is about the choice of approach to test failure analysis: a bulky one with weak false-positive detection pisses off engineers, and they begin ignoring possibly important warnings and notifications.

Anyways, I want to talk about a combination of the first three bullets above.

About a year ago, a few of Sun's fellas were chatting about simpler ways of delivering better code. Static analysis and a variety of testing approaches were among the things on the table. At some point, the bright idea of mixing both of those and adding some other flavors appeared. Afterwards, we came up with what was called Buggy Spots Prediction (BSP).

The idea itself is as follows:
  • we create a static call graph (CFG) for any given source code, using commercial or home-grown tools
  • having this, we can calculate a few things about this graph, e.g. the frequency of calls to any particular method; the frequency of calls from a method to other methods within the code; the basic-block based complexity of the code, etc. (currently, we calculate about five to seven of them, e.g. coverage per function, basic blocks per function, et cetera)
  • when executing tests against the instrumented build, we can prepare a code coverage metric for it
  • combine these two lists of modules - from the CFG and from the code coverage runs - by module names
  • sort the resulting list ascending by in-call frequency and descending by coverage scores
  • let's assume that the most frequently called functions are, perhaps, the most important from the quality standpoint. Well, their code is called more often, so any problems will immediately affect top-level or at least quite important functionality.
  • if such methods have low coverage numbers and high complexity or a high number of reported bugs, then it might be a good indication that the code has to be targeted by quality engineers and/or developers.
So, all that gives you a way of quickly selecting possibly buggy spots, i.e. the pieces of the source code which are likely to become the root of defects found by your customers. Why? Well, simply because the coverage is low in these areas and the existing tests don't guarantee an acceptable level of quality.
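
The list above can be sketched in Python as follows. This is not the actual BSP implementation, only an illustration under assumptions: the input dictionaries, the 0.5 coverage threshold, the method names, and the numbers are all made up, and the sketch ranks the busiest, least covered methods first, so the sort direction here is a choice made for the example.

```python
from dataclasses import dataclass


@dataclass
class MethodMetrics:
    name: str
    in_calls: int     # call sites targeting the method (from the call graph)
    coverage: float   # fraction of the method exercised by tests, 0.0..1.0
    complexity: int   # e.g. number of basic blocks


def combine(call_graph, coverage):
    """Join the call graph report and the coverage report by method
    name; methods missing from the coverage report are skipped."""
    merged = []
    for name, (in_calls, complexity) in call_graph.items():
        if name in coverage:
            merged.append(
                MethodMetrics(name, in_calls, coverage[name], complexity))
    return merged


def buggy_spots(metrics, max_coverage=0.5, top=3):
    """Frequently called methods whose coverage is at or below
    max_coverage, busiest first; ties go to lower coverage, then to
    higher complexity."""
    suspects = [m for m in metrics if m.coverage <= max_coverage]
    suspects.sort(key=lambda m: (-m.in_calls, m.coverage, -m.complexity))
    return suspects[:top]


# Hypothetical reports: name -> (in-call frequency, basic blocks)
call_graph = {
    "Parser.parse": (120, 40),
    "Cache.get": (300, 8),
    "Report.render": (15, 55),
    "Util.trim": (500, 3),
}
# Hypothetical coverage report: name -> covered fraction
coverage = {
    "Parser.parse": 0.35,
    "Cache.get": 0.90,
    "Report.render": 0.10,
    "Util.trim": 0.45,
}

spots = buggy_spots(combine(call_graph, coverage))
print([m.name for m in spots])
```

A number of reported bugs per method could be added as one more field and one more sort key in the same way.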

Like any heuristic approach, this one might produce incorrect results. However, our preliminary predictions are quite consistent with the fact that most externally reported defects were found in poorly covered but frequently called methods.

In an organization with limited QE resources, a manager might want to address such hot spots first. This will help achieve a good-enough quality level before concentrating on less important issues.

Yet another benefit is that the technique is language independent. Once you build a universal representation for the CFG and the code coverage information, you can use the same engine to measure Java, C++, and programs written in other languages.

And, of course, our methodology doesn't replace human knowledge of the importance of product features. It helps engineers see a valuable projection of static-to-runtime boundaries and helps them focus on some aspects of that complex matter.

And I just want to remind you about the Project Mustang (Java6) Regressions Challenge. Please check here for more details.

CU,
Cos

Java. Quality. Metrics (part 5)

Posted by cos on December 23, 2005 at 05:55 PM | Permalink | Comments (4)

Hi there again.

Getting back to my favorite topic, quality of life... I mean that pseudo-life we are all trying to make. And if there's Something that once created all of us and everything else around, it did a way better job :-( But I think we deserve some credit too: we don't have all the time in eternity to finish our job by trial and error. An average person lives about 70 years, say 30 of which are spent in diapers, at school, or on medication in front of a TV purchased with retirement money. That leaves us with roughly 40 years, split between personal and professional life (change the order if you want to :-)

All my professional life is literally coupled with computers (I have six of them at home, including an old Sun SparcClassic LX and not including my Palm Pilot). And as more and more software comes into our lives, I'm wondering if I can rely on it and where the limits of this trust are. Also, as a participant in the Java software development cycle, I want to do a better job myself and help my colleagues do the same: namely, spend less time developing more robust software.

This brought me to the point of defining quality indicators. What are the right criteria to determine whether a piece of software will break in customers' hands? And again, what is the right target to aim our testing efforts at, and are those efforts efficient?

After some consideration we humbly came up with the following list:
  • code complexity? Yes, this is the beast. If you have methods as long as 3+ screens, it's bad. If your class hierarchy contains 5+ levels, it's not good either. Does it smell right if you have 300+ lines of code per class?
  • frequent code changes? Or even changes per feature or bug? Yeah, perhaps. If you have about 5 commits per bugfix, it doesn't sound right at all. However, I can imagine some other considerations.
  • poor code coverage? Yes, that's one of them. But even good coverage numbers can't save you from trouble. It isn't a panacea.
  • design and coding styles? Well, maybe. Stylish code is easier to read and understand, but not necessarily to maintain. And while coding style is easy to check, design flaws might not be that easy to find and eliminate. Patterns help here in particular, but you have to know where the fine line lies between a good design solution and a simple yet sufficient implementation
  • code duplication? Not in the sense of reuse, but in the sense of overusing the copy-n-paste "technique". Yes, ugly enough. As one of my readers pointed out: if a bug is introduced in code that is later copied a few times, you end up with multiple copies of the same bug hidden all over the place :-( There are a number of automated ways to find and fix these rotten spots. One of them, named PMD, was mentioned by another reader of the article and can be found here
Thanks to all of you sending comments or expressing interest otherwise. If you can think of or know more of these from your practice, please send them in and I'll gladly add any reasonable ones here.
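A couple of the indicators in the list are cheap to collect mechanically. Here is a minimal sketch in Java, assuming the rule-of-thumb numbers from the list (5+ hierarchy levels, 300+ lines per class) as thresholds. The class name, the method names, and the choice to count Object as one level are my own assumptions for illustration, not part of any existing tool.

```java
import java.util.ArrayList;

// Hypothetical helper: thresholds are the rule-of-thumb numbers from the list above.
class ComplexityIndicators {
    static final int MAX_HIERARCHY_DEPTH = 5;   // "5+ levels"
    static final int MAX_LINES_PER_CLASS = 300; // "300+ lines of code per class"

    // Counts the classes in the inheritance chain, including the class itself
    // and java.lang.Object (an assumed counting convention).
    static int hierarchyDepth(Class<?> type) {
        int depth = 0;
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            depth++;
        }
        return depth;
    }

    // Counts non-blank lines in a piece of source text.
    static int nonBlankLines(String source) {
        int count = 0;
        for (String line : source.split("\n")) {
            if (!line.trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }

    // True when either threshold is reached.
    static boolean looksSuspicious(Class<?> type, String source) {
        return hierarchyDepth(type) >= MAX_HIERARCHY_DEPTH
                || nonBlankLines(source) >= MAX_LINES_PER_CLASS;
    }

    public static void main(String[] args) {
        System.out.println("ArrayList depth: " + hierarchyDepth(ArrayList.class));
        System.out.println("suspicious: " + looksSuspicious(ArrayList.class, "class A {\n}\n"));
    }
}
```

The numbers are only a starting point; every team will probably want to tune them to its own code base.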
Well, assume you now have data for all of the above and then some. What will you deduce from it, and how?

Let's talk about this after I get back from that nice winter break. How about open sourcing some ideas? OK, I'll share (I swear!) the principles of the technology we've been working on for the last few months.

Best of everything to all of you and Happy New Year to everybody. Also, Merry Christmas to those who believe in Santa Claus, or whatever you call that dude who knew something about quality in the long term!

Cheers!
Cos

Preparing your tests

Posted by cos on December 19, 2005 at 12:54 AM | Permalink | Comments (0)

Hmm, I just realized that it's been a while since my last posting here. So, hi there! And for a change, this time it isn't about Java quality :-)

Last month I was pretty busy with our testbase build automation. It's no secret that our server VM testbase has a lot of native test code. As David mentioned recently, we have quite a handful of platforms to take care of, and it gets even more troublesome when you need to carefully lay out a lot of native bits.

So, here's some insider information, and please don't tell anyone where you got it: we need to build the native pieces for as many as eight platforms (I might've missed one or two), including Solaris/Linux/Windows on AMD64, of course. Most of the native pieces get built on separate machines, e.g. you can't build Solaris for SPARC and Solaris for x86 on the same box, can ya? Well, technically speaking, you can cross-compile in most cases, but we aren't that limited in hardware resources. Anyway, doing this kind of drill manually is a boring, not cost-effective effort, and one can definitely spend all that time on something more interesting.

So, we decided to automate our build process. The whole thing was divided into two pretty isolated pieces: writing an Ant-managed build mechanism and setting up a tiny run-n-watch framework to control as many as 6-8 remote processes, spawned in a certain order. The Ant part was taken care of by our genius engineers (sic! I mean it) in the Saint-Petersburg office, and I handled the relatively simple execution part.

We use two different distributed execution systems in our daily business: DTF and Grid Engine. The latter takes care of the Unix platforms and the former handles Windows, because it's pure Java and doesn't care what hardware it's running on. As a result, I had to create two different sets of starters and watchdogs to manage our build processes in slightly different ways.

And I was kind of surprised that I could write distributed watchdogs using nothing but the Bourne shell (yes, as a matter of fact, you can actually program in it and create clear and maintainable code. Wow!). Of course, there's no inter-process communication or multi-threading. I didn't use anything fancier than file locking (I know, I know, I hate this myself). But it was sort of fun to do. Overall, the watchdog system is quite simple:

- there's a single driver script, which knows everything about the build logic and sequences and takes care of any hangs or timeout issues
- it sends a set of smaller platform-specific scripts to the Grid'ed and DTF'ed machines
- it sets up a number of watchdog monitors to trace build progress on every platform
- platform-specific build drivers do the actual Ant build and send notifications to the central Tinderbox-like monitor. This makes it easier for a human to keep a visual eye on the process.
- the top-level script spins in a big loop, waiting for all processes to finish
- once the build has completed successfully, the main driver makes all necessary adjustments for the fresh bits and runs baseline testing against this new testbase. Voilà!
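The "big loop" of the top-level driver can be sketched roughly as follows. To be clear, this is not the Bourne shell code described above: it is a small Java illustration that swaps file locking for marker files, assuming each platform build drops a hypothetical "<platform>.done" file into a shared status directory when it finishes. All names here are made up for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical watchdog: a platform is "finished" once <platform>.done exists.
class BuildWatchdog {
    // Polls until every platform is done or the timeout expires.
    // Returns the platforms still pending (an empty set means success).
    static Set<String> waitForBuilds(Path statusDir, List<String> platforms,
                                     long timeoutMillis, long pollMillis) {
        Set<String> pending = new LinkedHashSet<>(platforms);
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            pending.removeIf(p -> Files.exists(statusDir.resolve(p + ".done")));
            if (pending.isEmpty() || System.currentTimeMillis() >= deadline) {
                return pending;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop waiting if interrupted
                return pending;
            }
        }
    }

    // Demo helper: marks one platform as finished.
    static void markDone(Path statusDir, String platform) {
        try {
            Files.createFile(statusDir.resolve(platform + ".done"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demo helper: creates a scratch status directory.
    static Path newStatusDir() {
        try {
            return Files.createTempDirectory("build-status");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path dir = newStatusDir();
        markDone(dir, "solaris-sparc"); // pretend one build has finished
        Set<String> stuck = waitForBuilds(dir,
                Arrays.asList("solaris-sparc", "linux-amd64"), 300, 50);
        System.out.println("Timed out waiting for: " + stuck);
    }
}
```

Whatever is left in the returned set is what a human (or the Tinderbox-like monitor) should look at.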

That was pretty much my first real experience of using Grid, and it unveiled some pitfalls, perhaps (or it may be me knowing just too little about this system). For instance, you can't rely on relative paths in your application, because Grid copies a submitted shell script somewhere else and executes it from there. So what was `dirname $0`/../logs/mylog when you were developing your script will lead nowhere.
Two other things. The first was setting up 32- and 64-bit environments on the Windows platform from a shell script: it took me a while to carefully write a shell equivalent of the SetEnv.Cmd batch file that Visual Studio supplies.
The second: I had to find a way around the fact that the Linux automounter sometimes has trouble mounting NFS auto-maps.

However, on a pleasant note: I spent a good chunk of my professional time between 2001 and 2004 developing DTF, and now it's paying back :-) by nicely doing some legwork for me.

So, yesterday I was able to build the new Server VM testbase out of the box without much involvement on my part (I still had to make a couple of quick fixes, simply because I was too lazy to do any dry runs beforehand). Hopefully, next time it will cost me no more than a single keystroke to start that main driver script.

What's the conclusion, you might ask? Hmm, I don't know. I just liked the stuff and the way of getting complex things done with a set of simple techniques and tricks :-)

Hope to see y'all soon again,
Cos

Java. Quality. Metrics (part 4)

Posted by cos on November 16, 2005 at 01:26 PM | Permalink | Comments (4)

How are you doing, everyone? I hope you're OK.

Lately, I was organizing and attending an interesting event called Java Days at Saint-Petersburg University. For those who are unaware: it's not about Saint Petersburg, FL, USA (you all know that people in Florida can't count at all, don't you :-)?). It's about Saint-Petersburg, Russia, the former capital of the country and the most beautiful city I have ever seen in my life. Check out this or this, or this one, or that picture some time, and you'll believe me. BTW, they were created by one very lovely and very creative young lady, my daughter Dasha (I know she would hate me for that reference :-).

Anyway, let me get back. As for the event, I liked it. I had a chance to meet all these young and very bright folks from the Department of Computer Science and Software Engineering; it was an opportunity to be challenged by their questions and all that jazz; and I had a chance to reunite with my first professor, Mr. Terekhov, who leads his own research Institute of Information Technologies. It's so cool to see how the IT sector in Russia is moving forward today. Big thanks for that to all the folks who are teaching new scientists and programmers up there.
Ah, well, that would be a topic for another flame some other day. I have to write about what you all come here for: software quality.

So, we can use a few methods to ensure our product quality and guarantee that we're at least digging in the right direction.

Another one, which I was about to mention, is code coverage. Being quite a simple thing, it can give you a rough understanding of where you are at this moment of product and quality development. To make the story shorter: you can instrument your code to report every execution of its methods (you might want finer granularity, but let's not talk about that now). Instrumentation of native code can be tricky sometimes, and it requires creating new binaries for code coverage measurements. Instrumenting Java code isn't that hard and can be accomplished "on the fly". Depending on the framework, the instrumented code's output might be as simple as follows:
Method getListHeadPtr() is began
Method getListHeadPtr() is completed
or as weird as this
CLASS: foo/bar/Shift$Stuff []
SRCFILE: Shiftstuff.java
TIMESTAMP: 1131040951709
DATA: C
#kind	start	end		count
METHOD: ()V [private]
1	62464	0		2
3	62464	0		2
METHOD: (Lfoo/bar/Shiftstuff$1;)V []
1	62464	0		2
3	62464	0		2
Then you can process this output in some manner and create a variety of reports from the data. The only reason to have them is to estimate how well the source code is covered by your tests. In other words, you can roughly estimate the amount of exercise you're giving your code. There's no special magic behind it. It's one of the primitive quantitative measures of quality control. Well, it's not as primitive as counting kilo-lines of code or similar, but nevertheless.

Different organizations have different standards on this matter, though. Perhaps, say, 70% would sound reasonable from some "common sense" point of view, wouldn't it? I'd say yes, it is a really sound figure. Would you expect to see 90 or even 100%? It sounds really cool, right? Well, it depends. Just one interesting observation: after about 65-75% of code coverage, the effort needed to raise it any further grows almost asymptotically. What I mean is that one can't really get 100% of the source code covered by tests. Or at least not within a reasonable time/money frame. In other words, you can do it with an ROI close to zero, or perhaps even negative.
And following the principle of reasonable quality, you might not want to aim for marks that high. A level of 80% or even 70% might be quite sufficient. But how would you know? Don't ask me, because the answer lies somewhere beyond this technology's boundaries. I will talk about this next time.
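To make the "process this output" step a bit more tangible, here is a minimal Java sketch of the simplest possible report: the percentage of methods that were executed at least once. It assumes the per-method execution counts have already been parsed out of the instrumented output into a map; the class name, the method names in the example, and the map layout are hypothetical, and real coverage tools usually work at finer granularity.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical report: method-level coverage from per-method execution counts.
class MethodCoverage {
    // Percentage of methods whose execution count is greater than zero.
    static double percentCovered(Map<String, Integer> hitsByMethod) {
        if (hitsByMethod.isEmpty()) {
            return 0.0; // nothing instrumented, nothing covered
        }
        long covered = hitsByMethod.values().stream().filter(h -> h > 0).count();
        return 100.0 * covered / hitsByMethod.size();
    }

    public static void main(String[] args) {
        Map<String, Integer> hits = new LinkedHashMap<>();
        hits.put("getListHeadPtr", 2); // executed twice
        hits.put("append", 5);
        hits.put("clear", 1);
        hits.put("neverCalled", 0);    // instrumented but never executed
        System.out.println("Method coverage: " + percentCovered(hits) + "%");
    }
}
```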

Yet another approach to making improvements in this field is static analysis. This again-popular thing has lately been mixed up with pattern analysis (like what many modern IDEs or the standalone FindBugs application do). But I'm talking about the old-fashioned kind, dealing with control and data flow analysis. Here's a list of such tools. However, I'm not suggesting you use them or anything. I just googled this stuff.

So, using this complicated yet powerful technique, one can do many interesting things. E.g. you can find dead spots in your code. Good one, right? It helps keep the code free of methods made "...just for future use..." and never used at all. Thus, it increases the purity of the code and leaves fewer spots for bugs to hide.
Or you can analyze which parts of your code are likely to get the most attention, because a lot of other places in the code use or call them. So you might want to pay more attention to these spots, simply to ensure they are bug-free.
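Both ideas boil down to simple questions about a call graph. Below is a minimal Java sketch, assuming the graph has already been extracted somehow and is given as a map from a method name to the methods it calls. The class, the sample graph, and the method names are hypothetical; a real tool would build the graph from bytecode or source and would have to deal with reflection, virtual dispatch, and so on.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical call-graph queries: dead spots and heavily used spots.
class CallGraphAnalysis {
    // Methods that cannot be reached from any entry point: dead-code candidates.
    static Set<String> unreachable(Map<String, List<String>> calls,
                                   Collection<String> entryPoints) {
        Set<String> seen = new HashSet<>();
        Deque<String> work = new ArrayDeque<>(entryPoints);
        while (!work.isEmpty()) {
            String method = work.pop();
            if (seen.add(method)) {
                work.addAll(calls.getOrDefault(method, Collections.<String>emptyList()));
            }
        }
        Set<String> dead = new TreeSet<>(calls.keySet());
        calls.values().forEach(dead::addAll); // include methods that only appear as callees
        dead.removeAll(seen);
        return dead;
    }

    // Number of distinct callers per method: a high count marks a hot spot.
    static Map<String, Integer> callerCounts(Map<String, List<String>> calls) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> callees : calls.values()) {
            for (String callee : new HashSet<>(callees)) {
                counts.merge(callee, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Made-up graph for the demo: futureUse() is never called by anyone.
    static Map<String, List<String>> sampleGraph() {
        Map<String, List<String>> calls = new HashMap<>();
        calls.put("main", Arrays.asList("parse", "render"));
        calls.put("parse", Arrays.asList("log"));
        calls.put("render", Arrays.asList("log"));
        calls.put("futureUse", Arrays.asList("log"));
        return calls;
    }

    public static void main(String[] args) {
        Map<String, List<String>> calls = sampleGraph();
        System.out.println("Dead: " + unreachable(calls, Arrays.asList("main")));
        System.out.println("Callers: " + callerCounts(calls));
    }
}
```

Note that a dead method still counts as a caller here; whether to exclude it from the hot-spot numbers is a design choice.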

And obviously you can't do much without some special tools. Just a generic control flow graph generator won't help you: what would you do with a graph of 40+ thousand nodes? Print it all out and wrap it around your office? Neat, isn't it? But you might want to do something a little more useful, like analyzing the parity of memory allocation/deallocation operations, opening/closing of streams, etc. Unfortunately, I wasn't able to find any free tools to suit such needs. And I don't want to do any commercials here. So, be my guest and find one if needed (and please drop me a note, will you?). I believe that notable software companies have to invest in their internal tools no matter what. It's simply not possible to buy some stuff off the shelf.

Kinda hint again: I'll continue on this topic next time. Stay tuned...

Cos

Java. Quality. Metrics (part 3)

Posted by cos on October 21, 2005 at 11:34 PM | Permalink | Comments (3)

Hello everyone!

I'm writing this sitting in Lufthansa's 747 (boy, I've seen better planes in my life), heading towards Frankfurt. There I will change planes and fly for another two and a half hours to Saint-Petersburg, Russia, where I'll give a lecture at the Saint-Petersburg State University. And guess what this lecture will be about? Right, it's about Java quality again :-) (I'll write about this event as soon as I'm flying back two weeks from now). And I have to say that electronic technology is great, and it's advancing much faster than all these last-century wonders like airplanes and stuff. Could you imagine that my 15” laptop doesn't fit between the seats? I have to type these things sitting in a really awkward position :-( (I wish I had one of those tiny yet cool X40s). And at the same time I'm online! Wow!

Speaking of quality: sitting here, I got the idea that “quality” isn't about something being good for everyone. It's more about a reasonably acceptable level of things and services. It's like these two entries on the flight's menu: both are “reasonably” good. (Well, I guess so, because I didn't dare taste the chicken :-( ). Perhaps someone is flying first class and having caviar right this moment. But when I realize how much they are paying for this quality, I start thinking that I'd rather be a reasonably cheap passenger, accepting a reasonable quality of public air transportation.

Did you start wondering what I'm up to? Well, I'm on about software testing again.
Let's see: are your bosses willing to pay for finding all the bugs in the software packages your company makes? Really? Well, then you're working at Microsoft and your QA department's budget is really fat (although it isn't helping much, is it? :-) As the rest of us are living real lives, more or less, we have to balance the cost of hunting bugs down against their harm to our business. No, really, don't consider me too radical or cynical. Let's just be real.

Will a bug in one of those debugging modules hurt your business and relations with customers much? Naah, I don't think so. Will a dumb typo in a help message cause a lot of trouble? Probably not. But what would your client think when they notice it? They might not tell you, but I can. It'd be something like this: “Hmm, what other bugs does this POS (piece of software :-)) have that are not so harmless and obvious? Shall I invest in this platform any more?” Then just compare all the trouble of discovering and writing tests for that debugging module's bug with the simplicity of killing the bug in the help messages. You get the point, right?

Of course, I'm intentionally simplifying the picture. It might not be that easy to prioritize one bug over another. So, let's discuss a few techniques you might find helpful in making such a decision. Of course, they are pretty much empirical and you'll have to decide how confident you are about them.

  • Firstly, there is prioritization of source code by its scope of visibility. The idea is that private code is unlikely to have much effect on what customers see. And, likewise, a public method is more likely to affect some important functionality. Well, it's arguable, but it might be a fair-enough starting point, especially in the library-writing business.
    Actually, this is what unit testing is about, I believe. Due to the nature of the unit testing approach (for Java at least), it focuses on testing public methods. And if you ever want to test a private method: hello, isn't it time to change its scope of visibility and make it public? As you can see, this technique has its benefits and shortcomings
  • Secondly... Hey, wait. We're approaching our landing site and they've asked us to turn off all electronic equipment. Since I want to land safely and the battery is running dry anyway, I'll post this small article now and continue this interesting topic afterwards
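The visibility heuristic from the first bullet is easy to try out with plain Java reflection. Here is a minimal sketch: it lists the public methods a class declares, which under this heuristic would be the first candidates for tests. The class names and the tiny Sample class are made up for illustration; only the reflection calls (getDeclaredMethods, Modifier.isPublic) are standard Java.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical prioritization: public methods first, private ones later.
class VisibilityPriority {
    // Names of the public methods declared by the class itself, sorted alphabetically.
    static List<String> publicMethodNames(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Method m : type.getDeclaredMethods()) {
            // Skip compiler-generated (synthetic) methods, keep only public ones.
            if (Modifier.isPublic(m.getModifiers()) && !m.isSynthetic()) {
                names.add(m.getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    // Made-up class for the demo: two public methods and one private helper.
    static class Sample {
        public int area(int w, int h) { return multiply(w, h); }
        public String label() { return "sample"; }
        private int multiply(int a, int b) { return a * b; }
    }

    public static void main(String[] args) {
        System.out.println("Test these first: " + publicMethodNames(Sample.class));
    }
}
```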
Kinda hint: Deutsche bier – this is what I call a reasonably great quality :-)

Auf Wiedersehen,
Cos

Java quality's open-source tools

Posted by cos on October 04, 2005 at 03:26 PM | Permalink | Comments (1)

Hello again.

Since I posted my first article here, I've been asked a number of times: "Why are there no open source tools for the quality process? Are there any?"

Being lazy enough, I decided to reply just once in this public forum. OK, here's the answer:
there are a number of open source tools for neatly automating one's software quality process.

    To schedule and execute jobs in a heterogeneous environment, you might want to use
  • Sun's donated project RIO, or
  • Sun's open-sourced Grid Engine
    instead of the Distributed Tasks Framework we wrote.

    And to run the actual tests
  • the Tonga harness can be replaced with JTiger or TestNG
I encourage you to look around and find the tool that fits your needs in the best way possible. Enjoy,

Yours,
Cos

Java. Quality. Metrics (part 2)

Posted by cos on September 30, 2005 at 09:05 PM | Permalink | Comments (0)

Hi there!

Surprisingly, my last post was ranked #1 by Google for the 'java quality' search and stayed in that position for a few days. My friends were wondering how much I had paid to gain this honorable position. Honestly, I didn't pay a penny for it, and I only have to thank those of you who spent time reading it. So, thank you! I also hope not to disappoint you this time.

Moving closer to the promises given in this blog, I will discuss some not totally innovative but still interesting techniques for improving the effectiveness of quality development.

Firstly, I'd like to talk about code coverage methodology (sometimes referred to as test coverage). It is a very efficient quality indicator, yet relatively simple to collect and analyze. I will not spend much time describing this technique here, considering the variety of good information sources. If you want to educate yourself on the topic, you might want to check a paper like this

One of the common misunderstandings about this interesting technology is treating coverage numbers as the final point of your quality development; they are rather the starting one.

I mentioned it once and I would like to emphasize it again: code coverage merely shows the amount of tested code. However, it addresses neither the quality of the test coverage nor its effectiveness; it doesn't guarantee that the most important pieces of the source code are covered. Nevertheless, code coverage is a valuable measure, and I'm not in a position to 'misunderestimate (C)' it.

Gathering coverage data in a pure Java environment is quite a straightforward task, and there's a variety of tools for the job (some time I'll talk about the mixed case, i.e. where Java code coexists with native code). For now, you might want to check this

However, you still have to take care of your build's instrumentation and of storage for all the information produced. And that can be a very significant chunk of data sometimes. Ideally, you might want to use a database server, which seems to do the job right.

Processing the collected data and visualizing the results might not be that easy at times and would require surrounding frameworks and/or infrastructure.
    Here's the approach we've been using in-house to achieve the above goals.
  • once a month, release engineering prepares an instrumented JDK build for coverage testing (it might not be required in your particular case, but, of course, I don't know your situation)
  • the components' quality engineering teams run their test sets and gather results, which can be collected in central storage during the execution stage or right after it
  • collected results can later be visualized on demand through a Web interface or another device. It is really helpful, even for manual code inspection, to present a product's source code in different colors, e.g. red for non-covered areas of code and green otherwise. Having execution counters associated with the test(s) per basic block of code might be very handy as well. Also, you might consider showing version control data, i.e. check-in information pointing to the latest code update or something. I hope you've already got the idea. However, let me illustrate my point with this tiny snapshot (a friend of mine, Roman Shaposhnick, created this tool a while ago for Sun's compilers work. After some minor modifications it was applied to Java quality as well):
    ExplorerPanel.java
    org/openide/explorer/ExplorerPanel.java code coverage
    1.2 psk   1:          private final class PropL extends Object implements PropertyChangeListener {
              2: 12728 ->     PropL() {}
    1.1 ma    3:
    1.1 ma    4:              public void propertyChange(PropertyChangeEvent evt) {
              5: 12728 ->         if(evt.getSource() != manager) {
              6:                      return;
              7:                  }
    1.3 rvs   8:
              9: 12728 ->         if (ExplorerManager.PROP_EXPLORED_CONTEXT.equals(evt.getPropertyName())) {
             10: 12728 ->             updateTitle ();
    1.1 ma   11:                      return;
    1.1 ma   12:                  }
    1.1 ma   13:              }
    1.1 ma   14:          }
  • coverage data is used for managerial reports, exit criteria preparation, et cetera
  • coverage statistics are used in the process of preparing quality metrics. Metric subsystems work with the coverage storage facility through some kind of API, e.g. JDBC
And please keep in mind: most of the quantitative metrics of quality development are about trends in the first place. So, instead of measuring code coverage once per release, you might find it useful to collect data on a regular basis and track the trend of coverage improvements, or otherwise :-)

Sorta disclaimer, though: my opinion might appear arguable. Well, leave me a comment if you disagree with it and I'll try to address your concern in my future posts. See you soon!
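One simple way to turn regular measurements into a trend is a least-squares slope over the series, which is less jumpy than comparing only the last two numbers. The Java sketch below assumes one overall coverage percentage per measurement, taken at equal intervals; the class name and the sample numbers are made up, and this is just one possible way to define "trend", not a description of our in-house metrics.

```java
// Hypothetical trend calculation over a series of coverage measurements.
class CoverageTrend {
    // Least-squares slope: average change in percentage points per measurement.
    static double slope(double[] coverageByRun) {
        int n = coverageByRun.length;
        if (n < 2) {
            return 0.0; // a single point has no trend
        }
        double meanX = (n - 1) / 2.0;
        double meanY = 0.0;
        for (double y : coverageByRun) {
            meanY += y / n;
        }
        double numerator = 0.0;
        double denominator = 0.0;
        for (int i = 0; i < n; i++) {
            numerator += (i - meanX) * (coverageByRun[i] - meanY);
            denominator += (i - meanX) * (i - meanX);
        }
        return numerator / denominator;
    }

    public static void main(String[] args) {
        double[] monthly = {60.0, 62.0, 61.0, 65.0}; // made-up monthly numbers
        System.out.println("Trend per month: " + slope(monthly));
    }
}
```

A positive slope means coverage is generally improving; a negative one is the "or otherwise" case.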

Java. Quality. Metrics

Posted by cos on September 23, 2005 at 09:16 PM | Permalink | Comments (3)

It is, perhaps, no secret that software test development and quality are like a snowball rolling down a hill: the further it gets down the slope, the harder it is to stop... and think. Think about what was done right, what was missed, and how I could do this better, if only I had another chance.

We have all heard (or know from real experience) about a number of testing types: functional, unit, white box and black box, and so on, and so on. But what really drives most test development efforts? OK, OK, I know: everyone wants to find that ugly bug sitting next to the last one :-) But how can we know if that bug is uglier than the others? What are the criteria for this? What tools have to be brought into the process? What additional testing techniques do we need to introduce? And last, but not least: where do the efforts of test development engineers have to be directed to reduce the cost of quality, e.g. to find more bugs at the earlier stages of development?

Now, multiply most of the above by the number of platforms your product runs on, and you will start seeing a picture similar to what we see ourselves in the Java Standard Edition Quality Organization.

In the following series of posts I plan to talk about these and some other issues of software testing and quality measurement. I will share our practices of static analysis (do not confuse it with the FindBugs application; static analysis is somewhat different from that technique), what stages of test development and test execution we have in the JavaSE production cycle, and what tools and techniques we use to increase ROI and free our engineers to do something more sophisticated and exciting than merely manual test execution.

I have to admit that I really like Test Driven Development (TDD). It sounds so cool to develop all your tests first and then simply make your software pass these tests. I wish all software quality problems were that sound and could be solved that clearly. Unfortunately, it isn't so. There are a lot of issues that can't be foreseen and framed in tests in advance. And the more complex a system becomes, the less efficient this approach will be. As time and development go on, a quality department has to introduce more complicated methodologies and trickier techniques.

To be more specific, I'd like to quickly illustrate my point here: in Java Virtual Machine (VM) testing alone, we run more than a million test cases in a few different testing cycles, e.g. nightly, pre-integration, weekly, regression, et cetera. We do separate stress testing and BigApps (or real-world applications) testing. We support about two dozen (sic!) platforms. Did I mention that we walk dogs too?

Now, let me move to the point and talk about real things. No wonder that to control such a crowd you have to be really creative. And we are. We use some distributed execution environments. One of them is the home-grown Distributed Tasks Framework (DTF; patents pending), a 100% pure Java application based on Jini. Another one is Grid Engine from Sun Microsystems. The two applications are quite similar in functionality. However, Grid Engine is an officially supported product, so I gave up on further DTF maintenance. Now we use DTF just for scheduling and executing tasks on the Wintel platform

Another thing I'd like to mention here is the multi-platform test harness Tonga (patents pending), which supports distributed testing scenarios and can do a lot of other things. I hope to see this great product in open source some day. You can read more about test frameworks and test harnesses at my colleague David Herron's blog

And it is obviously important to have an efficient solution for test results analysis. An ideal system would have a low level of false positives, automated detection of regressions and known bugs, run-to-run comparison to find any trends, and many other qualities. Of course, a system like this relies on a bug (or issue) tracking system of some kind. For a generic approach you can pick one of the well-known systems like Bugzilla. However, larger companies often go with their own products. Some day I will talk more about the result analysis applications we're using in our process.
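The core of run-to-run comparison fits in a few lines. Here is a minimal Java sketch, assuming each run is reduced to a map from test name to pass/fail and the bug tracker can give us a set of tests with known open bugs. The class, the method, and the test names are hypothetical; this is not how our actual result analysis applications are built, just the idea in its simplest form.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical run-to-run comparison of test results.
class RunComparison {
    // Tests that passed in the previous run, fail now, and are not covered by a known bug.
    static List<String> newRegressions(Map<String, Boolean> previous,
                                       Map<String, Boolean> current,
                                       Set<String> knownFailures) {
        List<String> regressions = new ArrayList<>();
        for (Map.Entry<String, Boolean> entry : current.entrySet()) {
            String test = entry.getKey();
            boolean passedBefore = Boolean.TRUE.equals(previous.get(test));
            boolean failsNow = Boolean.FALSE.equals(entry.getValue());
            if (passedBefore && failsNow && !knownFailures.contains(test)) {
                regressions.add(test);
            }
        }
        Collections.sort(regressions);
        return regressions;
    }

    public static void main(String[] args) {
        Map<String, Boolean> nightly1 = new HashMap<>();
        nightly1.put("gc/Stress", true);
        nightly1.put("jit/Inline", true);
        nightly1.put("jni/Load", false);

        Map<String, Boolean> nightly2 = new HashMap<>();
        nightly2.put("gc/Stress", false);  // passed before, fails now
        nightly2.put("jit/Inline", true);
        nightly2.put("jni/Load", false);   // was already failing

        Set<String> known = new HashSet<>(); // no known bugs in this demo
        System.out.println("New regressions: " + newRegressions(nightly1, nightly2, known));
    }
}
```

Filtering out known failures is what keeps the false-positive level down: only failures nobody has filed yet reach a human.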

Measuring test effectiveness becomes a high-priority issue at a certain point of product development. And test coverage is one of the reasonable ways to measure it. Indeed, it is cool to see that your test coverage has increased by 12% since the last release. It makes you proud and confident in the quality of your product. However, it might not be enough to rely on such a metric alone. E.g. a product's latest release includes 27 new features. The overall size of the source code introduced is 270K lines. At the same time, the test coverage increased by 12% and reached 68% overall. Hmm, I wonder if this is good or bad? Or how good exactly is it? Shall we celebrate such an achievement or do something about it? So, test coverage isn't the only measure and shouldn't be treated as the panacea of software quality measurement. The situation just gets more complex when a product consists of native and Java source code.

On this optimistic note I will close today's post, so that something is left for further discussion.

Sorta disclaimer, though: this is my very first blogging experience ever, so bear with my over-enthusiasm and leave me a comment or suggestions of topics you would like to hear about or discuss further. See you soon!
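A bit of arithmetic shows why the headline number alone can't answer that question. The Java sketch below computes how many of the new lines could be covered, at the least and at the most, given only the overall figure. The 270K new lines and the 68% come from the example above; the size of the old code (1,000,000 lines) is purely my assumption for illustration, as are the class and method names.

```java
// Hypothetical bounds on new-code coverage derived from overall numbers only.
class NewCodeCoverageBounds {
    // Returns {min, max}: the fewest and the most NEW lines that can be covered
    // while still producing the given overall coverage percentage.
    static long[] coveredNewLineBounds(long oldLines, long newLines, int overallPercent) {
        long totalCovered = (oldLines + newLines) * overallPercent / 100;
        // Fewest new lines covered: as much coverage as possible sits in old code.
        long min = Math.max(0L, totalCovered - oldLines);
        // Most new lines covered: covered lines are counted in new code first.
        long max = Math.min(newLines, totalCovered);
        return new long[] {min, max};
    }

    public static void main(String[] args) {
        long oldLines = 1_000_000L; // assumed size of the previous release
        long newLines = 270_000L;   // from the example in the post
        long[] bounds = coveredNewLineBounds(oldLines, newLines, 68);
        System.out.println("Covered new lines: between " + bounds[0] + " and " + bounds[1]);
    }
}
```

Under that assumption the 270K new lines could be anywhere from completely untested to fully covered while the overall number still reads 68%, which is exactly why the overall figure needs company.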
