Posted by malcolmdavis on June 6, 2004 at 9:36 PM PDT

A certain degree of copying and pasting occurs in every development project. The problem is often overlooked or missed during code inspection. Simian helps stop some of this monkey business by searching through text files (such as Java and C# source) and identifying duplications.

To do some testing, I ran Simian against a recent project. It processed 250K+ lines of code (LOC) in 20 seconds and found 47,129 duplicate lines.

I guess you can expect some duplication in 250K+ LOC written by 10 developers in 10 months. Further investigation showed that many of the duplications were in auto-generated sections. [Of course, this makes you wonder about the code generation process.] Additionally, some of the duplicates found were just common patterns of variable usage inside routines.
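To make the idea of duplicate detection concrete, here is a minimal sketch of one way a tool could flag repeated blocks: slide a fixed-size window over the trimmed, non-blank lines and report every window whose text occurs more than once. This is my assumption about the general technique, not Simian's actual algorithm, and the names `ExactDuplicateSketch`, `findDuplicates`, and `window` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of exact-duplicate detection; not Simian's implementation.
public class ExactDuplicateSketch {

    /**
     * Maps each block of `window` consecutive non-blank lines that occurs
     * more than once to the 1-based line numbers where the block starts.
     */
    static Map<String, List<Integer>> findDuplicates(List<String> lines, int window) {
        // Trim every line and skip blank ones, remembering original line numbers.
        List<String> text = new ArrayList<>();
        List<Integer> lineNo = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String trimmed = lines.get(i).trim();
            if (!trimmed.isEmpty()) {
                text.add(trimmed);
                lineNo.add(i + 1);
            }
        }

        // Record where each window of lines starts, keyed by its exact text.
        Map<String, List<Integer>> seen = new LinkedHashMap<>();
        for (int i = 0; i + window <= text.size(); i++) {
            String key = String.join("\n", text.subList(i, i + window));
            seen.computeIfAbsent(key, k -> new ArrayList<>()).add(lineNo.get(i));
        }

        // Keep only the windows that appear at least twice.
        seen.values().removeIf(starts -> starts.size() < 2);
        return seen;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList(
            "int a = 1;", "int b = 2;", "", "foo();", "  int a = 1;", "int b = 2;");
        findDuplicates(source, 2).forEach((block, starts) ->
            System.out.println("Duplicated block starting at lines " + starts));
    }
}
```

A real tool would need a much larger window and smarter handling of generated code, but even this sketch shows why a single renamed variable is enough to hide a copy from an exact-match search.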

Simian seems to look for exact copies and not similarities. A more meaningful tool would be one that found algorithmic 'similarities' in code. A more common problem with copying and pasting is when a developer copies a section of code, then makes a minor modification. In the following example, the developer intended to have a common display format of the form identifier + description.

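class Member {
    // memberId and name are fields declared elsewhere in the class
    public String displayText() {
        return "[ " + memberId + " ] " + name;
    }
}

class Branch {
    // branchId and description are fields declared elsewhere in the class
    public String displayText() {
        return "[ " + branchId + " ] " + description;
    }
}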

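The developer copied the code and made a small modification. Because the two methods differ in their field names, a search for exact copies does not flag them. [Yes, this is a real-world example.]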
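One way a tool could catch this kind of near-duplicate is to normalize identifiers before comparing lines. The sketch below replaces every identifier that is not in a small keyword list with the placeholder `ID`, leaves string literals alone, and collapses whitespace, so the two `return` statements above come out identical. The class `NearDuplicateSketch`, the method `normalize`, and the partial keyword list are all hypothetical, and nothing here describes a feature I know Simian to have.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of similarity detection by identifier normalization.
public class NearDuplicateSketch {

    // Deliberately partial list of words that should survive normalization.
    private static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
        "return", "public", "private", "static", "class", "new",
        "if", "else", "for", "while", "void", "int", "String"));

    // Matches either a double-quoted string literal or an identifier.
    private static final Pattern TOKEN = Pattern.compile(
        "\"(?:\\\\.|[^\"\\\\])*\"|[A-Za-z_$][A-Za-z0-9_$]*");

    /** Replaces non-keyword identifiers with "ID" and collapses whitespace. */
    static String normalize(String line) {
        Matcher m = TOKEN.matcher(line);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String token = m.group();
            boolean keep = token.startsWith("\"") || KEYWORDS.contains(token);
            m.appendReplacement(out, Matcher.quoteReplacement(keep ? token : "ID"));
        }
        m.appendTail(out);
        // Note: this also collapses runs of whitespace inside string literals.
        return out.toString().replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String member = "return \"[ \" + memberId + \" ] \" + name;";
        String branch = "return \"[ \" + branchId + \" ] \" + description;";
        System.out.println(normalize(member));
        System.out.println(normalize(member).equals(normalize(branch)));
    }
}
```

Feeding normalized lines into an exact-match search would then report the Member and Branch methods as duplicates, at the cost of more false positives on short, generic statements.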

The concept is valid, and Simian does find many copy & paste problems. Simian helps automate the process by providing an Ant task. Additionally, there are plugins for Checkstyle and Eclipse. [I was able to get Simian to work in both Eclipse 2.1 and 3.x.] Simian can be found at [URL missing]. Unfortunately, Simian is neither Open Source nor free for commercial use. However, it is free for Open Source projects and inexpensive to purchase for commercial use.
