
Managing 800,000+ Lines of Code

Posted by elevy on July 20, 2009 at 7:54 PM PDT

Imagine you join an organization that has several applications adding up to more than 800,000 lines of code. There are defects everywhere, release after release, lots of developers cranking out code, and more defects after every release... How do you stop the spiral?

This might sound like a spiral of death, or a suicidal path perhaps ;)

Before I continue, I have to say that the number one requirement for succeeding in a situation like this is a strong leadership team. By a strong leadership team, I mean a team of managers who understand technology. "Professional Managers" who are not familiar with technology will not be enough.

That being said, let's go back to how to stop the spiral:

1) Make sure you have some basic tools in place (e.g. a version control system, an automated build script, etc.)

I had to include this one because I have interviewed candidates for senior IT positions who did not know what a version control system is.

2) Get a continuous integration system such as Hudson, or a commercial alternative.

Hudson is a great project. If you haven't seen it, go download it, and play with it. It is a great tool!

3) Include as part of your build some "code complexity metrics" utility.

You can use Sonar, which is available as a plugin for Hudson. This is another great tool. I use the complexity metrics as indicators, as I explain below. You can monitor how your code is evolving and answer questions like: is the code getting worse? These indicators will not tell you whether the code is getting into better shape, but if it is going south you will know, and you will be able to take action to correct it.
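As a rough illustration of the "is the code getting worse?" question, the sketch below compares per-file complexity scores from two builds and reports the files whose score went up. The file names, the scores, and the `complexity_regressions` helper are all made up for the example; a real setup would read these numbers from whatever your metrics tool exports.

```python
def complexity_regressions(previous, current, tolerance=0):
    """Return files whose complexity rose by more than `tolerance` between two builds.

    Both arguments map a file path to a complexity score (hypothetical data).
    """
    regressions = {}
    for path, score in current.items():
        baseline = previous.get(path, 0)  # a file new in this build starts from zero
        if score - baseline > tolerance:
            regressions[path] = (baseline, score)
    return regressions


# Hypothetical scores from two consecutive builds.
last_build = {"Billing.java": 42, "Invoice.java": 17}
this_build = {"Billing.java": 55, "Invoice.java": 17, "Tax.java": 9}

worse = complexity_regressions(last_build, this_build)
for path, (before, after) in sorted(worse.items()):
    print(f"{path}: {before} -> {after}")
```

Note that this only flags files that got worse, which matches how the indicators are used here: as an alarm, not as proof of improvement.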

4) Find a utility to monitor the activity in your version control system.

There are a couple of commercial solutions (I don't want to advertise any in particular), or you can use StatCVS (you can find it on SourceForge). The only problems with StatCVS are that it does not work on branches and that it is only useful if you are using CVS in the first place. If you don't want to spend the few bucks the commercial tools cost, you can write your own scripts.
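If you go the script route, a minimal sketch could look like the following. It assumes you can get your version control system to print one changed path per line (with Git, for instance, `git log --since="1 week ago" --name-only --pretty=format:` does that); the sample log and the `count_changes` helper are hypothetical.

```python
from collections import Counter


def count_changes(log_text):
    """Count how often each path appears in a one-path-per-line change log."""
    paths = (line.strip() for line in log_text.splitlines())
    return Counter(path for path in paths if path)  # skip blank separator lines


# Hypothetical log covering two commits, separated by a blank line.
sample_log = """src/Billing.java
src/Invoice.java

src/Billing.java
"""

changes = count_changes(sample_log)
for path, count in changes.most_common():
    print(f"{count:3d}  {path}")
```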

5) Code reviews

Use your repository monitor to see which files are modified every week. Join that data set with the "code complexity metrics" extracted in step 3, and schedule deep code reviews of the most complex objects. If you have the bandwidth, you should review every single change that goes into the repository. If not, set a goal, and use the complexity metrics as a guide to the areas that deserve the most attention.
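The join described above can be sketched in a few lines. This is only one possible way to do it: multiplying the number of changes by the complexity score is an assumption made for the example, and the file names, numbers, and the `review_priorities` helper are hypothetical.

```python
def review_priorities(change_counts, complexity, top=3):
    """Rank changed files by (number of changes) x (complexity score).

    Files with no known complexity score get a score of zero.
    """
    scored = [
        (path, count * complexity.get(path, 0))
        for path, count in change_counts.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top]


# Hypothetical weekly data: changes per file and complexity per file.
weekly_changes = {"Billing.java": 5, "Invoice.java": 1, "Util.java": 8}
complexity_scores = {"Billing.java": 55, "Invoice.java": 17, "Util.java": 3}

for path, score in review_priorities(weekly_changes, complexity_scores, top=2):
    print(f"{path}: review priority {score}")
```

A file that changes often but is simple (like `Util.java` here) ends up below a complex file that changes less often, which is the point of combining the two data sets.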

6) Prepare for the code reviews well.

My intention in this blog is not to explain how to conduct a code review. There is plenty of material on the web; just google for it. Every organization is different, so I suggest you develop your own code review process with your team. A checklist is very handy. There are also some tools that will help you document the results and measure the effectiveness.

7) Make sure that every bit of changed code has a well-written automated unit test, and that the test passes.

This is the best way to start. If you try to create unit tests for every single method of your 800,000 lines of code, you will never finish, and because of that you will never start either.

There are several challenges in writing good automated unit tests. I will not cover them in this blog; perhaps I will write about them in a later one. However, I can tell you that it is not an easy task. One of the most difficult things to resolve is that the code is rarely stateless. There is state stored somewhere, and the test cases depend on that state. There are different ways of addressing this problem. I don't like the "mock" strategy. The ideal situation is to have a "static" data set against which you execute your automated test cases. There are some commercial solutions that will help you with that.

I like DbUnit. It can recreate a database from a previously recorded one.

Another strategy is to have the test cases set up the data they need to execute properly. But again, this is a topic for an entire blog, so I will move on for now. Use Hudson to automate the execution of your automated test cases and the reporting.
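To make the "test cases set up their own data" strategy concrete, here is a minimal sketch using Python's built-in `unittest` and an in-memory SQLite database. The `defects` table, its rows, and the `count_open_defects` function are invented for the example; the idea is only that each test builds the state it depends on and throws it away afterwards.

```python
import sqlite3
import unittest


def count_open_defects(conn):
    """Code under test (hypothetical): count the defects that are still open."""
    row = conn.execute(
        "SELECT COUNT(*) FROM defects WHERE status = 'open'"
    ).fetchone()
    return row[0]


class OpenDefectsTest(unittest.TestCase):
    def setUp(self):
        # Each test gets a fresh database with exactly the rows it needs.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE defects (id INTEGER PRIMARY KEY, status TEXT)"
        )
        self.conn.executemany(
            "INSERT INTO defects (status) VALUES (?)",
            [("open",), ("open",), ("closed",)],
        )

    def tearDown(self):
        # Discard the state so no other test can depend on it.
        self.conn.close()

    def test_counts_only_open_defects(self):
        self.assertEqual(count_open_defects(self.conn), 2)
```

Saved as a module, this can be run with `python -m unittest`, and the same command can be called from a build job.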

8) Promote learning in the organization

Guide the developers with good practices, and document them.
Best practices can change from organization to organization. No, I am not crazy. Some things that are important for one organization are not even relevant for another. Here is an example: flexibility. For an organization that sells software that needs to be configurable for any customer, flexibility might be the most important requirement from an architectural perspective. In that case, software that is not flexible enough might not be successful.

However, in an organization that requires the software to work in a specific niche, you might be wasting your time making it super flexible. More importantly, you are making it more complex and harder to maintain than necessary.

In summary, make sure your team understands what is important for your business. That should always be your guiding principle number one: adding value to your business, the reason your organization exists.

A pretty cool idea that I am starting to adopt, and I am waiting to see widely adopted, is to leverage the multimedia we have today for this purpose.

Instead of *only* writing tons of documents on guidelines and standards that very few will ever read, you can record videos of best practices and of examples of using the IDE and configuring the projects. Videos are a lot easier to follow, and are not that difficult to create. Every new member of the team has a video library to watch and follow to get set up and running. And any developer can always refer to the videos to see how to do certain things, especially with the latest 4GL-like IDE features for JSF/JavaFX/Swing development.

9) If you can, standardize the IDE.

A lot of people will want to shut me up when I say this one. But a standard IDE has a lot of value for an organization. Yes, every developer has preferences and is more efficient with familiar tools. But the organization as a whole will benefit tremendously from standardizing the IDE. When a developer has a problem, anyone will be able to help resolve it. The code will be more consistent, especially when the advanced features of the IDE are leveraged.

10) Classify defects properly.

Use a good "task" management system that allows you to keep track of the issues and their assignment. Do your research: some tools are really powerful, and some are going to become a bottleneck. One thing is for sure: the whiteboard in the manager's office is not good enough.

11) Understand dependencies and interfaces.

I did not find a tool that does this, so I went out and created one. I parsed all 800,000 lines of code and extracted the dependencies. Then I used Graphviz to create nice dependency diagrams. They helped a lot in understanding the underlying complexities and interactions. The developers were able to see graphically which objects depended on the code they were changing. Document the interfaces, and make sure that they are well defined. Simplicity overall is the most important principle for me. Again, this will change from organization to organization.
If a service provided by a component can be stateless, then make it stateless.
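
The tool mentioned above is not shown in this post, so the following is only a rough sketch of the general idea, not a reconstruction of it. It assumes the code base is Java and approximates dependencies by reading `import` statements with a regular expression, then emits Graphviz DOT text. The class names and both helper functions are hypothetical.

```python
import re

# Matches "import a.b.C;", "import a.b.*;" and "import static a.b.C.member;".
IMPORT_RE = re.compile(
    r"^\s*import\s+(?:static\s+)?([\w.]+?)(?:\.\*)?\s*;", re.MULTILINE
)


def extract_imports(java_source):
    """Return the sorted, de-duplicated import targets found in one source file."""
    return sorted(set(IMPORT_RE.findall(java_source)))


def to_dot(dependencies):
    """Render a {source: [targets]} mapping as a Graphviz digraph."""
    lines = ["digraph dependencies {"]
    for source, targets in sorted(dependencies.items()):
        for target in targets:
            lines.append(f'    "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical source file used as sample input.
sample_source = """package com.example.billing;

import java.util.List;
import com.example.tax.*;

public class Billing {}
"""

deps = {"com.example.billing.Billing": extract_imports(sample_source)}
dot_text = to_dot(deps)
print(dot_text)
```

Saving the output to a file such as `deps.dot` and running `dot -Tpng deps.dot -o deps.png` produces the diagram. Imports miss same-package and reflective dependencies, so a real tool would need a proper parser.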

12) Don't rewrite that app, please.

If the app is working, don't rewrite it, unless you are very certain that this time it is going to be significantly better. If you have the same constraints (e.g. no time, same people, etc.), it is going to be difficult to make it a lot better the second time. I have seen developers rewrite the same app several times, and they always end up with similar problems. Yes, I know that the second time you write an application you will write it better, because you are aware of all the aspects of the problem. But still, I have seen people rewrite applications again and again with the same or similar problems. Remember that good enough exists; perfection does not.

13) Be careful about creating your own framework.

You will find yourself surrounded by developers who say they want to build a framework that is configured with an XML file and is super flexible. I remember when I wrote my first Servlet 2 Architecture application. I built it around a Hashtable (there was no HashMap back then) that mapped each keyword to an EventHandler. EventHandler was an interface with one method: processRequest.

A few months later, Struts came out. I was lucky enough not to get caught in the pride trap, and I moved to Struts.

Nowadays, most of the infrastructure components you will need already exist. If you think you have something unique, go for it. But please, do your due diligence and make sure you are not just creating yet another big framework, or reinventing the "wheel".

14) Use a tool to manage deployments.

When you have multiple projects in parallel, how do you know which versions of which objects to move between environments?
I haven't found a tool that does that really well yet. Perhaps I will publish an open source tool for this soon...



I couldn't agree more with you. ERP solutions are the way to go.

By "Professional Managers" I mean individuals who have a lot of people skills, can talk to the business users about their challenges, and understand project management concepts (e.g. tasks, resources, critical path, dependencies, etc.) but lack an understanding of software development.

On the copy/paste problem: if you don't know there are problems, you cannot fix them. Identifying them is just the beginning. Anyway, if you go to an organization that already has 800k+ LOC, it is difficult to fix what is in place. But you can keep it from getting worse. That is the primary idea I am presenting here. Fixing it is ideal. But if you are bombarded with business requirements, and you don't have the budget to invest in fixing the already "working" pile of code, you have to deal with it and manage it so that it does not get worse. What I have done in the past in situations like this is to use the enhancement requests as a mechanism to get pieces fixed. However, I believe that this is not always possible, as sometimes you cannot afford the risk of breaking something that is already working.

I highly recommend using an ERP package instead of in-house development and maintenance if the current application's functional scope falls under the ERP umbrella. Otherwise the problems will never end, even if the best tools and practices are used, since companies are eventually abandoning in-house ERP development. That said, an ERP migration would not be easy.

By "Professional Managers" I assume you include architects laying down framework principles and guidelines for a particular application? E.g. a class must not be more than 200-300 lines long? Also, is this mainly targeted at legacy systems maintained by "cowboys"? What 800,000 lines means varies depending on who was writing the code. Copy & paste vs. properly refactored software? I assume this is where the software metrics tools come in? But then what do you do if you find duplicated copy-and-paste code? Isn't that more important than just detecting it? Some refactoring tools don't detect chunks of code that have been copied and pasted into different classes... I agree with almost everything else though, except the standardized IDE and code reviews (which I think can be avoided by Scrum or XP).

You should also consider using an architecture management tool like SonarJ. These tools allow you to define the architecture of the software and help the developers adhere to it. They can also visualize dependencies and show you cycles in your packages.