
A syntax-dependent diff tool

Posted by felipegaucho on November 29, 2006 at 3:11 PM PST

How many times have you seen someone change the code formatting of Java classes, and other people go crazy about the loss of the project history in the Concurrent Versions System (CVS)? At first sight it seems to be just a communication problem - someone is not following the company standards or something like that. But think again - is it really a human problem, or are our tools simply not smart enough to reduce it?
I have been discussing this for several hours on my community mailing lists (rsjug and cejug), and the general opinion is that the problem comes down to a misunderstanding of development best practices and the project life cycle. Even so, there is an underlying discussion about what kind of difference really matters - a discussion about diff tools.

Before you abandon this text and start burning a voodoo puppet with my name on it, I invite you to read on with an open mind - let the ideas guide you to a different perspective on version control and diff tools. I have no intention of criticizing the current tools, which I use and like. This blog is about a (supposedly) better future, not about criticism. I am just asking you to forget the common beliefs and think differently for a few minutes. Most of these ideas were written down in raw form, so it is possible and plausible that our discussion will change them before we can accept them as good ideas. If you have any contribution, please write it at the end of this blog entry.

The common tools available today rely on the longest common subsequence (LCS) to detect differences between two files, comparing them as plain sequences of text. The appeal of this algorithm is that it is fast and robust. It was designed in the early 1970s to be clean and to consume few resources - memory and CPU. In terms of software design it is nice, but in terms of human support it does not seem smart enough to provide a comfortable and safe environment for developers. Check the example below:

revision 1.1:
    private int i = 0;

revision 1.2:
    int i = 0;

revision 1.3:
    private int i=0;

revision 1.4:
    // these codes
    // are different?
    // really?
    private int i = 0;

revision 1.5:
    * TODO: ...
    private int i = 0;
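Several of the revisions above differ only in whitespace or comments. The following is a minimal sketch, not an existing tool, of how a token-based comparison could treat such revisions as equal while a plain text comparison reports a change. The class name `TokenCompare`, its methods, and the two regular expressions are assumptions made for illustration; the tokenizer is deliberately naive and ignores string literals and other details a real Java lexer would handle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration: compares two snippets by text and by tokens.
class TokenCompare {
    // Naive token: a word (keyword, identifier, number) or one non-blank symbol.
    private static final Pattern TOKEN = Pattern.compile("\\w+|\\S");
    // Naive comments: "// ..." to end of line, or "/* ... */" across lines.
    // Assumption: no comment markers appear inside string literals.
    private static final Pattern COMMENTS =
            Pattern.compile("//[^\\n]*|/\\*.*?\\*/", Pattern.DOTALL);

    // Strips comments, then splits the remaining code into tokens.
    static List<String> tokens(String source) {
        String code = COMMENTS.matcher(source).replaceAll(" ");
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(code);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    // What a character-level comparison sees.
    static boolean sameText(String a, String b) {
        return a.equals(b);
    }

    // What a syntax-aware comparison could see.
    static boolean sameTokens(String a, String b) {
        return tokens(a).equals(tokens(b));
    }

    public static void main(String[] args) {
        String spaced = "private int i = 0;";
        String compact = "private int i=0;";
        String commented = "// these codes\n// are different?\n// really?\nprivate int i = 0;";
        String changed = "private int i = 1;"; // a real change in the value

        System.out.println(sameText(spaced, compact));     // false
        System.out.println(sameTokens(spaced, compact));   // true
        System.out.println(sameTokens(spaced, commented)); // true
        System.out.println(sameTokens(spaced, changed));   // false
    }
}
```

The design choice here is to compare what the compiler would read rather than what the developer typed, so a change in a value or a type still shows up as a difference.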

Some open questions:

  • Why must a QA professional pay attention to whether a variable is of type int or double?
  • Why should a technician looking for a memory leak or some other severe last-minute bug waste time on code formatting details?
  • What is the best column width for code formatting? 80 columns? 126? 888?
  • If you buy a new widescreen monitor, why should you be obliged to keep the same number of columns the code had when it was created 5 years ago?
  • Why must two different developers be forced to type code in the same way? To read code in the same way?
  • How about an editor that allows people to read, modify and commit code using their own preferences? How about an editor smart enough to provide a customized view per user profile?

If you pay attention to the questions, you will notice a common element in all of them: people. People are the key to the software development process, and a tool that forces people to work differently from the way they want to is less smart than I suppose it could be. Now you have a scenario, and I will suggest some ideas for a new tool - one that I don't know whether it exists, because I asked a lot of people and nobody could give me a clue. If you know of such a tool, please send me the link and I will publish it in large letters as a contribution to the community. I am mainly looking for Open Source tools, but if you know a commercial tool I will publish that link here as well.

A syntax-dependent diff tool

Several ideas emerged when I started asking about giving people the freedom to choose their own formatting standard. Some argue that it could be done with a new customized diff tool or with a specially configurable one. Others prefer that the IDE alone controls this view customization, leaving diff as it is today - just comparing characters. In both cases, the aim is to allow a workflow like the steps below:

  1. A developer creates and commits a source file using Eclipse and following the Sun Code Conventions.
  2. Another developer checks out the code using NetBeans, formats it with Jalopy, and then commits it.
  3. The first developer opens the modified source code and sees it in exactly the same format in which he first committed it.
  4. The developers have no idea which tools or formats their colleagues are using, and they do not worry about it, because each of them always receives the code in his preferred format.
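The steps above can be sketched roughly as follows. This is only an illustration under strong assumptions: the repository keeps a canonical token list instead of the text as typed, and each developer's "format" is reduced to a single hypothetical preference (spaces around symbols or not). The class `StyleRoundTrip` and its methods are invented for this sketch; comments, string literals and line breaks are out of scope.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration: store tokens, render per developer on checkout.
class StyleRoundTrip {
    // Naive token: a word (keyword, identifier, number) or one non-blank symbol.
    private static final Pattern TOKEN = Pattern.compile("\\w+|\\S");

    // Canonical storage: file path -> token list, independent of formatting.
    private final Map<String, List<String>> stored = new HashMap<>();

    static List<String> tokens(String source) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    // Stores the canonical form; returns true only if the tokens changed.
    boolean commit(String path, String source) {
        List<String> incoming = tokens(source);
        boolean changed = !incoming.equals(stored.get(path));
        stored.put(path, incoming);
        return changed;
    }

    // Renders the stored tokens in the style the developer prefers.
    String checkout(String path, boolean spaceAroundSymbols) {
        StringBuilder sb = new StringBuilder();
        String prev = null;
        for (String t : stored.get(path)) {
            if (prev != null) {
                boolean bothWords = isWord(prev) && isWord(t);
                boolean styleSpace = spaceAroundSymbols && !t.equals(";");
                if (bothWords || styleSpace) {
                    sb.append(' ');
                }
            }
            sb.append(t);
            prev = t;
        }
        return sb.toString();
    }

    private static boolean isWord(String t) {
        char c = t.charAt(0);
        return Character.isLetterOrDigit(c) || c == '_';
    }

    public static void main(String[] args) {
        StyleRoundTrip repo = new StyleRoundTrip();
        repo.commit("Counter.java", "private int i = 0;");       // first developer
        boolean changed = repo.commit("Counter.java", "private int i=0;"); // second developer
        System.out.println(changed);                             // false: only the format differs
        System.out.println(repo.checkout("Counter.java", true)); // private int i = 0;
        System.out.println(repo.checkout("Counter.java", false)); // private int i=0;
    }
}
```

In this sketch the second commit records no change in the history, and each developer gets the file back in his own style, which is the behaviour the four steps describe.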

Such an environment suggests that everyone in the project feels comfortable about code formatting - code formatting has the same impact on the project as the music developers listen to while working: none. It doesn't matter what kind of music your colleagues are listening to. If you want to program while listening to music, you will listen to your preferred music, right? Imagine your level of productivity if your company established a standard for music and played the boss's selection to everyone all day ;).

Some natural observations on the original draft of a diff that depends on the programming language context:

  1. Syntax dependency should be an option and not a default, in order to preserve compatibility with the current tools and with traditional beliefs. If a company wants to keep forcing people to obey a standard, the company must have this power.
  2. As the name suggests, syntax dependency implies language dependency. It is not a generic tool - the tool would have to be configured for each language, and it will not work on binary files. Eventually we can imagine a hybrid solution, where non-code files are treated in the traditional way.
  3. The version control system doesn't need to store the files in their original format. It can compress them or remove insignificant blank spaces and other formatting characters, reducing the storage size. The tool can transform the original file into anything else that enhances performance or robustness.
  4. Comments can be configured in the IDE to be shown in a different place or format from the code. The folding icons of Eclipse are a good example - eventually, comments could be hidden and shown only when the developer wants them.
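The first two observations can be illustrated with a small sketch of how a tool might decide which comparison to use for a file. Everything here is hypothetical: the class `DiffModeChooser`, the `Mode` names, and the idea of selecting by file extension are assumptions made for this example, not features of any existing version control system.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration: syntax-aware diff is opt-in and per language.
class DiffModeChooser {
    enum Mode { TEXT, SYNTAX }

    // Returns SYNTAX only when the option is switched on AND the file's
    // extension belongs to a language the tool has been configured for.
    // Everything else (binary files, unknown types) keeps the traditional diff.
    static Mode choose(String fileName, boolean syntaxEnabled, Set<String> supportedExtensions) {
        if (!syntaxEnabled) {
            return Mode.TEXT; // the default stays compatible with current tools
        }
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Mode.TEXT; // no extension, so no known language
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return supportedExtensions.contains(extension) ? Mode.SYNTAX : Mode.TEXT;
    }

    public static void main(String[] args) {
        Set<String> configured = new HashSet<>(Arrays.asList("java"));
        System.out.println(choose("Counter.java", false, configured)); // TEXT
        System.out.println(choose("Counter.java", true, configured));  // SYNTAX
        System.out.println(choose("logo.png", true, configured));      // TEXT
    }
}
```

A company that wants to keep a single mandatory format would simply leave the option switched off.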

Done. I could extend this text with a lot of other ideas, but I prefer to wait for feedback rather than lose context by suggesting minor features or talking about other uses of such a tool.
