The Source for Java Technology Collaboration



Felipe Gaucho

Felipe Gaucho's Blog

A syntax-dependent diff tool

Posted by felipegaucho on November 29, 2006 at 03:11 PM | Comments (14)

How many times have you seen someone reformat the code of Java classes, and other people go crazy about losing the project history in the Concurrent Versions System (CVS)? At first sight it seems to be just a communication problem: someone is not following the company standards, or something like that. But think again. Is it really a human problem, or are our tools just not smart enough to reduce it? I've been discussing this for several hours on my community mailing lists (rsjug and cejug), and the general opinion seems to be that the problem comes down to a misunderstanding of development best practices and the project life cycle. Even so, there is an underlying question about which kinds of differences really matter, and that is a question about diff tools.

Before you abandon this text and start burning a voodoo doll with my name on it, I invite you to read on with an open mind and let the ideas lead you to a different perspective on version control and diff tools. I have no intention of criticizing the current tools, which I use and like. This blog entry is about a (supposedly) better future, not about criticism. I am just asking you to set aside the common beliefs and think differently for a few minutes. Most of these ideas are still raw, so our discussion may well change them before we can accept them as good ideas. If you have any contribution, please post it in the comments at the end of this entry.

The common tools available today rely on the longest common subsequence (LCS) to compare two files, usually line by line, and detect the differences between them. The appeal of this approach is that it is fast and robust. It dates back to the 1970s and was designed to be simple and to consume few resources (memory and CPU). In terms of software design it is nice, but in terms of supporting humans it does not seem smart enough to provide a comfortable and safe environment for developers. Check the example below:

revision 1.1:
    private int i = 0;

revision 1.2:
    private
    int i = 0;

revision 1.3:
    private int i=0;

revision 1.4:
    // these codes
    // are different?
    // really?
    private int i = 0;

revision 1.5:
    /**
     * TODO: ...
     */
    private int i = 0;
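To make the point concrete, here is a minimal sketch, in Java, of a line-based diff driven by a longest common subsequence table, applied to revisions 1.1 and 1.2 above. It is an illustration under simplifying assumptions, not the code used by CVS or GNU diff (which rely on faster variants of the same idea), and the class name `LineDiff` is hypothetical. Because it compares whole lines as strings, merely breaking one line in two is reported as one removal and two additions.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Minimal line-based diff built on a longest common subsequence (LCS) table.
 * Illustration only: production tools use faster variants (e.g. Myers' algorithm).
 */
public class LineDiff {

    /** Returns an edit script: "  " marks a kept line, "- " a removed line, "+ " an added line. */
    public static List<String> diff(String[] a, String[] b) {
        int n = a.length;
        int m = b.length;
        // lcs[i][j] = length of the LCS of the suffixes a[i..] and b[j..]
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                lcs[i][j] = a[i].equals(b[j])
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }
        // Walk the table from the start of both files to emit the edit script.
        List<String> script = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < n && j < m) {
            if (a[i].equals(b[j])) {
                script.add("  " + a[i]);
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                script.add("- " + a[i++]);
            } else {
                script.add("+ " + b[j++]);
            }
        }
        while (i < n) {
            script.add("- " + a[i++]);
        }
        while (j < m) {
            script.add("+ " + b[j++]);
        }
        return script;
    }

    public static void main(String[] args) {
        String[] rev11 = {"private int i = 0;"};
        String[] rev12 = {"private", "int i = 0;"};
        // Prints:
        // - private int i = 0;
        // + private
        // + int i = 0;
        diff(rev11, rev12).forEach(System.out::println);
    }
}
```

Any other pair of revisions above gets the same treatment: each one is reported as changed, although the declaration is lexically the same in all five.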

Some open questions:

  • Why must a QA professional pay attention to whether a variable is of type int or double?
  • Why should a technician hunting a memory leak or another severe last-minute bug waste time on code formatting details?
  • What is the best column width for code formatting? 80 columns? 126? 888?
  • If you buy a new widescreen monitor, why should you be obliged to keep the same number of columns that was used when the code was created 5 years ago?
  • Why must two different developers be forced to type code the same way? To read code the same way?
  • How about an editor that allows people to read, modify and commit code using their own preferences? What if this editor were smart enough to provide a customized view per user profile?

If you pay attention to these questions, you'll notice they share a common element: people. People are the key to the software development process, and tools that force people to work differently from the way they want are less smart than they could be. Now that you have the scenario, I will suggest some ideas for a new tool, one I don't know exists: I asked a lot of people and nobody gave me any clue about it. If you know of such a tool, please send me the link and I will publish it here in large letters as a contribution to the community. I am mainly looking for Open Source tools, but if you know of a commercial tool I will publish its link here too.

A syntax-dependent diff tool

Several ideas emerged when I started asking about giving people the freedom to choose their own formatting standard. Some argue it could be done with a new custom diff tool or a special configurable one. Others prefer that only the IDE control this view customization, leaving diff as it is today, just comparing characters. In both cases, the goal is to let people work like this:

  1. A developer creates and commits source code using Eclipse, following the Sun Code Conventions;
  2. Another developer checks out the code using NetBeans, formats it with Jalopy, then commits it;
  3. The first developer opens the modified source code and sees it in exactly the same format in which he first committed it;
  4. The developers have no idea which tools or formats their colleagues are using, and they don't need to worry about it, because each of them always receives the code in his preferred format.
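One way a tool could support this workflow is to compare files as sequences of language tokens instead of lines, so that layout and comments no longer count. The sketch below assumes a simple regular-expression tokenizer for Java, wrapped in a hypothetical `TokenDiff` class (not an existing tool); a real implementation would use a complete lexer or parser. Multi-character operators are matched first so that, for example, `y + ++z` and `y+++z`, which Java parses differently, still produce different token lists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Sketch of a formatting-insensitive comparison for Java source: both versions are
 * split into tokens, whitespace and comments are dropped, and the token lists are compared.
 * Assumption: a regex tokenizer is enough for illustration. It is NOT a complete Java
 * lexer (no text blocks, no Unicode escapes, ">>" in generics stays one token).
 */
public class TokenDiff {

    private static final Pattern TOKEN = Pattern.compile(
            "(?<skip>//[^\\n]*|/\\*.*?\\*/|\\s+)"               // comments and whitespace
          + "|\"(?:\\\\.|[^\"\\\\])*\"|'(?:\\\\.|[^'\\\\])*'"   // string and char literals
          + "|[A-Za-z_$][A-Za-z0-9_$]*|\\d[\\w.]*"              // identifiers, keywords, numbers
          + "|>>>=|>>>|<<=|>>=|<<|>>|\\.\\.\\.|->|::|\\+\\+|--|&&|\\|\\|"
          + "|[=!<>+\\-*/&|^%]="                                // multi-character operators
          + "|[^\\sA-Za-z0-9_$]",                               // any other single character
            Pattern.DOTALL);

    /** Splits source into significant tokens, dropping whitespace and comments. */
    public static List<String> tokens(String source) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            if (m.group("skip") == null) {  // keep everything except layout and comments
                result.add(m.group());
            }
        }
        return result;
    }

    /** True when two sources differ only in layout or comments. */
    public static boolean sameCode(String a, String b) {
        return tokens(a).equals(tokens(b));
    }

    public static void main(String[] args) {
        String rev11 = "private int i = 0;";
        String rev12 = "private\nint i = 0;";
        String rev15 = "/**\n * TODO: ...\n */\nprivate int i = 0;";
        System.out.println(sameCode(rev11, rev12));                // true
        System.out.println(sameCode(rev11, rev15));                // true
        System.out.println(sameCode(rev11, "private int i = 1;")); // false
    }
}
```

Whether comments should be ignored is a design choice: dropping them makes revisions 1.4 and 1.5 of the earlier example equal to 1.1, but a team might prefer to treat comment changes as significant. The comparison is also purely lexical, so reordering two declarations or writing `1-1` instead of `0` is still reported as a difference.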

In this environment everyone on the project is comfortable with code formatting: formatting has as much impact on the project as the music developers listen to while working, that is, none. It doesn't matter what kind of music your colleagues are listening to. If you like to program with music, you listen to your favorite music, right? Imagine your productivity if your company established a standard for music and played the boss's selection for everyone all day ;).

Some natural observations on this first draft of a diff that depends on the programming language:

  1. Syntax dependency should be an option, not the default, to preserve compatibility with current tools and with traditional practices. If a company wants to keep enforcing a coding standard, it must still have that power.
  2. As the name suggests, syntax dependency implies language dependency. It is not a generic tool: it would have to be configured for each language, and it will not work on binary files. We can imagine a hybrid solution, in which non-code files are handled the traditional way.
  3. The version control system doesn't need to store files in their original format. It can compress them or remove useless blank spaces and other formatting characters, reducing storage size. It can transform the original file into any other representation that improves performance or robustness.
  4. The IDE could be configured to show comments in a different place or format than the code. Eclipse's code folding is a good example: comments could be hidden and shown only when the developer wants them.
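Observations 1 and 2 amount to a small policy decision in front of the comparison. The sketch below, with hypothetical names (`ComparisonPolicy`, `register`, `unchanged`), keeps syntax-aware comparison off unless it is switched on, and applies it only to file types that have a registered normalizer; every other file falls back to the traditional exact comparison. The whitespace-collapsing lambda in `main` is only a placeholder for a real language-specific normalizer such as a Java lexer.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

/**
 * Sketch of observations 1 and 2: syntax-aware comparison is opt-in, and only file
 * types with a registered normalizer get it. All names here are hypothetical.
 */
public class ComparisonPolicy {

    private final boolean syntaxAware;  // observation 1: off unless explicitly enabled
    private final Map<String, UnaryOperator<String>> normalizers = new HashMap<>();

    public ComparisonPolicy(boolean syntaxAware) {
        this.syntaxAware = syntaxAware;
    }

    /** Registers a language-specific normalizer for an extension such as ".java". */
    public void register(String extension, UnaryOperator<String> normalizer) {
        normalizers.put(extension, normalizer);
    }

    /** Returns true when the two versions of the file should be reported as unchanged. */
    public boolean unchanged(String fileName, String oldText, String newText) {
        int dot = fileName.lastIndexOf('.');
        int slash = fileName.lastIndexOf('/');
        String ext = dot > slash ? fileName.substring(dot) : "";
        UnaryOperator<String> normalizer = normalizers.get(ext);
        if (!syntaxAware || normalizer == null) {
            return oldText.equals(newText);  // observation 2: traditional exact comparison
        }
        // Observation 3: a normalized form like this is also what the repository could store.
        return normalizer.apply(oldText).equals(normalizer.apply(newText));
    }

    public static void main(String[] args) {
        // PLACEHOLDER normalizer: only collapses whitespace, it is NOT syntax-aware.
        UnaryOperator<String> collapse = s -> s.replaceAll("\\s+", " ").trim();

        ComparisonPolicy strict = new ComparisonPolicy(false);
        ComparisonPolicy relaxed = new ComparisonPolicy(true);
        strict.register(".java", collapse);
        relaxed.register(".java", collapse);

        String a = "private int i = 0;";
        String b = "private\nint i = 0;";
        System.out.println(strict.unchanged("A.java", a, b));     // false: option is off
        System.out.println(relaxed.unchanged("A.java", a, b));    // true
        System.out.println(relaxed.unchanged("notes.txt", a, b)); // false: no normalizer for .txt
    }
}
```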

Done. I could extend this text with many other ideas, but I prefer to wait for feedback before losing focus by suggesting minor features or discussing other uses of such a tool.


Comments
Comments are listed in date ascending order (oldest first)

  • And why only syntax? How about the following:

    int i=0;
    int j=0;

    and

    int j=0;
    int i=0;

    Should this be flagged as a diff? How about this:

    int i=0;

    and

    int i=0;
    int i=0;

    Should this be flagged as a diff? How about this:

    int i=i=0;

    and

    int i=0;

    Should this be flagged as a diff (this happened to me while I was cleaning up someone else's code)? How about this:

    int i=0;

    and

    int i=1-1;

    Should this be flagged as a diff? There's no end to this...

    Posted by: kirillcool on November 29, 2006 at 04:21 PM


  • Hi kirillcool, sure, there are many different ideas for improving how changes to Java code are tracked; that's why I suggested tracking only syntax changes. The semantics of the code are much more complex to evaluate. Even thinking only about syntax, the performance of such a tool can be "complicated".

    Posted by: felipegaucho on November 29, 2006 at 11:50 PM

  • A lot of today's languages (Java, the .NET family, etc) get compiled to an intermediate "language" (byte codes). How about a diff tool that analyzes the diffs between this intermediate code, and maps it back to its source code?
    This mechanism would allow the "human readable" source code to be formatted in any way, but it would only be flagged as different if the resulting byte codes were different. (I'm sure this can be generalized to any language, but I'm guessing it would be easier with an intermediate language.)
    I love the idea of formatting code by user preference, and only flagging differences that are visible to the computer (i.e., differences that affect how the computer runs the program). Note that this mechanism should not replace today's diff tools, only be an additional option on top of them.

    Posted by: bangrazi on November 30, 2006 at 05:03 AM

  • There are several problems with comparing compiled results.


    Dependencies and build time. If you changed a file that depends on other files, those other files will need to be built first. In a large project, this can be very time consuming for a cvs update/commit. Imagine having to wait 5 minutes every time you check in a file. And that's being generous, large enterprise projects can take hours to build. If your local file is in a state where it can't be built, you couldn't even run an update at all!

    Dependencies and versioning. You also need to decide if it should compile against the version of the dependencies you have locally, or the latest on the branch. If the latter, you can have a different binary because of someone else changing a dependency, even if the changes to the file in question would not have resulted in a different binary on their own. Even if you compare locally, doing an update on your local tree will cause the same problems.

    Compiler versions. First you have to decide if you will be compiling locally or on the repository server. If on the server, then only one version of the compiler can be used for the diff test, or multiple compiles will have to be run, again very time consuming. If locally, you will need to update and recompile any changes made by other developers, which can again be very time consuming for a cvs commit/update action.

    Posted by: mhall on November 30, 2006 at 06:55 AM

  • Hi mhall, I agree with you. Performance is the reason I limited the scope of the idea to syntax parsers :). If we can use the Java syntax to detect code differences, it would be great.

    Posted by: felipegaucho on November 30, 2006 at 07:21 AM

  • Could not something like the Mirror API be useful in finding a compromise between "byte code" only and "source code" only comparisons?

    (Not a rhetorical question - I haven't really looked into it, just interested in this discussion)

    Pete

    Posted by: peeet on November 30, 2006 at 07:44 AM

  • Felipe,
    True, comparing an AST would cut out a lot of the problems with dependencies and being able to build. I'm not overly familiar with syntax parsers, but would the enhanced for loop in Java 5 be parsable by Java 1.4? If not, you would still have the problem of compiler, or in this case parser, versions, though to a much lesser extent.

    Posted by: mhall on November 30, 2006 at 07:53 AM

  • YES! I made the exact same request a few years back where anyone who checks code out sees it in their preferred format and everyone can use their own coding style.

    I just hope there are enough other people who feel the same way so we can make this a reality and add it as an RFE to all the major Java IDEs.

    Gili

    Posted by: cowwoc on November 30, 2006 at 08:25 AM

  • Gili, you'll need it in more than just the Java IDEs for it to become part of mainstream RCS systems. A lot of people use non-Java editors, and if CVS spits out something different from what they checked in, they won't like it.
    The idea is to produce a tool that is really transparent to users. The IDE doesn't seem to be the best place for all the logic related to such a tool. A diff tool based on the language syntax could provide the stored files in two different ways:
    Formatted content: the tool can have a (configurable) default standard. If an external tool, such as a non-Java editor, requests a file, the system returns the file formatted with this standard.
    The diff's special format: the format used internally by the diff, which IDEs and other tools can also use to adapt the stored content to each user's profile.
    There are several other ideas we could consider to guarantee that all users receive the files in their preferred format; it is up to you to think about them.

    Posted by: mhall on November 30, 2006 at 08:59 AM

  • Check out PMD's copy/paste detector for another approach to finding similar code chunks. It uses the Burrows-Wheeler transform + Karp-Rabin string matching.

    Posted by: tcopeland on November 30, 2006 at 12:09 PM

  • Some of the performance implications could be alleviated using redundancy: Whenever a source file is committed to the repository, a dump of the AST for that file is committed as well - maybe in some separate repository tree. The diff then could act on ASTs. As stated above, this would exclude formatting differences, and allows to ignore in a syntax-aware (configurable) manner certain differences, e.g. a renamed local variable.

    Posted by: hagger on December 06, 2006 at 03:25 PM


  • I created a program with a similar goal to what is described above. You can check it out at http://sourceforge.net/projects/sourcediff

    It does a functional diff on Java 1.5 as well as XML and is extensible to handle any other language.

    Posted by: snocorp on May 10, 2007 at 08:56 AM





