The Source for Java Technology Collaboration



Felipe Gaucho's Blog

Tools Archives


A syntax-dependent diff tool

Posted by felipegaucho on November 29, 2006 at 03:11 PM | Permalink | Comments (14)

How many times have you watched someone change the code formatting of Java classes while other people go crazy over the loss of the project history in the Concurrent Versions System (CVS)? At first sight it seems to be just a communication problem - someone is not following the company standards or something like that. But think again - is it really a human problem, or are our tools simply not wise enough to reduce it? I've been discussing this for several hours on my community mailing lists (rsjug and cejug), and the general opinion is that the problem comes down to a misunderstanding of development best practices and the project life cycle. Despite that, there is an underlying discussion about what kind of difference really matters - a discussion about diff tools.

Before you abandon this text and start burning a voodoo doll with my name on it, I invite you to read on with an open mind - let the ideas guide you to a different perspective on version control and diff tools. I have no intention of criticizing the current tools, which I use and like. This blog entry is about a (supposedly) better future, not about criticism. I am just asking you to set aside the common beliefs and think differently for a few minutes. Most of these ideas were written down in raw form, so it is both possible and plausible that our discussion will change them before we can accept them as good ideas. If you have any contribution, please write it at the end of this blog entry.

The common tools available today rely on the longest common subsequence (LCS) algorithm, comparing sequences of characters to detect the differences between two files. The appeal of this algorithm is that it is fast and robust. It was designed in the early 1970s to be clean and to consume few resources - memory and CPU. In terms of software design it is nice, but in terms of human support it does not seem wise enough to provide a comfortable and safe environment for developers. Check the example below:

revision 1.1:
    private int i = 0;

revision 1.2:
    private
    int i = 0;

revision 1.3:
    private int i=0;

revision 1.4:
    // these codes
    // are different?
    // really?
    private int i = 0;

revision 1.5:
    /**
     * TODO: ...
     */
    private int i = 0;
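To make the point of the example concrete, here is a minimal sketch in Java of what "the same code" could mean to a tool. It is not an existing tool: the class name `TokenCompareSketch` and its methods are hypothetical, and the tokenizer is deliberately naive (it does not handle string or character literals). It assumes that whitespace and comments are not significant, which is the premise of this entry rather than a rule of any current diff tool.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: compares two snippets by token stream instead of by characters.
final class TokenCompareSketch {

    /** Splits Java-like source into tokens, dropping whitespace and comments. */
    static List<String> tokens(String src) {
        List<String> out = new ArrayList<>();
        int i = 0;
        int n = src.length();
        while (i < n) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // formatting only, not a token
            } else if (src.startsWith("//", i)) {
                // line comment: skip to the end of the line
                while (i < n && src.charAt(i) != '\n') {
                    i++;
                }
            } else if (src.startsWith("/*", i)) {
                // block comment: skip past the closing marker (or to the end if unterminated)
                int end = src.indexOf("*/", i + 2);
                i = (end < 0) ? n : end + 2;
            } else if (Character.isJavaIdentifierStart(c)) {
                int start = i;
                while (i < n && Character.isJavaIdentifierPart(src.charAt(i))) {
                    i++;
                }
                out.add(src.substring(start, i)); // keyword or identifier
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < n && Character.isDigit(src.charAt(i))) {
                    i++;
                }
                out.add(src.substring(start, i)); // integer literal
            } else {
                out.add(String.valueOf(c)); // single-character operator or punctuation
                i++;
            }
        }
        return out;
    }

    /** True when both snippets produce the same token sequence. */
    static boolean sameSyntax(String a, String b) {
        return tokens(a).equals(tokens(b));
    }

    public static void main(String[] args) {
        String rev11 = "private int i = 0;";
        String rev12 = "private\nint i = 0;";
        String rev13 = "private int i=0;";
        String rev14 = "// these codes\n// are different?\n// really?\nprivate int i = 0;";
        String rev15 = "/**\n * TODO: ...\n */\nprivate int i = 0;";
        for (String rev : new String[] {rev12, rev13, rev14, rev15}) {
            System.out.println(sameSyntax(rev11, rev));
        }
        // A real change, such as a different type, still shows up as a difference.
        System.out.println(sameSyntax(rev11, "private long i = 0;"));
    }
}
```

A character-based comparison reports a change in every one of the five revisions, while this token comparison reports none until the declaration itself changes. Whether comments should count as a difference is a design choice; a real tool would probably make it configurable.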

Some open questions:

  • Why must a QA professional pay attention to whether a variable is of type int or double?
  • Why should a technician hunting a memory leak or some other severe last-minute bug waste time on code formatting details?
  • What is the best column width for code formatting? 80 columns? 126? 888?
  • If you buy a new widescreen monitor, why should you be obliged to keep the same number of columns used when the code was created 5 years ago?
  • Why must two different developers be forced to type code in the same way? To read code in the same way?
  • How about an editor that allows people to read, modify and commit code using their own preferences? What if this editor were smart enough to provide a customized view per user profile?

If you paid attention to the questions, you noticed there is a common element in all of them: people. People are the key to the software development process, and a tool that forces people to work differently from the way they want to is less wise than I suppose it could be. Now you have a scenario, and I will suggest some ideas about how a new tool could work - a tool I don't know exists, because I asked a lot of people and nobody gave me any clue about it. If you know of such a tool, please send me the link and I will publish it in large letters as a contribution to the community. I am mainly looking for Open Source tools, but if you know of a commercial tool I will also publish the link here.

A syntax-dependent diff tool

Several ideas emerged when I started asking about giving people the freedom to choose their own formatting standard. Some argue it could be done with a new customized diff tool or through a specially configurable diff tool. Others prefer that only the IDE control such view customization, leaving the diff as it is today - just comparing characters. In both cases, the aim is to allow people to work as in the steps below:

  1. A developer creates and commits source code using Eclipse, following the Sun Code Conventions.
  2. Another developer checks out the code using NetBeans, formats it with Jalopy, and then commits it.
  3. The first developer opens the modified source code and sees it in exactly the same format in which he committed it the first time.
  4. The developers have no idea what kind of tools or formats their colleagues are adopting. And they are not worried about it, because each one always receives the code formatted in his preferred format.

In this environment everyone in the project would feel comfortable about code formatting - code formatting would have the same impact on the project as the music developers listen to while working: none. It doesn't matter what kind of music your colleagues are listening to. If you want to program while listening to music, you will listen to your preferred music, right? Imagine your level of productivity if your company established a standard for music and played the boss's selection for everyone all day ;).

Natural observations on the original draft about a diff that depends on the programming language context:

  1. The syntax dependency should be an option and not a default - in order to preserve compatibility with the current tools and with the traditional beliefs. If a company wants to keep forcing people to obey a standard, the company must have this power.
  2. As the name suggests, syntax dependency implies language dependency. It is not a generic tool - the tool would have to be configured for each different language. And it will not work on binary files. Eventually we can imagine a hybrid solution, where non-code files are treated in the traditional way.
  3. The version control system doesn't need to store the files in their original format. It can compress them or remove useless blank spaces and other formatting characters, reducing the size of the storage. The tool can transform the original file into anything else that enhances performance or the robustness of the system.
  4. Comments can be configured in the IDE to be shown in a different place or format than the code. The folding icons of Eclipse are a good example - eventually, comments could be hidden and shown only when the developer wants them.
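Observations 1 and 2 can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class `HybridDiffSketch`, the `syntaxAwareEnabled` flag and the `.java` extension check are all hypothetical, and `normalize` is a crude stand-in for a real parser (it pads a few punctuation characters and collapses whitespace, so it would mangle string literals).

```java
// Hypothetical sketch: opt-in, language-dependent comparison with a traditional fallback.
final class HybridDiffSketch {

    enum Mode { TEXT, SYNTAX }

    /** Syntax-aware comparison is used only when enabled and only for known source files. */
    static Mode chooseMode(String fileName, boolean syntaxAwareEnabled) {
        if (syntaxAwareEnabled && fileName.endsWith(".java")) {
            return Mode.SYNTAX;
        }
        return Mode.TEXT; // default: behave like the current tools
    }

    /** Crude normalization: pad some punctuation, then collapse runs of whitespace. */
    static String normalize(String src) {
        return src.replaceAll("([=;{}(),])", " $1 ").trim().replaceAll("\\s+", " ");
    }

    /** Reports whether the two versions differ under the mode chosen for this file. */
    static boolean changed(String fileName, boolean syntaxAwareEnabled,
                           String oldText, String newText) {
        if (chooseMode(fileName, syntaxAwareEnabled) == Mode.SYNTAX) {
            return !normalize(oldText).equals(normalize(newText));
        }
        return !oldText.equals(newText);
    }

    public static void main(String[] args) {
        String a = "private int i=0;";
        String b = "private\nint i = 0;";
        System.out.println(changed("Foo.java", true, a, b));   // syntax-aware mode
        System.out.println(changed("Foo.java", false, a, b));  // option switched off
        System.out.println(changed("notes.txt", true, a, b));  // non-code file
    }
}
```

The normalized string is also one possible answer to observation 3: a repository could store something like it instead of the original layout, as long as each developer's formatter can rebuild a readable view from it.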

Done. I could extend this text with a lot of other ideas, but I prefer to wait for feedback before losing focus by suggesting minor features or talking about other uses of such a tool.



The IDEs are driving us crazy

Posted by felipegaucho on February 09, 2006 at 12:13 PM | Permalink | Comments (12)

Activities Report #02:

Planet Earth, 2006 A.D.
Workstation: java.net
Module: Cejug-Classifieds
Status: active
MEMBERS PROFILE:
orrego: Spanish native speaker. OS: MacOS 3.1. IDE: Sun Studio Creator.
rajesh_sannareddy: Indish native speaker. OS: Linux 3.1. IDE: Eclipse.
java.net robot: en_US default locale. OS: Solaris. IDE: console.
raphael_paiva: Portuguese native speaker. OS: Windows 2000. IDE: MyEclipse.
felipegaucho: Portuguese native speaker. OS: Windows XP. IDE: WTP 0.7.
Date | Time | From | Entry
01, Wednesday | 11:30pm | CHILE | orrego modifies some presentation files and commits them to the CVS repository.
02, Thursday | 1:00am | INDIA | rajesh_sannareddy updates the controller in order to show the new features on the web.
02, Thursday | 1:30am | USA | java.net cruise control robot triggers the integration tests. Snapshot release is published on the test server.
03, Friday | 8:15pm | BRAZIL | felipegaucho tries the snapshot and finds several usability problems. Garbled characters suggest encoding problems in the files stored in the CVS repository.
04, Saturday | 9:00am | BRAZIL | raphael_paiva fixes the character encoding. He sends a mail to the team asking why the code is still formatted with 80-column Line Wrapping.
04, Saturday | 9:30am | USA | java.net cruise control robot triggers the integration tests. Snapshot release is published on the test server.
04, Saturday | 2:10pm | BRAZIL | felipegaucho tries the snapshot release and notices that all features are working ok. The snapshot release is published on the project web site. The team receives a mail about encoding - the source of the problem is still unknown.

If you had the patience to read the report above, you noticed some mess in the code produced by the project team during February. The scenario is simple:

My project has members from many countries around the world, each using a different IDE and a different operating system, and speaking a different language. All of them are motivated to do their best to keep the project on the right track, but this Babel of tools is causing many more problems than solutions.

Seems familiar? I guess so, because almost every project I have ever seen suffers from configuration problems. If you are a member of a project in which every person can choose his own development environment, I'm sure you have had painful experiences with code style, integration and even project communication. It is so common that I decided to record this situation here in my blog.

Two other recent situations I experienced in my projects:

  1. The project I'm working on uses a modified template in which the Line Wrapping was changed to 255 columns, instead of the traditional 80 columns. After one week of good work I decided to update my IDE - I installed the new version on my computer and kept working on the code. A few days later a colleague asked me: did you change the Line Wrapping rule? Of course, as a human being, I was tired and also excited about the new version of my IDE - and I simply forgot to redefine its defaults before starting to work.
  2. A member of my Open Source project decided to contribute - he updated the CVS snapshot and fixed several open issues. Great job, except for a silent trap - the IDE he was using adopted the file encoding of his operating system, which was completely different from the project defaults. Nowadays, the project has part of its code in UTF-8 and part of it in Cp1252.
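The second situation is easy to reproduce. The sketch below, in Java, writes a string with one encoding and reads it back with another, which is what happens when two IDEs disagree about the file encoding. The class name `EncodingMismatchSketch` and the sample word are illustrative; `windows-1252` is the standard charset name for what Java historically calls Cp1252, and the sketch assumes the running JVM provides it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: what one developer saves is not what the other one reads.
final class EncodingMismatchSketch {

    // Assumption: this charset is available in the running JVM.
    static final Charset CP1252 = Charset.forName("windows-1252");

    /** Encodes text with the writer's charset and decodes the bytes with the reader's. */
    static String saveAndReopen(String text, Charset writer, Charset reader) {
        byte[] onDisk = text.getBytes(writer);
        return new String(onDisk, reader);
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file itself independent of its encoding.
        String original = "a\u00e7\u00e3o"; // a Portuguese word with two accented letters

        // Saved as UTF-8, reopened as Cp1252: each accented letter becomes two characters.
        System.out.println(saveAndReopen(original, StandardCharsets.UTF_8, CP1252));

        // Saved as Cp1252, reopened as UTF-8: the bytes are not valid UTF-8 sequences.
        System.out.println(saveAndReopen(original, CP1252, StandardCharsets.UTF_8));

        // Same charset on both sides: the text survives.
        System.out.println(saveAndReopen(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8));
    }
}
```

Plain ASCII text survives both mismatches unchanged, which is why this kind of problem can stay hidden until someone commits a file containing accented characters.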

Machines: please don't think, just obey

Configuration problems seem intrinsic to computing, and I can't imagine great solutions without a lot of human intervention. Despite that, several IDE vendors are designing heuristics to give their IDEs the ability to guess what the humans are trying to do. This comfortable feature breeds laziness and ignorance about development details - such as file encoding or code style. When a machine starts making decisions instead of asking the person what to do, the problems start. I think friendly IDEs are great, but they should be less pretentious about their ability to help us in our duties.

Some ideas: project descriptors like the ones used by Maven seem a great idea, and something like them could be made available for the IDEs. Imagine for a minute an XML Schema specified by a consortium of IDE providers in order to unify the project descriptors. Imagine now that every IDE forces the user to supply this default project descriptor in order to start a new project. Wonderful? I guess so. Unfortunately, it seems far from reality. For now, all the responsibility for reconfiguring every new tool still falls on humans and, you know, humans are not perfect.

Until we can figure out a solution for configuration problems, I have put together a sanity checklist to help you keep your team working together - if you know smart ways to avoid configuration problems, please let us know.

Tricks to keep the team sane:

  • Be simple - don't reinvent specifications unless you have a serious reason to do that.
  • Create templates for all kinds of documents, including source code, documentation and communication. Ask the team members to adopt them.
  • Use Ant, Maven or any other console build tool. Keep the IDEs away from your builds.
  • Use a continuous integration tool, such as CruiseControl or Luntbuild - it will alert you early about the mess running underneath the CVS.
  • If possible, create alerts for the most common mistakes induced by the IDEs, such as file encoding or proprietary code formatting.
  • Be patient; for now we depend on human memory to keep things working. Unfortunately, people are not machines, and eventually your project will fail because someone was thinking about his girlfriend instead of code formatting ;)
  • Don't confuse friendly IDEs with intelligence; they are a simulacrum of someone else's thinking - someone who may think very differently from you.
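The alerts mentioned in the checklist can start very small. The following Java sketch checks two of the mistakes described in this entry: files that are not valid UTF-8 and lines longer than the agreed width. The class name `ProjectSanityCheckSketch`, the UTF-8 project default and the 80-column limit are assumptions made for the illustration; a real project would take them from its own conventions and run the check from Ant, Maven or the continuous integration tool.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a console check for encoding and line-width mistakes.
final class ProjectSanityCheckSketch {

    /** True when the bytes decode as UTF-8 without any malformed sequence. */
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Returns the 1-based numbers of the lines longer than maxColumns characters. */
    static List<Integer> linesLongerThan(String text, int maxColumns) {
        List<Integer> offenders = new ArrayList<>();
        String[] lines = text.split("\r?\n", -1);
        for (int i = 0; i < lines.length; i++) {
            if (lines[i].length() > maxColumns) {
                offenders.add(i + 1);
            }
        }
        return offenders;
    }

    public static void main(String[] args) throws IOException {
        int maxColumns = 80; // assumed project default
        for (String arg : args) {
            byte[] bytes = Files.readAllBytes(Paths.get(arg));
            if (!isValidUtf8(bytes)) {
                System.out.println(arg + ": not valid UTF-8");
                continue;
            }
            String text = new String(bytes, StandardCharsets.UTF_8);
            for (int line : linesLongerThan(text, maxColumns)) {
                System.out.println(arg + ":" + line + ": longer than " + maxColumns + " columns");
            }
        }
    }
}
```

Note that a file containing only ASCII characters passes the UTF-8 check whatever encoding the IDE claimed to use, so the alert fires only once accented characters are involved.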


Wink - the power of presentations

Posted by felipegaucho on December 19, 2005 at 04:54 AM | Permalink | Comments (6)

Cejug-Classifieds has become popular these days and several new developers have asked how to configure their development environment. I first tried the traditional way, publishing documents such as the Reference Guide and posting detailed messages to the developers mailing list. That effort proved weak, and many developers remain unable to work just because they don't have much time to figure out how to configure Tomcat, MySQL and, above all, the Web Tools for Eclipse. Visiting other projects I noticed the use of demo videos as a powerful way to teach how to do things. Kirill Grouchnikov uses the AVI video format [1, 2] and the Solaris team shows some well-tailored videos about the OS. Other projects and blogs feature people in complete audio/video pieces about technology news and installation guidance.

Looking for Open Source solutions to create demos on how to configure my project in Eclipse, I found Wink - a freeware tutorial/presentation creation tool. It is very easy to use and has a nice set of features, including a comprehensive user guide. I spent half an hour producing my first video, and two hours later I published the first presentation on the home page of my project: How to configure Cejug-Classifieds into Web Tools for Eclipse.
Wink usability: a feature I liked very much was the ability to create a video from screenshots instead of continuous action - reducing the video size. It also gave me the chance to choose which screenshots make up the video and to remove my mistakes in using the tool. A flaw I couldn't work around was the absence of persistent text blocks, i.e., I can't keep a text box on screen for more than one frame. The tool also lets me mix screenshots with continuous image recording - cool!

I strongly recommend the use of tutorials in your Open Source project. Presentations and tutorials are fast to produce and reduce communication problems. I know there are more sophisticated commercial products for creating presentations, but Wink is a free alternative that offers very good usability.

Acknowledgement: moments before publishing this entry I found a previous post by Vincent Brabant. Brabant first introduced Wink through an elegant presentation about NetBeans. I will try to use those next-button tricks to control the rhythm of my next presentations.


