Osvaldo Pinali Doederlein

Osvaldo Pinali Doederlein's Blog

No tabs? Yes, you ARE nuts!

Posted by opinali on September 26, 2007 at 10:28 AM | Comments (21)

Quote: (WHAT? NO TABS? ... yes, no tabs ... ARE YOU NUTS? ... at times ... WHY? ... because tabs create a source display problem ... BUT IT WORKS FINE FOR ME IN VI/EMACS! ... yes, but what about everybody else?)

Tabs are only a problem if you mix tabs and spaces before the first non-whitespace char, indent some lines with tabs and others with spaces, or use tabs after the first non-whitespace char. If you have the discipline to use tabs the way God intended when He created the ASCII charset, that is, using tabs (and only tabs) for indentation only, then tabs have the advantage of letting each programmer pick his or her preferred indentation size, with none of the claimed disadvantages.
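
The rule above can be checked mechanically. What follows is a minimal sketch, not part of FixTabs: the class name TabDiscipline and the method isClean are made up for this illustration. It assumes the same convention that FixTabs leaves behind, namely that a few alignment spaces (fewer than one tab's worth) may follow the leading tabs, as in Javadoc continuation lines.

```java
final class TabDiscipline
{
	/**
	 * Hypothetical helper: returns true if the line is indented with leading tabs,
	 * optionally followed by fewer than spacesPerTab alignment spaces, and contains
	 * no tab anywhere after those leading tabs.
	 */
	static boolean isClean (final String line, final int spacesPerTab)
	{
		int i = 0;

		// Leading indentation: tabs only.
		while (i < line.length() && line.charAt(i) == '\t')
			++i;

		// Count the alignment spaces that follow the tabs.
		int spaces = 0;

		while (i + spaces < line.length() && line.charAt(i + spaces) == ' ')
			++spaces;

		// Any later tab is a violation: either a tab mixed in after spaces,
		// or a tab after the first non-whitespace char.
		return line.indexOf('\t', i) < 0 && spaces < spacesPerTab;
	}

	public static void main (final String[] args)
	{
		System.out.println(isClean("\t\tint x = 1;", 4));   // true: tabs only
		System.out.println(isClean("\t * javadoc", 4));     // true: one alignment space
		System.out.println(isClean("    int x = 1;", 4));   // false: a full tab's worth of spaces
		System.out.println(isClean("  \tint x = 1;", 4));   // false: spaces before a tab
		System.out.println(isClean("\tint x;\t// c", 4));   // false: tab after code
	}
}
```

The spacesPerTab parameter only matters for deciding when a run of spaces should have been a tab; the tabs themselves stay width-agnostic.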

I have written a small Java utility that will scan a directory tree with text files, and fix tab/whitespace usage automatically. I wrote this utility after searching for, and not finding, anything similar on the Internet - it seems there are some utilities (created by programmers from the dark side of the force) that will do the reverse operation of expanding tabs to spaces. I think only Jalopy would do what I wanted, but at the cost of forcing me to run a full reformat.

The code is very small, so I'm just including it in this blog; enjoy if you find it useful. It's released to the public domain. The program is easy to use and efficient. I won't claim that any code is bug-free, but I've been using it for a couple of years without any issues. I fear, however, that this code may be confused by filesystems with links (a limitation of java.io) or by arbitrary Unicode data (I assume 8-bit chars). And I didn't have time to add really cool features like IDE integrations. Clearly it's not a sufficiently advanced program to make me the next famous open source project leader, so I'm posting it here just to help other programmers who are on the Right side of the Tab Wars... yeah, you know what you're supposed to do now: just wait till all your coworkers go home next Friday night; check out the entire company repositories' sources; run FixTabs from the root; commit. ;-)

import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.regex.Pattern;

/**
 * Fixes usage of physical tabs in text files: Tabs are mandatory for indentation,
 * and only allowed for that purpose (forbidden after first non-blank in line).
 * 
 * @author osvaldo
 */
public class FixTabs
{
	private static int spacesPerTab = 4;
	private static final ByteArrayOutputStream baos = new ByteArrayOutputStream();
	private static final String[] defaultIncludes = { ".*\\.java", ".*\\.properties" };
	private static final String[] defaultExcludes =
	{
		"\\..*",     // .svn, Unix, Eclipse hidden directories
		"CVS",       // CVS
		"bin",       // Common output directory for Eclipse
		"dist",      // Common output directory for NetBeans
		"build",     // Common output directory for NetBeans
		"nbproject", // Common output directory for NetBeans
		"target",    // Common output directory for Maven
	};

	public static void main (final String[] args)
	{
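		// Any number of +include/-exclude patterns may follow the first two
		// arguments, so only an explicit help flag triggers the usage message.
		if (args.length > 0 && ("-?".equals(args[0]) || "-help".equals(args[0])))
		{
			help();
			return;
		}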
		
		final File root = new File(args.length < 1 ? "." : args[0]);
		
		if (args.length >= 2) try
		{
			spacesPerTab = Integer.parseInt(args[1]);
		}
		catch (NumberFormatException e)
		{
			// Do not touch any file when the tab size is unreadable.
			help();
			return;
		}
		
		final ArrayList<Pattern> includeList = new ArrayList<Pattern>();
		final ArrayList<Pattern> excludeList = new ArrayList<Pattern>();

		for (int i = 2; i < args.length; ++i)
		{
			if (args[i].startsWith("+"))
				includeList.add(Pattern.compile(args[i].substring(1)));
			else if (args[i].startsWith("-"))
				excludeList.add(Pattern.compile(args[i].substring(1)));
			else
			{
				// Unknown argument: show usage and stop before modifying anything.
				help();
				return;
			}
		}
		
		if (includeList.isEmpty()) for (final String p: defaultIncludes)
			includeList.add(Pattern.compile(p));
		
		if (excludeList.isEmpty()) for (final String p: defaultExcludes)
			excludeList.add(Pattern.compile(p));

		final int count = fixDirectory(root,
			includeList.toArray(new Pattern[includeList.size()]),
			excludeList.toArray(new Pattern[excludeList.size()]));
		System.out.println("Fixed: " + count);
	}
	
	private static void help ()
	{
		System.out.println(
			"FixTabs [root [spacesPerTab [+includePattern*] [-excludePattern*]]]\n" +
			"Ex.: FixTabs . 4 +.*\\.java +.*\\.properties -CVS");
	}

	/**
	 * Processes a directory recursively.
	 */
	private static int fixDirectory (final File dir, final Pattern[] includeFiles,
		final Pattern[] excludeDirs)
	{
		System.out.print(dir.getAbsolutePath() + " ... ");
		final File[] files = dir.listFiles();
		if (files == null) return 0;
		int count = 0;
		int changed = 0;
		
		for (int i = 0; i < files.length; ++i)
		{
			final File file = files[i];
			
			if (file.canRead() && file.canWrite() && !file.isHidden() &&
					matchesAny(includeFiles, file.getAbsolutePath()))
			{
				if (fixFile(file)) ++changed;
				++count;
			}
		}
		
		System.out.println(Integer.toString(changed) + '/' + count);
		
		for (int i = 0; i < files.length; ++i)
		{
			final File file = files[i];
			
			if (file.isDirectory() && !matchesAny(excludeDirs, file.getName()))
				changed += fixDirectory(file, includeFiles, excludeDirs);
		}
		
		return changed;
	}

	/**
	 * Checks a string against a set of patterns; returns true if any pattern matches.
	 */
	private static boolean matchesAny (final Pattern[] patterns, final String s)
	{
		for (Pattern p: patterns)
			if (p.matcher(s).matches())
				return true;
		
		return false;
	}

	/**
	 * Processes a single file.
	 * 
	 * @param file The file to process.
	 * @return true if file was touched; otherwise, it was already okay.
	 */
	private static boolean fixFile (final File file)
	{
		RandomAccessFile raf = null;
		
		try
		{
			raf = new RandomAccessFile(file, "rw");
			final byte[] originalData = new byte[(int)raf.length()];
			raf.readFully(originalData);
			final byte[] fixedData = fix(originalData);
			if (fixedData == null) return false;
			raf.seek(0);
			raf.write(fixedData);
			raf.setLength(fixedData.length);
			raf.close();
			return true;
		}
		catch (IOException e)
		{
			System.err.println(e);
			return false;
		}
		finally
		{
			if (raf != null) try { raf.close(); } catch (IOException e) {}
		}
	}

	/**
	 * Fixes tabs & spaces, if needed.
	 * 
	 * @param data Original data in ASCII format (1 byte = 1 char).
	 * @return Fixed data, or null if no fix was necessary.
	 */
	private static byte[] fix (final byte[] data)
	{
		baos.reset();
		boolean changed = false;
		boolean indenting = true;
		int position = 0;
		
		for (int read = 0; read < data.length; ++read)
		{
			final byte c = data[read];
			
			if (c == '\n')
			{
				indenting = true;
				position = 0;
				baos.write(c);
			}
			else if (c == '\r')
				baos.write(c);
			else if (indenting)
			{
				if (c == ' ')
				{
					if (++position % spacesPerTab == 0)
					{
						changed = true;
						baos.write('\t');
					}
				}
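				else if (c == '\t')
				{
					// Pending spaces before this tab are absorbed by it,
					// so the output differs from the input.
					if (position % spacesPerTab != 0) changed = true;
					baos.write(c);
					position += (spacesPerTab - position % spacesPerTab);
				}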
				else
				{
					for (int i = 0; i < position % spacesPerTab; ++i)
						baos.write(' ');
					
					baos.write(c);
					indenting = false;
					++position;
				}
			}
			else
			{
				if (c == '\t')
				{
					changed = true;
					
					do
					{
						baos.write(' ');
					}
					while (++position % spacesPerTab != 0);
				}
				else
				{
					baos.write(c);
					++position;
				}
			}
		}
		
		if (indenting) for (int i = 0; i < position % spacesPerTab; ++i)
			baos.write(' ');

		return changed ? baos.toByteArray() : null;
	}
}
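
On the links caveat mentioned before the listing: the java.nio.file API, which arrived in Java 7 (after this post was written), does not follow symbolic links by default when walking a tree. The following is only a sketch of how the directory scan could look with that API, not a drop-in replacement for fixDirectory; the class name SourceWalker and the method collect are hypothetical, and it takes a single include and a single exclude pattern for brevity.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class SourceWalker
{
	/**
	 * Hypothetical helper: collects regular files under root whose name matches
	 * include, skipping directories whose name matches excludeDir.
	 * Files.walkFileTree does not follow symbolic links unless asked to.
	 */
	static List<Path> collect (final Path root, final Pattern include, final Pattern excludeDir)
		throws IOException
	{
		final List<Path> found = new ArrayList<Path>();

		Files.walkFileTree(root, new SimpleFileVisitor<Path>()
		{
			@Override
			public FileVisitResult preVisitDirectory (final Path dir, final BasicFileAttributes attrs)
			{
				final Path name = dir.getFileName();

				// Never prune the root itself, only excluded directories below it.
				if (!dir.equals(root) && name != null && excludeDir.matcher(name.toString()).matches())
					return FileVisitResult.SKIP_SUBTREE;

				return FileVisitResult.CONTINUE;
			}

			@Override
			public FileVisitResult visitFile (final Path file, final BasicFileAttributes attrs)
			{
				// A symbolic link is reported here as a link, not as a regular file,
				// so it is left alone.
				if (attrs.isRegularFile() && include.matcher(file.getFileName().toString()).matches())
					found.add(file);

				return FileVisitResult.CONTINUE;
			}
		});

		return found;
	}
}
```

Each returned path could then be handed to something like fixFile via Path.toFile().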


Comments
Comments are listed in date ascending order (oldest first)

  • I would suggest looking at the Checkstyle tool; it removes tabs and removes/adds whitespace where you want it.

    http://checkstyle.sourceforge.net/config_whitespace.html

    Posted by: atehrani on September 26, 2007 at 06:20 PM

  • I know Checkstyle; in fact I use this tool heavily, every day and for all my code. But Checkstyle only reports violations of the defined rules; it doesn't fix the code. Even worse, on the Tabs vs. Spaces issue it only has a rule, TabCharacter, that helps only to get rid of tabs (I want the opposite).

    Posted by: opinali on September 26, 2007 at 07:32 PM

  • Everyone knows God intended tabs to be 8 spaces.

    Posted by: shannon on September 26, 2007 at 08:12 PM

  • shannon: Yeah, I agree that was the original design. This should even be encoded in Genesis - does any archaeologist there have the original text to check the indentation? ;-) But then, there's evolution... 8 spaces were good in simpler times, when a program consisted of a bunch of functions in the global scope. Nowadays your code is much more nested - at minimum methods are nested inside classes, and it gets worse with try/catch blocks, inner classes... we have to use feature-packed IDEs where lots of real estate is taken from the code editor by other tools... so even if you're very disciplined about avoiding deeply nested control structures, it's very hard to use 8-space indentation these days. Except that it's a good excuse to request a 20" monitor from your boss!

    Posted by: opinali on September 27, 2007 at 04:55 AM

  • How about 80-column code? I never use this convention, but we can see it even in the JRE classes. I can imagine it made sense in the old days, when we had to print the code to read it alongside external documentation (before God created JavaDoc). What do you think about it, Osvaldo?

    Posted by: rael_gc on September 27, 2007 at 08:20 AM

  • Tabs are defined to be 8 characters wide. Editors that do anything else with them are broken. I'd rather not indent that much.

    If you expect editors to be broken and tab widths to vary, how do you manage line lengths? Editor wrapped lines are a PITA to deal with. Editors that don't line wrap are far worse.

    Posted by: tackline on September 27, 2007 at 08:25 AM

  • Your program has horizontal scrolling in firefox where tab length is not easily controlled (without user css applied).

    Posted by: coding on September 27, 2007 at 09:02 AM

  • I hope I'm not being labeled the evil TAB-less guy, actually I like TABs, when used properly, so I don't disagree with you at all. It's very unfortunate that we have so much inconsistent usage of them, and different ways to view them. And maybe that's the issue here. It's how the source is viewed that is at issue here. Displaying source in a variable width font could be just as bad (do people do that?). I suppose an alternative would be to normalize the source into something that uses TABs correctly, then add in a check at commit time to verify this. But have you ever tried to get everyone to agree on what "using TABs correctly" means? Sigh...

    -kto

    Posted by: kellyohair on September 27, 2007 at 09:26 AM


  • rael_gc: javadoc has existed since JDK 1.0. Having a maximum horizontal limit is a good idea; I use 110 cols because I found that this size fits neatly in my editor (in an IDE where some horizontal space is already taken by a project view), and it's also a perfect fit for hard printing. 80 cols is more traditional, but both Java and modern frameworks and 'good practices' are pretty verbose... for example, when writing a simple variable initialization, I'll often fill the first 80 cols just with generic type declarations and an invocation of a ServiceLocator ;-)


    tackline: Defined where? The ASCII standard doesn't specify any particular number of spaces; the horizontal tab char means just "move to the next tab stop", where these tab stops are device-specific (the standard was not meant for screens only; in fact screens were rare devices at that time). Tabs don't even have to be all the same size: you could have your first tab stop at column 10, then one stop every 4 chars, and no stops at all after column 50. And there's a HUGE number of text/code editors that default to a different number, most often 4 spaces (for example, that's the default of Notepad++, which is my favourite general-purpose text editor).


    As for line width, fair enough, perhaps my 110-col lines will jump to 120-140 when opened by someone who uses 8-col tabs. It's not a perfect system, but it is the lesser evil: if I open code from somebody who uses 8-char indent but no physical tabs, and whose lines are too long for my preferences, I don't have the option to trim line sizes by reducing tab length. But if you open my code and find that its lines are too long, at least you have the choice of making it narrower (even if you're not pleased with the resulting indentation).


    coding: It's the first time I've posted such a big chunk of code. I'm also a Firefox user, and unfortunately the blog's preview didn't show these scrollbars. I don't know if I can use custom CSS here.


    kellyohair: Don't worry, I just used your blog as an opportunity ;-) I've been hacking Java since 1.0beta and I always noticed that Sun's sources were often inconsistently (when not just horribly) indented. It seems like there's no standard editor, IDE, or coding style inside Sun. This is better in the sources of more recent APIs (I guess you're using NetBeans instead of vi/emacs, finally?) But one of my wishes for OpenJDK is "get a good Jalopy style - any style - and REFORMAT THE WHOLE F***ING MESS for good". I know, that's never going to happen (massive diff in repository, veteran coders suiciding).


    On variable font widths, most Smalltalk environments used that as the default; I think that's part of Smalltalk's attempt to be closer to natural language. Yet another Smalltalk-ism that didn't catch on. I don't miss this (I always configured the environments to use fixed fonts), although I miss keyword messages (so much needed in these days of API inflation...)

    Posted by: opinali on September 27, 2007 at 11:29 AM
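
To make the tab-stop argument in the comment above concrete, here is a small sketch of expanding tabs against an arbitrary list of stops, and of how the same tab-indented line changes width under 4-col and 8-col stops. The class TabStops and its methods are invented for this illustration; rendering a tab that falls past the last stop as a single space is an assumption of the sketch, not something the ASCII standard specifies.

```java
final class TabStops
{
	/** Builds evenly spaced stops: width, 2*width, ... up to and including limit. */
	static int[] uniform (final int width, final int limit)
	{
		final int[] stops = new int[limit / width];

		for (int i = 0; i < stops.length; ++i)
			stops[i] = (i + 1) * width;

		return stops;
	}

	/**
	 * Expands each tab to spaces up to the next stop (0-based columns, ascending).
	 * Past the last stop a tab becomes one space (assumption of this sketch).
	 */
	static String expand (final String line, final int[] stops)
	{
		final StringBuilder out = new StringBuilder();

		for (int i = 0; i < line.length(); ++i)
		{
			final char c = line.charAt(i);

			if (c != '\t')
			{
				out.append(c);
				continue;
			}

			// Fallback when no stop lies to the right of the current column.
			int target = out.length() + 1;

			for (final int stop : stops)
			{
				if (stop > out.length())
				{
					target = stop;
					break;
				}
			}

			while (out.length() < target)
				out.append(' ');
		}

		return out.toString();
	}

	public static void main (final String[] args)
	{
		final String line = "\t\tfoo();";
		// First stop at column 10, then every 4 chars, none after column 50.
		final int[] custom = { 10, 14, 18, 22, 26, 30, 34, 38, 42, 46, 50 };

		System.out.println(expand(line, uniform(4, 80)).length()); // 14
		System.out.println(expand(line, uniform(8, 80)).length()); // 22
		System.out.println(expand(line, custom).length());         // 20
	}
}
```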

  • Osvaldo, about the JavaDoc: I was talking about pre-Java languages :)

    Posted by: rael_gc on September 27, 2007 at 01:24 PM

  • About the Notepad++: I love it!

    Posted by: rael_gc on September 27, 2007 at 01:26 PM

  • About line width: I don't wrap lines at any length, because most editors can wrap them automatically.

    Posted by: rael_gc on September 27, 2007 at 01:41 PM

  • I don't wrap anything either; I meant that I set a limit so I'll never write code beyond that column. "Soft" line wrapping is a blasphemy in code editors :)

    Meanwhile, why does MovableType suck so badly on Firefox? Even the comments preview functionality doesn't work perfectly - when I click preview, I receive a page that contains the rendered comment but the editable textarea is empty so I have to click Back to edit. Very weird, for a site that's supposed to promote so many open source projects.

    Posted by: opinali on September 27, 2007 at 02:02 PM

  • BTW, on pre-Java languages: 8-char tabs and 80-col pages come from FORTRAN and punch cards. Now, these are technologies from the sixties... some programmers evolve, others don't. :-)) Nobody used such a large tabulation with mechanical typewriters (I used these for years, didn't learn it on Wikipedia). And source code is much less dense than common prose documents; there's a lot of empty space due to short statements, empty lines, and block structuring - remarkably when programmers follow the One True Bracing Style (which is the BSD or Allman style of course) - large tabs are better for K&R-style users, those old, long-bearded diehards who are still traumatized by 24-line terminals (hint: 24 = 2 * 12, punched cards were 80x12), so they resort to large tabs to regain some readability.

    Posted by: opinali on September 27, 2007 at 02:21 PM

  • BTW, on pre-Java languages: 8-char tabs and 80-col pages come from FORTRAN and punch cards. Now, these are technologies from the sixties... some programmers evolve, others don't. :-))
    At that time a tabulation was a unit of indentation, yes. :)

    Posted by: arsene on September 28, 2007 at 01:45 AM

  • "Soft" line wrapping is a blasphemy in code editors :)

    I don't wrap lines in any way ("hard" or "soft"). I was just talking about editor wrapping for those who like it: since I never wrap, you can still see my code wrapped by the editor if you want :)

    Posted by: rael_gc on September 28, 2007 at 06:04 AM

  • At that time a tabulation was a unit of indentation, yes. :)
    Not fair! In FORTRAN (from punched cards time at least) there was no indentation at all, not in the modern meaning of indentation, since the language was not even block-structured; all statements would start in the same column and not exceed one line. The "tab stops" were used just to set the position for specific code elements, e.g. a label would have to be placed in a specific column; I think that was mostly a cheap trick to optimize the parser. Modern languages that follow this idea are those like Python and Haskell, where indentation has semantic meaning (I happen to like that in Haskell)... :)

    Posted by: opinali on September 28, 2007 at 06:23 AM

  • I was just talking about editor wrapping for those who like it: since I never wrap, you can still see my code wrapped by the editor...
    This is known as "soft wrapping" - when the editor wraps lines automatically but doesn't break them physically with EOL and indentation.

    Posted by: opinali on September 28, 2007 at 06:27 AM

  • For punch card machines, FORTRAN tabs were usually set at columns 7 and 72. As the tab/style wars will never end, perhaps we should use editors that reformat on the fly according to each user's preferred style. But this formatting should be just for display --- the source file isn't changed unless you edit significant characters.
    Want to do tables, etc in your source documentation; just use html, also formatted on the fly.
    I quite like using variable width fonts, but many IDEs don't permit their use. The main issue here is to find a font which adequately distinguishes between similar characters such as I/l/1, 0/O.

    Posted by: mthornton on September 29, 2007 at 02:07 AM

  • Re: 8 spaces in original language: Actually, I have a copy of Genesis in the original language, but it doesn't help on this issue, as tabs and other forms of indentation apply right-to-left! ;-)


    Re: FORTRAN: I must disagree emphatically with the statement "In FORTRAN (from punched cards time at least) there was no indentation at all, not in the modern meaning of indentation..." (and I speak from experience, having learned FORTRAN in 1968). As with most unfair statements, there's a grain of truth: the FORTRAN punched-card source format defined the following fields:

    Cols. 1-5: Statement number (what we'd call "statement labels" today)
    Col. 6: Continuation marker (if non-blank, concatenates content of current physical line onto logical source line)
    Cols. 7-72: Statement area (freely formatted source code)
    Cols. 73-80: Identification sequence (a crude form of source control that need not concern us here)

    It was entirely possible (having done so myself) to set up a keypunch machine's "control card" so that the tab key not only (physically) moved the card through the above fields, but also provided regular indentation points within the statement area.


    The key issue is that the statement area was whitespace-agnostic in FORTRAN. Even though "structured programming" was not yet an industry-standard buzzword, it was considered good style (at least in the circles in which I moved) to use indentation (and other whitespace conventions) to make the organization and meaning of one's code as explicit as possible. In fact it was even more important to do so, as the language had few syntactic constructs to represent logical structure explicitly.


    It is certainly true that there were lots of people who didn't bother. It is equally true that one can still find published or blogged Java source code that causes fits of cringing. After all, one can write lousy code in any language.


    Just to set the historical record straight (and now I feel like a software archaeologist! ;-)


    -jn-

    Posted by: joelneely on October 07, 2007 at 05:45 AM
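
The card layout listed in the comment above maps directly onto fixed column slices. As a minimal sketch, assuming exactly the field boundaries given there (the class FortranCard and its field names are hypothetical, and the continuation test is simplified to "column 6 is not a blank"), a card image could be split like this:

```java
final class FortranCard
{
	final String label;          // cols 1-5: statement number
	final boolean continuation;  // col 6: non-blank marks a continuation line
	final String statement;      // cols 7-72: statement area, indentation preserved
	final String sequence;       // cols 73-80: identification sequence

	FortranCard (final String card)
	{
		// Pad short lines to the full 80-column card width.
		final StringBuilder sb = new StringBuilder(card);

		while (sb.length() < 80)
			sb.append(' ');

		final String c = sb.toString();
		label = c.substring(0, 5).trim();
		continuation = c.charAt(5) != ' ';
		statement = c.substring(6, 72);
		sequence = c.substring(72, 80).trim();
	}

	public static void main (final String[] args)
	{
		final FortranCard first = new FortranCard("  100 X = 1");
		final FortranCard next = new FortranCard("     1    + Y");

		System.out.println(first.label);            // 100
		System.out.println(first.continuation);     // false
		System.out.println(next.continuation);      // true
		System.out.println(next.statement.trim());  // + Y
	}
}
```

Because the statement field is kept untrimmed, any indentation the programmer used inside columns 7-72 survives, which is the point the comment makes about whitespace in the statement area.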

  • Thanks so much for this! This is exactly what I was looking for.

    Posted by: winbill on December 20, 2007 at 01:24 AM




