<?xml version="1.0" encoding="utf-8"?>
<feed version="0.3" xmlns="http://purl.org/atom/ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xml:lang="en">
<title>Tom White&apos;s Blog</title>
<link rel="alternate" type="text/html" href="http://weblogs.java.net/blog/tomwhite/" />
<modified>2008-03-18T14:07:33Z</modified>
<tagline></tagline>
<id>tag:weblogs.java.net,2008:/blog/tomwhite/225</id>
<generator url="http://www.movabletype.org/" version="3.01D">Movable Type</generator>
<copyright>Copyright (c) 2008, tomwhite</copyright>
<entry>
<title>&quot;Disks have become tapes&quot;</title>
<link rel="alternate" type="text/html" href="http://weblogs.java.net/blog/tomwhite/archive/2008/03/disks_have_beco.html" />
<modified>2008-03-18T14:07:33Z</modified>
<issued>2008-03-18T14:07:24Z</issued>
<id>tag:weblogs.java.net,2008:/blog/tomwhite/225.9378</id>
<created>2008-03-18T14:07:24Z</created>
<summary type="text/plain">What trends in disk drive technology mean for data processing.</summary>
<author>
<name>tomwhite</name>

<email>tom@tiling.org</email>
</author>
<dc:subject>Distributed</dc:subject>
<content type="text/html" mode="escaped" xml:lang="en" xml:base="http://weblogs.java.net/blog/tomwhite/">
<![CDATA[<p>
MapReduce is a programming model for processing vast amounts of data. One of the reasons that it works so well is because it exploits a sweet spot of modern disk drive technology trends. In essence MapReduce works by repeatedly sorting and merging data that is streamed to and from disk at the <span style="font-style: italic;">transfer rate</span> of the disk. Contrast this to accessing data from a relational database that operates at the <span style="font-style: italic;">seek rate</span> of the disk (seeking is the process of moving the disk's head to a particular place on the disk to read or write data).
</p>

<p>
So why is this interesting? Well, look at the trends in seek time and transfer rate. Seek time has improved by about 5% a year, whereas transfer rate has improved by about 20% <a href="#1">[1]</a>. Seek time is improving more slowly than transfer rate - <span style="font-style: italic;">so it pays to use a model that operates at the transfer rate</span>. Which is what MapReduce does. I first saw this observation in Doug Cutting's talk, with Eric Baldeschwieler, at <a href="http://conferences.oreillynet.com/os2007/">OSCON</a> last year, where he worked through the numbers for updating a 1 terabyte database using the two paradigms: B-Tree (seek-limited) and Sort/Merge (transfer-limited). (See the <a href="http://wiki.apache.org/hadoop-data/attachments/HadoopPresentations/attachments/oscon-part-1.pdf">slides</a> and <a href="http://us.dl1.yimg.com/download.yahoo.com/dl/ydn/hadoop.m4v">video</a> for more detail.)
</p>
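
<p>
To get a feel for the gap, here is a back-of-the-envelope sketch. The figures (a 10 ms seek, a 10 MB/s transfer rate, two seeks per update, 100 million updates to a 1 terabyte database) are illustrative assumptions, not measurements and not necessarily the numbers used in the talk, and the class name is hypothetical:
</p>

```java
// Back-of-the-envelope comparison of seek-limited and transfer-limited updates.
// All figures are illustrative assumptions, not measurements.
class SeekVsTransfer {
    static final double SEEK_SECONDS = 0.010;             // assumed 10 ms per seek
    static final double TRANSFER_BYTES_PER_SECOND = 10e6; // assumed 10 MB/s
    static final double SECONDS_PER_DAY = 86_400;

    // Seek-limited: every update pays for a fixed number of seeks.
    static double seekLimitedSeconds(long updates, int seeksPerUpdate) {
        return updates * seeksPerUpdate * SEEK_SECONDS;
    }

    // Transfer-limited: stream the whole database in once and back out once.
    static double transferLimitedSeconds(double databaseBytes) {
        return 2 * databaseBytes / TRANSFER_BYTES_PER_SECOND;
    }

    public static void main(String[] args) {
        double seekDays = seekLimitedSeconds(100_000_000L, 2) / SECONDS_PER_DAY;
        double transferDays = transferLimitedSeconds(1e12) / SECONDS_PER_DAY;
        System.out.printf("seek-limited:     %.1f days%n", seekDays);
        System.out.printf("transfer-limited: %.1f days%n", transferDays);
    }
}
```

<p>
Under these assumptions the seek-limited approach takes about ten times as long as rewriting the entire database sequentially, and the gap widens as transfer rate improves faster than seek time.
</p>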

<p>
The general point was well summed up by Jim Gray in an <a href="http://www.acmqueue.org/modules.php?name=Content&amp;pa=showpage&amp;pid=43">interview</a> in ACM Queue from 2003:
<blockquote>... programmers have to start thinking of the disk as a sequential device rather than a random access device.</blockquote>Or the more pithy: "Disks have become tapes." (Quoted by <a href="http://www.databasecolumn.com/2007/09/disk-trends.html">David DeWitt</a>.)
</p>

<p>
But even the growth of transfer rate is dwarfed by another measure of disk drives - capacity, which is growing at about 50% a year. David DeWitt <a href="http://www.databasecolumn.com/2007/09/disk-trends.html">argues</a> that since the effective transfer rate of drives (the transfer rate relative to the amount of data a drive holds) is falling, we need database systems that work with this trend - such as column-store databases and wider use of compression (since this effectively increases the transfer rate of a disk). Of existing databases he says:
<blockquote>Already we see transaction processing systems running on farms of mostly empty disk drives to obtain enough seeks/second to satisfy their transaction processing rates.</blockquote>But this applies to transfer rate too (or if it doesn't yet, it will). Replace "seeks" with "transfers" and "transaction processing" with "MapReduce" and I think over time we'll start seeing Hadoop installations that choose to use large numbers of smaller capacity disks to maximize their processing rates.
</p>

<p>
<span style="font-size:85%;"><a name="1">[1]</a> See <a href="http://www.cs.utexas.edu/users/dahlin/techTrends/trends.disk.ps">Trends in Disk Technology</a> by Michael D. Dahlin for changes between 1987 and 1994. For the period since then these figures still hold, as is relatively easy to check using manufacturers' data sheets, although with seek time it's harder to tell since the definitions seem to change from year to year and from manufacturer to manufacturer. Still, 5% is generous.</span>
</p>

<p>
(Cross-posted at <a href="http://www.lexemetech.com/2008/03/disks-have-become-tapes.html">my other blog</a>.)
</p>]]>

</content>
</entry>
<entry>
<title>Consistent Hashing</title>
<link rel="alternate" type="text/html" href="http://weblogs.java.net/blog/tomwhite/archive/2007/11/consistent_hash.html" />
<modified>2007-11-27T18:01:45Z</modified>
<issued>2007-11-27T17:56:25Z</issued>
<id>tag:weblogs.java.net,2007:/blog/tomwhite/225.8676</id>
<created>2007-11-27T17:56:25Z</created>
<summary type="text/plain">I&apos;ve bumped into consistent hashing a couple of times lately. But what is it and why should you care? This post has a look.</summary>
<author>
<name>tomwhite</name>

<email>tom@tiling.org</email>
</author>
<dc:subject>Distributed</dc:subject>
<content type="text/html" mode="escaped" xml:lang="en" xml:base="http://weblogs.java.net/blog/tomwhite/">
<![CDATA[<p>
I've bumped into consistent hashing a couple of times lately. The paper that introduced the idea (<a href="http://citeseer.ist.psu.edu/karger97consistent.html">Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web</a> by <a href="http://people.csail.mit.edu/karger/">David Karger</a> <i>et al</i>) appeared ten years ago, although recently it seems the idea has quietly been finding its way into more and more services, from Amazon's <a href="http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html">Dynamo</a> to <a href="http://www.danga.com/memcached/">memcached</a> (courtesy of <a href="http://www.last.fm/">Last.fm</a>). So what is consistent hashing and why should you care?
</p>

<p>
The need for consistent hashing arose from limitations experienced while running collections of caching machines - web caches, for example. If you have a collection of <i>n</i> cache machines then a common way of load balancing across them is to put object <i>o</i> in cache machine number <i>hash(o)</i> mod <i>n</i>. This works well until you add or remove cache machines (for whatever reason), for then <i>n</i> changes and <i>nearly every object is hashed to a new location</i>. This can be catastrophic since the originating content servers are swamped with requests from the cache machines. It's as if the cache suddenly disappeared. Which it has, in a sense. (This is why you should care - consistent hashing is needed to avoid swamping your servers!)
</p>
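
<p>
The effect is easy to see with a few lines of code. The sketch below is only an illustration: it uses the integers 0 to 9,999 as stand-in objects (so each object's hash code is simply its value), and the class and method names are hypothetical. It counts how many objects land on a different machine when an eleventh cache joins ten existing ones:
</p>

```java
// Counts how many objects change cache when the number of caches changes
// under the simple hash(o) mod n scheme. Integer keys stand in for objects.
class ModuloRemapping {
    static int cacheFor(Object o, int numberOfCaches) {
        // floorMod keeps the result non-negative even for negative hash codes
        return Math.floorMod(o.hashCode(), numberOfCaches);
    }

    static int countMoved(int numberOfObjects, int oldCaches, int newCaches) {
        int moved = 0;
        for (int i = 0; i < numberOfObjects; i++) {
            Integer o = i; // Integer.hashCode() is the value itself
            if (cacheFor(o, oldCaches) != cacheFor(o, newCaches)) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        int moved = countMoved(10_000, 10, 11);
        System.out.println(moved + " of 10000 objects moved");
    }
}
```

<p>
With these keys roughly nine objects in ten end up on a different machine, whereas ideally the new cache would take only its fair share of about one in eleven.
</p>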

<p>
It would be nice if, when a cache machine was added, it took its fair share of objects from all the other cache machines. Equally, when a cache machine was removed, it would be nice if its objects were shared between the remaining machines. This is exactly what consistent hashing does - it <i>consistently</i> maps objects to the same cache machine, as far as is possible, at least.
</p>

<p>
The basic idea behind the consistent hashing algorithm is to hash both objects and caches using the same hash function. The reason to do this is to map the cache to an interval, which will contain a number of object hashes. If the cache is removed then its interval is taken over by a cache with an adjacent interval. All the other caches remain unchanged.
</p>
]]>
<![CDATA[<h3>Demonstration</h3>

<p>
Let's look at this in more detail. The hash function actually maps objects and caches to a number range. This should be familiar to every Java programmer - the <code>hashCode</code> method on <code>Object</code> returns an <code>int</code>, which lies in the range -2<sup>31</sup> to  2<sup>31</sup>-1. Imagine mapping this range into a circle so the values wrap around. Here's a picture of the circle with a number of objects (1, 2, 3, 4) and caches (A, B, C) marked at the points that they hash to (based on a diagram from <a href="http://www8.org/w8-papers/2a-webserver/caching/paper2.html">Web Caching with Consistent Hashing</a> by David Karger <i>et al</i>):
</p>

<p>
<img alt="consistent_hashing_1.png" src="http://weblogs.java.net/blog/tomwhite/archive/images/consistent_hashing_1.png" width="237" height="239" />
</p>

<p>
To find which cache an object goes in, we move clockwise round the circle until we find a cache point. So in the diagram above, we see objects 1 and 4 belong in cache A, object 2 belongs in cache B and object 3 belongs in cache C. Consider what happens if cache C is removed: object 3 now belongs in cache A, and all the other object mappings are unchanged. If another cache D is then added in the position marked, it will take objects 3 and 4, leaving only object 1 belonging to A.
</p>
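
<p>
The same walk-through can be replayed in code. This is only a sketch: the positions are made-up numbers on a small circle (0 to 99) chosen to match the ordering in the diagram, with clockwise taken to mean increasing position, and the class and method names are hypothetical.
</p>

```java
import java.util.Map;
import java.util.TreeMap;

// Replays the example from the diagram using made-up positions on a small
// circle (0-99) in place of real hash values.
class CircleWalkthrough {
    private final TreeMap<Integer, String> caches = new TreeMap<>();

    void addCache(String name, int position) {
        caches.put(position, name);
    }

    void removeCache(int position) {
        caches.remove(position);
    }

    // Move clockwise (towards larger positions) to the first cache point,
    // wrapping around to the smallest position if none is found.
    String cacheFor(int objectPosition) {
        Map.Entry<Integer, String> entry = caches.ceilingEntry(objectPosition);
        if (entry == null) {
            entry = caches.firstEntry();
        }
        return entry == null ? null : entry.getValue();
    }

    static void print(String label, CircleWalkthrough circle, int[] positions) {
        StringBuilder line = new StringBuilder(label + ":");
        for (int i = 0; i < positions.length; i++) {
            line.append(" ").append(i + 1).append("->").append(circle.cacheFor(positions[i]));
        }
        System.out.println(line);
    }

    public static void main(String[] args) {
        int[] objectPositions = {70, 10, 30, 50}; // objects 1, 2, 3, 4
        CircleWalkthrough circle = new CircleWalkthrough();
        circle.addCache("A", 5);
        circle.addCache("B", 20);
        circle.addCache("C", 40);
        print("A, B, C", circle, objectPositions);
        circle.removeCache(40);   // cache C leaves
        print("C removed", circle, objectPositions);
        circle.addCache("D", 60); // cache D joins
        print("D added", circle, objectPositions);
    }
}
```

<p>
Only object 3 changes cache when C is removed, and only objects 3 and 4 change when D is added; objects 1 and 2 stay where they are throughout.
</p>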

<p>
<img alt="consistent_hashing_2.png" src="http://weblogs.java.net/blog/tomwhite/archive/images/consistent_hashing_2.png" width="251" height="232" />
</p>

<p>
This works well, except the size of the intervals assigned to each cache is pretty hit and miss. Since it is essentially random it is possible to have a very non-uniform distribution of objects between caches. The solution to this problem is to introduce the idea of "virtual nodes", which are replicas of cache points in the circle. So whenever we add a cache we create a number of points in the circle for it.
</p>

<p>
You can see the effect of this in the following plot which I produced by simulating storing 10,000 objects in 10 caches using the code described below. On the x-axis is the number of replicas of cache points (with a logarithmic scale). When it is small, we see that the distribution of objects across caches is unbalanced, since the standard deviation as a percentage of the mean number of objects per cache (on the y-axis, also logarithmic) is high. As the number of replicas increases the distribution of objects becomes more balanced. This experiment shows that a figure of one or two hundred replicas achieves an acceptable balance (a standard deviation that is roughly between 5% and 10% of the mean).
</p>
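
<p>
A simulation along these lines can be sketched as follows. This is not the code that produced the plot, just a rough reconstruction under assumptions: the key naming scheme, the use of the first four bytes of an MD5 digest as the hash, and all class and method names are mine, so the exact figures it prints need not match the plot.
</p>

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Rough simulation of how evenly objects spread across caches as the
// number of replicas (virtual nodes) per cache grows.
class ReplicaSimulation {
    // First four bytes of the MD5 digest, packed into an int.
    static int md5Hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            return ((d[0] & 0xFF) << 24) | ((d[1] & 0xFF) << 16)
                    | ((d[2] & 0xFF) << 8) | (d[3] & 0xFF);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Standard deviation of objects per cache, as a percentage of the mean.
    static double spreadPercent(int caches, int replicas, int objects) {
        TreeMap<Integer, Integer> circle = new TreeMap<>();
        for (int c = 0; c < caches; c++) {
            for (int r = 0; r < replicas; r++) {
                circle.put(md5Hash("cache-" + c + "-" + r), c);
            }
        }
        int[] counts = new int[caches];
        for (int o = 0; o < objects; o++) {
            Map.Entry<Integer, Integer> e = circle.ceilingEntry(md5Hash("object-" + o));
            if (e == null) {
                e = circle.firstEntry(); // wrap around the circle
            }
            counts[e.getValue()]++;
        }
        double mean = (double) objects / caches;
        double sumOfSquares = 0;
        for (int count : counts) {
            sumOfSquares += (count - mean) * (count - mean);
        }
        return 100.0 * Math.sqrt(sumOfSquares / caches) / mean;
    }

    public static void main(String[] args) {
        for (int replicas : new int[] {1, 10, 100, 1000}) {
            System.out.printf("%4d replicas: %.1f%% of mean%n",
                    replicas, spreadPercent(10, replicas, 10_000));
        }
    }
}
```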

<p>
<img alt="ch-graph.png" src="http://weblogs.java.net/blog/tomwhite/archive/images/ch-graph.png" width="400" height="400" />
</p>

<h3>Implementation</h3>

<p>
For completeness here is a simple implementation in Java. In order for consistent hashing to be effective it is important to have a hash function that <a href="http://problemsworthyofattack.blogspot.com/2007/10/mixing-with-md5.html">mixes</a> well. Most implementations of <code>Object</code>'s <code>hashCode</code> do <i>not</i> mix well - for example, they typically produce a restricted number of small integer values - so we have a <code>HashFunction</code> interface to allow a custom hash function to be used. MD5 hashes are recommended here.
</p>

<pre>
import java.util.Collection;
import java.util.SortedMap;
import java.util.TreeMap;

public class ConsistentHash&lt;T&gt; {

 private final HashFunction hashFunction;
 private final int numberOfReplicas;
 private final SortedMap&lt;Integer, T&gt; circle = new TreeMap&lt;Integer, T&gt;();

 public ConsistentHash(HashFunction hashFunction, int numberOfReplicas,
     Collection&lt;T&gt; nodes) {
   this.hashFunction = hashFunction;
   this.numberOfReplicas = numberOfReplicas;

   for (T node : nodes) {
     add(node);
   }
 }

 public void add(T node) {
   for (int i = 0; i &lt; numberOfReplicas; i++) {
     circle.put(hashFunction.hash(node.toString() + i), node);
   }
 }

 public void remove(T node) {
   for (int i = 0; i &lt; numberOfReplicas; i++) {
     circle.remove(hashFunction.hash(node.toString() + i));
   }
 }

 public T get(Object key) {
   if (circle.isEmpty()) {
     return null;
   }
   int hash = hashFunction.hash(key);
   if (!circle.containsKey(hash)) {
     SortedMap&lt;Integer, T&gt; tailMap = circle.tailMap(hash);
     hash = tailMap.isEmpty() ? circle.firstKey() : tailMap.firstKey();
   }
   return circle.get(hash);
 }

}
</pre>

<p>
The circle is represented as a sorted map of integers, which represent the hash values, to caches (of type <code>T</code> here).
When a <code>ConsistentHash</code> object is created each node is added to the circle map a number of times (controlled by <code>numberOfReplicas</code>). The location of each replica is chosen by hashing the node's name along with a numerical suffix, and the node is stored at each of these points in the map.
</p>
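
<p>
The <code>HashFunction</code> interface itself is not shown above. A minimal sketch, assuming it has a single <code>hash</code> method that takes any key and returns an <code>int</code> (which is how it is called in the code), together with one possible MD5-based implementation that uses the first four bytes of the digest, might look like this. The <code>Md5HashFunction</code> name is hypothetical.
</p>

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Assumed shape of the interface: one method mapping any key to an int.
interface HashFunction {
    int hash(Object key);
}

// One possible implementation: hash the key's string form with MD5 and
// pack the first four bytes of the digest into an int.
class Md5HashFunction implements HashFunction {
    @Override
    public int hash(Object key) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(key.toString().getBytes(StandardCharsets.UTF_8));
            return ((digest[0] & 0xFF) << 24)
                    | ((digest[1] & 0xFF) << 16)
                    | ((digest[2] & 0xFF) << 8)
                    | (digest[3] & 0xFF);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

<p>
An instance can then be passed to the constructor, for example <code>new ConsistentHash&lt;String&gt;(new Md5HashFunction(), 100, nodes)</code>. A new <code>MessageDigest</code> is created on every call because instances are not thread-safe.
</p>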

<p>
To find a node for an object (the <code>get</code> method), the <i>hash value of the object</i> is used to look in the map. Most of the time there will not be a node stored at this hash value (since the hash value space is typically much larger than the number of nodes, even with replicas), so the next node is found by looking for the first key in the tail map. If the tail map is empty then we wrap around the circle by getting the first key in the circle.
</p>

<h3>Usage</h3>

<p>
So how can you use consistent hashing? You are most likely to meet it in a library, rather than having to code it yourself. For example, as mentioned above, memcached, a distributed memory object caching system, now has clients that support consistent hashing. Last.fm's <a href="http://www.audioscrobbler.net/development/ketama/">ketama</a> by <a href="http://www.last.fm/user/RJ/">Richard Jones</a> was the first, and there is now a <a href="http://bleu.west.spy.net/%7Edustin/projects/memcached/">Java implementation</a> by <a href="http://bleu.west.spy.net/%7Edustin/">Dustin Sallings</a> (which inspired my simplified demonstration implementation above). It is interesting to note that it is only the client that needs to implement the consistent hashing algorithm - the memcached server is unchanged. Other systems that employ consistent hashing include <a href="http://pdos.csail.mit.edu/chord/">Chord</a>, which is a distributed hash table implementation, and Amazon's Dynamo, which is a key-value store (not available outside Amazon).
</p>

<hr>

<p>
(Cross-posted at <a href="http://problemsworthyofattack.blogspot.com/2007/11/consistent-hashing.html">Problems worthy of attack</a>.)
</p>]]>
</content>
</entry>
<entry>
<title>Hadoop + EC2 + S3</title>
<link rel="alternate" type="text/html" href="http://weblogs.java.net/blog/tomwhite/archive/2007/07/hadoop_ec2_s3_1.html" />
<modified>2007-07-20T09:10:56Z</modified>
<issued>2007-07-20T09:10:48Z</issued>
<id>tag:weblogs.java.net,2007:/blog/tomwhite/225.7887</id>
<created>2007-07-20T09:10:48Z</created>
<summary type="text/plain">How to run data processing applications on a rented grid.</summary>
<author>
<name>tomwhite</name>

<email>tom@tiling.org</email>
</author>
<dc:subject>Distributed</dc:subject>
<content type="text/html" mode="escaped" xml:lang="en" xml:base="http://weblogs.java.net/blog/tomwhite/">
<![CDATA[<p>
I've raved in the <a href="http://weblogs.java.net/blog/tomwhite/archive/2005/09/mapreduce.html">past</a> about the MapReduce parallel programming model, about Apache Hadoop (the framework for running MapReduce applications), and about Amazon's compute and storage web services (EC2 and S3). Now I've written an article - <a href="http://developer.amazonwebservices.com/connect/entry.jspa?externalID=873&categoryID=112">Running Hadoop MapReduce on Amazon EC2 and Amazon S3</a> - about using them all together to do some data crunching.
</p>

<p>
The nice thing is that you can fire up a fair sized Hadoop cluster (20 nodes is the current limit on <a href="http://aws.amazon.com/ec2">EC2</a>) in minutes and run it just for as long as you need to run your job - you pay by the hour. EC2 is still in limited beta and has had long waiting lists to get on it, but recently they <a href="http://developer.amazonwebservices.com/connect/message.jspa?messageID=61999#61999">cleared the backlog</a>, so if you're interested in trying it out, now might be a good time.
</p>]]>

</content>
</entry>
<entry>
<title>Wanted: A Public Amazon EC2 AMI for Java EE</title>
<link rel="alternate" type="text/html" href="http://weblogs.java.net/blog/tomwhite/archive/2007/06/wanted_a_public.html" />
<modified>2007-06-27T21:02:57Z</modified>
<issued>2007-06-27T21:00:43Z</issued>
<id>tag:weblogs.java.net,2007:/blog/tomwhite/225.7753</id>
<created>2007-06-27T21:00:43Z</created>
<summary type="text/plain">Ruby on Rails has got one - is there one for the Java EE stack?</summary>
<author>
<name>tomwhite</name>

<email>tom@tiling.org</email>
</author>
<dc:subject>J2EE</dc:subject>
<content type="text/html" mode="escaped" xml:lang="en" xml:base="http://weblogs.java.net/blog/tomwhite/">
<![CDATA[<p>
I <a href="http://aws.typepad.com/aws/2007/06/ruby-on-rails--.html">noticed</a> that Paul Dowman has created a Ruby on Rails AMI for use on <a href="http://aws.amazon.com/ec2">Amazon EC2</a> (Amazon's rented CPU service). It allows you to fire up a fully-configured RoR environment that you deploy your application to. It's not yet multi-server, but it's <a href="http://pauldowman.com/projects/ruby-on-rails-ec2/">coming</a>, so it won't be too long before you can launch your own production environment with a few keystrokes.
</p>

<p>
Now, can we have one of these for the Java EE stack please?
</p>]]>

</content>
</entry>
<entry>
<title>jMock 2 and my Java Unit Testing Toolkit</title>
<link rel="alternate" type="text/html" href="http://weblogs.java.net/blog/tomwhite/archive/2007/04/jmock_2_and_my.html" />
<modified>2007-04-11T14:16:39Z</modified>
<issued>2007-04-11T14:16:32Z</issued>
<id>tag:weblogs.java.net,2007:/blog/tomwhite/225.7033</id>
<created>2007-04-11T14:16:32Z</created>
<summary type="text/plain">The long-awaited final version of jMock 2 was released today. Another useful tool for my unit testing toolkit.</summary>
<author>
<name>tomwhite</name>

<email>tom@tiling.org</email>
</author>
<dc:subject>Testing</dc:subject>
<content type="text/html" mode="escaped" xml:lang="en" xml:base="http://weblogs.java.net/blog/tomwhite/">
<![CDATA[<p>
The long-awaited final version of <a href="http://www.jmock.org/">jMock 2</a> was released today. There are some big changes since version one. For example, you can now write
</p>

<pre><code>Cat cat = mock(Cat.class);</code></pre>

<p>
and then set expectations on the returned <code>cat</code> object itself:
</p>

<pre><code>checking(new Expectations() {{
    one(cat).miaow();
}});</code></pre>

<p>
(This means that the <code>miaow</code> method is expected to be called exactly once on the <code>cat</code> object.)
</p>

<p>
This change means that you can refactor your method names without breaking your tests, unlike jMock 1 where the method names were strings:
</p>

<pre><code>Mock mockCat = mock(Cat.class);
mockCat.expects(once()).method("miaow");</code></pre>

<p>
You may have noticed that the new syntax is a little weird (check out the double braces, and how e.g. <a href="http://www.jmock.org/returning.html">will modifiers</a> need new statements), and it may take a little getting used to, but I think it's worth it for the IDE completion and refactoring support. For example, in test driven mode I can write the name of the method that doesn't yet exist then get my IDE to create the method for me.
</p>

<p>
The other nice thing is the <a href="http://www.jmock.org/matchers.html">integration</a> with <a href="http://code.google.com/p/hamcrest/">Hamcrest</a> (a library of matchers that <a href="http://weblogs.java.net/blog/tomwhite/archive/2006/12/hamcrest_1.html">I've written about before</a>). My toolkit for writing new unit tests now includes:
</p>

<ol>
<li>Hamcrest for making flexible assertions.</li>
<li>jMock 2 for defining expected behaviour.</li>
<li><a href="http://googletesting.blogspot.com/2007/02/tott-naming-unit-tests-responsibly.html">Responsible Naming</a> for understanding what the test does.</li>
</ol>]]>

</content>
</entry>
<entry>
<title>Testing for errant network connections</title>
<link rel="alternate" type="text/html" href="http://weblogs.java.net/blog/tomwhite/archive/2007/02/testing_for_err.html" />
<modified>2007-02-08T09:50:05Z</modified>
<issued>2007-02-08T09:37:31Z</issued>
<id>tag:weblogs.java.net,2007:/blog/tomwhite/225.6508</id>
<created>2007-02-08T09:37:31Z</created>
<summary type="text/plain">Or, &quot;Why&apos;s my application connecting to that site?!&quot;</summary>
<author>
<name>tomwhite</name>

<email>tom@tiling.org</email>
</author>
<dc:subject>Testing</dc:subject>
<content type="text/html" mode="escaped" xml:lang="en" xml:base="http://weblogs.java.net/blog/tomwhite/">
<![CDATA[<p>
We kept breaking our <a href="http://www.xml.com/pub/a/2004/03/03/catalogs.html">XML catalog resolution</a> in the course of developing an application. We would refactor the parser code, or we would upgrade a schema and forget to upgrade the catalog. The application wouldn't break, but it took longer to run since resources were being retrieved over the network rather than using the local catalog. Because we didn't time our test runs, and because we had lots of non-network dependent tests in the suite, this regression would go unnoticed for a while. When it was noticed we'd fix the symptom, then move on. Until it happened again...
</p>

<p>
After the sixth or so occurrence in a few years, I wrote a class to detect the problem. It's a simple implementation of <a href="http://java.sun.com/javase/6/docs/api/java/lang/SecurityManager.html">SecurityManager</a> that throws an exception when an attempt is made to connect to a site that is not on the list of approved hosts. The code appears below.
</p>

<p>
To use it we set the <code>-Djava.security.manager</code> command line argument (to the fully-qualified classname of <code>RestrictedNetworkAccessSecurityManager</code>) when running our test suites. Tests that access hosts that we aren't expecting will then fail with an error.
</p>

<p>
It's not just for applications that use XML catalogs. It's useful for almost any code, as a way to audit - and regression test - the network resources your application depends on. (Like a database you thought wasn't being used any more.) It's also a simple way to discover when third-party libraries connect to the internet unannounced.
</p>

<p>
The code is rough and ready - it's good enough for testing, but not really suitable for anything else, since it is totally permissive except for the checks on outbound connections. Feel free to use it, and suggest improvements or ways you've tackled the same problem in the comments section.
</p>
<pre><code>
import java.net.SocketPermission;
import java.security.AccessControlException;
import java.security.Permission;
import java.util.Arrays;
import java.util.List;

public class RestrictedNetworkAccessSecurityManager extends SecurityManager {
  private static final List&lt;String&gt; ALLOWED_HOSTNAMES = Arrays.asList(new String[]{
    "localhost", "127.0.0.1",
  });

  @Override
  public void checkConnect(String host, int port, Object context) {
    checkConnect(host, port);
  }

  @Override
  public void checkConnect(String host, int port) {
    if (host == null) {
      throw new NullPointerException("host can't be null");
    }
    if (!host.startsWith("[") &amp;&amp; host.indexOf(':') != -1) {
      host = "[" + host + "]";
    }
    if (ALLOWED_HOSTNAMES.contains(host)) {
      return;
    }
    String hostPort;
    if (port == -1) {
      hostPort = host;
    } else {
      hostPort = host + ":" + port;
    }
    String message = "Opening a socket connection to " + hostPort + " is restricted.";
    throw new AccessControlException(message, new SocketPermission(hostPort, "connect"));
  }

  @Override
  public void checkPermission(Permission perm) {
    // Allow all other actions
  }

  @Override
  public void checkPermission(Permission perm, Object context) {
    // Allow all other actions
  }
}</code></pre>]]>

</content>
</entry>
<entry>
<title>Hamcrest</title>
<link rel="alternate" type="text/html" href="http://weblogs.java.net/blog/tomwhite/archive/2006/12/hamcrest_1.html" />
<modified>2006-12-22T20:27:28Z</modified>
<issued>2006-12-22T20:27:00Z</issued>
<id>tag:weblogs.java.net,2006:/blog/tomwhite/225.6208</id>
<created>2006-12-22T20:27:00Z</created>
<summary type="text/plain">Hamcrest release 1.0 is now available. It allows you to write flexible assertions in your unit testing framework of choice.</summary>
<author>
<name>tomwhite</name>

<email>tom@tiling.org</email>
</author>
<dc:subject>Testing</dc:subject>
<content type="text/html" mode="escaped" xml:lang="en" xml:base="http://weblogs.java.net/blog/tomwhite/">
<![CDATA[<p>
In <a href="http://weblogs.java.net/blog/tomwhite/archive/2006/05/literate_progra_1.html">Literate Programming with jMock</a>
I enthused about jMock's idea of constraints and <a href="http://joe.truemesh.com/blog//000511.html">flexible assertions</a>.
Now the jMock team has released version 1.0 of <a href="http://code.google.com/p/hamcrest/">Hamcrest</a>,
the constraints part of jMock.
</p>

<p>
Hamcrest matchers (what were called constraints in jMock) are actually useful for more than just writing unit tests,
but it is their application in writing assertions
where they really shine and will probably see most use.
</p>

<p>
So now I can write
</p>

<pre><code>assertThat(a, equalTo(b))</code></pre>

<p>
or even
</p>

<pre><code>assertThat(a, is(equalTo(b)))</code></pre>

<p>
rather than JUnit's
</p>

<pre><code>assertEquals(b, a)</code></pre>

<p>
Apart from being more readable, <code>assertThat</code> takes any matcher as the second argument, so
I can combine matchers or even <a href="http://code.google.com/p/hamcrest/wiki/Tutorial">write my own</a>
rather than creating an ever-growing list of <code>assertXxx</code> methods. For example, I can say such things as
</p>

<pre><code>assertThat(collection, hasItem(anyOf(is(item1), is(item2))));</code></pre>

<p>
And since the matchers (and the <code>assertThat</code> method) are accessed using static imports,
I can use them in any test framework I like.
</p>

<p>
So why not read the <a href="http://code.google.com/p/hamcrest/wiki/Tutorial">tutorial</a>
and give Hamcrest a go?
</p>]]>

</content>
</entry>
<entry>
<title>Lift Off</title>
<link rel="alternate" type="text/html" href="http://weblogs.java.net/blog/tomwhite/archive/2006/10/lift_off.html" />
<modified>2006-10-30T11:51:13Z</modified>
<issued>2006-10-30T11:50:54Z</issued>
<id>tag:weblogs.java.net,2006:/blog/tomwhite/225.5821</id>
<created>2006-10-30T11:50:54Z</created>
<summary type="text/plain">Introducing LiFT - a Literate Functional Testing framework for making your web application tests more readable.</summary>
<author>
<name>tomwhite</name>

<email>tom@tiling.org</email>
</author>
<dc:subject>Testing</dc:subject>
<content type="text/html" mode="escaped" xml:lang="en" xml:base="http://weblogs.java.net/blog/tomwhite/">
<![CDATA[<p>
In a <a href="http://weblogs.java.net/blog/tomwhite/archive/2006/05/literate_progra_1.html">previous blog entry</a> I mentioned a literate functional testing framework that we had developed at our company, <a href="http://www.kizoom.com">Kizoom</a>. The framework was initially developed by my colleague <a href="http://chatley.com/blog/">Robert Chatley</a> for testing a rather obscure digital TV XML application. After that project we produced a new version of the framework for testing HTML applications, which we used internally for a number of projects with great success. Last month we spent a bit of time cleaning up the API and released LiFT, short for <i>Literate Functional Testing</i>, as an <a href="https://lift.dev.java.net/">open source project</a> on java.net.
</p>

<p>
Here I'd like to give a flavour of LiFT and why you might want to use it.
</p>

<p>
<img src="http://www.kizoom.com/images/summer04.jpg" alt="Kizoom Summer Party 2004" align="right"/>
When you set out to test your web application you need to think about the mechanics of how you interact with the application. Typically this involves fetching pages, finding elements on the page that you want to read (so you can make an assertion about them) and elements you want to change (so you can simulate user input). Often, the code required to handle these aspects obscures the intent of the test. Consider this <a href="http://httpunit.sourceforge.net/index.html">HttpUnit</a> test to do a Google search for an image of our ill-fated company summer party of two years ago:

</p>

<pre><code><font color="navy"><b>import</b></font> junit.framework.TestCase;<br><br><font color="navy"><b>import</b></font> com.meterware.httpunit.WebConversation;<br><font color="navy"><b>import</b></font> com.meterware.httpunit.WebForm;<br><font color="navy"><b>import</b></font> com.meterware.httpunit.WebImage;<br><font color="navy"><b>import</b></font> com.meterware.httpunit.WebResponse;<br><br><font color="navy"><b>public</b></font> <font color="navy"><b>class</b></font> HttpUnitGoogleTest <font color="navy"><b>extends</b></font> TestCase <font color="navy">{</font><br><br>    <font color="navy"><b>public</b></font> <font color="navy"><b>void</b></font> testGoogleImageSearch() <font color="navy"><b>throws</b></font> Exception <font color="navy">{</font><br>        WebConversation conversation = <font color="navy"><b>new</b></font> WebConversation();<br>        WebResponse page = conversation.getResponse(<font color="red">"http://www.google.com/"</font>);<br>        assertEquals(page.getTitle(), <font color="red">"Google"</font>);<br>        page = page.getLinkWith(<font color="red">"Images"</font>).click();<br>        WebForm form = page.getForms()[0];<br>        form.setParameter(<font color="red">"q"</font>, <font color="red">"kizoom"</font>);<br>        page = form.submit();<br>        assertTrue(page.getText().contains(<font color="red">"&lt;b&gt;Kizoom&lt;/b&gt; summer party"</font>));<br>        <font color="navy"><b>boolean</b></font> foundImage = <font color="navy"><b>false</b></font>;<br>        <font color="navy"><b>for</b></font> (WebImage i : page.getImages()) <font color="navy">{</font><br>            <font color="navy"><b>if</b></font> (i.getSource().endsWith(<font color="red">"summer04.jpg"</font>)) <font color="navy">{</font><br>                foundImage = <font color="navy"><b>true</b></font>;<br>                <font color="navy"><b>break</b></font>;<br>            <font color="navy">}</font><br>        <font color="navy">}</font><br>        
assertTrue(foundImage);<br>    <font color="navy">}</font><br><br><font color="navy">}</font><br></code>
</pre>

<p>
Hmm, not too clear what it's testing. I'm not trying to single out HttpUnit here - it's a great tool for interacting with web applications, but it's pretty low-level. Of course, I could abstract some of the operations into helper methods to make the test clearer. For instance, the bit at the end where I'm looking for an image is ripe for extracting as a method. In fact, this is the approach we took for years - building libraries of helper functions that made using HttpUnit a bit less of an effort. However, it's not a structured approach. We can do better.
</p>

<pre><code><font color="navy"><b>import</b></font> com.kizoom.lift.NavigatingLiftTestCase;<br><br><font color="navy"><b>import</b></font> <font color="navy"><b>static</b></font> com.kizoom.lift.constraints.Constraints.*;<br><font color="navy"><b>import</b></font> <font color="navy"><b>static</b></font> com.kizoom.lift.matcher.Matchers.*;<br><br><font color="navy"><b>public</b></font> <font color="navy"><b>class</b></font> GoogleTest <font color="navy"><b>extends</b></font> NavigatingLiftTestCase <font color="navy">{</font><br><br>    <font color="navy"><b>public</b></font> <font color="navy"><b>void</b></font> testGoogleImageSearch() <font color="navy"><b>throws</b></font> Exception <font color="navy">{</font><br>        goTo(<font color="red">"http://www.google.com/"</font>);<br>        assertThat(page, has(title(<font color="red">"Google"</font>)));<br>        clickOn(link(<font color="red">"Images"</font>));<br>        enter(<font color="red">"kizoom"</font>, into(textField()));<br>        clickOn(button(<font color="red">"Search Images"</font>));<br>        assertThat(page, has(text(<font color="red">"Kizoom summer party"</font>)));<br>        assertThat(it, has(image().withUrlThat(endsWith(<font color="red">"summer04.jpg"</font>))));<br>    <font color="navy">}</font><br><font color="navy">}</font><br>
</code></pre>

<p>
This is much more concise. But more importantly it is much more <i>readable</i>. Each line can be read as a sentence, either as a jMock-style <a href="http://joe.truemesh.com/blog/000511.html">flexible assertion</a> starting with "assert that", or as a command, such as "click on".
</p>

<p>
Page navigation is handled for us - we are provided with a <code>page</code> variable which always refers to the current page we are on.
</p>

<p>
Perhaps the most powerful idea is in the use of <i>matchers</i>. Matchers allow us to refer to bits of the page - so we can assert something about them or do something to them. For example, <code>title("Google")</code> creates a matcher that matches HTML <i>title</i> elements with text content "Google". We then use the <code>has</code> constraint to assert that there is (at least) one such match.
Or take the next line: <code>link("Images")</code> matches a link with text "Images" which we use to navigate to the next page by means of the <code>clickOn</code> action.
</p>

<p>
Matchers allow for extensibility - it is easy to write your own if you need to - and help avoid the creation of <i>ad hoc</i> helper method libraries. Often you don't even need to go this far: you can <i>refine</i> an existing matcher using any jMock constraint. This is what is done in the last line. The <code>withUrlThat</code> refinement on the image matcher limits the matches to those that meet the given constraint. In this case we use the jMock <code>endsWith</code> constraint to match the image filename.
</p>
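
<p>
As a rough illustration of what refinement means, here is a self-contained toy. It is not LiFT's implementation: <code>RefinementSketch</code>, <code>Image</code>, <code>ImageFinder</code> and <code>findIn</code> are hypothetical names, and a plain <code>java.util.function.Predicate</code> stands in for a jMock constraint. Each call to <code>withUrlThat</code> returns a narrower finder rather than changing the original.
</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

final class RefinementSketch {

    /** Toy page element: nothing but an image URL. */
    static final class Image {
        final String url;
        Image(String url) { this.url = url; }
    }

    /** Toy finder for images; every refinement yields a new, narrower finder. */
    static final class ImageFinder {
        private final Predicate<String> urlConstraint;

        private ImageFinder(Predicate<String> urlConstraint) {
            this.urlConstraint = urlConstraint;
        }

        /** Keeps only images whose URL also satisfies the given constraint. */
        ImageFinder withUrlThat(Predicate<String> constraint) {
            return new ImageFinder(urlConstraint.and(constraint));
        }

        /** Returns the images on the "page" that this finder matches. */
        List<Image> findIn(List<Image> page) {
            List<Image> found = new ArrayList<Image>();
            for (Image image : page) {
                if (urlConstraint.test(image.url)) {
                    found.add(image);
                }
            }
            return found;
        }
    }

    /** Starts with a finder that matches every image. */
    static ImageFinder image() {
        return new ImageFinder(url -> true);
    }

    /** Stand-in for a constraint on strings. */
    static Predicate<String> endsWith(String suffix) {
        return url -> url.endsWith(suffix);
    }

    public static void main(String[] args) {
        List<Image> page = Arrays.asList(
                new Image("/images/logo.png"), new Image("/images/summer04.jpg"));
        System.out.println(image().findIn(page).size());
        System.out.println(image().withUrlThat(endsWith("summer04.jpg")).findIn(page).size());
    }
}
```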

<p>
LiFT still has some rough edges - for example, its error messages are not always as clear as they could be - but it is definitely ready for testing real web applications.
There's much more I could say, but I won't here. If you want to learn more try LiFT out by following the <a href="https://lift.dev.java.net/introduction.html">introduction</a>, or read the <a href="https://lift.dev.java.net/literate-functional-testing.pdf">slides</a> from the talk Robert and I gave at the <i>Google London Test Automation Conference</i> in September, or even watch the <a href="http://video.google.com/videoplay?docid=1505469784301926538&amp;q=label%3Altac">video</a>.
</p>]]>

</content>
</entry>
<entry>
<title>Are your beans thread-safe?</title>
<link rel="alternate" type="text/html" href="http://weblogs.java.net/blog/tomwhite/archive/2006/09/are_your_beans_1.html" />
<modified>2006-09-22T09:59:28Z</modified>
<issued>2006-09-21T21:55:00Z</issued>
<id>tag:weblogs.java.net,2006:/blog/tomwhite/225.5604</id>
<created>2006-09-21T21:55:00Z</created>
<summary type="text/plain">Why it&apos;s worth being a little paranoid about what your IoC container does in a multi-threaded environment.</summary>
<author>
<name>tomwhite</name>

<email>tom@tiling.org</email>
</author>
<dc:subject>J2SE</dc:subject>
<content type="text/html" mode="escaped" xml:lang="en" xml:base="http://weblogs.java.net/blog/tomwhite/">
<![CDATA[<p><b>[Update: changed wording per comments to fix error.]</b></p>
<p>
<a href="http://www.martinfowler.com/articles/injection.html" target="_blank">Dependency injection</a> is pretty well established these days, with plenty of Inversion of Control containers available to manage your beans. I'm currently reading <a href="http://jcip.net/" target="_blank">Java Concurrency in Practice</a> by Brian Goetz <i>et al</i>, which got me thinking about the thread-safety of large object graphs managed by IoC containers.
</p>

<p>
In most applications I've seen, the common usage pattern is to use dependency injection to wire up your object graph in one go when the application starts. After that the application uses the object graph, and effectively treats it as being immutable. For example, in an application for buying books we might have a <code>BookStore</code> object that is given a <code>BookFinder</code> so it can find whether a particular book is available. The <code>BookFinder</code> is created at the beginning of the application and never changes. It is common to code this using setter injection:

</p>

<pre><code>public class BookFinder {

    // BookFinder's business methods...

}

public class BookStore {
    private BookFinder bookFinder;

    public void setBookFinder(BookFinder bookFinder) {
        this.bookFinder = bookFinder;
    }

    // BookStore's business methods which use bookFinder...

}</code></pre>

<p>
The problem is that it's not clear that this code is thread-safe. Unless the container publishes the <code>BookStore</code> bean safely then there is no guarantee that other threads will see the value of
the <code>BookFinder</code> bean. If this seems weird then that's because it is. To quote <i>Java Concurrency in Practice</i> (p33): &quot;In general, there is <i>no</i> guarantee that the reading thread will see a value written by another thread on a timely basis, or even at all.&quot;

</p>

<p>
Thankfully, this is easy to fix. We can mark the <code>bookFinder</code> instance as <code>volatile</code> to ensure its visibility to other threads:
</p>

<pre><code>public class BookStore {
    private <b>volatile</b> BookFinder bookFinder;

    public void setBookFinder(BookFinder bookFinder) {
        this.bookFinder = bookFinder;
    }

    // BookStore's business methods which use bookFinder...

}</code></pre>

<p>
Alternatively, we can use constructor injection and <code>final</code> fields. This works because the Java Memory Model makes special guarantees about the safety of final fields.
</p>

<pre><code>public class BookStore {
    private <b>final</b> BookFinder bookFinder;

    public BookStore(BookFinder bookFinder) {
        this.bookFinder = bookFinder;
    }

    // BookStore's business methods which use bookFinder...

}</code></pre>
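
<p>
For comparison, here is a sketch of one situation in which even the plain setter version is safe: all the wiring is finished before the thread that uses the bean is started. The Java Memory Model guarantees that a call to <code>Thread.start()</code> happens-before anything the started thread does, so the worker is bound to see the <code>BookFinder</code>. The <code>SafePublicationSketch</code> class and its methods are made up for illustration, and the sketch says nothing about threads that were already running when the wiring took place, which is exactly where the doubt arises.
</p>

```java
final class SafePublicationSketch {

    static class BookFinder {
        boolean isAvailable(String title) {
            return "Walden".equals(title); // hypothetical catalogue of one book
        }
    }

    static class BookStore {
        private BookFinder bookFinder; // deliberately neither volatile nor final

        public void setBookFinder(BookFinder bookFinder) {
            this.bookFinder = bookFinder;
        }

        boolean canSell(String title) {
            return bookFinder.isAvailable(title);
        }
    }

    /** Wires a BookStore on the calling thread, then uses it from a worker thread. */
    static boolean wireThenUseFromWorker(final String title) {
        final BookStore store = new BookStore();
        store.setBookFinder(new BookFinder()); // the wiring happens first...

        final boolean[] answer = new boolean[1];
        Thread worker = new Thread(new Runnable() {
            public void run() {
                answer[0] = store.canSell(title);
            }
        });
        worker.start(); // ...and start() happens-before everything the worker does
        try {
            worker.join(); // join() makes the worker's write to answer visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the worker", e);
        }
        return answer[0];
    }

    public static void main(String[] args) {
        System.out.println(wireThenUseFromWorker("Walden"));
    }
}
```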

<h3>The Future</h3>

<p>
So could the IoC container providers take steps to ensure that application developers don't have to worry about thread safety in wiring up object graphs? Possibly, although at the time of writing it is less than clear whether the popular containers do or not. Tim Peierls, one of the authors of <i>Java Concurrency in Practice</i>, wrote in an email to me that &quot;Until it's proven safe, I have adopted a rule that all such setter-injected effectively immutable fields must be volatile.&quot; He also suggested marking the field with an annotation, such as <code>@WriteOnce</code>, so that when your container's behaviour is clarified it is easy to go through code and remove the <code>volatile</code> modifier.
</p>]]>

</content>
</entry>
<entry>
<title>Affordable Web-Scale Computing Redux</title>
<link rel="alternate" type="text/html" href="http://weblogs.java.net/blog/tomwhite/archive/2006/08/affordable_webs_1.html" />
<modified>2006-08-24T22:02:10Z</modified>
<issued>2006-08-24T22:01:55Z</issued>
<id>tag:weblogs.java.net,2006:/blog/tomwhite/225.5423</id>
<created>2006-08-24T22:01:55Z</created>
<summary type="text/plain">Amazon&apos;s new Elastic Compute Cloud should be a perfect fit for running Hadoop jobs.</summary>
<author>
<name>tomwhite</name>

<email>tom@tiling.org</email>
</author>
<dc:subject>Distributed</dc:subject>
<content type="text/html" mode="escaped" xml:lang="en" xml:base="http://weblogs.java.net/blog/tomwhite/">
<![CDATA[<p>
In March I wrote of <a href="http://weblogs.java.net/blog/tomwhite/archive/2006/03/affordable_webs.html">affordable web-scale computing</a>:
</p>

<blockquote>
<p>
I would love an API that exposes Google's <a href="http://labs.google.com/papers/mapreduce.html">MapReduce</a>, a simple programming model for crunching on large datasets. You can write and run MapReduce programs today, using <a href="http://lucene.apache.org/hadoop/">Hadoop</a>, but it's only really useful if you have enough machines at your disposal. The pay-as-you-go model of S3 (and <a href="http://www.sun.com/service/sungrid/overview.jsp">Sun Grid</a>) would be very attractive to developers who want to run <i>ad hoc</i> computations, or can't afford the upfront investment in hardware.
</p>
</blockquote>

<p>
Well, now it's possible with the beta launch of <a href="http://aws.amazon.com/ec2">Amazon EC2</a> (picked up from the <a href="http://radar.oreilly.com/archives/2006/08/fantastic_elastic.html">O'Reilly Radar</a>). EC2 (apart from coincidentally being the postcode of my company's new London office) stands for <i>Elastic Compute Cloud</i> and allows you to commission compute resources on an on-demand basis using simple web-service-based tools.  The unit of compute capacity is an Amazon Machine Image (AMI) - a Linux image - which you can configure to have any software you like on it. You can run any number of instances, paying for the number of instance hours you use (and the data you transfer).
</p>

<p>
This goes beyond what I wished for in March as it allows you to run anything on the image! Going back to Hadoop and MapReduce, I can imagine a generic Hadoop AMI that you configure your job on, before commissioning a number of EC2 server instances to run it. Press go, wait for your job to complete and then decommission the server instances.
</p>

<p>
Definitely one to watch.
</p>]]>

</content>
</entry>
<entry>
<title>S3Map</title>
<link rel="alternate" type="text/html" href="http://weblogs.java.net/blog/tomwhite/archive/2006/08/s3map.html" />
<modified>2006-08-13T20:31:04Z</modified>
<issued>2006-08-13T20:30:47Z</issued>
<id>tag:weblogs.java.net,2006:/blog/tomwhite/225.5339</id>
<created>2006-08-13T20:30:47Z</created>
<summary type="text/plain">Implementing a distributed java.util.Map using Amazon S3.</summary>
<author>
<name>tomwhite</name>

<email>tom@tiling.org</email>
</author>
<dc:subject>Distributed</dc:subject>
<content type="text/html" mode="escaped" xml:lang="en" xml:base="http://weblogs.java.net/blog/tomwhite/">
<![CDATA[<p>
In case you haven't heard of it, <a href="http://aws.amazon.com/s3">Amazon S3</a> is a web service for storing data.
The two great things about it are that it's simple (look at its nice REST API), and it's cheap (with a pay-as-you-go charging model).

This latter point explains <a href="http://gigaom.com/2006/07/13/startups-embracing-amazon-s3/">the growing number of startups</a> that are using it to launch new business ventures: no data silos to maintain, and pay by the gigabyte.

My favourite innovative service to use Amazon S3 uses AJAX to great effect to implement a wiki that stores its content on S3. Read <a href="http://decafbad.com/blog/2006/04/21/an-s3-ajax-wiki">all</a> <a href="http://decafbad.com/blog/2006/04/23/more-on-s3ajaxwiki">about</a> <a href="http://decafbad.com/blog/2006/04/24/s3ajaxwiki-got-noticed">it</a>.
If you think about it, this is an interactive service that resides entirely on S3, so it will scale and scale - there's no need for an application server. I think Content Management Systems could take this approach too.
</p>
<p>
It struck me that you could treat S3 as a big hashtable, so I tried writing an implementation of <code>java.util.Map</code> (built on the <a href="http://developer.amazonwebservices.com/connect/entry.jspa?externalID=132&categoryID=47">Amazon S3 Library for REST in Java</a>) that uses S3 as its backing store.  Here's some code to put some arbitrary objects into it:
</p>

<pre><code>Map&lt;String, Fruit&gt; map = new S3Map&lt;Fruit&gt;(bucket, awsAccessKeyId, awsSecretAccessKey);
map.put(&quot;breakfast&quot;, new Fruit(&quot;banana&quot;));
map.put(&quot;lunch&quot;, new Fruit(&quot;apple&quot;));

for (Map.Entry&lt;String, Fruit&gt; entry : map.entrySet()) {
  System.out.println(entry.getKey() + &quot;: &quot; + entry.getValue().getName());
}
</code></pre>

<p>
This code prints:
</p>

<pre><code>breakfast: banana
lunch: apple
</code></pre>

<p>
Notice that keys are strings, but the objects in the map can be of any type, as long as they are <code>Serializable</code> (as <code>Fruit</code> is), although even this restriction can be dropped by providing a custom serialization strategy when constructing the map. (Something like <a href="http://xstream.codehaus.org/">XStream</a> would be a good choice here.)
</p>

<p>
Also, the map needs the connection parameters for an S3 account, and a bucket name. One bucket corresponds to one map.
The map is persistent, so I can run the code above again, but without putting anything into the map, and the values put in previously will be retrieved and printed out exactly as before.
</p>

<pre><code>Map&lt;String, Fruit&gt; map = new S3Map&lt;Fruit&gt;(bucket, awsAccessKeyId, awsSecretAccessKey);

for (Map.Entry&lt;String, Fruit&gt; entry : map.entrySet()) {
  System.out.println(entry.getKey() + &quot;: &quot; + entry.getValue().getName());
}
</code></pre>
<p>
Writing an implementation of <code>java.util.Map</code> is interesting. It's actually straightforward to do, since you start by subclassing <code>java.util.AbstractMap</code> and implementing a handful of methods.
There's a line in the javadoc which says &quot;Each of these methods may be overridden if the map being implemented admits a more efficient implementation.&quot;

Normally we strive for an implementation that is efficient in terms of <i>speed</i>; in this case, however, I was very aware of the <i>monetary cost</i> of each method. After all, I'm paying $0.20 per GB to get my data in and out of S3. There's a table in the <a href="http://s3.amazonaws.com/s3map/javadoc/org/tiling/s3map/S3Map.html">javadoc</a> that sets out the number of S3 operations performed by each method so you can get a feel for their cost. As it turned out, I ended up overriding almost all of <code>AbstractMap</code>'s methods.
</p>
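
<p>
Here is a small sketch of why overriding pays off. It is not the S3Map source: <code>CountingStore</code> is a hypothetical in-memory stand-in for S3 that simply counts requests, and the cost model (one request per fetch or store, one listing plus one fetch per key for a full scan) is an assumption. The naive map implements only <code>entrySet</code> and <code>put</code>, so the <code>get</code> it inherits from <code>AbstractMap</code> has to scan every entry; the second map overrides <code>get</code> to make a single request.
</p>

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

final class CostAwareMapSketch {

    /** Hypothetical stand-in for a remote store such as S3; it counts each request. */
    static final class CountingStore {
        private final Map<String, String> data = new HashMap<String, String>();
        int requests;

        String fetch(String key) {
            requests++;
            return data.get(key);
        }

        // Assumption: a single request stores the value and hands back the old one.
        String store(String key, String value) {
            requests++;
            return data.put(key, value);
        }

        // Assumption: one listing request plus one fetch per key.
        Map<String, String> fetchAll() {
            requests += 1 + data.size();
            return new HashMap<String, String>(data);
        }
    }

    /** Implements only what AbstractMap requires, plus put. */
    static class NaiveRemoteMap extends AbstractMap<String, String> {
        final CountingStore store;

        NaiveRemoteMap(CountingStore store) {
            this.store = store;
        }

        @Override
        public Set<Map.Entry<String, String>> entrySet() {
            // A snapshot: removing through this set is not written back to the store.
            return store.fetchAll().entrySet();
        }

        @Override
        public String put(String key, String value) {
            return store.store(key, value);
        }
    }

    /** Overrides get so that a lookup costs one request instead of a full scan. */
    static class CostAwareRemoteMap extends NaiveRemoteMap {
        CostAwareRemoteMap(CountingStore store) {
            super(store);
        }

        @Override
        public String get(Object key) {
            return key instanceof String ? store.fetch((String) key) : null;
        }
    }

    /** Returns how many requests a single get costs on a three-entry map. */
    static int requestsForOneGet(boolean overrideGet) {
        CountingStore store = new CountingStore();
        Map<String, String> map =
                overrideGet ? new CostAwareRemoteMap(store) : new NaiveRemoteMap(store);
        map.put("breakfast", "banana");
        map.put("lunch", "apple");
        map.put("dinner", "plum");
        store.requests = 0; // count only the lookup itself
        map.get("lunch");
        return store.requests;
    }

    public static void main(String[] args) {
        System.out.println("inherited get:  " + requestsForOneGet(false) + " requests");
        System.out.println("overridden get: " + requestsForOneGet(true) + " requests");
    }
}
```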

<p>
So, what is it useful for? Well, I haven't actually had a use for it yet, but S3Map is really a non-transactional persistent datastore, so it could be used in many scenarios, from applets (if the applet is hosted on S3 too), to desktop apps (to share user data between machines?), to server-side apps. If you're intrigued, you can try it out yourself by signing up for an <a href="http://aws.amazon.com/s3">Amazon S3</a> account, and downloading <a href="http://s3.amazonaws.com/s3map/s3map-1.0.zip">S3Map</a> (it's hosted on S3 of course).
</p>]]>

</content>
</entry>
<entry>
<title>Pluralization</title>
<link rel="alternate" type="text/html" href="http://weblogs.java.net/blog/tomwhite/archive/2006/07/pluralization.html" />
<modified>2006-07-26T21:15:55Z</modified>
<issued>2006-07-26T21:14:01Z</issued>
<id>tag:weblogs.java.net,2006:/blog/tomwhite/225.5256</id>
<created>2006-07-26T21:14:01Z</created>
<summary type="text/plain">Tool builders can now easily add pluralization to their applications using Inflector, a new Java library hosted on java.net.</summary>
<author>
<name>tomwhite</name>

<email>tom@tiling.org</email>
</author>
<dc:subject>Tools</dc:subject>
<content type="text/html" mode="escaped" xml:lang="en" xml:base="http://weblogs.java.net/blog/tomwhite/">
<![CDATA[<blockquote>
<p>
<i>Singulars and plurals are so different, bless my soul.</i><br />
<i>Has it ever occurred to you that the plural of &quot;half&quot; is &quot;whole&quot;?</i>
</p>
<p>
Allan Sherman, <i>One Hippopotami</i>

</p>
</blockquote>

<p>
It's well known that Perl has been <a href="http://www.wall.org/~larry/natural.html">heavily influenced by ideas from natural language</a>, which probably explains why it has so many libraries for handling natural language text. Java is less well served, but as our programming tools become more powerful, it is inevitable that they will provide more linguistic support. It would be handy for example if your IDE told you in a not-too-obtrusive way that you had <a href="http://www.oreillynet.com/onlamp/blog/2006/03/the_worlds_most_maintainable_p_4.html">misspelled an identifier</a>.
</p>

<p>
One linguistic feature that is already in today's tools is <i>pluralization</i> - forming plurals of words. The most <a href="http://weblog.rubyonrails.org/2005/08/25/10-reasons-rails-does-pluralization/">famous example</a> is Ruby on Rails, which automatically creates names in the plural for collections of a singular entity. The JAXB 2.0 Reference Implementation also uses pluralization in <a href="https://jaxb.dev.java.net/nonav/2.0.1/docs/vendorCustomizations.html#simple">generating bindings</a>   of repeatable XML elements by pluralizing the corresponding Java property name.


</p>

<p>
Tool builders can now easily add pluralization to their applications using <a href="https://inflector.dev.java.net/">Inflector</a>, a new Java library hosted on java.net. Using it can be as simple as calling:
</p>

<pre><code>import static org.jvnet.inflector.Noun.pluralOf;
...
pluralOf(&quot;hippopotamus&quot;);</code></pre>

<p>
The library is also useful for producing natural language messages. For instance, the following code will print the message <i>I bought 10 loaves.</i>

</p>

<pre><code>int n = 10;
System.out.printf(&quot;I bought %d %s.&quot;, n, pluralOf(&quot;loaf&quot;, n));</code></pre>

<p>
(See the <a href="https://inflector.dev.java.net/nonav/maven/apidocs/index.html">documentation</a> for more examples.)
</p>

<p>

In principle this would be easy to internationalize as Inflector does have multi-language support. However, at present Inflector's implementations of pluralization algorithms for different languages are a little thin on the ground. English is practically complete (thanks to Damian Conway's <a href="http://www.csse.monash.edu.au/~damian/papers/HTML/Plurals.html">excellent contribution</a> to the subject). There is an implementation for Italian too, but it is incomplete.
So, if you have (grammatical) expertise in a language for which there is no pluralization algorithm, and you would like to get involved, then consider <a href="https://inflector.dev.java.net/">joining the project</a>.
</p>]]>

</content>
</entry>
<entry>
<title>More Literate Programming: Language-Level Anaphora</title>
<link rel="alternate" type="text/html" href="http://weblogs.java.net/blog/tomwhite/archive/2006/06/more_literate_p_1.html" />
<modified>2006-06-29T21:21:34Z</modified>
<issued>2006-06-29T21:21:21Z</issued>
<id>tag:weblogs.java.net,2006:/blog/tomwhite/225.5125</id>
<created>2006-06-29T21:21:21Z</created>
<summary type="text/plain">Following on from a previous post about using anaphora (a word like it that refers to something previously referred to) to make jMock tests more readable, I ask &quot;Can we have language-level anaphora?&quot;</summary>
<author>
<name>tomwhite</name>

<email>tom@tiling.org</email>
</author>
<dc:subject>Programming</dc:subject>
<content type="text/html" mode="escaped" xml:lang="en" xml:base="http://weblogs.java.net/blog/tomwhite/">
<![CDATA[<p>
Last month I blogged about <a href="http://weblogs.java.net/blog/tomwhite/archive/2006/05/literate_progra_1.html">Literate Programming with jMock</a>, and also about <a href="http://weblogs.java.net/blog/tomwhite/archive/2006/05/more_literate_p.html">using anaphora</a> to avoid repetition in the tests. (An <i>anaphor</i> is a word like <i>it</i> that refers to something previously referred to.)
</p>

<p>
This got me thinking: is it possible to use anaphora more widely at the language level? Would such constructs be useful? Before trying to do this in Java I looked at more dynamic languages, starting with a very quick look at Lisp, where I first came across anaphora in programming languages.
</p>]]>
<![CDATA[<h3>Common Lisp</h3>

<p>
Paul Graham, in his wonderful book <a href="http://www.paulgraham.com/acl.html">ANSI Common Lisp</a> (which is well worth a read, by the way, even if you never intend to write a line of Lisp), introduces an <i>anaphoric if</i> that captures the test condition in a variable called <code>it</code>. His example is:
</p>

<pre><code>(aif (calculate-something)
     (1+ it)
     0)</code></pre>

<p>
This can be read as &quot;if the result of calling <code>calculate-something</code> is non-nil, then return one plus that value, otherwise return zero&quot;. The <code>aif</code> construct is actually a macro, a very powerful feature that effectively allows you to re-write the language. I won't go into the details here as it's very clearly explained in the book. In a nutshell, the Lisp compiler transforms the expression into another chunk of code (defined by the macro) that it can understand - in this case a regular <code>if</code> with a bit of variable capture.
</p>

<p>
Let's try another language - Ruby.
</p>

<h3>Ruby</h3>

<p>
Here's how <code>aif</code> would be used in Ruby:
</p>

<pre><code>hash = { 'a' =&gt; 'peach', 'b' =&gt; 'pear', 'c' =&gt; 'plum'}
aif hash['a'] do
  puts @it
end
</code></pre>

<p>
This code snippet prints <code>peach</code> to the console.
</p>

<p>
Ruby doesn't have macros, so this is not implemented the way it is in Lisp; it's actually closer to the jMock approach. We basically extend <code>Object</code>, and add an <code>aif</code> method that takes a conditional expression and a block. If the expression is true, then the instance variable <code>@it</code> (instance variables always start with <code>@</code> in Ruby) is set to the value of the expression and the block is executed. Here's the definition:
</p>

<pre><code>class Object
  def aif(expression)
    if expression
      @it = expression
      yield
    end
  end
end</code></pre>

<p>
The Ruby implementation is inferior to the Lisp one in a few ways (although some of this could be down to my inexperience in Ruby - I invite seasoned Rubyists to improve the code!). Firstly, there is no way that I can see of supporting an else clause for <code>aif</code> (the Lisp version does this easily). Secondly, there are concurrency issues - storing state in an instance variable is dangerous if multiple threads are using the object. However, this is fairly straightforward to solve with synchronization or by using thread-local variables.  Thirdly, there is a minor, but potentially irritating, syntactic difference between <code>aif</code> and <code>if</code>: <code>aif</code> takes a block and hence has an extra <code>do</code>. Compare:
</p>

<pre><code>if hash['a']
  puts hash['a']
end</code></pre>

<p>and</p>

<pre><code>aif hash['a'] <b>do</b>
  puts @it
end
</code></pre>

<h3>Java</h3>

<p>
Can you do the same thing in Java? Let's try to replicate the Ruby approach since Java doesn't have Lisp macros. We can't add methods to <code>Object</code> in the way we can in Ruby, so instead create a class called <code>AnaphoricObject</code> with a static method called <code>aif</code>. We can then either extend <code>AnaphoricObject</code> or statically import <code>aif</code> when we want to use anaphoric if.
</p>

<p>
Now we hit a problem: what does the definition of <code>aif</code> look like? It would take an <code>Object</code> as the test expression, but then we would need to cast the <code>it</code> variable to the type we wanted. Doable, but not very pleasant. (Aren't we trying to improve readability?) Worse, how do we supply a block of code? We can't do it. There is probably a way to do it using anonymous inner classes, but I don't want to go down that alley - we'll lose all improvements in syntax which is why we were trying to do this in the first place.
</p>
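
<p>
For the curious, here is roughly where the anonymous inner class alley might lead. This is only a sketch: the <code>Block</code> interface is a hypothetical callback type, and "true" is taken to mean "not null". A type parameter removes the need for the cast, but the call site shows how much syntax the block costs.
</p>

```java
import java.util.HashMap;
import java.util.Map;

final class AnaphoricObject {

    /** Hypothetical callback type: the body of the "if", with the tested value passed in as it. */
    interface Block<T> {
        void call(T it);
    }

    /**
     * Runs the block with the value bound to its parameter when the value is not null.
     * Returns whether the block ran, so a caller could hang an "else" off the result.
     */
    static <T> boolean aif(T value, Block<T> block) {
        if (value == null) {
            return false;
        }
        block.call(value);
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> hash = new HashMap<String, String>();
        hash.put("a", "peach");
        hash.put("b", "pear");

        // The Ruby version needed three lines; this is the Java equivalent.
        aif(hash.get("a"), new Block<String>() {
            public void call(String it) {
                System.out.println(it);
            }
        });
    }
}
```

<p>
It works, but the anaphor has turned into an ordinary method parameter that merely happens to be named <code>it</code>, and the ceremony around the block swamps the one line that matters.
</p>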

<h3>Conclusion</h3>

<p>
Obviously, I like Java a lot, but I know its limits. Anaphora work well for an API like jMock, but not at the Java language level. Even Ruby, touted for its ease of <a href="http://poignantguide.net/ruby/chapter-6.html">meta-programming</a>, struggles to provide a nice implementation of anaphoric if (although, again, I'd be happy to be proved wrong on this). Lisp manages it, if only because it has macro support. (It's probably possible in your favourite language too.) 
</p>

<p>
But do we really need anaphoric if? After all, you can probably argue that you've never needed it. This is actually a weak argument. For example, I use Java 5 static imports all the time now and wouldn't like to give them up, but I didn't rage about not having them before I was introduced to them. I didn't know what I was missing. Similarly, once you've used constructs like anaphoric if, you get to know when it is useful, then start missing it in languages where it's not available. Here's Paul Graham again (from <i>ANSI Common Lisp</i>):
</p>

<blockquote>
Is it worth writing a macro just to save typing? Very much so. Saving typing is what programming languages are all about; the purpose of the compiler is to save you from typing your programs in machine language. And macros allow you to bring to your specific applications the same kinds of advantages that high-level languages bring to programming in general. By the careful use of macros, you may be able to make your programs significantly shorter than they would be otherwise, and proportionately easier to read, write, and maintain.
</blockquote>

<p>
Although he's talking about macros, I think the point is more general: it's worth striving for higher-level abstractions to make our programs more literate.
</p>]]>
</content>
</entry>
<entry>
<title>More Literate Programming with jMock: Anaphora</title>
<link rel="alternate" type="text/html" href="http://weblogs.java.net/blog/tomwhite/archive/2006/05/more_literate_p.html" />
<modified>2006-05-14T10:30:47Z</modified>
<issued>2006-05-14T10:30:37Z</issued>
<id>tag:weblogs.java.net,2006:/blog/tomwhite/225.4726</id>
<created>2006-05-14T10:30:37Z</created>
<summary type="text/plain">How to reduce repetition in jMock tests using an idea from natural languages.</summary>
<author>
<name>tomwhite</name>

<email>tom@tiling.org</email>
</author>
<dc:subject>Testing</dc:subject>
<content type="text/html" mode="escaped" xml:lang="en" xml:base="http://weblogs.java.net/blog/tomwhite/">
<![CDATA[<p>
According to the dictionary, an <a href="http://wordnet.princeton.edu/perl/webwn?s=anaphor">anaphor</a> is a word used to avoid repetition. It refers back to something in the conversation. The word "it" in the previous sentence refers back to the word "anaphor" in the first sentence, so "it" is an anaphor for "anaphor". Natural language is often ambiguous, and one reason for this is that it may not be clear which word an anaphor such as "it" is referring to.
</p>

<p>
But ambiguity and programming languages don't go very well together - so why would anyone want to mix the two? Actually, there are circumstances in programming languages where there is no real ambiguity, and an anaphor can have a use in eliminating repetition. Think of it as applying the <a href="http://www.artima.com/intv/dry.html">DRY</a> (Don't Repeat Yourself) principle at the syntax level.
</p>]]>
<![CDATA[<h3>jMock</h3>

<p>
We introduced a limited form of anaphora in the functional test framework I talked about in my previous blog entry, <a href="http://weblogs.java.net/blog/tomwhite/archive/2006/05/literate_progra_1.html">Literate Programming with jMock</a>. Consider some more sample code:
</p>

<pre><code>public void testMainMenu() throws Exception {

    goToTheIndexPage();
    assertThat(theCurrentPage, has(title(&quot;Home Menu&quot;)));
    assertThat(<b>it</b>, has(menuOptions().named(&quot;Departures and Arrivals&quot;,
                                           &quot;Travel News&quot;)));
}</code></pre>

<p>
The variable <code>it</code> refers to the object that was the target of the <code>assertThat</code> call on the previous line; in this case, the current page referred to by the <code>theCurrentPage</code> variable. We improve readability and remove multiple references to <code>theCurrentPage</code>, without introducing ambiguity (and therefore without the risk of introducing bugs).
</p>

<p>
So where does the variable <code>it</code> come from? We created a subclass of <code>org.jmock.MockObjectTestCase</code>, added a protected instance variable called <code>it</code> of type <code>Object</code>, and then overrode the <code>assertThat</code> method to stash away the target object:
</p>

<pre><code>@Override
public void assertThat(Object target, Constraint constraint) {
    it = target;
    super.assertThat(target, constraint);
}
</code></pre>

<p>
That's it.
</p>
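
<p>
If you want to try the idea without jMock on the classpath, here is a minimal, self-contained sketch. It is not our actual code: the <code>Constraint</code> interface is a cut-down stand-in for jMock's, the class name and the two constraints are made up for illustration, and a plain <code>AssertionError</code> replaces JUnit's failure.
</p>

<pre><code>import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public class AnaphoraSketch {

    // Hypothetical stand-in for jMock's Constraint, reduced to a single method.
    public interface Constraint {
        boolean eval(Object o);
    }

    // The anaphor: the target of the most recent assertThat call.
    protected Object it;

    public void assertThat(Object target, Constraint constraint) {
        it = target; // stash the target first, so &quot;it&quot; is set even if the check fails
        if (!constraint.eval(target)) {
            throw new AssertionError(&quot;constraint not satisfied by: &quot; + target);
        }
    }

    // Two illustrative constraints, just enough to drive the example.
    public static Constraint isNotNull() {
        return o -&gt; o != null;
    }

    public static Constraint isIn(Collection&lt;?&gt; items) {
        return items::contains;
    }

    public static void main(String[] args) {
        List&lt;String&gt; lunchbox = Arrays.asList(&quot;banana&quot;, &quot;sandwich&quot;);
        AnaphoraSketch test = new AnaphoraSketch();
        test.assertThat(lunchbox.get(0), isNotNull());
        test.assertThat(test.it, isIn(lunchbox)); // &quot;it&quot; is still the banana
        System.out.println(&quot;it = &quot; + test.it);
    }
}</code></pre>

<p>
Note that passing <code>it</code> as the target simply stashes the same object again, which is why a chain of assertions keeps talking about the same thing.
</p>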

<h3>Temporary Local Variables</h3>

<p>
The biggest gain comes when you can avoid creating a temporary local variable. You often want to assert several things about an object, so you end up creating a local variable with a short name to refer to the target object:
</p>

<pre><code>public void test() {
    Fruit fruit = lunchbox.getFruit();
    assertThat(fruit, isNotNull());
    assertThat(fruit, hasProperty(&quot;colour&quot;, eq(&quot;yellow&quot;)));
    assertThat(fruit, isIn(lunchbox.getItems()));
}</code></pre>

<p>
Alternatively, you can use the <code>and</code> constraint to make a single assertion, but this is less readable and hard to format well:
</p>

<pre><code>public void test() {
    assertThat(lunchbox.getFruit(),
        and(isNotNull(),
            hasProperty(&quot;colour&quot;, eq(&quot;yellow&quot;)),
            isIn(lunchbox.getItems())));
}</code></pre>

<p>
Using <code>it</code> produces the most readable test:
</p>

<pre><code>public void test() {
    assertThat(lunchbox.getFruit(), isNotNull());
    assertThat(it, hasProperty(&quot;colour&quot;, eq(&quot;yellow&quot;)));
    assertThat(it, isIn(lunchbox.getItems()));
}</code></pre>

<p>
Our experience has been that the little bit of syntactic sugar that <code>it</code> provides is convenient, and not at all ambiguous.
</p>]]>
</content>
</entry>
<entry>
<title>Literate Programming with jMock</title>
<link rel="alternate" type="text/html" href="http://weblogs.java.net/blog/tomwhite/archive/2006/05/literate_progra_1.html" />
<modified>2006-05-11T19:11:18Z</modified>
<issued>2006-05-11T19:10:38Z</issued>
<id>tag:weblogs.java.net,2006:/blog/tomwhite/225.4702</id>
<created>2006-05-11T19:10:38Z</created>
<summary type="text/plain">jMock is not just about mock objects: its support for constraints makes it a great example of literate programming.</summary>
<author>
<name>tomwhite</name>

<email>tom@tiling.org</email>
</author>
<dc:subject>Testing</dc:subject>
<content type="text/html" mode="escaped" xml:lang="en" xml:base="http://weblogs.java.net/blog/tomwhite/">
<![CDATA[<p>
We've been using <a href="http://jmock.codehaus.org/">jMock</a> at our <a href="http://www.kizoom.com/">company</a> for some time now. We've found it great for test-driven development
and for isolating our unit tests from the rest of the system more effectively. One aspect of jMock that stands out for me
is its idea of <a href="http://jmock.codehaus.org/constraints.html">constraints</a>. In fact, we've found this idea so useful that we always use the <code>org.jmock.MockObjectTestCase</code> base class
rather than <code>junit.framework.TestCase</code>, even when we aren't mocking anything out. This seems to have registered with the jMock development team, as they are planning to extract the constraints
into a separate project, codenamed Hamcrest (it's an anagram of &quot;matchers&quot;).
</p>]]>
<![CDATA[<p>
We can use constraints to construct more <a href="http://joe.truemesh.com/blog//000511.html">flexible</a> assertions.
</p>
<pre><code>assertThat(a, eq(&quot;3&quot;));</code></pre>

<p>
is more readable and understandable than
</p>

<pre><code>assertEquals(&quot;3&quot;, a);</code></pre>

<p>
You can read it as &quot;assert that a eq(uals) 3&quot;. (The only reason it is not <code>equals</code> is that it conflicts with Object's <code>equals</code> method.) This form makes the code more <i>literate</i>, that is, easier for humans to read, which is the main goal of Donald Knuth's idea of <a href="http://en.wikipedia.org/wiki/Literate_programming">literate programming</a>.
</p>

<p>
The <code>assertThat</code> form is very extensible too - you just write new constraints. We've grown a small library of constraints - a sort of <a href="http://en.wikipedia.org/wiki/Domain_Specific_Language">DSL</a> for unit testing.
The library contains constraints ranging from the simple conveniences <code>isNull</code> and <code>isNotNull</code>, to more complex collections constraints
such as <code>includes</code> which takes a variable number of objects (varargs) that the collection must contain:
</p>

<pre><code>assertThat(list, includes(&quot;peach&quot;, &quot;pear&quot;, &quot;plum&quot;));</code></pre>

<p>
They are all <a href="http://jmock.codehaus.org/custom-constraints.html">easy to write</a>.
</p>
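
<p>
To give a flavour of how little is involved, here is a rough sketch of what an <code>includes</code> constraint might look like. The <code>Constraint</code> interface below is a stand-in modelled on jMock's (an evaluation method plus a description method) so that the example compiles on its own, and the implementation is an illustration rather than the one in our library.
</p>

<pre><code>import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public class IncludesSketch {

    // Stand-in modelled on jMock's Constraint interface (an assumption of this sketch).
    public interface Constraint {
        boolean eval(Object o);
        StringBuffer describeTo(StringBuffer buffer);
    }

    // Hypothetical &quot;includes&quot;: satisfied by a collection that contains every expected item.
    public static Constraint includes(final Object... expected) {
        final List&lt;Object&gt; items = Arrays.asList(expected);
        return new Constraint() {
            public boolean eval(Object o) {
                return o instanceof Collection &amp;&amp; ((Collection&lt;?&gt;) o).containsAll(items);
            }
            public StringBuffer describeTo(StringBuffer buffer) {
                // Used to build the failure message.
                return buffer.append(&quot;includes(&quot;).append(items).append(&quot;)&quot;);
            }
        };
    }

    public static void main(String[] args) {
        List&lt;String&gt; list = Arrays.asList(&quot;peach&quot;, &quot;pear&quot;, &quot;plum&quot;, &quot;fig&quot;);
        System.out.println(includes(&quot;peach&quot;, &quot;pear&quot;, &quot;plum&quot;).eval(list));
        System.out.println(includes(&quot;apple&quot;).eval(list));
        System.out.println(includes(&quot;apple&quot;).describeTo(new StringBuffer()));
    }
}</code></pre>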

<h3>Consequences</h3>

<p>
One thing we missed from JUnit assertions was the ability to control the failure message. This isn't often needed, as the <code>describeTo</code> method of the constraint does a good job in most cases. There are occasions, however, when you need more context, such as when you have several boolean assertions in a single test and you want to distinguish them. For example, the following test
</p>

<pre><code>public void test() {
    boolean a = true;
    boolean b = false;
    assertThat(a, eq(true));
    assertThat(b, eq(true));
}
</code></pre>

<p>
fails with a message that doesn't tell you if it is <code>a</code> or <code>b</code> that is <code>false</code>:
</p>

<pre><code>junit.framework.AssertionFailedError:
Expected: eq(&lt;true&gt;)
    got : false
</code></pre>

<p>
We extended jMock's base test class <code>org.jmock.MockObjectTestCase</code> with an overloaded version of <code>assertThat</code> that takes an extra parameter, a <code>Consequence</code>. The test uses the <code>otherwise</code> method, which creates a <code>Consequence</code> instance with the given failure message,
</p>

<pre><code>public void test() {
    boolean a = true;
    boolean b = false;
    assertThat(a, eq(true), otherwise(&quot;a should be true&quot;));
    assertThat(b, eq(true), otherwise(&quot;b should be true&quot;));
}
</code></pre>

<p>
and fails more helpfully:
</p>

<pre><code>junit.framework.AssertionFailedError: b should be true
</code></pre>
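
<p>
For the curious, one plausible way to wire this up is sketched below. The shape of <code>Consequence</code> (a single <code>execute</code> method) is a guess made for illustration, not our real interface; <code>Constraint</code> is again a cut-down stand-in for jMock's, and a plain <code>AssertionError</code> takes the place of JUnit's <code>AssertionFailedError</code> to keep the example self-contained.
</p>

<pre><code>public class ConsequenceSketch {

    // Cut-down stand-in for jMock's Constraint.
    public interface Constraint {
        boolean eval(Object o);
    }

    // Hypothetical shape for Consequence: what to do when the constraint is not met.
    public interface Consequence {
        void execute();
    }

    public static Constraint eq(final Object expected) {
        return o -&gt; expected == null ? o == null : expected.equals(o);
    }

    // Creates a Consequence that fails with the given message.
    public static Consequence otherwise(final String message) {
        return () -&gt; { throw new AssertionError(message); };
    }

    public static void assertThat(Object target, Constraint constraint, Consequence consequence) {
        if (!constraint.eval(target)) {
            consequence.execute();
        }
    }

    public static void main(String[] args) {
        boolean a = true;
        boolean b = false;
        assertThat(a, eq(true), otherwise(&quot;a should be true&quot;));
        try {
            assertThat(b, eq(true), otherwise(&quot;b should be true&quot;));
        } catch (AssertionError e) {
            System.out.println(e.getMessage());
        }
    }
}</code></pre>

<p>
Making the consequence an object rather than a bare string leaves room for failures that do something other than report a message.
</p>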

<h3>Literate Functional Tests</h3>


<p>
My colleague <a href="http://chatley.com/blog/">Robert Chatley</a>, who worked on the jMock extensions described above, has recently taken the idea of literate testing further and applied it to functional (acceptance) tests. He has built a framework for testing a markup language for digital TV services, where as well as the assertions, the steps for driving the test are literate. Here is a sample:
</p>

<pre><code>public void testArrivalsWithUnambiguousOriginAndUnambiguousDestination() throws Exception {

    goToTheTrainArrivalsSearchScreen();

    enter(LONDON_PADDINGTON, into(textBox().labelled(&quot;Arriving at&quot;)));
    enter(&quot;Bristol Temple Meads&quot;, into(textBox().labelled(&quot;Calling at&quot;)));
    enter(&quot;1554&quot;, into(textBox().labelled(&quot;Arriving after&quot;)));
    select(option(&quot;Today&quot;), from(selectBox().labelled(&quot;When&quot;)));

    clickOn(button(&quot;Select&quot;));

    assertThat(theCurrentPage, has(title(&quot;Train arrivals after 15:54 today&quot;)));
}
</code></pre>

<p>
Notice that you can't tell that it is testing a digital TV service. Those details are hidden behind the small collection of methods that allow you to perform actions: <code>goTo()</code> (called by <code>goToTheTrainArrivalsSearchScreen()</code>), <code>enter()</code>, <code>select()</code>, <code>clickOn()</code>; or extract data from the page: <code>textBox().labelled(&quot;Arriving at&quot;)</code>, <code>button(&quot;Select&quot;)</code>.
</p>
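
<p>
The framework itself isn't shown here, but the style is easy to imitate. The sketch below is only a guess at the general shape: the class names are invented, the &quot;page&quot; is just a map that remembers what was typed where, and <code>textIn</code> is a helper added so the example has something to print. The point to notice is that <code>into()</code> does nothing except make the call read like a sentence.
</p>

<pre><code>import java.util.LinkedHashMap;
import java.util.Map;

public class LiterateStepsSketch {

    // Hypothetical locator for a text box, identified by its label.
    public static final class TextBox {
        private String label;

        public TextBox labelled(String label) {
            this.label = label;
            return this;
        }

        public String label() {
            return label;
        }
    }

    // Stand-in for the page under test: remembers the text entered per label.
    private final Map&lt;String, String&gt; fields = new LinkedHashMap&lt;String, String&gt;();

    public static TextBox textBox() {
        return new TextBox();
    }

    // Purely cosmetic: returns its argument so the step reads as &quot;enter X into Y&quot;.
    public static TextBox into(TextBox box) {
        return box;
    }

    public void enter(String text, TextBox box) {
        fields.put(box.label(), text);
    }

    public String textIn(String label) {
        return fields.get(label);
    }

    public static void main(String[] args) {
        LiterateStepsSketch page = new LiterateStepsSketch();
        page.enter(&quot;Bristol Temple Meads&quot;, into(textBox().labelled(&quot;Calling at&quot;)));
        System.out.println(page.textIn(&quot;Calling at&quot;));
    }
}</code></pre>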

<p>
Taking a <a href="http://nat.truemesh.com/archives/000188.html">tip</a> from Nat Pryce, we can apply a neat trick and change Eclipse's settings so that punctuation is white, to make the code even more readable (but less editable)! (Steve Yegge had the <a href="http://www.cabochon.com/~stevey/sokoban/docs/article-kawa.html">same idea</a> for Scheme.) This is what you get:
</p>

<pre><code>public void testArrivalsWithUnambiguousOriginAndUnambiguousDestination<font color="#ffffff">(</font><font color="#ffffff">)</font> throws Exception <font color="#ffffff">{</font>

    goToTheTrainArrivalsSearchScreen<font color="#ffffff">(</font><font color="#ffffff">)</font><font color="#ffffff">;</font>

    enter<font color="#ffffff">(</font>LONDON_PADDINGTON<font color="#ffffff">,</font> into<font color="#ffffff">(</font>textBox<font color="#ffffff">(</font><font color="#ffffff">)</font><font color="#ffffff">.</font>labelled<font color="#ffffff">(</font>&quot;Arriving at&quot;<font color="#ffffff">)</font><font color="#ffffff">)</font><font color="#ffffff">)</font><font color="#ffffff">;</font>
    enter<font color="#ffffff">(</font>&quot;Bristol Temple Meads&quot;<font color="#ffffff">,</font> into<font color="#ffffff">(</font>textBox<font color="#ffffff">(</font><font color="#ffffff">)</font><font color="#ffffff">.</font>labelled<font color="#ffffff">(</font>&quot;Calling at&quot;<font color="#ffffff">)</font><font color="#ffffff">)</font><font color="#ffffff">)</font><font color="#ffffff">;</font>
    enter<font color="#ffffff">(</font>&quot;1554&quot;<font color="#ffffff">,</font> into<font color="#ffffff">(</font>textBox<font color="#ffffff">(</font><font color="#ffffff">)</font><font color="#ffffff">.</font>labelled<font color="#ffffff">(</font>&quot;Arriving after&quot;<font color="#ffffff">)</font><font color="#ffffff">)</font><font color="#ffffff">)</font><font color="#ffffff">;</font>
    select<font color="#ffffff">(</font>option<font color="#ffffff">(</font>&quot;Today&quot;<font color="#ffffff">)</font><font color="#ffffff">,</font> from<font color="#ffffff">(</font>selectBox<font color="#ffffff">(</font><font color="#ffffff">)</font><font color="#ffffff">.</font>labelled<font color="#ffffff">(</font>&quot;When&quot;<font color="#ffffff">)</font><font color="#ffffff">)</font><font color="#ffffff">)</font><font color="#ffffff">;</font>

    clickOn<font color="#ffffff">(</font>button<font color="#ffffff">(</font>&quot;Select&quot;<font color="#ffffff">)</font><font color="#ffffff">)</font><font color="#ffffff">;</font>

    assertThat<font color="#ffffff">(</font>theCurrentPage<font color="#ffffff">,</font> has<font color="#ffffff">(</font>title<font color="#ffffff">(</font>&quot;Train arrivals after 15:54 today&quot;<font color="#ffffff">)</font><font color="#ffffff">)</font><font color="#ffffff">)</font><font color="#ffffff">;</font>
<font color="#ffffff">}</font>
</code></pre>

<p>
Truly literate.
</p>]]>
</content>
</entry>

</feed>