blueMarine went Semantic

Posted by fabriziogiudici on February 2, 2009 at 8:11 AM PST

About ten months ago I announced that blueMarine was going to use some semantic web technologies. I planned to start blogging about it in May 2008, but this branch of development was delayed, then frozen for a couple of months, then resumed, so it's behind schedule. But now it's here.

Technically, blueMarine itself isn't using the semantic web yet because of some problems in the integration with Derby, which I'm working on. But there's a prototype of a website using blueOcean (the server-side counterpart of blueMarine), and it works with MySQL, so I'm not lying even though the title of this post is not 100% precise. In any case, the Derby problems will be solved soon.

This is just the introductory post of a series that will illustrate why and how I integrated this stuff in blueMarine. Don't expect a tutorial about the semantic web itself, but rather - as usual for me - an architectural perspective on a tool and why it's good or bad. Of course, I'll talk about tools and frameworks too, which happen to be OpenRDF and Elmo; special care will be taken in explaining how the stuff integrates with the NetBeans Platform and which patterns / idioms are best.

As this is an introductory post, I can at least explain how I got there. It's because of birds, one of my favourite photo subjects. The cataloguing engine of blueMarine was born from my need to precisely catalog my bird photos, as I was largely unsatisfied with what the market offered a few years ago. Birds (like any other living being) are catalogued by a taxonomy, which is basically a hierarchical tree of labels; for instance, the dunlins depicted below are technically:

  • Kingdom: Animalia
  • Phylum: Chordata
  • Class: Aves
  • Order: Charadriiformes
  • Family: Scolopacidae
  • Genus: Calidris
  • Species: alpina

For this reason, I initially developed support for a hierarchical tree of tags. This facility also made it possible to catalog other things, such as inhabited places (state / region / province / place / etc.) and other stuff, especially after I introduced the capability of creating a relation between two entities. It went even further for the needs of a customer of mine, where the thing started to show its limits, especially when I had to find some smart way to do complex searches, so I started looking around for existing solutions.

In the meantime, I've also learnt that bird taxonomy isn't that easy: there are slightly different taxonomies on which people don't agree, and sometimes they change. For instance, animals that were thought to be subspecies of the same species are sometimes "split" into different species (this frequently happens with gulls), and other times the opposite happens ("lumping and splitting"). As a result, a serious bird cataloguing facility must take this into account, which my original tagging facility didn't: a leaf must be able to have multiple parents at the same time and to change them over time. This is not a quirk typical of birds: e.g. you can get sophisticated about hierarchies of places. For instance, in Italy we use three levels: region / province / place, while in France you throw in "arrondissement" and "canton" too, and "province" is actually "department". If you are interested in history, both political and administrative borders move with time.

In a few words, I've found that RDF (Resource Description Framework), the main technology behind the semantic web, was the perfect solution. Its main feature is that it represents knowledge in the form of statements known as triples (subject - predicate - object), such as:

  • "dunlin" "is-a" "bird"
  • "Calidris" "is-a-subset-of" "Scolopacidae"

Coupled with the proper formal stuff and facilities, this capability of asserting statements has no limits, as the "AAA slogan" says: "Anyone can say Anything about Any topic". This is the thing that won me over: the capability of organizing any kind of information without having to redesign a database schema. In other words, I saw RDF as a better store for information than the relational database (my previous tagging facility was an attempt at that, but with many limitations, as it didn't support AAA at all). Of course, this use is just a subset of the semantic web (this is why I prefer to say "semantic web technologies"), but I'll start from here. It is also the thing that you, as an architect, are most likely to be interested in.
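
To make the idea of triples a bit more concrete, here is a minimal sketch in plain Java. It is not how OpenRDF or Elmo work, and it is not blueMarine code: the class `TripleStoreSketch`, its methods, and the names "photographed-in", "hypothetical-place", "hypothetical-gull" and "taxonomy-A-species" / "taxonomy-B-species" are all made up for illustration. It only shows that statements can be added without declaring a schema first, and that the same subject can have more than one parent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Illustrative in-memory triple list; a real RDF store does much more.
public class TripleStoreSketch {
    /** One statement: subject - predicate - object. */
    record Triple(String subject, String predicate, String object) {}

    private final List<Triple> triples = new ArrayList<>();

    /** Asserts a new statement; no schema has to be declared first. */
    public void add(String subject, String predicate, String object) {
        triples.add(new Triple(subject, predicate, object));
    }

    /** Returns the objects of all statements matching subject and predicate, sorted. */
    public Set<String> objectsOf(String subject, String predicate) {
        Set<String> result = new TreeSet<>();
        for (Triple t : triples) {
            if (t.subject().equals(subject) && t.predicate().equals(predicate)) {
                result.add(t.object());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TripleStoreSketch store = new TripleStoreSketch();
        // The two statements from the post.
        store.add("dunlin", "is-a", "bird");
        store.add("Calidris", "is-a-subset-of", "Scolopacidae");
        // A new kind of statement (hypothetical predicate) needs no schema change.
        store.add("dunlin", "photographed-in", "hypothetical-place");
        // The same subject can have two parents, e.g. under two competing taxonomies.
        store.add("hypothetical-gull", "is-a-subset-of", "taxonomy-A-species");
        store.add("hypothetical-gull", "is-a-subset-of", "taxonomy-B-species");
        System.out.println(store.objectsOf("hypothetical-gull", "is-a-subset-of"));
    }
}
```

In a hierarchical tree of tags the last two statements would conflict, since a node has exactly one parent; with triples they are just two more assertions.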

See you later for the next post.

Dunlins, or why I decided to use RDF

Comments

Hi. blueMarine has been able to import most of XMP for a few months now (while it is not yet capable of exporting anything).

What would make blueMarine really interesting is if it could work with XMP data. The Apache XML Graphics Commons can be used for reading and writing that metadata. From there, it's really easy to store it with Sesame.