
Elmo, a Semantic Entity Manager

Posted by fabriziogiudici on October 26, 2009 at 9:26 PM PDT






My last post about my use of semantic technologies in my projects dates
back several months (http://weblogs.java.net/blog/2009/04/29/observation-api-hey-its-not-observable-pattern)
- it's high time I got back to it, also taking the chance to follow up on a
presentation (http://www.slideshare.net/benfante/got-bored-by-the-relational-database-switch-to-a-rdf-store)
I gave a couple of days ago at JavaDay Verona
(http://www.jugpadova.it/articles/2009/09/20/javaday-verona-2009).



Today I'm going to introduce the products I'm using: OpenSesame and
Elmo. Both are produced by Aduna Software
and are available at the OpenRDF site (http://www.openrdf.org/), under
a pretty liberal BSD-style license. I'm not going to write a
tutorial, of course, as some documentation is available on the website,
but rather show a small example.



OpenSesame is an RDF store, that is, a sort of database that, instead of
using the Entity-Relationship (ER) model as MySQL or Oracle do, uses the
RDF model, made of “triples” (I briefly introduced RDF triples in my first
post about this topic: http://weblogs.java.net/blog/fabriziogiudici/archive/2009/02/bluemarine_went.html).
I'm going to give some more details about the OpenSesame implementation below, but
conceptually we don't need to know about it - just as we don't need to
know how MySQL or Oracle store data on disk, we just rely on an
understanding of the ER model and the Java APIs to manage it.



In some respects, you can think of OpenSesame as being similar to Derby,
as it can operate in multiple ways: in memory only (that is, without any
persistence), with a file backing store (thus supporting persistence),
or delegating to a separate, standard database (in this case, in
addition to persistence, we also get transaction support). It is also
possible to run OpenSesame as a separate process and connect to it by
means of a network protocol. Specifying the working mode is a matter of
initializing the store in the code.
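
Just to give a feel for it, here is a minimal sketch of what selecting the mode
looks like in code. The class names (MemoryStore, SailRepository, HTTPRepository)
are OpenSesame ones; the remote server URL and repository name are made up for
illustration, and the database-backed configuration is omitted:

import java.io.File;
import org.openrdf.repository.Repository;
import org.openrdf.repository.http.HTTPRepository;
import org.openrdf.repository.sail.SailRepository;
import org.openrdf.sail.memory.MemoryStore;

// Pure in-memory store: nothing is persisted.
Repository inMemory = new SailRepository(new MemoryStore());
inMemory.initialize();

// Memory store backed by a data directory: contents survive restarts.
Repository fileBacked = new SailRepository(new MemoryStore(new File("/tmp/RDFStore")));
fileBacked.initialize();

// Remote repository reached over HTTP (assumes a Sesame server is running).
Repository remote = new HTTPRepository("http://localhost:8080/openrdf-sesame", "example");
remote.initialize();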



Now, the OpenSesame APIs sit logically a bit above JDBC: they support
raw access to the data in its native form, that is, as RDF statements. I say
they are a bit above JDBC because OpenSesame provides an abstraction
for a triple (the Statement class), while JDBC doesn't provide
abstractions for “table” and “record”. In any case, this is still too
low-level for me and I avoid direct use of the OpenSesame API as
much as I can.
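
Just to show the level of abstraction, here is a rough, hypothetical snippet
using the raw OpenSesame API to add a single triple (repository is a Repository
initialized as above; the subject URI is made up for the example):

import org.openrdf.model.Literal;
import org.openrdf.model.URI;
import org.openrdf.model.ValueFactory;
import org.openrdf.repository.RepositoryConnection;

ValueFactory valueFactory = repository.getValueFactory();
URI subject    = valueFactory.createURI("http://www.example.org/locations#genova");
URI predicate  = valueFactory.createURI("http://www.w3.org/2003/01/geo/wgs84_pos#lat");
Literal object = valueFactory.createLiteral(45.0);

RepositoryConnection connection = repository.getConnection();
try
  {
    connection.add(subject, predicate, object); // stores one Statement (triple)
  }
finally
  {
    connection.close();
  }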



The good thing about the OpenRDF software is the existence of another API,
called Elmo, which works in a similar way to an ORM (Object Relational
Mapper) - pretty similar to JPA, in some respects. Indeed, it's the
presence of Elmo that made me decide to use Aduna's software instead of
other competing products.



With this post I'm just giving a short example of how to use Elmo.
First we start by defining an entity class, which could look like this:



package javaday.example1;
import org.openrdf.elmo.annotations.rdf;
@rdf("http://www.tidalwave.it/rdf/geo/2009/02/22#location")
public class GeoLocation
  {
    @rdf("http://www.w3.org/2003/01/geo/wgs84_pos#lat")
    private Double latitude;
    @rdf("http://www.w3.org/2003/01/geo/wgs84_pos#long")
    private Double longitude;
    @rdf("http://www.w3.org/2003/01/geo/wgs84_pos#alt")
    private Double altitude;
    @rdf("http://www.tidalwave.it/rdf/geo/2009/02/22#code")
    private String code;
    public Double getAltitude()
      {
        return altitude;
      }
    public void setAltitude(Double altitude)
      {
        this.altitude = altitude;
      }
    public String getCode()
      {
        return code;
      }
    public void setCode(String code)
      {
        this.code = code;
      }
    public Double getLatitude()
      {
        return latitude;
      }
    public void setLatitude(Double latitude)
      {
        this.latitude = latitude;
      }
    public Double getLongitude()
      {
        return longitude;
      }
    public void setLongitude(Double longitude)
      {
        this.longitude = longitude;
      }
    @Override
    public String toString()
      {
        return String.format("GeoLocation[%s %f %f]", code, latitude, longitude);
      }
  }





As you can see, it's pretty similar to a JPA entity, where instead of
using @Table and @Column you use the @rdf annotation (which is, somewhat
strangely, lower-case). You shouldn't be surprised that the same annotation can
mark both an entity and its fields, since in an RDF store everything is a
triple. In particular, what we're going to achieve is the creation of
four triples, one for each field, where the subject is the entity, the
predicate describes the field, and the object is the field value. Recall
that in RDF everything is identified by a URL or a URN - in
this case I've used a mix of a standard ontology (the WGS-84 ontology
for describing geographic data) and a couple of my own URNs
(for the entity and its “code” field).
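
To make the mapping more concrete, here is a hand-written illustration, in an
N3/Turtle-like notation, of the kind of triples a single instance of this entity
would produce (the blank node label and the values are made up, and literal
datatypes are omitted for brevity; the real serialization produced by the example
is shown at the end of the post):

@prefix wgs84: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix geo:   <http://www.tidalwave.it/rdf/geo/2009/02/22#> .

_:aLocation a          geo:location ;   # the type, from the class-level @rdf annotation
             wgs84:lat  "45.0" ;        # latitude field
             wgs84:long "9.0" ;         # longitude field
             wgs84:alt  "0.0" ;         # altitude field
             geo:code   "GE" .          # the custom "code" field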



Now, let's use it. First we have to set up the repository and some
facility classes:

package javaday.example1;

import java.io.File;
import java.util.List;
import org.openrdf.repository.Repository;
import org.openrdf.repository.sail.SailRepository;
import org.openrdf.rio.rdfxml.util.OrganizedRDFXMLWriter;
import org.openrdf.sail.memory.MemoryStore;
import org.openrdf.elmo.ElmoModule;
import org.openrdf.elmo.ElmoQuery;
import org.openrdf.elmo.sesame.SesameManager;
import org.openrdf.elmo.sesame.SesameManagerFactory;

public class Main
  {
    public static void main( String[] args )
      throws Exception
      {
        Repository repository = new SailRepository(new MemoryStore(new File("/tmp/RDFStore")));
        repository.initialize();
        ElmoModule module = new ElmoModule();
        SesameManagerFactory factory = new SesameManagerFactory(module, repository);
        SesameManager em = factory.createElmoManager();



Any annotated entity class must also be registered with the runtime.
Elmo chooses to use the META-INF/services approach: you must create a
file named META-INF/org.openrdf.elmo.concepts which contains the fully
qualified names of all annotated entity classes.
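
For the example above, that file would contain a single line with the fully
qualified name of our only entity class:

javaday.example1.GeoLocation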



Contrast this with the standard JPA approach, which in Java SE applications
requires all the entity classes to be listed in persistence.xml. I much
prefer Elmo's approach, as in modular applications (such as NetBeans
Platform applications) you can't have all the registrations in a single
centralized file. With Elmo you can have multiple META-INF/services
registrations, one per module, and the classloader merges them
together. To get JPA working in the same way, I had to heavily
patch the JPA implementation.



Repository (http://www.openrdf.org/doc/sesame2/api/org/openrdf/repository/Repository.html)
is an OpenSesame interface, and SailRepository
(http://www.openrdf.org/doc/sesame2/api/org/openrdf/repository/sail/SailRepository.html)
is one of its implementations.
I'm instantiating it with a memory implementation which is backed
by a file - this means that I get persistence. In a real case I'd use a
database, but for now I want to keep things simple so you can reproduce the example.
Note that unfortunately only two databases are supported (MySQL and
PostgreSQL).



The remaining three classes - ElmoModule
(http://www.openrdf.org/doc/elmo/1.4/apidocs/org/openrdf/elmo/ElmoModule.html),
SesameManagerFactory
(http://www.openrdf.org/doc/elmo/1.4/apidocs/org/openrdf/elmo/sesame/SesameManagerFactory.html)
and SesameManager
(http://www.openrdf.org/doc/elmo/1.4/apidocs/org/openrdf/elmo/sesame/SesameManager.html) -
are roughly equivalent to JPA's Persistence, EntityManagerFactory and
EntityManager (not one-to-one mappings, of course). SesameManager, in
particular, implements JPA's EntityManager and could be used as a JPA
implementation, even though it is not completely compliant with the
specification. I'm not interested in this approach, as I want my code
to be fully aware that we're using RDF triples rather than ER -
something that I'll talk about in my next posts.



Once you have a SesameManager, things are somewhat similar to JPA: for
instance, I can create and persist two entities with:

em.getTransaction().begin();
GeoLocation genova = new GeoLocation();
genova.setLatitude(45.0);
genova.setLongitude(9.0);
genova.setCode("GE");
GeoLocation milano = new GeoLocation();
milano.setLatitude(46.0);
milano.setLongitude(9.0);
milano.setCode("MI");

em.persist(genova);
em.persist(milano);
em.getTransaction().commit();



Some other operations are not so similar, for instance merge() - but this
topic too deserves a further post.



To conclude my introduction, I'd like to show you an example of a
query. Just as SQL is the query language for ER databases, SPARQL is a well-known
query language for RDF stores (I believe it's the most popular one,
but alternatives exist and OpenRDF supports some of them). Queries are created
in a similar way to JPA, with the ability to specify
“placeholders” in the query body and to assign a value to each
placeholder in code:

em.getTransaction().begin();
String queryString = "PREFIX wgs84: <http://www.w3.org/2003/01/geo/wgs84_pos#>\n" +
                     "SELECT ?location WHERE \n" +
                     "  {\n" +
                     "    ?location a ?type.\n" +
                     "    ?location wgs84:lat ?lat\n" +
                     "  }";
final ElmoQuery query = em.createQuery(queryString).
                           setType("type", GeoLocation.class).
                           setParameter("lat", 45.0);

final List<GeoLocation> result = query.getResultList();
       
for (GeoLocation l : result)
  {
    System.err.println(l);
  }

em.getTransaction().commit();



Placeholders are designated by the question mark - but note that not
everything marked with a question mark is a placeholder whose value
is set from code. ?location is a variable representing any result of the
query, and we can refer to it multiple times to specify various conditions.
The simplest way to specify a condition is to write a triple. I've
used two conditions in the previous query:

  1. The entity must have an assigned latitude (wgs84:lat) whose
    value is 45.0. Note that wgs84: is just a prefix for the whole
    namespace, used only to make queries more readable.
  2. The result must be of type GeoLocation. To be more precise,
    it “must be assignable to” a GeoLocation. The reason for this
    condition, which you wouldn't need in JPA as you'd specify the type
    with a “FROM” clause, is that in an RDF store “Anyone can say Anything
    about Any topic” (the AAA slogan), thus there could be many kinds of
    entities other than GeoLocation in the store that are subjects of
    a wgs84:lat - not filtering them out would cause a ClassCastException in
    the assignment to GeoLocation.

My last statement opens up interesting considerations, which I'll reserve
for my next posts. For now, you should just notice that, thanks to the
greater flexibility of an RDF store, your Java code might need to manage
that flexibility. In other words, while an ER database is usually less
flexible than the code managing it (e.g. there's no inheritance,
polymorphism, etc...) and has a sort of strong type system (one
table usually matches one entity), an RDF store is usually more flexible
than the code managing it, because of the AAA slogan. You needn't be
flexible, of course - you choose to be. In other words, if you're sure
that only GeoLocations have been written to the store, the second
condition might be redundant, as in the simplified query sketched below.
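
Just to illustrate that last point, here is a sketch of the same query with the
type condition dropped (this assumes you really know that only GeoLocation
entities carry a wgs84:lat in the store; otherwise resources of other types could
end up in the result):

String untypedQueryString = "PREFIX wgs84: <http://www.w3.org/2003/01/geo/wgs84_pos#>\n" +
                            "SELECT ?location WHERE \n" +
                            "  {\n" +
                            "    ?location wgs84:lat ?lat\n" +  // the only remaining condition
                            "  }";
final ElmoQuery untypedQuery = em.createQuery(untypedQueryString).setParameter("lat", 45.0);
final List<GeoLocation> untypedResult = untypedQuery.getResultList();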



At last, let's see how to dump the content of the store. An RDF store is
not tied to XML, as many might think: as I've previously said, we don't
even know what the internal representation is. RDF must not be
confused with XML: RDF/XML, unfortunately often referred to as just “RDF”
(thus creating confusion), is only one way of serializing a store.
Other alternatives exist (such as N3).
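
Incidentally, dumping in another syntax is mostly a matter of picking a different
writer. A minimal sketch, assuming Sesame's Rio N3Writer class
(org.openrdf.rio.n3.N3Writer) and the same SesameManager as before, could look
like this:

import org.openrdf.rio.n3.N3Writer;

final N3Writer n3Writer = new N3Writer(System.err);  // writes N3 instead of RDF/XML
em.getConnection().export(n3Writer);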



But since RDF/XML is popular, I'm using it for this first
example:

final OrganizedRDFXMLWriter wr = new OrganizedRDFXMLWriter(System.err);
wr.handleNamespace("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#");
wr.handleNamespace("rdfs", "http://www.w3.org/2000/01/rdf-schema#");
wr.handleNamespace("wgs84", "http://www.w3.org/2003/01/geo/wgs84_pos#");
wr.handleNamespace("forceten", "http://www.tidalwave.it/rdf/geo/2009/02/22#");

em.getConnection().export(wr);
em.close();



Pasting together all the code sketches in my example and running the
application, we get the following output:

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:wgs84="http://www.w3.org/2003/01/geo/wgs84_pos#"
    xmlns:forceten="http://www.tidalwave.it/rdf/geo/2009/02/22#">
<forceten:location rdf:nodeID="node14i99m09gx1">
    <wgs84:lat rdf:datatype="http://www.w3.org/2001/XMLSchema#double">45.0</wgs84:lat>
    <wgs84:long rdf:datatype="http://www.w3.org/2001/XMLSchema#double">9.0</wgs84:long>
    <forceten:code>GE</forceten:code>
</forceten:location>
<forceten:location rdf:nodeID="node14i99m09gx2">
    <wgs84:lat rdf:datatype="http://www.w3.org/2001/XMLSchema#double">46.0</wgs84:lat>
    <wgs84:long rdf:datatype="http://www.w3.org/2001/XMLSchema#double">9.0</wgs84:long>
    <forceten:code>MI</forceten:code>
</forceten:location>
</rdf:RDF>



Note that default, anonymous ids have been assigned to both entities,
as we didn't define any. The remaining stuff should be easily readable.
Note that we have only three statements for each entity: as we didn't
assign any altitude, there are no altitude statements.



If you use Maven, the pom for the example is:



<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>javaday</groupId>
    <artifactId>example1</artifactId>
    <packaging>jar</packaging>
    <version>1.0-SNAPSHOT</version>
    <name>example1</name>
    <url>http://maven.apache.org</url>
    <repositories>
        <repository>
            <releases>
                <enabled>true</enabled>
            </releases>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
            <id>aduna-opensource.releases</id>
            <name>Aduna Open Source - Maven releases</name>
            <url>http://repo.aduna-software.org/maven2/releases</url>
        </repository>
    </repositories>
    <dependencies>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>3.8.1</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.openrdf.elmo</groupId>
            <artifactId>elmo-api</artifactId>
            <version>1.4</version>
        </dependency>
        <dependency>
            <groupId>org.openrdf.elmo</groupId>
            <artifactId>elmo-sesame</artifactId>
            <version>1.4</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-api</artifactId>
            <version>1.5.0</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-simple</artifactId>
            <version>1.5.0</version>
        </dependency>
    </dependencies>
    <build>
        <pluginManagement>
            <plugins>
                <plugin>
                    <artifactId>maven-compiler-plugin</artifactId>
                    <version>2.0.2</version>
                    <configuration>
                        <debug>true</debug>
                        <optimize>true</optimize>
                        <source>1.6</source>
                        <target>1.6</target>
                        <showDeprecation>true</showDeprecation>
                        <showWarnings>true</showWarnings>
                    </configuration>
                </plugin>
            </plugins>
        </pluginManagement>
    </build>
</project>




class="zemanta-pixie-img" alt=""
src="http://img.zemanta.com/pixy.gif?x-id=336b6e16-7a9a-86b2-9c0e-7863b7f91151">



Comments


@tcowan Yes, I know about James Leigh, in fact his blog is in my aggregator. Henry, Harold and James are really semantic {web, stuff} experts, while I'm not! I'm just a software architect that added RDF to his portfolio of technologies.

@bblfish Hi Henry, glad you're following my blog, so if I say something wrong I'll be corrected :-) The introduction of semantic stuff in my projects went slower than expected, mainly because at a certain point of this year I had to seriously improve the management of my projects (see my posts about introducing Mercurial and Maven). Now that I've completed this change for some projects containing RDF, I'm going to blog about that.

Yes, annotations are a very good thing, since they contribute to decoupling from the underlying technology.

Elmo

Fabrizio, Elmo = James Leigh. Harold Carr showed me some nice stuff at Jazoon. You two, along with Henry Story should share notes on semweb stuff. Taylor

so(m)mer and elmo

Hi Fabricio, nice to see things coming along so far.

yes, there is another project that is very similar to Elmo on dev.java.net called So(m)mer that also provides a mapper from Sesame to Java objects. It's not surprising really that Sommer and elmo are similar as we agreed somewhat on the use of the same annotations a few years ago. They have been working a lot more consistently on their code though...

I would not say that rdf/xml is popular, by the way. It is a standard, and as such every implementation of rdf libraries knows how to parse it and generate rdf/xml. But it has caused a lot of trouble in getting rdf adopted. If people want to learn RDF - which is NOT RDF/XML, but a framework for describing resources - they should learn N3, a notation developed by semwebbers to discuss issues on irc and in email. RDF is about semantics, not syntax. So it is not tied to a syntax. XML on the other hand is the other way around: it is a pure syntax, without any pre-established semantics. That is why they can fit so well together.

Btw, for those who are interested in a 10 minute introduction to the semantic web as applied to Social Networks, check out my presentation at the Hackers At Random conference this summer, or my longer presentation at the Free and Open Source Conference. Those of you in the Bay Area should definitely attend the Social Web Camp in Santa Clara on Nov. 2.