
forceTen geographic APIs, a simple example about RDF design

Posted by fabriziogiudici on December 2, 2009 at 12:22 AM PST

forceTen was born as the container of components for rendering geographic
views and representing the related models for the geotagging
capabilities of blueMarine; but it has also been reused in two more
server-side projects, where special focus has been given to the models.



The most trivial feature is the capability of managing accurate
geotags: for instance, no duplicates and no misspellings when entering
the name of a place. These features imply the need to keep a
hierarchical structure of names, in order to disambiguate places that
might have the same name but are in different provinces or countries.



At first, it seemed that all I needed was a geocoding service. I
started with GeoNames and was able to import its hierarchies into my
applications (a GeoCoder is the model behind forceTen's GeoExplorer).


[Figure: forceTen screenshot]



The GeoExplorer is the panel with flags on the left side. You can see it
in action in a screencast
(http://netbeans.dzone.com/videos/screencast-maven-and-netbeans).







The first, trivial, issue was that the application needed a permanent
connection to the GeoNames web service in order to operate. This was
easily solved by caching queries, a trivial task with a REST web service.
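Caching a REST service is easy because each query is fully described by its URL, so the URL can serve as the cache key. A minimal sketch, assuming a hypothetical QueryCache class with an injected fetch function (this is not forceTen's actual cache; a real one would also persist entries to disk to support offline operation):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical in-memory cache of REST query results, keyed by the full request URL. */
final class QueryCache {
    private final Map<String, String> responses = new HashMap<>();
    private final Function<String, String> fetcher; // performs the real HTTP GET (injected)
    private int remoteCalls = 0;

    QueryCache(Function<String, String> fetcher) {
        this.fetcher = fetcher;
    }

    /** Returns the cached response, calling the remote service only on a cache miss. */
    synchronized String get(String url) {
        String cached = responses.get(url);
        if (cached == null) {
            cached = fetcher.apply(url);
            remoteCalls++;
            responses.put(url, cached);
        }
        return cached;
    }

    /** Number of requests that actually went to the remote service. */
    synchronized int remoteCalls() {
        return remoteCalls;
    }
}
```

Since a GET on a REST resource is expected to be safe and repeatable, serving a stored response is equivalent to re-issuing the query, as long as the remote data hasn't changed.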



The second, less trivial, issue was a certain lack of coherence in
GeoNames data: for instance, looking at how Italian regions are named,
you discover that some are named in plain Italian (e.g. “Liguria”),
others have the “Regione” prefix (e.g. “Regione Lombardia”), and others
have the English name (e.g. “Tuscany”). GeoNames also provides
localized names, and with some work this part can be normalized; but in
the end, at least one customer scenario required the capability to
customize the display name of an entity.
This introduced for the first time the idea that you can't just import
an external database: you need at least some overriding information in
local storage, bound to the external data.
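One way to model such overriding information is an overlay: local values keyed by the external entity's id and consulted before the imported data. A minimal sketch with hypothetical names (DisplayNames and its methods are illustrative, not forceTen's API):

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical overlay of locally customized display names over labels imported from a geocoder. */
final class DisplayNames {
    private final Map<String, String> imported = new HashMap<>();  // external id -> label from the geocoder
    private final Map<String, String> overrides = new HashMap<>(); // external id -> local customization

    /** Stores (or refreshes) the label coming from the external service. */
    void importLabel(String externalId, String label) {
        imported.put(externalId, label);
    }

    /** Records a local customization bound to the external entity. */
    void override(String externalId, String label) {
        overrides.put(externalId, label);
    }

    /** Local overrides win; otherwise the imported label (null if the id is unknown). */
    String displayName(String externalId) {
        return overrides.getOrDefault(externalId, imported.get(externalId));
    }
}
```

Because overrides are keyed by the external id rather than stored inside the imported record, refreshing the imported labels does not discard local customizations.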



Another issue was the capability to add more locations than those
present in GeoNames. While the list of places down to the municipality
level seems to be pretty complete, there are other named geo entities
around: for instance, the name of a mountain or the mouth of a river;
and inside a town, a customer needs to geotag at the building level and
even finer (e.g. the “Galleria degli Uffizi” museum in Florence and the
“Sala del Caravaggio”, a specific room inside it). This introduced the
idea that local data aren't just plain properties attached to
GeoNames data, but a whole new hierarchy.



The last problem appeared late, but it should have been clear from the
beginning. If you are creating a long-lived geotag archive, you must
consider that geopolitics is mutable. For instance, in the last ten
years many new provinces have been introduced in Italy, meaning that
some subtrees of the hierarchy have been re-parented. If you keep the
GeoNames cache forever, you'll never see these updates; if you expire it
periodically, you have to re-parent your own data structure, which is
an annoyance.



One more thing: I didn't have any explicit request to change the
underlying geocoding service (e.g. to use Yahoo! in place of GeoNames),
but it might happen. So a well-designed, reusable component library
should be able to work with multiple geocoders, maybe at the same time.



This led me to the design of the new forceTen APIs. The idea is to keep
two separate concepts:

  • the GeoLocation that you use to tag your data;
  • the GeoCoder data (just to give names, I call each node in
    a GeoCoder hierarchy a GeoCoderEntity).

GeoLocations are under your control: you can create and destroy them,
give them names, bind them to your data, and possibly create hierarchies
where you explicitly need them (as in the example of the Galleria degli
Uffizi). The GeoCoder assists you in finding the initial names, the
coordinates and other attributes, and provides the whole hierarchy
structure; the important point is that GeoCoder data is “attached” to
your GeoLocations, which reverses the original idea.


[Figure: GeoLocations bound to the GeoCoderEntity hierarchy]

The information about which node is the parent of “Firenze” lives in
the GeoCoderEntity hierarchy, not in the GeoLocations.



This means that you can strip the GeoCoder data at any time and
preserve your GeoLocations (and the information bound to them) as they
are, and later re-attach GeoCoder data to GeoLocations, for instance by
matching coordinates. Of course, this also means that you can attach
data from a different GeoCoder, and if the GeoCoder hierarchy has
changed, your GeoLocations are not affected. All GeoLocations are
persisted locally, while only the GeoCoderEntities strictly needed
behind them are persisted locally as a cache, for performance reasons
and to support offline operation. At this point the GeoCoder becomes an
implementation detail, which I hide behind a more abstract “GeoSchema”
concept.
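Re-attaching by coordinates can be as simple as binding each GeoLocation to the nearest entity of the (possibly different) GeoCoder, within a tolerance. A minimal sketch, assuming hypothetical CoderEntity and TaggedLocation records and great-circle (haversine) distance; the post doesn't describe forceTen's actual matching strategy:

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical geocoder entity: external id plus coordinates in decimal degrees. */
record CoderEntity(String id, double lat, double lon) {}

/** Hypothetical local location; its coordinates are the key used to re-attach geocoder data. */
record TaggedLocation(String id, double lat, double lon) {}

final class Reattacher {
    private static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius

    /** Great-circle distance in kilometers (haversine formula). */
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.pow(Math.sin(dLat / 2), 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.pow(Math.sin(dLon / 2), 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    /** Returns the candidate nearest to the location within maxKm, or empty if none qualifies. */
    static Optional<CoderEntity> match(TaggedLocation location, List<CoderEntity> candidates, double maxKm) {
        CoderEntity best = null;
        double bestKm = maxKm;
        for (CoderEntity e : candidates) {
            double d = distanceKm(location.lat(), location.lon(), e.lat(), e.lon());
            if (d <= bestKm) {
                best = e;
                bestKm = d;
            }
        }
        return Optional.ofNullable(best);
    }
}
```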



Let's look at some RDF. Suppose I need to geotag Polanesi, a small
village on the eastern Riviera. The GeoNames hierarchy is: Earth /
Europe / Italy / Regione Liguria / Provincia di Genova / Recco, to which
I add / Polanesi as a custom leaf (it is not in the GeoNames database).



First I define a GeoSchema:



    GeoSchemaManager schemaManager = GeoSchemaManager.Locator.findSchemaManager();
    GeoSchema geoSchema = schemaManager.findSchemaByName("GeoNames");


At the first call, the above code stores the following statements into
the local RDF store:



    <?xml version="1.0" encoding="UTF-8"?>
    <rdf:RDF
            xmlns:geo="http://www.tidalwave.it/rdf/geo/2009/02/22#"
            xmlns:wgs84_pos="http://www.w3.org/2003/01/geo/wgs84_pos#"
            xmlns:skos="http://www.w3.org/2004/02/skos/core#"
            xmlns:owl="http://www.w3.org/2002/07/owl#"
            xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
            xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">

        <geo:schema rdf:about="http://www.geonames.org">
            <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#ConceptScheme"/>
            <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
            <skos:prefLabel>GeoNames</skos:prefLabel>
        </geo:schema>

    </rdf:RDF>

Remember that RDF is an abstract model, not necessarily tied to XML: in
an RDF store, data are persisted in some implementation-dependent
format. The RDF/XML I'm using in this example is what I see when I dump
the repository to a file.
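Since RDF is a set of statements rather than an XML document, it helps to see the same data in N-Triples, a line-based serialization with one subject, predicate, object statement per line. Note that the typed element <geo:schema> is itself shorthand for an rdf:type statement, so the dump above holds four statements, not three. A small sketch that builds them with plain strings (no RDF library assumed; the class name is hypothetical):

```java
import java.util.List;

/** Writes the GeoSchema statements from the RDF/XML dump as N-Triples, one statement per line. */
final class SchemaTriples {
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
    static final String GEO = "http://www.tidalwave.it/rdf/geo/2009/02/22#";
    static final String SKOS = "http://www.w3.org/2004/02/skos/core#";
    static final String OWL = "http://www.w3.org/2002/07/owl#";

    static String iri(String value) {
        return "<" + value + ">";
    }

    /** Label escaping is omitted for brevity: the label must not contain quotes or backslashes. */
    static List<String> triples(String schemaId, String label) {
        String s = iri(schemaId);
        return List.of(
                s + " " + iri(RDF_TYPE) + " " + iri(GEO + "schema") + " .", // implied by the <geo:schema> element
                s + " " + iri(RDF_TYPE) + " " + iri(SKOS + "ConceptScheme") + " .",
                s + " " + iri(RDF_TYPE) + " " + iri(OWL + "Thing") + " .",
                s + " " + iri(SKOS + "prefLabel") + " \"" + label + "\" .");
    }

    public static void main(String[] args) {
        triples("http://www.geonames.org", "GeoNames").forEach(System.out::println);
    }
}
```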



Let's see where these statements originated from:

  • Since GeoNames is a web service, I decided to use its URL
    as the id of the entity that represents it.
  • If you read my previous post
    (http://weblogs.java.net/blog/fabriziogiudici/archive/2009/11/29/using-standard-ontologies),
    where I introduced SKOS, you know that SKOS is a reusable ontology
    (i.e. a set of standard definitions) that, among other things, is
    well suited to represent hierarchies of concepts. SKOS's
    ConceptScheme class can be used to identify such a hierarchy. SKOS
    also defines a way to attach labels to concepts: skos:prefLabel
    (which we can consider equivalent to a “display name”).
  • You also learned that OWL is another reusable ontology that
    provides the concept of “semantic equivalence” (owl:sameAs) and works
    with a base concept named Thing: you'll see later why I need it.



Now I can get to Recco (the lowest-level GeoNames node in the path) in
two ways: by navigating the hierarchy, or by querying by coordinates
(for instance, because I clicked a pixel on a map). In the former case,
my code is:



    GeoLocation earth = geoSchema.getRoot();
    GeoLocation europe = earth.findChildren(geoSchema).name("Europe").result();
    GeoLocation italy = europe.findChildren(geoSchema).name("Italy").result();
    GeoLocation liguria = italy.findChildren(geoSchema).name("Liguria").result();
    GeoLocation provinciaDiGenova = liguria.findChildren(geoSchema).name("Genoa").result();
    GeoLocation recco = provinciaDiGenova.findChildren(geoSchema).name("Recco").result();




or, as a quicker alternative:



    GeoLocation earth = geoSchema.getRoot();
    GeoLocation recco = earth.findChildren(geoSchema).path("/Europe/Italy/Liguria/Genoa/Recco").result();
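A path lookup like this can be thought of as repeated name lookups, one per segment. A minimal sketch of that idea over a hypothetical in-memory tree (Node and findPath are illustrative, not forceTen's implementation, which presumably resolves each step through the GeoSchema):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical tree node standing in for a location with named children. */
final class Node {
    final String name;
    final List<Node> children = new ArrayList<>();

    Node(String name) {
        this.name = name;
    }

    /** Adds a child and returns it, so that a linear chain can be built fluently. */
    Node add(String childName) {
        Node child = new Node(childName);
        children.add(child);
        return child;
    }

    /** Resolves a path like "/Europe/Italy" by matching one child name per segment. */
    Optional<Node> findPath(String path) {
        Node current = this;
        for (String segment : path.split("/")) {
            if (segment.isEmpty()) {
                continue; // skip the empty segment produced by the leading "/"
            }
            Optional<Node> next = current.children.stream()
                    .filter(c -> c.name.equals(segment))
                    .findFirst();
            if (next.isEmpty()) {
                return Optional.empty(); // no child with that name: the path doesn't exist
            }
            current = next.get();
        }
        return Optional.of(current);
    }
}
```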



In the latter case, the code is:



    List<GeoLocation> results = geoSchema.findLocations().closeTo(new Coordinate(44.363244, 9.137166), km(1)).results();
    GeoLocation recco = results.get(0);
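Conceptually, a closeTo query keeps the candidates within the given radius and orders them by distance, so that results.get(0) is the nearest match. A sketch of that filtering step, assuming a hypothetical Place record and haversine distance; in forceTen the query is presumably answered by the GeoCoder or the local cache rather than by an in-memory list:

```java
import java.util.Comparator;
import java.util.List;

/** Hypothetical place with coordinates in decimal degrees. */
record Place(String name, double lat, double lon) {}

final class Proximity {
    /** Great-circle distance in kilometers (haversine, mean Earth radius 6371 km). */
    static double km(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * 6371.0 * Math.asin(Math.sqrt(a));
    }

    /** Places within radiusKm of (lat, lon), nearest first. */
    static List<Place> closeTo(List<Place> places, double lat, double lon, double radiusKm) {
        return places.stream()
                .filter(p -> km(lat, lon, p.lat(), p.lon()) <= radiusKm)
                .sorted(Comparator.comparingDouble(p -> km(lat, lon, p.lat(), p.lon())))
                .toList();
    }
}
```

With the coordinates used in this post, Polanesi lies roughly 1.7 km from Recco's reference point, so a 1 km radius returns only Recco while a 2 km radius returns both, Recco first.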




The data from the GeoCoder get imported into the local RDF repository
as:



    <geo:entity rdf:about="http://sws.geonames.org/6540563/">
        <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
        <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
        <skos:prefLabel>Recco</skos:prefLabel>
        <wgs84_pos:lat rdf:datatype="http://www.w3.org/2001/XMLSchema#double">44.363244</wgs84_pos:lat>
        <wgs84_pos:long rdf:datatype="http://www.w3.org/2001/XMLSchema#double">9.137166</wgs84_pos:long>
        <wgs84_pos:alt rdf:datatype="http://www.w3.org/2001/XMLSchema#double">0.0</wgs84_pos:alt>
        <geo:type>ADM3</geo:type>
        <skos:inScheme rdf:resource="http://www.geonames.org"/>
        <skos:broader rdf:resource="http://sws.geonames.org/3176217/"/>
    </geo:entity>



As before, let's see where these statements originated from:

  • I used the same id (http://sws.geonames.org/6540563/) as the one
    defined by GeoNames, also because it's a real URL: by connecting to
    it you can download some further RDF assertions about this data
    item. More about this later.
  • You already know what a Thing, a Concept and a skos:prefLabel are.
    What's new here are the three wgs84_pos:* statements: they are part
    of another standard ontology (Basic Geo, http://www.w3.org/2003/01/geo)
    that allows working with some geographic concepts.
  • geo:type is a custom statement from forceTen and stores an attribute
    coming from GeoNames, describing the level of the tree where the node
    is located: ADM3 means “administrative level 3”, which for Italy is
    a municipality.
  • skos:inScheme is another new thing, and means that this node is part
    of the GeoNames ConceptScheme that I previously defined (note the
    matching id, http://www.geonames.org).
  • Finally, you should know from my previous post that skos:broader is
    a way to say that this node is a child of an upper-level node.
What's http://sws.geonames.org/3176217/?



    <geo:entity rdf:about="http://sws.geonames.org/3176217/">
        <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
        <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
        <skos:prefLabel>Genoa</skos:prefLabel>
        <wgs84_pos:lat rdf:datatype="http://www.w3.org/2001/XMLSchema#double">44.5</wgs84_pos:lat>
        <wgs84_pos:long rdf:datatype="http://www.w3.org/2001/XMLSchema#double">9.0666667</wgs84_pos:long>
        <wgs84_pos:alt rdf:datatype="http://www.w3.org/2001/XMLSchema#double">0.0</wgs84_pos:alt>
        <geo:code>GE</geo:code>
        <geo:type>ADM2</geo:type>
        <skos:inScheme rdf:resource="http://www.geonames.org"/>
        <skos:broader rdf:resource="http://sws.geonames.org/3174725/"/>
        <skos:narrower rdf:resource="http://sws.geonames.org/6540563/"/>
    </geo:entity>



It's the data for the Province of Genoa. We could recursively track
parents until we get to the root, which represents the Earth.
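Tracking parents amounts to following skos:broader links until a node has none. A minimal sketch over a hypothetical in-memory map from entity id to broader id, standing in for the cached statements; a visited set guards against cycles in bad data:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class Ancestry {
    /** Follows child -> broader links from start up to the root; returns the visited ids, start first. */
    static List<String> pathToRoot(Map<String, String> broader, String start) {
        List<String> path = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (String id = start; id != null; id = broader.get(id)) {
            if (!seen.add(id)) {
                throw new IllegalStateException("cycle in broader links at " + id);
            }
            path.add(id);
        }
        return path;
    }
}
```

In the data above, the first two hops would be http://sws.geonames.org/6540563/ (Recco) to http://sws.geonames.org/3176217/ (Province of Genoa) to http://sws.geonames.org/3174725/; the last element of the returned list is the root.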



So, the GeoCoder data needed to back our GeoLocations is now part of
our local repository. The next time we refer to the same
GeoCoderEntities, forceTen won't query the remote web service again; it
will use the data in the local repository.



Now, let's look at what's happening with GeoLocations:



    <geo:location rdf:about="urn:tidalwave:geo/location#7f3a0f10-de68-11de-8523-002332c672e6">
        <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
        <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
        <owl:sameAs rdf:resource="http://sws.geonames.org/6540563/"/>
        <skos:prefLabel>Recco</skos:prefLabel>
    </geo:location>



This is the GeoLocation representing Recco. It has its own id
(urn:tidalwave:geo/location#7f3a0f10-de68-11de-8523-002332c672e6, built
from my own prefix plus a UUID, see
http://en.wikipedia.org/wiki/Universally_Unique_Identifier; IMHO UUIDs
are the most convenient way to generate internal ids) and its own
skos:prefLabel: it's “Recco”, but I could change it later. The most
important thing is the owl:sameAs statement, which makes it semantically
equivalent to http://sws.geonames.org/6540563/, the way GeoNames
represents Recco. This binds my GeoLocation to the GeoNames hierarchy.



If I call:



    GeoLocation provinceOfGenoa = recco.findParent(geoSchema);




I will get the GeoLocation representing the Province of Genoa: forceTen
navigated the hierarchy by looking at the (cached) GeoNames data. Note
that to find a parent you need to specify a GeoSchema: it's the schema
that defines a hierarchy (and multiple GeoSchemata could define
different hierarchies). If, hypothetically, Recco were moved to another
province, I could just strip the cache of GeoNames data and retrieve
fresh data to get the update (assuming, of course, that GeoNames plays
fair and doesn't change the ids of existing entities, which that
service guarantees).
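Conceptually, findParent takes three hops: from the GeoLocation follow owl:sameAs to its GeoCoderEntity, take that entity's skos:broader, then look up the GeoLocation that is owl:sameAs the broader entity, creating one if none exists yet. A minimal sketch under those assumptions, with hypothetical in-memory maps standing in for the RDF store (whether forceTen creates parent GeoLocations lazily like this is an assumption):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

/** Hypothetical in-memory stand-in for the statements in the local RDF store. */
final class LocalStore {
    final Map<String, String> sameAs = new HashMap<>();        // GeoLocation id -> GeoCoderEntity id
    final Map<String, String> entityBroader = new HashMap<>(); // GeoCoderEntity id -> parent entity id (the cache)

    /** Returns the GeoLocation bound to the entity, creating one with a UUID-based id if needed. */
    String locationFor(String entityId) {
        for (Map.Entry<String, String> e : sameAs.entrySet()) {
            if (e.getValue().equals(entityId)) {
                return e.getKey();
            }
        }
        String id = "urn:tidalwave:geo/location#" + UUID.randomUUID();
        sameAs.put(id, entityId);
        return id;
    }

    /** Parent of a GeoLocation, navigated through the geocoder hierarchy rather than stored on locations. */
    Optional<String> findParent(String locationId) {
        String entity = sameAs.get(locationId);
        if (entity == null) {
            return Optional.empty(); // purely local node: would need its own broader link, not modeled here
        }
        String parentEntity = entityBroader.get(entity);
        return parentEntity == null ? Optional.empty() : Optional.of(locationFor(parentEntity));
    }
}
```

Since the hierarchy lives only in entityBroader, clearing and re-fetching that map (the cache) changes what findParent returns without touching any GeoLocation.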



Now I can create Polanesi:



    GeoLocation polanesi = recco.createChild()
                                .name(Locale.ITALIAN, "Polanesi")
                                .code("XYZ")
                                .coordinate(new Coordinate(44.36667, 9.11667))
                                .build();



which appears in the RDF repository as:



    <geo:location rdf:about="urn:tidalwave:geo/location#7f7f2e60-de68-11de-8523-002332c672e6">
        <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
        <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
        <skos:prefLabel xml:lang="en">Polanesi</skos:prefLabel>
        <wgs84_pos:lat rdf:datatype="http://www.w3.org/2001/XMLSchema#double">44.36667</wgs84_pos:lat>
        <wgs84_pos:long rdf:datatype="http://www.w3.org/2001/XMLSchema#double">9.11667</wgs84_pos:long>
        <wgs84_pos:alt rdf:datatype="http://www.w3.org/2001/XMLSchema#double">0.0</wgs84_pos:alt>
        <geo:code>XYZ</geo:code>
        <skos:prefLabel>Polanesi</skos:prefLabel>
        <skos:broader rdf:resource="urn:tidalwave:geo/location#7f3a0f10-de68-11de-8523-002332c672e6"/>
    </geo:location>



Note the skos:broader statement, which makes it a child of Recco;
furthermore, there is no owl:sameAs statement, as there's no equivalent
in GeoNames. This is a piece of hierarchy that I'm maintaining on my
own, and it doesn't rely on any external GeoCoder data. That's also why,
in this case, all the attributes such as the coordinates or the code are
stored within the GeoLocation, whereas for Recco they were inferred
from the equivalent GeoCoderEntity.
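In other words, attribute lookup has two sources: a GeoLocation with an owl:sameAs binding reads its coordinates from the bound GeoCoderEntity, while a purely local one reads its own statements. A minimal sketch, assuming hypothetical records and that local values, when present, take precedence (the post does not say which wins if both exist):

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical coordinate pair in decimal degrees. */
record LatLon(double lat, double lon) {}

/** Hypothetical GeoLocation: optional sameAs binding and optional locally stored coordinates. */
record LocationData(String id, Optional<String> sameAs, Optional<LatLon> localCoordinates) {}

final class Attributes {
    /** Local values first; otherwise the coordinates of the equivalent GeoCoderEntity, if known. */
    static Optional<LatLon> coordinates(LocationData loc, Map<String, LatLon> entityCoordinates) {
        if (loc.localCoordinates().isPresent()) {
            return loc.localCoordinates();
        }
        // map() turns a missing cache entry (null) into Optional.empty()
        return loc.sameAs().map(entityCoordinates::get);
    }
}
```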



The final word is about URLs that can be dereferenced, which makes them
good candidates for ids. I previously said that
http://sws.geonames.org/6540563/ is a real URL, as it references a real
document (not all URL-shaped strings in a semantic database do). If you
point your browser to it, you'll see an HTML fact sheet. The most
interesting thing happens when you explicitly ask for an RDF document,
for instance with the curl command (see
http://dowhatimean.net/2007/02/debugging-semantic-web-sites-with-curl),
which makes it possible to specify the MIME type of the requested
document:



    % curl -I -H "Accept: application/rdf+xml" http://sws.geonames.org/6540563/
    HTTP/1.1 303 See Other
    Date: Tue, 01 Dec 2009 12:17:02 GMT
    Server: Apache/2.2.10 (Linux/SUSE)
    Location: http://sws.geonames.org/6540563/about.rdf
    Vary: Accept-Encoding
    Content-Type: text/html; charset=iso-8859-1






Note the “303 See Other” status and the “Location” header: the server is
pointing us to the RDF version of the resource. Let's follow the
suggestion:



    % curl http://sws.geonames.org/6540563/about.rdf
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <rdf:RDF xmlns="http://www.geonames.org/ontology#"
             xmlns:foaf="http://xmlns.com/foaf/0.1/"
             xmlns:owl="http://www.w3.org/2002/07/owl#"
             xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:wgs84_pos="http://www.w3.org/2003/01/geo/wgs84_pos#">
        <Feature rdf:about="http://sws.geonames.org/6540563/">
            <name>Recco</name>
            <featureClass rdf:resource="http://www.geonames.org/ontology#A"/>
            <featureCode rdf:resource="http://www.geonames.org/ontology#A.ADM3"/>
            <inCountry rdf:resource="http://www.geonames.org/countries/#IT"/>
            <wgs84_pos:lat>44.363244</wgs84_pos:lat>
            <wgs84_pos:long>9.137166</wgs84_pos:long>
            <parentFeature rdf:resource="http://sws.geonames.org/3176217/"/>
            <childrenFeatures rdf:resource="http://sws.geonames.org/6540563/contains.rdf"/>
            <locationMap rdf:resource="http://www.geonames.org/6540563/recco.html"/>
        </Feature>
    </rdf:RDF>



Here you can see some further RDF assertions about Recco, provided
directly by GeoNames. In this specific case they don't add much about
Recco, but they could (e.g. the population or other facts). This is a
good example of how a distributed repository of information is built. If
I export my RDF, thanks to the use of the real GeoNames URL for
identifying Recco, other people can perform aggregate queries. The same
happens if they follow my scheme and use the same URL inside their own
databases.
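The two curl steps above, asking for RDF and then following the 303 redirect to the Location, can also be done programmatically. A minimal sketch using the JDK's java.net.http.HttpClient (Java 11+), with hypothetical class and method names; note that it sends GET where the first curl used -I (HEAD), and that the server's behavior today may differ from the 2009 session shown above:

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Sketch of the curl session: ask for RDF/XML, then follow a 303 See Other by hand. */
final class RdfFetcher {
    /** A GET that asks for RDF/XML, like curl -H "Accept: application/rdf+xml". */
    static HttpRequest rdfRequest(URI uri) {
        return HttpRequest.newBuilder(uri)
                .header("Accept", "application/rdf+xml")
                .GET()
                .build();
    }

    /** Fetches the RDF document describing a resource, following at most one 303 redirect. */
    static String fetchRdf(URI resource) throws IOException, InterruptedException {
        // Redirects are disabled so that the 303 and its Location header stay visible, as with curl -I.
        HttpClient client = HttpClient.newBuilder().followRedirects(HttpClient.Redirect.NEVER).build();
        HttpResponse<String> first = client.send(rdfRequest(resource), HttpResponse.BodyHandlers.ofString());
        if (first.statusCode() != 303) {
            return first.body(); // the server answered directly
        }
        URI target = resource.resolve(first.headers().firstValue("Location").orElseThrow());
        return client.send(rdfRequest(target), HttpResponse.BodyHandlers.ofString()).body();
    }
}
```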


Advanced topic: since the above document is RDF/XML, I could have just
imported it and embedded it in my store as is. Why didn't I do that, and
instead write some specific code to convert it to my geo:entity?
Because GeoNames doesn't use SKOS to map the hierarchy, but its own
ontology with parentFeature and childrenFeatures.
If they used SKOS, I would have saved some work.
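That conversion essentially renames predicates. A minimal sketch over plain subject, predicate, object triples, assuming a hypothetical Triple record (this is not forceTen's code): the GeoNames parentFeature becomes skos:broader, name becomes skos:prefLabel, the wgs84_pos coordinates pass through unchanged, and everything else is dropped. childrenFeatures is dropped on purpose: in the document above it points to a separate contains.rdf document rather than to the individual children, so it has no direct skos:narrower counterpart.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical RDF statement with IRIs (or literal text) as plain strings. */
record Triple(String subject, String predicate, String object) {}

final class GeoNamesToSkos {
    static final String GN = "http://www.geonames.org/ontology#";
    static final String SKOS = "http://www.w3.org/2004/02/skos/core#";
    static final String WGS84 = "http://www.w3.org/2003/01/geo/wgs84_pos#";

    /** GeoNames predicates with a counterpart in the local model; anything not listed is dropped. */
    static final Map<String, String> PREDICATES = Map.of(
            GN + "parentFeature", SKOS + "broader",
            GN + "name", SKOS + "prefLabel",
            WGS84 + "lat", WGS84 + "lat",    // already a standard vocabulary: kept as is
            WGS84 + "long", WGS84 + "long");

    /** Rewrites mapped predicates, preserving input order, and drops the unmapped ones. */
    static List<Triple> convert(List<Triple> geoNames) {
        return geoNames.stream()
                .flatMap(t -> Optional.ofNullable(PREDICATES.get(t.predicate()))
                        .map(p -> new Triple(t.subject(), p, t.object()))
                        .stream())
                .toList();
    }
}
```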

Even more advanced topic: I *think* it could be possible to declare a
semantic equivalence between skos:broader and parentFeature (I have
more doubts about skos:narrower and childrenFeatures). If that were
possible, I could really save code and import the GeoNames RDF as is.
But it's too advanced for my current semantic skills.



I must point out that while this distributed aspect is the true nature
of the Semantic Web, it's just one more thing you can do with it. I
find RDF exciting enough just as a local database, thanks to its greater
flexibility compared with a rigid SQL schema.







PS I must confess I lied: Polanesi is indeed in the GeoNames database.
It must have been added after I wrote the code quoted here (which comes
from actual forceTen tests). The example, of course, is still valid if
you think of Polanesi as any other location not present in the GeoNames
database.


