The Great Database In The Sky

Posted by davidvc on November 13, 2006 at 5:34 PM PST

On Thursday morning at the Web 2.0 Summit, Mårten Mickos of MySQL
talked about "The Great Database In The Sky."
His vision: open source structured data.
Today you can search unstructured data through Google, but there is no
open access to the world's structured data. He had an example:

SELECT CurrentWindDirection, CurrentWindSpeed
FROM AllTheworldsWeatherStations, MyOwnWeatherStation, MyFriendsWeatherStation
WHERE ...;

Now, the question arises: how does this differ from the Semantic Web? Great question, and someone asked him that. I may be misquoting him, as I wasn't able to write down his answer, but I am pretty sure he said that this could be thought of as a subset of the Semantic Web, because the Semantic Web can put structure around unstructured data (such as a collection of URIs). In my mind, the awkward silence that followed was because many of us were thinking "well, if it's a subset of the Semantic Web, hasn't the problem already been solved?"

As I understand it, the Semantic Web is placing structure and semantics around Internet resources, URIs, whereas relational schemas place structure and semantics around tables. But isn't a table conceptually the same as a URI? I know that the REST model argues that every concept or "object" in an application domain should have its own resource, its own URI. So if you apply this rule, isn't there a mapping between relational model concepts and the Semantic Web?
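To make that mapping concrete, here is a minimal sketch of one way a relational row could be turned into Semantic Web style (subject, predicate, object) triples. The base URI, the table name "stations", the key column and the sample row are all made up for illustration; this is not something shown in the talk, and not the only possible mapping.

```python
# Hypothetical base URI; an assumption for illustration only.
BASE = "http://example.org/weather"

def row_to_triples(table, key_column, row):
    """Map one relational row to (subject, predicate, object) triples.

    The row becomes a resource with its own URI (built from the table name
    and primary key), and each remaining column becomes a predicate.
    """
    subject = f"{BASE}/{table}/{row[key_column]}"
    return [
        (subject, f"{BASE}/{table}#{column}", value)
        for column, value in row.items()
        if column != key_column
    ]

# Made-up sample row, reusing the column names from the query above.
row = {"station_id": "KSFO", "CurrentWindDirection": "NW", "CurrentWindSpeed": 12}
triples = row_to_triples("stations", "station_id", row)
for triple in triples:
    print(triple)
```

In this sketch each row gets a URI, which is roughly the REST rule of "one resource per object" applied to a table.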

A quick Google shows that Tim Berners-Lee said something very much like this in 1998. So, there is already a web-based model out there that seems to handle relational data. So why haven't we seen The Great Database In The Sky?

In my opinion, the issue is the same one that has dogged large companies trying to integrate the acquisitions they have made, or companies trying to share data: data integration. Nobody calls the same thing by the same name - you say tomayto, I say tomahto, a rose by any other name would smell as sweet... If you look at the Semantic Web, it allows for global schemas that span multiple domains, but doesn't require them, and its proponents tend to recommend against "boil the ocean" attempts to unify schemas. This is a hard problem.
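Here is a small sketch of what that naming problem looks like in practice, using Python's built-in sqlite3 module. The two table names come from the example query; the column names, units and sample values are my own assumptions. Each station describes the same two facts differently, so someone has to write the mapping by hand before a single query can span both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two stations, two private schemas (column names and units are assumed).
conn.executescript("""
    CREATE TABLE MyOwnWeatherStation (wind_dir TEXT, wind_mph REAL);
    CREATE TABLE MyFriendsWeatherStation (direction TEXT, speed_kmh REAL);
    INSERT INTO MyOwnWeatherStation VALUES ('NW', 10.0);
    INSERT INTO MyFriendsWeatherStation VALUES ('SE', 16.0);
""")

# The hand-written mapping: rename columns and convert miles/h to km/h
# so both sources fit one agreed-upon shape.
rows = conn.execute("""
    SELECT 'mine' AS source,
           wind_dir AS CurrentWindDirection,
           wind_mph * 1.609344 AS CurrentWindSpeedKmh
    FROM MyOwnWeatherStation
    UNION ALL
    SELECT 'friend', direction, speed_kmh
    FROM MyFriendsWeatherStation
    ORDER BY source
""").fetchall()

for row in rows:
    print(row)
```

Two tables need one mapping. Every additional station with its own naming needs another one, which is why a single global schema gets hard quickly.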

I can envision committees or open source communities coming together to try to define common schemas. And where the value is there, perhaps this will happen. But enabling global search and update across relational/structured data is just not as easy as what we're seeing with unstructured search (Google) and folksonomies like Flickr and YouTube. With "real" structured data, you have to define your structure ahead of time, and that's a problem, because things are always changing in the World Wild Web. It reminds me of a Dilbert I saw where Dilbert is presenting an immensely complicated flow chart, and he says, "and here we are having this meeting." Somebody says "I have a question" and Dilbert says "oh, now I have to rework everything."