
The Web as a Database?

Posted by davidvc on February 14, 2007 at 8:39 PM PST

Alex Iskold talks about how Yahoo! Pipes enables us to use the web as a database, showing the similarities between structured queries over the relational model and structured queries over RSS/Atom feeds.

The Semantic Web folks have a very similar vision, but the power of Yahoo! Pipes is that it takes advantage of an existing web standard (RSS/Atom) and does not try to impose yet another meta-model on top of what already exists. In this way it has the same advantage that REST does over WS-*. And of course you sacrifice exactness and completeness with this approach, but it's simple and it works.

I played with Yahoo! Pipes. I think it's a great vision and a great model. My own particular pipe, an aggregation of references to me in blogs and elsewhere, had no end of difficulties. I tried to filter out my own blogs, but they were still almost all I saw. I tried to filter out other members of my family, with no luck. I tried to order by date, and the items came back in random order. Other people are much more successful, so I'm willing to chalk this up to my own "not getting it." But to me it's a sign that Pipes doesn't have the same approachability as throwing together a web page in Dreamweaver.

But let's put that aside for now, and look at the vision. It's a very cool vision. Feeds in, feeds out, and then string them together, applying various operators and a little bit of looping and flow control, doing it all visually. If you've ever looked at a query tree, you'll recognize a very similar model. Tuples in, tuples out, and various relational operators being applied at each node, either joining or sorting or filtering.
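
To make the query-tree comparison concrete, here is a rough sketch in Python of "feeds in, feeds out" operators strung together. None of this is the Yahoo! Pipes API; the names `union`, `filter_out`, `sort_by` and `pipe` are made up for illustration, and feed items are assumed to be plain dictionaries.

```python
from typing import Callable, Dict, Iterable, Iterator, List

# Assumption: a feed item is just a dict of string fields.
Item = Dict[str, str]
Operator = Callable[[Iterable[Item]], Iterator[Item]]


def union(*feeds: Iterable[Item]) -> Iterator[Item]:
    """Concatenate several feeds into one (like UNION ALL)."""
    for feed in feeds:
        yield from feed


def filter_out(field: str, text: str) -> Operator:
    """Drop items whose field contains text (like a WHERE clause)."""
    def op(items: Iterable[Item]) -> Iterator[Item]:
        return (i for i in items if text.lower() not in i.get(field, "").lower())
    return op


def sort_by(field: str, reverse: bool = False) -> Operator:
    """Order items by a field (like ORDER BY)."""
    def op(items: Iterable[Item]) -> Iterator[Item]:
        return iter(sorted(items, key=lambda i: i.get(field, ""), reverse=reverse))
    return op


def pipe(source: Iterable[Item], *operators: Operator) -> List[Item]:
    """Feed the output of each operator into the next one."""
    for op in operators:
        source = op(source)
    return list(source)


# Hypothetical sample data standing in for two real feeds.
blogs = [
    {"title": "A", "author": "davidvc", "date": "2007-02-10"},
    {"title": "B", "author": "alex", "date": "2007-02-12"},
]
news = [{"title": "C", "author": "sam", "date": "2007-02-11"}]

result = pipe(
    union(blogs, news),
    filter_out("author", "davidvc"),
    sort_by("date", reverse=True),
)
titles = [item["title"] for item in result]
```

Each operator takes an iterable of items and returns another one, which is the same shape as a node in a query tree taking tuples in and handing tuples out.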

I can envision taking my internal relational data that I want to make available to the Web, and delivering it as an Atom feed, using the REST model where each domain entity (a database table or view) maps to a URI, a web resource. Then I could write some pipes that provide useful views into that data. Hm, that shouldn't be too hard to do - I may go try and pull that together...
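
As a first stab at what that might look like, here is a minimal sketch that turns the rows of a table into an Atom feed, with one URI per row. The `customers` table, its columns, the `customers_feed` function and the `http://example.org` base URI are all hypothetical, and the output skips elements such as `author` that a fully valid Atom document would need.

```python
import sqlite3
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
# Serialize Atom elements with a default namespace instead of an ns0: prefix.
ET.register_namespace("", ATOM_NS)


def tag(name: str) -> str:
    """Qualified Atom tag name in ElementTree's {namespace}name form."""
    return f"{{{ATOM_NS}}}{name}"


def customers_feed(conn: sqlite3.Connection, base_uri: str) -> str:
    """Render the hypothetical customers table as an Atom feed string."""
    rows = conn.execute(
        "SELECT id, name, updated FROM customers ORDER BY updated DESC"
    ).fetchall()

    feed = ET.Element(tag("feed"))
    ET.SubElement(feed, tag("title")).text = "customers"
    # The table itself is a resource with its own URI.
    ET.SubElement(feed, tag("id")).text = f"{base_uri}/customers"
    # Use the newest row's timestamp for the feed, or a placeholder if empty.
    newest = rows[0][2] if rows else "1970-01-01T00:00:00Z"
    ET.SubElement(feed, tag("updated")).text = newest

    for row_id, name, updated in rows:
        entry = ET.SubElement(feed, tag("entry"))
        # Each row maps to its own URI under the table's URI.
        ET.SubElement(entry, tag("id")).text = f"{base_uri}/customers/{row_id}"
        ET.SubElement(entry, tag("title")).text = name
        ET.SubElement(entry, tag("updated")).text = updated

    return ET.tostring(feed, encoding="unicode")


# Usage with an in-memory database and made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, updated TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ada", "2007-02-10T00:00:00Z"), (2, "Grace", "2007-02-12T00:00:00Z")],
)
atom_xml = customers_feed(conn, "http://example.org")
```

A feed like this could then be handed to a pipe as just another input.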

Doing all this through a visual approach is going to get clumsy. At some point people will want to do JavaPipes and RubyPipes and AJAXPipes. Having a REST-based API to do this is one approach; another is to provide a library that does all the work on *your* server (rather than having it done on Yahoo's servers through their REST API).

This ties into my concerns about scale. If Yahoo! is hosting all these pipes, can they handle the demand? How do they handle millions of users hitting pipes that gather, sort, filter and apply foreach operations? I'm sure they have smart people looking at this; if they aren't already, they should be talking to the database folks, who know all about query optimization, query compilation, caching and so on. Luckily, pipes are read-only (they have to be, because they are produced through transformation of real data -- just as you can't update a computed column in a database). This means they don't have to worry about locking and contention on the data.
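
Because a pipe's output is read-only, one obvious trick from the database world is to cache it for a while instead of re-running the pipe on every request. Here is a small single-threaded sketch of that idea. I have no idea how Yahoo! actually does this; `CachedPipe`, its `ttl_seconds` parameter and the injectable `clock` are my own inventions for illustration.

```python
import time
from typing import Callable, List, Optional, Tuple


class CachedPipe:
    """Wrap a pipe so its output is recomputed at most once per TTL window."""

    def __init__(
        self,
        run: Callable[[], List[dict]],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._run = run          # the (possibly expensive) pipe to execute
        self._ttl = ttl_seconds  # how long a cached result stays fresh
        self._clock = clock      # injectable so the cache can be tested
        self._cached: Optional[Tuple[float, List[dict]]] = None
        self.runs = 0            # how many times the pipe really ran

    def output(self) -> List[dict]:
        now = self._clock()
        # Re-run the pipe only if nothing is cached or the entry has expired.
        if self._cached is None or now >= self._cached[0]:
            self.runs += 1
            self._cached = (now + self._ttl, self._run())
        return self._cached[1]


# Usage with a fake clock so expiry can be demonstrated without sleeping.
fake_now = [0.0]
cached = CachedPipe(lambda: [{"title": "B"}], ttl_seconds=60, clock=lambda: fake_now[0])

cached.output()
cached.output()             # served from the cache
runs_before_expiry = cached.runs

fake_now[0] = 61.0          # move past the TTL
cached.output()             # the pipe runs again
runs_after_expiry = cached.runs
```

The trade-off is staleness: readers may see results up to `ttl_seconds` old, which seems acceptable for feeds that only change every so often.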

The other issue with Yahoo! hosting the pipes is the freedom to leave, which I've written about before. Say I've written these cool pipes and built a whole service around them, and then I get mad at Yahoo! and want to leave, or they get new management that wants to charge me for my pipes. What I really want to see is an open source implementation of the API that I can run on my machine or host somewhere else. Make this something that helps the Internet take off in an incredible way, rather than tying it down to Yahoo's servers and Yahoo's storage and Yahoo's UI (cool as it may be). Come on Yahoo, tear down the walls and set this bird free.