Web 3.0: Enter the matrix
When I attended college in the early 1990s, decades ago, the future looked bright and safe for students of information technology. The Cold War was over, the web was growing fast, and thanks to the baby bust that followed the pill it was clear that everybody able to program computers would have a safe job for life. I had some experience with modems and FidoNet before, but it was in college that I first touched the Web. At the blazing speed of 64 kbit/s over a single ISDN channel, 25 students at a time had the chance to surf it concurrently. There was not much thrilling entertainment on the web back then, yet we spent many hours trying out this new medium. In my small home town, I was one of the first customers of the first internet provider, using the web mostly for scientific purposes.
Since then, a lot has changed. The web has made its way into the life of everybody in the western world. Having no Facebook account, or not having seen the latest #fail videos on YouTube, is a bigger reason for a child to get dissed today than wearing no Reeboks and Levi's was in our time. And having no internet banking or web shopping account makes life rather complicated in times of closing branches. In fact, I have to confess that I do more than 75% of my purchases on the web. Even my rather old parents-in-law, who had never used a computer for most of their lives, are booking their flights online. Yes, times have changed.
Many years later I am back at the same college, this time standing in front of a crowd of young students, many of them majoring in media informatics or distributed systems. So these are the ones who will build the next generation of the web! What to tell them, shortly before they leave college and start their professional lives? Something about the lies around always being flexible and mobile, and that allegedly the best candidate gets the job (both myths my generation was told at graduation)? I chose to tell them something about Terminator and The Matrix.
For computer scientists, and especially for those majoring in distributed computing like me, the internet, and the web in particular, are simply fascinating: information from any source, located anywhere, can easily be aggregated with a single command to answer virtually any question a typical user might ask. While this sounds like the holy grail of information warehousing (and it potentially is), it bears risks we cannot currently foresee.
With existing technology and real information collections (using Semantic MediaWiki, a semantic extension to the MediaWiki software driving Wikipedia), I demonstrated a simple query returning the names of all mayors of all municipalities in a particular county. The fascinating thing was not the result itself (which someone could also find manually with Google within a few hours), but that both query and result were machine-readable (thanks to existing standards), that the result arrived within seconds, and, last but not least, that the result was not set up explicitly for this demonstration: the data already existed in a human-readable wiki, which had been machine-enabled within half a minute (by modifying a single template). What I wanted to show the students was this: right now, a machine is already able to understand and join much of the information found on the web. The semantic web is not a future thing. It is here already. Machines can find out things within seconds that humans can only piece together with hard work and a lot of googling.
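To give an impression of how little is needed, such a query might be sketched in Semantic MediaWiki's `#ask` syntax roughly like this; the category and property names (`Municipality`, `Located in`, `Has mayor`, `Some County`) are illustrative assumptions, not necessarily the ones used in the actual demonstration:

```wikitext
{{#ask:
 [[Category:Municipality]]
 [[Located in::Some County]]
 |?Has mayor
 |format=table
}}
```

And one modified template can be enough to "machine-enable" a whole wiki, because adding an annotation such as `[[Has mayor::{{{mayor|}}}]]` inside a widely used infobox template instantly turns every page using that template into machine-readable data.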
The actual problem is not that machines can do this, or can do it fast. The problem is the future fusion of several technologies. According to the Google Internet Summit 2009, most web access will in the future be done by robots. At first, most of these robots will be cars and room cleaners, but the more robots there are (including things that already exist, like military combat robots, surveillance drones, home automation, and remote-controlled health care systems such as heart monitors and insulin pumps), the more problematic their use of the semantic web becomes. At first glance, it is a good idea for a heart monitor to learn about new abnormal ECG curves by scanning the web, so it can inform your doctor automatically. On the other hand, we have to take off the rose-coloured glasses and look at the naked truth.
The web is, to a large extent, dominated by two types of providers: commercial companies and criminals (and the two are not always sharply separated, owing to divergent laws). A commercial company is not interested in your actual health; it is interested in getting your money. So it is clear that it will not try to reduce the amount of drugs automatically pumped into your veins. Just as not everybody trusts the "Auto Update" feature of Microsoft Windows, how many would trust a similar feature built into an insulin pump, let alone one that automatically scans the web for new information to learn from? Even if this could be explained by profit maximisation, it shows that a machine controlling your life becomes a greater risk once you attach it to the web. And for criminals, the semantic web is an invitation to maximise their profit, too. I am not saying there will be public killer services where you type in the name of a person you want to get rid of, but theoretically that could come true. What I actually fear is that terrorists will learn how to hack security devices (as they already have in the past). It is not impossible that some day a terrorist hacker manages to confuse military devices by planting a viral firmware update on the vendor's site, or by injecting misleading semantic information into the web (perhaps replacing himself as the target person with the president of the United States by swapping two links on a CIA web server). If the virus contains AI, it is not far from there to SkyNet (Terminator) or The Matrix: machines that simply get out of control, using their man-made intelligence to control man.
You think this is all impossible? Well... did you also assume, back in 2000, that it was impossible to crash planes into the World Trade Center and even the Pentagon on the same day? Terrorists are not dumb. They also attend college, they also learn how to program computers, they also learn how to hack them, they also know what a mashup is and what the semantic web is good for. Do you still think it is a good idea to store any information in a machine-readable syntax in a public place? Do you still think it is a good idea to have more and more information available not from a cache but as a live stream?
Example: there already is a live stream of all radar tracks of planes on the web. A smart service that lets me see where my parents-in-law currently are when they fly to some tropical island. But is it a good idea for this to really be a live stream? If I were a terrorist, I would not risk going aboard to hijack a plane. I would instead hack into the badly secured air defence network of some Third World country, convince that web service that its target is to be found on that radar track service, and let the mashup do its destructive work. Iraqi insurgents already hacked a US drone. As I said, they are not dumb.
We have to face the actual risks and deal with them before building Web 3.0. This is the responsibility of the new generation of information scientists: not to find out how to make machines understand the knowledge found on the web, but to find out how to prevent machines (whether acting automatically or driven by humans) from using the web against man. That is what students must know when they start building the next web.
You can find a collection of all my blog entries and more on Head Crashing Informatics.