Why the term “Linking Open Data” might be misleading
A lot of activities around Linking Open Data (“LOD”) and the associated data sets, which are nicely visualised as a “cloud”, have been going on for quite a while now. It is exciting to see how the rather academic “Semantic Web” and all the work associated with this disruptive technology can now be transformed into real business use cases.
What I have observed in the last few months, especially in business communities, is the following:
- “Linked Data” sounds interesting to business people because the phrase creates a lot of associations in a second or two; the database crowd also seems to be attracted by this web-based approach to data integration
- “Web of Data” is somehow misleading because many people think that this will be a new web which replaces something else. Same story with the “Semantic Web”
- “Linking Open Data” sounds dangerous to many companies
For insiders it is clear that the “openness” of data, especially in commercial settings, can be controlled and in many cases has to be controlled. This means that openness can be one-way or mutual. In some use cases, data from companies will be put into the cloud and opened up for many purposes; in other use cases it will stay inside the company's boundaries. In still other scenarios, only (open) data from the web will be consumed and linked with corporate data, but no data will be exposed to the world (except the fact that data was consumed by an entity).
And of course: on many other occasions datasets and repositories will be partly opened up, depending on the CCs and the underlying privacy regulations one wants to use.
This makes clear that LOD / Linking Open Data is just one detail of a bigger picture. Since companies (and governments) play a crucial role in developing the whole infrastructure, we need to draw a new picture:

I'll be happy to have a lively discussion about this topic at the first Linked Data Camp in Vienna as well.
Linked Data for Enterprises – a one way scenario?
The Linked Data paradigm is attracting a lot of attention, including in many industries which aren't used to internet-driven business processes. I am therefore wondering which are the best cases for them to build parts of their information management strategy on top of the “Web of Data”.
It's quite obvious how the media industry can use linked data, or internet companies like Amazon. But what's in it for, let's say, an insurance company once it comes to the point where it should expose parts of its data to the “Web of Data”?
One point to start the line of argument with is: “Why do you have a website at all?”
Does the resistance have something to do with the ability to “control” a traditional HTML website? Because as soon as mashups play a role, it's no longer as easy as before to see
- who,
- when and
- why
someone uses one's data.
On the other hand, is OpenCalais really “open”? It's not possible to use the services without an API key. So all the questions above can be monitored even better than before, as soon as you have registered users using your LOD services. OpenCalais' way of serving customers might very soon become a model for the business layer above the LOD cloud in general.
Since the LOD cloud is the basic infrastructure which drives the whole process, this layer should remain freely accessible. But how could new business models be built on top of it (models which constantly spend money on maintaining and extending the underlying infrastructure)?
Where could enterprises start using Linked Data? Only by retrieving data from the “outside” and mashing it up with the “inside”, that is, only one way?
I think we can learn a lot from one of the “oldest” networks on earth: the energy business is restructuring a lot at the moment, and it foresees many more “mashups” than before. Even single households with their own little plants will be able to PRODUCE energy, not only consume it.
This has advantages for many stakeholders (except the shareholders). So what does it mean for the internet business: when will it transform from a centralised network into a more decentralised one? Which role will the huge “information plants” like Google play?
Information has no “place” anymore, whereas energy can't be shipped around the world. We should rethink the meaning of a “data store”; then information will flow without flooding us. Linked Data might become the essence.

It's a web of data, stupid!
Character created with South-Park Create-a-character
3 Semantic Apps to Watch
As mentioned on Read/WriteWeb, there are at least 10 (rather commercial) semantic web applications “around” which claim to use semantic web technologies for different purposes: “10 Semantic Apps to Watch”. (Besides these, at least 100 prototypes from various research programmes exist in this field.)
My “short list” of those 10 apps consists of the following three: Twine, Talis and ClearForest.
In my opinion these 3 projects have the highest potential to become “big players” in the next-generation web. Instead of “improving” what Google does, they try to fulfil a totally new mission:
Twine isn't organising the “knowledge of the whole world” (as Google would like to do); it rather focuses on the users themselves. By using a semantic graph (including the social graph) for each user, information in a social network will flow in a more efficient way. Information will come to the users instead of them having to search around. Twine is a combination of many well-known Web 2.0 applications like Facebook or del.icio.us, but it will use base technologies from the semantic web and will provide a SPARQL API and a REST API.
Possible Risks:
- It's still not clear whether people will see personal semantic graphs as an advantage or as a possible danger to privacy
- Semantic Web database technologies (triple stores) are still very young. Although some of the existing systems have already proved that they are scalable, none of them has been used so far for really big systems.
USP:
Twine is the first company which will combine social tagging, social networking, natural language processing and semantic web on a professional level. So it has the potential to become a very popular service which supports many people in their daily business. Sooner or later the same system might also be offered as a very attractive business solution. Nevertheless, Twine hasn't opened its portal to the public so far, so it's still not clear whether all the promises will be kept…
Talis is a “domain-agnostic” technology platform which helps developers build applications on the principles of “mass collaboration”. It is a new breed of distributed programmatic interface which makes heavy use of all the opportunities the Web of Data may offer. “DNS is used as a robust routing mechanism to connect requests with the closest data or service both for the native platform services, but also for third party data access services.”
Possible Risks:
Talis's mission sounds great, and its success depends a lot on how well the company will be able to build an ecosystem around its services. My forecast: Talis will be acquired in 2009 by one of the big web companies.
USP:
Talis tries to establish a new way of organizing information flows throughout the Web of Data. Since it relies on open standard protocols like RESTful Web Services, a lot of applications will use Talis technologies. Talis as a company has a well-founded background, since it has been providing services for governmental organizations and libraries for the last 30 years. Some of the people working at Talis rank among the best semantic web thinkers.
Clearforest was acquired by Reuters, which in turn was bought by Thomson. ClearForest’s technology automatically categorizes documents and structures the entities contained in text. Without text extraction algorithms which really work, the Semantic Web will never take place. And Clearforest really works. Just try it out!
Possible Risks:
Clearforest is well embedded in a giant: Thomson is the world's largest media company. This is, of course, a great opportunity to sell this new kind of semantic solution to many of the global Top 5000. On the other hand, it might be a risk, since “traditional” media companies still tend to forget about the long tail and open APIs.
USP:
Simply put, the USP of Clearforest is that the technology works and that it can be integrated into existing architectures without requiring semantic web expertise. It can become one of the cornerstones of an integrated corporate semantic web architecture.
Triple-I 2007: The next big discussion will be about privacy
First of all: I enjoyed this year's Triple-I very much. I met a lot of friends and colleagues in Graz and had plenty of interesting discussions. I gave a talk about “Enterprise n+1” (together with Franz Novak and Henry Story) and a semantic wiki tutorial (with Denny Vrandecic and Sebastian Schaffert). I saw interesting keynotes and other talks, and we all enjoyed very good food and excellent wine.
To get an impression of the great atmosphere of the congress, go to Leo Sauermann's photoset about the Triple-I on flickr.
The congress showed that industrial interest in semantic technologies has reached a level where it is no longer necessary to be a missionary, but rather to offer concrete products. Together with the huge changes in people's minds which came in with Web 2.0, the Semantic Web will also offer some answers to the privacy discussion.
Marc Smith's (Microsoft Research) keynote was an excellent example of how fascination with what can be done with technology sometimes obscures the possible dangers. Marc fascinated the audience. He showed nice applications in the mobile social web, and at the end he warned: “Hey guys, do you really want to be observed by some companies?”
The same goes for Peter Reiser's (Sun Microsystems) keynote: he said that Web 2.0 principles can be applied to internal knowledge management, but that this will only be accepted if the privacy issue is discussed beforehand.
So my question: Do you really know what Amazon, Google & Co. know about YOU – the person of the year 2006? And isn't that another important issue for the Open Data philosophy as well?
Sparqling 5 years of semantic web evolution
The Semantic Web School has monitored press releases about semantic technologies and related topics over the past 5 years. The result is a collection of about 1,200 links to and summaries of articles. These articles were tagged, and those tags are mainly linked to Wikipedia, so this might be an interesting data collection for some web applications…
Inspired by the work of LinkingOpenData (W3C SWEO) and DBpedia, we're happy to announce that there is now a SPARQL endpoint for that data which is open to the public. Ideas for some mashups are:
- Showing on a timeline which topics came and went
- Publishing a list of press releases linked with Wikipedia articles
- Extracting names of companies which are mentioned in the articles and calculating a tag cloud for them
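The timeline idea could be sketched roughly like this in Python. This is only an illustration under assumptions: the post does not say which vocabulary the endpoint uses, so the `dc:date` and `dc:subject` predicates in the query are guesses, the function name `tags_per_year` is made up, and the sample rows are invented stand-ins for a real response in the standard SPARQL JSON results format.

```python
from collections import Counter, defaultdict

# Hypothetical query: the predicates actually used by the endpoint are not
# given in the post; dc:date and dc:subject are assumptions.
TIMELINE_QUERY = """
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?article ?date ?tag
WHERE {
  ?article dc:date ?date ;
           dc:subject ?tag .
}
"""


def tags_per_year(bindings):
    """Count how often each tag occurs per year.

    `bindings` is the `results.bindings` list of a SPARQL JSON result,
    in which every variable maps to a dict with a "value" key.
    """
    timeline = defaultdict(Counter)
    for row in bindings:
        # Assumes ISO-style dates such as 2007-09-05.
        year = row["date"]["value"][:4]
        timeline[year][row["tag"]["value"]] += 1
    return timeline


# Invented sample rows standing in for a real endpoint response.
WEB20 = "http://en.wikipedia.org/wiki/Web_2.0"
SEMWEB = "http://en.wikipedia.org/wiki/Semantic_Web"
sample = [
    {"date": {"value": "2006-03-01"}, "tag": {"value": WEB20}},
    {"date": {"value": "2007-05-10"}, "tag": {"value": SEMWEB}},
    {"date": {"value": "2007-09-05"}, "tag": {"value": SEMWEB}},
    {"date": {"value": "2007-11-20"}, "tag": {"value": WEB20}},
]

if __name__ == "__main__":
    for year, counts in sorted(tags_per_year(sample).items()):
        print(year, counts.most_common())
```

The per-year counts could then be fed into any timeline widget; sending the query to the endpoint itself is left out here because its address and data model are not stated above.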
Any other ideas?




