Why the term “Linking Open Data” might be misleading

A lot of activity around Linking Open Data (“LOD”) and the associated data sets – nicely visualised as a “cloud” – has been going on for quite a while now. It is exciting to see how the rather academic “Semantic Web”, and all the work associated with this disruptive technology, can now be transformed into real business use cases.

What I have observed in the last few months, especially in business communities, is the following:

  • “Linked Data” sounds interesting to business people because the phrase creates a lot of associations within a second or two; the database crowd also seems to be attracted by this web-based approach to data integration
  • “Web of Data” is somewhat misleading because many people think this will be a new web which replaces something else. The same goes for the “Semantic Web”
  • “Linking Open Data” sounds dangerous to many companies

For insiders it is clear that the “openness” of data, especially in commercial settings, can be controlled – and in many cases has to be. Linking can be one-way or mutual. In some use cases data from companies will be put into the cloud and opened up for many purposes; in other use cases it will stay inside the company´s boundaries. In yet other scenarios only (open) data from the web will be consumed and linked with corporate data, but no data will be exposed to the world (except the fact that data was consumed by an entity).

And of course: on many other occasions datasets and repositories will be partly opened up, depending on the Creative Commons licences and the underlying privacy regulations one wants to apply.

This makes clear that LOD / Linking Open Data is just one detail of a bigger picture. Since companies (and governments) play a crucial role in developing the whole infrastructure, we need to draw a new picture:


I´ll be happy to have a lively discussion about this topic also at the first Linked Data Camp in Vienna.

Linked Data for Enterprises – a one way scenario?

Since the Linked Data paradigm is attracting a lot of attention – also in many industries which aren´t used to internet-driven business processes – I am wondering what the best use cases are for them to build parts of their information management strategy on top of the “Web of Data”.

It´s quite obvious how the media industry can use linked data, or internet companies like Amazon, but what´s in it for, let´s say, an insurance company, as soon as it comes to the point where it should expose parts of its data to the “Web of Data”?

One point to start the line of argument with is: “Why do you have a website at all?”

Does resistance have something to do with the ability to “control” a traditional HTML website? Because as soon as mashups play a role, it´s not as easy as it was before to see,

  • who,
  • when and
  • why

one´s data is used.

On the other hand, is OpenCalais really “open”? It´s not possible to use the service without an API key. So all the questions above can be monitored even better than before, as soon as registered users are using your LOD services. OpenCalais´ way of serving customers might very soon become a model for the business layer above the LOD cloud in general.

Since the LOD cloud is the basic infrastructure which drives the whole process, this layer should remain freely accessible. But how could new business models be built on top of it (and constantly fund the maintenance and extension of the underlying infrastructure)?

Where could enterprises start using Linked Data? Only by retrieving data from the “outside” and mashing it up with the “inside” – only one way?

I think we can learn a lot from one of the “oldest” networks on earth: the energy business is restructuring heavily at the moment, and it foresees a lot more “mashups” than before – even single households with their own little plants will be able to PRODUCE energy, not only consume it.

This has advantages for many stakeholders (except the shareholders), so what does it mean for the internet business: when will it transform from a centralised network into a more decentralised one? Which role will the huge “information plants” like Google play?

Information no longer has a “place”, whereas energy can´t be shipped around the world. We should rethink the meaning of a “data store”, and information will flow without flooding us. Linked Data might become the essence.


OpenCalais will become an essential part of the Semantic Web

Really large companies are starting to spur the Semantic Web. Reuters has recently launched a semantic web service which is free also for commercial purposes. It helps to extract significant phrases from any unstructured text (web documents or office documents). This new service is called “OpenCalais” and is based on the ClearForest text-analytics solutions (ClearForest was acquired by Reuters in 2007). So finally a dream comes true: web content can be tagged automatically in quite high quality. Technically speaking: any unstructured text can be transformed into an RDF graph on the fly, and important phrases or even statements can be extracted from plain text.
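To make this concrete, here is a minimal sketch of what consuming such a service could look like on the client side. The response shape (a dict keyed by resource URIs, with `_typeGroup`, `_type` and `name` fields) is an assumption modelled on Calais-style JSON output, not a verified API contract; `extract_entities` and the sample data are made up for illustration.

```python
# A minimal sketch of parsing a Calais-style entity response.
# The response shape (resource URIs mapping to objects with "_typeGroup",
# "_type" and "name") is an assumption, not a verified API contract.

def extract_entities(calais_response: dict) -> list[tuple[str, str]]:
    """Return sorted (type, name) pairs for all entities in the response."""
    entities = []
    for item in calais_response.values():
        if isinstance(item, dict) and item.get("_typeGroup") == "entities":
            entities.append((item.get("_type", "?"), item.get("name", "?")))
    return sorted(entities)

# Invented sample response, shaped like the format described above.
sample = {
    "doc": {"_typeGroup": "meta"},
    "http://d.opencalais.com/comphash-1": {
        "_typeGroup": "entities", "_type": "Company", "name": "Reuters"},
    "http://d.opencalais.com/generichasher-1": {
        "_typeGroup": "entities", "_type": "City", "name": "Vienna"},
}

print(extract_entities(sample))
# → [('City', 'Vienna'), ('Company', 'Reuters')]
```

The real service would of course sit behind an HTTP call with your API key; the point here is only how little glue code is needed once the text has been turned into structured data.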

OpenCalais is the core service for many new web applications, most of which will provide better search functionality or help to identify similarities between different types of content. For instance, for any document published on a website, related blogs or videos (or whatever) can be retrieved and presented as relevant context information.

Whenever an application uses OpenCalais, content will be delivered to Reuters. Thus, submitting a URL has a different meaning in the future than it had all the years before: it´s not only about “promoting” a website anymore, it´s rather about exploring ways to get connected with the semantic web – and about teaching Reuters´ global knowledge base 😉

Try it out!

3 Semantic Apps to Watch

As mentioned on Read/WriteWeb, there are at least 10 (rather commercial) semantic web applications “around” which claim to use semantic web technologies for different purposes: “10 Semantic Apps to Watch”. (Besides these, at least 100 prototypes from various research programmes exist in this field.)

My “short list” of those 10 apps consists of the following three:

  1. twine
  2. Talis
  3. clearforest

In my opinion these 3 projects have the highest potential to become “big players” in the next generation web. Instead of “improving” what Google does, they try to fulfill a totally new mission:


Twine isn´t organising the “knowledge of the whole world” (as Google would like to do); it rather focuses on the users themselves: using a semantic graph (including the social graph) for each user, information in a social network will flow in a more efficient way. Information will come to the users instead of users having to search for it. Twine combines many of the well-known Web 2.0 applications like Facebook or del.icio.us, but will use base technologies from the semantic web and will provide a SPARQL API and a REST API.

Possible Risks:

  • It´s still not clear if people will accept personal semantic graphs as an advantage or rather see them as a possible danger to privacy
  • Semantic Web database technologies (triple stores) are still very young. Although some of the existing systems have already proved that they are scalable, none of them has been used so far for really big systems.


Twine is the first company which will combine social tagging, social networking, natural language processing and the semantic web on a professional level. So it has the potential to become a very popular service for many people to support their daily business. Sooner or later the same system might also be offered as a very attractive business solution. Nevertheless, twine hasn´t opened its portal to the public so far, so it´s still not clear if all the promises will be kept…


Talis is a “domain-agnostic” technology platform which helps developers build applications on the principles of “mass collaboration”. It is a new breed of distributed programmatic interface, heavily exploiting all the opportunities the Web of Data may offer. “DNS is used as a robust routing mechanism to connect requests with the closest data or service, both for the native platform services, but also for third party data access services.”

Possible Risks:

Talis´ mission sounds great, and its success depends a lot on how well the company will be able to build an ecosystem around its services. My forecast: Talis will be acquired in 2009 by one of the big web companies.


Talis tries to establish a new way of organising information flows throughout the Web of Data. Since it relies on open standard protocols like RESTful web services, a lot of applications will use Talis technologies. Talis as a company has a well-founded background, since it has been providing services for governmental organisations and libraries for the last 30 years. Some of the people working at Talis rank among the best semantic web thinkers.


Clearforest was acquired by Reuters (which in turn was bought by Thomson). ClearForest´s technology automatically categorises documents and structures the entities contained in text. The Semantic Web will never take place without text extraction algorithms which really work. And Clearforest really works. Just try it out!

Possible Risks:

Clearforest is well embedded in a giant: Thomson is the world´s largest media company. This is, of course, a great opportunity to sell this new kind of semantic solution to many of the global top 5000. On the other hand, it might be a risk, since “traditional” media companies still tend to forget about the long tail and open APIs.


Simply put, the USP of Clearforest is that the technology works, and it can be integrated into existing architectures without one having to be a semantic web expert. It can become one of the cornerstones of an integrated corporate semantic web architecture.

Zimbra is the Collaboration Suite for the Social Semantic Web

We´ve been using Zimbra for a couple of months now, and I think it´s a fantastic opportunity for any small or medium enterprise to make a step towards Web 2.0, especially when virtual teamwork becomes more and more important (as in my working environment). Besides the typical features a collaboration suite usually offers, Zimbra comes with a couple of concepts which have the potential for “Next Generation Collaboration” (collaborate or collapse):

  • Zimbra comes with a powerful search engine based on Lucene
  • Zimbra has an integrated tagging-system
  • Zimbra´s spam filter learns from each user in the system whether an email is spam or not – a very good example of applied collective intelligence
  • Zimbra´s Zimlets support mashups, and with that mechanism any message content can be made “live” by linking it with other services and content (e.g. if the word “today” appears in a text passage, it will automatically be linked with your calendar entries for “today”)

And: Zimbra can be synchronised with your PDA… 😉
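The Zimlet mechanism mentioned above essentially boils down to matching trigger words in message text and turning them into links. Real Zimlets are written in JavaScript against the Zimbra client API; this little Python sketch, with hypothetical target URLs, only illustrates the underlying idea:

```python
import re

# Illustrative only: real Zimlets are JavaScript components in the Zimbra
# client. This sketch just shows the core idea of scanning message text
# for trigger words and wrapping them in links.
TRIGGERS = {
    "today": "/calendar?view=today",        # hypothetical target URLs
    "tomorrow": "/calendar?view=tomorrow",
}

def linkify(text: str) -> str:
    """Wrap known trigger words in HTML links, preserving their casing."""
    def repl(match: re.Match) -> str:
        word = match.group(0)
        return f'<a href="{TRIGGERS[word.lower()]}">{word}</a>'
    pattern = r"\b(" + "|".join(TRIGGERS) + r")\b"
    return re.sub(pattern, repl, text, flags=re.IGNORECASE)

print(linkify("Let's meet today or tomorrow."))
```

In a real Zimlet the matcher would of course run inside the mail client and the link target would open the user´s actual calendar.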

Sparqling 5 years of semantic web evolution

The Semantic Web School has monitored press releases about semantic technologies and related topics over the past 5 years. The result is a collection of about 1,200 links and summaries of articles. These articles were tagged, and the tags are mainly linked to Wikipedia – so this might be an interesting data collection for some web applications…

Inspired by the work of LinkingOpenData (W3C SWEO) and dbpedia, we´re happy to announce that there is now a SPARQL endpoint for that data which is open to the public. Ideas for some mashups are:

  • Showing on a timeline which topics came and went
  • Publishing a list of press releases linked with Wikipedia articles
  • Extracting names of companies which are mentioned in the articles and calculating a tag cloud for them
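To give these ideas some shape: the endpoint´s actual vocabulary isn´t documented here, so the SPARQL query below uses a made-up `ex:` namespace purely as an illustration, and the tag-cloud calculation for the third idea runs on invented data.

```python
from collections import Counter

# Illustration only: "ex:" is a made-up namespace, not the endpoint's
# real schema. This is the kind of query the second mashup might send.
SAMPLE_QUERY = """
PREFIX ex: <http://example.org/sws/>
SELECT ?title ?wiki WHERE {
  ?release ex:title ?title ;
           ex:tag   ?wiki .
}
"""

# Third mashup idea, sketched with invented data: count company mentions
# across article summaries and scale the counts to font sizes (10-24 pt)
# for a tag cloud.
COMPANIES = ["Reuters", "Talis", "Radar Networks"]
articles = [
    "Reuters launches OpenCalais",
    "Talis platform update; Reuters comments",
    "Radar Networks opens the twine beta",
]

counts = Counter(
    name for text in articles for name in COMPANIES if name in text
)
max_count = max(counts.values())
cloud = {name: 10 + 14 * n // max_count for name, n in counts.items()}
print(cloud)
# → {'Reuters': 24, 'Talis': 17, 'Radar Networks': 17}
```

In a real mashup the article titles would of course come back from the SPARQL endpoint rather than from a hard-coded list.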

Any other ideas?