The Power of Linked Data

Just recently, the European Commission published a comprehensive case study on how Linked Data is transforming eGovernment. One of the three "Further readings" mentioned in this publication is "Linked Open Data: The Essentials", for which I wrote an article entitled "The Power of Linked Data – Understanding World Wide Web Consortium's (W3C) vision of a new web of data".

Imagine that the web is like a giant global database. You want to build a new application that shows the relationships among economic growth, renewable energy consumption, mortality rates and public spending on education. You also want to improve the user experience with mechanisms like faceted browsing. You can already do all of this today, but you probably won't.

Today's measures for integrating information from different sources, otherwise known as mashing data, are often too time-consuming and too costly. Two driving factors cause this unpleasant situation: First of all, databases are still seen as "silos", and people often do not want others to touch the database for which they are responsible. This way of thinking is based on some assumptions from the 1970s: that only a handful of experts are able to deal with databases and that only the IT department's inner circle is able to understand the schema and the meaning of the data. This is obsolete.

In today’s internet age, millions of developers are able to build valuable applications whenever they get interesting data.

Secondly, data is still locked up in certain applications. The technical problem with today's most common information architecture is that metadata and schema information are not separated well from application logic. Data cannot be re-used as easily as it should be. If someone designs a database, he or she often already knows which application will be built on top of it. If we stop emphasising which applications will use our data and focus instead on a meaningful description of the data itself, we will gain more momentum in the long run. At its core, Open Data means that the data is open to any kind of application, and this can be achieved if we use open standards like RDF to describe metadata.
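A minimal sketch of this idea in plain Python (all URIs and figures below are invented for illustration, not taken from any real dataset): instead of binding data to one application's schema, every fact is stored as a subject-predicate-object triple, the data model behind RDF, so any application can consume it.

```python
# Facts as (subject, predicate, object) triples -- the RDF data model.
# The "ex:" URIs and the numbers are illustrative placeholders only.
triples = [
    ("ex:Austria", "ex:gdpGrowth2010", 1.8),
    ("ex:Austria", "ex:renewableShare2010", 0.30),
    ("ex:Sweden", "ex:renewableShare2010", 0.47),
]

def facts_about(subject, data):
    """Collect all predicate/object pairs for a subject -- no app-specific schema needed."""
    return {p: o for s, p, o in data if s == subject}

print(facts_about("ex:Austria", triples))
```

Any application, whether it plots GDP growth or compares renewable shares, can query the same triples without a schema designed for it in advance.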

Nowadays, the idea of linking web pages by using hyperlinks is obvious, but this was a groundbreaking concept 20 years ago. We are in a similar situation today since many organizations do not understand the idea of publishing data on the web, let alone why data on the web should be linked. The evolution of the web can be seen as follows:

[Image: the evolution of the web]

Although the idea of Linked Open Data (LOD) has yet to be recognised as mainstream (like the web we all know today), a lot of LOD is already available. The so-called LOD cloud covers an estimated 50 billion facts from many different domains like geography, media, biology, chemistry, economy, energy, etc. The data is of varying quality, and most of it can also be re-used for commercial purposes.
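What makes facts from many domains valuable is that they can be joined. A toy sketch of such a cross-domain join, in the spirit of (but much simpler than) a SPARQL query; the country identifiers and figures are invented for the example:

```python
# Two toy "datasets" from different domains that share the same identifiers.
# All values are made up for this sketch.
economy = {"ex:Austria": {"gdpGrowth": 1.8}, "ex:Denmark": {"gdpGrowth": 1.6}}
energy = {"ex:Austria": {"renewableShare": 0.30}, "ex:Denmark": {"renewableShare": 0.40}}

# Because both datasets identify countries the same way, joining is trivial:
combined = {
    country: {**economy[country], **energy[country]}
    for country in economy.keys() & energy.keys()
}
print(combined["ex:Denmark"])
```

The whole point of *linked* data is that shared identifiers make this join possible across datasets published by entirely different organisations.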

All of the different ways to publish information on the web are based on the idea that there is an audience out there that will make use of the published information, even if we are not sure who exactly it is and how they will use it. Here are some examples:

• Think of a Twitter message: not only do you not know all of your followers, but you often don't even know why they follow you and what they will do with your tweets.
• Think of your blog: it's like an email to someone you don't know yet.
• Think of your website: new people can contact you and offer surprising new kinds of information.
• Think of your email address: you have shared it on the web and have been receiving lots of spam ever since.

In some ways, we are all open to the web, but not all of us know how to deal with this rather new way of thinking. Most often the "digital natives" and "digital immigrants" who have learned to work and live with the social web have developed the best strategies to make use of this kind of "openness." Whereas the idea of Open Data is built on the concept of a social web, the idea of Linked Data is a descendant of the semantic web.

The basic idea of a semantic web is to provide cost-efficient ways to publish information in distributed environments. To reduce the costs of transferring information among systems, standards play the most crucial role. Either the transmitter or the receiver has to convert or map the data into a structure that can be "understood" by the receiver. This conversion or mapping must be done on at least three different levels: the syntax used, the schemas, and the vocabularies used to deliver meaningful information; it becomes even more time-consuming when information is provided by multiple systems. An ideal scenario would be a fully harmonised internet where all of those layers are based on exactly one single standard, but the fact is that we face too many standards or "de-facto standards" today. How can we overcome this chicken-and-egg problem?
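The three mapping levels just mentioned can be sketched in a few lines of plain Python; the sender's format, field names and vocabulary below are invented purely for illustration:

```python
import csv
import io

# Level 1 -- syntax: parse the sender's delivery format (here: a CSV string)
# into a generic structure. Data and field names are made up for the sketch.
raw = "land;einwohner\nAustria;8900000\n"
rows = list(csv.DictReader(io.StringIO(raw), delimiter=";"))

# Level 2 -- schema: map the sender's field names onto the receiver's schema.
schema_map = {"land": "country", "einwohner": "population"}

# Level 3 -- vocabulary: normalise values to the receiver's controlled vocabulary.
vocab_map = {"Austria": "ex:Austria"}

converted = [
    {schema_map[k]: vocab_map.get(v, v) for k, v in row.items()}
    for row in rows
]
print(converted)
```

Multiply this small effort by every pair of communicating systems and every level, and the cost argument for shared standards becomes obvious.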

There are at least three possible answers:
• Provide valuable, agreed-upon information in a standard, open format.
• Provide mechanisms to link individual schemas and vocabularies so that people can indicate that their concepts are "similar" and related, even if they are not exactly the same.
• Bring all this information to an environment which can be used by most, if not all, of us. For example: don't force users to install proprietary software or lock them into one single social network or web application!
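The second point above, linking vocabularies without forcing them to be identical, is what SKOS-style mapping relations are for. A toy sketch (the vocabulary prefixes and concept names are invented; only the relation names echo real SKOS properties):

```python
# Cross-vocabulary links in the spirit of SKOS mapping relations:
# each vocabulary keeps its own concepts but declares how they relate.
links = [
    ("vocabA:OpenData", "skos:exactMatch", "vocabB:OffeneDaten"),
    ("vocabA:LinkedData", "skos:closeMatch", "vocabB:VernetzteDaten"),
]

def related(concept, link_data):
    """Find concepts in other vocabularies linked to the given one."""
    return [(rel, target) for source, rel, target in link_data if source == concept]

print(related("vocabA:LinkedData", links))
```

The design choice matters: a `closeMatch` says "similar enough to be useful", while an `exactMatch` claims interchangeability, so neither side has to abandon its own terminology.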

Most systems today deal with huge amounts of information. All information is produced either within the system boundaries (and partly published to other systems) or it is consumed "from outside," "mashed" and "digested" within the boundaries. Some of the growing complexity has arisen naturally from a higher level of education and the technical improvements made by the ICT sector over the last 30 years.

Simply put, humanity is now able to handle much more information than ever before, at probably the lowest costs ever (think of higher bandwidths and lower costs of data storage). However, most of the complexity we are struggling with is caused above all by structural insufficiencies due to the networked nature of our society.

The specialist nature of many enterprises and experts is not yet mirrored well enough in the way we manage information and communicate. Instead of being findable and linked to other data, much information is still hidden. With its clear focus on high-quality metadata management, Linked Data is key to overcoming this problem. The value of data increases each time it is re-used and linked to another resource. Re-use can only be triggered by providing information about the available information. In order to undertake this task in a sustainable manner, information must be recognised as an important resource that should be managed just like any other.

Linked Open Data is already widely available in several industries, including the following three:
• Linked Data in libraries: focusing on library data exchange and the potential for creating globally interlinked library data; exchanging and jointly utilising data with non-library institutions; growing trust in the growing semantic web; and maintaining a global cultural graph of information that is both reliable and persistent.
• Linked Data in biomedicine: establishing a set of principles for ontology/vocabulary development with the goal of creating a suite of orthogonal interoperable reference ontologies in the biomedical domain; tempering the explosive proliferation of data in the biomedical domain; creating a coordinated family of ontologies that are interoperable and logical; and incorporating accurate representations of biological reality.
• Linked government data: re-using public sector information (PSI); improving internal administrative processes by integrating data based on Linked Data; and interlinking government and nongovernment information.

The inherent dynamics of Open Data produced and consumed by the "big three" stakeholder groups – media, industry, and government organizations/NGOs – will advance the idea, quality and quantity of Linked Data, whether it is open or not:

[Image: from Open Data to Linked Open Data]

Whereas most of the current momentum can be observed in the government & NGO sectors, more and more media companies are jumping on the bandwagon. Their assumption is that more and more industries will perceive Linked Data as a cost-efficient way to integrate data.

Linking information from different sources is key for further innovation. If data can be placed in a new context, more and more valuable applications – and therefore knowledge – will be generated.

Big Data for NPOs & NGOs – linked, not stirred!

NPOs and NGOs are increasingly acting as open data providers for various stakeholders such as citizens, enterprises and communities. Linked Open Data is becoming a key concept for meeting several demands of information professionals, for instance interoperability and accessibility of data, multilingualism and harmonisation of metadata.

The open data value chain is about to change from a rather simple one into a more complex network of data streams, which produces new revenue models and more differentiated roles; Linked Open Data plays a central role in this development.

This webinar (held on October 9, 2012) was about the use of Linked Open Data and controlled vocabularies in the specific environments in which NGOs and NPOs are working.

Open Data for Enterprises

I gave a short talk at yesterday's "Open Government Data BusinessDay" in Vienna. I talked about "Open Data for Enterprises", pointing out that enterprises have different needs than typical "app makers".

Open data is obviously socio-politically relevant and helps to reduce administrative costs. It is a kind of infrastructure that is "invisible" to the business community. But to make it attractive for enterprises, open/external data obviously has to be integrable with internal databases. Are Linked Data and open semantic web standards the solution to this challenge?

The Power of Linked Open Data

It was my pleasure to contribute to "Linked Open Data: The Essentials – a quick start guide for decision makers", which was recently published by the Semantic Web Company and REEEP.

Here is an excerpt from my article; a link to download the publication in full as a PDF can be found at the bottom of this entry:

The Power of Linked Open Data – Understanding World Wide Web Consortium’s (W3C) vision of a new web of data

Imagine that the web is like a giant global database. You want to build a new application that shows the relationships among economic growth, renewable energy consumption, mortality rates and public spending on education. You also want to improve the user experience with mechanisms like faceted browsing. You can already do all of this today, but you probably won't. Today's measures for integrating information from different sources, otherwise known as mashing data, are often too time-consuming and too costly.

Two driving factors cause this unpleasant situation: First of all, databases are still seen as "silos", and people often do not want others to touch the database for which they are responsible. This way of thinking is based on some assumptions from the 1970s: that only a handful of experts are able to deal with databases and that only the IT department's inner circle is able to understand the schema and the meaning of the data. This is obsolete. In today's internet age, millions of developers are able to build valuable applications whenever they get interesting data. Secondly, data is still locked up in certain applications. The technical problem with today's most common information architecture is that metadata and schema information are not separated well from application logic. Data cannot be re-used as easily as it should be. If someone designs a database, he or she often already knows which application will be built on top of it.

If we stop emphasising which applications will use our data and focus instead on a meaningful description of the data itself, we will gain more momentum in the long run.

At its core, Open Data means that the data is open to any kind of application, and this can be achieved if we use open standards like RDF to describe metadata.

Linked Data?

Nowadays, the idea of linking web pages by using hyperlinks is obvious, but this was a groundbreaking concept 20 years ago. We are in a similar
situation today since many organizations do not understand the idea of publishing data on the web, let alone why data on the web should be linked. The evolution of the web can be seen as follows:

Although the idea of Linked Open Data (LOD) has yet to be recognised as mainstream (like the web we all know today), a lot of LOD is already available. The so-called LOD cloud covers an estimated 50 billion facts from many different domains like geography, media, biology, chemistry, economy, energy, etc. The data is of varying quality, and most of it can also be re-used for commercial purposes.

Open Data for Enterprises? – My short review of OKCon 2011, Berlin

I went to Berlin last week to attend the Open Knowledge Conference 2011 (OKCon). The event was nice: I met great people, and "Kalkscheune" is a location typical of new media events nowadays: a place where "old industry" operated in former days, "conquered" by digital citizens today. But OKCon is not a typical event for people living in the digital society; it is rather avant-garde. At OKCon, visionaries (like Brewster Kahle) meet and discuss how the internet will or should transform society towards an open society within the next few years. Of course, not everybody shares the same vision of an open society; some business people might think that this is too much innovation, and they will close up their firewall, their intranet and finally their ears.

The Open Data movement is a great thing, but it will only be a sustainable project if enterprises start to build their business on top of it. I gave a talk at OKCon which covered questions like: "How do you talk with business people when questions come up like 'Open Data – what is the value for us?'" My basic idea is to start by using "Open Data mechanisms" internally: as a first step, companies could establish "Open Data services" enterprise-wide but internally. Only as a subsequent step would open data be consumed from the web and finally published on the web. But first of all:

“Use Open Data in Enterprises as an internal mechanism to distribute data within the company”

As a consequence, companies will seek standards-based solutions, and voilà: the W3C can offer the whole range of such standards, called the "Semantic Web" and "Linked Data". Have a look at my slides:

Open Data Thesaurus

I'm happy to announce the start of the first "Open Data Thesaurus". The Open Data Thesaurus is a collection of key concepts and entities, their definitions and semantic links. Following the principle of "eat your own dog food", this thesaurus is available in machine-readable form based on open W3C standards and under a Creative Commons license. The thesaurus can thus be integrated into other applications, for example to create mashups or to index documents. The thesaurus is available in English and German.
The thesaurus is constantly maintained and expanded by the Semantic Web Company in cooperation with the Open Knowledge Forum Austria.
We warmly invite you to help manage this thesaurus. Please contact Andreas (a.blumauer@semantic-web.at) to get signed up.
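One of the uses mentioned above, indexing documents with a bilingual thesaurus, can be sketched in plain Python; the concept IDs and labels below are invented for the example and are not taken from the actual Open Data Thesaurus:

```python
# Hypothetical bilingual thesaurus: concept IDs with English and German labels.
thesaurus = {
    "ex:openData": {"en": "open data", "de": "offene Daten"},
    "ex:mashup": {"en": "mashup", "de": "Mashup"},
}

def index(text, concepts):
    """Return IDs of concepts whose label (in any language) occurs in the text."""
    text = text.lower()
    return sorted(
        cid for cid, labels in concepts.items()
        if any(label.lower() in text for label in labels.values())
    )

print(index("Die Stadt publiziert offene Daten als Mashup-Quelle.", thesaurus))
```

Because the concept IDs (rather than the language-specific labels) are what get attached to documents, English and German texts end up indexed under the same concepts.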

The "Open Data Thesaurus" serves as an entry point into the discussion around the field of "open data": it offers a collection of central terms and organisations, their definitions, semantic links and further reading. Following the principle of "eat your own dog food", the thesaurus is available under a Creative Commons license and is provided in machine-readable form based on open W3C standards.
The thesaurus can thus be embedded into other applications, e.g. to create mashups or to index documents.
The Open Data Thesaurus is maintained and continuously expanded by the Semantic Web Company in cooperation with OGD Austria. The thesaurus is available in English and German.

"Especially in the phase when a topic is becoming established, thesauri are a valuable resource, because misunderstandings can be resolved more quickly, and unambiguous anchor points around often still fuzzy definitions of terms can emerge faster."

I-Semantics, September 2010, Graz/Austria

Call for Papers

I-Semantics 2010: 6th International Conference on Semantic Systems
Graz, Austria, 1 – 3 September 2010 http://www.i-semantics.at

Call for Submission
3rd Triplification Challenge

Scope
I-SEMANTICS 2010 (www.i-semantics.at) is the 6th conference in the I-SEMANTICS series and provides a forum for academic and industrial research & development that focuses on semantic technologies and the Semantic Web. I-SEMANTICS 2010 will bring together researchers and practitioners in the areas of Linked Data, Social Software and the Semantic Web in order to present and develop innovative ideas that help realise the "Social Semantic Web" and the "Corporate Semantic Web".
I-SEMANTICS 2010 will host this year's regional Pragmatic Web Conference as well as the third edition of the TRIPLIFICATION Challenge. Furthermore, I-SEMANTICS will be complemented by I-KNOW (www.i-know.at), the 10th International Conference on Knowledge Management. This setup aims to reflect the increasing importance and convergence of knowledge management and semantic systems.

Important Dates
• Paper Submission Deadline: 8 March 2010
• Notification of Acceptance: to be announced
• Camera-Ready Paper: to be announced
• I-SEMANTICS 2010: 1–3 September 2010

Why the term “Linking Open Data” might be misleading

A lot of activity around Linking Open Data ("LOD") and the associated data sets, which are nicely visualised as a "cloud", has been going on for quite a while now. It is exciting to see how the rather academic "Semantic Web" and all the work associated with this disruptive technology can now be transformed into real business use cases.

What I have observed in the last few months, especially in business communities, is the following:

  • "Linked Data" sounds interesting to business people because the phrase creates a lot of associations within a second or two; the database crowd also seems to be attracted by this web-based approach to data integration
  • "Web of Data" is somewhat misleading because many people think it will be a new web that replaces something else. The same goes for the "Semantic Web"
  • "Linking Open Data" sounds dangerous to many companies

For insiders it is clear that the "openness" of data, especially in commercial settings, can be controlled, and in many cases has to be. That means it can be one-way or mutual. In some use cases, data from companies will be put into the cloud and can be opened up for many purposes; in others it will stay inside the boundaries. In yet other scenarios, only (open) data from the web will be consumed and linked with corporate data, but no data will be exposed to the world (except the fact that data was consumed by an entity).

And of course: on many other occasions, datasets and repositories will be opened up partly, depending on the Creative Commons licences and the underlying privacy regulations one wants to apply.

This makes clear that LOD / Linking Open Data is just one detail of a bigger picture. Since companies (and governments) play a crucial role in developing the whole infrastructure, we need to draw a new picture:

[Image: the Linked Data world]

I'll be happy to have a lively discussion about this topic at the first Linked Data Camp in Vienna as well.

Linked Data for Enterprises – a one way scenario?

Since the Linked Data paradigm is attracting a lot of attention, also in many industries that aren't used to internet-driven business processes, I wonder what the best cases are for them to build parts of their information management strategy on top of the "Web of Data".

It's quite obvious how the media industry, or internet companies like Amazon, can use Linked Data, but what's in it for, let's say, an insurance company, once it comes to the point where it should expose parts of its data to the "Web of Data"?

One point to start the line of argument with is: "Why do you have a website at all?"

Does resistance have something to do with the ability to "control" a traditional HTML website? Because as soon as mashups play a role, it's not as easy as it was before to see

  • who,
  • when and
  • why

someone uses one's data.

On the other hand, is OpenCalais really "open"? It's not possible to use the services without an API key. So all the questions above can be monitored even better than before, as soon as registered users are using your LOD services. OpenCalais' way of serving customers might very soon become a model for the business layer above the LOD cloud in general.

Since the LOD cloud is the kind of basic infrastructure that drives the whole process, this layer should remain freely accessible. But how could new business models be built on top of it, models that constantly spend money on maintaining and extending the underlying infrastructure?

Where could enterprises start using Linked Data? Only by retrieving data from the "outside" and mashing it up with the "inside" – only one way?
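That one-way scenario can be sketched in a few lines: internal records are enriched from an open dataset, and nothing internal is exposed. The customer record and the "open" facts below are invented for the sketch (only the `dbpedia:` prefix echoes a real data source):

```python
# One-way Linked Data: enrich internal records from open data,
# publish nothing. All records below are made up for this sketch.
internal_customers = [{"name": "ACME GmbH", "city": "dbpedia:Vienna"}]
open_city_facts = {"dbpedia:Vienna": {"population": 1900000, "country": "Austria"}}

enriched = [
    {**record, **open_city_facts.get(record["city"], {})}
    for record in internal_customers
]
print(enriched[0])
```

The shared identifier (`dbpedia:Vienna`) does all the work: the enterprise consumes the open side of the link without ever exposing its own records.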

I think we can learn a lot from one of the "oldest" networks on earth: the energy business is restructuring a lot at the moment, and it foresees many more "mashups" than before – even single households with their own little plants will be able to PRODUCE energy, not only consume it.

This has advantages for many stakeholders (except the shareholders), so what does it mean for the internet business – when will it transform from a centralised network into a more decentralised one? What role will the huge "information plants" like Google play?

Information no longer has a "place", whereas energy can't be shipped around the world. We should rethink the meaning of a "data store", and information will flow without flooding us. Linked Data might become the essence.
