Linked Data – The Next 5 Years: From Hype to Action

Linked Data and the Semantic Web have been around for quite a while and have been hyped again and again. In the meantime, a large number of enterprises and even whole industries have adopted semantic web technologies for various purposes (the Allotrope Foundation is one example). “Gartner’s Hype Cycle 2015 for Advanced Analytics and Data Science” has put Linked Data into the trough of disillusionment, which is another clear indicator that it is ready for takeoff.

[Figure: Gartner 2015 Hype Cycle for Advanced Analytics and Data Science]

The pace of semantic web technology adoption may vary from industry to industry, but on average it has increased even more than expected. As recently as 2012, Gartner predicted that the Semantic Web would not reach the plateau of productivity within the next 10 years; only three years later, it seems it will be there in 5 to 10 years.

[Figure: Gartner 2012 Hype Cycle for Big Data]

Hype or not, Linked Data has entered the adoption phase. Over the next 5 years we will finally see to what degree enterprises use semantic web technologies for data analytics, data integration, and knowledge discovery.

What are the main obstacles that potential users frequently mention? Which best practices for implementing linked data on a larger scale have already been developed? What are the ‘low-hanging fruits’, and what could a concrete action plan look like? Will the often-predicted interlinking of an open semantic web and corporate semantic webs take place? Which other technologies (from the hype cycles mentioned above) might play a crucial role as enablers for enterprise linked data? Which other (mega-)trends will influence the pace of linked data adoption, and which related organisational challenges should be expected?

Please attend Andreas Blumauer’s talk ‘Linked Data – The Next 5 Years: From Hype to Action’ at SEMANTiCS 2016 in Leipzig to get some valuable ideas for your Linked Data project!


SPARQL is the new King of all Data Scientist’s tools

Inspired by the development of semantic technologies in recent years, the traditional methodology of designing, publishing and consuming statistical datasets in the field of statistical analysis is evolving into so-called “Linked Statistical Data”, which associates semantics with dimensions, attributes and observation values based on Linked Data design principles.

The representation of datasets is no longer a combination of magic words and numbers. Everything becomes meaningful when URIs take their place as dereferenceable resources, which in turn establishes the relations between resources implicitly and automatically. Datasets are no longer isolated, and all of them share a globally, uniquely and uniformly defined structure.

With the “RDF Data Cube Vocabulary”, there is already a W3C Recommendation available for linked statistical data. At this point, it is time to start building data-oriented applications and services with traditional statistical computing languages such as R, while benefiting from the semantic power of the SPARQL query language.

Most statistical analysis functions are set operations performed on subsets of a dataset (e.g., a slice or a facet). Calculation is dull machine work; how to group the data and create a subset is where the real innovation in analytics lies. Thanks to SPARQL, this subset can now be created from a semantic perspective rather than a purely mathematical or statistical one.
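To make the idea of a “semantic” subset a bit more tangible, here is a minimal sketch in plain Python rather than SPARQL or R. It assumes a handful of invented observations loosely modelled on the dimension/measure idea of the RDF Data Cube Vocabulary; the `ex:` names, the `located_in` lookup and all figures are hypothetical. The subset is chosen by meaning (the region lies in a given country), and the calculation itself is the dull part.

```python
from statistics import mean

# Hypothetical observations: two dimensions (region, year) and one measure (rate).
# All identifiers and numbers are invented for illustration only.
observations = [
    {"region": "ex:Vienna", "year": 2013, "rate": 8.0},
    {"region": "ex:Vienna", "year": 2014, "rate": 9.0},
    {"region": "ex:Graz", "year": 2014, "rate": 6.0},
    {"region": "ex:Leipzig", "year": 2014, "rate": 10.0},
]

# Background knowledge that lives outside the statistical dataset itself.
located_in = {
    "ex:Vienna": "ex:Austria",
    "ex:Graz": "ex:Austria",
    "ex:Leipzig": "ex:Germany",
}


def slice_by_country(obs, country, year):
    """Select a subset by meaning: regions located in `country`, for one year."""
    return [
        o for o in obs
        if located_in.get(o["region"]) == country and o["year"] == year
    ]


austria_2014 = slice_by_country(observations, "ex:Austria", 2014)
average_rate = mean(o["rate"] for o in austria_2014)
print(average_rate)  # 7.5
```

In a real setting the selection step would be a SPARQL query against an endpoint, and only the resulting subset would be handed to the statistics package.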

Compared to the traditional way of filtering with SQL queries, SPARQL queries eliminate the boundaries between datasets and between databases.

An example query could be “list the bestsellers of a supermarket for the category science fiction movie in the year 2014”. Someone may point out that this is also feasible with SQL if the database schema contains all the relevant fields. That is absolutely correct. But what if there are more conditions, such as “during weekends, directed by an American director, starring European actors”? Is it necessary for a supermarket to maintain such datasets? Assume that there is a supermarket whose boss is a movie fan and who would like to maintain such data, and that SQL is working perfectly so far. Can we reuse this query, and with it the Web application, for another supermarket? Here we have good reasons to use SPARQL: any accessible resource can be used to construct the query results.
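The supermarket scenario can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the triples, the `ex:` identifiers and the helper functions `objects` and `bestsellers` are all made up, and a real implementation would send a SPARQL query (possibly a federated one using the `SERVICE` keyword) to one or more endpoints. The point is that the supermarket only keeps its sales facts, while the movie knowledge comes from somewhere else and is simply merged in.

```python
# Local sales facts (hypothetical), written as (subject, predicate, object) triples.
sales = {
    ("ex:sale1", "ex:product", "ex:MovieA"),
    ("ex:sale1", "ex:year", 2014),
    ("ex:sale1", "ex:units", 120),
    ("ex:sale2", "ex:product", "ex:MovieB"),
    ("ex:sale2", "ex:year", 2014),
    ("ex:sale2", "ex:units", 300),
}

# External movie facts that the supermarket does not maintain itself (hypothetical).
movies = {
    ("ex:MovieA", "ex:genre", "ex:ScienceFiction"),
    ("ex:MovieA", "ex:director", "ex:DirectorX"),
    ("ex:DirectorX", "ex:nationality", "ex:USA"),
    ("ex:MovieB", "ex:genre", "ex:ScienceFiction"),
    ("ex:MovieB", "ex:director", "ex:DirectorY"),
    ("ex:DirectorY", "ex:nationality", "ex:France"),
}

# Merging two graphs is just a set union of their triples.
graph = sales | movies


def objects(subject, predicate):
    """Return all objects for a given subject and predicate."""
    return [o for (s, p, o) in graph if s == subject and p == predicate]


def bestsellers(genre, year, nationality):
    """Products of a genre sold in a year whose director has the given nationality."""
    rows = []
    for (sale, predicate, product) in graph:
        if predicate != "ex:product":
            continue
        if year not in objects(sale, "ex:year"):
            continue
        if genre not in objects(product, "ex:genre"):
            continue
        directors = objects(product, "ex:director")
        if not any(nationality in objects(d, "ex:nationality") for d in directors):
            continue
        rows.append((product, objects(sale, "ex:units")[0]))
    # Highest number of units sold first.
    return sorted(rows, key=lambda row: row[1], reverse=True)


result = bestsellers("ex:ScienceFiction", 2014, "ex:USA")
print(result)  # [('ex:MovieA', 120)]
```

Another supermarket could reuse `bestsellers` unchanged by swapping in its own `sales` triples, because the movie facts are not tied to any one shop's schema.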

SPARQL is the new King of all Data Scientist’s tools because …

  1. SPARQL is close to how human beings actually think about the world.
  2. With SPARQL you can query knowledge graphs.
  3. SPARQL is based on knowledge models that can combine mindsets of subject-matter experts, data engineers and information architects.
  4. SPARQL is to the Semantic Web and the Web in general what SQL is to relational databases.
  5. SPARQL is a W3C recommendation and is supported by many different database vendors, so it doesn’t cause the lock-in effects we’ve become used to with the various SQL dialects (which, despite the SQL standard, differ considerably from vendor to vendor).
  6. With SPARQL you benefit from the potential to make a collection of data sources look and query like one big database.
  7. SPARQL provides pattern-based search functionality. With such search capabilities you can discover unknown linkages or non-obvious patterns that give you new insights into your data.
  8. Not only is SPARQL a standardized query language; access via web interfaces is standardized as well (such an interface is called a SPARQL endpoint). This makes the integration of different data sources a lot easier.
  9. SPARQL is also a standardized update and graph traversal language.
  10. SPARQL is a standardized protocol producing standardized results, which makes it a complete API and relieves developers of the need to reinvent an API with every single application.
  11. With SPARQL you can query over structured and unstructured information as a whole.
  12. SPARQL allows you to explore data. In contrast to traditional ways of querying databases, where knowledge about the database schema and content is necessary, SPARQL allows you to ask “tell me what is there”.
  13. SPARQL property paths offer completely new ways to explore a data set, e.g. by detecting ‘hidden links’ between business objects.
  14. With SPARQL you can define inference rules to gain new information from existing facts.
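Points 12 and 13 are perhaps the easiest to illustrate. The following is a rough Python sketch, not SPARQL itself: the triples and `ex:` names are invented, and `reachable` is a hypothetical helper that only mimics the spirit of a one-or-more property path such as `ex:partOf+`. The first step asks “what is there” without any schema knowledge; the second follows one relation transitively to uncover an indirect link.

```python
# A tiny hypothetical graph of business objects as (subject, predicate, object) triples.
triples = {
    ("ex:Bolt", "ex:partOf", "ex:Engine"),
    ("ex:Engine", "ex:partOf", "ex:Car"),
    ("ex:Car", "ex:soldTo", "ex:CustomerA"),
}

# "Tell me what is there": list the predicates in use without knowing a schema.
# Roughly analogous to: SELECT DISTINCT ?p WHERE { ?s ?p ?o }
predicates = sorted({p for (_, p, _) in triples})


def reachable(start, predicate):
    """Follow one predicate transitively from `start` (one or more hops)."""
    found, frontier = set(), {start}
    while frontier:
        # Only keep nodes not seen before, so cycles cannot loop forever.
        next_nodes = {
            o for (s, p, o) in triples if p == predicate and s in frontier
        } - found
        found |= next_nodes
        frontier = next_nodes
    return found


print(predicates)                                  # ['ex:partOf', 'ex:soldTo']
print(sorted(reachable("ex:Bolt", "ex:partOf")))   # ['ex:Car', 'ex:Engine']
```

The link between the bolt and the car is never stated directly in the data; it only appears once the path is followed, which is the kind of ‘hidden link’ meant above.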

Linked Data meets Data Science

As a long-term member of the Linked Data community, which has evolved from W3C’s Semantic Web, I have found the latest developments around Data Science more and more attractive due to their complementary perspectives on similar challenges. Both disciplines work on questions like these:

  • How to extract meaningful information from large amounts of data?
  • How to connect pieces of information to other pieces in order to generate ‘bigger pictures’ of sometimes complex problems?
  • How to visualize complex information structures in a way that decision-makers benefit from it?

Two complementary approaches
When taking a closer look at the approaches taken by those two ‘schools of advanced data management’, one aspect becomes obvious: both try to develop models in order to be able to ‘codify and to calculate the data soup’.

While Linked Data technologies are built on top of knowledge models (‘ontologies’), which try first of all to describe data in distributed environments like the web, Data Science methods are mainly based on statistical models. One could say: ‘Causality and Reasoning over Distributed Data’ meets ‘Correlation and Machine Learning on Big Data’.

Graph databases are key to success
Despite this supposed contradiction, correlations and complementarities between those two disciplines prevail. Both approaches seek solutions to overcome the problem of rigid data structures, which can hardly adapt to the needs of dynamic knowledge graphs. Whenever relational databases cannot fulfill requirements for performance and simplicity due to the complexity of database queries, graph databases can be used as an alternative.

Thus, both disciplines make use of these increasingly popular database technologies: while Linked Data can be stored and processed by standards-based RDF stores like Virtuoso, MarkLogic, GraphDB or Sesame, the most popular graph databases for Data Scientists are mainly based on the property graph model, for example Titan or Neo4J. Some vendors, like Bigdata, even support both graph models.

Both graph models are similar and can be mapped to each other, but they try to solve slightly different problems:

  • the property graph model better serves the needs of Graph Data Analysts (e.g. for Social Network Analysis or for real-time recommendations)
  • RDF graph databases are great when distributed information sources should be linked to each other and mashed together (e.g. for Dynamic Semantic Publishing or for context-rich applications).
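How the two graph models can be mapped to each other is easiest to see in a toy example. The sketch below is a simplified assumption, not how any particular database does it: `nodes`, `edges`, the `ex:` prefix and the function `to_triples` are all hypothetical. Node properties become triples with literal objects, node labels become `rdf:type` statements, and each edge becomes a triple between two resources. Properties on edges are deliberately left out, since they need extra modelling on the RDF side.

```python
# A minimal property graph: nodes carry a label and key/value properties,
# edges carry a relationship type. All names are invented for illustration.
nodes = {
    "n1": {"label": "Person", "name": "Alice"},
    "n2": {"label": "Person", "name": "Bob"},
}
edges = [("n1", "KNOWS", "n2")]


def to_triples(nodes, edges, base="ex:"):
    """Map the property graph to (subject, predicate, object) triples."""
    triples = []
    for node_id, props in nodes.items():
        subject = base + node_id
        for key, value in props.items():
            if key == "label":
                # The node label plays the role of an RDF class.
                triples.append((subject, "rdf:type", base + value))
            else:
                # Other properties become triples with literal values.
                triples.append((subject, base + key, value))
    for source, rel_type, target in edges:
        # An edge becomes a triple linking two resources.
        triples.append((base + source, base + rel_type.lower(), base + target))
    return triples


rdf = to_triples(nodes, edges)
for triple in rdf:
    print(triple)
```

Going the other way works along the same lines: triples with literal objects turn into node properties, and triples between resources turn into edges.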

Connect both approaches and combine methods
I can see at least two options where methods from Data Science will benefit from Linked Data technologies and vice versa:

  • Machine learning algorithms benefit from the linking of various data sets by using ontologies and common vocabularies as well as reasoning, which leads to a broader data basis with (sometimes) higher data quality
  • Linked Data based knowledge graphs benefit from Graph Data Analyses to identify data gaps and potential links (one example is a semantic knowledge graph about ‘Data Science’).
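As a separate, purely hypothetical illustration of the second option, the following Python sketch uses one of the simplest graph analysis heuristics, counting shared neighbours, to suggest links that might be missing from a knowledge graph. The concepts, the `links` set and the threshold `min_shared` are assumptions made up for this example; real projects would typically use a graph library and more robust measures.

```python
from itertools import combinations

# Hypothetical knowledge graph: undirected "related" links between concepts.
links = {
    ("ex:DataScience", "ex:MachineLearning"),
    ("ex:DataScience", "ex:Statistics"),
    ("ex:LinkedData", "ex:MachineLearning"),
    ("ex:LinkedData", "ex:Statistics"),
    ("ex:LinkedData", "ex:SPARQL"),
}

# Build an adjacency map so each concept knows its neighbours.
neighbours = {}
for a, b in links:
    neighbours.setdefault(a, set()).add(b)
    neighbours.setdefault(b, set()).add(a)


def candidate_links(min_shared=2):
    """Suggest unlinked concept pairs that share at least `min_shared` neighbours."""
    candidates = []
    for a, b in combinations(sorted(neighbours), 2):
        if b in neighbours[a]:
            continue  # already linked, nothing to suggest
        shared = neighbours[a] & neighbours[b]
        if len(shared) >= min_shared:
            candidates.append((a, b, len(shared)))
    return candidates


suggestions = candidate_links()
print(suggestions)
# [('ex:DataScience', 'ex:LinkedData', 2), ('ex:MachineLearning', 'ex:Statistics', 2)]
```

Such suggestions are only candidates; a subject-matter expert would still decide whether a proposed link actually belongs in the knowledge graph.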

Questions on the use of Linked Data in businesses
We want to learn more about the opinions of stakeholders working in different industry verticals on the status of Linked Data technologies. The main question is: Is Linked Data perceived as mature enough to be used on a large scale in enterprises? The results will contribute to the development of the Linked Data market by reporting how enterprises currently think.


SKOS as a Key Element in Enterprise Linked Data Strategies

The challenges in implementing linked data technologies in enterprises are not limited to technical issues. Such projects also face organisational hurdles, for instance the development of employee skills in the area of knowledge modelling and the implementation of a linked data strategy that provides for a cost-effective and sustainable infrastructure of high-quality, linked knowledge graphs. SKOS is able to play a key role in enterprise linked data strategies due to its relative simplicity combined with its ability to be mapped to and extended by other controlled vocabularies, ontologies, entity extraction services and linked open data.
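To give an impression of how little structure a SKOS vocabulary needs, here is a small Python sketch. It uses the actual SKOS property names (`skos:prefLabel`, `skos:altLabel`, `skos:broader`, `skos:exactMatch`), but everything else is an assumption for illustration: the two concepts, the `example.org` mapping target and the naive `tag` function, which stands in for a real entity extraction service.

```python
# Two hypothetical SKOS concepts, written as plain dictionaries.
concepts = {
    "ex:semantic-web": {
        "skos:prefLabel": "Semantic Web",
    },
    "ex:linked-data": {
        "skos:prefLabel": "Linked Data",
        "skos:altLabel": ["LOD"],
        "skos:broader": "ex:semantic-web",
        # Mapping to a concept in another vocabulary (made-up URI).
        "skos:exactMatch": "http://example.org/external/linked-data",
    },
}


def tag(text):
    """Return the concepts whose preferred or alternative label occurs in `text`.

    Naive case-insensitive substring matching, for illustration only.
    """
    lowered = text.lower()
    hits = []
    for concept_id, concept in concepts.items():
        labels = [concept["skos:prefLabel"], *concept.get("skos:altLabel", [])]
        if any(label.lower() in lowered for label in labels):
            hits.append(concept_id)
    return sorted(hits)


tags = tag("Our LOD strategy builds on the Semantic Web")
print(tags)  # ['ex:linked-data', 'ex:semantic-web']
```

Labels drive the text matching, `skos:broader` gives a simple hierarchy, and `skos:exactMatch` is the hook for linking the enterprise vocabulary to other vocabularies or linked open data.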
