Featured

What is an Active Metadata Hub?

An active metadata hub – in other words, metadata middleware – uses a data catalog augmented by knowledge graphs and ML to enable the orchestration, interconnection, and enrichment of originally passive metadata. This means not only connecting and capturing metadata from a variety of data sources, but also integrating with other data management tools so that all metadata can be enriched and shared across silos via an active metadata hub that then becomes the authoritative source of metadata across the enterprise.

To make the most of this comprehensive and high-quality metadata treasure trove across a wide range of business processes, an active metadata hub should form the core of an open data ecosystem, be easily accessible via direct integrations and APIs, and thus integrate easily with the existing technology stack.

What was conceived in the early 2000s by Tim Berners-Lee at the W3C as the “Semantic Web” for the WWW is now manifesting itself as the Corporate Semantic Web, where active metadata hubs are at the core, in numerous enterprises worldwide.

Gartner refers to “Active Metadata” when, among other things, content analysis tools are used to derive missing metadata and evaluate content change patterns or data profiling tools are used to derive missing metadata and data value change patterns.

The PoolParty Semantic Suite is one of the world’s leading enterprise metadata platforms based on semantic graph principles.

With the end-to-end implementation of a metadata management system based on graph technologies and ML, PoolParty has pursued the vision of ‘Active and Augmented Metadata’ from the very beginning, even though the term was not yet explicitly discussed at the start of development. The vision of self-service data integration is linked to the possibilities of being able to semantically describe metadata with the help of meaningful ontologies and automatically link them with the help of ML procedures, in short: to be able to consistently describe contexts in order to be able to introduce, among other things, intelligent recommender systems or complex classifiers (e.g. for sense extraction from text).

For example, PoolParty can generate active metadata that subsequently triggers activity by combining “passive” metadata with content metadata that can be automatically generated by entity extraction and contextualized or interpreted using a knowledge graph.

In all cases, knowledge graphs play the central role, linking individual metadata, contextualizing them and drawing conclusions from them, which in turn initiate concrete next steps in a workflow.

Six Core Aspects of Semantic AI

Screen Shot 2018-05-07 at 11.32.29

Hybrid approach: Semantic AI is the combination of methods derived from symbolic AI and statistical AI. Virtuously playing the AI piano means that for a given use case various stakeholders, not only data scientists, but also process owners or subject matter experts, choose from available methods and tools, and collaboratively develop workflows that are most likely a good fit to tackle the underlying problem. For example, one can combine entity extraction based on machine learning with text mining methods based on semantic knowledge graphs and related reasoning capabilities to achieve the optimal results.
Data Quality: Semantically enriched data serves as a basis for better data quality and provides more options for feature extraction. This results in higher precision of prediction & classification calculated by machine learning algorithms. Example: PoolParty Semantic Classifier
Data as a Service: Linked data based on W3C Semantic Web standards can serve as an enterprise-wide data platform and helps to provide training data for machine learning in a more cost-efficient way. Instead of generating data sets per application or use case, high-quality data can be extracted from a knowledge graph or semantic data lake. Through this standards-based approach, also internal data and external data can be linked with little effort and can be used as a rich data set for any machine learning task.
See also: The Knowledge Graph as the Default Data Model for Machine Learning
Structured data meets text: Most machine learning algorithms work well either with text or with structured data, but those two types of data are rarely combined to serve as a whole. Semantic data models bridge the gaps between most used data formats such as XML, relational data, CSV or also unstructured text when using NLP and text mining methods. This allows us to link data across heterogeneous data sources to provide data objects as training data sets which are composed of information from structured data and text at the same time.
See also: Leveraging Taxonomy Management with Machine Learning
No black-box: In sharp contrast to AI technologies that ‘work like magic’, where only a few experts really understand the underlying techniques, Semantic AI seeks to provide an infrastructure to overcome information asymmetries between the developers of AI systems and other stakeholders, including consumers and policymakers. Semantic AI ultimately leads to AI governance that works on three layers: technically, ethically, and on the social and legal layer.
See also: A Layered Model for AI Governance and Explainable Artificial Intelligence
Towards self optimizing machines: Semantic AI is the next-generation Artificial Intelligence. Machine learning can help to extend knowledge graphs (e.g., through ‘corpus-based ontology learning’ or through graph mapping based on ‘spreading activation’) and, in return, knowledge graphs can help to improve ML algorithms (e.g., through ‘distant supervision’). This integrated approach ultimately leads to systems that work like self optimizing machines after an initial setup phase, while being transparent to the underlying knowledge models. Graph Convolutional Networks (in progress) promise new insights.
See also: How do we capture structure in relational data? and Snorkel: A System for Fast Training Data Creation

Download White Paper: Introducing Semantic AI – Ingredients for a sustainable Enterprise AI Strategy

Introducing PoolParty GraphSearch: Cognitive Search based on Graphs

What is PoolParty GraphSearch?

With PoolParty GraphSearch companies can search over a variety of content types and business objects and analyze the data on a more granular level. All content and data repositories that are connected to GraphSearch are annotated with semantic metadata that makes the search, recommendation and analytics operations highly precise. GraphSearch is a front-end application put on top of a semantic infrastructure and an API providing the following features:

Ontology-based data access (OBDA)
Faceted search including hierarchies
Autocomplete combined with context information
Custom views on entity-centric and document-centric data
Statistical charts for the unified data repositories
Plug-in system for recommendation and similarity algorithms

How does it work?

Business users query knowledge assets in GraphSearch along data models. As multiple systems can be connected to GraphSearch, the variety of knowledge models are additionally linked by an ontology layer.

System administrators can define which part of the ontology and corresponding entities in the taxonomy should be used in the GraphSearch front-end application. That way, they define specific views on data. They can also provide multiple search spaces within GraphSearch and enable the user to switch between them. A search space is a customized search configuration over a specific data set. The selected search facets for each search space are derived from the knowledge graph.

GraphSearch can be enhanced with recommendation algorithms. These can work with similarity-based recommendations, or for some use cases, a matchmaking algorithm is more suitable. The research team of Semantic Web Company has a strong focus on machine learning and is continuously extending the library of machine learning algorithms in GraphSearch.

Data analytics functionalities support the business user to derive even more granular insights. Search facets can be combined into statistical charts and display which kind of data is actually available for specific topics.

Agile Data Management and Integration

The implementation of PoolParty GraphSearch is the beginning of consolidating data silos without data migration. Various functional roles have to work together in order to deliver a unified data environment. PoolParty takes the heterogeneous technical backgrounds of the involved professionals into consideration.

Specific user-friendly solutions support the whole knowledge management team in their collaborative work processes:

Subject matter experts can define a semantic data layer to describe the meaning of metadata in the PoolParty taxonomy management tool.
Knowledge engineers can link separate taxonomies and maintain the knowledge graph in the same tool.
Information architects and developers can link various content and data repositories with the semantic metadata via the PoolParty API.
Data scientists can adapt embedded machine-learning algorithms to finetune the search, classification, and recommendation results that are mainly derived through the knowledge graph.
This semi-automatic knowledge engineering approach sustains that the query results will gradually get more precise and applicable to a continuously growing data environment.
On top of that, GraphSearch enables business users to search over data repositories and analyze available information.

Want to learn more?

Download White Paper (PDF): The Next Generation of Knowledge Discovery Enabled by the Fusion of Search, Recommendation and Analytics Enabled by the Fusion of Search, Recommendation and Analytics
Try it out: http://vocabulary.semantic-web.at/GraphSearch/

Recap SEMANTiCS 2017 (Amsterdam)

This was SEMANTICS 2017: Reaching out to new industries, vendors, and establishing good relations to neighbouring communities. These were the reasons why to bring SEMANTiCS conference series to Amsterdam.

The feedback and the pure figures show the success of that effort. Exactly 370 attendees joined the SEMANTiCS Conference in Amsterdam. This growth by 33% shows that SEMANTiCS meets a growing demand for such gatherings in industry and research.

The 13th edition of SEMANTiCS combined researchers, professionals, and practitioners from across the globe – from a total of 28 different countries. Most visitors and participants came from the Netherlands, Germany, Austria, UK and Belgium.

https://2017.semantics.cc/was-semantics-2017

Linked Data – The Next 5 Years: From Hype to Action

Linked Data and the Semantic Web have been around for quite a while and have been hyped again and again. In the meantime, a large number of enterprises and even whole industries have adopted semantic web technologies for several purposes (for example, visit Allotrope Foundation). “Gartner’s Hype Cycle 2015 for Advanced Analytics and Data Science” has put Linked Data into the trough of disillusionment, which is another clear indicator to be ready for takeoff.

The pace of semantic web technology adoption may vary from industry to industry, but in average it has increased even more than expected. Just in 2012, Gartner has predicted that the Semantic Web won’t reach the plateau of productivity within the next 10 years, only three years later it seems like it will be there in 5 to 10 years.

Linked Data Hype or not, it has entered the adoption phase. In the next 5 years we finally can see to which degree enterprises will use semantic web technologies for data analytics, data integration, and knowledge discovery.

What are the main obstacles that are frequently mentioned by potential users? Which best practices for implementing linked data on a larger scale have already been developed? What are the ‘low-hanging fruits’, and how could a concrete action plan look like? Will the often predicted interlinking of an open semantic web and corporate semantic webs take place? Which other technology (of the above mentioned hype cycles) might play a crucial role as an enabler for enterprise linked data? Which other (mega-)trends will influence the pace of linked data adoption, and which related organisational challenges should be expected?

Please visit Andreas Blumauer’s talk ‘Linked Data – The Next 5 Years: From Hype to Action’ at SEMANTiCS 2016 in Leipzig to get some valuable impulses for your Linked Data project!

SPARQL is the new King of all Data Scientist’s tools

Inspired by the development of semantic technologies in recent years, in statistical analysis field the traditional methodology of designing, publishing and consuming statistical datasets is evolving to so-called “Linked Statistical Data” by associating semantics with dimensions, attributes and observation values based on Linked Data design principles.

The representation of datasets is no longer a combination of magic words and numbers. Everything is becoming meaningful when URIs replace their positions as dereferencable resources, which further establishes the relations between resources implicitly and automatically. Different datasets are no longer isolated and all datasets share a globally, uniquely and uniformly defined structure.

With “RDF Data Cube Vocabulary” (http://www.w3.org/TR/vocab-data-cube/) there is already a W3C Recommendation available for linked statistical data . At this point, it is time to start building data-oriented applications and services with the traditional statistical computing languages such as R, while benefiting from the omnipotent semantic power of the SPARQL query language.

Most of the statistical analysis functions are set operations performed on subsets of a dataset (i.e., a slice, a facet, etc.). Calculation is dull machine work but how to group and create a subset is actually the innovation point to produce analytics. Thanks to SPARQL, now this subset can be created from a semantic perspective instead of mathematics and statistics.

Compared to the traditional filtering way using SQL queries, SPARQL queries eliminate the boundaries of data among datasets and among databases.

An example query can be “list the bestsellers of a supermarket for category science fiction movie in year 2014”. Someone may point out that it is also feasible with SQL if the database schema consists of all relevant fields. Well, it is absolutely correct. But what if there are more conditions such as “during weekends, directed by an American director, casted by European actors”? Is it necessary for a supermarket to maintain such data sets? Assume that there is a supermarket of which the boss is a movie fans and he would like to maintain such data and SQL is working perfectly so far. Can we reuse this query, and accordingly the Web application for another supermarket? Here we have good reasons why we use SPARQL. Any accessible resource can be used to construct the query results.

SPARQL is the new King of all Data Scientist’s tools because …

SPARQL is close to how human beings actually think about the world.
With SPARQL you can query knowledge graphs.
SPARQL is based on knowledge models that can combine mindsets of subject-matter experts, data engineers and information architects.
SPARQL is to the Semantic Web and the Web in general what SQL is to relational databases.
SPARQL is a W3C recommendation and is supported by many different database vendors, so it doesn’t cause lock-in effects as we’ve become used to with various types of SQL engines (which are not standardized at all).
With SPARQL you benefit from the potential to make a collection of data sources look and query like one big database.
SPARQL provides pattern based search functionality. With such search capabilities you can find out unknown linkages or non-obvious patterns that give you new insights into your data.
Not only is SPARQL a standardized query language, also the access via web interfaces is standardized (this is called a SPARQL endpoint). This makes the integration of different data sources a lot easier.
SPARQL is also a standardized update and graph traversal language.
SPARQL is a standardized protocol producing standardized results, thus making it a complete API alleviating developers from the necessity to reinvent an API with every single application.
With SPARQL you can query over structured and unstructured information as a whole.
SPARQL allows you to explore data. In contrast to traditional ways to query databases where knowledge about the database schema/content is necessary SPARQL allows you to ask “tell me what is there”.
SPARQL property paths offer completely new ways to explore a data set, e.g. by detecting ‘hidden links’ between business objects.
With SPARQL you can define inference rules to gain new information from existing facts.

Three shades of metadata

Search terms are like keys to documents or to any other type of content. In our days, not only the volume of available documents is increasing rapidly, but also the size of the keychain is growing.

Before we start to discuss how to organize/index the documents themselves, we should first talk about methods to organize the keys.

Take a look at these three approaches:

On the left-hand side you can see a nice example of implicit semantics. The old receptionist knows exactly which key fits into which lock. This knowledge is implicit and can be remembered by the shapes of the keys. Before the old receptionist has retired and the knowledge would have gone, labels have been attached to every key. The semantics of the keys has become more explicit, but this is still quite ambiguous (as visualised in the center column). There is no index of all the labels and colours being used, neither an explicit methodology how new keys should be labeled. As the hotel grows, the labeling system becomes quickly a mess. On the right-hand side the solution for this problem is offered: Not only the semantics of the keys becomes more explicit, but also the semantics of the semantics. For instance, the position of a key represents the position of the room, which can be unlocked by this. The number of the row of the key cabinet represents the storey of the room being unlocked, etc.

This methodology in order to organise keys helps to orientate and to remember; it can be explained with low effort to any new receptionist and it can be scaled-up in case your hotel should grow in the next few years.

Why semantic knowledge graphs matter

Most information professionals already know: separation of content and presentation helps to manage and deliver complex information. This can only be done by using enriched structured content. Some call this intelligent content.

But why exactly is metadata per document (some call it “tagging”) not enough?

Here is a very brief slide-deck, which explains the difference between the traditional approach and the graph-based approach to develop not only a metadata layer seperated from the content layer, but also a knowledge layer on top of it.

Why semantic knowledge graphs matter

Linked Data meets Data Science

As a long-term member of the Linked Data community, which has evolved from W3C’s Semantic Web, the latest developments around Data Science have become more and more attractive to me due to its complementary perspectives on similar challenges. Both disciplines work on questions like these:

How to extract meaningful information from large amounts of data?
How to connect pieces of information to other pieces in order to generate ‘bigger pictures’ of sometimes complex problems?
How to visualize complex information structures in a way that decision-makers benefit from it?

Two complementary approaches
When taking a closer look to the approaches taken by those two ‘schools of advanced data management’ one aspect becomes obvious: Both try to develop models in order to be able to ‘codify and to calculate the data soup’.

While Linked Data technologies are built on top of knowledge models (‘ontologies’), which try to describe first of all data in distributed environments like the web, are Data Science methods mainly based on statistical models. One could say: ‘Causality and Reasoning over Distributed Data’ meets ‘Correlation and Machine Learning on Big Data’.

Graph databases are key to success
In contrast to this supposed contradiction, correlations and complementarities between those two disciplines prevail. Both approaches seek for solutions to overcome the problem with rigid data structures which can hardly adapt to the needs of dynamic knowledge graphs. Whenever relational databases cannot fulfill requirements about performance and simplicity, due to the complexity of database queries, graph databases can be used as an alternative.

Thus, both disciplines make use of these increasingly popular database technologies: While Linked Data can be stored and processed by standards-based RDF stores like Virtuoso, MarkLogic, GraphDB or Sesame, are the most popular graph databases for Data Scientists mainly based on the property graph model, for example: Titan or Neo4J. Some vendors like Bigdata support even both graph models.

Both graph models are similar and can be mapped to each other, but they try to solve slightly different problems:

the property graph model serves better the needs of Graph Data Analysts (e.g. for Social Network Analysis or for real-time recommendations)
RDF graph databases are great when distributed information sources should be linked to each other and mashed together (e.g. for Dynamic Semantic Publishing or for context-rich applications).

Connect both approaches and combine methods
I can see at least two options where methods from Data Science will benefit from Linked Data technologies and vice versa:

Machine learning algorithms benefit from the linking of various data sets by using ontologies and common vocabularies as well as reasoning, which leads to a broader data basis with (sometimes) higher data quality
Linked Data based knowledge graphs benefit from Graph Data Analyses to identify data gaps and potential links (find an example for a semantic knowledge graph about ‘Data Science’ here: http://vocabulary.semantic-web.at/data-science)

Questions on the use of Linked Data in businesses
We want to learn more about the opinion of various stakeholders working in different industry verticals about the status of Linked Data technologies. The main question is: Is Linked Data perceived as mature enough to be used on a large scale in enterprises? The results will contribute to the development of the Linked Data market by reporting how enterprises currently think.

Link: http://j.mp/linked-data-survey

SEMANTiCS 2014 – the emergence of a European Marketplace for the Semantic Web

SEMANTiCS conference celebrated its 10th anniversary this September in Leipzig. And this year’s venue has been capable of opening a new age for the Semantic Web in Europe – a marketplace for the next generation of semantic technologies was born.

About the Social Semantic Web

Some context in the era of Linked Data

What is an Active Metadata Hub?

Six Core Aspects of Semantic AI

Download White Paper: Introducing Semantic AI – Ingredients for a sustainable Enterprise AI Strategy

Introducing PoolParty GraphSearch: Cognitive Search based on Graphs

What is PoolParty GraphSearch?

How does it work?

Agile Data Management and Integration

Want to learn more?

Recap SEMANTiCS 2017 (Amsterdam)

Linked Data – The Next 5 Years: From Hype to Action

SPARQL is the new King of all Data Scientist’s tools

SPARQL is the new King of all Data Scientist’s tools because …

Three shades of metadata

Why semantic knowledge graphs matter

Linked Data meets Data Science

SEMANTiCS 2014 – the emergence of a European Marketplace for the Semantic Web