Very large companies are starting to drive the semantic web. Reuters has recently launched a semantic web service that is free, even for commercial use. It extracts significant phrases from any unstructured text (web documents or office documents). The new service is called “OpenCalais” and is based on text-analytics technology from ClearForest (which Reuters acquired in 2007). So finally a dream comes true: web content can be tagged automatically with fairly high quality. Technically speaking, any unstructured text can be transformed into an RDF graph on the fly, and important phrases or even statements can be extracted from plain text.
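To make the "text to RDF graph" idea a bit more concrete, here is a minimal sketch of how a list of extracted entities could be written out as RDF triples in N-Triples syntax. It does not call OpenCalais and does not reproduce its actual output format: the `entities_to_ntriples` function, the `http://example.org/` namespace and the `mentions` and `name` properties are all assumptions made up for illustration.

```python
from urllib.parse import quote

# Placeholder namespace and properties: assumptions, not OpenCalais vocabulary.
EX = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def entities_to_ntriples(doc_url, entities):
    """Turn (name, type) pairs into N-Triples lines describing a document.

    `entities` stands in for whatever an entity extractor returns;
    the (name, type) shape is hypothetical.
    """
    lines = []
    for name, etype in entities:
        subject = f"<{EX}entity/{quote(name)}>"
        # Escape backslashes and double quotes for the string literal.
        literal = name.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f"<{doc_url}> <{EX}mentions> {subject} .")
        lines.append(f"{subject} <{RDF_TYPE}> <{EX}type/{quote(etype)}> .")
        lines.append(f'{subject} <{EX}name> "{literal}" .')
    return "\n".join(lines)


if __name__ == "__main__":
    print(entities_to_ntriples(
        "http://example.org/doc1",
        [("Reuters", "Company"), ("ClearForest", "Company")],
    ))
```

Each extracted entity becomes three statements: the document mentions it, it has a type, and it has a name. A real service would use its own vocabulary and identifiers.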
OpenCalais can serve as the core service for many new web applications. Most of them will offer better search functionality or help to identify similarities between different types of content. For instance, for any document published on a website, related blogs or videos (or anything else) can be retrieved and presented as relevant context information.
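One simple way such a "related content" feature might work is to compare the sets of tags extracted for each item. The sketch below uses Jaccard similarity (shared tags divided by all tags) for that. It is only an illustration under assumptions: the tag sets are hand-written stand-ins for extractor output, and `jaccard`, `related_items` and the `threshold` value are hypothetical names and settings, not part of OpenCalais.

```python
def jaccard(a, b):
    """Jaccard similarity of two tag collections: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def related_items(doc_tags, catalog, threshold=0.2):
    """Return titles from `catalog` whose tags overlap enough with `doc_tags`.

    `catalog` maps a title to its set of tags; results are ordered
    from most to least similar.
    """
    scored = [(jaccard(doc_tags, tags), title) for title, tags in catalog.items()]
    return [title for score, title in sorted(scored, reverse=True)
            if score >= threshold]


if __name__ == "__main__":
    # Hand-written tags standing in for automatically extracted ones.
    doc_tags = {"Reuters", "OpenCalais", "RDF"}
    catalog = {
        "blog post": {"OpenCalais", "RDF", "SPARQL"},
        "video": {"Reuters", "football"},
        "recipe": {"cooking"},
    }
    print(related_items(doc_tags, catalog))
```

Because the comparison only needs tag sets, it works the same way for blog posts, videos or office documents, as long as each item has been tagged.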
Whenever an application uses OpenCalais, content is delivered to Reuters. Thus, submitting a URL will mean something different in the future than it has in years past: it’s no longer only about “promoting” a website, but rather about finding ways to get connected with the semantic web, and about teaching Reuters’ global knowledge base 😉
Thanks for the link, Andreas. I’m glad you were able to successfully try out the Calais service through the little tool I made.
Hi Abhay, I wonder which interesting apps will be built on top of OpenCalais – what do you think?
Entity extraction itself is a very interesting aspect of natural language processing. Personally, I’m more interested in tagging content in the wild and being able to find the “next interesting page” StumbleUpon-style, but with actual semantic meaning.