Produce and Consume Linked Data with Drupal!

Produce and Consume Linked Data with Drupal! is the title of the paper I will be presenting next week at the 8th International Semantic Web Conference (ISWC 2009) in Washington, DC. I wrote it at the end of my M.Sc. at DERI, in partnership with the Harvard Medical School and the Massachusetts General Hospital, which is where I am now working.

It presents an approach for using Drupal (or any other CMS) as a Linked Data producer and consumer platform. Some parts of this approach were used in the RDF API that Dries committed to Drupal core a few days ago. I have attached the full paper, and here is the abstract:

Currently a large number of Web sites are driven by Content Management Systems (CMS) which manage textual and multimedia content but also - inherently - carry valuable information about a site's structure and content model. Exposing this structured information to the Web of Data has so far required considerable expertise in RDF and OWL modelling and additional programming effort. In this paper we tackle one of the most popular CMS: Drupal. We enable site administrators to export their site content model and data to the Web of Data without requiring extensive knowledge on Semantic Web technologies. Our modules create RDFa annotations and - optionally - a SPARQL endpoint for any Drupal site out of the box. Likewise, we add the means to map the site data to existing ontologies on the Web with a search interface to find commonly used ontology terms. We also allow a Drupal site administrator to include existing RDF data from remote SPARQL endpoints on the Web in the site. When brought together, these features allow networked RDF Drupal sites that reuse and enrich Linked Data. We finally discuss the adoption of our modules and report on a use case in the biomedical field and the current status of its deployment.

Update: This paper won the Best Semantic Web In Use Paper Award at ISWC 2009! I have added the slides to this post; they are also available on SlideShare.

Attachment                    Size
corl-etal-2009iswc.pdf        2.03 MB
slides_iswc2009_final2.pdf    8.54 MB

Interesting. Very detailed write-up

I'm glad you've found a niche (with biomedical research) where all this theory can be put to practice!
I've looked a little at those huge seas of data (PubMed & MeSH) and seen the potential ... but not gone out and found anyone to pay me for working with it :-)

Personally, I'm not sold on RDFa and SPARQL, preferring to be able to publish raw RDF+XML as a straightforward means of interlinking sites or producing semantics, but I see how SPARQL offers greater power in the future.

I REALLY want our taxonomy_xml.module to be able to produce semantic exports of our vocabs, and have been trying for a while to get RDF as the preferred syntax to represent drupal taxa. It could be that with enough RDF API in D7 I can do that natively without dragging in all the RDF* dependencies that would have prevented that adoption in D6.

Back to the article ... I think we are still looking for that killer app to make RDF+Drupal show its shine. Producing potentially queryable semantic summaries via a machine-readable endpoint is only one thing. What we need is a nice user-end consumer of those resources to make all the setup overhead worthwhile.
No, I still don't know what that will be :-)

Hey Dan, re RDFa vs. RDF/XML,

Hey Dan, re RDFa vs. RDF/XML, they both have their pros and cons, but remember that RDFa is the only format currently being parsed by the crawlers (SearchMonkey, Google). The advantage of RDFa is that it's self-contained in the HTML output, which saves one request to the RDF output page. It also ensures the URIs dereference well, as you don't have to worry about whether you should serve the human-readable or machine-readable version for a given URI (one of the Linked Data principles).
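To make the "self-contained" point concrete, here is a rough Python sketch of how a consumer could pull property/value pairs straight out of an HTML page carrying RDFa attributes. It is not a full RDFa processor (no prefix resolution, no nesting, no `about`/`rel` handling), and the sample markup, the date, and the names `RDFaPropertyCollector` and `extract_properties` are made up for illustration rather than taken from the Drupal modules.

```python
from html.parser import HTMLParser

# Hypothetical sample page: the markup and values are illustrative only.
SAMPLE_HTML = """
<div xmlns:dc="http://purl.org/dc/elements/1.1/" about="/node/1">
  <h2 property="dc:title">Hello Linked Data</h2>
  <span property="dc:date" content="2009-10-20">October 20, 2009</span>
</div>
"""


class RDFaPropertyCollector(HTMLParser):
    """Collect (property, value) pairs from elements with a `property` attribute."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._pending = None  # property whose text content we are still reading
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property")
        if prop is None:
            return
        if "content" in attrs:
            # An explicit content attribute overrides the visible text.
            self.pairs.append((prop, attrs["content"]))
        else:
            self._pending = prop
            self._text = []

    def handle_data(self, data):
        if self._pending is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Simplification: the first closing tag ends the pending property.
        if self._pending is not None:
            self.pairs.append((self._pending, "".join(self._text).strip()))
            self._pending = None


def extract_properties(html):
    """Return the list of (property, value) pairs found in the HTML string."""
    collector = RDFaPropertyCollector()
    collector.feed(html)
    collector.close()
    return collector.pairs


if __name__ == "__main__":
    for prop, value in extract_properties(SAMPLE_HTML):
        print(prop, "=", value)
```

The same document serves both the reader and the crawler, which is why no second request or separate RDF URL is needed.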

Producing potentially queryable semantic summaries via a machine-readable endpoint is only one thing.

yes, and it's a necessary thing before you can go any further. without data to play with, we're stuck :)

What we need is a nice user-end consumer of those resources to make all the setup overhead worthwhile.

totally agreed, this is still an open question. Have you looked at VisiNav?

re RDFa vs. RDF/XML

Hi guys,

Re the question of which formats are being parsed by search engines, it appears that Google has been crawling some of the data.gov.uk data by following links in RDF/XML alone, before HTML pages even existed. Therefore the situation may be more evolved than we realise.

Cheers, Tom.

Yeah, that thing about the

Yeah, that thing about the single URI is a bonus. I started building it how I wanted it to work, then realized I couldn't refer to other data docs (URL+RDF) when I was supposed to be referring to other docs (the actual URI)
... well, not without content-negotiation.
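For anyone unfamiliar with it, content negotiation here means one URI, with the server choosing the RDF/XML or HTML document based on the request's Accept header. A minimal Python sketch of that decision, assuming only these two representations exist; the function name `choose_representation` is hypothetical and this does not cover every case a full HTTP Accept parser would (wildcards, for instance, simply fall through to HTML):

```python
def choose_representation(accept_header):
    """Return 'rdf' if the client prefers RDF/XML over HTML, else 'html'."""
    kinds = {
        "text/html": "html",
        "application/xhtml+xml": "html",
        "application/rdf+xml": "rdf",
    }
    best = {"html": 0.0, "rdf": 0.0}

    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        quality = 1.0  # default weight when no q parameter is given
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    quality = float(param[2:])
                except ValueError:
                    quality = 0.0
        kind = kinds.get(media_type)
        if kind is not None:
            best[kind] = max(best[kind], quality)

    # Ties and unknown types fall back to the human-readable page.
    return "rdf" if best["rdf"] > best["html"] else "html"


if __name__ == "__main__":
    print(choose_representation("application/rdf+xml"))
    print(choose_representation("text/html,application/xhtml+xml"))
```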

Not seen VisiNav, but it's a long way from a killer app from what I can see. Like the Simile stuff, it looks like a proof-of-concept ... but is not yet the consumer product we need.

Killer apps

This isn't one, but an example of how things are starting out elsewhere: http://dbpedia.neofonie.de/browse/

The killer app for me comes in the field of knowledge representation and dissemination - allowing "unusual" interlinks and perspectives over information that may not have been targeted at a particular audience. Make it public and wait for the weirdness!
