Produce and Consume Linked Data with Drupal!

Produce and Consume Linked Data with Drupal! is the title of the paper I will be presenting next week at the 8th International Semantic Web Conference (ISWC 2009) in Washington, DC. I wrote it at the end of my M.Sc. at DERI, in partnership with Harvard Medical School and Massachusetts General Hospital, which is where I am now working.

It presents an approach for using Drupal (or any other CMS) as a Linked Data producer and consumer platform. Some parts of this approach were used in the RDF API that Dries committed a few days ago to Drupal core. I have attached the full paper, and here is the abstract:

Currently a large number of Web sites are driven by Content Management Systems (CMS) which manage textual and multimedia content but also - inherently - carry valuable information about a site's structure and content model. Exposing this structured information to the Web of Data has so far required considerable expertise in RDF and OWL modelling and additional programming effort. In this paper we tackle one of the most popular CMS: Drupal. We enable site administrators to export their site content model and data to the Web of Data without requiring extensive knowledge on Semantic Web technologies. Our modules create RDFa annotations and - optionally - a SPARQL endpoint for any Drupal site out of the box. Likewise, we add the means to map the site data to existing ontologies on the Web with a search interface to find commonly used ontology terms. We also allow a Drupal site administrator to include existing RDF data from remote SPARQL endpoints on the Web in the site. When brought together, these features allow networked RDF Drupal sites that reuse and enrich Linked Data. We finally discuss the adoption of our modules and report on a use case in the biomedical field and the current status of its deployment.
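To give a flavour of what such RDFa annotations look like in a page, here is an illustrative sketch of a Drupal node mapped to SIOC and Dublin Core terms. The URIs, prefixes and property choices are examples only, not the modules' exact output:

```html
<!-- Illustrative RDFa markup for a Drupal node mapped to SIOC and
     Dublin Core. Prefixes and URIs are examples, not the modules'
     exact output. -->
<div xmlns:sioc="http://rdfs.org/sioc/ns#"
     xmlns:dc="http://purl.org/dc/terms/"
     about="http://example.org/node/42" typeof="sioc:Post">
  <h2 property="dc:title">Produce and Consume Linked Data with Drupal!</h2>
  <span rel="sioc:has_creator" resource="http://example.org/user/1"></span>
  <div property="sioc:content">Paper accepted at ISWC 2009...</div>
</div>
```

Because the annotations live inside the regular HTML output, a crawler can extract the triples from the page it already fetched, with no separate RDF document needed.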

Update: This paper won the Best Semantic Web In Use Paper Award at ISWC 2009! I have added the slides to this post; they are also available on SlideShare.


I'm glad you've found a niche (with biomedical research) where all this theory can be put into practice!
I've looked a little at those huge seas of data (Pubmed & MeSH) and seen the potential ... but not gone out and found anyone to pay me for working with it :-)

Personally, I'm not sold on RDFa and SPARQL, preferring to be able to publish raw RDF/XML as a straightforward means of interlinking sites or producing semantics, but I see how SPARQL offers greater power in the future.

I REALLY want our taxonomy_xml.module to be able to produce semantic exports of our vocabs, and have been trying for a while to get RDF as the preferred syntax to represent Drupal taxa. It could be that with enough RDF API in D7 I can do that natively without dragging in all the RDF* dependencies that would have prevented that adoption in D6.

Back to the article ... I think we are still looking for that killer app to make RDF+Drupal show its shine. Producing potentially queryable semantic summaries via a machine-readable endpoint is only one thing. What we need is a nice user-end consumer of those resources to make all the setup overhead worthwhile.
No, I still don't know what that will be :-)

Hey Dan, re RDFa vs. RDF/XML, they both have their pros and cons, but remember that RDFa is the only format currently being parsed by the crawlers (SearchMonkey, Google). The advantage of RDFa is that it's self-contained in the HTML output, saving one request to the RDF output page. It also ensures the URIs dereference well, as you don't have to worry about whether you should serve the human-readable or machine-readable version for a given URI (one of the Linked Data principles).

Producing potentially queryable semantic summaries via a machine-readable endpoint is only one thing.

yes, and it's a necessary thing before you can go any further. without data to play with, we're stuck :)
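As a taste of what "data to play with" means once a site exposes an endpoint, here is a sketch of a query one could run against a Drupal site's SPARQL endpoint. The SIOC/Dublin Core mapping is assumed for illustration and matches nothing in particular on any real site:

```sparql
# Hypothetical query against a Drupal site's SPARQL endpoint,
# assuming nodes are mapped to sioc:Post with Dublin Core titles.
PREFIX sioc: <http://rdfs.org/sioc/ns#>
PREFIX dc:   <http://purl.org/dc/terms/>

SELECT ?post ?title
WHERE {
  ?post a sioc:Post ;
        dc:title ?title .
}
LIMIT 10
```

Any consumer application, whatever it turns out to be, would sit on top of queries like this.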

What we need is a nice user-end consumer of those resources to make all the setup overhead worthwhile.

totally agreed, this is still an open question. Have you looked at VisiNav?

Hi guys,

Re the question of which formats are being parsed by search engines, it appears that Google has been crawling some of the data by following links in RDF/XML alone, before HTML pages even existed. Therefore the situation may be more evolved than we realise.

Cheers, Tom.

Yeah, that thing about the single URI is a bonus. I started building it how I wanted it to work, then realized I couldn't refer to other data docs (URL + RDF) when I was supposed to be referring to other docs (the actual URI)
... well, not without content negotiation.
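For context, the content negotiation mentioned here is typically set up with server rules like the following illustrative Apache sketch (paths and URI patterns are hypothetical): the same URI 303-redirects to either an HTML page or an RDF/XML document depending on the client's Accept header.

```apache
# Illustrative Apache rewrite rules for serving one URI to both
# humans and machines: redirect to RDF/XML when the client asks
# for it, and to HTML otherwise. Paths are hypothetical.
RewriteEngine On
RewriteCond %{HTTP_ACCEPT} application/rdf\+xml
RewriteRule ^resource/(.+)$ /data/$1.rdf [R=303,L]
RewriteRule ^resource/(.+)$ /page/$1.html [R=303,L]
```

RDFa sidesteps this whole setup, since the human-readable and machine-readable versions are one and the same document.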

Not seen VisiNav, but it's a long way from a killer app from what I can see. Like the SIMILE stuff, it looks like a proof-of-concept ... but is not yet the consumer product we need.

This isn't one, but an example of how things are starting out elsewhere:

The killer app for me comes in the field of knowledge representation and dissemination - allowing "unusual" interlinks and perspectives over information that may not have been targeted at a particular audience. Make it public and wait for the weirdness!

RDF CCK sounds really promising, and is close to what I want to do. Here is a use case; I'd love to know if it's already possible with some simple configuration, or if this is something I could code and contribute to the community:

I have RDF CCK installed on a site I am building, whose purpose is to let parents, teachers, kids and other community members share ideas for lessons, tips on existing lessons and so on. I would like to let users make structured annotations to each other's entries (as well as comments), such as

[text of new node to be created, that extends subject node]

[text of new node to be created, that implements the idea in subject node]

I can certainly do this using RDF CCK by importing my 'abra' vocabulary and entering the triples manually under the RDF data section. But I would like users to be able to do it in the same way that they add comments, except that they would pick the predicate from a drop-down, and the object would generally be a new node auto-generated from the text that they enter (but would sometimes be a URI instead, if they want to talk about an external link).

Is this something that is already done, or if not, do you think it would be a useful extension of the RDF CCK module?

Then I will also want to make blocks that display nodes based on their RDF relationships; that is probably done already? I am sorry if this is a dumb question, I haven't done that much with Drupal & RDF yet.

Thanks for the great work!


Hi Golda,

The mappings defined in RDF CCK cannot be changed on a per-node basis: once you've assigned a mapping to a field, for example, all the nodes of that type will get that mapping. RDF CCK could be extended to invoke something like hook_rdfcck_mapping_alter(), which could be implemented by your module. An alternative would be to store the mapping as part of the node data. Not something we've done before, but why not. It would look like the RDF CCK UI, but in the node form. You would probably only want to select a few fields to be overridable by the user though, so the form does not become overloaded.
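A sketch of how such an alter hook might look if RDF CCK exposed it. The hook name, the mapping array structure, and the field and property names below are all hypothetical, since this hook does not exist yet:

```php
<?php
/**
 * Implementation of the hypothetical hook_rdfcck_mapping_alter().
 *
 * Overrides the site-wide RDF mapping for a single node, based on
 * a per-node value stored by a custom module. The $mapping array
 * shape shown here is an assumption for illustration.
 */
function mymodule_rdfcck_mapping_alter(&$mapping, $node) {
  if ($node->type == 'lesson' && !empty($node->abra_predicate)) {
    // Let the predicate the user picked from the 'abra' vocabulary
    // replace the site-wide default for this field on this node.
    $mapping['fields']['field_related']['predicates'] = array($node->abra_predicate);
  }
}
?>
```

The module invoking the hook would pass the node along with the default mapping, so each node form submission could carry its own override.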

Regarding the RDF relationship blocks, that has not been done, but I would try to integrate this with Views, which can already do similar things with arguments.

One last, and rather important, point: I would strongly encourage you to switch to Drupal 7, as it has much better RDF support, and it is where all the new features are going to be implemented. RDF CCK is being merged into the main RDF module in Drupal 7, which will support RDF for more than just nodes and CCK fields. Get started with Drupal 7 by downloading it, and follow the latest news on the topic.


Makes sense, it sounds like RDF CCK is intended for importing vocabularies and so creating custom content types, which is a good thing to do. I do want to do that, but also to make arbitrary assertions about any node of any type. I will take your advice and see what Drupal 7 has to offer!
