The Open Graph protocol and Drupal

A couple of days after Dries Buytaert gave his keynote at DrupalCon San Francisco and reaffirmed his support for the Semantic Web in Drupal, Facebook co-founder Mark Zuckerberg announced at the f8 developer conference the brand new Open Graph protocol, a technology to turn webpages into social objects and capture them in a social graph. The announcement was backed up by a lot of PR and according to Facebook, 50,000 websites have already implemented OGP including IMDb, NHL, Posterous, Pandora, Rotten Tomatoes, Yelp and more. There is plenty to read about the marketing around this announcement, but I'm going to keep this post at a technical level only.

The good news is that the Open Graph protocol is built atop existing Semantic Web standards like RDF and RDFa, the same standards which have been integrated into Drupal 7. Facebook is joining Yahoo! SearchMonkey and Google Rich Snippets, which now all consume RDFa. Although it has been designed and created by Facebook, OGP can be used by anyone, with Facebook being the first to consume the data produced by sites that carry the right OGP markup. In fact, there is little information about Facebook on the main OGP documentation page; it even refers the reader to the Facebook documentation as "their documentation", keeping OGP as generic as possible. Any web application is free to mark up its webpages with Open Graph protocol markup, and any web application is free to consume this data like Facebook does today. In essence, it's no different from tackling the Semantic Web's chicken-and-egg issue: making the data available in a machine-readable format (RDFa in this particular case) so that other peers can consume it. Kudos to Facebook for the right intention. The Like button is not part of the Open Graph protocol; it's a Facebook-specific implementation which is detailed in the Facebook developer documentation.

I've created an opengraphprotocol module for Drupal 7 which takes advantage of its new core functionality, such as the use of namespaces in RDFa. OGP requires adding the og and fb namespaces to the HTML output. In Drupal 6 this would have required users to hack their theme, but in a Drupal 7 module it takes only a couple of lines, thanks to hook_rdf_namespaces():

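<?php
/**
 * Implements hook_rdf_namespaces().
 *
 * Declares the og and fb prefixes so that Drupal 7 adds them to the
 * namespace declarations of every page.
 */
function opengraphprotocol_rdf_namespaces() {
  return array(
    'og' => 'http://opengraphprotocol.org/schema/',
    'fb' => 'http://www.facebook.com/2008/fbml',
  );
}
?>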

The rest of the module adds the Open Graph protocol RDFa markup to the head element of the page: og:title, og:type, og:url and og:image. Most importantly, taking full advantage of Drupal's content types, the module offers a basic mapping interface to define which type of social object each content type should be mapped to, which is then reflected in the page markup via the og:type property. With fields now in core, the module will also output any field it recognizes as one of the Open Graph protocol properties, such as description, image, latitude, longitude, locality, region, email, phone_number and fax_number. For instance, if you create a field 'description' (machine name field_description), its content will be marked up with OGP. Similarly, you can create a field of type integer named 'phone_number' and it will be exported as well. Finally, the module adds the Like button for convenience and automatic integration with Facebook. You can see the module in action on this site; note the Like button below this article.
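To make the field-matching rule concrete, here is a minimal standalone PHP sketch of the idea. It is not the opengraphprotocol module's actual code: it assumes a node represented as a plain array keyed by field machine names, and a hypothetical $type_map array standing in for the mapping interface. The function name and the 'article' fallback type are assumptions, and og:image is only picked up here from a field_image field.

```php
<?php
/**
 * Sketch: Open Graph protocol <meta> tags for one piece of content.
 * Hypothetical helper, not the opengraphprotocol module's code.
 *
 * $node:     plain array with 'type', 'title', 'url' and field values
 *            keyed by machine name (e.g. field_description).
 * $type_map: content type => og:type, standing in for the module's
 *            mapping interface.
 */
function ogp_sketch_meta_tags(array $node, array $type_map): array {
  // OGP properties a field can map to when its machine name matches.
  $field_properties = array(
    'description', 'image', 'latitude', 'longitude', 'locality',
    'region', 'email', 'phone_number', 'fax_number',
  );

  $properties = array(
    'og:title' => $node['title'],
    // Assumed fallback when a content type has no mapping.
    'og:type' => $type_map[$node['type']] ?? 'article',
    'og:url' => $node['url'],
  );

  // A field named field_<property> is exported as og:<property>.
  foreach ($field_properties as $property) {
    $field = 'field_' . $property;
    if (isset($node[$field]) && $node[$field] !== '') {
      $properties['og:' . $property] = $node[$field];
    }
  }

  $tags = array();
  foreach ($properties as $property => $content) {
    // Titles and field values are user input, so escape them.
    $tags[] = sprintf('<meta property="%s" content="%s" />',
      $property, htmlspecialchars((string) $content, ENT_QUOTES));
  }
  return $tags;
}

$tags = ogp_sketch_meta_tags(
  array(
    'type' => 'blog',
    'title' => 'The Open Graph protocol and Drupal',
    'url' => 'http://example.com/node/1',
    'field_description' => 'OGP support for Drupal 7',
  ),
  array('blog' => 'blog')
);
print implode("\n", $tags) . "\n";
```

Driving the output from field machine names keeps site builders in the UI: adding a field_locality field to a content type is enough for it to show up as og:locality.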

The Open Graph protocol is not perfect, but neither Google nor Yahoo! got it right the first time either, and I believe OGP will align with best practices. I wish the Open Graph protocol were less specific and encouraged developers to write richer RDFa markup like what we have in Drupal 7.

  • The Open Graph protocol doesn't promote the "Don't Repeat Yourself" (DRY) pattern which RDFa enables: OGP asks developers to repeat information which is likely to exist elsewhere in the page already. For example, when a field is marked up with RDFa in Drupal, the related semantic markup is added directly to the HTML markup surrounding the field data. I take it that OGP is targeting applications which might not have a flexible rendering engine like Drupal's, but what about those which do?
  • OGP redefines vocabulary terms which have been around for many years, to name a few:
    og:image        -> foaf:depiction
    og:latitude     -> geo:lat
    og:postal-code  -> vcard:postal-code
    og:email        -> foaf:mbox
    og:phone_number -> foaf:phone

    The problem is that sites whose RDF data already uses well-established vocabularies need to add OGP's specific terms if they want to be included in the Open Graph. This is a recurring problem which happens every time a new big player adopts RDF; it happened with Yahoo! and Google too. RDF datasets end up with duplicate terms for the same meaning, and have to add, say, og:postal-code and google:postal-code even though they have already annotated their data with vcard:postal-code.
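For literal-valued terms, RDFa already offers a way to reduce both the repetition and the duplication: @property accepts a space-separated list of terms, so the value visible in the page can carry the OGP term and the established one at once. The PHP sketch below is hypothetical (the function name is mine) and assumes the geo: and vcard: prefixes are declared on the page. It leaves out og:image, og:email and og:phone_number because foaf:depiction, foaf:mbox and foaf:phone expect URIs rather than literals and would need @rel/@resource instead. Whether a given OGP consumer reads properties outside the head element is not something this sketch establishes.

```php
<?php
/**
 * Sketch: annotate a value already shown in the page with the OGP term and
 * the established term at once, via RDFa's space-separated @property list.
 * Hypothetical helper; assumes geo: and vcard: prefixes are declared.
 */
function rdfa_sketch_literal(string $og_term, string $value): string {
  // Literal-valued equivalents from the list above. foaf:depiction,
  // foaf:mbox and foaf:phone are omitted: they expect URIs, not literals.
  $equivalents = array(
    'og:latitude' => 'geo:lat',
    'og:postal-code' => 'vcard:postal-code',
  );
  $terms = array($og_term);
  if (isset($equivalents[$og_term])) {
    $terms[] = $equivalents[$og_term];
  }
  return '<span property="' . implode(' ', $terms) . '">'
    . htmlspecialchars($value, ENT_QUOTES) . '</span>';
}

print rdfa_sketch_literal('og:latitude', '42.3601') . "\n";
// <span property="og:latitude geo:lat">42.3601</span>
```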

It also has some limitations which would not exist if more standard RDFa markup were used. More specifically:

  • The Open Graph protocol is not able to disambiguate a webpage and all the resources it might describe. In OGP's eyes, the social objects are the pages (HTML documents) and not the real concepts or physical objects people are likely to show an interest in. Let's look at some examples:
    • Take a user profile page (typically of type sioc:UserAccount) and the real person it describes (foaf:Person): what do you mean when you hit the "like" button? Is it that you like that person, or only that particular profile page (say, because it has a funny picture)? Drupal distinguishes between the two entities in its RDFa markup, but OGP cannot capture that.
    • What if you want to like a particular comment on a page, and not the whole page?
    • The same goes for a page about a music album and all the songs it contains.

    The Web of Data that Tim Berners-Lee and the Semantic Web community have been advocating for years is not what the Open Graph protocol enables; we're still at the old document-linking stage here.

  • The Open Graph protocol introduces og:type, an alternative to the widely used rdf:type. The rationale behind it is to keep the markup consistent with OGP's @property/@content syntax. However, because the @content attribute is used, the type of the object has to be a string. The first consequence is a limitation in OGP: it is not possible to specify several types for the same object. For example, you cannot say that someone is both an actor and a director, something which could easily be specified using RDFa's typeof attribute if only we had proper URIs instead of strings. Compare the following snippets. Here is what OGP promotes:
    <meta property="og:type" content="actor" />

    and this is what a more RDF friendly markup would look like:
    <meta about="" typeof="og:Actor og:Director" />

    By using the @typeof attribute (a shortcut in RDFa to specify the type of the object you're talking about), you get rid of the single type limitation, and you get to use real RDF classes which look like strings thanks to the CURIE syntax. Another benefit of using RDFa's typeof is that you are not limited to using types defined by OGP, but any type from any namespace such as foaf: or sioc:, and that's exactly what we do in Drupal 7.
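Both limitations in this list can be illustrated with a few hypothetical PHP helpers. The first two contrast the snippets above: the OGP-style function can only emit the first type as a string, while the RDFa-style function turns any list of CURIEs into one @typeof attribute. og:Actor and og:Director are class names used for illustration, following the snippet above, not terms defined by OGP. The third helper covers the profile-page example: the profile URL is typed sioc:UserAccount, while a separate URI (the "#me" fragment, an assumption for this example) is typed foaf:Person and linked to the account with foaf:account. This is a sketch, not Drupal 7's rdf.module code; a "like" could then in principle target either URI, which a page-level og:url cannot express.

```php
<?php
/**
 * Sketch: typing an object OGP-style versus RDFa-style.
 * og:Actor and og:Director are illustrative class CURIEs, not OGP terms.
 */
function ogp_sketch_type_meta(array $types): string {
  // og:type carries one plain string in @content: extra types are dropped.
  if ($types === array()) {
    return '';
  }
  return '<meta property="og:type" content="' . htmlspecialchars($types[0], ENT_QUOTES) . '" />';
}

function rdfa_sketch_typeof(array $curies): string {
  // @typeof takes a space-separated list: each CURIE becomes an rdf:type.
  return '<meta about="" typeof="' . htmlspecialchars(implode(' ', $curies), ENT_QUOTES) . '" />';
}

/**
 * Sketch: a profile page whose RDFa keeps the account and the person apart.
 * The "#me" fragment for the person's URI is an assumption of this example.
 */
function rdfa_sketch_profile(string $profile_url): string {
  $account = htmlspecialchars($profile_url, ENT_QUOTES);
  $person = $account . '#me';
  return '<div about="' . $account . '" typeof="sioc:UserAccount">' . "\n"
    . '  <span about="' . $person . '" typeof="foaf:Person"'
    . ' rel="foaf:account" resource="' . $account . '"></span>' . "\n"
    . '</div>';
}

print ogp_sketch_type_meta(array('actor', 'director')) . "\n";
// <meta property="og:type" content="actor" />
print rdfa_sketch_typeof(array('og:Actor', 'og:Director', 'foaf:Person')) . "\n";
// <meta about="" typeof="og:Actor og:Director foaf:Person" />
print rdfa_sketch_profile('http://example.com/user/1') . "\n";
```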

I understand many of the criticisms above are justified by decisions the Facebook team took to keep the markup very simple for developers. David Recordon explained them in his talk "The Open Graph Protocol Design Decisions" at the Linked Data Camp at WWW2010, which was followed by a breakout session during which many people worked on the Open Graph protocol RDF schema, a sign that Facebook is fairly open to following the standards, or at least to acknowledging them. I trust many of the issues I highlighted above will be fixed in the future.

RDFa in Drupal 7: last call for feedback before alpha release

The first alpha release of Drupal 7 will be created next Friday, Jan 15th. We've already incorporated most of the feedback we have received from the semweb community so far, but I wanted to give the community a last chance to review the RDFa markup and the default RDF mappings we use before it's too late. I should emphasize that all the markup and default RDF mappings that we ship in core will be pretty much set in stone after the stable release of Drupal 7, hence this call for feedback. Site administrators who care about semantics will be able to alter these mappings by installing extra modules, but many people (read: tens of thousands of sites) will just install Drupal 7 and not care about the semantics it generates. Therefore we want to make sure the RDFa generated by Drupal out of the box is reasonably correct and does not make folks from the semantic/pedantic web community angry :) We've tried to keep the semantics as generic as possible for that reason.

RDF mappings

I've created a diagram representing the default semantics of the core data structure which has been committed and I would appreciate feedback on the RDF terms we've used.

Drupal 7 core RDF schema
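As an example of what these mappings look like on the code side, here is a sketch of a module declaring default mappings for its own content type, following the structure of hook_rdf_mapping() as documented in Drupal 7's rdf.api.php. The module name mymodule and the bundle recipe are hypothetical and the terms are chosen for illustration; rdf.api.php remains the authoritative reference for the available keys.

```php
<?php
/**
 * Implements hook_rdf_mapping().
 *
 * Sketch modeled on the structure documented in Drupal 7's rdf.api.php;
 * "mymodule" and the "recipe" bundle are hypothetical.
 */
function mymodule_rdf_mapping() {
  return array(
    array(
      'type' => 'node',
      'bundle' => 'recipe',
      'mapping' => array(
        // rdf:type of every node of this bundle.
        'rdftype' => array('sioc:Item', 'foaf:Document'),
        'title' => array(
          'predicates' => array('dc:title'),
        ),
        'created' => array(
          'predicates' => array('dc:date', 'dc:created'),
          'datatype' => 'xsd:dateTime',
          // Formats the stored timestamp before output.
          'callback' => 'date_iso8601',
        ),
        // The author is a resource, hence 'rel' rather than a literal.
        'uid' => array(
          'predicates' => array('sioc:has_creator'),
          'type' => 'rel',
        ),
      ),
    ),
  );
}
```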

RDFa markup

To make the RDFa markup review process easier, I've updated the usual testing site at http://drupalrdf.openspring.net/. It features a blog post with some comments which represents a typical Drupal 7 page annotated with RDFa. Some other pages have been randomly generated to be able to test the tracker which acts as a very simple sitemap in RDFa.

Status of RDF in Drupal (November 09) and wrap up of ISWC2009

I had the pleasure of presenting the paper "Produce and Consume Linked Data with Drupal!" at the recent ISWC2009, and I was very honored that we won the Best Semantic Web in Use Paper award! The 30 minutes of presentation and Q&A passed very quickly, and after describing the inner workings of the modules we developed I didn't have much time to expand on the status of RDF in Drupal 7 vs. Drupal 6. I'm sure this will also interest some people beyond the attendees.

First of all, the current stable version of Drupal is Drupal 6 (the latest release at the time of this writing being Drupal 6.14). This is the version on which we started to implement the contributed modules presented at ISWC2009, namely RDF CCK, RDF external vocabulary importer (Evoc), SPARQL Endpoint and RDF SPARQL Proxy. Contributed modules are not included in the core Drupal package, but anyone can download them for free from drupal.org and drop them on their server to extend Drupal core. These 4 modules work pretty well on Drupal 6: you can get RDF export in RDF/XML, N-Triples, Turtle and JSON. However, generating RDFa is not very easy, as it requires patching CCK, which we rely on to generate the content pages and store the various field data.

We made sure this would not be a problem in the next version of Drupal, Drupal 7, which is still under development and due to be released sometime next year. While we were at it, we also worked on porting one of the features of the RDF CCK and Evoc modules to Drupal 7 core: the ability to map the data structure to RDF and expose it in RDFa. This means that, by default and without requiring any knowledge of RDF from their administrators, Drupal 7 sites will expose the following elements as RDFa: title, date, author, content, comments, terms, users, etc. Of course, only publicly available data will be exposed as RDFa; whatever is private (like user email addresses) will remain private. This will be part of Drupal 7 core.

Needless to say, the rest of the functionality offered by the existing RDF contributed modules for Drupal 6 will also be available for Drupal 7 once these modules have been ported. We're starting to port them to Drupal 7 next Sunday, as part of the #D7CX Contrib upgrade code sprint in Boston. If you plan to use RDF in your next site and can wait until Drupal 7 is released, I'd strongly encourage you to start looking at the new Drupal APIs and functionality. Some RDF features which were not addressed in Drupal 6 will be much easier to achieve in Drupal 7. Try the latest development snapshot of Drupal 7 and report any bugs you encounter.
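Since N-Triples is one of the export formats mentioned above, and the simplest one to read, here is a small PHP sketch of what such an export boils down to. It is not the RDF modules' serializer: triples are plain [subject, predicate, object] arrays, any object starting with http:// or https:// is treated as an IRI and everything else as a plain literal, and datatypes and language tags are ignored for brevity. The function name is hypothetical.

```php
<?php
/**
 * Sketch: serialize triples as N-Triples. Hypothetical helper.
 * Objects starting with http:// or https:// are treated as IRIs, anything
 * else as a plain literal (no datatypes or language tags).
 */
function ntriples_sketch(array $triples): string {
  $out = '';
  foreach ($triples as $triple) {
    list($s, $p, $o) = $triple;
    $object = preg_match('#^https?://#', $o)
      ? '<' . $o . '>'
      // Escape backslashes, double quotes and line breaks in literals.
      : '"' . addcslashes($o, "\\\"\n\r") . '"';
    $out .= '<' . $s . '> <' . $p . '> ' . $object . " .\n";
  }
  return $out;
}

print ntriples_sketch(array(
  array('http://example.com/node/1', 'http://purl.org/dc/terms/title', 'Hello "RDF"'),
  array('http://example.com/node/1', 'http://rdfs.org/sioc/ns#has_creator', 'http://example.com/user/1'),
));
// <http://example.com/node/1> <http://purl.org/dc/terms/title> "Hello \"RDF\"" .
// <http://example.com/node/1> <http://rdfs.org/sioc/ns#has_creator> <http://example.com/user/1> .
```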

Produce and Consume Linked Data with Drupal!

Produce and Consume Linked Data with Drupal! is the title of the paper I will be presenting next week at the 8th International Semantic Web Conference (ISWC 2009) in Washington, DC. I wrote it at the end of my M.Sc. at DERI, in partnership with the Harvard Medical School and the Massachusetts General Hospital, where I am now working.

It presents an approach for using Drupal (or any other CMS) as a Linked Data producer and consumer platform. Some parts of this approach were used in the RDF API that Dries committed to Drupal core a few days ago. I have attached the full paper, and here is the abstract:

Currently a large number of Web sites are driven by Content Management Systems (CMS) which manage textual and multimedia content but also - inherently - carry valuable information about a site's structure and content model. Exposing this structured information to the Web of Data has so far required considerable expertise in RDF and OWL modelling and additional programming effort. In this paper we tackle one of the most popular CMS: Drupal. We enable site administrators to export their site content model and data to the Web of Data without requiring extensive knowledge on Semantic Web technologies. Our modules create RDFa annotations and - optionally - a SPARQL endpoint for any Drupal site out of the box. Likewise, we add the means to map the site data to existing ontologies on the Web with a search interface to find commonly used ontology terms. We also allow a Drupal site administrator to include existing RDF data from remote SPARQL endpoints on the Web in the site. When brought together, these features allow networked RDF Drupal sites that reuse and enrich Linked Data. We finally discuss the adoption of our modules and report on a use case in the biomedical field and the current status of its deployment.
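On the consuming side, querying a remote SPARQL endpoint comes down to an HTTP request carrying the query. The sketch below only builds a SPARQL Protocol GET request (the query sent URL-encoded in the query parameter, JSON results requested via the Accept header). The endpoint URL and function name are hypothetical, it assumes the endpoint URL has no query string of its own, and it is not the code of the RDF SPARQL Proxy module.

```php
<?php
/**
 * Sketch: a SPARQL Protocol GET request for a remote endpoint.
 * Hypothetical helper; it only builds the request and does not send it.
 */
function sparql_sketch_request(string $endpoint, string $query): array {
  return array(
    // The query travels URL-encoded in the "query" parameter.
    'url' => $endpoint . '?' . http_build_query(array('query' => $query)),
    // Ask for results in the SPARQL JSON results format.
    'headers' => array('Accept: application/sparql-results+json'),
  );
}

$request = sparql_sketch_request(
  'http://example.org/sparql',
  'SELECT ?label WHERE { ?s <http://www.w3.org/2000/01/rdf-schema#label> ?label } LIMIT 10'
);
$context = stream_context_create(array(
  'http' => array('method' => 'GET', 'header' => implode("\r\n", $request['headers'])),
));
// To actually run the query against a live endpoint:
// $json = json_decode(file_get_contents($request['url'], false, $context), true);
```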

One way ticket to Boston

It's a new chapter in life... This summer I finished my research Master's at DERI, specializing in the Semantic Web, a fairly new trend which promises to enhance the Web by making it more machine-friendly and thus enabling more powerful applications for humans. If you follow this blog or have heard of me in the past, you may not be surprised to hear that my thesis was on Drupal and the Semantic Web. It is still under review and should be available soon, once it's been approved.

Earlier this year I was invited to Harvard's Initiative in Innovative Computing, based in Cambridge, MA, where I very much enjoyed the working environment and the innovation-driven Cambridge/Boston area. Tim Clark and Sudeshna Das offered me the chance to come back and work on the Science Collaboration Framework after the end of my studies, an appealing offer I could not refuse! So after a couple of months of traveling in Europe, including France and Italy, and waiting for the US government to process the application and issue a visa, here I am moving to Boston! This was my first time applying for a visa, and I must say the application was quite easy, despite the lengthy process which took about 12 weeks. The next couple of weeks should be hectic with the new job, house hunting and all the administrative steps to settle down. My first stay in March allowed me to meet a good number of people, so I am confident that starting out in Cambridge should not be a problem.

Half way through the RDF code sprint - Google Announces Support for RDFa

We're halfway through the sprint and we've just heard that Google has announced support for RDFa. What a coincidence, and surely another sign that Drupal is heading in the right direction! Now it's time for an update on the sprint.

During day 1, we decided to split the group in two in order to enable some parallel development and to make use of the 2 main skills we had at hand: RDF semantics and Drupal coding.

New book - Drupal 6 Social Networking: Communicating with Users

Our users can communicate with one another, which is great, but quite often as an administrator the need may arise for us to communicate with a user or users on our site. It may be to remind them about the web site or to inform active users about new changes to the site, which they may not have been made aware of.
This article is extracted from the "Drupal 6 Social Networking" book. In this article, you will learn:

  • About mailing lists, and how to use them with our Drupal social network
  • How to use an offline approach for contacting our users
  • How to use blocks of content to get a message across to your users

Getting started

We are going to look at a few different modules, some of which we will use. These need to be installed first, so let's do that now to save us time.

The modules

We need to download the following modules:

Extract them to our sites/all/modules folder, and enable them via the administration interface.

RDFa in Drupal: Bringing Cheese to the Web of Data

"RDFa in Drupal: Bringing Cheese to the Web of Data" is the title of our short paper which was recently accepted at the 5th Workshop on Scripting and Development for the Semantic Web. It seems that the topic of food on the semantic web is the new black as this paper comes out at the same time as Boris Mann's announcement about the Open Restaurants aka "BaconPatioBeer".

This paper illustrates how a CMS like Drupal can be used on the Semantic Web and make every Drupal site part of the growing Web of Data. We created a cheese review site as a use case. It relies on the RDF API and the RDF CCK modules.

The good news is that we are working to get this RDF goodness into Drupal core! We are organizing an RDF code sprint, which builds on Dries' ideas expressed in his recent posts Drupal, the semantic web and search and RDFa and Drupal. With RDF in Drupal core and RDFa output by default, tens of thousands of websites will all of a sudden start publishing their data as RDF.

So far, Stéphane Corlosquet, Florian Loretan, Benjamin Melançon and Rolf Guescini have signed up. How about you?

Some others are willing to come but cannot afford the trip until some funding is secured. To help us fund the sprint and bring more Drupal rockstars on board, please consider making a donation using the ChipIn widget on this page. The money will be used to cover flight, food and hotel costs for the sprinters. All sprinters are generously donating their time to make this happen. It would also be great to fly in a few additional people with extensive testing and Fields experience. Any excess money will be used to add more people, or will be donated to the Drupal Association.

Sponsor the RDF code sprint

There are only a few months left before the code freeze on September 1st. Now that Fields API has settled in core, it's time to extend it with some RDF semantics. DERI Galway is hosting an RDF in Drupal code sprint during the week of May 11th until May 15th.

Goals of the code sprint

The RDF code sprint will focus on Drupal core and aim at integrating RDF semantics in it.

  1. Extend Fields API to integrate RDF mappings for each field instance. The semantics of a field can differ from one bundle to another. This can be stored either in the existing settings property or by adding an rdf_mappings property to the Field Instance objects.
  2. Modify the Fields UI (contrib) to allow RDF mappings editing.
  3. Define the appropriate mappings for the core modules, based on the RDF core mapping proposal.
  4. Patch core modules with the mappings defined above.
  5. Export these mappings in RDFa via the theme layer and keep it as generic as possible in order to ease the work of the themers.
  6. Write tests for RDF in core.
  7. Identify other non-fieldable entities in core which could benefit from being RDF-ized, and see how to annotate them. Comment is one example. Terms also, though they might become fieldable.
  8. RSS 1 (RDF) in core. Arto volunteered to get started with that.
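Goal 1 can be pictured with a small PHP sketch: the same field, attached to two bundles, carries a different predicate in each. The rdf_mappings key follows the option named in goal 1, but the array shapes and helper functions are hypothetical illustrations rather than the Field API itself; dc:abstract and rev:text are example terms.

```php
<?php
/**
 * Sketch of goal 1: per-instance RDF mappings. The 'rdf_mappings' key and
 * both helpers are hypothetical, following the proposal above.
 */
function rdf_sketch_instances(): array {
  return array(
    // The same field means different things in different bundles.
    array(
      'field_name' => 'field_summary',
      'bundle' => 'article',
      'rdf_mappings' => array('dc:abstract'),
    ),
    array(
      'field_name' => 'field_summary',
      'bundle' => 'review',
      'rdf_mappings' => array('rev:text'),
    ),
  );
}

/**
 * Returns the predicates of a field in a given bundle (empty if unmapped).
 */
function rdf_sketch_predicates(array $instances, string $field, string $bundle): array {
  foreach ($instances as $instance) {
    if ($instance['field_name'] === $field && $instance['bundle'] === $bundle) {
      return $instance['rdf_mappings'] ?? array();
    }
  }
  return array();
}

print implode(' ', rdf_sketch_predicates(rdf_sketch_instances(), 'field_summary', 'review')) . "\n";
// rev:text
```

Keeping the mapping on the instance rather than on the field is what allows the review bundle of a cheese review site to use a review vocabulary while articles reuse Dublin Core.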

Report on my recent trip to the US: Harvard, DrupalCon...

During my 5-week stay in the US, I was based at Harvard's Initiative in Innovative Computing, where I worked on the Drupal-based Science Collaboration Framework (SCF) project with Tim Clark, Sudeshna Das and Benjamin Melançon. I had the chance to meet Tim last year when he visited DERI and presented the SCF project. Our goal was to align the efforts put into SCF with the efforts of the Drupal community in terms of RDF, see which requirements emerge from a project such as SCF, and contribute them back to the Drupal community. Tim and Benjamin had arranged for me to present the latest RDF module developments at the semantic web interest group gatherings in Cambridge, Mass. and New York. Many more opportunities popped up while I was there; they are detailed below.

New York for 2 days

Feb 26th: presentation at the meetup.com NYC semantic web user group organized by Marco Neumann. This was the description of the presentation:
