Can microdata support multiple vocabularies?

In Drupal 7 we often use multiple types to describe pieces of content: for example, a node is a sioc:Post and a foaf:Document, and a comment is a sioc:Post and a sioct:Comment. This is useful when you want to be generic and specific at the same time, or when a consumer only recognizes one vocabulary and not the other. For example, imagine that you want to combine Facebook's Open Graph protocol and schema.org vocabularies on the same page elements so that your blog posts are recognized by Facebook, Google and Bing. Another use case is a news website using rNews, a proposed standard developed by the IPTC for annotating news-specific metadata. There is a great incentive for such news websites to also be indexed by the major search engines, which will only recognize the schema.org vocabularies.

Update: alex b posted another interesting use case on the schemaorg-discussion mailing list:

I can imagine that some businesses may fit into more than one Schema tag. Will there be a problem with applying more than one? For example I might want to apply the following tags to a car dealer: AutoDealer, AutoRepair, AutoPartsStore if they perform all of these functions of course. Is this going to fly or is this something we're not supposed to do?

From the use cases above, it is clear that the ability to combine vocabularies is essential for decentralized extensibility on the Web.

When authoring an HTML document, you need to be able to juggle vocabularies. In Drupal 7 this was achieved by the use of CURIEs in RDFa, where you can shorten a vocabulary base URL into a prefix (e.g. foaf for the FOAF vocabulary http://xmlns.com/foaf/0.1/). Then, when referring to an element from the FOAF vocabulary, all you need to do is prepend the prefix to it, for example foaf:Document. The microdata folks decided that this mechanism was not appropriate to put in the hands of regular web developers because it is too error-prone. The rest of the article explores how microdata handles the case of multiple vocabularies.

The first thing I did to learn about microdata was to read the microdata chapter of the Dive Into HTML5 book. The title of the chapter is '“Distributed,” “Extensibility,” & Other Fancy Words'. With such a title, I thought that if microdata was designed to allow multiple vocabularies, it would definitely be explained in there. I found nothing about multiple vocabularies. I later had a quick look at the microdata specification and could not find what I was looking for there either. Well, I actually missed an interesting example, as Lin Clark later pointed out to me. This example illustrates an itemprop attribute with two tokens from different vocabularies.

Microdata allows multiple property names in the itemprop attribute: either property names defined by the vocabulary of the typed item (given in the item's itemtype attribute), or absolute URLs. Because only one "default" vocabulary can be specified via the item's type, and to avoid any ambiguity, you can only use short property names (e.g. description) for that default vocabulary, and you have to use full URLs when referring to other vocabularies inside the item. Here is a snippet inspired by the microdata spec example, where the title of an article carries two properties, one from schema.org and one from the Open Graph protocol:

<section itemscope itemtype="http://schema.org/Article">
  <h1 itemprop="name http://ogp.me/ns#title">How to Tie a Reef Knot</h1>
</section>

Next, let's look at using multiple vocabularies for typing microdata items. This is what the microdata specification says about the itemtype attribute:

The type for an item is given as the value of an itemtype attribute on the same element as the itemscope attribute. [...]
An item can only have one type. [...]
The itemtype attribute, if specified, must have a value that is a valid URL that is an absolute URL for which the string "http://www.w3.org/1999/xhtml/microdata#" is not a prefix match.

Here is a microdata snippet using the itemtype attribute:

<section itemscope itemtype="http://schema.org/Person">
  <h1 itemprop="name">Stéphane Corlosquet</h1>
</section>

and the same snippet using RDFa 1.1:

<section vocab="http://schema.org/" typeof="Person">
  <h1 property="name">Stéphane Corlosquet</h1>
</section>

Let's see how we can set the type of a person using schema.org and FOAF at the same time. This is how we could do it in RDFa 1.1.

<section prefix="schema: http://schema.org/ foaf: http://xmlns.com/foaf/0.1/" typeof="schema:Person foaf:Person">
  <h1 property="schema:name foaf:name">Stéphane Corlosquet</h1>
</section>

All the vocabularies needed in this snippet are first registered in the prefix attribute; they can then be reused via their respective prefixes. In RDFa, the typeof attribute is not limited to a single type and can include as many types as needed. Alternatively, you could use full URLs in the typeof attribute if you preferred.

Given that microdata's itemtype attribute is limited to one token, the above is not as straightforward to achieve in microdata. Earlier this week I asked about this in #whatwg, and Simon Pieters gave me the following snippet, which makes use of the itemref and id attributes:

<div itemscope itemtype="http://schema.org/School" itemref=a></div>
<div itemscope itemtype="http://facebook.com/School" itemref=a></div>
<span id=a itemprop="http://schema.org/classname http://facebook.com/class">A</span>
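
Applied to the person example used earlier, the same itemref technique might look like the sketch below. This is only an adaptation of Simon's snippet, not something taken from the spec: the person-name id is an arbitrary name chosen for the illustration, and the FOAF URLs are the ones from the RDFa 1.1 example above.

```html
<!-- Two separate items, one per vocabulary. Neither item has both types;
     they only share the same property element through itemref. -->
<div itemscope itemtype="http://schema.org/Person" itemref="person-name"></div>
<div itemscope itemtype="http://xmlns.com/foaf/0.1/Person" itemref="person-name"></div>

<!-- "person-name" is a hypothetical id picked for this sketch.
     Full URLs are used for both property names, so each item
     receives both the schema.org and the FOAF name property. -->
<h1 id="person-name" itemprop="http://schema.org/name http://xmlns.com/foaf/0.1/name">Stéphane Corlosquet</h1>
```

Note that, unlike the RDFa version, this produces two distinct microdata items rather than a single item with two types, so a consumer has no explicit statement that both items describe the same person.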

Another solution was proposed by Martin Hepp on the semantic web mailing list, which asserts extra types via additional itemprop attributes:

<div itemscope itemtype="http://www.foo.com/Type1" itemid="http://acme.org/things#1">
  <link itemprop="http://www.w3.org/1999/02/22-rdf-syntax-ns#type" href="http://www.foo.com/Type2" />
</div>

Update: The microdata specification was changed to allow for multiple types in @itemtype with the constraint that they need to belong to the same microdata vocabulary. Mixing types from multiple vocabularies is still not supported however.
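
As a rough illustration of the updated rule, here is a minimal sketch of the car dealer use case quoted earlier. It assumes the three schema.org types can be listed together as space-separated URLs because they belong to the same vocabulary; the business name is made up for the example.

```html
<!-- Several types from one vocabulary (schema.org) in a single itemtype.
     "Example Motors" is a placeholder name, not a real business. -->
<div itemscope itemtype="http://schema.org/AutoDealer http://schema.org/AutoRepair http://schema.org/AutoPartsStore">
  <span itemprop="name">Example Motors</span>
</div>
```

Adding a type from another vocabulary (for instance a FOAF type) to that same itemtype value would fall outside what the updated specification allows.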

Given the low number of examples including multiple vocabularies in the microdata spec, it is clear that the ability to handle multiple vocabularies was not a high-priority feature for microdata, whose design focused instead on keeping the markup simple for web authors and on avoiding CURIEs. It is possible to reference more than one vocabulary in microdata, but at the cost of verbosity and more complex markup. I'm personally glad that the RDFa WG has integrated feedback from the microdata community into RDFa 1.1. As a result, web authors choosing to use RDFa can benefit from the same level of simplicity as microdata, while those addressing more complex use cases can decide to use advanced features such as CURIEs if they wish to do so.
