Thursday, May 21, 2009

Semantic Web - Microformat

A microformat is a web-based[1] approach to semantic markup that seeks to re-use existing XHTML and HTML tags to convey metadata[2] and other attributes. This approach allows information intended for end-users (such as contact information, geographic coordinates, calendar events, and the like) to also be automatically processed by software.

Although the content of web pages is technically already capable of "automated processing", and has been since the inception of the web, such processing is difficult because the traditional markup tags used to display information on the web do not describe what the information means.[3] Microformats are intended to bridge this gap by attaching semantics, and thereby obviate other, more complicated methods of automated processing, such as natural language processing or screen scraping. The use, adoption and processing of microformats enable data items to be indexed, searched for, saved or cross-referenced, so that information can be reused or combined.[3]

Current microformats allow the encoding and extraction of events, contact information, social relationships and so on. More are being developed. Version 3 of the Firefox browser[4] and version 8 of Internet Explorer[5] are expected to include native support for microformats.

Background

Microformats emerged as part of a grassroots movement to make recognizable data items (such as events, contact details or geographical locations) capable of automated processing by software, as well as directly readable by end-users.[3][6] Link-based microformats emerged first. These include vote links that express opinions of the linked page, which can be tallied into instant polls by search engines.[7]
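
As a rough sketch of how such tallying might work, the Python below counts vote links per target URL. It assumes the votes are carried in the rev attribute of anchor elements with the values vote-for, vote-against and vote-abstain; the names VoteLinkParser and tally_votes are hypothetical and are not taken from any search engine or library.

```python
from collections import Counter
from html.parser import HTMLParser

# Assumed vote values, carried in the rev attribute of <a> elements.
VOTE_VALUES = {"vote-for", "vote-against", "vote-abstain"}


class VoteLinkParser(HTMLParser):
    """Count vote links for each linked URL (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.tally = {}  # maps href -> Counter of vote values

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        votes = VOTE_VALUES.intersection((attrs.get("rev") or "").split())
        # Count a link only if it carries exactly one vote value.
        if href and len(votes) == 1:
            self.tally.setdefault(href, Counter())[votes.pop()] += 1


def tally_votes(html):
    """Return a dict mapping each voted-on URL to its vote counts."""
    parser = VoteLinkParser()
    parser.feed(html)
    parser.close()
    return parser.tally
```

Calling tally_votes on the combined markup of several pages would give a per-URL count that could be presented as a simple poll.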

As the microformats community grew, CommerceNet, a nonprofit organization that promotes electronic commerce on the Internet, helped sponsor and promote the technology and support the microformats community in various ways.[7] CommerceNet also helped co-found the Microformats.org community site.[7]

Neither CommerceNet nor Microformats.org is a standards body. The microformats community operates through an open wiki, a mailing list, and an Internet Relay Chat (IRC) channel.[7] Most of the existing microformats were created at the Microformats.org wiki and associated mailing list, by a process of gathering examples of web publishing behaviour, then codifying it. Some other microformats (such as rel=nofollow and unAPI) have been proposed, or developed, elsewhere.

Technical overview

XHTML and HTML standards allow for semantics to be embedded and encoded within the attributes of markup tags. Microformats take advantage of these standards by indicating the presence of metadata using the following attributes:

  • class
  • rel
  • rev (in one case, otherwise deprecated in microformats[8])

For example, the text "The birds roosted at 52.48,-1.89" contains a pair of numbers which may be understood, from their context, to be a set of geographic coordinates. They can be wrapped in spans (or other HTML elements) with specific class names (in this case geo, latitude and longitude, all part of the geo microformat specification):

The birds roosted at
<span class="geo">
<span class="latitude">52.48</span>,
<span class="longitude">-1.89</span>
</span>

With this markup, machines can be told exactly what each value represents and can then perform a variety of tasks, such as indexing it, looking it up on a map and exporting it to a GPS device.
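
To illustrate how software might read those values, here is a minimal Python sketch built on the standard library's html.parser module. The names GeoParser and extract_geo are hypothetical, and the sketch assumes each coordinate is the plain text of its own element inside an element with class geo; it does not attempt to cover the full geo specification.

```python
from html.parser import HTMLParser


class GeoParser(HTMLParser):
    """Collect (latitude, longitude) pairs from geo-style markup."""

    def __init__(self):
        super().__init__()
        self.points = []      # finished (latitude, longitude) tuples
        self._in_geo = False  # True while reading inside a class="geo" element
        self._field = None    # "latitude" or "longitude" while inside that element
        self._values = {}     # coordinates collected for the current geo block

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "geo" in classes:
            self._in_geo = True
            self._values = {}
        elif self._in_geo:
            for name in ("latitude", "longitude"):
                if name in classes:
                    self._field = name

    def handle_data(self, data):
        # Assumes the element's text is a single decimal number.
        if self._field:
            self._values[self._field] = float(data.strip())

    def handle_endtag(self, tag):
        self._field = None
        # Emit the point as soon as both coordinates have been seen.
        if self._in_geo and len(self._values) == 2:
            self.points.append(
                (self._values["latitude"], self._values["longitude"])
            )
            self._in_geo = False


def extract_geo(html):
    """Return a list of (latitude, longitude) tuples found in the markup."""
    parser = GeoParser()
    parser.feed(html)
    parser.close()
    return parser.points
```

Applied to the snippet above, extract_geo would return one pair of floats that a mapping or indexing tool could consume directly.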

Example

In this example, the contact information is presented as follows:

<div>
  <div>Joe Doe</div>
  <div>The Example Company</div>
  <div>604-555-1234</div>
  <a href="http://example.com/">http://example.com/</a>
</div>

With hCard microformat markup, that becomes:

<div class="vcard">
  <div class="fn">Joe Doe</div>
  <div class="org">The Example Company</div>
  <div class="tel">604-555-1234</div>
  <a class="url" href="http://example.com/">http://example.com/</a>
</div>

Here, the formatted name (fn), organisation (org), telephone number (tel) and web address (url) have been identified using specific class names and the whole thing is wrapped in class="vcard", which indicates that the other classes form an hCard (short for "HTML vCard") and are not merely coincidentally named. Other, optional, hCard classes also exist. It is now possible for software, such as browser plug-ins, to extract the information, and transfer it to other applications, such as an address book.
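
The extraction step could look roughly like the Python sketch below, which uses only the standard library. HCardParser and extract_hcards are hypothetical names, only the four classes shown above are handled, and the markup is assumed to contain no void elements (such as br) inside the card; a real hCard parser has to deal with many more properties and edge cases.

```python
from html.parser import HTMLParser

# Only the classes used in the example above; hCard defines more.
HCARD_FIELDS = ("fn", "org", "tel", "url")


class HCardParser(HTMLParser):
    """Collect simple hCards as dictionaries (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.cards = []     # finished cards, one dict per class="vcard" element
        self._card = None   # the card being read, or None outside a card
        self._depth = 0     # nesting depth inside the current card
        self._field = None  # hCard class of the element being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if self._card is None:
            if "vcard" in classes:
                self._card = {}
                self._depth = 1
            return
        self._depth += 1
        self._field = next((c for c in classes if c in HCARD_FIELDS), None)
        # For a link, take the address from href rather than the link text.
        if self._field == "url" and attrs.get("href"):
            self._card["url"] = attrs["href"]
            self._field = None

    def handle_data(self, data):
        if self._card is not None and self._field and data.strip():
            self._card[self._field] = data.strip()

    def handle_endtag(self, tag):
        if self._card is None:
            return
        self._field = None
        self._depth -= 1
        if self._depth == 0:  # the class="vcard" element has closed
            self.cards.append(self._card)
            self._card = None


def extract_hcards(html):
    """Return a list of dicts, one per hCard found in the markup."""
    parser = HCardParser()
    parser.feed(html)
    parser.close()
    return parser.cards
```

Requiring the class="vcard" wrapper before reading any field mirrors the point made above: a class named tel or url outside an hCard is ignored rather than treated as contact data.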

In-context examples

For annotated examples of microformats on live pages, see the "Live example" section of the hCard article and the "Three classes" section of the Geo (microformat) article.

Specific microformats

Several microformats have been developed to enable semantic markup of particular types of information.

  • hAtom - for marking up Atom feeds from within standard HTML
  • hCalendar - for events
  • hCard - for contact information; includes:

Microformats under development

Among the many proposed microformats[13], the following are undergoing active development:

  • hAudio - for audio files and references to released recordings
  • hRecipe[14]
  • citation - for citing references
  • currency - for amounts of money
  • figure - for associating captions with images[15]
  • geo extensions - for places on Mars, the Moon, and other such bodies; for altitude; and for collections of waypoints marking routes or boundaries
  • species - for the names of living things
  • measure - for physical quantities, structured data-values[16]

Uses of microformats

Using microformats within HTML code provides additional formatting and semantic data that can be used by applications. These could be applications that collect data about on-line resources, such as web crawlers, or desktop applications such as e-mail clients or scheduling software. Microformats can also be used to facilitate "mashups", such as exporting all of the geographical locations on a web page into Google Maps to visualize them spatially.

Several browser extensions, such as Operator for Firefox and Oomph for Internet Explorer, provide the ability to detect microformats within an HTML document and export them into formats compatible with contact management and calendar utilities, such as Microsoft Outlook. Yahoo! Query Language can be used to extract microformats from web pages.[17]
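
As an illustration of the export step, the Python sketch below turns already-extracted hCard values into vCard text that a contact manager could import. The function hcard_to_vcard and the shape of its input dictionary are assumptions made for this example and do not describe how Operator or Oomph work. The sketch writes only the four properties used earlier; vCard 3.0 also expects a structured N (name) property, which is omitted here.

```python
def escape_text(value):
    """Escape characters that have special meaning in vCard text values."""
    return (
        value.replace("\\", "\\\\")
        .replace(",", "\\,")
        .replace(";", "\\;")
        .replace("\n", "\\n")
    )


def hcard_to_vcard(card):
    """Serialise a dict of hCard values (fn, org, tel, url) as vCard 3.0 text.

    Simplification: the N property is not generated, so strict importers
    may reject the result.
    """
    lines = ["BEGIN:VCARD", "VERSION:3.0"]
    if card.get("fn"):
        lines.append("FN:" + escape_text(card["fn"]))
    if card.get("org"):
        lines.append("ORG:" + escape_text(card["org"]))
    if card.get("tel"):
        lines.append("TEL:" + card["tel"])
    if card.get("url"):
        lines.append("URL:" + card["url"])
    lines.append("END:VCARD")
    # vCard lines are terminated with CRLF.
    return "\r\n".join(lines) + "\r\n"
```

For instance, passing {"fn": "Joe Doe", "tel": "604-555-1234"} yields a short vCard containing just the FN and TEL lines between the BEGIN and END markers.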

Microsoft has expressed a desire to incorporate microformats into upcoming projects,[18] as have other software companies.

In Wikipedia - and more generally in MediaWiki - microformats are used as part of templates like {{coord}}.

Alex Faaborg summarizes the arguments for putting the responsibility for microformat user interfaces in the web browser rather than making more complicated HTML:[19]

  • Only the web browser knows what applications are accessible to the user and what the user's preferences are
  • It lowers the barrier to entry for web site developers if they only need to do the markup and not handle "appearance" or "action" issues
  • It retains backwards compatibility with web browsers that don't support microformats
  • The web browser presents a single point of entry from the web to the user's computer, which simplifies security issues
