Is it worthwhile to embed structured data in Web content?

Short answer: not really!

A few years ago, I was fascinated with the idea of microformats. The concept is to add structural data to common HTML tags so that certain types of data on a web page can be clearly identified. Adding structural information makes it easier to automate the parsing and indexing of web content. The original microformats covered common, well-defined data sets, such as contact information or a calendar event. In the intervening years, microformats were superseded by microformats2, with additional draft standards for more items such as recipes and resumes.

I decided to use my resume as a “use case” for structured HTML data. My original plan was to use the h-resume draft specification from microformats2. I changed my plan when I realized that the major search engines (Bing, Google, Yahoo! and Yandex) had decided to parse a different semantic technology, microdata (defined in the W3C microdata specification). There is no point in creating semantic web pages if nobody is going to parse them, so I used the microdata vocabulary available at schema.org and placed it into an h-resume framework. I used the Event class to represent each position I held or educational achievement, because the Event class is the only one with start and end dates.
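To make the markup concrete, here is a minimal sketch of how one resume entry might be rendered as a schema.org Event using the microdata attributes itemscope, itemtype and itemprop. The helper name and the sample job are made up for illustration and are not taken from my actual resume; name, startDate and endDate are properties of the schema.org Event type.

```python
from html import escape


def position_to_microdata(title: str, start: str, end: str) -> str:
    """Render one resume entry as a schema.org Event in microdata.

    Hypothetical helper, written only to illustrate the markup.
    """
    return (
        '<div itemscope itemtype="https://schema.org/Event">\n'
        f'  <span itemprop="name">{escape(title)}</span>\n'
        f'  <time itemprop="startDate" datetime="{escape(start)}">{escape(start)}</time>\n'
        f'  <time itemprop="endDate" datetime="{escape(end)}">{escape(end)}</time>\n'
        '</div>'
    )


# Made-up sample data.
print(position_to_microdata("Technical Writer, Example Corp", "2010-01", "2014-06"))
```

Each entry becomes a self-contained item, which is also why a parser sees a list of separate events rather than one resume.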

Is it worth the effort to add structured data to HTML pages? Generally, no. Here’s why:

  1. Approximately nobody is parsing it. Google and the other major search engines do extract some structured data, but (as far as I can tell) it is not done in a sophisticated way. Google’s own Structured Data Testing Tool is unable to extract complex structures. If you run the tool on my resume, you’ll see that Google only gets a bunch of unrelated events, instead of a resume. Therefore, it only makes sense to embed structure for simple data (like contact information) that is likely to be parsed and understood.
  2. Adding structured data is time-consuming and scales poorly. Ideally, I want to build and maintain ONE data structure that can be transformed and displayed in any required format. For example, I want to create and maintain ONE data source (such as a JSON file) that contains all the information about my career. If I want to put my resume on the web, I want to use a tool to parse the JSON file and generate HTML with appropriate structure. Unfortunately, I cannot find any tools to automatically generate structure. Even Google’s solution is a tool that allows you to retroactively add structure to existing HTML. Almost all web pages today are generated by server-side programs, and there need to be standard tools to translate data sources into structured HTML markup.
  3. There is a profusion of standards. The idea of the Semantic Web led to the Resource Description Framework (RDF) standard, which is a very complex system that nobody actually seems to use. Then there are microformats and microdata (already described), and JSON-LD. It would be great if you could create all the data as JSON-LD and then generate structured HTML! Unfortunately, those tools do not exist. As it stands, microdata is the winner, simply because it’s the one that major search engines are (sort of) supporting.
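The single-source workflow from point 2 can be approximated by hand. What follows is a rough sketch under stated assumptions, not a standard tool: the JSON layout (a "positions" list with "title", "start" and "end" keys) and both function names are hypothetical, and the sample entries are invented. It maps each entry to a schema.org Event and wraps the result in the script element used to embed JSON-LD in a page.

```python
import json

# Hypothetical single data source; in practice this would be read from a JSON file.
CAREER_JSON = """
{
  "positions": [
    {"title": "Technical Writer, Example Corp", "start": "2010-01", "end": "2014-06"},
    {"title": "B.A., Example University", "start": "2005-09", "end": "2009-05"}
  ]
}
"""


def career_to_jsonld(career: dict) -> list:
    """Map each career entry to a schema.org Event described as JSON-LD."""
    return [
        {
            "@context": "https://schema.org",
            "@type": "Event",
            "name": p["title"],
            "startDate": p["start"],
            "endDate": p["end"],
        }
        for p in career["positions"]
    ]


def jsonld_script_tag(items: list) -> str:
    """Wrap JSON-LD in a script element for embedding in an HTML page."""
    # Escape "</" so a stray "</script>" in the data cannot close the element early.
    body = json.dumps(items, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(jsonld_script_tag(career_to_jsonld(json.loads(CAREER_JSON))))
```

Every new output format needs another hand-written function like these, which is exactly the scaling problem described above.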


