kantenwerk » linked data

By: Kantenwerk  05/12/2011

kantenwerk » linked data

The serving of linked data on the dog food server works through content negotiation – basically, the first request by an agent would be to the URI of the resource (“plain” in the graph), specifying in the header whether an RDF or HTML representation is desired. The server then redirects to either the HTML or RDF document with the desired representation. In theory, this means that requests(rdf) + requests(html) = requests(plain). However, since it is perfectly feasible to request the HTML or RDF documents directly, the total of RDF+HTML is slightly higher. The total numbers are:

HTML: 238486
RDF: 35491
HTML+RDF: 273977
Plain: 247576

As the graph and the numbers show, the usage in terms of RDF requests is relatively low at the moment, indicating that there is still a long way to go for the Semantic Web to really take off (and that we need to work on making the site more popular).

This second graph shows the distribution of hits over time for the different kinds of resources which the server offers, as indicated by the requested namespace (dogfood:person, dogfood:conference, …). Interest in people resources is highest almost all of the time. Partially, this may be due to ego surfing of Semantic Web researchers. However, as the graphs below will show, bot traffic far exceeds traffic by human visitors, so my hunch is that the preference of people pages can be explained through the search strategies of the big search engine players out there – people information is probably considered more valuable. Of course, another factor is the fact that there are about three times as many people resources on the dog food server than e.g. conference resources.

Regarding the conference and workshop resources, those need to be examined in a more fine-grained fashion, since the respective namespaces cover everything connected to an event: papers, talks, chairs, the event itself, etc.

No self-respecting analysis can live without a nice longtail graph these days. Looking at visiting agents, we get such a distribution (y-scale is logarithmic). The agents in the head are the big search engine crawlers – GoogleBot, Yahoo! Slurp and MSNBot -, as well as the big name browsers. In the middle and long tail we find lots and lots of different other bots, crawlers and browsers, as well as various tools, data services and agents who didn’t give themselves a proper identifier and instead just show up as “Java” or “perl-libwww” (very naughty behaviour indeed…).

More interesting is probably this graph, which shows the agent distribution after I had sliced and diced it manually according to some criteria:

  • What type of agent is it: bot/crawler, browser (=human visitor), unspecified programming library, debugging or scripting tool (curl, wget, …) or data-service. The latter is term for agents which provide a service for other agents by processing some data on the Web. In contrast to crawlers, the purpose here is not archiving or indexing. Examples are format converters, snapshot generators, etc.
  • What is the “semanticity” of the agent: is it a conventional agent, or one that operates in a Semantic Web-aware fashion?
  • Mobile or not: I noticed a (small) amount of visits by mobile browsers, which I thought could be interesting to record separately.

All this and more will become part of my thesis and also (hopefully) make into some sort of more polished publication soon.


Other products and services from Kantenwerk

05/12/2011

kantenwerk » conference

SPARQL to your heart’s content, making use of the named graphs we have established for each event in the database. Enjoy eye-candy like the map of all organisations in the repository (provided we have their geo-coordinates). Browse thousands of people, papers and organisations in your Web browser,. What can the Dog Food site do for you.


05/12/2011

kantenwerk » internet of things

) It reminds me of a story where someone demonstrated the principle of topic maps with strings and other physical artifacts, thereby moving from the digital over to the physical domain. Conceptually it’s located somewhere in the vicinity of topics such as the internet of things, ubiquitous computing – only it doesn’t involve computing.


05/12/2011

kantenwerk » mobile computing

) It reminds me of a story where someone demonstrated the principle of topic maps with strings and other physical artifacts, thereby moving from the digital over to the physical domain. Conceptually it’s located somewhere in the vicinity of topics such as the internet of things, ubiquitous computing – only it doesn’t involve computing. I think this is a nice example of using technology to provide a better experience and understanding of a city.