In my last post, I followed my readers' advice and checked out the book "Programming the Semantic Web" published by O'Reilly. The full reference is below:
"Programming the Semantic Web by Toby Segaran, Colin Evans, and Jamie Taylor. Copyright 2009 Toby Segaran, Colin Evans, and Jamie Taylor, 978-0-596-15381-6."
I stopped in the middle of chapter 3, so in this post we keep going with the review. The book tells us that:
"Inference is the process of deriving new information from information you already have." (page 43)
For example, you might have one piece of information, then download a second from the web, and then from these two pieces of information, derive a third. One of the examples given in the chapter is "If I know a restaurant's address, I can use a geocoder to find its coordinates on a map" (page 43).
The chapter goes on to work out which restaurants in Washington DC are likely to be touristy. It does that by finding the restaurants that are both near a tourist attraction and cheap. It uses this example to explain how inference rules can chain together to generate new information:
"What's important to realize here is that the rules exist totally independently. Although we ran the three rules in sequence, they weren't aware of each other - they just looked to see if there were any triples that they knew how to deal with and then created new ones based on those. These rules can be run continuously—even from different machines that have access to the same triplestore - and still work properly, and new rules can be added at any time." (page 49)
The chapter then looks at merging graphs together, allowing queries across data from different sources. The chapter ends with some fun: we get to generate graphic visualisations with the program Graphviz (which I discovered I already had on my system).
Image by eteela used with permission.
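To make the chapter 3 ideas concrete, here is a minimal sketch of my own (not the book's code) that merges two small graphs, runs two independent rules until nothing new appears, and prints Graphviz DOT text. The restaurant names, predicates, and helper functions are all hypothetical, and a plain Python set stands in for a real triplestore.

```python
# Toy triplestores as sets of (subject, predicate, object) tuples.
# All identifiers and data here are hypothetical, not taken from the book.
restaurants = {
    ("cafe_a", "near", "museum"),
    ("cafe_a", "price", "cheap"),
    ("cafe_b", "near", "museum"),
    ("cafe_b", "price", "expensive"),
}
places = {
    ("museum", "is_a", "tourist_attraction"),
}


def merge(*graphs):
    """Merging graphs is a set union; shared identifiers link the data."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def near_attraction_rule(store):
    """Mark anything that is near a tourist attraction."""
    attractions = {s for (s, p, o) in store
                   if p == "is_a" and o == "tourist_attraction"}
    return {(s, "near_attraction", "yes") for (s, p, o) in store
            if p == "near" and o in attractions}


def touristy_rule(store):
    """Mark anything that is both near an attraction and cheap."""
    near = {s for (s, p, o) in store
            if p == "near_attraction" and o == "yes"}
    cheap = {s for (s, p, o) in store if p == "price" and o == "cheap"}
    return {(s, "is_a", "touristy_restaurant") for s in near & cheap}


def run_rules(store, rules):
    """Apply every rule repeatedly until no rule adds a new triple."""
    store = set(store)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            new_triples = rule(store) - store
            if new_triples:
                store |= new_triples
                changed = True
    return store


def to_dot(store):
    """Render triples as Graphviz DOT text (no escaping of quotes)."""
    lines = ["digraph G {"]
    for s, p, o in sorted(store):
        lines.append(f'  "{s}" -> "{o}" [label="{p}"];')
    lines.append("}")
    return "\n".join(lines)


merged = merge(restaurants, places)
# The rules are deliberately listed "out of order": each one only looks
# for triples it understands, so the order does not change the result.
inferred = run_rules(merged, [touristy_rule, near_attraction_rule])
dot_text = to_dot(inferred)
print(dot_text)
```

If the output is saved to a file such as graph.dot, running `dot -Tpng graph.dot -o graph.png` should turn it into an image, assuming Graphviz is installed.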
Chapter 4 dives straight into RDF. In RDF, everything is a resource, identified by a URI (page 65). A URI does not have to be retrievable as a URL, though to aid uniqueness, it is a convention to use a hostname that you control as the first part of the URI. RDF allows the use of a blank node for situations where you do not know the URI (page 67). Blank nodes are given an arbitrary ID starting with an underscore and a colon (_:).
RDF can be expressed in different serialization formats (page 69). The chapter demonstrates these using an established RDF vocabulary, Friend of a Friend (FOAF), as the primary example.
The first RDF serialization format covered is N-Triples, a series of statements, each one "containing a subject, predicate, and object followed by a dot" (page 71). N3 is very similar to N-Triples but various shorthands are introduced to remove redundancies (page 72).
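As a rough illustration of the N-Triples shape, here is a small sketch of my own that writes triples out one statement per line. The FOAF predicate URIs are real, but the example.org person, the blank node ID, and the `format_term` heuristic (treating anything that starts with http as a URI and everything else as a plain literal) are my own simplifying assumptions.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical data: Alice knows someone whose URI we do not have,
# so that person is represented by the blank node _:b1.
triples = [
    ("http://example.org/people/alice", FOAF + "knows", "_:b1"),
    ("_:b1", FOAF + "name", "Bob"),
]


def format_term(term):
    """Format one term for N-Triples using a deliberately naive heuristic."""
    if term.startswith("_:"):
        return term  # blank node, written as-is
    if term.startswith(("http://", "https://")):
        return f"<{term}>"  # URIs go in angle brackets
    # Anything else is treated as a plain string literal.
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(statements):
    """One statement per line: subject, predicate, object, then a dot."""
    return "\n".join(
        f"{format_term(s)} {format_term(p)} {format_term(o)} ."
        for s, p, o in statements
    )


ntriples_text = to_ntriples(triples)
print(ntriples_text)
```

A real serializer also has to handle typed literals, language tags, and character escaping, which is exactly the sort of thing a library does for you.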
The XML representation of RDF, which is perhaps what most people think of as RDF, is covered next (pages 73-76). Lastly comes RDFa, where XML attributes are added to XHTML tags, allowing one document to hold both the human-readable and the machine-readable content (page 76). The extra XML attributes "specify the semantics behind the information that is displayed" (page 76).
Chapter 4 then leaves behind the simplified tools from the previous chapters and breaks out RDFLib and SPARQL. SPARQL is a query language for RDF graphs. It is read-only, and if you (dis)like SQL then you will equally (dis)like SPARQL. The chapter uses RDFLib to demonstrate this. I might cover RDFLib myself in a future post, so I will skip talking about it for now. Briefly, it seems a really useful library and the first port of call for dealing with all things RDF in Python.
Chapter 5 is "Sources of Semantic Data", which seems to have extensive examples of the work covered so far. I skipped this chapter for now so that I could press straight on to chapter 6 about ontologies.
As I have been going along, I have been trying out all the examples, and there have been several small errors in the code, especially inconsistently named references and files. This is not supposed to be a book aimed just at Python users; it is a general book for anyone, with the examples just happening to be in Python. Therefore the examples could have had a little more proofreading, i.e. user testing.
This did not spoil it for me; as a more experienced Python programmer, I could just fix the examples as I went. So far I have found the book really engaging and useful, and I am very keen to read the rest.