RDF on RDF: RDF Schema

Moving on from my previous post about RDF, I felt the appropriate next step would be to explore RDF Schema. One assumption I had about RDF Schema (often abbreviated as RDFS though I'm going to continue spelling it out like this because my brain keeps expanding "DFS" to "distributed file system") was that it would enable some kind of validation. That's what I'd generally understood as the purpose of a schema. I'd like this post to document what I thought I was going to learn and what I think I stumbled into instead.

my assumptions

I knew a few things about RDF that I had lumped into the "RDF Schema" category. One is that triples can annotate terms with datatypes and that some of those datatypes are RDF-specific while others are borrowed from XML Schema. Another is that RDF graphs can somehow be "validated". Like I mentioned above, I heard "schema" and "validation" and assumed that RDF Schema was the mechanism by which validation is enabled. I knew, kind of, that RDF Schema represented its own kind of graph to describe RDF itself. I also knew that it was vaguely object-oriented-shaped: there's a notion of classes and properties and subclasses. Terms can have a "type" represented by the "a" shorthand predicate I mentioned in my last post.

I also knew about two other acronyms: OWL and SHACL. I didn't really know much about those, especially SHACL, so it was hard for me to relate them to RDF and RDF Schema in a concrete way. My assumption about OWL was that it was responsible for taking two graphs and linking them somehow. While RDF and RDF Schema could describe an internally consistent graph, OWL would come in and specify how two such graphs could be merged.

In order to dispel my confusion, I wanted to test a few things. The two big questions were: how do I add RDF Schema statements to my graph and how do I check my data against my schema? To answer these questions, I turned once again to Ontotext's GraphDB.

getting more confused

So it turns out that adding RDF Schema statements to a graph works exactly the same way as adding regular RDF triples. For instance, the RDF Schema namespace contains a predicate called domain. A triple of the form A domain B means that any statements with A as the predicate should have a subject of type B. This seems like a way to add something akin to type definitions to our graph, so I tried updating the little "knowledge graph about knowledge graphs" from my last post to add a domain constraint to one of the predicates. Here's what that query looked like:

prefix m: <my:>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
insert data {
    graph m:blog-post {
        # m:knowledge-graph m:uses m:RDF .
        m:uses rdfs:domain rdfs:Literal .
    }
}

The original triple is commented out and precedes the actual statement I'm inserting. To try to get some interesting errors (and because I was too lazy to type out the real types of things in the graph), I intentionally set the domain for the uses predicate to be inconsistent with the data already in the graph. The insert statement ran without errors. "Ah, okay," I thought to myself. "Maybe this is a kind of 'permissive import' situation and data is validated somewhere else." I tried querying for statements that include uses as a predicate, and that query ran successfully. Finally, I asked the graph if our rule was ever actually respected in the data:

prefix m: <my:>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
ask {
    m:knowledge-graph a rdfs:Literal .
}

This, unfortunately, responds with NO. Our data doesn't match the schema, and the graph will say so if asked, but nothing stopped the mismatch from going in. I then looked up terms like "RDFS validation" or "knowledge graph consistency checking", but these searches inevitably led to OWL and SHACL, the two things I was specifically not looking into yet. They also led to a family of software called "reasoners" that check a graph for consistency, but again only against an OWL and/or SHACL specification. Whatever problems RDF Schema solves, validation doesn't seem to be one of them. While I was trying to find a home for all things "RDF graph validation" in my memory palace, an understanding slowly took shape.
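
One way to see why nothing complained: in the RDFS specification, domain is written as an entailment rule rather than a constraint. If a property has a domain, anything used as a subject of that property is inferred to be an instance of that class. The sketch below applies that single rule to a toy version of the graph in plain Python. It is only an illustration under a few assumptions: triples are modeled as tuples of strings, the my: names are left abbreviated rather than expanded, entail_domain_types is a made-up helper, and whether a given store actually materializes such inferences depends on how its inference is configured.

```python
# Triples are plain (subject, predicate, object) tuples of strings.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
RDFS_DOMAIN = RDFS + "domain"
RDFS_LITERAL = RDFS + "Literal"


def entail_domain_types(triples):
    """Apply the RDFS domain rule once; return only the newly inferred triples.

    Rule: if (p, rdfs:domain, c) and (s, p, o) are both in the graph,
    then (s, rdf:type, c) follows.
    """
    graph = set(triples)
    # Every (property, class) pair declared with rdfs:domain.
    domains = [(s, o) for (s, p, o) in graph if p == RDFS_DOMAIN]
    inferred = set()
    for prop, cls in domains:
        for (s, p, o) in graph:
            if p == prop:
                inferred.add((s, RDF_TYPE, cls))
    return inferred - graph


# Toy version of the blog-post graph (hypothetical, abbreviated names).
toy_graph = {
    ("my:knowledge-graph", "my:uses", "my:RDF"),
    ("my:uses", RDFS_DOMAIN, RDFS_LITERAL),
}

inferred = entail_domain_types(toy_graph)
for triple in sorted(inferred):
    print(triple)
```

Under this reading, the "inconsistent" domain statement doesn't produce an error. It produces one more triple, claiming that the knowledge graph is a literal.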

what I think RDF Schema enables

Among some of the "this isn't what RDF Schema is for" Stack Overflow answers, I think I settled on a decent mental model for now. RDF is about representing data as triples. RDF Schema is something called a semantic extension to RDF. I initially translated "semantic extension" as "something I don't understand", but now I'm settling on RDF Schema as being fundamentally RDF, fundamentally about data representation. OWL and SHACL are, I think, for the more logistical "how do I work with this stuff in the real world", and I'll certainly try to get a post about those together as soon as I understand them. For now, what I want to focus on is a specific predicate that sits right alongside RDF Schema: reifies (strictly speaking it lives in the rdf: namespace, as rdf:reifies, and comes from the RDF 1.2 drafts).

The RDF primer does not gradually introduce its abstractions. When introducing the RDF data model, it starts with the humble triple before talking about what triples can be made of. It took me a minute to fit IRIs and blank nodes in my mental model of graphs, but ultimately I think I got there. However, right after introducing blank nodes the primer introduces reifiers. The primer's example is someone's interest in the Mona Lisa. The subject is a made-up person named Bob, the object is the painting, and the predicate is "is interested in". The "reifier" is something that will take us out of the general "Bob is interested in the Mona Lisa" and into the domain of talking about that interest specifically. We're not talking about the idea of being interested in something, nor are we talking about all of Bob's interests or everyone's interest in the Mona Lisa. We're talking about the triple "Bob is interested in the Mona Lisa".
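
To make the shape of that a bit more concrete, here is a rough Python sketch of a reifier, assuming the RDF 1.2 model in which a resource points at a triple through rdf:reifies. Triples are plain tuples, and a triple can sit in the object position of another triple. All of the ex: names, including ex:noted-by and ex:alice, are invented for illustration and do not come from the primer, and statements_about is a hypothetical helper.

```python
RDF_REIFIES = "http://www.w3.org/1999/02/22-rdf-syntax-ns#reifies"

# The plain statement: Bob is interested in the Mona Lisa.
bob_interest = ("ex:bob", "ex:is-interested-in", "ex:mona-lisa")

reified_graph = {
    bob_interest,
    # The reifier: a resource that stands for that specific statement.
    ("ex:bobs-interest", RDF_REIFIES, bob_interest),
    # A statement about the statement (hypothetical annotation).
    ("ex:bobs-interest", "ex:noted-by", "ex:alice"),
}


def statements_about(graph, triple):
    """Return the triples whose subject is a reifier of `triple`."""
    reifiers = {s for (s, p, o) in graph if p == RDF_REIFIES and o == triple}
    return {
        (s, p, o)
        for (s, p, o) in graph
        if s in reifiers and p != RDF_REIFIES
    }


annotations = statements_about(reified_graph, bob_interest)
```

The annotation hangs off ex:bobs-interest, not off Bob or the painting, which is the "talking about that interest specifically" part.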

This, I think, is the key to understanding RDF Schema's purpose in all this. It enables a single graph to describe things on different levels of abstraction. Initially I thought RDF was a Framework to Describe Resources, but what I didn't appreciate until now was how that Framework was itself a Resource to be Described.

I don't like how meta things are getting after even this one step up, but hopefully some more concrete examples will help us wade through the waters of OWL without getting set adrift on that dreaded ocean of abstraction.