Five Rules of IRIs

From Tripletalk


These "rules" are the result of a mail exchange on the semantic web list (archived at http://mailman.few.vu.nl/mailman/private/sw-meetings/2015-September/001737.html). Jacco introduced them in a summary of the discussion as a whole, and Tobias extended them. Feel free to edit.

  • rule 1) mint humanly meaningful, easy to remember IRIs, whenever possible. Your data users will appreciate it.
  • rule 2) if you care about cool IRIs that do not change (most of us do, unless you are in the throw-away-prototyping business), you need to assess (for each IRI you mint!) the risk that you will be forced to change the meaningful part in the near future for external reasons (change of spelling, syntax, semantics, pragmatics). Such an assessment is not a trivial task, and probably not something that is explained in your first-order logic textbook. For some IRIs, you might come to the conclusion that any meaningful IRI you would mint now would not stand the test of time (for the Dutch speakers: cornetto:pannekoek-noun-1 or cornetto:pannenkoek-noun-1? cornetto:arbeidsmigrant-noun-1 or cornetto:gastarbeider-noun-1?). For such concepts, there could be good arguments to violate rule 1.
  • rule 3) for really huge efforts such as Wikidata, doing these assessments for millions of concepts and properties _may_ be too costly, and _could_ be a reason to simply violate rule 1 for _all_ concepts (I do not like it, but would understand this position). For cornetto, I did go for the meaningful names approach, and because the dataset is too big to pick them manually, I minted IRIs based on a more or less random choice among the available synonyms. This worked well for many synsets, but turned out really unfortunate for some others. I would not do this again, and I recently noticed Princeton came to the same conclusion and went for the opaque approach in WN3.1 as well (see http://wordnet-rdf.princeton.edu/wn31/100001740-n). Again, I can see the disadvantages from both angles. It is really a hard problem, not just for Wikidata.
  • rule 4) I guess that by "meaningful IRI" you implicitly assume "meaningful in English". I am fine with that, because personally, I see this as a pragmatic, and not a political choice. But I know many people have different opinions about language choices, and I fully understand that projects such as Wikidata need to take these opinions into account. Most meaningful IRIs will be language-biased, so this can, for some projects, also be a good reason to violate rule 1 for all IRIs.
  • rule 5) Source code (be it in RDF, Java, etc.) should be easy for developers to read no matter what, even without tool support. For nondescriptive IRIs in RDF, this can be achieved, for example, by defining prefix shorthands in Turtle.
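
To make rule 5 a bit more concrete, here is a minimal Python sketch of how prefix shorthands can shorten opaque IRIs when printing or logging them. It is an illustration only: the prefix names `wn31` and `ex` and the `example.org` namespace are assumptions made up for this example, and the helper functions `shorten` and `turtle_header` are hypothetical, not part of any RDF library. The sketch also does not check that the local part is a legal Turtle local name.

```python
# Hypothetical prefix table. The prefix names and the example.org namespace
# are assumptions for illustration; only the wn31 namespace IRI is derived
# from the WordNet 3.1 IRI mentioned in rule 3.
PREFIXES = {
    "wn31": "http://wordnet-rdf.princeton.edu/wn31/",
    "ex": "http://example.org/vocab/",
}


def shorten(iri, prefixes=PREFIXES):
    """Return 'prefix:local' for the longest matching namespace.

    Falls back to the full IRI in angle brackets when no namespace matches.
    No check is made that the local part is a valid Turtle local name.
    """
    best = None
    for prefix, namespace in prefixes.items():
        if iri.startswith(namespace):
            if best is None or len(namespace) > len(best[1]):
                best = (prefix, namespace)
    if best is None:
        return f"<{iri}>"
    prefix, namespace = best
    return f"{prefix}:{iri[len(namespace):]}"


def turtle_header(prefixes=PREFIXES):
    """Render the prefix table as Turtle @prefix declarations, sorted by prefix."""
    return "\n".join(
        f"@prefix {prefix}: <{namespace}> ."
        for prefix, namespace in sorted(prefixes.items())
    )


if __name__ == "__main__":
    print(turtle_header())
    print(shorten("http://wordnet-rdf.princeton.edu/wn31/100001740-n"))
```

A prefix only shortens the namespace part, so the opaque local name stays opaque. In practice one would probably combine this with a human-readable label or a comment next to the IRI.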

So yes, data publishers sometimes have good reasons to prefer opaque IRIs, and yes, data consumers have good reasons to prefer meaningful, easy to remember IRIs. To me, this is a problem for which we should develop workable technical solutions, not a problem we should address in terms of who is wrong and who is right.
