TIL: Wikidata SPARQL trick - getting item and subclasses
If you use the Wikidata Query Service to explore how data is structured in Wikidata, one query you will often want is the following.
Count the number of items which are an instance of a subclass of X, or an instance of X itself.
This is useful because it gives you a rough picture of how objects are classified.
The following query answers half of the question above (replace THING with the item you’re interested in): it counts the items which are an instance of a direct subclass of X.
SELECT DISTINCT ?category ?categoryLabel (COUNT(DISTINCT ?item) AS ?count) WHERE {
  # ?category is a direct subclass (P279) of the thing
  ?category wdt:P279 wd:THING .
  # ?item is an instance (P31) of that subclass
  ?item wdt:P31 ?category .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
GROUP BY ?category ?categoryLabel
ORDER BY DESC(?count)
LIMIT 100
But we want ?category to bind to the thing itself as well as its subclasses. Barber paradox? Who gives a damn?
An easy but hacky way of binding ?category to the thing itself? UNION it together with a sitelink: the Wikipedia article about the thing points at its item via schema:about.
SELECT DISTINCT ?category ?categoryLabel (COUNT(DISTINCT ?item) AS ?count) WHERE {
  # Either a direct subclass of the thing...
  { ?category wdt:P279 wd:THING . }
  UNION
  # ...or the thing itself, found through its English Wikipedia sitelink
  { <https://en.wikipedia.org/wiki/ARTICLE_ABOUT_THING> schema:about ?category . }
  ?item wdt:P31 ?category .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
GROUP BY ?category ?categoryLabel
ORDER BY DESC(?count)
LIMIT 100
Some helpful person might go and change the name of the Wikipedia article about the thing, so these kinds of queries might break. (But they might go edit Wikidata too. C’est la vie.) You could always find another statement where the object of the statement uniquely picks the thing out.
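As a rough illustration of "another statement that picks the thing out", here is a minimal Python sketch that builds a replacement for the sitelink branch of the UNION. The helper name identifying_branch is hypothetical, and it assumes the statement has a plain, single-line string value, as external identifiers usually do. The property P646 (Freebase ID) and the value in the usage note are placeholders only.

```python
import re


def identifying_branch(prop: str, value: str) -> str:
    """Build a UNION branch that binds ?category via one string-valued statement.

    Hypothetical helper: assumes `value` is a single-line string literal.
    """
    if not re.fullmatch(r"P[1-9][0-9]*", prop):
        raise ValueError(f"not a Wikidata property ID: {prop!r}")
    # Escape backslashes and double quotes so the value stays a valid SPARQL string.
    literal = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{{ ?category wdt:{prop} "{literal}" . }}'
```

Calling identifying_branch("P646", "/m/EXAMPLE") returns a branch you can paste in place of the schema:about line. Whether a given identifier really is unique to one item is something you would need to check yourself.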
Alternatively, you could use BIND, VALUES, subqueries, or CONSTRUCT, but I would suggest the sitelink method has significantly lower cognitive overhead.
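For comparison, the VALUES alternative mentioned above might look like the sketch below. It is written as a small Python helper so the THING placeholder is filled in one place. The names QUERY_TEMPLATE and build_query are assumptions made for illustration, not part of any Wikidata tooling, and the query is one possible form rather than the only way to do it.

```python
import re

# SPARQL template; THING is the same placeholder used in the queries above.
QUERY_TEMPLATE = """\
SELECT ?category ?categoryLabel (COUNT(DISTINCT ?item) AS ?count) WHERE {
  { ?category wdt:P279 wd:THING . }    # direct subclasses of the thing
  UNION
  { VALUES ?category { wd:THING } }    # the thing itself
  ?item wdt:P31 ?category .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
GROUP BY ?category ?categoryLabel
ORDER BY DESC(?count)
LIMIT 100
"""


def build_query(qid: str) -> str:
    """Fill the THING placeholder with a Wikidata item ID such as 'Q146'."""
    if not re.fullmatch(r"Q[1-9][0-9]*", qid):
        raise ValueError(f"not a Wikidata item ID: {qid!r}")
    return QUERY_TEMPLATE.replace("THING", qid)
```

For example, print(build_query("Q146")) produces a query for house cat (Q146) that can be pasted into the query service. This variant does not depend on a Wikipedia article title, at the cost of repeating the item ID in both branches.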