

There are about 13.6 billion triples on Wikidata. But what even *is* a triple, you might ask?

The best way to see how stuff on Wikidata is mapped to triples might be to show how many are added or removed when certain things are done:

Additions of triples

How many triples are added...

Scenario Example triples Count added
...when you add a label? wd:Q42 rdfs:label "Douglas Adams"@en. 1
...when you add a description? wd:Q42 schema:description "ব্রিটিশ লেখক"@bn. 1
...when you add an alias? wd:Q42 skos:altLabel "डग्लस अ‍डम्स"@hi. 1
...when you add a statement? wd:Q42 p:P31 wds:Q42-ABCDEF01-ABCD-ABCD-ABCD-ABCDEF012345
wds:Q42-ABCDEF01-ABCD-ABCD-ABCD-ABCDEF012345 wikibase:rank wikibase:NormalRank
2 (minimum)
...and that statement has an item[1] as a value? wds:Q42-ABCDEF01-ABCD-ABCD-ABCD-ABCDEF012345 ps:P31 wd:Q5 (2 +) 1
...and that statement has a string[2] as a value? wds:Q42-ABCDEF01-ABCD-ABCD-ABCD-ABCDEF012345 ps:P1559 "Douglas Noël Adams"@en . (2 +) 1
...and that statement has an external ID as a value? wds:Q42-ABCDEF01-ABCD-ABCD-ABCD-ABCDEF012345 ps:P214 "113230702" (2 +) 1
...and that statement has an external ID as a value
and that external ID has a formatter URL?
wds:Q42-ABCDEF01-ABCD-ABCD-ABCD-ABCDEF012345 psn:P214 "" ((2 +) 1 +) 1
...and that statement has a coordinate as a value? wds:Q42-ABCDEF01-ABCD-ABCD-ABCD-ABCDEF012345 ps:P119 "Point(0.0 50.0)"^^geo:wktLiteral
wds:Q42-ABCDEF01-ABCD-ABCD-ABCD-ABCDEF012345 psv:P119 v:a10564107110b2d5739b8fe235cddf73
v:a10564107110b2d5739b8fe235cddf73 a wikibase:GlobecoordinateValue
v:a10564107110b2d5739b8fe235cddf73 wikibase:geoLatitude "50.0"^^xsd:double
v:a10564107110b2d5739b8fe235cddf73 wikibase:geoLongitude "0.0"^^xsd:double
v:a10564107110b2d5739b8fe235cddf73 wikibase:geoPrecision "0.000277778"^^xsd:double
v:a10564107110b2d5739b8fe235cddf73 wikibase:geoGlobe wd:Q2
(2 +) 7
...and that statement has a quantity as a value? wds:Q42-ABCDEF01-ABCD-ABCD-ABCD-ABCDEF012345 ps:P2048 "+1.96"^^xsd:decimal
wds:Q42-ABCDEF01-ABCD-ABCD-ABCD-ABCDEF012345 psv:P2048 v:a10564107110b2d5739b8fe235cddf73
v:a10564107110b2d5739b8fe235cddf73 a wikibase:QuantityValue
v:a10564107110b2d5739b8fe235cddf73 wikibase:quantityAmount "+1.96"^^xsd:decimal
v:a10564107110b2d5739b8fe235cddf73 wikibase:quantityUnit wd:Q11573
(2 +) 5
...and that statement has a quantity as a value
and that quantity isn't exact?
v:a10564107110b2d5739b8fe235cddf73 wikibase:quantityUpperBound "+1.97"^^xsd:decimal
v:a10564107110b2d5739b8fe235cddf73 wikibase:quantityLowerBound "+1.95"^^xsd:decimal
((2 +) 5 +) 2
...and that statement has a quantity as a value
and that quantity isn't exact
and the units of that quantity
can be expressed in some normalized form?
(e.g. furlongs → meters, stone → kilograms)
v:a10564107110b2d5739b8fe235cddf73 wikibase:quantityNormalized v:85374998f22bda54efb44a5617d76e51
v:85374998f22bda54efb44a5617d76e51 a wikibase:QuantityValue
v:85374998f22bda54efb44a5617d76e51 wikibase:quantityAmount "+1.96"^^xsd:decimal
v:85374998f22bda54efb44a5617d76e51 wikibase:quantityUnit wd:Q11573
v:85374998f22bda54efb44a5617d76e51 wikibase:quantityUpperBound "+1.97"^^xsd:decimal
v:85374998f22bda54efb44a5617d76e51 wikibase:quantityLowerBound "+1.95"^^xsd:decimal
(((2 +) 5 +) 2 +) 4 + 2
...and that statement has a time as a value? wds:Q42-ABCDEF01-ABCD-ABCD-ABCD-ABCDEF012345 ps:P569 "+1952-03-11T00:00:00Z"^^xsd:dateTime
wds:Q42-ABCDEF01-ABCD-ABCD-ABCD-ABCDEF012345 psv:P569 v:a10564107110b2d5739b8fe235cddf73
v:a10564107110b2d5739b8fe235cddf73 a wikibase:TimeValue
v:a10564107110b2d5739b8fe235cddf73 wikibase:timeValue "+1952-03-11T00:00:00Z"^^xsd:dateTime
v:a10564107110b2d5739b8fe235cddf73 wikibase:timePrecision "11"^^xsd:integer
v:a10564107110b2d5739b8fe235cddf73 wikibase:timeTimezone "0"^^xsd:integer
v:a10564107110b2d5739b8fe235cddf73 wikibase:timeCalendarModel wd:Q1985727
(2 +) 7
...and that statement has somevalue (unknown value) as its value? wds:Q42-ABCDEF01-ABCD-ABCD-ABCD-ABCDEF012345 ps:P2021 _:genid1 (2 +) 1
...and that statement has novalue as its value? wd:Q42 a wdno:P6553 (2 +) 1
...and that statement is "truthy"
(is preferred, or is normal in the absence
of preferred statements for that property)?
wd:Q42 wdt:P31 wd:Q5
wds:Q42-ABCDEF01-ABCD-ABCD-ABCD-ABCDEF012345 a wikibase:BestRank
(2 +) 2
...and the "truthy" statement is an external ID with a formatter URL? wd:Q42 wdtn:P214 "" ((((2 +) 1 +) 1 +) 2 +) 1
...when you add a qualifier? wds:Q42-ABCDEF01-ABCD-ABCD-ABCD-ABCDEF012345 pq:P407 wd:Q1860 1
...and that qualifier has X as its value? (see "...and that statement has X as its value?", substituting (2 +) with (1 +))
...when you add a reference? wds:Q3-24bf3704-4c5d-083a-9b59-1881f82b6b37 prov:wasDerivedFrom wdref:87d0dc1c7847f19ac0f19be978015dfb202cf59a 1
...and that reference has a property with X as its value? (see "...and that statement has X as its value?", substituting (2 +) with (1 +))
...when you add a sitelink? <> a schema:Article
<> schema:about wd:Q3
<> schema:inLanguage "en"
<> schema:isPartOf <>
<> schema:name "Douglas Adams"@en
5
...and that sitelink has a badge? <> wikibase:badge wd:Q17437796 (5 +) 1
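
To make the arithmetic in the table a bit easier to follow, here is a minimal Python sketch that adds up the counts for a single new statement. The function and constant names are hypothetical (they are not part of Wikibase), the numbers are simply copied from the table, and the sketch assumes the value is not already stored anywhere else.

```python
# Counts copied from the table above; all names here are hypothetical.
STATEMENT_BASE = 2  # the p: link to the statement node + its wikibase:rank

# Triples added by the value itself, per value type (table rows).
VALUE_TRIPLES = {
    "item": 1,
    "string": 1,
    "external-id": 1,
    "coordinate": 7,
    "quantity": 5,
    "time": 7,
    "somevalue": 1,
    "novalue": 1,
}

TRUTHY_EXTRA = 2  # the wdt: triple + wikibase:BestRank


def triples_for_new_statement(value_type: str,
                              truthy: bool = False,
                              has_formatter_url: bool = False) -> int:
    """Count the triples added by one new statement with a not-yet-stored value."""
    count = STATEMENT_BASE + VALUE_TRIPLES[value_type]
    if has_formatter_url:
        if value_type != "external-id":
            raise ValueError("formatter URLs only apply to external IDs")
        count += 1  # the psn: triple
    if truthy:
        count += TRUTHY_EXTRA
        if has_formatter_url:
            count += 1  # the wdtn: triple
    return count


if __name__ == "__main__":
    print(triples_for_new_statement("item", truthy=True))
    print(triples_for_new_statement("time"))
    print(triples_for_new_statement("external-id", truthy=True,
                                    has_formatter_url=True))
```

The last call corresponds to the "((((2 +) 1 +) 1 +) 2 +) 1" row: statement, external ID, psn:, truthy pair, wdtn:.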

Net-zero triple changes

Some of the triples in the table above were italicized. These are special: the value of the statement (more specifically, the set of predicate-object pairs among the italicized triples) is hashed to generate a unique string (a10564107110b2d5739b8fe235cddf73 in most of the examples above), which is then tied to all of those triples by being set as their subject. Because this string is a hash, any other value of the same type (on a statement, a qualifier, a claim in a reference, or a normalized quantity) whose triple set yields the same hash is not stored again; instead, the existing hash is merely linked to that statement, qualifier, or claim in a reference (via the triples with "psv:" in them). As long as at least one statement, qualifier, claim in a reference, or normalized quantity has that exact value, there is exactly one set of triples for it in the store.

More concretely, consider the date

  • "20 January 2009" stored with day precision using the proleptic Gregorian calendar and time zone "0"

Every use of that exact date configuration, whether as the point in time (P585) of the inauguration of Barack Obama, the start time (P580) qualifier of Barack Obama's position held (P39) President of the United States (Q11696) claim, or the retrieved (P813) date of some reference, will point to the same hash.

That date's hash will be different from that of

  • "19 January 2009" with day precision using the proleptic Gregorian calendar and time zone "0",
  • "20 January 2009" with year precision using the proleptic Gregorian calendar and time zone "0",
  • "20 January 2009" with day precision using the proleptic Julian calendar and time zone "0", and
  • "20 January 2009" with day precision using the proleptic Gregorian calendar and time zone "1"

(bearing in mind that it's currently impossible to change the time zone).
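
The "same configuration, same hash" idea can be sketched in a few lines of Python. This is only an illustration: the `value_id` function is hypothetical, the field encoding is made up, and MD5 is used merely because it yields 32 hex digits like the example IDs above. It is not claimed to be the algorithm Wikibase actually uses.

```python
import hashlib


def value_id(date: str, precision: str, calendar: str, timezone: str) -> str:
    """Hypothetical stand-in for a value hash: same configuration, same ID."""
    # Join the fields with a separator that cannot occur inside any of them.
    canonical = "\x1f".join([date, precision, calendar, timezone])
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


# The date from the text: 20 January 2009, day precision, Gregorian, zone "0".
base = value_id("2009-01-20", "day", "proleptic Gregorian", "0")

# Any other use of that configuration (a P585 statement, a P580 qualifier,
# a P813 date in a reference) hashes to the same ID.
reused = value_id("2009-01-20", "day", "proleptic Gregorian", "0")

# Changing any single field gives a different ID.
variants = [
    value_id("2009-01-19", "day", "proleptic Gregorian", "0"),
    value_id("2009-01-20", "year", "proleptic Gregorian", "0"),
    value_id("2009-01-20", "day", "proleptic Julian", "0"),
    value_id("2009-01-20", "day", "proleptic Gregorian", "1"),
]

if __name__ == "__main__":
    print(base == reused)
    print(all(v != base for v in variants))
```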

Removing the P585 date mentioned above without changing anything else would remove 3 triples (the ones listed for "...when you add a statement?" and the non-italicized triple for "...and that statement has a time as a value?"), and conversely adding the same date as P585 to some other item without changing anything else would add 3 triples. On the other hand, if ("19 January 2009" with day precision using the proleptic Gregorian calendar and time zone "0") was not already stored in Wikidata somewhere, then adding that as a P585 statement would add 2 + 7 = 9 triples.
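
The two cases in the paragraph above can be written out as a tiny Python sketch. The names are hypothetical, and the numbers are taken directly from the text: 2 triples for the statement, 7 for a time value in total, of which the text counts 1 as not shared.

```python
# Numbers taken from the paragraph above; all names are hypothetical.
STATEMENT_TRIPLES = 2      # "...when you add a statement?"
TIME_VALUE_TRIPLES = 7     # "...and that statement has a time as a value?"
TIME_UNSHARED_TRIPLES = 1  # the single non-italicized triple the text counts


def triples_added_for_time_statement(value_already_stored: bool) -> int:
    """Triples added by a new time-valued statement, per the text's accounting."""
    if value_already_stored:
        # The hashed value triples already exist, so only the statement's
        # own triples and the unshared value triple are new.
        return STATEMENT_TRIPLES + TIME_UNSHARED_TRIPLES
    return STATEMENT_TRIPLES + TIME_VALUE_TRIPLES


if __name__ == "__main__":
    print(triples_added_for_time_statement(value_already_stored=True))
    print(triples_added_for_time_statement(value_already_stored=False))
```

Removing such a statement while the value is still used elsewhere would, by the same accounting, remove the same number of triples as the first case adds.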

A similar analysis to the above holds for references, where the triples for all property-value pairs in the reference are hashed to yield the object of the "prov:wasDerivedFrom" triples.

Removals of triples

Notwithstanding the conditions in "Net-zero triple changes" above, removing one of the objects mentioned in "Additions of triples" above will remove that many triples from the store.

There is one special case, however, which has not been dealt with above: when merging an item X into an item Y, a triple of the form "wd:X owl:sameAs wd:Y" is created, in addition to any triple removals due to equivalences or supersessions (which will differ depending on what tool you use to merge things and the settings for that tool).

Implications for future removals

As of ~21:00 UTC, 21 January 2022,


(Thanks to mw:Wikibase/Indexing/RDF_Dump_Format for being around when we need it!)

  1. includes properties, lexemes, forms, and senses
  2. includes monolingual text, Commons file links, other URLs, LaTeX snippets, LilyPond snippets, and some others