Wikidata talk:Requests for comment/Use of dates in the descriptions of items regarding humans

Current state of description

Hi,

@Epìdosis, Emu, ArthurPSmith, Pigsonthewing, JAn Dudík, PKM: do we have any idea of the current state of the use of dates in descriptions? Also, on the data side, do we have any idea of the number of homonyms, and of homonyms with the same occupation and the same citizenship? And of other weird cases? (I stumbled upon Henri Tauzin (Q713804) and Henri Tauzin (Q104033248): same name, same citizenship, same birth year, same death year, different occupations.) I tried to write SPARQL queries, but I guess we need a different tool (SQL on Quarry?).

Cheers, VIGNERON (talk) 18:24, 3 June 2022 (UTC)

Well, I just did a "random sample": I looked for people named "Samuel Edwards" with the search "Samuel Edwards haswbstatement:P31=Q5", and there are 14 in Wikidata right now with something approximating that name (at least, that show up with that search). Of those 14, 13 had descriptions, and of those 13, 9 had some sort of date in their description. So about 2/3 of the descriptions of people with homonymous names might contain dates? But I think that would depend a lot on the cultural background of the name; I don't think it's anywhere near that level for homonymous Chinese names, for instance. ArthurPSmith (talk) 19:19, 3 June 2022 (UTC)
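For anyone wanting to repeat this kind of spot check on a larger sample, here is a minimal Python sketch of the counting step. It is only an illustration: the regular expression treats any standalone 3- or 4-digit number as a "date", which is an assumption, and the sample descriptions are made up rather than taken from the actual "Samuel Edwards" items.

```python
import re

# Assumption: any standalone 3- or 4-digit number counts as a "date".
# This will miss dates written in words and may match unrelated numbers.
YEAR_RE = re.compile(r"\b\d{3,4}\b")


def has_date(description):
    """Return True if the description contains something that looks like a year."""
    return bool(YEAR_RE.search(description))


def share_with_dates(descriptions):
    """Share of non-empty descriptions that contain a year-like number."""
    described = [d for d in descriptions if d]  # skip items without a description
    if not described:
        return 0.0
    return sum(has_date(d) for d in described) / len(described)


# Hypothetical sample, for illustration only (None = item without a description).
sample = [
    "British politician (1705-1738)",
    "American judge",
    "Welsh footballer, born 1872",
    None,
]
share = share_with_dates(sample)
```

With the figures reported above (9 of 13 descriptions), the same ratio is 9/13, or roughly 69%.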
@Epìdosis, Emu, ArthurPSmith, Pigsonthewing, JAn Dudík, PKM: Answering myself: there seem to be *a lot* of homonyms. This SPARQL query (on QLever): https://qlever.cs.uni-freiburg.de/wikidata/05MBmw gives more than 1 million shared names, some of them shared by a lot of people: more than 3,364 people are called Wang Shi (and the infamous "John Smith" is shared by "only" 325 people).
Also, a recent discussion (with @Cinéma-1930:) made me realise that in most languages dates are still unusual (and presumably limited to homonyms and special cases). In French, for instance, only around 1-2% of the items about people have a date in the description, but the figure is around 30% for English! (See this very crude query, which probably needs improving: https://qlever.cs.uni-freiburg.de/wikidata/oD0ZC6)
Should we try to move on with this RFC? I'm still unsure what the preferred solution should be, but the more time passes, the more this becomes « a loss of time which could be used in a better way » (to quote Epìdosis), so any clear answer would be better than nothing, I guess. At the very least, could we agree on some basic trade-off (something like "dates could be XXX, there is no consensus, proceed with care") that could be added to Help:Description?
Cheers, VIGNERON (talk) 15:52, 21 September 2023 (UTC)
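As a rough illustration of what "counting homonyms" means here, the following Python sketch groups items by label and keeps the labels used by more than one item. It is not the QLever query linked above: it assumes the (item, label) pairs have already been exported by some other means, it compares labels by exact string match only, and the third entry is a placeholder rather than a real item.

```python
from collections import Counter


def homonym_counts(people):
    """Map each label shared by more than one item to the number of items using it.

    `people` is an iterable of (qid, label) pairs. Labels are compared as exact
    strings; no normalisation of case or diacritics is attempted.
    """
    counts = Counter(label for _qid, label in people)
    return {label: n for label, n in counts.items() if n > 1}


people = [
    ("Q713804", "Henri Tauzin"),
    ("Q104033248", "Henri Tauzin"),
    ("Q0000001", "Example Person"),  # hypothetical placeholder, not a real item
]
shared = homonym_counts(people)
```

On this toy input, only "Henri Tauzin" is reported, with a count of 2. Adding occupation or citizenship to the grouping key would be the natural extension for the "same occupation and same citizenship" question.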
Since the "multilingual descriptions on Wikidata based on the items' statements" which I mentioned in the RfC are still very far in the future (the basic use of P31 in easy cases like disambiguation pages isn't there yet, phab:T303677), I very much agree with the point made by @VIGNERON:. As I stated when I opened the RfC, some prudent rule is better than no rule, because having a rule could both reduce the significant inconsistency of our items a bit and avoid some unpleasant edit wars over adding or removing dates. But of course we need a minimum of consensus, e.g. on the fact that dates are fine for homonyms and cannot be removed in these cases, or something similar. BTW, investigating homonyms wasn't possible through our Query Service; being able to use QLever's service for this opens some interesting possibilities for the future, in view of data cleaning etc. --Epìdosis 16:02, 21 September 2023 (UTC)
Nice use of QLever @VIGNERON:! @Epìdosis: I think this RFC could be closed based on the responses we have. On the questions listed, here is how I would assess consensus: (1) No hard rule across all of Wikidata. (2) If a particular language community wants to set a rule for their language, nobody seems to see a problem with that. (3) Batch editing to add or remove dates should be avoided; dates (not necessarily birth/death, they can be approximate) can be added when they help to disambiguate items; dates should not be removed from descriptions even if they are not needed for disambiguation (unless there is some other reason for the change, for example if the dates are incorrect). (4) How (optional) dates are shown in descriptions will depend on the language; for English it seems reasonable to follow the enwiki guidelines: dates at the end, formatted with parentheses for birth-death years, and using words like "from start to finish" for dates referencing an office or another date or range of dates. ArthurPSmith (talk) 16:16, 21 September 2023 (UTC)
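To make point (4) concrete for English, here is a small Python sketch of the two patterns mentioned: life dates in parentheses at the end, and "from ... to ..." wording for a term of office. The function names are hypothetical, and the "born"/"died" wording and the en dash between years are my reading of the enwiki short-description style, so treat them as assumptions rather than as an agreed Wikidata rule.

```python
def with_life_dates(description, birth=None, death=None):
    """Append birth/death years in parentheses at the end of a description.

    Assumption: an en dash separates the years, and "born"/"died" is used
    when only one of the two years is known.
    """
    if birth is None and death is None:
        return description  # dates are optional; leave the description unchanged
    if death is None:
        return f"{description} (born {birth})"
    if birth is None:
        return f"{description} (died {death})"
    return f"{description} ({birth}\u2013{death})"


def with_office_dates(description, start, end):
    """Append a term of office using 'from <start> to <end>' wording."""
    return f"{description} from {start} to {end}"


# Illustrative values only, not taken from any real item.
example_life = with_life_dates("French politician", 1850, 1920)
example_office = with_office_dates("Mayor of Exampleville", 1909, 1913)
```

Per point (3), something like this would only be meant for individual edits where a date helps to disambiguate, not for batch runs.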
I agree with @ArthurPSmith's proposals above, and I will add that if a language community establishes consensus for guidelines, then they should clearly document those guidelines in a standardized place, so that multilingual editors and all others can easily access and understand them. Elizium23 (talk) 16:57, 21 September 2023 (UTC)
Seems to be a fair assessment of current consensus. --Emu (talk) 19:14, 21 September 2023 (UTC)
This seems like a good way forward. Thanks for the efforts to summarize. PKM (talk) 01:04, 24 September 2023 (UTC)