Wikidata:Property proposal/in sense

in sense

Originally proposed at Wikidata:Property proposal/Lexemes

   Withdrawn
Description: qualifier for lexeme statements that apply to some senses but not others
Data type: Sense
Domain: lexeme
Example 1: džús (L402822): instance of (P31): mass noun (Q489168) → L402822-S1
Example 2: džús (L402822): instance of (P31): count noun (Q1520033) → L402822-S2
Example 3: bežec (L409973): grammatical gender (P5185) = masculine personal (Q27918551) → L409973-S1
Example 4: ucho (L299083): L299083-F7 → L299083-S1 (after merge with ucho (L249402))
Planned use: 400+ Slovak lexemes with ambiguous massness, 50+ Slovak lexemes with ambiguous gender, potentially millions of uses across all languages
See also: subject sense (P6072), object sense (P5980)

Motivation

The idea for this property originally came up when discussing noun massness. It turns out that such a qualifier would be useful on several kinds of statements:

  1. Some nouns are countable in some senses and uncountable in others. See water in Wiktionary for an example. To represent sense-dependent massness, a Wikidata lexeme would have two instance of (P31) statements, one for mass noun (Q489168) and one for count noun (Q1520033), each of which would have one or more "in sense" qualifiers.
  2. Verbs can be transitive or intransitive depending on sense. This is currently modeled by adding grammatical aspect (P7486): transitive verb (Q1774805)/intransitive verb (Q1166153) statements to the lexeme, which could be disambiguated with "in sense" qualifiers in the same way as massness above.
  3. Some nouns have sense-dependent gender. For example, bežec in Slovak can be a running person, a running animal, or a moving part of a machine. These three senses have the genders masculine personal (Q27918551), masculine animate non-personal (Q52943193), and masculine inanimate (Q52943434) respectively. At the moment, every sense gets its own "homograph" lexeme, but such lexemes are not true homographs, because they share an etymology and their senses are closely related. With an "in sense" qualifier on grammatical gender (P5185), they could be merged into a single lexeme.
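
The first case above can be sketched in a few lines of Python. This is only an illustration of the proposed modeling, not the Wikibase JSON format or any real API: the `Statement` and `Lexeme` classes, the `values_for_sense` helper, and the `"in sense"` qualifier key are all hypothetical names (the proposed property never received a property ID). The rule that an unqualified statement applies to every sense is likewise an assumption of this sketch. The lexeme, sense, property, and item IDs are taken from Examples 1 and 2.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical placeholder key: the proposed qualifier has no property ID.
IN_SENSE = "in sense"


@dataclass
class Statement:
    prop: str    # property ID, e.g. "P31" (instance of)
    value: str   # item ID, e.g. "Q489168" (mass noun)
    # Maps a qualifier key to a list of qualifier values (here: sense IDs).
    qualifiers: Dict[str, List[str]] = field(default_factory=dict)


@dataclass
class Lexeme:
    lexeme_id: str
    lemma: str
    sense_ids: List[str]
    statements: List[Statement] = field(default_factory=list)


def values_for_sense(lexeme: Lexeme, prop: str, sense_id: str) -> List[str]:
    """Return the values of `prop` that apply to the given sense.

    Assumption of this sketch: a statement without an "in sense"
    qualifier applies to every sense of the lexeme.
    """
    result = []
    for statement in lexeme.statements:
        if statement.prop != prop:
            continue
        senses = statement.qualifiers.get(IN_SENSE)
        if not senses or sense_id in senses:
            result.append(statement.value)
    return result


# džús (L402822): mass noun in sense S1, count noun in sense S2.
dzus = Lexeme(
    lexeme_id="L402822",
    lemma="džús",
    sense_ids=["L402822-S1", "L402822-S2"],
    statements=[
        Statement("P31", "Q489168", {IN_SENSE: ["L402822-S1"]}),
        Statement("P31", "Q1520033", {IN_SENSE: ["L402822-S2"]}),
    ],
)

print(values_for_sense(dzus, "P31", "L402822-S1"))
print(values_for_sense(dzus, "P31", "L402822-S2"))
```

Both statements stay on the lexeme, so the lexeme itself remains the instance of mass noun or count noun, and the qualifier only narrows which senses each statement covers.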

The following cases are NOT considered valid applications:

  • Usage examples have their own subject sense (P6072) property.
  • Etymology differences create true homographs. True homographs should not be merged.

Alternatives:

  • Aggressively splitting lexemes works, but it creates false "homographs" and makes the data harder to understand for both users and editors.
  • Statements can be placed on the sense level too, but adding an instance of (P31): mass noun (Q489168) statement to a sense feels wrong, because the sense is not an instance of mass noun; the lexeme is.

Robert Važan (talk) 11:37, 20 May 2021 (UTC)

May 26 update: The property would also be used on sense-specific forms. — Robert Važan (talk) 14:50, 26 May 2021 (UTC)

June 5 update: I am retracting the May 26 change. Use of this property on forms is problematic and requires a separate discussion. — Robert Važan (talk) 09:28, 5 June 2021 (UTC)

Discussion

  •  Support Nice proposal, this definitely solves an issue with our current data model for lexemes. ArthurPSmith (talk) 17:05, 20 May 2021 (UTC)
  •  Comment @Fnielsen, VIGNERON: I am requesting your comment since you participated in the original discussion. Thanks. — Robert Važan (talk) 10:20, 26 May 2021 (UTC)
  •  Comment This seems to be a good proposal. In Danish (Q9035) we have the special lexeme øl (L39743), which has two senses, "a beer" and "some beer", where the grammatical gender differs and where such a property would be welcome. Currently, P31 is used. In this case, we also seem to be missing a connection between forms and senses. So either the form should have a link to the sense, or the sense should have a link to the form. For the first case, the "in sense" proposal could be used if its scope is expanded beyond "just" being a qualifier on the lexeme level. Perhaps the name should be changed to something like "relates to sense"? — Finn Årup Nielsen (fnielsen) (talk) 11:59, 26 May 2021 (UTC)
    @Fnielsen: I have encountered a similar case in Slovak: ucho (L249402) and ucho (L299083) could be merged with "in sense" on the differing forms. So yes, I would be in favor of using the property on forms too, unless a dedicated property would be preferable to allow for tighter constraints on both properties. — Robert Važan (talk) 14:40, 26 May 2021 (UTC)
    @Fnielsen: Naming the property "relates to sense" would encourage its use to express soft statements like "usually in sense". The wording "in sense" implies a hard association, in line with translation (P5972) and synonym (P5973). — Robert Važan (talk) 14:43, 26 May 2021 (UTC)
    @Fnielsen: I have meanwhile found several issues with using "in sense" on forms. Annotation of sense-specific forms requires its own discussion and most likely a separate property. Briefly, "in sense" on forms is often better expressed via additional grammatical features (e.g. gender, massness), "in sense" on related lexeme statements (e.g. singulare/plurale tantum), or a possible future form grouping (via shared stem or otherwise). Having a separate property for forms will also allow tighter property constraints on both properties. — Robert Važan (talk) 09:28, 5 June 2021 (UTC)
  •  Comment I am withdrawing this proposal, because it is now a duplicate of the generalized subject sense (P6072). — Robert Važan (talk) 15:51, 26 June 2021 (UTC)