Wikidata:WikiProject sum of all paintings/Automated image uploads


This page describes how automated image uploads work as part of the sum of all paintings and how you can help.

Background

In the past, uploading to Commons and adding items to Wikidata were two distinct tasks. If you wanted an illustrated collection of paintings here, you had to work on every painting twice: create an item here on Wikidata and upload a file to Commons. Trying to do both at the same time made the process too complicated, so it was split into two parts:

  1. Import all the needed data to Wikidata
  2. When the data is complete enough, upload the image to Commons

To store the needed metadata, the property Commons compatible image available at URL (P4765) was created. The upload to Commons is automated and runs twice a day.

Metadata to add

For the upload bot to function, the following metadata should be added. These statements don't have to be added all at once.

| Property | Status | Description |
| --- | --- | --- |
| Commons compatible image available at URL (P4765) | Mandatory | This statement should contain a deep link to a file that can be uploaded to Commons. It shouldn't contain a link to a page that has an image on it. |
| file format (P2701) (qualifier to P4765) | Mandatory | Link to a valid file format item like JPEG (Q2195). This is used to construct the file name on Commons. |
| URL (P2699) (qualifier to P4765) | Mandatory | Link to the page where the file was found. This is used to source the file on Commons. |
| title (P1476) (qualifier to P4765) | Mandatory | The title of the painting. This is used to construct the file name on Commons. |
| author name string (P2093) (qualifier to P4765) | Mandatory | The name of the painter. This is used to construct the file name on Commons. Only optional in case of an anonymous work. |
| operator (P137) (qualifier to P4765) | Optional | Link to the item of the source website. This is used to source the file on Commons. |
| copyright license (P275) (qualifier to P4765) | Optional | Link to the item of a valid Commons license like Creative Commons CC0 License (Q6938433). This will trigger the usage of the Licensed-PD-Art template. |
| data size (P3575) (qualifier to P4765) | Optional | The size of the file in bytes. Currently not used. |
| creator (P170) | Mandatory | Link to a valid painter like Vincent van Gogh (Q5582). This is used to determine the copyright status of the file to be uploaded to Commons. |
| inception (P571) | Optional | Date when the painting was made. In case the painter is anonymous (Q4233718) or died after 1925, this is needed to determine the copyright status of the file to be uploaded to Commons. |
| instance of (P31) | Mandatory | Currently only instances of painting (Q3305213) are uploaded. Might be expanded later. |
| collection (P195) | Mandatory | Link to the collection the painting is from. This is used to construct the file name on Commons. |
| inventory number (P217) | Mandatory | The inventory number in this collection. This is used to construct the file name on Commons. |
| copyright status (P6216) | Optional | Link to a valid copyright status like public domain (Q19652). Should have applies to jurisdiction (P1001) and determination method (P459) as qualifiers. |

In the future the required statements might change to make the bot more flexible.
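
The mandatory rows can be checked mechanically before waiting for the bot. The following is a minimal sketch, not the bot's own code: the plain dict layout of `item` and the function name `missing_metadata` are assumptions made for illustration, and only the property IDs come from this page.

```python
# Mandatory properties taken from the metadata list on this page.
MANDATORY_STATEMENTS = ["P170", "P31", "P195", "P217"]
MANDATORY_QUALIFIERS = ["P2701", "P2699", "P1476", "P2093"]


def missing_metadata(item, anonymous=False):
    """Return the mandatory property IDs that are absent from `item`.

    `item` is a hypothetical plain dict: top-level keys are statement
    property IDs, and item["P4765"] is a dict with "value" and "qualifiers".
    """
    missing = [p for p in MANDATORY_STATEMENTS if p not in item]
    image = item.get("P4765")
    if image is None:
        # Without P4765 there is no statement to carry the qualifiers.
        return missing + ["P4765"]
    qualifiers = image.get("qualifiers", {})
    for prop in MANDATORY_QUALIFIERS:
        # The author name string may be left out for anonymous works.
        if prop == "P2093" and anonymous:
            continue
        if prop not in qualifiers:
            missing.append(prop)
    return missing
```

An empty result means every mandatory statement and qualifier is present. It does not mean the painting passes the copyright checks.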

Example

Commons compatible image available at URL (P4765): https://ids.si.edu/ids/download?id=HMSG-66.2404.jpg

  file format (P2701): JPEG (Q2195)
  URL (P2699): https://www.si.edu/object/hmsg_66.2404
  author name string (P2093): Childe Hassam
  title (P1476): Isle Of Shoals (English)
  operator (P137): Smithsonian Institution (Q131626)
  copyright license (P275): CC0 (Q6938433)

Quickstatements examples

Example Version 1 Quickstatements for the above. The qualifiers follow the main P4765 statement on the same line, so they are added as qualifiers and not as separate statements on the item:

Q106632929|P4765|"https://ids.si.edu/ids/download?id=HMSG-66.2404.jpg"|P2701|Q2195|P2699|"https://www.si.edu/object/hmsg_66.2404"|P2093|"Childe Hassam"|P1476|en:"Isle Of Shoals"|P275|Q6938433

Example Version 2 Quickstatements, CSV format:

qid,P4765,qal2701,qal2699,qal2093,qal1476,qal275

Q106632929,"https://ids.si.edu/ids/download?id=HMSG-66.2404.jpg",Q2195,"https://www.si.edu/object/hmsg_66.2404","Childe Hassam",en:"Isle Of Shoals",Q6938433
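
When many paintings need the same set of qualifiers, the Version 1 command can be generated instead of typed. The sketch below assembles one pipe-separated command in which the qualifier pairs follow the main P4765 statement. The helper name `quickstatements_v1` is hypothetical, and it assumes the caller passes values already formatted for QuickStatements (quoted strings, bare QIDs, language-prefixed monolingual text).

```python
def quickstatements_v1(qid, image_url, qualifiers):
    """Build one pipe-separated V1 command for P4765 with its qualifiers.

    `qualifiers` is a list of (property, value) pairs whose values are
    already formatted for QuickStatements.
    """
    # Main statement: item, property, quoted URL string.
    parts = [qid, "P4765", f'"{image_url}"']
    # Qualifier pairs are appended to the same command.
    for prop, value in qualifiers:
        parts.extend([prop, value])
    return "|".join(parts)


line = quickstatements_v1(
    "Q106632929",
    "https://ids.si.edu/ids/download?id=HMSG-66.2404.jpg",
    [
        ("P2701", "Q2195"),
        ("P2699", '"https://www.si.edu/object/hmsg_66.2404"'),
        ("P2093", '"Childe Hassam"'),
        ("P1476", 'en:"Isle Of Shoals"'),
        ("P275", "Q6938433"),
    ],
)
print(line)
```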

Uploading

A bot runs twice a day. Each run has two parts:

  1. Find files which are free to upload to Commons
  2. For each file, try to upload it

Find files to upload

The bot uses several SPARQL queries to find files to upload to Commons. These are based on the principles of the Hirtle chart and stay on the conservative side: only files that are certain to be free to upload as PD-Art are uploaded. A painting has to be in the public domain in both the US and the source country. To spread the uploads over time and to prevent abuse, the bot only considers items that haven't been edited for at least three days and for which the creator (P170) hasn't been edited for three days either.
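
The three-day waiting period can be expressed as a simple timestamp comparison. This is a sketch under the assumption that "edited" refers to the last-modified timestamps of the painting item and of the creator item; the function name `past_cooldown` is hypothetical, and the bot itself applies this filter inside its SPARQL queries.

```python
from datetime import datetime, timedelta, timezone

# Waiting period described on this page.
COOLDOWN = timedelta(days=3)


def past_cooldown(item_modified, creator_modified, now=None):
    """True when both timestamps are at least three days in the past."""
    now = now or datetime.now(timezone.utc)
    return (now - item_modified >= COOLDOWN
            and now - creator_modified >= COOLDOWN)
```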

Painter died more than 95 years ago

If the painter died more than 95 years ago, the painting is in the public domain in the US (published more than 95 years ago) and outside the US (painter died more than 95 years ago). Query

Anonymous works before 1890

If the painting is from 1890 or earlier, the painting is in the public domain in the US (more than the 120 years required for unpublished works) and outside of the US (usually 70 years after publication, in any case a lot shorter than in the US). The 10 extra years are just a safety margin. Query

Painter died between 95 and 70 years ago and painting at least 95 years old

If the painter died between 95 and 70 years ago (more than 95 years ago is covered by the earlier case), the painting is in the public domain outside of the US. It has to have been published at least 95 years ago to also be in the public domain inside the US. Query

Painting is marked as public domain with 100 years pma

Someone else already did the copyright assessment and set copyright status (P6216) to public domain (Q19652), qualified with applies to jurisdiction (P1001) -> countries with 100 years pma or shorter (Q60332278) & determination method (P459) -> 100 years or more after author(s) death (Q29940705). This is a subset of the first case and won't be reached often in practice. Query
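
The four cases can be summarised as one decision function. This is an illustrative sketch, not the bot's SPARQL logic: the function and parameter names are hypothetical, it works on whole years only, and the strict comparisons are an assumption chosen to stay on the conservative side.

```python
def is_upload_candidate(year, death_year=None, inception_year=None,
                        anonymous=False, marked_pd_100_pma=False):
    """Apply the four cases described on this page to one painting.

    `year` is the current year; `marked_pd_100_pma` stands for a
    copyright status (P6216) statement with the qualifiers named above.
    """
    if marked_pd_100_pma:
        # Someone already recorded the copyright assessment.
        return True
    if anonymous:
        # Anonymous works: only 1890 or earlier.
        return inception_year is not None and inception_year <= 1890
    if death_year is None:
        # Without a date of death nothing can be concluded.
        return False
    years_dead = year - death_year
    if years_dead > 95:
        return True
    if years_dead > 70:
        # Public domain outside the US; the US needs an old enough work.
        return inception_year is not None and year - inception_year > 95
    return False
```

For example, with `year=2024` a painter who died in 1940 qualifies only if the inception is known and early enough; a missing inception keeps the painting in the queue.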

Upload individual file

For each file, the bot will try to upload it. If something goes wrong, the bot will skip it and go to the next file. The bot will first download the file and check, based on its SHA-1 hash, whether it has already been uploaded. If it's already on Commons, it will just try to add that file to the Wikidata item.
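
The duplicate check can be sketched as follows. The hash is computed with Python's `hashlib`; the lookup URL uses the MediaWiki API's `list=allimages` module with its `aisha1` parameter. How the bot itself performs the lookup is not described here, so treat the lookup route and the function names as assumptions. No network request is made in this sketch.

```python
import hashlib
from urllib.parse import urlencode

COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def sha1_hex(data: bytes) -> str:
    """Hex SHA-1 digest of the downloaded file contents."""
    return hashlib.sha1(data).hexdigest()


def duplicate_lookup_url(data: bytes) -> str:
    """Build an API URL that lists Commons files with the same SHA-1."""
    params = {
        "action": "query",
        "list": "allimages",
        "aisha1": sha1_hex(data),
        "format": "json",
    }
    return f"{COMMONS_API}?{urlencode(params)}"
```

If the response lists any file, the image already exists on Commons and only the Wikidata item needs updating.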

The title is constructed in the format <creatorname> - <title> - <inventory number> - <collectionLabel>.<extension> so if any of this metadata is missing, the file won't be considered for upload.
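
The title format can be sketched as a small function that returns nothing when a component is missing, mirroring the rule that incomplete items are not considered. The function name and the sample values in the usage line are hypothetical.

```python
def build_commons_title(creator_name, title, inventory_number,
                        collection_label, extension):
    """Join the components as '<creator> - <title> - <inv> - <collection>.<ext>'.

    Returns None when any component is missing or empty.
    """
    parts = [creator_name, title, inventory_number, collection_label]
    if not all(parts) or not extension:
        return None
    return " - ".join(parts) + "." + extension


# Placeholder inventory number and collection label, for illustration only.
print(build_commons_title("Childe Hassam", "Isle Of Shoals",
                          "INV-1", "Example Collection", "jpg"))
```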

The wikitext for the upload consists of an empty Artwork template, which will show the data from Wikidata, either the PD-Art template or the Licensed-PD-Art template (depending on whether copyright license (P275) is provided), and categories for the painter and the collection.
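
A rough sketch of how such wikitext could be assembled is shown below. It only illustrates the choice between the two license templates; the template parameters that a real upload needs are deliberately omitted, and the function name, argument names, and category names are hypothetical.

```python
def build_wikitext(painter_category, collection_category, has_license=False):
    """Assemble skeleton wikitext: Artwork, a license template, categories.

    Template parameters are left out on purpose; this is not the exact
    text produced by the bot.
    """
    # Licensed-PD-Art is used when copyright license (P275) is provided.
    license_template = "Licensed-PD-Art" if has_license else "PD-Art"
    lines = [
        "{{Artwork}}",
        "{{" + license_template + "}}",
        f"[[Category:{painter_category}]]",
        f"[[Category:{collection_category}]]",
    ]
    return "\n".join(lines)
```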

This title and wikitext are used to upload the file. When the upload is successful, the image (P18) statement will be added to the Wikidata item and the Commons compatible image available at URL (P4765) statement will be removed. On Commons, source of file (P7482) and digital representation of (P6243) (with a link to the item) get added to fill the Artwork template.

Help add missing metadata

Quite a few files are queued to be uploaded (all queued paintings). Some files will simply be uploaded in the next couple of days, but quite a few are missing information that the bot needs to upload them.

| Report | Description |
| --- | --- |
| Missing creator and missing inception | For these paintings both creator (P170) and inception (P571) are missing. Both should be added, but just adding creator (P170) might be enough to trigger uploading. |
| Known creator missing inception | The creator is known, but either the date of death (P570) is missing or the date of death was more recent than 95 years ago and the inception (P571) needs to be added. |
| Anonymous missing inception | For these paintings the creator is anonymous (Q4233718) so the inception (P571) needs to be added. |
| Missing creator and recent inception | The inception (P571) is after 1850 and creator (P170) is missing. The painter should be added. |
| Creator missing date of death | The creator (P170) is known, but the painter doesn't have date of death (P570). |
| Missing inception | Just a very large report of missing inception (P571). Previous reports are easier. |

The constraint violations report and the complex constraint violations report also contain a lot of items in need of various improvements. Fixing these is generally complex and a lot of work.

Possible improvements