Wikidata talk:WikiProject 20th Century Press Archives

English Wikipedia's WikiProject Newspapers

Although this project is based in English Wikipedia, it contributes content to Wikidata and shares many goals with the project here.

Blue Rasberry (talk) 17:07, 2 July 2019 (UTC)

Persons archive completed: data donation of ZBW to Wikidata

Jneubert (talk) 08:13, 7 March 2019 (UTC) Maxwan (talk) 15:00, 7 March 2019 (UTC) YULdigitalpreservation (talk) 13:12, 8 March 2019 (UTC) Mfchris84 (talk) 14:59, 28 May 2020 (UTC)

Notified participants of WikiProject 20th Century Press Archives

As a nice present for Wikidata's 7th birthday, all of the 5266 persons from the 20th Century Press Archives have been linked to Wikidata (with the help of more than 60 Mix-n-match users). As far as possible, metadata from PM20 has been added to the linked Wikidata items, which resulted in more than 6,000 new statements. A joint press release by ZBW - Leibniz Information Centre for Economics (Q317179) and Wikimedia Deutschland (Q8288) announces the donation, and an article in ZBW Labs gives details about the process. --Jneubert (talk) 16:58, 29 October 2019 (UTC)

Integration of the country/subject archives (Länder/Sacharchiv) into Wikidata

Jneubert (talk) 08:13, 7 March 2019 (UTC) Maxwan (talk) 15:00, 7 March 2019 (UTC) YULdigitalpreservation (talk) 13:12, 8 March 2019 (UTC) Mfchris84 (talk) 14:59, 28 May 2020 (UTC)

Notified participants of WikiProject 20th Century Press Archives

Detail of the old category system

The large country/subject archives (> 280,000 articles in 9000 digital folders) are the next part of ZBW's data donation, to be integrated into Wikidata through their metadata. Their structure is much more challenging than that of the persons archives, because each folder is defined by a "country" and a subject category. The subject categories of PM20 form a classification system of their own. Crafted as containers for collecting a wide variety of clippings, they have no "real world entities" as counterparts, and thus - exceptions aside - no Wikidata items.

For the draft in the data structure section of this project, a few illustrative items have been created - e.g. Germany : Individual diseases and their control (Q91257808). Points to note:

  • A few external-id properties are needed to implement the scheme (described here). For illustration and exploration in advance, these properties are temporarily mocked up in the example items with existing properties, combined with a URL qualifier.
  • location (P276) is used to connect the folder to the "country" category (not using country (P17), because it may also represent a continent, a city, such as "Hamburg", or a geographic region, such as "Northern Europe")
  • facet of (P1269) is used to connect the folder to the subject category.
  • For the subject category system of PM20, new items have to be created, such as Individual diseases and their control (Q92707235) (signature e1). The hierarchy in the category system is implemented via part of (P361)/has part(s) (P527).
  • Several million documents of the country/subject archives have not been published due to intellectual property rights issues. The draft also tries to provide structures for integrating these parts of the archives later on.

The details of the proposal are given on the data structures page. It has been created with lots of input from User:maxwan - shortcomings and errors are mine.
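
As a rough illustration of the points above, the following Python sketch lists the statements a folder item would carry under the draft. It is not project code: the class and method names are made up for this example, Q183 is assumed to be the Wikidata item for Germany, and the external-id properties mentioned in the first point are left out.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Property IDs as named in the draft above.
LOCATION = "P276"   # folder -> "country" category (country, city, region, ...)
FACET_OF = "P1269"  # folder -> PM20 subject category item


@dataclass(frozen=True)
class Folder:
    """Hypothetical in-memory stand-in for a country/subject folder item."""
    geo_qid: str
    geo_label: str
    subject_qid: str
    subject_label: str

    def label(self) -> str:
        # Folder labels follow the "<geo> : <subject>" pattern of the examples.
        return f"{self.geo_label} : {self.subject_label}"

    def statements(self) -> List[Tuple[str, str]]:
        # (property, value) pairs the draft attaches to the folder item.
        return [(LOCATION, self.geo_qid), (FACET_OF, self.subject_qid)]


# Q183 (Germany) is an assumption; Q92707235 is the subject item named above.
folder = Folder("Q183", "Germany", "Q92707235",
                "Individual diseases and their control")
print(folder.label())
for prop, value in folder.statements():
    print(prop, value)
```

Keeping the geographic side as a plain item ID reflects the choice of location (P276) over country (P17): the same field can hold a continent, a city or a region.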

Feedback on the model as a whole as well as on its details is highly welcome. Please use Wikidata talk:WikiProject 20th Century Press Archives/Data structure for discussion. Jneubert (talk) 13:38, 30 May 2020 (UTC)

Properties PM20 geo code (P8483) and PM20 subject code (P8484) created (formerly: Property proposals open to review)

The property proposals for PM20 geo code and PM20 subject code are now online. Jneubert (talk) 06:19, 14 July 2020 (UTC)

The properties have been created; the example items, e.g. Germany : Individual diseases and their control (Q91257808), have been updated accordingly.
A first approach to the new geo category pages (Ländersystematik) is available (in German - a rudimentary English version will follow soon). --Jneubert (talk) 06:30, 25 July 2020 (UTC)
Jneubert (talk) 08:13, 7 March 2019 (UTC) Maxwan (talk) 15:00, 7 March 2019 (UTC) YULdigitalpreservation (talk) 13:12, 8 March 2019 (UTC) Mfchris84 (talk) 14:59, 28 May 2020 (UTC)
Notified participants of WikiProject 20th Century Press Archives

Country/subject archives completed: data donation of ZBW, part 2

Jneubert (talk) 08:13, 7 March 2019 (UTC) Maxwan (talk) 15:00, 7 March 2019 (UTC) YULdigitalpreservation (talk) 13:12, 8 March 2019 (UTC) Mfchris84 (talk) 14:59, 28 May 2020 (UTC)
Notified participants of WikiProject 20th Century Press Archives

During the last year, all 9004 folders of the country/subject archive (Länder-/Sach-Archiv) have been integrated into Wikidata and are now represented as WD items. This map, created from a WDQS query, shows the geographic distribution and allows access to all folders (example: Turkey : Railways (Q99716474)) for a location: 150 geographical items have been linked to overview pages for the corresponding PM20 category (with links to all folders and all articles, e.g., for Japan). The subject category system is also represented in Wikidata, with 1452 hierarchically organized PM20 subject category items.

More about data modeling issues and general background can be found in the ZBW Labs blog.

See also more queries on the Use Cases project page.
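
A query for all folders of one location could look roughly like the sketch below. This is not the query behind the map. It assumes that folder items carry a PM20 folder ID (P4293) and are connected to the location via location (P276) and to the subject category via facet of (P1269), as in the earlier draft; the data structure page remains authoritative. The function name is hypothetical, and Q17 is assumed to be the item for Japan. The generated text can be pasted into the Wikidata Query Service.

```python
import re


def folders_for_location(geo_qid: str) -> str:
    """Build a SPARQL query listing PM20 folders for one location item."""
    # Accept only plain item IDs, so nothing else ends up in the query text.
    if not re.fullmatch(r"Q[1-9][0-9]*", geo_qid):
        raise ValueError(f"not a Wikidata item ID: {geo_qid!r}")
    return f"""\
SELECT ?folder ?folderLabel ?subject ?subjectLabel WHERE {{
  ?folder wdt:P4293 ?pm20Id ;   # PM20 folder ID (assumed on every folder)
          wdt:P276 wd:{geo_qid} ;   # location: the "country" category
          wdt:P1269 ?subject .  # facet of: PM20 subject category
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
ORDER BY ?subjectLabel
"""


# Q17 is assumed to be Japan.
print(folders_for_location("Q17"))
```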

--Jneubert (talk) 19:31, 9 February 2021 (UTC)

Integrating PM20 company/organization folders into Wikidata

Jneubert (talk) 08:13, 7 March 2019 (UTC) Maxwan (talk) 15:00, 7 March 2019 (UTC) YULdigitalpreservation (talk) 13:12, 8 March 2019 (UTC) Mfchris84 (talk) 14:59, 28 May 2020 (UTC)

Notified participants of WikiProject 20th Century Press Archives

The companies and organizations archive of PM20 comprises more than 8,300 folders with digitized clippings and annual reports, along with the corresponding metadata (example). As the third part of the ZBW - Leibniz Information Centre for Economics (Q317179) data donation to Wikidata, started in April 2021, these companies are systematically linked to existing Wikidata items, or created as new items from the PM20 metadata (example: Steel Brothers & Company (Q106809286)).

To organize the process, the institutions have been split across several current and future Mix-n-match catalogs, according to the Wikipedia language edition which is primarily used for matching.

Current status of the PM20 companies data donation (number of links and new items, per catalog/wiki language)

After Dutch as a pilot, the English segment is currently under way.

Segments matched against the French and German Wikipedia will follow. Details of the process are outlined here as part of the Wikidata:WikiProject 20th Century Press Archives.

--Jneubert (talk) 05:49, 18 May 2021 (UTC)

Company/organizations archives: Part three of ZBW data donation to Wikidata completed

Jneubert (talk) 08:13, 7 March 2019 (UTC) Maxwan (talk) 15:00, 7 March 2019 (UTC) YULdigitalpreservation (talk) 13:12, 8 March 2019 (UTC) Mfchris84 (talk) 14:59, 28 May 2020 (UTC)
Notified participants of WikiProject 20th Century Press Archives, Notified participants of WikiProject Companies

8982 Wikidata items have been linked to all folders with scanned documents of ZBW's Companies Archives. In the process, 3897 items have been supplemented with additional metadata, and 5085 items were created from scratch. Almost all of these items - pre-existing and new ones - now have information about type, industry, headquarters location and country. Inception and dissolution dates have been added where available. In total, 5052 organization items got GND IDs sourced from PM20. For 885 pre-existing items which had no "instance of" statement (as is often the case for items created automatically from new Wikipedia pages), a corresponding type was added.

Additional information about connections between companies/organizations and connections to people has been added from converted PM20 metadata:

1936 parent organization (P749) / subsidiary (P355)
2014 follows (P155) / followed by (P156)
222 founded by (P112)
2019 board member (P3320)
215 supervisory board member (P5052)

Please see also a map of headquarters locations and the item count per country.

--Jneubert (talk) 19:28, 5 August 2021 (UTC)

Example item: Steel Brothers & Company (Q106809286) --Jneubert (talk) 07:01, 6 August 2021 (UTC)
ZBW Labs blog entry "Integrating the PM20 companies archive: part 3 of the data donation to Wikidata", with more details about the matching process and the mapping of industry assignments. --Jneubert (talk) 09:12, 14 December 2021 (UTC)

Property proposal PM20 ware ID for review

Jneubert (talk) 08:13, 7 March 2019 (UTC) Maxwan (talk) 15:00, 7 March 2019 (UTC) YULdigitalpreservation (talk) 13:12, 8 March 2019 (UTC) Mfchris84 (talk) 14:59, 28 May 2020 (UTC)

Notified participants of WikiProject 20th Century Press Archives

In order to link the last part of PM20, the commodities/wares archive, the proposal for a corresponding property is now online for review. --Jneubert (talk) 04:55, 11 May 2022 (UTC)

Is the PM20 subject category system a private classification?

No private classifications please

Talk:Q106589826 has mentioned many arguments against private ontologies inside Wikidata. Please don't create redundant items for your private classification.

Thank you very much. Coward at heart (talk) 20:22, 11 November 2022 (UTC)

Hi @Coward_at_heart: Thank you, you are touching on the topic of an important and lively prior discussion in the project chat (archived), which I unfortunately missed due to holidays. The original poster of the chat topic summarized, "it may be useful in some limited cases to keep the original nomenclature and classification scheme, but mostly it's just laziness on the importer's part, or misunderstanding of the import process".
I want to argue here that the subject category system of the 20th Century Press Archives is such a special case. It is structurally different from the geographical category system and the commodity/ware category system of the same archives, for which I happily decided to use existing items (just as recommended in general). The subject category system is rather special in that most categories have no equivalent in real world entities: categories were created, in the first half of the 20th century, according to the expected article volume; they often conflate subjects or are too arbitrary for regular Wikidata items (e.g., "The country and its people, politics and economy, general", "Postal services, telegraphy and telephony", "Historical events 1900-1914", or "State borders with individual countries"). Many of them make sense only in combination with a geographical category - which is no coincidence: the folders in the Countries-Subjects Archive, for which the category system was created, were actually defined by a combination of a country and a subject facet.
Due to the historical structure of the whole thing and most of its categories, to me it made most sense to create a uniform set of separate items for all subject categories. This also makes it possible to keep the original hierarchy (which would not suit Wikidata), and leaves it open for possible future extensions.
But why do we want to have this in Wikidata at all? If it were simply a "private classification", as you supposed, we wouldn't. But the subject category system does not just stand for itself: it is a structural backbone for the roughly 9,200 country/subject folders of the 20th Century Press Archives represented in Wikidata, where each folder is defined by a geographical entity (regular Wikidata item) and a subject (PM20 subject category item) (e.g., Japan : The country and its people, politics and economy, general (Q99718812)). These folders are part of a larger data donation of ZBW to Wikidata (press release Wikimedia Germany and ZBW). The folder items link to scanned press clippings and other documents, in total about 200,000 documents. Together with more folders about persons, companies and commodities/wares they make all published material of the press archives available to the Wikimedia projects and to the general public. (More background on countries/subjects, persons and companies).
That is why I plead for keeping this subject category system as a whole, even where it overlaps in some places (like "horticulture") with existing Wikidata items for a subject. Cheers, Jneubert (talk) 13:10, 16 November 2022 (UTC)
Notification to the participants of the prior project chat discussion (sorry for not answering there in time!): @Vojtěch_Dostál, Vicarage, ArthurPSmith, Jheald, Tagishsimon, Germartin1: @Jean-Frédéric, PKM, Silver_hr:
Hi @Coward_at_heart: Can you follow my reasoning for keeping the items separate? I'm going to un-merge them again. Cheers, Jneubert (talk) 07:16, 2 January 2023 (UTC)
Un-merged a handful of PM20 subject category items from the corresponding "real-world" items. Jneubert (talk) 14:35, 12 January 2023 (UTC)

Commodities/wares archives: Part four of ZBW data donation to Wikidata completed

Jneubert (talk) 08:13, 7 March 2019 (UTC) Maxwan (talk) 15:00, 7 March 2019 (UTC) YULdigitalpreservation (talk) 13:12, 8 March 2019 (UTC) Mfchris84 (talk) 14:59, 28 May 2020 (UTC)

Notified participants of WikiProject 20th Century Press Archives

During the last year, all 2891 folders of the commodities/wares archive that are available online, the fourth and last part of the 20th Century Press Archives, have been integrated into Wikidata, and corresponding items have been created. Each item links to the content of the folder (via PM20 folder ID (P4293)), to the - normally pre-existing - item for the ware or commodity (via main subject (P921)) and to the geographical scope of the folder (via facet of (P1269)).

For an example, see Coal : United States (Q113428466); for a graphical overview, the result of this query gives a colourful picture. Unfortunately, only a fraction of the wares covered in the archive are on the web. For intellectual property reasons, the archives as a whole, which were digitized up to 1960, are accessible only on the premises of ZBW - Leibniz Information Centre for Economics (Q317179).

A more detailed description of the archives and of the data modeling and integration into Wikidata can be found on ZBW Labs. -- Jneubert (talk) 15:28, 12 January 2023 (UTC)

Fulltext search function for PM20 folders

A tool for searching folders has been added: https://pm20-search.toolforge.org. It is based on text from the Wikidata items. It is particularly useful for persons and companies, where lots of synonyms (in different languages) exist - added in part by this project. Try e.g. "abdul hamid II" or "berliner schloßbräu". The search function has been integrated into the pages of https://pm20.zbw.eu.
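
Why synonyms matter for such a search can be shown with a small in-memory sketch. It is not the implementation of the tool: the record layout, the folder IDs, the alias strings and the function names are all invented for this illustration. Each folder is indexed under its label and its aliases, and a query matches if all of its words occur in one of these names, ignoring case.

```python
from typing import Dict, List


def normalize(text: str) -> List[str]:
    # casefold() lowercases and maps the German "ß" to "ss".
    return text.casefold().split()


def search(index: Dict[str, List[str]], query: str) -> List[str]:
    """Return folder IDs where a single name contains every query word."""
    words = normalize(query)
    hits = []
    for folder_id, names in index.items():
        if any(all(w in normalize(name) for w in words) for name in names):
            hits.append(folder_id)
    return hits


# Hypothetical records: made-up folder IDs, each with a label plus aliases.
index = {
    "folder-1": ["Abdülhamid II", "Abdul Hamid II", "Abd ul-Hamid II"],
    "folder-2": ["Berliner Schloßbräu", "Berliner Schlossbraeu"],
}

print(search(index, "abdul hamid II"))       # found only via the second alias
print(search(index, "berliner schloßbräu"))
```

With the label alone, the first query would find nothing; it only matches because the spelling "Abdul Hamid II" is stored as an alias.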

The query (with credits to the original authors) is here. Jneubert (talk) 06:11, 4 December 2023 (UTC)