Dataset

WeStAc will gather extensive textual resources covering 1945 to 1989 in three corpora: modern Swedish governmental publications and political propositions ("Politics"), newspaper content ("Media"), and literature and fiction ("Culture"). Together these datasets capture the textual output of Swedish politics, media and culture, and each is in turn made up of several sources. Importantly, the materials in two of WeStAc's major datasets, "Politics" and "Media", have already been digitised by the National Library and the Swedish Parliament. The "Culture" dataset, however, has yet to be digitised and curated.
The first dataset, "Politics", consists of SOU reports (Statens offentliga utredningar, the Swedish Government Official Reports), containing some 300 million tokens, and recently digitised collections from the Swedish Parliament (political propositions, proposals, debates, resolutions and bills). Most Swedish parliamentary documents have been digitised and can be downloaded from data.kb.se (1945–1970) and data.riksdagen.se (1971–1989). Political proposals and parliamentary motions contain some 350 million tokens, political speeches and debates 200 million tokens, and parliamentary committee proposals 150 million tokens. Research within WP3 will use these corpora both separately and merged. In all, the "Politics" dataset contains an estimated one billion tokens.
WeStAc's second dataset, "Media", is made up of two digitised Swedish newspapers from 1945 to 1989, Aftonbladet and Dagens Nyheter. Together they amount to approximately 1.4 million pages. The number of words (or tokens) per page, however, differs substantially between them. Dagens Nyheter is a considerably longer paper than Aftonbladet: a rough estimate suggests that Aftonbladet contains around 1,000 words per page and Dagens Nyheter about twice as many. Aftonbladet would thus yield some 500 million tokens and Dagens Nyheter some 1.5 billion, two billion tokens in all.
WeStAc's third dataset, "Culture", will be digitised and compiled from scratch. It will contain (A) all Swedish novels published between 1945 and 1989, some 22,000 titles amounting to approximately three million pages, or 750 million tokens; and (B) Bonniers Litterära Magasin (BLM), Sweden's most emblematic cultural and literary journal during the welfare-state decades. BLM was published between 1932 and 2004 and encompasses approximately 28,000 pages, containing some eight million tokens.
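
As a rough cross-check, the per-source estimates stated in this section can be tallied per corpus. The minimal Python sketch below only sums the figures given above (in millions of tokens); the source labels are shorthand introduced here for illustration, and the numbers are the project's rough estimates rather than measured counts.

```python
# Token estimates in millions, copied from the dataset description.
# These are rough project estimates, not measured token counts.
ESTIMATES_M = {
    "Politics": {
        "SOU reports": 300,
        "Proposals and motions": 350,
        "Speeches and debates": 200,
        "Committee proposals": 150,
    },
    "Media": {
        "Aftonbladet": 500,      # ~1,000 words per page
        "Dagens Nyheter": 1500,  # ~2,000 words per page
    },
    "Culture": {
        "Novels 1945-1989": 750,
        "BLM": 8,
    },
}


def corpus_totals(estimates):
    """Sum the per-source estimates for each corpus (millions of tokens)."""
    return {name: sum(sources.values()) for name, sources in estimates.items()}


totals = corpus_totals(ESTIMATES_M)
grand_total = sum(totals.values())

for name, total in totals.items():
    print(f"{name}: {total:,} million tokens")
print(f"All corpora: {grand_total:,} million tokens")
```

Run as is, it reports 1,000 million tokens for "Politics", 2,000 million for "Media" and 758 million for "Culture", or 3,758 million tokens across the three corpora.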

For more information on datasets provided by the National Library of Sweden, see https://data.kb.se/