Last updated: 5 April 2022
Terms of use: Open use. Must provide the source.



The ZIP file contains all data and code needed to replicate the analyses reported in the following paper:

Reber, U., Fischer, M., Ingold, K., Kienast, F., Hersperger, A. M., Grütter, R., & Benz, R. (2022). Integrating biodiversity: A longitudinal and cross-sectoral analysis of Swiss politics. Policy Sciences. https://doi.org/10.1007/s11077-022-09456-4

If you use any of the material in this repository, please cite the paper. If you use the text corpus or parts of it, please also cite the sources used to compile it (listed below). The content of the texts may not be changed.

Data folder

The data folder contains the following files.

  • corpus.parquet: Text corpus of Swiss policy documents
  • _dictde.csv: Biodiversity dictionary (German)
  • _dictfr.csv: Biodiversity dictionary (French)
  • _dictit.csv: Biodiversity dictionary (Italian)
  • _topiclabels.csv: Labels/codes for policy sectors
  • topics.csv: Labels/codes for policy sectors

The corpus and the dictionaries were compiled by the authors specifically for this project. The labels/codes for policy sectors are based on the coding scheme of the Swiss Parliament.

Text corpus

The text corpus consists of 439,984 Swiss policy documents in German, French, and Italian from 1999 to 2018. It was compiled from the following sources between 2020-10-01 and 2021-01-31.

  • Parliamentary transcripts and parliamentary business (e.g., questions, motions, parliamentary initiatives) via the Web Services (WS) provided by the Swiss Parliament
  • The official compilation of federal legislation ("Amtliche Sammlung", AS) via opendata.swiss provided by the Swiss Federal Archives (SFA)
  • The federal gazette ("Bundesblatt") via fedlex.admin.ch
  • Decisions of federal courts via entscheidsuche.ch (ES)

The corpus is stored as a single data frame for use with R, saved as a Parquet file (corpus.parquet). The data frame has the following columns.

  • text_id: Unique identifier for each text (source information as prefix, e.g. "t_")
  • doc_type: Document type (see coding scheme below)
  • branch: Government branch (1 legislative, 2 executive, 3 judiciary)
  • stage: Stage of policy process (1 drafting, 2 introduction, 3 interpretation)
  • year: Year of publication
  • topic: Policy sector (coding scheme in a separate file in the data folder)
  • lang: Language (de, fr, it)
  • text: Text
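The numeric branch and stage codes, and the source prefix of text_id, can be decoded into readable labels. The following is a minimal Python sketch, not part of the replication code. It assumes each row is available as a plain dict (for example, after reading corpus.parquet with pandas.read_parquet and converting rows with DataFrame.to_dict("records")). It also assumes that the source prefix of text_id ends at the first underscore, as in the "t_" example. The helper decode_row and the example row are hypothetical.

```python
# Hypothetical helper for decoding coded columns of corpus.parquet.
# Labels follow the variable descriptions in this README.
BRANCH = {1: "legislative", 2: "executive", 3: "judiciary"}
STAGE = {1: "drafting", 2: "introduction", 3: "interpretation"}


def decode_row(row: dict) -> dict:
    """Return a copy of `row` with human-readable labels added."""
    decoded = dict(row)
    decoded["branch_label"] = BRANCH[row["branch"]]
    decoded["stage_label"] = STAGE[row["stage"]]
    # Assumption: the source prefix ends at the first underscore (e.g. "t_");
    # IDs without an underscore get an empty prefix.
    prefix, sep, _ = row["text_id"].partition("_")
    decoded["source_prefix"] = prefix + sep if sep else ""
    return decoded


# Hypothetical example row (not taken from the corpus).
example = {"text_id": "t_000001", "branch": 1, "stage": 2, "lang": "de"}
decoded = decode_row(example)
print(decoded["branch_label"], decoded["stage_label"], decoded["source_prefix"])
```

The original row is copied rather than modified, so the coded values stay available alongside the labels.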

The following list contains the coding scheme for the doc_type variable.

  • 101: Federal gazette // Draft for public consultation ("Vernehmlassungsverfahren")
  • 102: Federal gazette // Explanation of draft for parliament ("Botschaft")
  • 103: Federal gazette // Strategy, action plan
  • 104: Federal gazette // Federal council decree ("Bundesratsbeschluss")
  • 105: Federal gazette // (Simple) Federal decree ("(Einfacher) Bundesbeschluss")
  • 106: Federal gazette // General decree ("Allgemeinverfügung")
  • 107: Federal gazette // Treaty ("Übereinkommen")
  • 108: Federal gazette // Treaty ("Abkommen")
  • 109: Federal gazette // Draft for parliament ("Entwurf")
  • 110: Federal gazette // Report ("Bericht")
  • 111: Federal gazette // Report of parliamentary commission ("Bericht")
  • 112: Federal gazette // Report of federal council ("Bericht")
  • 201: Parl. businesses // Submitted text
  • 202: Parl. businesses // Reason text
  • 203: Parl. businesses // Federal council response
  • 204: Parl. businesses // Initial situation
  • 205: Parl. businesses // Proceedings
  • 301: Parl. transcripts // Speech of MP
  • 302: Parl. transcripts // Speech of federal council
  • 401: Federal legislation // Legal text of the official compilation (law, ordinances, etc.)
  • 501: Court decisions // Federal Supreme Court
  • 502: Court decisions // Federal Criminal Court
  • 503: Court decisions // Federal Administrative Court
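In the scheme above, the first digit of each doc_type code identifies the source: 1 for the Federal gazette, 2 for parliamentary business, 3 for parliamentary transcripts, 4 for federal legislation, and 5 for court decisions. Codes can therefore be collapsed into source groups with integer division. The following is a minimal Python sketch. The function name source_group is hypothetical, and the sketch assumes the codes are stored as integers (convert with int() if they are strings).

```python
# Map the leading digit of a doc_type code to its source group,
# following the coding scheme listed above.
SOURCE_GROUPS = {
    1: "Federal gazette",
    2: "Parl. businesses",
    3: "Parl. transcripts",
    4: "Federal legislation",
    5: "Court decisions",
}


def source_group(doc_type: int) -> str:
    """Return the source group for a three-digit doc_type code."""
    if not 100 <= doc_type <= 599:
        raise ValueError(f"unexpected doc_type code: {doc_type}")
    return SOURCE_GROUPS[doc_type // 100]


for code in (102, 203, 301, 401, 503):
    print(code, source_group(code))
```

This grouping only relies on the hundreds digit. Finer distinctions, such as separating 102 ("Botschaft") from 109 ("Entwurf"), still require the full code.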

Code folder

The code folder contains all R code for the analyses. The files are numbered in the order in which they are run.

  • _1_classifiertraining.R: Training of classifiers for classification of policy sectors
  • _2_classifierapplication.R: Classification of documents in corpus
  • _3_dictionaryapplication.R: Biodiversity indexing of documents in corpus
  • _4_stmtruncation.R: Truncation of indexed documents to keep only relevant parts
  • _5_stmtranslation.R: Translation of FR and IT documents to DE
  • _6_stmmodel.R: Preprocessing and structural topic model
  • _7plots.R: Plots and numerical results as reported in the paper

The code/functions folder contains custom functions used in the scripts, e.g. to support topic model interpretation.

Package versions and setup details are noted in the code files.


Please direct any questions to Ueli Reber (ueli.reber@eawag.ch).


Additional information

28 February 2022
5 April 2022
