BurnoutText - Frequent Words in Texts about Burnout, Depression and a Control Group

This dataset was generated as part of a research project funded by the Swiss National Science Foundation (grant no. 196483, see https://data.snf.ch/grants/grant/196483). The project applies new methods from natural language processing to develop approaches for burnout detection in clinical psychology and psychiatry. For details, refer to: https://www.bfh.ch/en/research/research-projects/2021-288-996-826/

The source data for this derived dataset was collected from Reddit and consists of a "Burnout" dataset with 352 samples, a "No burnout" dataset with 13,216 samples, and a "Depression" dataset with 979 samples. More details about the original dataset can be found in the following publication: https://doi.org/10.3389/fdata.2022.863100

All contractions were expanded (e.g. "I'm" to "I am") using the contractions Python library. We used the spaCy en_core_web_sm pre-trained English language pipeline to tokenize each text sample, remove stopwords and punctuation, and lemmatize the remaining tokens. For example, the text "I feel like I have been working too much. Everything is exhausting." would be converted to "feel like work exhausting".

The dataset presented here was then compiled by taking the 20 most frequent lemmatized tokens in each of the classes (Burnout, No burnout, and Depression). Within each class, the words are ordered from most frequent to least frequent.
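The preprocessing and counting steps described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names `preprocess` and `top_tokens`, the extra filtering of whitespace tokens, and the toy samples are assumptions made for the example. `preprocess` requires the third-party `contractions` package and a loaded spaCy pipeline; `top_tokens` uses only the standard library.

```python
from collections import Counter


def preprocess(text, nlp):
    """Expand contractions, then return the lemmas of all tokens that are
    not stopwords or punctuation. `nlp` is a loaded spaCy pipeline, e.g.
    spacy.load("en_core_web_sm"). Hypothetical helper, for illustration."""
    import contractions  # third-party package, imported lazily

    expanded = contractions.fix(text)  # e.g. "I'm" -> "I am"
    return [
        tok.lemma_
        for tok in nlp(expanded)
        # Skipping whitespace tokens is an assumption of this sketch.
        if not (tok.is_stop or tok.is_punct or tok.is_space)
    ]


def top_tokens(token_lists, n=20):
    """Return the n most frequent tokens across all samples of one class,
    ordered from most frequent to least frequent."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(tokens)
    return [token for token, _ in counts.most_common(n)]


# Toy, already-preprocessed samples (invented, not taken from the dataset).
toy_samples = [
    ["feel", "like", "work", "exhausting"],
    ["work", "tired", "feel"],
    ["work"],
]
toy_top = top_tokens(toy_samples, n=2)
```

With spaCy and its model installed, the token lists for one class could be built with `nlp = spacy.load("en_core_web_sm")` followed by `[preprocess(text, nlp) for text in texts]`, and `top_tokens` would then be called once per class (Burnout, No burnout, Depression). The exact lemmas produced depend on the spaCy model version.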

    Organizational unit
    BFH - Institute for Data Applications and Security (IDAS)
    Type
    Dataset
    DOI
    10.34914/olos:kfp3d6vg6fgvra7khhrg2xqdsy
    License
    Creative Commons Attribution 4.0 International
    Keywords
    burnout, natural language processing, machine learning, augmented intelligence, ensemble classifier, psychology, mental health
    Publication date
    06/21/2022
    Access level
    Public
    Sensitivity
    Blue
Contributors
  • Nath, Sukanya
  • Puttick, Alexandre
  • Merhbene, Ghofrane
  • Kurpicz-Briki, Mascha