University of Rostock, Ludwig-Maximilians-Universität, 2025
https://doi.org/10.18453/rosdok_id00005015
Abstract: The dataset contains data from a qualitative study of texts generated by large language models (Mistral Large Instruct, Gemma 3, DeepSeek R1, Meta Llama 3.1, Llama Sauerkraut, Qwen 3). The models were asked, using comparable prompts in three languages (German, English, French), to define diversity, with the aim of identifying political and cultural bias in the training material. Each result was generated in a new context window, with the same or comparable settings across the LLMs (medium temperature, top_p and the same system prompt). The process was repeated at least five times for each prompt in each language. An additional run experimented with different settings. In total, the dataset comprises more than 270 comparable documents and more than 50 experimental documents, stored as .rtf and .txt files.
Data publication
Open access
CC BY-NC 4.0
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.