2026
First publication
Bachelor's thesis

Scaling Up Scientific QA

File(s)
Main publication
20260218_Scaling_Up_Scientific_QA.pdf
CC BY 4.0 International
Format: Adobe PDF
Size: 2.28 MB
TUDa URI
tuda/15127
URN
urn:nbn:de:tuda-tuda-151275
DOI
10.26083/tuda-7777
Authors
Ngen Jiaxi, Joy ORCID 0009-0005-1269-9095
Abstract

The rapid growth of scientific publications makes it increasingly difficult for researchers to keep up with new findings. Scientific question answering (QA) systems aim to automatically respond to questions based on scientific articles. Advancing these systems requires high-quality, large-scale datasets. Current work is either limited to small scale due to costly manual annotation or lacks realistic depth when generated synthetically. To address this gap, this thesis introduces a novel framework for automatically generating scientific QA pairs from research literature using large language models (LLMs). The framework extracts QA pairs from peer reviews and rebuttals with state-of-the-art open-source LLMs, applying automated filtering and validation to ensure coherence and relevance. The resulting dataset comprises 12,628 free-form, open-ended QA pairs across ten scientific domains. We conduct extensive experiments to evaluate the dataset, examining both the impact of fine-tuning on our resource and its performance across several benchmarks. Results show that fine-tuning substantially improves a model’s ability to understand and apply scientific knowledge. These findings highlight the value of our framework and demonstrate the potential of peer review–based resources in advancing scientific QA, particularly for generative tasks and long-context reasoning.
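The automated filtering step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the thesis's actual filtering criteria, so everything below is an assumption made for illustration: the `QAPair` type, the `is_coherent` rule (question ends with a question mark, answer has at least five words), and the case-insensitive duplicate check are hypothetical stand-ins, not the framework's method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QAPair:
    """Hypothetical container for one candidate question-answer pair."""
    question: str
    answer: str
    domain: str


def is_coherent(pair: QAPair, min_answer_words: int = 5) -> bool:
    """Assumed rule-based check: a real question with a non-trivial answer."""
    question = pair.question.strip()
    answer_words = pair.answer.split()
    return question.endswith("?") and len(answer_words) >= min_answer_words


def filter_pairs(candidates: list[QAPair]) -> list[QAPair]:
    """Keep coherent pairs and skip questions already kept (case-insensitive)."""
    seen: set[str] = set()
    kept: list[QAPair] = []
    for pair in candidates:
        key = pair.question.strip().lower()
        if key in seen or not is_coherent(pair):
            continue
        seen.add(key)
        kept.append(pair)
    return kept


# Invented example candidates, not taken from the dataset.
candidates = [
    QAPair("Why was the baseline not tuned?",
           "The baseline uses the hyperparameters reported in its original paper.",
           "NLP"),
    QAPair("why was the baseline not tuned?",
           "Duplicate of the first question with a different answer text.",
           "NLP"),
    QAPair("Unclear notation", "See Section 3.", "Physics"),
]
filtered = filter_pairs(candidates)
print(len(filtered))
```

In this sketch the second candidate is skipped as a duplicate question and the third is dropped because it is not phrased as a question. A pipeline like the one described in the abstract would presumably replace these rules with LLM-based validation.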

Keywords

NLP

Question Answering

AI4Science

Language
English
Department/field
20 Department of Computer Science > Ubiquitous Knowledge Processing
DDC
000 General works, computer science, information science > 004 Computer science
Institution
Universitäts- und Landesbibliothek Darmstadt
Place
Darmstadt
Reviewers
Gurevych, Iryna ORCID 0000-0003-2187-7621
Baumgärtner, Tim
Name of the degree-granting institution
Technische Universität Darmstadt
Location of the degree-granting institution
Darmstadt
PPN
542322145
Additional links (organization)
https://ukp-lab.de
Supplementary resources (research data)
huggingface.co/datasets/ukplab/PeerQA-XT
