2025
First publication
Bachelor's thesis
Publisher's version

Evaluating Cultural Diversity in Text-to-Image Models

File(s)
Main publication
Bachelor_Thesis.pdf
CC BY 4.0 International
Format: Adobe PDF
Size: 5.22 MB
TUDa URI
tuda/14110
URN
urn:nbn:de:tuda-tuprints-308143
DOI
10.26083/tuprints-00030814
Authors
Li, Kai ORCID 0009-0002-0237-8050
Abstract

Text-to-Image (T2I) models have advanced significantly, leading to widespread global adoption. However, these improvements are not equally reflected across cultures. Evaluating the models' cultural knowledge and assessing cross-cultural differences requires benchmarks. Current cultural benchmarks rely on simple text prompts with English cultural concepts, overlooking the complexity of individuals, locations, and their semantic and spatial interactions in realistic settings. To address this gap, we present a new T2I benchmark for evaluating cultural knowledge, containing 1) a multilingual cultural concept dataset and 2) modular prompt templates for generating compositional and complex prompts. Our dataset is built using a pipeline that automatically extracts cultural concepts from Wikipedia, which are then refined through Large Language Models and human assessment, covering 4 geographically and typologically diverse Geo-Cultures across 12 categories. With 37 prompt templates, each containing 5 unique individuals and locations per category, our framework enables comprehensive cultural evaluation of T2I models by generating up to 2.3 million unique text prompts. By comparing embedding-based models on aligned Wikipedia image-caption pairs, we demonstrate that existing metrics fail to adequately assess the generation quality of cultural concepts, and we propose an automatic metric that uses Visual Question Answering models to evaluate text-to-image alignment. Our analysis of three state-of-the-art T2I models reveals that they handle compositional prompts well but are limited in their generative capabilities by insufficient cultural knowledge. The assessment of their multilingual understanding, achieved by translating prompts into the concept's native language and evaluating cross-lingual consistency, reveals a bias in non-multilingual models towards Western languages. This underscores the need to improve cross-cultural and multilingual capabilities in T2I models.

Keywords

Text-to-Image Generat...

Multimodal Models

Cultural Benchmarking...

Cross-Cultural Evalua...

Multilingual Datasets...

Visual Question Answe...

Multilingual Understa...

Language
English
Department
20 Department of Computer Science > Ubiquitous Knowledge Processing
DDC
000 Generalities, computer science, information science > 004 Computer science
Institution
Technische Universität Darmstadt
Place
Darmstadt
Date of oral examination
5 December 2024
Reviewers
Gurevych, Iryna
Liu, Chen Cecilia
Degree-granting institution
Technische Universität Darmstadt
Location of the degree-granting institution
Darmstadt
PPN
532047788
