2025
Secondary publication
Article
Publisher's version

SASVi: segment any surgical video

File(s)
Main publication
11548_2025_Article_3408.pdf
CC BY 4.0 International
Format: Adobe PDF
Size: 2.74 MB
TUDa URI
tuda/13981
URN
urn:nbn:de:tuda-tuprints-306679
DOI
10.26083/tuprints-00030667
Authors
Sivakumar, Ssharvien Kumar
Frisch, Yannik ORCID 0009-0005-8097-0158
Ranem, Amin ORCID 0000-0003-0783-6903
Mukhopadhyay, Anirban
Abstract

Purpose: Foundation models, trained on multitudes of public datasets, often require additional fine-tuning or re-prompting mechanisms to be applied to visually distinct target domains such as surgical videos. Further, without domain knowledge, they cannot model the specific semantics of the target domain. Hence, when applied to surgical video segmentation, they fail to generalise to sections where previously tracked objects leave the scene or new objects enter.

Methods: We propose SASVi, a novel re-prompting mechanism based on a frame-wise object detection Overseer model, which is trained on a minimal amount of scarcely available annotations for the target domain. This model automatically re-prompts the foundation model SAM2 when the scene constellation changes, allowing for temporally smooth and complete segmentation of full surgical videos.

Results: Re-prompting based on our Overseer model significantly improves the temporal consistency of surgical video segmentation, by at least 2.4%, compared to similar prompting techniques and especially compared to frame-wise segmentation, which neglects temporal information. Our proposed approach allows us to successfully deploy SAM2 to surgical videos, which we demonstrate quantitatively and qualitatively on three different cholecystectomy and cataract surgery datasets.

Conclusion: SASVi can serve as a new baseline for smooth and temporally consistent segmentation of surgical videos with scarcely available annotation data. Our method allows us to leverage scarce annotations and obtain complete annotations for full videos of the large-scale counterpart datasets. We make those annotations publicly available, providing extensive annotation data for the future development of surgical data science models.
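
The re-prompting trigger described under Methods can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the Overseer's output is reduced to a set of class labels per frame, it does not call SAM2, and the function name `find_reprompt_frames`, the `min_persist` debounce parameter, and the example labels are all hypothetical.

```python
from typing import Iterable, List, Sequence


def find_reprompt_frames(
    detections: Sequence[Iterable[str]],
    min_persist: int = 1,
) -> List[int]:
    """Return the frame indices at which a tracker would be re-prompted.

    detections[i] holds the class labels a (hypothetical) Overseer reports
    for frame i. Frame 0 always receives the initial prompt. A later frame
    triggers a re-prompt when its label set differs from the set the tracker
    was last prompted with and the new set stays unchanged for `min_persist`
    consecutive frames. The debounce is an assumption, not from the paper.
    """
    if min_persist < 1:
        raise ValueError("min_persist must be >= 1")
    frames = [frozenset(d) for d in detections]
    if not frames:
        return []

    reprompts = [0]
    active = frames[0]     # label set the tracker was last prompted with
    candidate = None       # differing label set currently being observed
    candidate_start = 0    # first frame on which the candidate appeared
    run = 0                # consecutive frames showing the candidate

    for i in range(1, len(frames)):
        current = frames[i]
        if current == active:
            # Scene matches the last prompt again: discard any candidate.
            candidate, run = None, 0
            continue
        if current == candidate:
            run += 1
        else:
            candidate, candidate_start, run = current, i, 1
        if run >= min_persist:
            reprompts.append(candidate_start)
            active = current
            candidate, run = None, 0
    return reprompts


# Hypothetical labels: a second tool enters at frame 2, the first leaves at frame 4.
video = [{"grasper"}, {"grasper"}, {"grasper", "hook"}, {"grasper", "hook"}, {"hook"}]
print(find_reprompt_frames(video))                  # [0, 2, 4]
print(find_reprompt_frames(video, min_persist=2))   # [0, 2]
```

With `min_persist=2` the final change is ignored because it is observed on only one frame. A debounce of this kind is one possible way to avoid re-prompting on single-frame detection flicker.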

Keywords

Surgical video segmen...

Foundation models

Temporal consistency

Language
English
Department / field
20 Department of Computer Science > Interactive Graphics Systems
DDC
000 Generalities, computer science, information science > 004 Computer science
600 Technology, medicine, applied sciences > 610 Medicine, health
Institution
Universitäts- und Landesbibliothek Darmstadt
Place
Darmstadt
Article type
Scholarly article
Journal / series title
International Journal of Computer Assisted Radiology and Surgery : A journal for interdisciplinary research, development and applications of image guided diagnosis and therapy
Start page
1409
End page
1419
Journal volume
20
Journal issue
7
ISSN
1861-6429
Publisher
Springer
Place of first publication
Berlin ; Heidelberg
Year of first publication
2025
Publisher DOI
10.1007/s11548-025-03408-y
PPN
534932770
