PolySmart @ TRECVid 2024 Medical Video Question Answering

Wu, Jiaxin; Jiang, Yiyang; Wei, Xiao-Yong; Li, Qing

arXiv:2412.15514 (cs)

[Submitted on 20 Dec 2024]

Abstract: Video Corpus Visual Answer Localization (VCVAL) includes question-related video retrieval and visual answer localization in the videos. Specifically, we use text-to-text retrieval to find relevant videos for a medical question based on the similarity between the video transcripts and answers generated by GPT4. For visual answer localization, the start and end timestamps of the answer are predicted by aligning both the visual content and the subtitles with the query. For the Query-Focused Instructional Step Captioning (QFISC) task, the step captions are generated by GPT4. Specifically, we provide the video captions generated by the LLaVA-Next-Video model and the video subtitles with timestamps as context, and ask GPT4 to generate step captions for the given medical query. We submitted only one run for evaluation, and it obtained an F-score of 11.92 and a mean IoU of 9.6527.
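To make the retrieval step and the IoU metric concrete, here is a minimal sketch under stated assumptions. The abstract does not say which similarity measure is used or how the evaluation defines IoU, so the bag-of-words cosine similarity and the standard temporal IoU below are illustrative stand-ins, and the names `rank_videos` and `temporal_iou` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_videos(generated_answer, transcripts):
    """Rank video ids by transcript similarity to a generated answer.

    Assumption: bag-of-words cosine similarity stands in for whatever
    text-to-text similarity the paper actually uses.
    """
    answer_vec = Counter(tokenize(generated_answer))
    scores = {
        video_id: cosine_similarity(answer_vec, Counter(tokenize(text)))
        for video_id, text in transcripts.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


def temporal_iou(pred, truth):
    """Intersection over union of two (start, end) spans in seconds.

    Assumption: this is the standard temporal IoU, not necessarily the
    exact definition used by the TRECVid evaluation.
    """
    inter = max(0.0, min(pred[1], truth[1]) - max(pred[0], truth[0]))
    union = (pred[1] - pred[0]) + (truth[1] - truth[0]) - inter
    return inter / union if union > 0 else 0.0


# Toy data, invented purely for illustration.
transcripts = {
    "v1": "apply pressure to the wound and wrap a bandage",
    "v2": "stretch your neck slowly to the left",
}
ranking = rank_videos("Press on the wound, then wrap it with a bandage", transcripts)
overlap = temporal_iou((10.0, 30.0), (20.0, 40.0))
```

With the toy transcripts, `v1` ranks first because it shares more words with the generated answer, and the two spans overlap for 10 of 30 seconds, giving an IoU of one third.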

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Cite as:	arXiv:2412.15514 [cs.CV]
	(or arXiv:2412.15514v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2412.15514

Submission history

From: Jiaxin Wu
[v1] Fri, 20 Dec 2024 02:59:59 UTC (5,202 KB)
