AgentPS: Agentic Process Supervision for Multi-modal Content Quality Assurance through Multi-round QA

Liu, Gorden; Sun, Yu; Sun, Ruixiao; Dong, Xin; Xiong, Hongyu

arXiv:2412.15251 (cs)

[Submitted on 15 Dec 2024]

Abstract: The advanced processing and reasoning capabilities of multimodal large language models (MLLMs) have driven substantial progress in vision-language (VL) understanding tasks. However, while effective for tasks governed by straightforward logic, MLLMs often encounter challenges when reasoning over complex, interdependent logic structures. To address this limitation, we introduce AgentPS, a novel framework that integrates Agentic Process Supervision into MLLMs via multi-round question answering during fine-tuning. AgentPS demonstrates significant performance improvements over baseline MLLMs on proprietary TikTok datasets, due to its integration of process supervision and structured sequential reasoning. Furthermore, we show that replacing human-annotated labels with LLM-generated labels retains much of the performance gain, highlighting the framework's practical scalability in industrial applications. These results position AgentPS as a highly effective and efficient architecture for multimodal classification tasks. Its adaptability and scalability, especially when enhanced by automated annotation generation, make it a powerful tool for handling large-scale, real-world challenges.

Comments:	8 pages, 2 figures
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2412.15251 [cs.CL]
	(or arXiv:2412.15251v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2412.15251

Submission history

From: Hongyu Xiong
[v1] Sun, 15 Dec 2024 04:58:00 UTC (122 KB)
