Evaluating and Predicting Distorted Human Body Parts for Generated Images

Ma, Lu; Cao, Kaibo; Liang, Hao; Lin, Jiaxin; Li, Zhuang; Liu, Yuhong; Zhang, Jihong; Zhang, Wentao; Cui, Bin

Abstract:Recent advancements in text-to-image (T2I) models enable high-quality image synthesis, yet generating anatomically accurate human figures remains challenging. AI-generated images frequently exhibit distortions such as proliferated limbs, missing fingers, deformed extremities, or fused body parts. Existing evaluation metrics like Inception Score (IS) and Fréchet Inception Distance (FID) lack the granularity to detect these distortions, while human preference-based metrics focus on abstract quality assessments rather than anatomical fidelity. To address this gap, we establish the first standards for identifying human body distortions in AI-generated images and introduce Distortion-5K, a comprehensive dataset comprising 4,700 annotated images of normal and malformed human figures across diverse styles and distortion types. Based on this dataset, we propose ViT-HD, a Vision Transformer-based model tailored for detecting human body distortions in AI-generated images, which outperforms state-of-the-art segmentation models and visual language models, achieving an F1 score of 0.899 and IoU of 0.831 on distortion localization. Additionally, we construct the Human Distortion Benchmark with 500 human-centric prompts to evaluate four popular T2I models using trained ViT-HD, revealing that nearly 50\% of generated images contain distortions. This work pioneers a systematic approach to evaluating anatomical accuracy in AI-generated humans, offering tools to advance the fidelity of T2I models and their real-world applicability. The Distortion-5K dataset, trained ViT-HD will soon be released in our GitHub repository: \href{this https URL}{this https URL}.

Comments:	8 pages, 6 figures
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2503.00811 [cs.CV]
	(or arXiv:2503.00811v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2503.00811

Computer Science > Computer Vision and Pattern Recognition

Title:Evaluating and Predicting Distorted Human Body Parts for Generated Images

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators