F-HOI: Toward Fine-grained Semantic-Aligned 3D Human-Object Interactions

Yang, Jie; Niu, Xuesong; Jiang, Nan; Zhang, Ruimao; Huang, Siyuan

Abstract: Existing 3D human-object interaction (HOI) datasets and models simply align global descriptions with long HOI sequences, and lack a detailed understanding of intermediate states and the transitions between them. In this paper, we argue that fine-grained semantic alignment, which utilizes state-level descriptions, offers a promising paradigm for learning semantically rich HOI representations. To achieve this, we introduce Semantic-HOI, a new dataset comprising over 20K paired HOI states, with fine-grained descriptions for each HOI state and for the body movements that occur between two consecutive states. Leveraging the proposed dataset, we design three state-level HOI tasks to accomplish fine-grained semantic alignment within the HOI sequence. Additionally, we propose a unified model called F-HOI, designed to leverage multimodal instructions and empower a Multimodal Large Language Model to efficiently handle diverse HOI tasks. F-HOI offers multiple advantages: (1) It employs a unified task formulation that supports versatile multimodal inputs. (2) It maintains consistency in HOI across 2D, 3D, and linguistic spaces. (3) It utilizes fine-grained textual supervision for direct optimization, avoiding intricate modeling of HOI states. Extensive experiments reveal that F-HOI effectively aligns HOI states with fine-grained semantic descriptions, adeptly tackling understanding, reasoning, generation, and reconstruction tasks.

Comments:	ECCV24
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2407.12435 [cs.CV]
	(or arXiv:2407.12435v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2407.12435

