-
3DCoMPaT$^{++}$: An improved Large-scale 3D Vision Dataset for Compositional Recognition
Authors:
Habib Slim,
Xiang Li,
Yuchen Li,
Mahmoud Ahmed,
Mohamed Ayman,
Ujjwal Upadhyay,
Ahmed Abdelreheem,
Arpit Prajapati,
Suhail Pothigara,
Peter Wonka,
Mohamed Elhoseiny
Abstract:
In this work, we present 3DCoMPaT$^{++}$, a multimodal 2D/3D dataset with 160 million rendered views of more than 10 million stylized 3D shapes carefully annotated at the part-instance level, alongside matching RGB point clouds, 3D textured meshes, depth maps, and segmentation masks. 3DCoMPaT$^{++}$ covers 41 shape categories, 275 fine-grained part categories, and 293 fine-grained material classes that can be compositionally applied to parts of 3D objects. We render a subset of one million stylized shapes from four equally spaced views as well as four randomized views, leading to a total of 160 million renderings. Parts are segmented at the instance level, with coarse-grained and fine-grained semantic levels. We introduce a new task, called Grounded CoMPaT Recognition (GCR), to collectively recognize and ground compositions of materials on parts of 3D objects. Additionally, we report the outcomes of a data challenge organized at CVPR2023, showcasing the winning method's utilization of a modified PointNet$^{++}$ model trained on 6D inputs, and exploring alternative techniques for GCR enhancement. We hope our work will help ease future research on compositional 3D Vision.
Submitted 12 March, 2024; v1 submitted 27 October, 2023;
originally announced October 2023.
-
CoT3DRef: Chain-of-Thoughts Data-Efficient 3D Visual Grounding
Authors:
Eslam Mohamed Bakr,
Mohamed Ayman,
Mahmoud Ahmed,
Habib Slim,
Mohamed Elhoseiny
Abstract:
3D visual grounding is the ability to localize objects in 3D scenes conditioned on utterances. Most existing methods devote the referring head to localizing the referred object directly, which causes failures in complex scenarios. In addition, they do not illustrate how and why the network reaches its final decision. In this paper, we address the question: can we design an interpretable 3D visual grounding framework that has the potential to mimic the human perception system? To this end, we formulate the 3D visual grounding problem as a sequence-to-sequence (Seq2Seq) task by first predicting a chain of anchors and then the final target. Interpretability not only improves the overall performance but also helps us identify failure cases. Following the chain-of-thoughts approach enables us to decompose the referring task into interpretable intermediate steps, boosting performance and making our framework extremely data-efficient. Moreover, our proposed framework can be easily integrated into any existing architecture. We validate our approach through comprehensive experiments on the Nr3D, Sr3D, and ScanRefer benchmarks and show consistent performance gains compared to existing methods without requiring manually annotated data. Furthermore, our proposed framework, dubbed CoT3DRef, is significantly data-efficient: on the Sr3D dataset, when trained on only 10% of the data, we match the SOTA performance of models trained on the entire dataset. The code is available at https://eslambakr.github.io/cot3dref.github.io/.
Submitted 20 April, 2024; v1 submitted 9 October, 2023;
originally announced October 2023.
-
BlockCampus: A Blockchain-Based DApp for enhancing Student Engagement and Reward Mechanisms in an Academic Community for E-JUST University
Authors:
Mariam Ayman,
Youssef El-harty,
Ahmed Rashed,
Ahmed Fathy,
Ahmed Abdullah,
Omar Wassim,
Walid Gomaa
Abstract:
In today's digital age, online communities have become an integral part of our lives, fostering collaboration, knowledge sharing, and community engagement. Higher education institutions, in particular, can greatly benefit from dedicated platforms that facilitate academic discussions and provide incentives for active participation. This research paper presents a comprehensive study and implementation of a decentralized application (DApp) that leverages blockchain technology to address these needs, specifically for students and academic staff at E-JUST (Egypt-Japan University of Science and Technology).
Submitted 7 July, 2023;
originally announced July 2023.