1 Institute for AI Industry Research (AIR), Tsinghua University
2 Tsinghua Shenzhen International Graduate School, Tsinghua University
3 School of Software and Microelectronics, Peking University
4 College of Computer Science, Zhejiang University
5 College of Computer Science and Technology, Harbin Engineering University
6 Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing, School of Artificial Intelligence, Beihang University
* Indicates Equal Contribution
† Indicates Corresponding Author
📰 News: Our paper has been accepted by COLING 2025! 🎉 See you in Abu Dhabi, UAE, from January 19 to 24, 2025!
Overview of the Idea-2-3D framework, which employs an LMM to explore the potential of T-2-3D models through multimodal iterative self-refinement, producing effective T-2-3D prompts for the user's input IDEA. Green rounded rectangles indicate steps performed by GPT-4V. Purple rounded rectangles indicate the T-2-3D modules, comprising T2I and I-2-3D models. The yellow rounded rectangle indicates the off-the-shelf 3D-model multi-view generation algorithm. To improve reconstruction quality, we remove the image background between steps 2 and 3. Blue indicates the memory module, which stores all feedback from previous rounds, the best 3D model, and the best text prompt.
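To make the loop concrete, here is a minimal Python sketch of the self-refinement cycle described above. Every function in it (`lmm_propose_prompt`, `t2i`, `remove_background`, `i23d`, `render_multiview`, `lmm_feedback`) is a hypothetical placeholder standing in for the corresponding GPT-4V call or off-the-shelf model, not the repository's actual API.

```python
# Hedged sketch of the Idea-2-3D self-refinement loop. All functions below
# are hypothetical placeholders for GPT-4V calls and off-the-shelf models;
# they are NOT the real implementation.

def lmm_propose_prompt(idea, feedback_history):
    # Placeholder GPT-4V step: draft a T-2-3D text prompt from the IDEA,
    # conditioned on all feedback accumulated in memory.
    return f"a 3D model of {idea}"

def t2i(prompt):
    # Placeholder text-to-image model (part of the T-2-3D module).
    return f"image({prompt})"

def remove_background(image):
    # Background removal between steps 2 and 3, for better reconstruction.
    return image

def i23d(image):
    # Placeholder image-to-3D model (part of the T-2-3D module).
    return f"mesh({image})"

def render_multiview(model_3d):
    # Placeholder off-the-shelf multi-view generation so the LMM can
    # visually inspect the 3D result.
    return [f"view_{i}({model_3d})" for i in range(4)]

def lmm_feedback(idea, views, memory):
    # Placeholder GPT-4V critique: compare the rendered views against the
    # IDEA and the best result so far; return feedback and a keep/discard
    # decision. Here: trivially keep the first candidate.
    return "textual feedback", memory["best_model"] is None

def idea_to_3d(user_idea, max_rounds=5):
    # Memory module (blue in the figure): saves all feedback from previous
    # rounds, the best 3D model, and the best text prompt.
    memory = {"feedback": [], "best_model": None, "best_prompt": None}
    for _ in range(max_rounds):
        prompt = lmm_propose_prompt(user_idea, memory["feedback"])
        image = remove_background(t2i(prompt))
        model_3d = i23d(image)
        views = render_multiview(model_3d)
        feedback, is_better = lmm_feedback(user_idea, views, memory)
        memory["feedback"].append(feedback)
        if is_better:
            memory["best_model"], memory["best_prompt"] = model_3d, prompt
    return memory["best_model"], memory["best_prompt"]

if __name__ == "__main__":
    best_model, best_prompt = idea_to_3d("a steampunk airship")
    print(best_prompt, "->", best_model)
```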
@article{chen2024idea23d,
      title={Idea-2-3D: Collaborative LMM Agents Enable 3D Model Generation from Interleaved Multimodal Inputs},
      author={Junhao Chen and Xiang Li and Xiaojun Ye and Chao Li and Zhaoxin Fan and Hao Zhao},
      year={2024},
      eprint={2404.04363},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}