iFusion: Inverting Diffusion for
Pose-Free Reconstruction from Sparse Views

Chin-Hsuan Wu, Yen-Chun Chen, Bolivar Solarte, Lu Yuan, Min Sun

  • National Tsing Hua University
  • Microsoft
  • Amazon
iFusion extends existing image-to-3D methods to generate personalized 3D assets from additional views with unknown camera poses.

Abstract

We present iFusion, a novel 3D object reconstruction framework that requires only two views with unknown camera poses. While single-view reconstruction yields visually appealing results, it can deviate significantly from the actual object, especially on unseen sides. Additional views improve reconstruction fidelity but necessitate known camera poses. However, assuming pose availability is often unrealistic, and existing pose estimators fail in sparse-view scenarios. To address this, we harness a pre-trained novel view synthesis diffusion model, which embeds implicit knowledge about the geometry and appearance of diverse objects. Our strategy unfolds in three steps: (1) We "invert" the diffusion model for camera pose estimation instead of synthesizing novel views. (2) The diffusion model is fine-tuned using the provided views and estimated poses, turning it into a novel view synthesizer tailored to the target object. (3) Leveraging the registered views and the fine-tuned diffusion model, we reconstruct the 3D object. Experiments demonstrate strong performance in both pose estimation and novel view synthesis. Moreover, iFusion seamlessly integrates with various reconstruction methods and enhances them.
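The first step can be made concrete with a short sketch. The following is a minimal PyTorch illustration of pose estimation by diffusion inversion, not the released iFusion code: it assumes a hypothetical Zero123-style model wrapper exposing a num_timesteps attribute and a denoising_loss(...) method, and it freezes the model while treating the relative pose as the only learnable variable.

# Minimal sketch of step (1): "inverting" a Zero123-style novel view
# synthesis diffusion model for pose estimation. The wrapper interface
# (model.num_timesteps, model.denoising_loss) is a hypothetical
# placeholder, not the released iFusion API.
import torch

def estimate_relative_pose(model, view_a, view_b, num_steps=500, lr=1e-2):
    """Optimize the relative camera pose between two views by minimizing
    the diffusion denoising loss, with the pose as the only free variable.
    Zero123 conditions on (elevation, azimuth, radius) offsets."""
    pose = torch.zeros(3, requires_grad=True)  # (d_elev, d_azim, d_radius)
    optimizer = torch.optim.Adam([pose], lr=lr)

    for _ in range(num_steps):
        optimizer.zero_grad()
        # Sample a random timestep, as in standard diffusion training.
        t = torch.randint(0, model.num_timesteps, (1,))
        # How well does the frozen model explain view_b as a novel view
        # of view_a under the current pose hypothesis?
        loss = model.denoising_loss(cond_image=view_a,
                                    target_image=view_b,
                                    relative_pose=pose,
                                    timestep=t)
        loss.backward()
        optimizer.step()
    return pose.detach()

Since this objective is non-convex, one would typically run the optimization from several initial pose hypotheses and keep the one with the lowest loss.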

[Figure: iFusion architecture overview]

Novel view synthesis

With as few as two views without poses, iFusion can customize the novel view synthesis diffusion model, Zero123, to the target object through self-training, sketched after the figure below.

[Figure: novel view synthesis results]
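Below is a similarly hedged sketch of this self-training step, reusing the hypothetical model wrapper from the pose-estimation example above; the actual fine-tuning recipe (e.g., updating only low-rank adapter weights rather than all parameters) may differ from this simplification.

# Minimal sketch of step (2): fine-tuning the diffusion model on the
# registered views so it becomes a view synthesizer specialized to the
# target object. The interface (model.trainable_parameters,
# model.denoising_loss, model.num_timesteps) is hypothetical.
import itertools
import torch

def self_train(model, views, poses, num_steps=1000, lr=1e-4):
    """views: list of images; poses: dict mapping an ordered index pair
    (i, j) to the relative pose estimated in step (1). Trains on every
    ordered (source, target) pair of registered views."""
    optimizer = torch.optim.Adam(model.trainable_parameters(), lr=lr)
    pairs = list(itertools.permutations(range(len(views)), 2))

    for step in range(num_steps):
        i, j = pairs[step % len(pairs)]
        t = torch.randint(0, model.num_timesteps, (1,))
        # Standard denoising objective, now with the model weights
        # trainable and the estimated poses held fixed.
        loss = model.denoising_loss(cond_image=views[i],
                                    target_image=views[j],
                                    relative_pose=poses[(i, j)],
                                    timestep=t)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return model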

Citation

@article{wu2023ifusion,
  author = {Wu, Chin-Hsuan and Chen, Yen-Chun and Solarte, Bolivar and Yuan, Lu and Sun, Min},
  title = {iFusion: Inverting Diffusion for Pose-Free Reconstruction from Sparse Views},
  journal = {arXiv preprint arXiv:2312.17250},
  year = {2023},
}