Pixal3D: Pixel-Aligned 3D Generation from Images
Pixal3D introduces a new paradigm for generating 3D assets from 2D images with improved pixel-level fidelity. By aligning 3D generation directly with input image pixels through back-projected features, it addresses ambiguity in 2D-3D correspondence. The method supports high-quality, scalable 3D synthesis and extends naturally to multi-view and scene-level generation.
- Pixal3D improves fidelity in image-to-3D generation by establishing direct pixel-to-3D correspondence.
- The method uses pixel back-projection to lift 2D image features into a 3D feature volume.
- Unlike canonical-space approaches, Pixal3D generates 3D assets aligned with the input view.
- It supports multi-view generation by aggregating features across views.
- Pixal3D enables high-fidelity, object-separated 3D scene synthesis through a modular pipeline.
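The back-projection and multi-view aggregation ideas in the points above can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names (`back_project_features`, `aggregate_views`), the pinhole camera model, nearest-pixel sampling, and mean pooling across views are all assumptions chosen for brevity, and the summary does not say which variants Pixal3D actually uses.

```python
import numpy as np

def back_project_features(feat, K, grid_min, grid_max, res, R=None, t=None):
    """Lift a 2D feature map (H, W, C) into a (res, res, res, C) feature volume.

    Hypothetical helper: every voxel centre is moved into the camera frame
    (x_cam = R @ x + t), projected with pinhole intrinsics K, and assigned the
    feature of the nearest pixel. Voxels that land outside the image or behind
    the camera keep zeros and are marked False in the returned mask.
    """
    H, W, C = feat.shape
    R = np.eye(3) if R is None else np.asarray(R, dtype=float)
    t = np.zeros(3) if t is None else np.asarray(t, dtype=float)

    # Regular grid of voxel centres in the reference frame.
    axes = [np.linspace(grid_min[i], grid_max[i], res) for i in range(3)]
    X, Y, Z = np.meshgrid(*axes, indexing="ij")
    pts = np.stack([X, Y, Z], axis=-1).reshape(-1, 3)

    # Reference frame -> camera frame.
    cam = pts @ R.T + t
    z = cam[:, 2]
    in_front = z > 1e-6
    safe_z = np.where(in_front, z, 1.0)  # avoid dividing by zero or negatives

    # Pinhole projection to pixel coordinates (u = column, v = row).
    u = K[0, 0] * cam[:, 0] / safe_z + K[0, 2]
    v = K[1, 1] * cam[:, 1] / safe_z + K[1, 2]
    ui = np.round(u).astype(int)
    vi = np.round(v).astype(int)
    valid = in_front & (ui >= 0) & (ui < W) & (vi >= 0) & (vi < H)

    # Copy the pixel feature into every voxel that projects onto it.
    vol = np.zeros((pts.shape[0], C), dtype=feat.dtype)
    vol[valid] = feat[vi[valid], ui[valid]]
    return vol.reshape(res, res, res, C), valid.reshape(res, res, res)

def aggregate_views(volumes, masks):
    """Average per-view volumes, counting only views in which a voxel is visible."""
    vols = np.stack(volumes)                             # (V, res, res, res, C)
    m = np.stack(masks)[..., None].astype(vols.dtype)    # (V, res, res, res, 1)
    count = m.sum(axis=0)
    return (vols * m).sum(axis=0) / np.maximum(count, 1.0)

# Toy usage: an 8x8 single-channel "feature map" and a 2x2x2 grid in front of the camera.
feat = np.arange(64, dtype=float).reshape(8, 8, 1)
K = np.array([[4.0, 0.0, 4.0],
              [0.0, 4.0, 4.0],
              [0.0, 0.0, 1.0]])
vol, mask = back_project_features(feat, K, (-0.5, -0.5, 1.0), (0.5, 0.5, 2.0), res=2)
```

Because every voxel along a camera ray reads the same pixel, the resulting volume is aligned with the input view by construction, which is the property the key points contrast with canonical-space generation. A real system would more likely use bilinear sampling and a learned fusion across views than the nearest-pixel lookup and plain mean shown here.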
Abstract: Recent advances in 3D generative models have rapidly improved image-to-3D synthesis quality, enabling higher-resolution geometry and more realistic appearance. Yet fidelity, which measures pixel-level faithfulness of the generated 3D asset to the input image, still remains a central bottleneck. We argue this stems from an implicit 2D-3D correspondence issue: most 3D-native generators synthesize shapes in canonical space and inject image cues via attention, leaving pixel-to-3D associations ambiguous. To tackle this issue, we draw inspiration from 3D reconstruction and propose Pixal3D, a pixel-aligned 3D generation paradigm for high-fidelity 3D asset creation from images.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is on GitHub.