Ideogram 4.0: A 9.3B open-weight image model
Ideogram 4.0 is a new open-weight text-to-image model with 9.3 billion parameters, trained from scratch. Like other recent open-weight releases, it is built on a single-stream Diffusion Transformer, in which text and image tokens form one self-attention sequence. It pairs this backbone with a vision-language text encoder and is trained on structured JSON captions rather than free-form text.
- Ideogram 4.0 is a 9.3B-parameter open-weight text-to-image model.
- The model employs a vision-language text encoder and a single-stream Diffusion Transformer.
- It is trained exclusively on structured JSON captions with detailed descriptions of image elements.
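To make "structured JSON captions" concrete, here is a minimal sketch of what such a caption could look like, built with Python's standard `json` module. The schema is entirely hypothetical: the field names (`summary`, `style`, `elements`, `placement`) and all values are invented for illustration, and nothing here reflects Ideogram's actual caption format.

```python
import json

# Hypothetical caption schema for illustration only; these field names are
# NOT Ideogram's published format.
caption = {
    "summary": "A minimalist concert poster on a deep blue background",
    "style": {
        "medium": "flat vector illustration",
        "palette": ["navy", "coral", "white"],
    },
    "elements": [
        {"type": "text", "content": "SUMMER NIGHTS", "placement": "top center"},
        {"type": "object", "description": "a silhouetted guitarist", "placement": "center"},
    ],
}

# Serialize to the string form a model would be conditioned on.
prompt = json.dumps(caption, indent=2)
print(prompt)
```

A record like this gives each image element and attribute its own field instead of folding everything into one sentence; whether Ideogram's captions resemble this layout is not stated here.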
Opening excerpt (first ~120 words)
Technical · Model release · June 3, 2026
Ideogram 4.0 Technical Details: Open model at the forefront of design
Our first open-weight foundation model. A 9.3B single-stream Diffusion Transformer, trained from scratch, with a vision-language text encoder and structured JSON prompts.
Authors: Ideogram Team
Reading time: 5 min
Weights: Hugging Face
Code: GitHub
Overview
Ideogram 4.0 is a 9.3B parameter open-weight text-to-image model. Recent open-weight releases have converged on a single self-attention sequence over text and image tokens[1][2][3], and Ideogram 4.0 follows the same pattern: text and image tokens share the same projections at every layer of a 34-layer DiT. Two design choices distinguish it from peer releases.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at Ideogram.
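The single-stream pattern quoted above, where text and image tokens form one self-attention sequence and share the same projections at every layer, can be sketched in NumPy. This is a minimal illustration under stated assumptions, not Ideogram's implementation: the block layout (pre-norm attention plus a GELU MLP), the dimensions, the random weights, and all function names (`init_block`, `single_stream_block`, `single_stream_dit`) are hypothetical, and timestep conditioning, positional encodings, and the vision-language text encoder are omitted.

```python
import numpy as np


def layer_norm(x, eps=1e-6):
    # Parameter-free LayerNorm over the feature dimension.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def init_block(rng, d, mlp_ratio=4):
    # Hypothetical random weights for one block; names are illustrative only.
    s = 1.0 / np.sqrt(d)
    return {
        "w_q": rng.normal(0, s, (d, d)),
        "w_k": rng.normal(0, s, (d, d)),
        "w_v": rng.normal(0, s, (d, d)),
        "w_o": rng.normal(0, s, (d, d)),
        "w_up": rng.normal(0, s, (d, mlp_ratio * d)),
        "w_down": rng.normal(0, 1.0 / np.sqrt(mlp_ratio * d), (mlp_ratio * d, d)),
    }


def single_stream_block(x, p, num_heads):
    """One pre-norm block over a joint sequence x of shape (n_tokens, d).

    Text and image tokens are rows of the same array, so they pass through
    exactly the same projection and MLP weights.
    """
    n, d = x.shape
    hd = d // num_heads
    h = layer_norm(x)
    # Shared Q/K/V projections, split into heads: (heads, n, hd).
    q = (h @ p["w_q"]).reshape(n, num_heads, hd).transpose(1, 0, 2)
    k = (h @ p["w_k"]).reshape(n, num_heads, hd).transpose(1, 0, 2)
    v = (h @ p["w_v"]).reshape(n, num_heads, hd).transpose(1, 0, 2)
    # Full bidirectional attention: every token sees all text and image tokens.
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(hd)) @ v
    x = x + attn.transpose(1, 0, 2).reshape(n, d) @ p["w_o"]
    # Shared MLP with a tanh-approximated GELU.
    u = layer_norm(x) @ p["w_up"]
    u = 0.5 * u * (1 + np.tanh(np.sqrt(2 / np.pi) * (u + 0.044715 * u**3)))
    return x + u @ p["w_down"]


def single_stream_dit(text_tokens, image_tokens, blocks, num_heads):
    # Concatenate once, run every block on the joint sequence, then split.
    x = np.concatenate([text_tokens, image_tokens], axis=0)
    for p in blocks:
        x = single_stream_block(x, p, num_heads)
    return x[len(text_tokens):]  # updated image tokens only


# Toy usage: 2 blocks here; the excerpt describes a 34-layer DiT.
rng = np.random.default_rng(0)
d, heads = 64, 4
blocks = [init_block(rng, d) for _ in range(2)]
text = rng.normal(size=(8, d))
image = rng.normal(size=(16, d))
out = single_stream_dit(text, image, blocks, heads)
print(out.shape)  # (16, 64)
```

What makes this "single-stream" is that both modalities go through the same weights in one sequence; a dual-stream block would keep separate projection weights for text and image tokens while still attending jointly.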