Running LTX-2.3 Alongside TTS on a Single 96GB GPU with a Cold-Start Architecture
This article covers running the LTX-2.3 audio-to-video model alongside TTS on a single 96GB GPU. It describes the VRAM limits the author hit and the move to a cold-start architecture to keep memory usage under control. The author also shares how changing the model's loading process reduced VRAM consumption.
- LTX-2.3 needs so much VRAM that running it in persistent mode causes out-of-memory errors.
- Switching to a cold-start architecture lets the system idle at 0 GiB and peak at 40 GiB of VRAM usage.
- Loading in 4-bit with bitsandbytes reduces the VRAM footprint of the Gemma text encoder from 22.78 GiB to 7.26 GiB.
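The summary does not show how the cold start is implemented. One common way to get an idle footprint of 0 GiB is to run each generation in a short-lived worker process, because the GPU driver releases everything a process holds when it exits. The sketch below illustrates that pattern under that assumption. `WORKER_SOURCE`, `run_cold_job`, and the job fields are hypothetical names for illustration, not the author's code, and the model load is left as a placeholder.

```python
import json
import subprocess
import sys

# Hypothetical worker body (not the author's code). A real worker would load
# the model here (for example with a 4-bit quantization config), run one
# generation, and write the output to disk before exiting.
WORKER_SOURCE = r"""
import json, sys
job = json.loads(sys.stdin.read())
# Placeholder for: load model -> generate -> save output.
result = {"job_id": job["job_id"], "status": "done"}
print(json.dumps(result))
"""


def run_cold_job(job: dict, timeout_s: float = 600.0) -> dict:
    """Run one job in a fresh Python process and return its JSON result.

    Because the worker is a separate process, any memory it allocates
    (including GPU memory, in a real worker) is released when it exits.
    """
    proc = subprocess.run(
        [sys.executable, "-c", WORKER_SOURCE],
        input=json.dumps(job),   # job description goes in via stdin
        capture_output=True,     # result comes back via stdout
        text=True,
        timeout=timeout_s,       # kill a stuck worker instead of waiting forever
        check=True,              # raise if the worker exits with an error
    )
    return json.loads(proc.stdout)


if __name__ == "__main__":
    print(run_cold_job({"job_id": "demo-1"}))
```

The trade-off with this design is latency: every job pays the full model load time, in exchange for leaving the GPU free for the other services between jobs.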
Opening excerpt:
shinji shimizu, posted on May 22. Originally published at kotonia.ai.
#ai #machinelearning #python #gpu
When integrating LTX-2.3 (a 22B audio-to-video model) into a voice roleplay product, I ran straight into a VRAM wall. The classic dead-end: running it as a persistent server ate 86 GiB, instantly OOM-ing the TTS / Ditto / MuseTalk stack sharing the same GPU.
…
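To make the VRAM wall concrete, the figures quoted in the summary and excerpt can be put side by side. This is a small arithmetic sketch only. It treats the title's 96GB as 96 GiB for simplicity, which is an assumption, and it ignores whatever the TTS / Ditto / MuseTalk stack itself needs, since the excerpt does not give those numbers.

```python
# Figures quoted in the article summary and excerpt.
GPU_TOTAL_GIB = 96.0      # from the title; GB treated as GiB (assumption)
PERSISTENT_GIB = 86.0     # LTX-2.3 as a persistent server
COLD_PEAK_GIB = 40.0      # cold-start peak
COLD_IDLE_GIB = 0.0       # cold-start idle
GEMMA_FULL_GIB = 22.78    # Gemma text encoder before 4-bit loading
GEMMA_4BIT_GIB = 7.26     # Gemma text encoder after 4-bit loading


def headroom(total_gib: float, used_gib: float) -> float:
    """VRAM left over for everything else on the GPU."""
    return total_gib - used_gib


def percent_saved(before_gib: float, after_gib: float) -> float:
    """Relative reduction in footprint, as a percentage."""
    return 100.0 * (before_gib - after_gib) / before_gib


if __name__ == "__main__":
    print(f"persistent headroom: {headroom(GPU_TOTAL_GIB, PERSISTENT_GIB):.2f} GiB")
    print(f"cold-start peak headroom: {headroom(GPU_TOTAL_GIB, COLD_PEAK_GIB):.2f} GiB")
    print(f"cold-start idle headroom: {headroom(GPU_TOTAL_GIB, COLD_IDLE_GIB):.2f} GiB")
    print(f"4-bit saving: {GEMMA_FULL_GIB - GEMMA_4BIT_GIB:.2f} GiB "
          f"({percent_saved(GEMMA_FULL_GIB, GEMMA_4BIT_GIB):.1f}%)")
```

Under these assumptions the persistent server leaves about 10 GiB for the rest of the stack, while the cold-start design leaves about 56 GiB at its peak and the whole card when idle. The 4-bit load trims 15.52 GiB, roughly 68%, off the text encoder.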
The full article is at DEV.to.