Cedana (YC S23) Is Hiring
Cedana is addressing the challenges of AI and HPC infrastructure by enhancing cluster utilization and reliability through automated GPU checkpointing. The company is seeking a Forward Deployed Engineer to lead customer integrations and optimize platform performance. The role requires extensive experience with SLURM deployments and strong Linux fundamentals.
- Cedana maximizes AI+HPC cluster utilization and reliability with automated GPU checkpointing infrastructure.
- The Forward Deployed Engineer will engage with customers to deploy Cedana in various environments, including SLURM and Kubernetes.
- Candidates should have 3-10 years of software engineering experience and a strong understanding of SLURM and Linux systems.
Opening excerpt (first ~120 words)
Introducing Cedana

The Problem
AI and HPC infrastructure suffers from scarcity and high costs, so when failures happen they are costly in terms of time and money. Cluster productivity directly determines research output and revenue. Achieving high utilization and throughput is increasingly challenging due to the complexity of workloads, hardware, and operations.

Cedana’s Solution
Cedana maximizes AI+HPC cluster utilization and reliability with automated GPU checkpointing infrastructure. We enable transparent and fast migration of GPU workloads across instances, without losing work. Workloads automatically migrate to achieve new levels of reliability and throughput while accelerating time to results.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at Y Combinator.