About the Role

ML training clusters are some of the most high performance computers on the planet. Even relatively small clusters would have been in the TOP500 5 years ago. Our supercomputing team is responsible for keeping our compute clusters running smoothly, monitoring hardware health, and fixing things when they go wrong. We believe strongly in automation — code is the only reliable way to manage hardware at scale. As we scale, this will become a more data-driven role, predicting failures before they happen. We’re a small team, so you’ll be spending time talking to customers as well.

About Us

We’re the San Francisco Compute Company. We’re building the first real-time trading platform. We think that over the next decade, thousands of startups and labs are going to be training and serving large models. They need compute to do this, and we’re building a platform on which that compute can be traded. If we’re successful, it will be possible to scale to tens of thousands of accelerators for hours at a time without having to build your own infrastructure. This will greatly increase the number of organizations that can afford to train large models, which will make the most important technology of our lifetime accessible to more people.

About You

Some Nice to Haves

Compensation

US: $170k - $300k + equity

<aside> 🌱 Apply here

</aside>