AI Infrastructure for Everyone, Now Open Source

Lack of software infrastructure is a fundamental bottleneck in achieving AI’s immense potential – a fact not lost on tech giants like Google, Facebook, and Microsoft. These elite firms have invested massive resources and expertise to build proprietary, AI-native internal infrastructure, and are already reaping the benefits in the form of transformative AI-powered applications and productive Deep Learning Engineers. For everyone else who doesn’t have access to this infrastructure, building practical applications powered by AI remains prohibitively expensive, time-consuming, and difficult.

We started Determined AI three years ago to bring AI-native software infrastructure to the broader market. Working closely with cutting-edge deep learning teams across a variety of industries, we heard a consistent message: without better infrastructure, training deep learning models at scale remains extremely difficult as organizations move from research to production. This feedback led us to build the Determined Training Platform, which now powers teams of Deep Learning Engineers and large GPU clusters in industries like pharmaceutical drug discovery, AdTech, Industrial IoT, and autonomous vehicles.

Our innovative infrastructure platform is now ready for widespread adoption, and we’re excited to share it with the DL community! Today we are announcing that we have open sourced our deep learning training platform under the Apache 2.0 license.

Introducing the Determined Training Platform

We designed the Determined Training Platform to empower Deep Learning Engineers to focus on the task at hand — training high-quality models. To achieve this goal, our platform tightly integrates all of the features that a DL engineer needs to train models at scale, including:

[Figure: Determined outperforms existing tools]

  • High-performance distributed training that just works. Determined’s distributed training support builds upon Horovod, a popular distributed training framework, but includes a suite of optimizations that results in twice the performance of stock Horovod. Moreover, Determined’s distributed training support is easy to set up (no code changes are needed to move from single-GPU to distributed training), and allows multiple users to seamlessly share the same GPU cluster. One of our customers saw a 24x speedup in training time by simply turning on this functionality!
  • State-of-the-art hyperparameter search. Determined’s search functionality builds on our cutting-edge research over the past decade [1, 2, 3, 4, 5]. Our hyperparameter search integrates tightly with our job scheduler and is parallel by default, so you can get to more accurate models 100x faster than standard search methods and 10x faster than Bayesian Optimization methods.
  • DL tools for individuals and teams. Determined helps you excel in experiment management with experiment tracking, log management, metrics visualization, reproducibility, and dependency management. These tools boost productivity for individual DL engineers over the lifespan of a project, and are essential for growing teams to collaborate and scale efficiently.
  • Hardware-agnostic and integrated with the Open Source Ecosystem. Determined supports the public cloud and on-prem infrastructure, which means you can avoid getting locked into proprietary solutions. Moreover, Determined works with your DL framework of choice, exports to popular serving frameworks, and more generally integrates with a wide range of data prep and model serving technologies (see figure below).
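
To give a feel for the early-stopping idea behind the hyperparameter search research cited above (Hyperband and related work), here is a minimal sketch of synchronous successive halving in plain Python. It is a toy illustration under stated assumptions, not Determined's implementation or API: `toy_loss`, `successive_halving`, `min_budget`, and `eta` are hypothetical names chosen for this example, and the loss function stands in for actually training a model.

```python
import math
import random


def toy_loss(lr, budget):
    """Hypothetical stand-in for validation loss after `budget` units of training.

    The loss is lowest near lr = 1e-2 and shrinks as the budget grows.
    A real loss would be noisy and would come from training a model.
    """
    return (math.log10(lr) + 2.0) ** 2 + 1.0 / budget


def successive_halving(configs, min_budget=1, eta=3):
    """Evaluate all configs cheaply, keep the best 1/eta, and repeat with
    eta times more budget until a single config remains.

    Returns the surviving config and the total budget spent.
    """
    survivors = list(configs)
    budget = min_budget
    total_cost = 0
    while len(survivors) > 1:
        # Every surviving config is evaluated at the current budget.
        total_cost += len(survivors) * budget
        ranked = sorted(survivors, key=lambda lr: toy_loss(lr, budget))
        # Promote only the top 1/eta of configs to the next rung.
        survivors = ranked[: max(1, len(ranked) // eta)]
        budget *= eta
    return survivors[0], total_cost


if __name__ == "__main__":
    rng = random.Random(0)
    # Sample 27 learning rates log-uniformly between 1e-5 and 1.
    candidates = [10 ** rng.uniform(-5, 0) for _ in range(27)]
    best_lr, cost = successive_halving(candidates)
    print(f"best learning rate: {best_lr:.4g}, budget spent: {cost}")
```

With 27 candidates and `eta=3`, this sketch spends 27 + 27 + 27 = 81 budget units across three rungs, compared with 243 units to evaluate every candidate at the largest rung's budget of 9. The savings come from discarding weak configurations early, which is the intuition the cited papers develop rigorously.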

[Figure: Determined architecture diagram]

Join Our Growing Community and Start Using Determined

The Determined Training Platform already powers hundreds of GPUs at innovative companies. Here is how two of our current customers have made Determined’s platform a core part of their DL efforts:

“With AI at the forefront of Recursion’s vision for biopharmaceuticals, we use Determined to manage 100s of on-premise GPUs, as well as dynamically scale to using GPUs on Google Cloud Platform. Using Determined’s native support for distributed training, we were able to reduce the training time for a key computer vision model from 3 days to 3 hours, without changing our model code.”
—Ben Mabey, Interim CTO, Recursion

“By adopting Determined AI’s software platform, our team of deep learning engineers has been able to rapidly deliver new, advanced, Industrial IoT products to our customers. We’re delivering new AI features 10 times faster than before.”
—Ben Chehebar, Chief Product Officer, Compology

We look forward to building the future of AI-native infrastructure together. We invite you to check us out on GitHub, install the product, and read the documentation. If you have feedback or run into issues, please join us in the Determined Community Slack or post on the mailing list.

  1. A System for Massively Parallel Hyperparameter Tuning. L. Li, K. Jamieson, A. Rostamizadeh, E. Gonina, J. Ben-tzur, M. Hardt, B. Recht, A. Talwalkar. Conference on Machine Learning and Systems (MLSys), 2020. 

  2. Random Search and Reproducibility for Neural Architecture Search. L. Li, A. Talwalkar. Conference on Uncertainty in Artificial Intelligence (UAI), 2019. 

  3. Hyperband: Bandit-Based Configuration Evaluation for Hyperparameter Optimization. L. Li, K. Jamieson, G. DeSalvo, A. Rostamizadeh, A. Talwalkar. International Conference on Learning Representations (ICLR), 2017. 

  4. Non-stochastic Best Arm Identification and Hyperparameter Optimization. K. Jamieson, A. Talwalkar. International Conference on Artificial Intelligence and Statistics (AISTATS), 2016. 

  5. Automating Model Search for Large Scale Machine Learning. E. Sparks, A. Talwalkar, D. Haas, M. Franklin, M. I. Jordan, T. Kraska. Symposium on Cloud Computing (SOCC), 2015.