Addressing the challenges of massively parallel hyperparameter optimization

As most deep learning engineers know, it can take days or weeks to train a deep learning model, costing organizations considerable time and money. But what if we could speed up training and achieve better results in the process?

Earlier this month, Determined AI chief scientist and Carnegie Mellon University (CMU) professor of machine learning Ameet Talwalkar shared insights on the problem of massively parallel hyperparameter optimization with members of the NYC Deep Learning (DL) meetup and introduced a new way to tackle the challenge of model training. Attendees heard an overview of Automated Machine Learning (AutoML); were introduced to two core challenges in this space, namely hyperparameter optimization and neural architecture search (NAS); and learned how NAS is actually a specialized subproblem of hyperparameter optimization.

The presentation included an overview of modern hyperparameter optimization in the context of deep learning, focusing on the core challenges engineers face when working with large numbers of tunable hyperparameters and model training times that can stretch from days to weeks. In order to adequately explore these large search spaces, DL developers must evaluate a large number of configurations. While they naturally would like to leverage parallel and distributed computing infrastructure to speed up the search process, they typically must consider orders of magnitude more configurations than they have parallel workers, which limits the benefits of many traditional parallelization schemes.
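As a rough, back-of-the-envelope illustration of that gap, consider how quickly even a modest grid outgrows the available workers. The hyperparameter names, value counts, and worker count below are hypothetical, chosen only for this sketch:

```python
# A hypothetical grid over five tunable hyperparameters; the names and
# value counts are invented purely to illustrate the scale of the search.
values_per_hyperparameter = {
    "learning_rate": 8,
    "batch_size": 4,
    "weight_decay": 6,
    "dropout": 5,
    "num_layers": 4,
}

num_configs = 1
for count in values_per_hyperparameter.values():
    num_configs *= count

num_workers = 25  # also hypothetical
print(f"{num_configs} configurations for {num_workers} workers "
      f"(~{num_configs // num_workers} configurations per worker)")
# -> 3840 configurations for 25 workers (~153 configurations per worker)
```

With each configuration potentially taking hours or days to train, evaluating every candidate to completion is simply not an option, which is what motivates early-stopping approaches.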

Ameet and team aimed to tackle this challenge by introducing ASHA, a simple and robust hyperparameter tuning algorithm, grounded in solid theory, that exploits asynchronous parallelism and aggressive early stopping. Their extensive empirical results show that ASHA outperforms state-of-the-art hyperparameter tuning methods such as Population Based Training (PBT) and Bayesian Optimization and Hyperband (BOHB); scales linearly with the number of workers in distributed settings; converges to a high-quality configuration in half the time taken by Vizier (Google’s internal hyperparameter tuning service) in an experiment with 500 workers; and competes favorably with specialized NAS methods on standard benchmarks.
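To give a flavor of the idea (this is not the authors' implementation), here is a minimal, single-process sketch of an ASHA-style promotion rule: whenever a worker is free, it either promotes a configuration that ranks in the top 1/eta of its rung or starts a new configuration at the lowest rung, so no worker ever waits for a rung to fill up. The search space, training function, and constants are made up for illustration:

```python
import random

ETA = 4        # reduction factor: keep the top 1/ETA of each rung
R_MIN = 1      # minimum resource (e.g., epochs) per configuration
MAX_RUNG = 3   # rung k trains for R_MIN * ETA**k epochs

# rungs[k] maps config -> observed validation loss at rung k
rungs = [dict() for _ in range(MAX_RUNG + 1)]
promoted = [set() for _ in range(MAX_RUNG + 1)]

def sample_config():
    # Hypothetical 2-D search space: (learning rate, dropout).
    return (10 ** random.uniform(-4, -1), random.uniform(0.0, 0.5))

def get_job():
    """Pick the next job for an idle worker, preferring promotions from the top rung down."""
    for k in range(MAX_RUNG - 1, -1, -1):
        finished = sorted(rungs[k], key=rungs[k].get)   # best (lowest loss) first
        num_promotable = len(finished) // ETA            # top 1/ETA of the rung
        for cfg in finished[:num_promotable]:
            if cfg not in promoted[k]:
                promoted[k].add(cfg)
                return cfg, k + 1                        # promote to the next rung
    return sample_config(), 0                            # otherwise grow the bottom rung

def train(cfg, rung):
    """Stand-in for model training; returns a made-up validation loss."""
    lr, dropout = cfg
    epochs = R_MIN * ETA ** rung
    return (lr - 0.01) ** 2 + (dropout - 0.2) ** 2 + 1.0 / epochs + random.gauss(0, 0.01)

# Sequential simulation of the jobs a pool of asynchronous workers would run.
for _ in range(200):
    cfg, rung = get_job()
    rungs[rung][cfg] = train(cfg, rung)

top = max(k for k in range(MAX_RUNG + 1) if rungs[k])
best = min(rungs[top], key=rungs[top].get)
print(f"best config at rung {top}: lr={best[0]:.4g}, dropout={best[1]:.3f}")
```

In a real deployment, each call to get_job() would be issued by a separate worker as soon as it becomes idle, which is what removes the synchronization bottleneck of waiting for an entire rung to finish before promoting its best configurations.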

In fact, Determined AI’s proprietary hyperparameter optimization method improves upon the research-grade ASHA approach and is seamlessly integrated into our end-to-end deep learning platform.

View the complete presentation to learn more.

If you’d like to explore the topic of hyperparameter optimization in more depth, we recommend these resources:

Massively parallel hyperparameter optimization

Neural architecture search

Toward the jet age of machine learning

What hyperparameter optimization challenges have you encountered and how did you solve them? Start a conversation with us.