Imagine a world in which gradient descent or second-order methods have not yet been invented, and the only way to train machine learning models is to tune their weights by hand.
We are entering the golden age of artificial intelligence.
In a previous post, “What’s the deal with Neural Architecture Search?”, Liam Li and I discussed Neural Architecture Search (NAS) as a promising research direction with the potential to replace expert-designed networks with learned, task-specific architectures.
As most deep learning engineers know, it can take days or weeks to train a deep learning model, costing organizations considerable time and money. But what if we could speed up training and achieve better results at the same time?
Deep learning offers the promise of bypassing the process of manual feature engineering by learning representations in conjunction with statistical models in an end-to-end fashion.
In this post, we discuss how warm-starting can save computational resources and improve generalizability when training deep learning models.
Last week, we at Determined AI were honored to sponsor a meetup of the Women in Infrastructure group focused on ML infrastructure.
In this post, we discuss the missing key to fully leveraging your hardware investment: specialized software that understands the unique properties of deep learning workloads.
To maximize the value of your deep learning hardware, you’ll need to invest in software infrastructure. Setting up a cluster manager is an essential first step in this process, but it’s not the end of the story.
Reproducible machine learning is hard, particularly when training deep learning models. We review common sources of DL non-determinism and how to address them.