Build Your Own ML Workflow – With Expert Advice

The Determined team has helped countless teams bring their ML projects into production - from emerging startups to enterprise teams deploying on thousands of GPUs.

Get Expert ML Help

Have an ML project in mind but not sure where to begin? Schedule time with one of our ML Engineers to get a full architecture review today.

Build a seamless ML workflow in minutes

Schedule 1-on-1 ML Project Planning with Determined AI.

The components of an ML workflow:

  1. Data Collection: The first step in any machine learning workflow is accessing a dataset and prepping it for training. There are many publicly accessible datasets available to use, depending on your use case. We recommend using a platform like HuggingFace to find a dataset that fits your needs.
  2. Data Preparation: Once you’ve identified which data you’d like to train with, you’ll need to extract, transform, and load (ETL) your data to prepare for training, which means converting data to a suitable format to be consumed by ML algorithms.
  3. Choosing a Model: Choose the appropriate machine learning algorithm or model architecture for your task. Consider factors like the type of data, problem type (classification, regression, clustering, etc.), and performance requirements.
  4. Task Orchestration: Feature engineering, training, and prediction all need to be scheduled on compute infrastructure – either on a cloud provider like GCP, AWS, or Azure, or on GPU-optimized on-prem hardware.
    1. Sometimes this includes data pipelines, for which there is a wide range of open source tooling that automates this process for users (e.g. Pachyderm).
  5. Model Training: Machine learning model training is the process of applying a learning algorithm to a dataset so that the model learns patterns it can use to make predictions on new data. There are several sub-steps to consider within training:
    1. Hyperparameter Search, sometimes called Hyperparameter Tuning or Optimization: The process of choosing the right “hyperparameters”, or tunable parameters involved in the model training process, such as the model architecture, learning algorithm, batch sizes, learning rate, number of layers, etc., to yield the most accurate and efficient model.
    2. Experiment Tracking: Keeping track of metrics associated with model training, such as model losses during training, is important for accuracy and reproducibility.
    3. Resource Management: Model training is almost always computationally expensive. GPU resources aren’t cheap, so being able to monitor the efficiency of cloud or on-prem hardware resources is important.
    4. Distributed Training: For large-scale machine learning jobs, the ability to distribute model training across a cluster of GPUs – a technique called distributed training – can speed up training time while using fewer resources.
  6. Evaluation: While the model is training, it’s important to periodically monitor how it is performing on held-out data, to prepare for a real-world production use case.

    Note: Steps 1-6 are constantly repeated as a part of the ML lifecycle. Depending on the specific use case, there are tools to help users automate each of these steps.

  7. Model Testing: Evaluate the final model on the test dataset to estimate its real-world performance. Avoid making further changes to the model based on test set results to prevent overfitting.
  8. Prediction: We use the trained model to perform new tasks and solve new problems – which means making a prediction on new data.
  9. Interaction: Users need a way to interact with our model, too. Usually this takes the form of an API, a user interface, or a command-line interface (CLI).
  10. Infrastructure: Machine learning infrastructure includes the resources, processes, and tooling needed to develop, train, and operate machine learning models. Our underlying infrastructure needs to be able to support each stage of our ML workflow, so it’s important to consider this step early on. Sometimes called “MLOps”, there are numerous open source options to adopt here.
    1. For example, Determined is software that supports the model training category mentioned above, but needs hardware and software infrastructure surrounding it to optimize training.
  11. Model Deployment: If our model meets the desired performance criteria, we’re ready to deploy it in a production environment.
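To make the workflow above concrete, here is a minimal sketch of the core loop – data collection, preparation, model choice, training, evaluation, and prediction – using scikit-learn’s bundled Iris dataset. This is an illustration of the steps, not a production setup; the dataset and model are placeholders for your own.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Steps 1-2: data collection and preparation (a tiny ETL: load, split, scale)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Steps 3-5: choose a model appropriate for the task and train it
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# Steps 6-7: evaluate on held-out data
accuracy = accuracy_score(y_test, model.predict(X_test))

# Step 8: prediction on new, unseen data
new_sample = [[5.1, 3.5, 1.4, 0.2]]
prediction = model.predict(scaler.transform(new_sample))
```

Each of these lines stands in for a stage that, at scale, becomes its own system: the split-and-scale step becomes a data pipeline, and the `fit` call becomes a distributed training job.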
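Hyperparameter search, mentioned under Model Training, can be sketched in a few lines as a cross-validated grid search, again assuming scikit-learn. Dedicated tools (including Determined’s adaptive search) apply the same idea with smarter search strategies across a GPU cluster.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# A tunable "hyperparameter": the regularization strength C
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}

# Try every candidate value with 5-fold cross-validation
search = GridSearchCV(LogisticRegression(max_iter=500), param_grid, cv=5)
search.fit(X, y)

best_params = search.best_params_   # the winning hyperparameter values
best_score = search.best_score_     # mean cross-validated accuracy
```

Grid search is the simplest strategy; for larger search spaces, random or adaptive search typically finds good configurations with far fewer trials.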
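Experiment tracking, also mentioned under Model Training, amounts to recording hyperparameters and per-step metrics so runs are reproducible and comparable. The toy `ExperimentTracker` below is a hypothetical illustration using only the Python standard library; real trackers (such as the tracking built into Determined, or MLflow) persist the same information to a server.

```python
import json
import time

class ExperimentTracker:
    """Toy tracker: stores a run's hyperparameters and metrics in memory."""

    def __init__(self, run_name, hyperparameters):
        self.run = {
            "name": run_name,
            "hyperparameters": hyperparameters,
            "started_at": time.time(),
            "metrics": [],
        }

    def log(self, step, **metrics):
        # Record metrics (e.g. loss, accuracy) for a given training step
        self.run["metrics"].append({"step": step, **metrics})

    def save(self, path):
        # Persist the run as JSON so it can be compared to other runs
        with open(path, "w") as f:
            json.dump(self.run, f, indent=2)

tracker = ExperimentTracker("baseline", {"lr": 0.01, "batch_size": 32})
for step in range(3):
    tracker.log(step, loss=1.0 / (step + 1))  # stand-in for real losses
```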

Our Team

The Determined Team is composed of machine learning, distributed systems, and open source software experts. As a part of HPE’s AI Solutions team, we’re at the forefront of at-scale machine learning implementations across a wide range of use cases and industries – from autonomous vehicles and rare drug discovery to multimodal Natural Language Processing and advanced Generative AI.
