New video demo: LLM Batch Inference with Determined

In this talk from ML-at-Scale 2023, Corey shows how to use Determined’s Core API and Hugging Face Transformers to build and optimize batch inference workflows. He also discusses advanced parallelization techniques and shows how to implement them with Determined’s DeepSpeed integration. Warning: This video is code-heavy!

