Software Engineer, Applied ML

About this role

As an Applied Machine Learning Engineer, you will work closely with our technical customers (ML developers) on a wide variety of deep learning use cases across industries such as advertising, financial services, biotechnology, and transportation. You will have the opportunity to own important customer relationships and to work with our customers’ technical experts to build production-quality machine learning applications powered by Determined AI’s software. You will meet regularly with these customers to understand their pain points and gather product feedback, and you will work with our engineering team to design and develop new product features inspired by customer feedback and/or cutting-edge deep learning research.

Requirements

  • Strong problem solving and analytical skills.
  • Excellent communication and presentation skills, both written and verbal.
  • 2+ years of experience designing, implementing and shipping reliable production-quality software in an industrial setting.
  • 1+ years of experience building machine learning applications for computer vision, natural language processing, text understanding, pattern recognition, recommendation systems, ranking systems, or similar.
  • Proficiency with Python and one or more of the leading deep learning software packages: TensorFlow, Keras, PyTorch, or MXNet.

Preferred

  • Ph.D. in Machine Learning / Math / Computer Science or equivalent deep theoretical knowledge of machine learning algorithms.
  • Familiarity with using and/or debugging open-source libraries used in enterprise infrastructure solutions such as Docker, Kubernetes, Mesos, Hadoop / HDFS, and Apache Spark.
  • Experience in sales engineering or customer-facing roles in the enterprise software industry.

Teams & Process

We are building a team of world-class engineers. Join us! We have one product and one team, where everyone is a worker-leader. We combine input from customers, engineers, and company leadership to prioritize our work, and we work hard to make decisions transparent. We believe in tight feedback loops with customers and in minimum valuable products.

We believe in just enough (but not too much) process; currently we run Scrum with two-week sprints. We use GitHub to manage our work; we require code review, lint, and tests to pass for all our PRs. We run an extensive continuous integration pipeline to test our GPU features. We use Slack and G Suite, and we have provisioned a video conferencing system for our remote workers.

Technical Challenges

We have implemented, from scratch, a distributed, fault-tolerant GPU cluster manager and scheduler, purpose-built for DL and ML workloads. We have invented, published, and implemented state-of-the-art hyperparameter optimization algorithms in our platform. We have numerous other research ideas ready to turn into product features that will differentiate us from our competitors.

Technical Stack

  • Go
  • Python
  • Docker
  • TensorFlow
  • PyTorch
  • Keras
  • Elm
  • Kubernetes
  • Mesos
  • PostgreSQL