October 05, 2020
BERT has quickly become one of the most influential ML models in the world, leveraging the Transformer architecture to achieve state-of-the-art results on a range of natural language processing (NLP) tasks. Let’s check out how you can fine-tune BERT for your NLP task using Determined.
Thanks to the Hugging Face transformers library, it’s easy to get started with BERT in Determined. The code for this example can be found here [1]. For this example, we’ll be training a question-answering model on the Stanford Question Answering Dataset (SQuAD).
You can train a model locally. To get started, clone the code above and run:
pip install -r requirements.txt
python train_local.py
This script will download SQuAD locally, download a pretrained BERT model, and begin fine-tuning on the SQuAD dataset. You’ll probably want a GPU, because this could take a while!
Behind the scenes, we’ve implemented BERT using Determined’s PyTorch Trial interface. By organizing the model this way, we can use Determined to track our experiments, scale to distributed training, and do hyperparameter tuning. To get started, you’ll need to install Determined and configure the Determined CLI.
Once you have Determined running, you can train BERT and track the progress of training with:
det experiment create const.yaml .
When training has completed (about 2 epochs), your model should obtain a validation F1 score of ~88.
Note that the experiment we want to run is defined in const.yaml:
description: Bert_SQuAD_PyTorch
hyperparameters:
  global_batch_size: 12
  learning_rate: 3e-5
  lr_scheduler_epoch_freq: 1
  adam_epsilon: 1e-8
  weight_decay: 0
  num_warmup_steps: 0
  max_seq_length: 384
  doc_stride: 128
  max_query_length: 64
  n_best_size: 20
  max_answer_length: 30
  null_score_diff_threshold: 0.0
  max_grad_norm: 1.0
  num_training_steps: 15000
searcher:
  name: single
  metric: f1
  max_length:
    records: 180000
  smaller_is_better: false
min_validation_period:
  records: 12000
data:
  pretrained_model_name: "bert-base-uncased"
  download_data: False
  task: "SQuAD1.1"
entrypoint: model_def:BertSQuADPyTorch
If you want to modify the experiment, for example to change hyperparameters or the duration of training, you can do so by editing this file. For more information about how to configure experiments, check out the Determined experiment configuration documentation.
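If you do change the batch size or the training duration, it helps to see how the record-based settings relate to one another. Here is a minimal sketch of that arithmetic. It assumes one batch consumes global_batch_size records, and that the SQuAD 1.1 training set holds 87,599 question-answer pairs (a figure not stated in this post; the number of tokenized features after doc_stride windowing will be somewhat higher). The function name is illustrative only.

```python
# Values copied from the experiment configuration above.
GLOBAL_BATCH_SIZE = 12
MAX_LENGTH_RECORDS = 180_000
MIN_VALIDATION_PERIOD_RECORDS = 12_000

# Assumption: size of the SQuAD 1.1 training set in question-answer pairs.
SQUAD_TRAIN_EXAMPLES = 87_599


def training_schedule(max_records, validation_records, batch_size, dataset_size):
    """Convert record-based lengths into batches, validations, and epochs."""
    return {
        # Total optimizer steps if every batch holds batch_size records.
        "training_batches": max_records // batch_size,
        # Batches trained between two consecutive validations.
        "batches_between_validations": validation_records // batch_size,
        # Number of validations triggered by the validation period alone.
        "validations": max_records // validation_records,
        # Rough number of passes over the (assumed) dataset size.
        "approx_epochs": max_records / dataset_size,
    }


schedule = training_schedule(
    MAX_LENGTH_RECORDS,
    MIN_VALIDATION_PERIOD_RECORDS,
    GLOBAL_BATCH_SIZE,
    SQUAD_TRAIN_EXAMPLES,
)
for name, value in schedule.items():
    print(f"{name}: {value}")
```

Under these assumptions, 180,000 records at a batch size of 12 works out to 15,000 batches, which matches num_training_steps in the config, and to roughly two passes over the data. If you change global_batch_size or max_length, you will likely want to keep num_training_steps in step with them.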
If you have a multi-GPU Determined cluster, you can run distributed training by adding a few lines to your config (captured in distributed.yaml):
resources:
  slots_per_trial: 8
You can then submit this configuration to Determined with:
det experiment create distributed.yaml .
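Before scaling out, it is worth checking how the global batch size divides across GPUs. The sketch below assumes that Determined gives each slot an equal share of global_batch_size and rounds down when the division is not exact; please verify this against the Determined documentation for your version. The helper name is hypothetical, and the batch sizes 16 and 96 are illustrative values rather than settings taken from the example repository.

```python
SLOTS_PER_TRIAL = 8  # from the resources section shown above


def per_slot_batch_size(global_batch_size: int, slots_per_trial: int) -> int:
    """Assumed split: each slot gets an equal share, rounded down."""
    if slots_per_trial < 1:
        raise ValueError("slots_per_trial must be at least 1")
    return global_batch_size // slots_per_trial


# 12 is the single-GPU value from const.yaml; 16 and 96 are illustrative.
for global_batch_size in (12, 16, 96):
    share = per_slot_batch_size(global_batch_size, SLOTS_PER_TRIAL)
    effective = share * SLOTS_PER_TRIAL
    print(f"global={global_batch_size} per_slot={share} effective={effective}")
```

If that assumption holds, leaving global_batch_size at 12 with 8 slots would give each GPU a single example per step, so a batch size that is a multiple of the slot count is probably a better fit for distributed runs.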
To learn more about training BERT at large scales, check out this blog post.
This BERT model is also set up for hyperparameter tuning. To define a hyperparameter tuning experiment, all you need to do is modify the configuration file to specify the search algorithm you want to use (we recommend ASHA) and the hyperparameter search space:
description: Bert_SQuAD_PyTorch_hp_search
hyperparameters:
  global_batch_size: 12
  learning_rate:
    base: 10
    maxval: -4
    minval: -6
    type: log
  lr_scheduler_epoch_freq: 1
  model_type: 'bert'
  adam_epsilon:
    base: 10
    maxval: -6
    minval: -10
    type: log
  weight_decay: 0
  num_warmup_steps: 0
  max_seq_length: 384
  doc_stride: 128
  max_query_length: 64
  n_best_size: 20
  max_answer_length: 30
  null_score_diff_threshold: 0.0
  max_grad_norm: 1.0
  num_training_steps: 15000
searcher:
  name: adaptive_asha
  metric: f1
  max_length:
    records: 180000
  max_trials: 48
  smaller_is_better: false
data:
  pretrained_model_name: "bert-base-uncased"
  download_data: False
  task: "SQuAD1.1"
entrypoint: model_def:BertSQuADPyTorch
Then launch your search with:
det experiment create search.yaml .
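To get a feel for what the log-type entries in the search space mean, here is a small sketch of log-uniform sampling. It assumes a log hyperparameter is drawn as base raised to an exponent chosen uniformly between minval and maxval. This only illustrates the search space; it is not Determined’s implementation, and it does not model how ASHA allocates training budget across trials. The function name sample_log is hypothetical.

```python
import random


def sample_log(minval: float, maxval: float, base: float = 10.0, rng=random) -> float:
    """Draw base**e with the exponent e uniform in [minval, maxval]."""
    exponent = rng.uniform(minval, maxval)
    return base ** exponent


# Seeded generator so the sketch is repeatable.
rng = random.Random(0)

# Ranges copied from the search configuration above.
learning_rate = sample_log(-6, -4, rng=rng)   # between 1e-6 and 1e-4
adam_epsilon = sample_log(-10, -6, rng=rng)   # between 1e-10 and 1e-6

print(f"learning_rate={learning_rate:.3e}")
print(f"adam_epsilon={adam_epsilon:.3e}")
```

Sampling the exponent rather than the value itself spreads trials evenly across orders of magnitude, which is usually what you want for learning rates and optimizer epsilons.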
[1] August 2023 update: The original BERT example has been replaced with an ALBERT example.