
12/09/2019

Train and Deploy the Mighty BERT based NLP models using FastBert and Amazon SageMaker

FastBert — The story so far…

In my earlier introduction to FastBert, I described it as a library that lets developers and data scientists train and deploy BERT-based models for NLP tasks, beginning with text classification. The scope of BERT-based (read: Transformer-based) models has widened a bit since I wrote that blog and now includes BERT, XLNet, RoBERTa, DistilBERT and a few more.

I am happy to report that, with lots of support from Hugging Face, FastBert now supports all of the model architectures mentioned above, and with a couple of changes to the input parameters you can try each of them on your custom datasets. Given the current pace of research into Transformer-based models, I expect the list of architectures to grow rapidly over the coming days, weeks and months, and I hope to support all or most of them.

BERT meets Amazon SageMaker

One of the key necessities in training BERT-based models is access to GPUs; the more, the better. I have personally been fortunate to have access to multiple GPUs for experimenting with different Transformer architectures and parameters, but I am sure this is one of the major issues for the research and developer community. A single-GPU AWS p3.2xlarge EC2 instance costs about $80 a day, and a multi-GPU p3.8xlarge instance will set you back about $320 a day. You have to be incredibly disciplined about switching off the virtual machines when they are not in use to avoid a shock bill. Another issue with the virtual-machine approach is that you have limited scope for testing different hyper-parameters or BERT architectures in parallel, as you are constrained by the number of GPUs available in each virtual machine.

What about Inference?

Training is only half of the job. Once the model is trained to your satisfaction, you want a simple way to deploy it in a highly scalable, available and secure environment behind a REST API endpoint. Developers and data scientists will agree that this step is generally ignored by most academic researchers; in industry, however, it is the step that counts the most.

Amazon SageMaker

In Amazon’s own words:

Amazon SageMaker provides every developer and data scientist with the ability to build, train, and deploy machine learning models quickly. Amazon SageMaker is a fully-managed service that covers the entire machine learning workflow to label and prepare your data, choose an algorithm, train the model, tune and optimize it for deployment, make predictions, and take action. Your models get to production faster with much less effort and lower cost.

and I must say that I tend to agree for the most part.

FastBert includes support for training BERT models on Amazon SageMaker. With FastBert on SageMaker, you only pay for the time (in seconds) your experiment actually spends executing the training loop. Once the training epochs are complete, the training resources are automatically released and your trained model artefacts are securely stored in an S3 bucket in your AWS account, ready to be deployed as a RESTful endpoint.

In this blog, I will describe how to train and deploy BERT based models using FastBert on Amazon SageMaker.

The AWS components used here are:

Elastic Container Registry (ECR) Image

To use FastBert with SageMaker, we have to package the library, the training code and the pretrained weights as a Docker image stored in Amazon Elastic Container Registry (ECR). We will use the same image to hold both the training and inference code for FastBert.

S3 Bucket

The S3 bucket holds the training and validation data and other config files. The data in the bucket can be encrypted using AWS KMS.

The S3 bucket also holds the output of the training job: the trained model artefacts, log files and TensorBoard output.

SageMaker Training Job

To train a model in SageMaker, you will need to create a training job. The training job includes the following:

  1. Reference to S3 bucket training location (input bucket)
  2. Reference to the S3 bucket to store trained model artefacts (output bucket)
  3. Reference to the AWS ECR image that holds our FastBert library and training code

The training job is also given the ML compute resources to use, i.e. the type of instance used for training (p3.2xlarge, p3.8xlarge, etc.). The compute resources are managed by SageMaker. The training job also receives the defined model hyper-parameters.

After you create the training job, Amazon SageMaker launches the ML compute instances and uses the training code and the training dataset to train the model. It saves the resulting model artifacts and other output in the S3 bucket you specified for that purpose.

SageMaker Endpoint

SageMaker provides a model hosting service that deploys the trained model behind an HTTPS endpoint for inference. The SageMaker training job produces a trained model that allows us to create a so-called SageMaker model. By creating a model, you tell Amazon SageMaker where it can find the model components: the S3 path where the model artifacts are stored and the Docker registry path for the image that contains the inference code.

When hosting models in production, you can configure the endpoint to elastically scale the deployed ML compute instances. For each production variant, you specify the number of ML compute instances that you want to deploy. When you specify two or more instances, Amazon SageMaker launches them in multiple Availability Zones. This ensures continuous availability. Amazon SageMaker manages deploying the instances.

How does this work?

Prerequisites

  1. Install Docker on your computer.
  2. Create an AWS Account.
  3. Install and configure AWS CLI on your computer.

Create the FastBert ECR image

To use BERT-based transformer model architectures with fast-bert, we need to provide the custom algorithm code to SageMaker. This is done in the form of a Docker image stored in Amazon Elastic Container Registry (ECR). The image is created using the Dockerfile contained in the fast-bert repository.

  1. Clone the fast-bert repository on your local machine using git clone https://github.com/kaushaltrivedi/fast-bert.git
  2. Navigate to the container folder of the fast-bert repository.
  3. Run the script build_and_push.sh. On successful execution of the script, you will have a Docker image named sagemaker-bert in your AWS account. The script also prepackages some of the most commonly used pre-trained weights in the Docker image, which is particularly useful if you decide to run SageMaker training jobs in network isolation mode or within a VPC without an internet gateway. Feel free to update this script for your own purposes.

This Docker image can be used to train and deploy any number of models supported by the fast-bert library. At this point you could use the AWS Console to create a training job, but I have created a “helper” Jupyter notebook for uploading data and config files to the S3 bucket, creating a training job, and then deploying the model as a SageMaker endpoint.

Note that this SageMaker notebook doesn’t need any GPUs. It can also be executed on your local machine or a low-cost virtual machine; the training and inference are delegated to the managed Amazon SageMaker instances.

SageMaker Helper Notebook

Import the necessary libraries.
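The exact imports depend on your setup, but a minimal sketch along these lines (using the boto3 and sagemaker SDKs) is what the helper notebook assumes:

import json
import boto3
import sagemaker
from sagemaker.estimator import Estimator

session = sagemaker.Session()          # SageMaker session bound to your AWS credentials
role = sagemaker.get_execution_role()  # IAM role for the training job; pass a role ARN instead if running locally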

Set up the paths for your local data locations. The data and label files must already be stored in the DATA_PATH location. We will create the training_config.json file shortly.
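A minimal sketch of the path setup; the directory names are illustrative and should match wherever you keep your data and config files:

from pathlib import Path

DATA_PATH = Path('./data/')       # train.csv, val.csv and labels.csv live here
CONFIG_PATH = Path('./config/')   # training_config.json will be written here
CONFIG_PATH.mkdir(exist_ok=True)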

Hyper-parameters and Training configuration

I have split the parameters required by SageMaker into hyper-parameters and general configuration parameters. Hyper-parameters are passed directly to the SageMaker training job and can be tuned to optimise the model.

The general parameters that cannot be tuned by SageMaker are stored in training_config.json and provided to SageMaker through the S3 bucket.

These are the parameters used by either the databunch or the learner objects. This particular example is for the multi-label scenario, hence the label_col list is serialised as a string; I hope to improve this in the future. In the case of multi-class text classification, label_col will just be the name of the label column.

Notice that we also save the training_config object to a file at the CONFIG_PATH location.
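As an illustration (the exact keys depend on the fast-bert version packaged into your Docker image, so treat these as assumptions rather than a definitive schema), the split might look like this:

# Hyper-parameters passed straight to the SageMaker training job (tunable).
hyperparameters = {
    "epochs": 4,
    "lr": 8e-5,
    "max_seq_length": 512,
    "train_batch_size": 16,
    "lr_schedule": "warmup_cosine",
}

# General parameters consumed by the databunch/learner objects, shipped via S3.
training_config = {
    "run_text": "multi-label classification example",   # illustrative run name
    "train_file": "train.csv",
    "val_file": "val.csv",
    "label_file": "labels.csv",
    "text_col": "text",
    # multi-label scenario: the list of label columns serialised as a string
    "label_col": '["label_a", "label_b", "label_c"]',
    "multi_label": "True",
    "do_lower_case": "True",
    "fp16": "True",
}

with open(CONFIG_PATH / "training_config.json", "w") as f:
    json.dump(training_config, f)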

Upload data and config to S3 bucket
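A sketch of the upload step using the SageMaker session object created above; the bucket and key prefix names are placeholders:

bucket = session.default_bucket()      # or the name of your own S3 bucket
prefix = 'fast-bert/example'           # illustrative key prefix

s3_data = session.upload_data(str(DATA_PATH), bucket=bucket, key_prefix=f'{prefix}/data')
s3_config = session.upload_data(str(CONFIG_PATH), bucket=bucket, key_prefix=f'{prefix}/config')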

Create an estimator object and start training

At this point SageMaker will create the training instance using the Docker image that you have provided. It will then download the data and config files from S3 bucket to the SageMaker instance and start the training job.

The fit function calls the Amazon SageMaker CreateTrainingJob API to start model training. It uses the configuration you provided to create the estimator and the specified input training data to send the CreateTrainingJob request to Amazon SageMaker.
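A hedged sketch of the estimator: the image name must match what build_and_push.sh pushed to your ECR, the instance type and channel names ('training', 'config') are assumptions about how the container reads its inputs, and older versions of the sagemaker SDK use image_name / train_instance_count / train_instance_type instead of the argument names below.

account = boto3.client('sts').get_caller_identity()['Account']
region = session.boto_region_name
image = f'{account}.dkr.ecr.{region}.amazonaws.com/sagemaker-bert:latest'

estimator = Estimator(
    image_uri=image,
    role=role,
    instance_count=1,
    instance_type='ml.p3.8xlarge',                 # multi-GPU training instance
    output_path=f's3://{bucket}/{prefix}/output',  # trained model artefacts land here
    hyperparameters=hyperparameters,
)

estimator.fit({'training': s3_data, 'config': s3_config})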

You should see logs similar to the following, which keep you informed of the status of the training job. The logs are displayed in the notebook and are also available in AWS CloudWatch Logs for future reference.

2019-08-27 10:15:06 Starting - Starting the training job...
2019-08-27 10:15:08 Starting - Launching requested ML instances......
2019-08-27 10:16:08 Starting - Preparing the instances for training...
2019-08-27 10:17:05 Downloading - Downloading input data...
2019-08-27 10:17:11 Training - Downloading the training image............
2019-08-27 10:19:19 Training - Training image download completed. Training in progress.

You can also see the training job details in the AWS console.

Once the training job is complete, the trained model and all the accompanying files such as config file, tokenizer vocabulary and labels.csv are zipped and stored in the S3 bucket specified in the estimator object’s output_path parameter.

You can call the deploy() method to host the model using the Amazon SageMaker hosting services.
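Deployment is a single call on the estimator object; the instance type below is an assumption (a CPU instance is usually sufficient for inference):

predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type='ml.m5.large',   # general-compute instance, no GPUs needed for inference
)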

Voilà! You now have an active model endpoint that you can invoke to get real-time inference. You can use the AWS SDK for any of the major supported platforms and call the InvokeEndpoint API to get the inference.

As you can see from the example above, we have used different types of instances for training and hosting. For training we use an instance with multiple GPUs; for hosting the model to get inference, you can use a cheaper instance such as m5.large, which is optimised for general compute and does not contain any expensive GPUs.
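Once the endpoint is live, any AWS SDK can call it. A minimal boto3 sketch follows; the request and response formats are assumptions and depend on how the inference code in your container parses the payload:

runtime = boto3.client('sagemaker-runtime')

response = runtime.invoke_endpoint(
    EndpointName=predictor.endpoint_name,   # or the endpoint name shown in the SageMaker console
    ContentType='application/json',
    Body=json.dumps({'text': 'this is a sample request'}),
)
print(response['Body'].read().decode('utf-8'))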

The complete notebook is available in the fast-bert GitHub repo; see the Jupyter notebook on nbviewer (nbviewer.jupyter.org).


Conclusion and next steps

Hopefully this story will help you leverage the power of Amazon SageMaker to train and deploy BERT based models on your own data using the fast-bert library.

Amazon SageMaker abstracts away the complexity of maintaining secure and expensive GPU-powered virtual machines for the training phase and also simplifies the process of deploying the model to production.

You will be able to customise most of the fast-bert parameters through hyper-parameters and the training config file, and at the same time build sophisticated training and hosting production workflows.

Some of the next steps would be to use additional SageMaker features such as hyper-parameter tuning, elastic inference, batch inference and more.

I would love to hear your suggestions on further improvements and also welcome your code contribution to the fast-bert github repo.

References

  • The original BERT paper.
  • The fast-bert library.
  • PyTorch implementation of BERT by Hugging Face
  • The highly recommended course.fast.ai. I have learned a lot about deep learning and transfer learning for natural language processing by following fast.ai.

17/05/2019

Introducing FastBert — A simple Deep Learning library for BERT Models

BERT What?

The little Sesame Street muppet has taken the world of Natural Language Processing by storm, and the storm is picking up speed. We have seen a number of NLP problems solved by neural network architectures built on top of the contextual representations from BERT. To name a few, BERT-based models have pushed the state of the art for SQuAD 2.0 question answering, GLUE multi-task learning, the Google Natural Questions task and biomedical domain-specific tasks (BioBERT).

Google Research open-sourced the TensorFlow implementation of BERT along with the pretrained weights. This opened the door for the amazing developers at Hugging Face, who built the PyTorch port of BERT. With this library, geniuses (i.e. developers and data scientists) can use BERT models for text classification, question answering, language model fine-tuning and more. Yours truly has contributed to the text classification capability by adding support for multi-label text classification.

Enter FastBert

FastBert is a deep learning library that allows developers and data scientists to train and deploy BERT-based models for natural language processing tasks, beginning with text classification. The work on FastBert is inspired by fast.ai and strives to make cutting-edge deep learning accessible to the vast community of machine learning practitioners.

With FastBert, you will be able to:

  1. Train (more precisely fine-tune) BERT text classification models on your custom dataset
  2. Tune model hyper-parameters such as epochs, learning rate, batch size, optimiser schedule and more
  3. Save and deploy trained models for inference (including on AWS SageMaker)

Starting today, FastBert will support both multi-class and multi-label text classification and in due course, it will support other NLU tasks such as Named Entity Recognition, Question Answering and Custom Corpus fine-tuning. I rely on the community to help make this happen 🙂

Installation

pip install fast-bert

From Source: pip install git+https://github.com/kaushaltrivedi/fast-bert.git


Usage

Import the required packages. Please note that I have not included the usual suspects such as os, pandas, etc.
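A hedged sketch of the imports; the module paths have shifted across fast-bert versions (recent releases expose the classification classes under data_cls and learner_cls, older ones under data and learner):

import logging
import torch

from fast_bert.data_cls import BertDataBunch
from fast_bert.learner_cls import BertLearner
from fast_bert.metrics import accuracy

logger = logging.getLogger()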

Define general parameters and path locations for data, labels and pretrained models (some good engineering practice).
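The path layout is up to you; something like the following, with illustrative directory names:

from pathlib import Path

DATA_PATH = Path('./data/')      # train.csv, val.csv and optionally test.csv
LABEL_PATH = Path('./labels/')   # labels.csv listing the label names
OUTPUT_DIR = Path('./output/')   # trained model artefacts will be written here
BERT_PRETRAINED_MODEL = 'bert-base-uncased'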

Tokenizer

Create a tokenizer object. This is the BPE-based WordPiece tokenizer, available from the magnificent Hugging Face BERT PyTorch library.

The do_lower_case parameter depends on the version of the BERT pretrained model you are using. If you use an uncased model, set this value to true; otherwise set it to false. For this example we have used the BERT base uncased model, hence do_lower_case is set to true.
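A sketch of the tokenizer setup, assuming the bert-base-uncased weights (in newer installs the same class is available from the Hugging Face transformers package):

from pytorch_pretrained_bert import BertTokenizer

# do_lower_case must match the pretrained model: True for uncased, False for cased.
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', do_lower_case=True)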

GPU & Device

Training a BERT model requires at least one GPU, and preferably several. In this step we set up the GPU parameters for our training.

Note that in future releases this step will be abstracted away from the user and the library will automatically determine the correct device profile.
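A minimal sketch of the device setup:

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
multi_gpu = torch.cuda.device_count() > 1   # spread training across GPUs when more than one is available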

BertDataBunch

This is an excellent idea borrowed from the fast.ai library. The databunch object takes training, validation and test CSV files and converts the data into an internal representation for BERT. The object also instantiates the correct data loaders based on the device profile, batch_size and max_sequence_length.

The DataBunch object is given the location of the data files and the labels.csv file. For each of the data files, i.e. train.csv, val.csv and/or test.csv, the databunch creates a dataloader object by converting the CSV data into BERT-specific input objects. I would encourage you to explore the structure of the databunch object in a Jupyter notebook.
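A hedged sketch of creating the databunch; the keyword argument names have changed slightly across fast-bert versions (older releases use bs and maxlen, for example), so treat them as indicative rather than exact:

databunch = BertDataBunch(
    DATA_PATH,
    LABEL_PATH,
    tokenizer,
    train_file='train.csv',
    val_file='val.csv',
    label_file='labels.csv',
    text_col='text',             # column containing the input text
    label_col='label',           # single column for multi-class; a list of columns for multi-label
    batch_size_per_gpu=16,
    max_seq_length=512,
    multi_gpu=multi_gpu,
    multi_label=False,
)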

BertLearner

Another concept in line with the fast.ai library, BertLearner is the ‘learner’ object that holds everything together. It encapsulates the key logic for the lifecycle of the model such as training, validation and inference.

The learner object takes the databunch created earlier as input, along with other parameters such as the location of one of the pretrained BERT models and the FP16 training, multi_gpu and multi_label options.

The learner class contains the logic for the training loop, the validation loop, optimiser strategies and key metric calculations. This helps developers focus on their custom use cases without worrying about these repetitive activities.

At the same time, the learner object is flexible enough to be customised, either through its parameters or by creating a subclass of BertLearner and redefining the relevant methods.

The learner object does the following on initialisation:

  1. Creates a PyTorch BERT model and initialises it with the provided pre-trained weights. Based on the multi_label parameter, the model class will be BertForSequenceClassification or BertForMultiLabelSequenceClassification.
  2. Assigns the model to the right device, i.e. a CUDA-based GPU or the CPU. If NVIDIA Apex is available, the distributed processing functions of Apex will be utilised.

fast-bert provides a number of metrics. For multi-class classification you will generally use accuracy, whereas for multi-label classification you should consider using accuracy_thresh and/or roc_auc.
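Putting it together, a hedged sketch of creating the learner from the databunch; the exact keyword arguments vary by release, so check the version you have installed:

metrics = [{'name': 'accuracy', 'function': accuracy}]

learner = BertLearner.from_pretrained_model(
    databunch,
    pretrained_path=BERT_PRETRAINED_MODEL,   # model name or path to pretrained weights
    metrics=metrics,
    device=device,
    logger=logger,
    output_dir=OUTPUT_DIR,
    multi_gpu=multi_gpu,
    is_fp16=True,        # requires NVIDIA Apex
    multi_label=False,
)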

Train the model

Start the model training by calling the fit method on the learner object. The method takes the number of epochs, the learning rate and the optimiser schedule_type as input. The following schedule types are supported (again courtesy of the Hugging Face BERT library):

  • none: always returns learning rate 1.
  • warmup_constant: Linearly increases learning rate from 0 to 1 over warmup fraction of training steps. Keeps learning rate equal to 1. after warmup.
  • warmup_linear: Linearly increases learning rate from 0 to 1 over warmup fraction of training steps. Linearly decreases learning rate from 1. to 0. over remaining 1 - warmup steps.
  • warmup_cosine: Linearly increases learning rate from 0 to 1 over warmup fraction of training steps. Decreases learning rate from 1. to 0. over remaining 1 - warmup steps following a cosine curve. If cycles(default=0.5) is different from default, learning rate follows cosine function after warmup.
  • warmup_cosine_hard_restarts: Linearly increases learning rate from 0 to 1 over warmup fraction of training steps. If cycles (default=1.) is different from default, learning rate follows cycles times a cosine decaying learning rate (with hard restarts).
  • warmup_cosine_warmup_restarts: All training progress is divided in cycles (default=1.) parts of equal length. Every part follows a schedule with the first warmup fraction of the training steps linearly increasing from 0. to 1., followed by a learning rate decreasing from 1. to 0. following a cosine curve. Note that the total number of all warmup steps over all cycles together is equal to warmup * cycles

On calling the fit method, the library starts printing progress information to the logger object: the training and validation losses, and the metrics that you have requested.
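A minimal sketch of the training call (the hyper-parameter values are illustrative):

learner.fit(
    epochs=4,
    lr=6e-5,
    schedule_type='warmup_cosine',   # one of the schedule types listed above
    validate=True,                   # run the validation loop after each epoch
)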

To repeat the experiment with different parameters, just create a new learner object and call fit on it. If you have plenty of GPU compute, you can run multiple experiments in parallel by instantiating multiple databunch and learner objects at the same time.

Once you are happy with your experiments, call the save_and_reload method on the learner object to persist the model to disk.


Model Inference

You have two options to get inference from the model.

Call the predict_batch method on the learner object that contains the trained model.

Of course, the above method is convenient if you already have a trained learner object in memory. If you have a persisted trained model and just want to run inference against it, use the second approach, i.e. the predictor object.
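A hedged sketch of both options; BertClassificationPredictor lives in fast_bert.prediction and its constructor arguments have varied across versions, so the paths and keywords below are assumptions:

texts = ['this movie was great', 'worst purchase I have ever made']

# Option 1: a trained learner that is already in memory
predictions = learner.predict_batch(texts)

# Option 2: load a persisted model and run inference through a predictor object
from fast_bert.prediction import BertClassificationPredictor

predictor = BertClassificationPredictor(
    model_path=str(OUTPUT_DIR / 'model_out'),   # path to the saved model artefacts (illustrative)
    label_path=str(LABEL_PATH),
    multi_label=False,
)
predictions = predictor.predict_batch(texts)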

And that’s how it works. The library repo contains a sample notebook to demonstrate the usage of the library.


Conclusion and next steps

Hopefully this library will help you build and deploy BERT-based NLU models within minutes. In the next part, I will describe how to build your training workflow using fast-bert and deploy your trained model as an endpoint on AWS SageMaker. Watch this space.

The library is very much in the early stages of development. I have a few more ideas for its further development, some of which are:

  1. Add the capability to pre-train a BERT language model on a custom text corpus
  2. Add other NLU capabilities such as NER, question answering, and more.
  3. Experiment with and include additional improvements to BERT by incorporating some of the key innovations in fast.ai, such as the learning rate finder, freezing model layers and more.
  4. Add the capability for automatic hyper-parameter tuning using AWS SageMaker

As mentioned earlier, this is a community-driven initiative. Any help will be very much appreciated.

I would love to hear back from you all. Please feel free to contact me on LinkedIn or Twitter.


References

  • The original BERT paper.
  • Open-sourced TensorFlow BERT implementation with pre-trained weights on github
  • PyTorch implementation of BERT by HuggingFace — The one that this library is based on.
  • The highly recommended course.fast.ai. I have learned a lot about deep learning and transfer learning for natural language processing by following fast.ai.
