Natural Language Processing

19/05/2020

Multi-label Text Classification using BERT – The Mighty Transformer

The past year has ushered in an exciting age for Natural Language Processing using deep neural networks. Research into pre-trained models has produced a massive leap in state-of-the-art results for many NLP tasks, such as text classification, natural language inference and question answering.

Some of the key milestones have been ELMo, ULMFiT and the OpenAI Transformer. All these approaches allow us to pre-train an unsupervised language model on a large corpus of data, such as all Wikipedia articles, and then fine-tune the pre-trained model on downstream tasks.

Perhaps the most exciting event of the year in this area has been the release of BERT, a multilingual transformer-based model that has achieved state-of-the-art results on various NLP tasks. BERT is a bidirectional model based on the transformer architecture; it replaces the sequential processing of RNNs (LSTM and GRU) with a much faster attention-based approach. The model is also pre-trained on two unsupervised tasks, masked language modelling and next sentence prediction. This allows us to take a pre-trained BERT model and fine-tune it on specific downstream tasks such as sentiment classification, intent detection, question answering and more.

Okay, so what’s this about?

In this article, we will focus on applying BERT to the problem of multi-label text classification. A traditional classification task assumes that each document is assigned to one and only one class, i.e. label. This is sometimes termed multi-class classification or, if the number of classes is two, binary classification.

On the other hand, multi-label classification assumes that a document can be simultaneously and independently assigned to multiple labels or classes. Multi-label classification has many real-world applications, such as categorising businesses or assigning multiple genres to a movie. In the world of customer service, this technique can be used to identify multiple intents in a customer's email.

We will use Kaggle’s Toxic Comment Classification Challenge to benchmark BERT’s performance for multi-label text classification. Using this competition’s data, we will build a model that determines the different types of toxicity present in a given text snippet. The types of toxicity, i.e. toxic, severe toxic, obscene, threat, insult and identity hate, will be the target labels for our model.

Where do we start?

Google Research recently open-sourced the TensorFlow implementation of BERT and also released the following pre-trained models:

  1. BERT-Base, Uncased: 12-layer, 768-hidden, 12-heads, 110M parameters
  2. BERT-Large, Uncased: 24-layer, 1024-hidden, 16-heads, 340M parameters
  3. BERT-Base, Cased: 12-layer, 768-hidden, 12-heads , 110M parameters
  4. BERT-Large, Cased: 24-layer, 1024-hidden, 16-heads, 340M parameters
  5. BERT-Base, Multilingual Cased (New, recommended): 104 languages, 12-layer, 768-hidden, 12-heads, 110M parameters
  6. BERT-Base, Chinese: Chinese Simplified and Traditional, 12-layer, 768-hidden, 12-heads, 110M parameters

We will use the smaller BERT-Base, Uncased model for this task. The BERT-Base model has 12 attention layers, and all text will be converted to lowercase by the tokeniser. We are running this on an AWS p3.8xlarge EC2 instance, which provides 4 Tesla V100 GPUs with 64 GB of GPU memory in total.

I personally prefer using PyTorch over TensorFlow, so we will use the excellent PyTorch port of BERT from HuggingFace, available at https://github.com/huggingface/pytorch-pretrained-BERT. We have converted the pre-trained TensorFlow checkpoints to PyTorch weights using the script provided within HuggingFace’s repo.

Our implementation is heavily inspired by the run_classifier example provided in the original implementation of BERT.

Data representation

The data will be represented by class InputExample.

  • text_a: text comment
  • text_b: Not used
  • labels: List of labels for the comment from the training data (will be empty for test data for obvious reasons)

We will convert each InputExample into features that BERT understands. The features will be represented by the class InputFeatures.

  • input_ids: list of numerical ids for the tokenised text
  • input_mask: will be set to 1 for real tokens and 0 for the padding tokens
  • segment_ids: for our case, this will be set to the list of ones
  • label_ids: one-hot encoded labels for the text
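
For reference, here is a minimal sketch of these two container classes, along the lines of the original run_classifier example (field names as described above; the guid field is just a convenience identifier):

class InputExample(object):
    """A single training/test example for the toxic comments data."""
    def __init__(self, guid, text_a, text_b=None, labels=None):
        self.guid = guid          # unique id for the example
        self.text_a = text_a      # the comment text
        self.text_b = text_b      # not used for this task
        self.labels = labels      # list of labels (empty for test data)

class InputFeatures(object):
    """A single set of BERT-ready features for one example."""
    def __init__(self, input_ids, input_mask, segment_ids, label_ids):
        self.input_ids = input_ids      # token ids from the tokeniser vocabulary
        self.input_mask = input_mask    # 1 for real tokens, 0 for padding
        self.segment_ids = segment_ids  # sentence segment ids
        self.label_ids = label_ids      # encoded labels for the comment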

Tokenisation

BERT-Base, Uncased uses a vocabulary of 30,522 words. The process of tokenisation involves splitting the input text into a list of tokens that are available in the vocabulary. In order to deal with words not available in the vocabulary, BERT uses a technique called BPE-based WordPiece tokenisation. In this approach an out-of-vocabulary word is progressively split into subwords, and the word is then represented by a group of subwords. Since the subwords are part of the vocabulary, we have learned representations and context for these subwords, and the context of the word is simply the combination of the contexts of its subwords. For more details regarding this approach please refer to Neural Machine Translation of Rare Words with Subword Units (https://arxiv.org/pdf/1508.07909).

P.S. This in my opinion is as important a breakthrough as BERT itself.
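
As a quick illustration, here is a minimal sketch using the HuggingFace tokeniser; the subword split shown in the comment is only indicative, as the exact pieces depend on the vocabulary:

from pytorch_pretrained_bert import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', do_lower_case=True)

tokens = tokenizer.tokenize("Stop being so toxic, it is unhelpful")
# out-of-vocabulary words are split into subwords, e.g. something like
# ['stop', 'being', 'so', 'toxic', ',', 'it', 'is', 'un', '##hel', '##pful']
input_ids = tokenizer.convert_tokens_to_ids(tokens)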

Model Architecture

We will adapt the BertForSequenceClassification class to cater for multi-label classification.

The primary change here is the use of the binary cross-entropy with logits loss function (BCEWithLogitsLoss) instead of the vanilla cross-entropy loss (CrossEntropyLoss) used for multi-class classification. Binary cross-entropy loss allows our model to assign independent probabilities to the labels.
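
A sketch of the adapted class is shown below, following the pytorch-pretrained-BERT conventions of the time; treat it as indicative rather than the exact code:

import torch.nn as nn
from torch.nn import BCEWithLogitsLoss
from pytorch_pretrained_bert.modeling import BertPreTrainedModel, BertModel

class BertForMultiLabelSequenceClassification(BertPreTrainedModel):
    """BERT model with a multi-label classification head."""
    def __init__(self, config, num_labels=6):
        super(BertForMultiLabelSequenceClassification, self).__init__(config)
        self.num_labels = num_labels
        self.bert = BertModel(config)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)
        self.classifier = nn.Linear(config.hidden_size, num_labels)
        self.apply(self.init_bert_weights)

    def forward(self, input_ids, token_type_ids=None, attention_mask=None, labels=None):
        _, pooled_output = self.bert(input_ids, token_type_ids, attention_mask,
                                     output_all_encoded_layers=False)
        logits = self.classifier(self.dropout(pooled_output))
        if labels is not None:
            # independent probability per label, hence BCE-with-logits rather than cross-entropy
            loss_fct = BCEWithLogitsLoss()
            return loss_fct(logits.view(-1, self.num_labels),
                            labels.view(-1, self.num_labels).float())
        return logits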

The model summary shows the layers of the model along with their dimensions:

  1. BertEmbeddings: Input embedding layer
  2. BertEncoder: The 12 BERT attention layers
  3. Classifier: Our multi-label classifier with out_features=6, each corresponding to our 6 labels

Training

The training loop is identical to the one provided in the original BERT implementation in run_classifier.py. We trained the model for 4 epochs with a batch size of 32 and a sequence length of 512, i.e. the maximum possible for the pre-trained models. The learning rate was kept at 3e-5, as recommended in the original paper.

We had the opportunity to use multiple GPUs, so we wrapped the PyTorch model in the DataParallel module. This allows us to spread the training job across all available GPUs.
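
A rough sketch of that setup, reusing the class sketched above (model name and label count as used in this post):

import torch

model = BertForMultiLabelSequenceClassification.from_pretrained('bert-base-uncased',
                                                                 num_labels=6)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

if torch.cuda.device_count() > 1:
    # replicate the model and split each batch across all visible GPUs
    model = torch.nn.DataParallel(model)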

We did not use the half-precision (FP16) technique because, for some reason, the binary cross-entropy with logits loss function did not support FP16 processing. This doesn’t really affect the end result; it simply takes a bit longer to train.

Evaluation Metrics

We adapted the accuracy metric function to include a threshold, which is set to 0.5 by default.
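
A minimal sketch of such a thresholded accuracy for PyTorch tensors (the exact function in the notebook may differ):

def accuracy_thresh(y_pred, y_true, thresh=0.5, sigmoid=True):
    """Fraction of label predictions that match the targets after thresholding."""
    if sigmoid:
        y_pred = y_pred.sigmoid()          # convert logits to independent probabilities
    return ((y_pred > thresh).float() == y_true.float()).float().mean().item()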

For multi-label classification, a far more important metric is ROC-AUC, the area under the ROC curve. This is also the evaluation metric for the Kaggle competition. We calculate ROC-AUC for each label separately, and we also compute a micro-average over the individual labels’ ROC-AUC scores.
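
For example, with scikit-learn, given the validation targets and predicted probabilities as arrays:

import numpy as np
from sklearn.metrics import roc_auc_score

label_cols = ['toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate']

def report_roc_auc(y_true, y_prob):
    # y_true: (n_samples, 6) binary targets, y_prob: (n_samples, 6) predicted probabilities
    for i, label in enumerate(label_cols):
        print(label, roc_auc_score(y_true[:, i], y_prob[:, i]))
    print('micro ROC-AUC', roc_auc_score(y_true, y_prob, average='micro'))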

I would recommend reading this excellent blog to get a deeper insight into ROC curves and ROC-AUC.

Evaluation Scores

We ran a few experiments with slight variations but got more or less similar results. The outcome is listed below:

Training Loss: 0.022, Validation Loss: 0.018, Validation Accuracy: 99.31%

ROC-AUC scores for the individual labels:

  • toxic: 0.9988
  • severe_toxic: 0.9935
  • obscene: 0.9988
  • threat: 0.9989
  • insult: 0.9975
  • identity_hate: 0.9988

Micro-averaged ROC-AUC: 0.9987

The result is quite encouraging, as we seem to have created a near-perfect model for detecting the toxicity of a text comment. Now let’s see how we score against the Kaggle leaderboard.

Kaggle result

We ran inference logic on the test dataset provided by Kaggle and submitted the results to the competition. The following was the outcome:

We scored 0.9863 ROC-AUC, which landed us within the top 10% of the competition. To put this result into perspective, this Kaggle competition had prize money of $35,000 and the first-prize-winning score was 0.9885.

The top scores are achieved by teams of dedicated and highly skilled data scientists and practitioners. They use various techniques such as ensembling, data augmentation and test-time augmentation in addition to what we have done so far.

Conclusion and Next Steps

We have implemented a multi-label classification model using the almighty pre-trained BERT model. As we have shown, the outcome is genuinely state-of-the-art on a well-known, published dataset. We were able to build a world-class model that can be used in production for various industries, especially in customer service.

For us, the next step will be to fine-tune the pre-trained language model on the text corpus of the downstream task, using the masked language modelling and next sentence prediction objectives. This is an unsupervised task and will hopefully allow the model to learn some of our custom context and terminology. A similar technique is used by ULMFiT. I will share the outcome in another blog, so do watch out for it.

I have shared most of the code for this implementation in the code gist. I will also merge my changes back into HuggingFace’s GitHub repo.

I would encourage you all to implement this technique on your own custom datasets and would love to hear some stories.

I would love to hear back from all. Also please feel free to contact me using LinkedIn or Twitter.


Update

I have made the Jupyter notebook for this article available in the kaushaltrivedi/bert-toxic-comments-multilabel repository on GitHub, and it can be viewed on nbviewer. Note that this is an interim option and this work will be merged into HuggingFace’s awesome PyTorch repo for BERT.


References

  • The original BERT paper.
  • Open-sourced TensorFlow BERT implementation with pre-trained weights on github
  • PyTorch implementation of BERT by HuggingFace – The one that this blog is based on.
  • Highly recommended course.fast.ai. I have learned a lot about deep learning and transfer learning for natural language processing by following fast.ai.

12/09/2019

Train and Deploy the Mighty BERT based NLP models using FastBert and Amazon SageMaker

FastBert — The story so far…

In my earlier introduction to FastBert, I described it as a library that allows developers and data scientists to train and deploy BERT based models for NLP tasks, beginning with text classification. The scope of BERT-based (read: Transformer-based) models has widened a bit since I wrote that blog and now includes BERT, XLNet, RoBERTa, DistilBERT and a few more.

I am happy to report that, with lots of support from Hugging Face, FastBert now supports all the above-mentioned model architectures, and with a couple of changes to the input parameters you can try out each of them on your custom datasets. With the current pace of research into Transformer-based models, I expect the range of architectures to grow rapidly in the coming days, weeks and months, and I hope to support all or most of them.

BERT meets Amazon SageMaker

One of the key necessities in training BERT-based models is access to GPUs, and the more the better. I personally have been fortunate to have access to multiple GPUs in order to experiment with different Transformer architectures and parameters, but I am sure this is one of the major issues for the research and developer community. A single-GPU AWS p3.2xlarge EC2 instance will cost about $80 a day, and a multi-GPU AWS p3.8xlarge EC2 instance will set you back $320 a day. One has to be incredibly disciplined about switching off the virtual machines when not in use in order not to get a shock bill. Another issue with the virtual machine approach is that you have limited scope for testing different hyper-parameters or BERT architectures in parallel, as you are limited by the number of GPUs available in each virtual machine.

What about Inference?

Training is just half of the job. Once the model is trained to your satisfaction, you would like a simple way to deploy it in a highly scalable, available and secure environment with a REST API endpoint. Developers and data scientists will agree with me that this step is generally ignored by most academic researchers; however, in industry this step is what counts the most.

Amazon SageMaker

In Amazon’s own words:

Amazon SageMaker provides every developer and data scientist with the ability to build, train, and deploy machine learning models quickly. Amazon SageMaker is a fully-managed service that covers the entire machine learning workflow to label and prepare your data, choose an algorithm, train the model, tune and optimize it for deployment, make predictions, and take action. Your models get to production faster with much less effort and lower cost.

and I must say that I tend to agree, for the most part.

FastBert includes support for training BERT models on Amazon SageMaker. With FastBert on SageMaker, you only pay for the time (in seconds) your experiment is actually executing the training loop. Once the training epochs are complete, the training resources are automatically released and your trained model artefacts are securely stored in the S3 bucket of your AWS account, ready to be deployed as a RESTful endpoint.

In this blog, I will describe how to train and deploy BERT based models using FastBert on Amazon SageMaker.

The AWS components used here are:

Elastic Container Registry (ECR) Image

In order to use FastBert with SageMaker, we have to package the library, the training code and the pre-trained weights as a Docker image stored in the Amazon Elastic Container Registry (ECR). We will use the same image to hold both the training and inference code for FastBert.

S3 Bucket

The S3 bucket holds the training and validation data and other config files. The data in the S3 bucket can be encrypted using AWS KMS.

The S3 bucket also holds the output of the training job: the trained model artefacts, log files and TensorBoard output.

SageMaker Training Job

To train a model in SageMaker, you will need to create a training job. The training job includes the following:

  1. Reference to S3 bucket training location (input bucket)
  2. Reference to the S3 bucket to store trained model artefacts (output bucket)
  3. Reference to the AWS ECR image that holds our FastBert library and training code

The training job is also passed the ML compute resources, i.e. the type of instance used for the training job (p3.2xlarge, p3.8xlarge, etc.). The compute resources are managed by SageMaker. The training job also receives the defined model hyperparameters.

After you create the training job, Amazon SageMaker launches the ML compute instances and uses the training code and the training dataset to train the model. It saves the resulting model artifacts and other output in the S3 bucket you specified for that purpose.

SageMaker Endpoint

SageMaker provides the model hosting service to deploy the trained model and provides an HTTPS endpoint for inference. The training job produces trained model artefacts, from which we create a so-called SageMaker model. By creating a model, you tell Amazon SageMaker where it can find the model components. This includes the S3 path where the model artifacts are stored and the Docker registry path for the image that contains the inference code.

When hosting models in production, you can configure the endpoint to elastically scale the deployed ML compute instances. For each production variant, you specify the number of ML compute instances that you want to deploy. When you specify two or more instances, Amazon SageMaker launches them in multiple Availability Zones. This ensures continuous availability. Amazon SageMaker manages deploying the instances.

How does this work?

Prerequisites

  1. Install Docker on your computer.
  2. Create an AWS Account.
  3. Install and configure AWS CLI on your computer.

Create the FastBert ECR image

In order to use BERT-based transformer architectures with fast-bert, we need to provide our custom algorithm code to SageMaker. This is done in the shape of a Docker image stored in the Amazon Elastic Container Registry (ECR). The image is created using the Dockerfile contained in the fast-bert repository.

  1. Clone the fast-bert repository on your local machine using git clone https://github.com/kaushaltrivedi/fast-bert.git
  2. Navigate to the container folder of the fast-bert repository.
  3. Run the script build_and_push.sh. On successful execution of the script, you will have a Docker image named sagemaker-bert in your AWS account. The script also prepackages some of the most commonly used pre-trained weights in the Docker image, which is particularly useful if you decide to run SageMaker training jobs in network isolation mode or within a VPC without an internet gateway. Feel free to update this script for your own purposes.

This Docker image can be used to train and deploy any number of models supported by the fast-bert library. At this point you could use the AWS Console to create a training job, but I have created a “helper” Jupyter notebook for uploading the data and config files to the S3 bucket, creating a training job, and then deploying the model as a SageMaker endpoint.

Note that this SageMaker notebook doesn’t need any GPUs. It can also be executed on your local machine or a low-cost virtual machine; the training and inference are delegated to the managed Amazon SageMaker instances.

SageMaker Helper Notebook

Import the necessary libraries.

Set up the paths for your local data locations. The data and label files must already be stored in the DATA_PATH location. We will be creating the training_config.json file shortly.
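
Something along these lines; the bucket, prefix and directory names are only placeholders, and get_execution_role() assumes the notebook runs with an appropriate IAM role (pass an explicit role ARN otherwise):

import json
from pathlib import Path

import sagemaker
from sagemaker import get_execution_role

session = sagemaker.Session()
role = get_execution_role()            # or an explicit IAM role ARN outside SageMaker
bucket = session.default_bucket()      # or your own S3 bucket name
prefix = 'fastbert-demo'               # placeholder key prefix

DATA_PATH = Path('./data/')            # train.csv, val.csv and labels.csv live here
CONFIG_PATH = Path('./config/')        # training_config.json will be written here
CONFIG_PATH.mkdir(exist_ok=True)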

Hyper-parameters and Training configuration

I have split the parameters required by SageMaker into hyper-parameters and general configuration parameters. Hyper-parameters are passed directly to the SageMaker training job and can be tuned to optimise the model.

The general parameters that cannot be tuned by SageMaker are stored in training_config.json and provided to SageMaker through the S3 bucket.

These are the parameters used by the databunch and learner objects. This particular example is for the multi-label scenario and hence the label_col list is serialised as a string; I hope to improve this in the future. In the case of multi-class text classification, label_col will just be the name of the label column.

As you will notice, we also save the training_config object to a file at the CONFIG_PATH location.
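
The exact keys are defined by the training script baked into the Docker image, so the ones below are purely illustrative of the split between tunable hyper-parameters and the static training_config.json:

# illustrative only: key names must match what the container's training script expects
hyperparameters = {
    'epochs': 4,
    'lr': 3e-5,
    'max_seq_length': 512,
    'train_batch_size': 16,
    'lr_schedule': 'warmup_linear'
}

training_config = {
    'train_file': 'train.csv',
    'val_file': 'val.csv',
    'label_file': 'labels.csv',
    'text_col': 'text',
    # multi-label: the list of label columns serialised as a string
    'label_col': '["label_1","label_2","label_3"]',
    'multi_label': 'True',
    'do_lower_case': 'True'
}

with open(CONFIG_PATH / 'training_config.json', 'w') as f:
    json.dump(training_config, f)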

Upload data and config to S3 bucket
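
For example, using the SageMaker session’s upload_data helper (bucket and prefix as defined earlier):

s3_data = session.upload_data(str(DATA_PATH), bucket=bucket, key_prefix=prefix + '/data')
s3_config = session.upload_data(str(CONFIG_PATH), bucket=bucket, key_prefix=prefix + '/config')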

Create an estimator object and start training

At this point SageMaker will create the training instance using the Docker image that you have provided. It will then download the data and config files from the S3 bucket to the SageMaker instance and start the training job.

The fit function calls the Amazon SageMaker CreateTrainingJob API to start model training. It uses the configuration you provided to create the estimator, together with the specified input training data, to send the CreateTrainingJob request to Amazon SageMaker.
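
A sketch with the SageMaker Python SDK of the time (v1 parameter names; newer SDK versions rename image_name/train_instance_* to image_uri/instance_*). The ECR image URI and input channel names are placeholders and must match your own account and the container’s training script:

from sagemaker.estimator import Estimator

image = '<account-id>.dkr.ecr.<region>.amazonaws.com/sagemaker-bert:latest'  # your ECR image

estimator = Estimator(image_name=image,
                      role=role,
                      train_instance_count=1,
                      train_instance_type='ml.p3.8xlarge',
                      output_path='s3://{}/{}/output'.format(bucket, prefix),
                      hyperparameters=hyperparameters,
                      sagemaker_session=session)

# channel names are illustrative and must match what the container expects
estimator.fit({'training': s3_data, 'config': s3_config})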

You should see logs similar to the following, which keep you informed of the status of the training job. The logs are displayed in the notebook and are also available in AWS CloudWatch Logs for future reference.

2019-08-27 10:15:06 Starting - Starting the training job...
2019-08-27 10:15:08 Starting - Launching requested ML instances......
2019-08-27 10:16:08 Starting - Preparing the instances for training...
2019-08-27 10:17:05 Downloading - Downloading input data...
2019-08-27 10:17:11 Training - Downloading the training image............
2019-08-27 10:19:19 Training - Training image download completed. Training in progress.

You can also see the training job details in AWS console.

Once the training job is complete, the trained model and all the accompanying files, such as the config file, tokenizer vocabulary and labels.csv, are zipped and stored in the S3 bucket specified in the estimator object’s output_path parameter.

You can call the deploy() method to host the model using the Amazon SageMaker hosting services.

Voila!!! You now have an active model endpoint that you can invoke for real-time inference. You can use the AWS SDK for any of the major supported platforms and call the InvokeEndpoint API to get predictions.

As you can see from the example above, we have used different types of instances for training and hosting. For training we use an instance with multiple GPUs, but for hosting the model for inference you can use a cheaper instance such as m5.large, which is optimised for general compute and does not contain any expensive GPUs.
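
A sketch of deployment and invocation; the endpoint name and request payload format are placeholders (the payload must match whatever the inference code in your container expects):

predictor = estimator.deploy(initial_instance_count=1,
                             instance_type='ml.m5.large',
                             endpoint_name='fastbert-classifier')  # hypothetical endpoint name

# invoke the endpoint from any client via the SageMaker runtime API
import boto3

runtime = boto3.client('sagemaker-runtime')
response = runtime.invoke_endpoint(EndpointName='fastbert-classifier',
                                   ContentType='application/json',
                                   Body=json.dumps({'text': 'your input text here'}))
print(response['Body'].read())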

The complete notebook is available in the fast-bert GitHub repo and can be viewed on nbviewer.


Conclusion and next steps

Hopefully this story will help you leverage the power of Amazon SageMaker to train and deploy BERT based models on your own data using the fast-bert library.

Amazon SageMaker abstracts away the complexities of maintaining secure and expensive GPU-powered virtual machines for the training phase, and also simplifies the process of deploying the model to production.

You will be able to customise most of the fast-bert parameters through the hyper-parameters and the training config file, and at the same time build sophisticated training and hosting workflows for production.

Some of the next steps would be to use additional SageMaker features such as hyper-parameter tuning, elastic inference, batch inference and more.

I would love to hear your suggestions on further improvements and also welcome your code contribution to the fast-bert github repo.

References

  • The original BERT paper.
  • The fast-bert library.
  • PyTorch implementation of BERT by Hugging Face
  • Highly recommended course.fast.ai. I have learned a lot about deep learning and transfer learning for natural language processing by following fast.ai.

17/05/2019

Introducing FastBert — A simple Deep Learning library for BERT Models

BERT What?

The little Sesame Street muppet has taken the world of Natural Language Processing by storm, and the storm is picking up speed. We have seen a number of NLP problems solved by neural network architectures built on top of BERT’s contextual representations. To name a few, BERT-based models have pushed the state of the art for SQuAD 2.0 question answering, GLUE multi-task learning, the Google Natural Questions task and biomedical domain-specific tasks (BioBERT).

Google Research open-sourced the TensorFlow implementation of BERT along with the pre-trained weights. This opened the door for the amazing developers at Hugging Face, who built the PyTorch port of BERT. With this library, geniuses, i.e. developers and data scientists, can use BERT models for text classification, question answering, language model fine-tuning and more. Yours truly has contributed to the text classification capability by adding support for multi-label text classification.

Enter FastBert

FastBert is a deep learning library that allows developers and data scientists to train and deploy BERT-based models for natural language processing tasks, beginning with text classification. The work on FastBert is inspired by fast.ai and strives to make cutting-edge deep learning technologies accessible to the vast community of machine learning practitioners.

With FastBert, you will be able to:

  1. Train (more precisely fine-tune) BERT text classification models on your custom dataset
  2. Tune model hyper-parameters such as epochs, learning rate, batch size, optimiser schedule and more
  3. Save and deploy trained model for inference (including on AWS Sagemaker)

Starting today, FastBert will support both multi-class and multi-label text classification and in due course, it will support other NLU tasks such as Named Entity Recognition, Question Answering and Custom Corpus fine-tuning. I rely on the community to help make this happen 🙂

Installation

pip install fast-bert

From Source: pip install git+https://github.com/kaushaltrivedi/fast-bert.git


Usage

Import the required packages. Please note that I have not included the usual suspects such as os, pandas, etc.

Define general parameters and path locations for data, labels and pretrained models. (some good engineering practices)
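
For example (module paths and parameter names follow a recent fast-bert release and may differ slightly between versions; the directory layout is just a suggestion):

import logging
from pathlib import Path

import torch

from fast_bert.data_cls import BertDataBunch
from fast_bert.learner_cls import BertLearner
from fast_bert.metrics import accuracy_thresh, roc_auc

DATA_PATH = Path('./data/')       # train.csv and val.csv live here
LABEL_PATH = Path('./labels/')    # labels.csv lives here
OUTPUT_DIR = Path('./output/')    # trained model artefacts are written here

BERT_PRETRAINED_MODEL = 'bert-base-uncased'
logger = logging.getLogger()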

Tokenizer

Create a tokenizer object. This is the BPE-based WordPiece tokenizer, available from the magnificent Hugging Face BERT PyTorch library.

The do_lower_case parameter depends on the version of the BERT pre-trained model you are using. If you use an uncased model, set this value to true; otherwise set it to false. For this example we have used the BERT-Base uncased model, and hence the do_lower_case parameter is set to true.
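
A minimal sketch using the Hugging Face tokenizer class (recent fast-bert releases can also accept the pre-trained model name as a string and build the tokenizer internally):

from pytorch_pretrained_bert import BertTokenizer

tokenizer = BertTokenizer.from_pretrained(BERT_PRETRAINED_MODEL, do_lower_case=True)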

GPU & Device

Training a BERT model does require one or, preferably, multiple GPUs. In this step we set up the GPU parameters for our training.

Note that in the future releases, this step will be abstracted from the user and the library will automatically determine the correct device profile.
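
A simple way to do that:

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
multi_gpu = torch.cuda.device_count() > 1   # spread work across GPUs when more than one is visible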

BertDataBunch

This is an excellent idea borrowed from the fast.ai library. The databunch object takes the training, validation and test CSV files and converts the data into the internal representation for BERT. It also instantiates the correct data loaders based on the device profile, batch_size and max_sequence_length.

The DataBunch object is given the location of the data files and the labels.csv file. For each of the data files, i.e. train.csv, val.csv and/or test.csv, the databunch creates a dataloader object by converting the CSV data into BERT-specific input objects. I would encourage you to explore the structure of the databunch object using a Jupyter notebook.
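
A sketch of creating the databunch; the column and file names are placeholders for your own dataset, and exact parameter names may vary slightly between fast-bert releases:

databunch = BertDataBunch(DATA_PATH, LABEL_PATH,
                          tokenizer=tokenizer,
                          train_file='train.csv',
                          val_file='val.csv',
                          label_file='labels.csv',
                          text_col='text',           # name of the text column in the csv files
                          label_col='label',         # or a list of columns for multi-label
                          batch_size_per_gpu=16,
                          max_seq_length=512,
                          multi_gpu=multi_gpu,
                          multi_label=False)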

BertLearner

Another concept in line with the fast.ai library, BertLearner is the ‘learner’ object that holds everything together. It encapsulates the key logic for the lifecycle of the model such as training, validation and inference.

The learner object takes the databunch created earlier as input, along with other parameters such as the location of one of the pre-trained BERT models and the FP16 training, multi_gpu and multi_label options.

The learner class contains the logic for the training loop, validation loop, optimiser strategies and key metrics calculation. This helps developers focus on their custom use cases without worrying about these repetitive activities.

At the same time, the learner object is flexible enough to be customised, either through its parameters or by creating a subclass of BertLearner and redefining the relevant methods.

The learner object does the following upon initiation:

  1. Creates a PyTorch BERT model and initialises the same with provided pre-trained weights. Based on the multi_label parameter, the model class will be BertForSequenceClassification or BertForMultiLabelSequenceClassification.
  2. Assigns the model to the right device, i.e. a CUDA-based GPU or the CPU. If NVIDIA Apex is available, the distributed processing functions of Apex will be utilised.

fast-bert provides a bunch of metrics. For multi-class classification you will generally use accuracy, whereas for multi-label classification you should consider using accuracy_thresh and/or roc_auc.
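
A sketch of creating the learner for a multi-label problem, with metrics passed in the name/function form that fast-bert expects (parameter names may vary slightly between releases):

metrics = [{'name': 'accuracy_thresh', 'function': accuracy_thresh},
           {'name': 'roc_auc', 'function': roc_auc}]

learner = BertLearner.from_pretrained_model(databunch,
                                            pretrained_path=BERT_PRETRAINED_MODEL,
                                            metrics=metrics,
                                            device=device,
                                            logger=logger,
                                            output_dir=OUTPUT_DIR,
                                            is_fp16=False,       # set True if NVIDIA Apex is installed
                                            multi_gpu=multi_gpu,
                                            multi_label=True)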

Train the model

Start the model training by calling the fit method on the learner object. The method takes the number of epochs, the learning rate and the optimiser schedule_type as input. The following schedule types are supported (again courtesy of the Hugging Face BERT library):

  • none: always returns learning rate 1.
  • warmup_constant: Linearly increases learning rate from 0 to 1 over warmup fraction of training steps. Keeps learning rate equal to 1. after warmup.
  • warmup_linear: Linearly increases learning rate from 0 to 1 over warmup fraction of training steps. Linearly decreases learning rate from 1. to 0. over remaining 1 - warmup steps.
  • warmup_cosine: Linearly increases learning rate from 0 to 1 over warmup fraction of training steps. Decreases learning rate from 1. to 0. over the remaining 1 - warmup steps following a cosine curve. If cycles (default=0.5) is different from the default, the learning rate follows a cosine function after warmup.
  • warmup_cosine_hard_restarts: Linearly increases learning rate from 0 to 1 over warmup fraction of training steps. If cycles (default=1.) is different from the default, the learning rate follows cycles times a cosine decaying learning rate (with hard restarts).
  • warmup_cosine_warmup_restarts: All training progress is divided into cycles (default=1.) parts of equal length. Every part follows a schedule with the first warmup fraction of the training steps linearly increasing from 0. to 1., followed by a learning rate decreasing from 1. to 0. following a cosine curve. Note that the total number of warmup steps over all cycles together is equal to warmup * cycles.

On calling the fit method, the library will start printing progress information to the logger object. It will print the training and validation losses and the metrics that you have requested.
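
For example (hyper-parameter values are illustrative):

learner.fit(epochs=4,
            lr=3e-5,
            schedule_type='warmup_linear',
            validate=True)   # run the validation loop and metrics after each epoch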

In order to repeat the experiment with different parameters, just create a new learner object and call the fit method on it. If you have tons of GPU compute, then you can possibly run multiple experiments in parallel by instantiating multiple databunch and learner objects at the same time.

Once you are happy with your experiments, call the save_and_reload method on the learner object to persist the model to the file system.


Model Inference

You have two options to get inference from the model.

Call the predict_batch method on the learner object that contains the trained model.

Of course, the above method is convenient if you already have a trained learner object in memory. If you have a persisted trained model and just want to run inference on it, use the second approach, i.e. the predictor object.
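
Both options sketched, assuming the trained model has been persisted under OUTPUT_DIR (the exact sub-folder and parameter names depend on the fast-bert release):

# Option 1: learner object already in memory
texts = ['this is a perfectly reasonable comment',
         'another example sentence']
predictions = learner.predict_batch(texts)

# Option 2: load a persisted model through the predictor object
from fast_bert.prediction import BertClassificationPredictor

predictor = BertClassificationPredictor(model_path=str(OUTPUT_DIR / 'model_out'),
                                        label_path=str(LABEL_PATH),
                                        multi_label=True,
                                        do_lower_case=True)
predictions = predictor.predict_batch(texts)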

And that’s how it works. The library repo contains a sample notebook demonstrating its usage.


Conclusion and next steps

Hopefully this library will help you build and deploy BERT based NLU models within minutes. In the next part, I will describe how to build your training workflow using fast-bert and deploy your trained model as an endpoint on AWS SageMaker. Watch this space.

The library is very much in the early stages of development, and I have a few more ideas for developing it further. Some of them are:

  1. Add capability to pre-train a BERT language model for custom text corpus
  2. Add other NLU capabilities such as NER, question answering, and more.
  3. Experiment and include additional improvements to BERT by incorporating some of the key innovations in fast.ai such as learning rate finder, freezing model layers and more.
  4. Add capability for automatic hyper-parameter tuning using AWS SageMaker

As mentioned earlier, this is a community-driven initiative. Any help will be very much appreciated.

I would love to hear back from all. Also please feel free to contact me using LinkedIn or Twitter.


References

  • The original BERT paper.
  • Open-sourced TensorFlow BERT implementation with pre-trained weights on github
  • PyTorch implementation of BERT by HuggingFace — The one that this library is based on.
  • Highly recommended course.fast.ai. I have learned a lot about deep learning and transfer learning for natural language processing by following fast.ai.
