The group feature allows multiple NL APIs to be grouped together so that they perform inference in parallel on the same input text. For example, the text could be classified by several text classification models while one or more entity recognition token classifiers simultaneously recognise key entities.
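As a rough illustration, a grouped call might look like the following sketch. The endpoint URL, payload shape, and model names are assumptions for the purpose of the example, not the actual API:

```python
# A minimal sketch of a grouped inference call. The endpoint path,
# payload shape, and model names below are illustrative assumptions.
import requests

GROUP_ENDPOINT = "https://api.example.com/nlp/group"  # hypothetical URL

payload = {
    "text": "My bill doubled this month and nobody will call me back.",
    "models": [
        "intent-classifier",         # hypothetical text classification model
        "dissatisfaction-detector",  # hypothetical text classification model
        "entity-recogniser",         # hypothetical token classification model
    ],
}

response = requests.post(GROUP_ENDPOINT, json=payload, timeout=30)
response.raise_for_status()

# One result per model, all produced from the same input text.
for model_name, result in response.json()["results"].items():
    print(model_name, result)
```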

You might want to group inference calls and run them in parallel on the same piece of text for several reasons. First, parallel execution improves the speed and efficiency of the inference process: splitting the workload across parallel processes lets you exploit multiple CPU cores or other hardware resources to complete the computations sooner. This is particularly useful with large or complex models, or when running inference over a large dataset.
Second, grouping the calls improves the consistency of the inference process. When several models or algorithms are involved, it can be hard to guarantee they are applied uniformly to each piece of data; issuing them as a single grouped call ensures that every model receives exactly the same input, which makes the combined results easier to compare and trust.
Third, grouping the calls makes your inference pipeline more flexible and adaptable. Because the models are addressed as a single group, you can swap individual models or algorithms in and out without rewriting the entire pipeline, which makes it easier to experiment with different approaches and find the best solution for your specific use case, as the sketch below illustrates.
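The sketch below shows the fan-out pattern the group feature implies, assuming each model is reachable at its own HTTP endpoint (the URLs and model names are hypothetical). It reflects all three benefits: total wall-clock time is roughly that of the slowest model rather than the sum, every model sees the identical input text, and swapping a model in or out is a one-line change to the endpoint list:

```python
# A sketch of parallel fan-out to a group of models; URLs are hypothetical.
import asyncio
import httpx

MODEL_ENDPOINTS = {
    "intent": "https://api.example.com/models/intent",
    "dissatisfaction": "https://api.example.com/models/dissatisfaction",
    "entities": "https://api.example.com/models/entities",
}

async def infer(client: httpx.AsyncClient, name: str, url: str, text: str):
    # Every model receives exactly the same input text.
    response = await client.post(url, json={"text": text}, timeout=30)
    response.raise_for_status()
    return name, response.json()

async def group_inference(text: str) -> dict:
    async with httpx.AsyncClient() as client:
        tasks = [
            infer(client, name, url, text)
            for name, url in MODEL_ENDPOINTS.items()
        ]
        # gather() runs all calls concurrently, so the elapsed time is
        # roughly that of the slowest model, not the sum of all of them.
        results = await asyncio.gather(*tasks)
    return dict(results)

if __name__ == "__main__":
    print(asyncio.run(group_inference("My bill doubled this month.")))
```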
Case Study Application
Working with a large utility, Utterworks developed several text classification and entity recognition models for use across the utility's customer service channels. Initially we created an intent model to recognise the reason for a customer's contact, and a model to recognise a customer's expression of dissatisfaction. Calling both models in parallel allowed for efficient intent prediction and the opportunity to prioritise contacts where the customer had a complaint. We subsequently added further models to be called simultaneously: one recognising customer vulnerability and another recognising references to the recently introduced government Energy Bills Support Scheme (this model went from creation to production in a single day, and improved the accuracy with which customer enquiries were deflected to content created specifically to answer those questions).