Contact Center Conversational AI – When Good Isn’t Good Enough

Accuracy has long been a key topic of discussion in the call and contact center speech recognition world. The prevailing logic is that the more of your inbound calls software can handle, the faster and more accurately you will handle customer calls overall. This results in cost savings, happier customers, and human call center agents who are free to handle the more complex and demanding interactions. But what counts as good, and what is good enough for this technology? And, perhaps more importantly, how is it achievable?

The past five years have seen huge leaps forward in the quality of speech recognition technology, with an ever-growing proliferation of voice/computer interfaces for phones, cars and smart home assistants. Similarly, in the call and contact center environment, AI voice agents have significantly narrowed the gap between them and their human counterparts – but there is still room for improvement. How do call and contact centers ensure they are deploying the best possible software for the job, and what is the key to delivering the highest levels of accuracy in this scenario?

It is widely accepted that while the supposed holy grail of 100% recognition accuracy is realistically unobtainable, levels of around 85% are typically seen as the benchmark for acceptability. The industry has now reached this level, but improving beyond it is both incremental and time-consuming. As with many software and AI development challenges, the 80/20 rule applies: reaching 80% accuracy takes significant effort, and each percentage point beyond that level is harder to gain than the last.
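Percentages like these are commonly derived from the word error rate (WER): the number of word substitutions, deletions and insertions needed to turn the recognizer's output into the correct transcript, divided by the number of words in the correct transcript. Word accuracy is then 1 minus WER. The sketch below assumes that convention, which this post does not itself specify; the function names and the sample sentences are illustrative only and are not part of any Verbio product.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """1 - WER. Can fall below zero when the output has many extra words."""
    return 1.0 - word_error_rate(reference, hypothesis)


# Hypothetical example: one wrong word in an eight-word utterance.
reference = "please check the balance on my savings account"
hypothesis = "please check the balance on my saving account"
print(f"word accuracy: {word_accuracy(reference, hypothesis):.3f}")
```

In this made-up example a single misrecognized word out of eight gives a word accuracy of 0.875, which shows how quickly small errors add up on short utterances such as account numbers or product names.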

Specialists over Generalists

While a number of generic speech recognition engines can be bought off the shelf, they do not deliver the same quality and accuracy as solutions tailored to specific industries and scenarios. Commercially available products may be suitable for use in the home to switch off lights, play music, or deliver the news, but in more challenging and complex environments they are simply not up to the task. In the call center market in which Verbio operates, it’s fair to say that not all speech recognition engines are created equal.

The 80% accuracy offered by generic systems may be considered adequate for some applications, such as home voice assistants, where users ask a broad range of questions but the interactions are relatively simple rather than complex conversations. However, this level is simply not good enough in a commercial call center environment, where any reduction in accuracy will result in significantly reduced ROI and customer satisfaction.

The key lies in adopting specialized solutions tailored to the unique environment of the call center, which presents a number of challenges. These include:


· Acoustics – conversations held over a telephone bring technical complications in terms of background noise and audio quality.


· Conversational style – people interacting with a call center may well be stressed. They may shout, repeat themselves, use conversational language, speak fast and interrupt – all of which can represent a significant challenge for transcription solutions.


· Silence – invariably there will be times when the caller has to find some specific information, such as a customer ID number, a bank card or a receipt, prompting a pause in the conversation. How do you transcribe the sound of silence? Understanding when silence is likely to occur is vital to avoiding dropped calls or unnecessarily repeated questions, which risk amplifying a customer’s frustration.


· Technical/bespoke language – brand and product names or technical language related to specific domains, such as financial services, can all cause confusion and inaccuracy unless the engine has been customized and its vocabulary built on years of domain-specific experience.

At Verbio, our solution is specifically tailored to this environment and has been built from the ground up to mitigate these difficulties. The two keys to delivering this are our engine and the data we have used to train our model.

We are currently on the 4th generation of our engine, which has been honed and developed through more than 20 years of market experience. It has evolved over time through the use of machine learning, with each generation improving on the last. It is continually being enhanced, not only through improvements to the underlying technology and algorithms, but also through the use of more market-specific data, which we have worked with our customers to collate.

But this is still only part of the equation. The reality is that a state-of-the-art engine with no data to learn from gives you nothing. We also work with our customers to tailor our solution to their requirements prior to deployment, ensuring that the engine is familiar with a range of industry- and customer-specific terms and phrases, which further enhances its performance. The result is unparalleled accuracy for this environment, delivering the best ROI for our customers and a great experience for their customers who are interacting with their call centers.

To learn more about the accuracy of our solution and how we can help improve the efficiency of your call and contact center, get in touch with us at marketing@verbio.com

