At Google’s developer conference in mid-May, the company made lots of announcements, but it was one demo that got everyone talking: the one where a new AI system, dubbed Duplex, makes calls to restaurants and hairdressers to make bookings on behalf of its human masters. The apparent ‘magic’ in the demo is the way Duplex uses intonation and speech disfluencies (such as ‘um’ and ‘ah’) and is able to cope with a very natural conversational style (the person taking the booking doesn’t know they are talking to an AI). As Google say in their blog article about Duplex: “When people talk to each other, they use more complex sentences than when talking to computers. They often correct themselves mid-sentence, are more verbose than necessary, or omit words and rely on context instead; they also express a wide range of intents, sometimes in the same sentence.” All of that makes it very tricky for an AI to deal with, but, based on the demo, Google seem to have cracked it.
It is worth looking into the technology a little deeper, because, as good, cynical citizens, we shouldn’t be taking these sorts of demonstrations at face value. The system, like many AI solutions, uses a number of different AI capabilities to achieve its results: the inputs (the words spoken by, for example, the person at the restaurant taking the booking) are initially processed by an Automatic Speech Recognition engine to turn those sounds into words. The words are combined with other information such as context and are fed into a Machine Learning model called a Recurrent Neural Network (RNN). This RNN has been trained on a large number of anonymised telephone conversations around each specific task so that it is able to ‘understand’ the intent of the words and generate an appropriate response. A Text To Speech engine then ‘reads aloud’ the response. It is important to note that the Duplex system is very specific to its trained task – there will be one RNN for making restaurant bookings, and another to make hair appointments, for example. Some of the training is shared across the different use cases, but much of it has to be specific to the context to work effectively.
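To make that pipeline concrete, here is a minimal sketch of one conversational turn in Python. Everything in it is a stand-in: the function names are hypothetical (Google has not published this API), the ‘models’ are trivial lookup tables, and the ‘audio’ is just a byte string. It only illustrates the shape of the loop described above, including the point that each task needs its own trained model.

```python
def speech_to_text(audio: bytes) -> str:
    """Stand-in for the Automatic Speech Recognition (ASR) stage."""
    # A real engine would decode spoken audio; here we pretend it is UTF-8 text.
    return audio.decode("utf-8")


def task_model(words: str, context: dict) -> str:
    """Stand-in for the task-specific RNN: maps words plus context to a reply.

    The per-task table mirrors the article's point that each use case
    (restaurant bookings, hair appointments) needs its own trained model.
    """
    replies = {
        "restaurant": "I'd like to book a table for four at 7pm, please.",
        "salon": "I'd like to book a haircut on Thursday, please.",
    }
    return replies[context["task"]]


def text_to_speech(text: str) -> bytes:
    """Stand-in for the Text To Speech stage."""
    return text.encode("utf-8")


def duplex_turn(incoming_audio: bytes, context: dict) -> bytes:
    """One conversational turn: ASR -> task-specific model -> TTS."""
    words = speech_to_text(incoming_audio)
    reply = task_model(words, context)
    return text_to_speech(reply)


# Example: the restaurant answers the phone and the system responds.
audio_in = "Hello, Joe's Bistro, how can I help?".encode("utf-8")
audio_out = duplex_turn(audio_in, {"task": "restaurant"})
print(audio_out.decode("utf-8"))
```

The structural point the sketch makes is the last one in the paragraph: swap the `"restaurant"` context for `"salon"` and you get an entirely different response path, because the ‘understanding’ lives in a per-task model rather than a general conversational one.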
So, what are the benefits of all of this? The main use case is aimed at service businesses that do not have online booking capabilities (these will generally be smaller establishments). It allows customers to make bookings ‘asynchronously’ i.e. the customer can request the booking from their home assistant when the business is shut and the booking will then be made in the background once it is open. It is also useful for making bookings in a foreign language or if you are hearing-impaired. We also think it could be used (or abused, depending on your viewpoint) for ‘data scraping’ – the system could call up hundreds or thousands of businesses to find, for example, their opening hours which could then be collated in a marketable database.
It should be obvious to anyone reading this that the above benefits are pretty limited, especially when compared to the amount of hype that was generated by the demo. Most restaurants, in Europe at least, have some sort of online booking capability, even if it is through a third-party service. And calling up your hairdresser is hardly an arduous task, especially if you have to instruct your home assistant to do it in the first place. And what if the salon doesn’t have a slot available? The back-and-forth between you, the home assistant and the salon has suddenly become much more complicated.
The constraint around the specificity of the use case also limits the benefits. Google will have to train and create a new model for each case, and, as with many current AI systems, users will quickly run into those limitations and may become frustrated. As Google freely admit, “it cannot carry out general conversations”, but this may be the actual expectation of many users, especially if the hype continues at the current pace.
There is also a deeper problem around transparency. Many people would feel uneasy if they realised they had been talking to a machine rather than a human. Others have gone further, calling it outright “deliberate deception”. Google have responded to this already and have promised to warn people who are called that they are speaking to a machine. But then that might mean that people dumb down their language, which rather defeats the object of the whole thing.
We should also remember the hype that came with the launch of Google’s Pixel Buds, which generally failed to live up to expectations, and, of course, Google Glass, which never even made it into mass production. But these sorts of demos, pushing the boundaries of what AI is capable of, do play an important role in the overall development and advancement of the technology. And if it does stimulate debate around transparency, then that’s much better to do now than when the machines are really running our lives for us.