Teaching Robots to See: RPA and Machine Vision

In a previous blog post, we talked about how you can teach your robots to read: by using AI-powered Natural Language Processing (NLP), they can ingest unstructured and semi-structured data such as emails and invoices. In this post we will talk about another sense we can give to our robots: sight.

Firstly, you might be asking why you would want software robots to be able to see anything at all. Isn’t that the domain of physical robots and driverless cars? Well, RPA can be really helpful when working with ‘machine vision’ systems to do things like classify photographs (to look for potentially abusive material, for example), read hand-written text, recognise faces, or identify potentially diseased organs in X-rays or MRI images. The AI does the intelligent part (more on which shortly), but the RPA can feed in the images and then take appropriate action based on the algorithm’s output.

Let’s take a (non-technical) look at how the AI actually works, with a simple example: recognising whether a picture is of a dog or not. This, like most machine-vision use cases, is an example of ‘supervised learning’, which means that the AI has been trained on sample images, all of which have been appropriately labelled (usually by humans, although there are examples of AI systems labelling images for other AI systems). In supervised learning, the AI must learn the characteristic features of a dog (the shapes, the edges that define things like ears, the colours, and so on) from a large number of images it has been fed, all of which have been labelled ‘dog’. (Remember, though, that the AI doesn’t actually understand what a dog is, just which patterns of pixels in a photograph most closely match the label ‘dog’.) To be trained well, the AI will also need to be shown lots of pictures without dogs in them, all of which have been labelled ‘not dog’.
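To make the idea a little more concrete, here is a minimal, illustrative sketch (not production code) of how such a ‘dog’ / ‘not dog’ classifier might be trained in Python with Keras. The folder name training_images/, the image size and the network layers are all assumptions made for the example, not a specific product’s setup.

```python
# A minimal supervised-learning sketch: train a small image classifier on
# images already labelled by humans. Paths, sizes and layers are illustrative.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Load labelled training images from a hypothetical "training_images/" folder,
# where each sub-folder name ("dog", "not_dog") acts as the label.
train_data = keras.utils.image_dataset_from_directory(
    "training_images/",
    label_mode="binary",        # two classes: dog / not dog
    image_size=(180, 180),
    batch_size=32,
)

# A small convolutional network: early layers pick up low-level features
# (edges, colours) and later layers combine them into shapes such as ears.
model = keras.Sequential([
    layers.Rescaling(1.0 / 255),
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # probability of the "dog" label
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(train_data, epochs=5)
```

The point of the sketch is simply that the model never sees a definition of ‘dog’; it only sees labelled examples and adjusts itself until its predictions match those labels.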

In most cases, the AI will need thousands, if not tens of thousands, of labelled images if it is to be accurate enough. Just getting this volume of pictures can be a huge challenge for some systems, which is why the labelling process is often crowd-sourced. Some complex cases, such as medical imaging, may actually require millions of images to reach the level of accuracy required. And this is made even more challenging because, ideally, the AI needs a balanced set of images (for example, roughly equal numbers of diseased and non-diseased scans), which is not always possible.
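As an illustration of the balance problem, the small Python sketch below counts how many examples each label has and derives inverse-frequency class weights, one common way to compensate when a perfectly balanced set isn’t available. The labels list here is purely hypothetical.

```python
# An illustrative check of how balanced a labelled dataset is, and how class
# weights can compensate when it is not. The labels are made-up examples.
from collections import Counter

labels = ["diseased", "not_diseased", "not_diseased", "not_diseased"]

counts = Counter(labels)
total = sum(counts.values())
print(counts)  # Counter({'not_diseased': 3, 'diseased': 1})

# Inverse-frequency class weights: rarer classes get a larger weight, so the
# model is penalised more heavily for getting them wrong during training.
class_weights = {label: total / (len(counts) * count) for label, count in counts.items()}
print(class_weights)  # {'diseased': 2.0, 'not_diseased': 0.67}
```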

Once trained, the AI model can be embedded in an automated process. The leading RPA vendors, including UiPath, make this integration pretty seamless. The key, of course, is to ensure that the automation is appropriate for the task being carried out. For example, recognising whether a parcel has been damaged so that it can be processed for a claim is straightforward and relatively low risk, but identifying potentially cancerous shadows on X-rays needs to be carefully managed. We don’t want the RPA to be automatically sending out emails to patients based on the output of the AI model. In that case we need to understand the potential levels of false positives (the machine says there is cancer when there isn’t) and false negatives (the machine says there isn’t cancer when there is, which can be much more damaging), and have humans in the loop to capture those results that could be borderline or distressing for patients.
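One common way to keep humans in the loop is to have the automation act only on confident predictions and route anything borderline to a person. The sketch below is a hedged illustration of that idea; the thresholds and the routing outcomes are hypothetical assumptions, not a specific vendor’s API.

```python
# A hedged sketch of confidence-based routing in an RPA flow.
# Thresholds (0.2 / 0.9) and outcome names are illustrative assumptions.
def route_prediction(probability: float, low: float = 0.2, high: float = 0.9) -> str:
    """Decide what the robot should do with a model's confidence score."""
    if probability >= high:
        return "auto_process"   # confident positive: safe to automate
    if probability <= low:
        return "auto_close"     # confident negative: safe to automate
    return "human_review"       # borderline: a person must check it

# Example: a clearly damaged parcel at 0.95 can be processed automatically,
# while a borderline medical result at 0.60 is queued for a clinician.
print(route_prediction(0.95))  # auto_process
print(route_prediction(0.60))  # human_review
```

The design choice here is deliberate: the robot only automates the easy ends of the confidence scale, and everything in the middle is escalated rather than guessed at.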

Teaching your robots to see can therefore add a valuable capability to your automation environment, if used correctly and appropriately. To find out more about how machine vision could help your organisation, get in touch with us here at Roboyo.