Novel Inputs and Safety-Critical Failures in ML/AI/DL

Machine Learning (ML), Artificial Intelligence (AI), and Deep Learning (DL) have moved from the research end of embedded systems development into real-world applications. These methodologies are being used to solve problems that traditional control algorithms either cannot solve or can solve only at a high computational cost. Adopting a new technology means understanding both its advantages and its disadvantages. Today I want to talk about the novel input problem and how it can result in critical system faults.

Name that tune

ML/AI/DL systems have made huge strides in their ability to classify inputs into reasoned guesses. Google recently announced a “name that tune” feature: you provide a snippet of a song and Google tries to name the tune. In my 5 test cases it got 4 out of 5 right. The one it missed was John Cage’s 4’33”. It got that one wrong because 4’33” is a piece with no sound, something that falls outside all of the classification categories.

Conceptually understanding a black box

ML/AI/DL systems are notoriously black boxes; that is, what goes on inside is not known, nor, in any functional sense, knowable. But there is a way of thinking about them that can help with the design of the complete control system: ML/AI/DL systems can be thought of as “self-taught” expert systems.

They “ask” themselves a series of questions, and each question narrows the probabilities that lead to the final “answer.”
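As a rough illustration, that chain of “questions” can be pictured as a series of narrowing classifications, each with its own confidence. The labels and probability values in this C sketch are invented, echoing the hot-dog example in the list that follows:

    #include <stdio.h>

    /* Hypothetical result of one "question" the network asks itself:
     * a narrowing category and the confidence assigned to it. */
    struct stage_result {
        const char *label;
        double confidence;   /* 0.0 .. 1.0, made up for the sketch */
    };

    int main(void)
    {
        /* Illustrative chain from general to specific: food -> sausage -> hot dog. */
        struct stage_result chain[] = {
            { "it is food",              0.97 },
            { "it is a type of sausage", 0.88 },
            { "it is a hot dog",         0.74 },
        };

        double overall = 1.0;
        for (size_t i = 0; i < sizeof chain / sizeof chain[0]; i++) {
            overall *= chain[i].confidence;
            printf("%-26s confidence %.2f (cumulative %.2f)\n",
                   chain[i].label, chain[i].confidence, overall);
        }
        return 0;
    }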

The ideal system works like this:

  1. I know what this is: It is a hot dog!
  2. I know the type of thing it is: It is a type of sausage
  3. I know the general class of things: it is food
  4. I don’t know what it is: but I think it is safe
  5. I don’t know what it is: it may not be safe

Depending on the type of system you are designing, you may need to reject any result that comes back at level 3 or lower in the list.
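A minimal sketch of that gating logic in C (the enum names, the numbering, and the accept/reject policy are assumptions for illustration, not any particular library’s API):

    /* Confidence tiers matching the five-level list above.
     * Lower values mean the classifier knows more about the input. */
    enum classification_level {
        LEVEL_KNOWN_EXACT = 1,      /* "It is a hot dog!" */
        LEVEL_KNOWN_TYPE,           /* "It is a type of sausage" */
        LEVEL_KNOWN_CLASS,          /* "It is food" */
        LEVEL_UNKNOWN_LIKELY_SAFE,  /* "I don't know, but I think it is safe" */
        LEVEL_UNKNOWN_MAYBE_UNSAFE  /* "I don't know, it may not be safe" */
    };

    /* Hypothetical policy: results at level 3 or lower in the list are
     * rejected and the controller falls back to a conventional algorithm. */
    int result_is_usable(enum classification_level level)
    {
        return level < LEVEL_KNOWN_CLASS;  /* accept only levels 1 and 2 */
    }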

I don’t know: it may not be safe

Crying wolf, that is to say alerting when there is no good reason to alert, is the curse of all ML/AI/DL systems. At the same time, because ML/AI/DL systems are often used in safety-critical systems, they need a better-safe-than-sorry approach to input response. The first answer? A graduated response.

In most cases the warning can be responded to in stages. First, “slow down” the rate of the event happening and use that time to re-evaluate the alert. If conditions continue to merit it, then “stop.”
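As a sketch, the graduated response can be modeled as a small state machine; the state names, the single alert flag, and the recovery assumption are all invented for illustration:

    #include <stdbool.h>

    enum response_state { STATE_NORMAL, STATE_SLOW_DOWN, STATE_STOPPED };

    /* Hypothetical step function, called each control cycle with the
     * latest alert flag from the classifier. */
    enum response_state respond(enum response_state current, bool alert_active)
    {
        switch (current) {
        case STATE_NORMAL:
            /* First stage: slow the rate of the event and buy time. */
            return alert_active ? STATE_SLOW_DOWN : STATE_NORMAL;

        case STATE_SLOW_DOWN:
            /* Re-evaluate: if the alert persists, escalate to a stop;
             * if it clears, return to normal operation. */
            return alert_active ? STATE_STOPPED : STATE_NORMAL;

        case STATE_STOPPED:
        default:
            /* A stop is held until a separate recovery path clears it. */
            return STATE_STOPPED;
        }
    }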

The second approach? Stop. In a safety-critical system, “fail safe” should always be prioritized over continued operation. Ideally the data gathered can be fed into a DevOps workflow so that in the future the “correct” response can be followed.
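A minimal sketch of that fail-safe path, with the actuator action and the logging hook left as hypothetical placeholders for whatever the real system and DevOps pipeline provide:

    #include <stdio.h>

    /* Hypothetical hook: command the safe state. In a real system this
     * might cut actuator power, apply brakes, or open a relay. */
    void enter_fail_safe(void)
    {
        puts("entering fail-safe state");
    }

    /* Hypothetical hook: record the input that triggered the stop so it
     * can be reviewed offline and, if appropriate, fed back into training.
     * A real system would write to persistent storage or telemetry, not stdout. */
    void log_novel_input(const float *features, size_t count)
    {
        printf("novel input captured (%zu features) for offline review\n", count);
        (void)features;
    }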
