Polar bears in a snowstorm discussing philosophy (e.g. black box testing for DL/ML systems)

A recent article announced “Deep Learning Machine Beats Humans in IQ Test.” It is an interesting read, but it raises a question that has been on many minds these last few years: how do you test a device that is an unbounded black box?(1)

Testing DL/ML programs is a unique challenge. Unlike conventional software, the system under test has no viewable code base to inspect (e.g. you can’t inspect for overflow, underflow, or pointer errors). So what do you test?

The first question is “what are you protecting against?” This is not simply the inverse of what you are trying to achieve. Let’s take a hypothetical case: you have developed an ML/DL system that controls a robot to give full-body massages.(2)

The objective of the robot is to relieve pain and tension in the body. What we are protecting against is damage to the body, such as bruising or muscle tears.

Another way of thinking about this is the “law of unintended consequences.” Reading Asimov’s Robot stories, you see that the Three Laws were not enough to keep things from going off the rails. So what can we do?

Fencing the problem

In some cases the solution is simple: put an observer in place and fence off the dangerous areas. In our massaging robot example, the “fence” could include limits such as (a code sketch follows the list):

  • Maximum force
  • Maximum compression
  • Maximum speed
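
As a minimal sketch, here is one way such a fence might sit between the learned model and the robot as a hard runtime guard. The command fields and limit values are illustrative assumptions, not measured safety numbers.

    from dataclasses import dataclass

    @dataclass
    class MassageCommand:
        force_newtons: float
        compression_mm: float
        speed_mm_per_s: float

    # Illustrative limits only; real values would come from safety engineering.
    MAX_FORCE_N = 40.0
    MAX_COMPRESSION_MM = 25.0
    MAX_SPEED_MM_S = 50.0

    def fence(cmd: MassageCommand) -> MassageCommand:
        """Clamp any model output to the safe envelope before it reaches the robot."""
        return MassageCommand(
            force_newtons=min(cmd.force_newtons, MAX_FORCE_N),
            compression_mm=min(cmd.compression_mm, MAX_COMPRESSION_MM),
            speed_mm_per_s=min(cmd.speed_mm_per_s, MAX_SPEED_MM_S),
        )

The design point is that the fence lives outside the learned model, so it holds no matter what the black box decides.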

In some cases fencing isn’t a solution, so how do we remain “en garde”?

Black swans

There is no way to dance around this one; AI/ML/DL systems in the real world have to deal with real world problems. That is to say they need to “expect the unexpected.”(3)

AI/ML/DL systems work from data models; any event that falls outside the data model is still responded to, but it is interpreted as something the system already knows. Since it is not possible to know all the unknowns up front, the question becomes “are there meta-unknowns?” Can you come up with classes of things you have not trained for, and rules for how to respond to them?

Sticking with our massaging robot, someone with scar tissue from surgery may have a different underlying musculature layout. This is a different problem from someone who has a temporary strain that can be corrected.
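
One way to sketch a meta-unknown rule in code: score how far an input sits from anything seen in training and, past a threshold, fall back to a cautious default rather than trusting the model. The z-score check below is a stand-in for whatever out-of-distribution measure a real system would use; the statistics, threshold, and fallback action are illustrative assumptions.

    import numpy as np

    # Training-set statistics (assumed precomputed elsewhere).
    TRAIN_MEAN = np.zeros(16)
    TRAIN_STD = np.ones(16)
    OOD_THRESHOLD = 4.0  # illustrative: ~4 standard deviations from training data

    def is_black_swan(features: np.ndarray) -> bool:
        """Flag inputs that sit far from anything the model was trained on."""
        z = np.abs((features - TRAIN_MEAN) / TRAIN_STD)
        return float(z.max()) > OOD_THRESHOLD

    def respond(features: np.ndarray, model):
        if is_black_swan(features):
            return "pause_and_ask_operator"  # the meta-unknown rule
        return model.predict(features)       # the trained path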

Fail Safe

To “fail safe” we need to first know what “safe” is. Fail-safe for an engine while you are parked would be to shut down, whereas while driving it may be to reduce power. AI/ML/DL systems can employ scenario-based fail-safe approaches to determine the correct action when a black-swan or fence condition occurs.
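
A minimal sketch of what scenario-based fail-safe could look like, returning to the massaging robot; the scenarios and actions here are illustrative assumptions, not a prescribed safety design.

    from enum import Enum, auto

    class Scenario(Enum):
        IDLE = auto()        # robot parked, no contact with the body
        IN_CONTACT = auto()  # actively massaging
        TRANSIT = auto()     # arm moving between positions

    # What "safe" means depends on the scenario, just as it does for the engine.
    FAIL_SAFE_ACTION = {
        Scenario.IDLE: "power_down",
        Scenario.IN_CONTACT: "retract_slowly_then_stop",
        Scenario.TRANSIT: "freeze_in_place",
    }

    def fail_safe(scenario: Scenario) -> str:
        """Pick the action to take when a fence or black-swan condition fires."""
        return FAIL_SAFE_ACTION[scenario]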

0.3048 Meter notes(4)

  1. I’m describing this as an unbounded black box since the actual range of inputs cannot be defined as in a traditional system. For example, you could say a voltage signal goes from 0 to 42 volts, but how do you define the information encoded in an image when you don’t know what the system has determined to be salient features?
  2. I’m not saying that sitting at a desk much of the day has made me interested in creating such a machine but…
  3. For many years, “No one expects the Spanish Inquisition” was a fine joke; however, given time, everyone expects the Spanish Inquisition and it is no longer an edge case.
  4. 1 foot is 0.3048 meters, hence these can be translated into “foot” notes.
