One of the key metrics for test-suite development is code coverage, e.g., MC/DC, range, and address coverage. There are two questions that need to be asked when considering coverage. First, is it mathematically and logically possible for the full range to be reached? Second, do your test cases actually cover the full range?
Dead logic
“Dead logic” refers to paths within your model that can never be reached, either because an if/else condition is always true or always false, or because a while or for loop construct does not terminate. When dead logic is detected, examining it should be the first priority, as it may be due to a design error.
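As a minimal illustration (in C, with hypothetical names), the second check below is dead logic: the value has already been clamped, so the condition can never be true and the branch is unreachable.

```c
#include <stdint.h>

/* Hypothetical sketch: the second check is dead logic. Because
   'duty' is clamped to 0..100 first, 'duty > 100' below can never
   be true, so that branch is unreachable. */
uint8_t clamp_duty(uint8_t duty)
{
    if (duty > 100) {
        duty = 100;        /* clamp to the valid range */
    }
    if (duty > 100) {      /* always false: dead logic */
        return 0;          /* unreachable */
    }
    return duty;
}
```

A coverage tool will flag the unreachable branch; the question for the reviewer is whether the redundant check is harmless leftover code or a sign that the clamp above it was not the intended design.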
You can get there, but do you?
Assuming you can demonstrate that the code is fully reachable, the next step is to look at your code coverage with your requirements-based tests. Your requirements should dictate what the algorithm does; therefore, if you are validating all of your requirements, you should see 85% or greater code coverage. Note that there is always some “helper” code that may not be exercised by the requirements-based tests, hence the 85% floor.
If your coverage metric is lower than 85%, you need to determine whether you:
Are not fully testing the requirements: add test cases to cover the missing parts.
Have introduced additional functionality beyond the requirements: either repartition the code or add the missing requirements, along with requirement tests, to the module.
The process of validating your control algorithm requires the creation of a virtual reality, one that must be good enough to convince your controller that it is operating in the real world. How do you know if your simulation is good enough to “fool” the controller?
How do you know if you are living in a simulation?
Designing the simulated environment for your control algorithm starts by defining the operating conditions in which the program will run. You need to include all of the operating conditions that the system may encounter while excluding the unreachable domains. For example, with a wind turbine you would want to model wind gusts up to 45 mph but exclude higher rates as the turbine would be shut down for safety reasons for speeds above that.
The next question is accuracy: at what level of change in the input values is there a perceptible change in the real-world effects? In some cases a 1-degree change in temperature is trivial (e.g., a combustion engine), while in others it is significant (a home heating and cooling system). Select the resolution to match the requirements.
Next up is consistency. If we go back to the temperature resolution, think about the other environmental variables associated with it, e.g., pressure: PV = nRT. If the resolution of the pressure does not match the resolution of the temperature, there can be negative impacts on the control algorithm.
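As a concrete sketch of the consistency point: PV = nRT ties a temperature step to a pressure step, so the helper below (function name, parameters, and values are illustrative, not from the original text) computes how fine the simulated pressure resolution must be to keep up with a given temperature resolution.

```c
/* Illustrative sketch: from PV = nRT, a temperature step dT implies a
   pressure step of (n*R/V) * dT. The simulated pressure resolution
   should be at least this fine to stay consistent with dT.
   Function name and parameters are hypothetical. */
double pressure_step_for_temp_step(double n_mol, double volume_m3, double dT)
{
    const double R = 8.314;              /* gas constant, J/(mol*K) */
    return (n_mol * R / volume_m3) * dT; /* pressure step in pascals */
}
```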
Finally, the update rate: how often are the simulated environment variables updated? The simulated variables should be updated at a rate consistent with the behavior of the real world.
The condition
There is one conditional here: if a general-use simulation is being created, the design needs to take into account the operating needs of all the target systems. For ease of testing, cross-controller development should be limited to ensure the highest-quality simulation and testing environment.
Small, smart habits in modeling add up; while a local efficiency may not seem like much, when compounded over the full design, such efficiencies result in surprisingly large improvements.
Data types, bit field, enums & #defines
Selecting a data type for your variable directly impacts the amount of memory required by your program. In general, you should select the smallest data type that covers the variable's range. But that is just the first step in saving memory.
When you have multiple bits of on/off state data, packing the data into a single variable will save memory. For example, if you have 5 on/off variables, you could either use 5 Booleans (5 × 8 bits = 40 bits, i.e., 5 bytes) or pack them into a single unsigned 8-bit integer (1 byte).
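A minimal C sketch of this packing (the flag names are hypothetical):

```c
#include <stdint.h>
#include <stdbool.h>

/* Five on/off states packed into one byte instead of five Booleans.
   Flag names are illustrative. */
enum {
    FLAG_HEATER = 1u << 0,
    FLAG_FAN    = 1u << 1,
    FLAG_PUMP   = 1u << 2,
    FLAG_LIGHT  = 1u << 3,
    FLAG_ALARM  = 1u << 4
};

static inline void set_flag(uint8_t *flags, uint8_t mask, bool on)
{
    if (on) *flags |= mask;             /* turn the bit on */
    else    *flags &= (uint8_t)~mask;   /* turn the bit off */
}

static inline bool get_flag(uint8_t flags, uint8_t mask)
{
    return (flags & mask) != 0;
}
```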
The final tip has to do with constant values. If the data needs to be tunable, then it should be declared as a parameter. However, for non-tunable constants, #defines or enumerated data types should be used; they provide the same “non-literal” implementation while being inlined in the generated code.
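A short C sketch of the distinction (names and values are illustrative assumptions):

```c
/* Tunable: stored in memory so it can be adjusted at calibration time. */
volatile float Kp_gain = 0.5f;

/* Non-tunable: inlined at compile time, no dedicated storage. */
#define MAX_RETRIES 3u

/* Enumerations give named, non-literal constants, also inlined. */
enum OperatingMode { MODE_IDLE = 0, MODE_RUN = 1, MODE_FAULT = 2 };
```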
Exit early
When creating if-elseif-else logic, order the branches by likelihood of occurrence (i.e., put the most likely first). This avoids performing unneeded comparisons on the majority of passes.
Additionally, consider inlining a calculation into the if/elseif logic if its result is only used by a subset of the branches.
resOf = highlyComplexMatheMaticalFunction();
if (lightIsOn)
    ...
elseif (lightIsOff)
    ...
elseif (resOf)
    ...
end
The “highlyComplexMatheMaticalFunction” in this example is only used in one branch, so calculating it on every path is wasted effort. In contrast, if you have a complex calculation that is used by multiple branches, you should consider pre-calculating the result.
The final suggestion here is to provide “early exits.” If you have if/then/else logic or for/while loops, consider adding a break once the desired result is reached, rather than running the loop or logic to completion.
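A small C sketch of the early-exit idea (the helper name is hypothetical):

```c
#include <stddef.h>

/* Hypothetical helper: return the index of the first sample over the
   limit, exiting the loop as soon as the answer is known. */
int find_first_over_limit(const int *samples, size_t n, int limit)
{
    for (size_t i = 0; i < n; ++i) {
        if (samples[i] > limit) {
            return (int)i;  /* early exit: the rest need not be checked */
        }
    }
    return -1;              /* no sample exceeded the limit */
}
```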
Resample your data
Real-world data is messy: often sampled at inconsistent intervals, and both over- and under-sampled at some points in the domain. For table data, resample your data into uniform intervals that cover your full range while maintaining the required accuracy. If sections of the data are “flat,” consider widening the intervals in those sections while narrowing them in sections with rapid changes.
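One common way to resample onto a uniform grid is linear interpolation between the original breakpoints; the helper below is a sketch that assumes the breakpoints are sorted ascending and clamps queries outside the range.

```c
#include <stddef.h>

/* Sketch: look up a value from irregularly spaced (x, y) breakpoints
   using linear interpolation. Assumes x is sorted ascending; queries
   below x[0] or above x[n-1] are clamped to the end values. */
double interp_linear(const double *x, const double *y, size_t n, double xq)
{
    if (xq <= x[0])
        return y[0];                      /* clamp at the left edge */
    for (size_t i = 1; i < n; ++i) {
        if (xq <= x[i]) {
            double t = (xq - x[i - 1]) / (x[i] - x[i - 1]);
            return y[i - 1] + t * (y[i] - y[i - 1]);
        }
    }
    return y[n - 1];                      /* clamp at the right edge */
}
```

Calling this at evenly spaced query points produces the uniform table; the interval width in each section is then chosen to meet the accuracy requirement described above.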
Error detection is not a halting problem,(1) which is to say that we can halt before crashing.(2) The objective of error detection is to determine operating regimes where continued operation presents:
a risk to the operator
a risk to those in the environment
a risk to the device
When those conditions are detected, the device should provide feedback to the user and/or migrate into a different operating mode.
How do you know when faults are possible?
There are three primary ways in which errors can be predicted:
Known “dangerous” operating conditions: (fever condition) In some cases the fault condition is known: when your temperature goes above X, you have a fever and should seek treatment. In the same fashion, for most devices there are known conditions where continued operation is a risk.
Interpolated error: (crystal ball) In some cases, forward prediction from the current conditions can indicate that the device will, at some near point in time, enter a known dangerous condition. In this case, a combination of how far off the condition is and how long you have been trending toward it should inform when the alert is raised.
Regression/ML/AI error: (history book) A statistical approach to predictive maintenance and error detection can be taken, e.g., when conditions that historically have led to a fault are detected, the alert can be raised. This differs from the crystal ball in that the root cause may not be understood.
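The “crystal ball” approach above can be sketched numerically: estimate the rate of change from recent samples and project when a danger threshold will be crossed. The two-point slope and the function name below are illustrative assumptions, not a prescribed method.

```c
/* Sketch of the "crystal ball": estimate when a signal will cross a
   danger threshold from a simple two-point trend. Returns 0 if the
   threshold is already reached, -1 if the signal is not trending up. */
double time_to_threshold(double prev, double curr, double dt, double threshold)
{
    if (curr >= threshold)
        return 0.0;                      /* already in the danger zone */
    double slope = (curr - prev) / dt;   /* rate of change per second */
    if (slope <= 0.0)
        return -1.0;                     /* not trending toward the threshold */
    return (threshold - curr) / slope;   /* predicted seconds to crossing */
}
```

How far the predicted crossing is, and how long the trend has persisted, together determine when to raise the alert.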
I know, but…(3)
How to respond
How you respond should depend on the severity of the fault and its timeline. The higher the severity of the fault, the stronger the response; the closer the fault is to “activating,” the faster the response.
Low Alert
The lowest level of alert is the diagnostic message. For a low-impact, delayed incident, this is acceptable. However, the end user may ignore it, so…
Medium Alert
For a mid-level response, the device may reduce performance. For example, adaptive cruise control could apply braking if the vehicle in front is too close.
High Alert
At the high end of things (high alert), the device should deploy / respond / act in such a way as to protect the users and the people in the environment.
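One way to wire the three alert levels together is a simple severity-to-action mapping; the enums and function below are a hypothetical sketch, not a prescribed design.

```c
/* Hypothetical sketch: map alert severity to a response action. */
typedef enum { ALERT_LOW, ALERT_MEDIUM, ALERT_HIGH } alert_level_t;
typedef enum { ACT_LOG_MESSAGE, ACT_REDUCE_PERFORMANCE, ACT_SAFE_STATE } action_t;

action_t respond(alert_level_t level)
{
    switch (level) {
    case ALERT_LOW:    return ACT_LOG_MESSAGE;         /* diagnostic message  */
    case ALERT_MEDIUM: return ACT_REDUCE_PERFORMANCE;  /* e.g., derate, brake */
    case ALERT_HIGH:   return ACT_SAFE_STATE;          /* protect people first */
    default:           return ACT_SAFE_STATE;          /* fail safe on unknowns */
    }
}
```

Defaulting unknown levels to the safe state reflects the footnoted point below: it is better to error out gracefully than to continue operating at risk.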
Better late than never; better early than late
As the section above shows, as the response level increases, the recovery becomes more significant. By the time you hit a high alert, repairing the device is a far greater expense than responding to a simple “check engine” light.(4) Use the three prediction methods and you(5) can prevent faults.
Footnotes
I was surprised that there are no good engineering cartoons on the halting problem; perhaps they haven’t been finished yet.
Many engineers treat the perfect as the enemy of the good when it comes to error detection and fault handling. It is always better to error out gracefully than to continue operating at risk.
I know this is a fault screen for the small backhoe of a Bobcat, but I like to imagine this is a nature documentary following around a wildcat (i.e., some of you may remember the early ’90s commercials featuring a live bobcat).
The migration from Adaptive cruise control to air bags is clear; if you brake, you don’t crash.
In this case the “you” is you and your whole engineering team.