State machines: At the C-level

Those of you who have read my blog for a while have seen my posts on the tool Stateflow by The MathWorks, such as the Tips for Readability and the Stateflow Best Practices document on which much of the MathWorks documentation is based. Today I wanted to do something different, dive into State machines at a basic level through a C code example

Starting Simple

We will start with a simple 3 State example, A, B & C, where A is the “Default State”, and there are valid transitions between all 3 states. Further we will treat the State machine function as a “Stateless” function, meaning that all data will be passed into the function.

First, the simplest representation of this in C for clarity of the example

The Statemachine function evaluates the held “current state” of the machine and, following an if / else if structure, flows down to the currently active state.

Once the active state is reached it first checks to see if this is the first time in the state (stateA.isFirst). If so, it performs “entry actions” and sets the entry time (in case there are temporal actions)

Next, it performs any actions associated with the active state, which could be a calculation or calling a sub-function.

Once all these actions are complete, the function checks for exits. If an exit condition exists, it updates the current state and last state variables.

The First Efficiency

The first optimisation is to think about “what is active most of the time. In the current example, we have only three top-level states, so a low number of evaluations are performed. But if at each level we had 10 states, then ordering the states in terms of the percentage time they are active makes sense.

E.g. if in our case

  • A : Active 20%
  • B : Active 70%
  • C : Active 10%

Then the If / Else If tree should be ordered B, A, C.

NOTE: The order of evaluation of the exit conditions should be set based on the precident of the exit, e.g. if two exits could be true at the same time the statemachine designer needs to determine which ‘goes first’

The Data and Children

In this first example we have just one level of states, no child states; this can be easily extended following our pattern.

First the data:

In our current implimentation a simple structure is used: For now we will expand this data structure with a small modification, in the next section we will consider memory optimizations.

Initial

The issue with this approach is fairly clear, while the pattern is easy to follow, you could end up with multiple “twoChild”, “threeChild”,…”nChild” structures following this pattern, further it does not support unique ‘local data’ constructs.

Updated

Child Patterns

The pattern used in the initial code is now extended to the child states

Cyclomatic complexity: Testing the Machine

A quick look a the code above shows how quickly the number of independent branches grow, leading to a high cyclomatic complexity score.

The solution is to limit the depth of the children: As a general rule of thumb, IF logic more then 3 levels deep quickly becomes difficult to maintain. This is done either by re-factoring the state logic into multiple machines, or creating sub-state machine functions that can be indepdendently validated.

Data Efficency versus Clarity

The final topic is on data efficency versus clarity; there are two issues with the current implimentation of the state structure

  1. Inactive data: Within any state structure only one “state and children” will be active at any time. This means that the data associated with all of the other states is not active and could be colapsed
  2. Data structure alignment: The “easy to read” version of the array structure can result in alignment issues, e.g. when you are switch between different sized variables

Solution 1: “Common Data”

There are 4 types of data in use

  1. Current State: This can be changed to a single array of size N, where N is the max depth of the state chart. This requires book keeping to know the “depth” of the child.
  2. Last State: Like current state this can be changed to a single array size N
  3. First time in: This can be changed to a uint32 and then using bit logic on the state number to determine if it is the first time in or not
  4. Local data: The maximum number of floats, doubles, integers used by the state machine can be determined and then a single instance is created.

With the exception of local data there is no impact on the clarity of the code. Because of this thelocal data should be fully commented.

Solution 2: Alignment:

When you follow the approach of #1 you can quickly perform the alignment tasks on the structure.

Safety is an Ecosystem: History Rhymes

The programming language RUST is in the news recently, with articles that are recursive(1)

  • RUST: We have learned lessons from C++. We address the fundamental memory thread safety issues!
    Back 15 years
  • C++: We have learned lessons from C. We have improved memory cleanup and dangling pointer issues!
    Back 30 years
  • C: We have learned lessons from FORTRAN. We have a cleaner, more efficient syntax with better memory management
    Back 45 years
  • FORTRAN: Dang, assembly is hard and not portable, let’s use a human-readable language
Languages: Beyond Embedded

The basic narrative is accurate, but it misses a larger point in software development. It takes an ecosystem to adopt languages.

Ecosystem building blocks

For a language to be adopted, there are 5 basic building blocks that need to be present and qualified(2)

  1. Standardized language syntax: The syntax of the language needs to be formalized and stable(3). All downstream tools are dependent on this first part.
  2. Qualified compilers: Compilers exist that are verified to translate the language into a ‘correct’ hardware implementation. Large-scale adoption requires support for multiple silicon targets
  3. Coding standards: In C / C++ this often means the MISRA guidelines. It represents the distilled ‘best practices for safety.
  4. Supporting tools: A collection of tools such as style guide checkers (for 3), code coverage tools, debuggers, testing infrastructure (e.g. xUnit…), OS integration…(4)

The Apex Predator: Experienced Developers

No matter how good the language or the tool chain is, until a critical mass of experienced developers exists to support development, the language will not take off. This is one of the reasons why new languages often start off with smaller projects that require smaller teams.

Languages as Invasive Species

The software development environment is a crowded field with multiple languages competing for use. Even within the embedded domain, where only a subset of modern languages are used, the competition is fierce. RUST has some good features, but only time will tell if it thrives or becomes a footnote in the history of embedded languages(5)

Footnotes

  1. As a side note: safety systems should avoid recursion
  2. The cost of qualifying tools is significant! Until a language builds up a critical “mass” of real-world use, companies and vendors will not invest in developing the tools.
  3. This is a frozen version of the language that companies can depend on not changing for on the order of ~3 to 5 years. Further, there is an expectation that future iterations of the language will be backward compatible.
  4. A ‘full’ set of supporting tools doesn’t need to be present, but until they are, it is difficult for companies to adopt a new language.
  5. The initial list of languages above is a small subset of what has been part of the embedded landscape. Dropping things such as ADA, OCCAM, COBALT, and many more.

Why not MVP your Process Adoption?

The core concept of the Minimum Viable Product (MVP) is to pilot a quick, low-investment proof-of-concept product to validate the behavior/needs/functionality with the end users. The work is intended to be iterative and potentially thrown away at the end of the evaluation cycle. The MVP is intended for ‘new’ processes or products

The best MVPs start with the mindset of

I know enough to create an initial design, but I don’t know enough to invest resources into the final product. I need to learn

Why We Need Processes: (Human)

Product development requires processes to ensure quality and consistency. Not all products require the same set of processes or rigor within a process. Misalignment of the required processes or rigor can result in low-quality products (missing processes, rigor too low) or wasted engineering effort (unneeded processes, rigor too high).

Both types of misalignment led to discontent among the developers. They can see problems even when they don’t understand the root cause.

Why People Follow Processes

People will follow processes if 4 conditions hold

  1. The process does not significantly slow down their ‘core’ work
  2. They see a near-term impact of following the process
  3. There is consistent support from management for the process
  4. The sum total of process work is seen as low.

The final point, ‘seen as low,’ is critical. If new processes are added too quickly, then the total volume of process work is seen as high. By having a ramp-up to processes, they become part of the background work and can be mastered before new processes are introduced.

One Size does not fit all

The pitfall of people working in a process/systems engineering role is the belief that what worked before will work now. (This is a pitfall in every field.) In many, if not most, cases, this will be true, but care still needs to be taken in ‘assigning’ processes. Which brings us to the thesis of this post.

Processes are “New”: The case for MVP Processes Adoption

Thinking about the objectives of an MVP and the resistance to Processes, it becomes clear that, with modifications, Process Adoption should be treated as an MVP

  • Incremental rollout:
    • New processes are tested out at a small scale to determine if they fit the needs.
    • Users of the process give feedback, validating the process and giving them buy-in
    • Prevents “process dump” burnout
  • Validation loop:
    • Bugs and issues in the process are discovered early, reducing re-work costs
  • Low-cost:
    • Developing, deploying, and documenting a process takes resources; by identifying issues early, sunk costs can be reduced

End Note: Why We Need Processes: (Legal)

There is a second reason we need processes: legal liability. In the case of an issue with the product, showing that the company developed it following industry best practices insulates them from repercussions. This is the truth, but this will never be what motivates developers to adopt processes. When we emphasize this as the reason to adopt, we get reluctant adoption.

Summer Break!

The blog is going on a summer break and will return on Monday August 9th. When the blog resumes, I look forward to examining..

  • Using Model-Based Design for scientific research: What are the fundamental principals behind models as an approach to experimentation.
  • Integrative safety critical design: Incorporating safety critical design philosophy into your key workflows.
  • The intersection of engineering and art: What is an “elegant” design and how do you learn to think in this fashion.

And much more to come…

To boldly go…

One of the key metrics for test suite development is code coverage: MCDC, range and address coverage. There are two questions that need to be asked when considering coverage. First, is it mathematically and logically possible for the full range to be reached? Second, with your test cases are you covering the full range?

Dead logic

“Dead logic” references paths within your model that can never be reached, either because the “if/else conditions” are always true or always false, or because a while or for loop construct does not terminate. When dead logic is detected this should be the first priority in examination as this may be due to a design error.

You can get there, but do you?

Assuming you can demonstrate that the code is fully reachable the next step is to look at your code coverage with your requirement-based tests. Your requirements should dictate what the algorithm does, therefore if you are validating all of your requirements you should see 85% or greater code coverage. Note that there is always some “helper” code which may not be exercised by the requirements-based tests, hence the 85% floor.

If your coverage metric is lower than 85% you need to determine if you are

  • Not fully testing the requirements: Add in additional test cases to cover the missing parts
  • Have introduced additional requirements into the functionality: Either repartition the code or add additional requirements and requirement tests to the module.

Your algorithm must never know…

The process of validating your control algorithm requires the creation of virtual realities, a reality that must be good enough to convince your controller that it is operating in the real world. How do you know if your simulation is good enough to “fool” the controller?

How do you know if you are living in a simulation?

Designing the simulated environment for your control algorithm starts by defining the operating conditions in which the program will run. You need to include all of the operating conditions that the system may encounter while excluding the unreachable domains. For example, with a wind turbine you would want to model wind gusts up to 45 mph but exclude higher rates as the turbine would be shut down for safety reasons for speeds above that.

The next question is accuracy; at one level of change in the input values is there a perceptible change in the real world effects? In some cases a 1-degree change in temperature is trivial (e.g., a combustion engine) while in others it is significant (a home heating and cooling system). Select the resolution to match the requirements.

Next up is consistency, if we go back to the temperature resolution think about the other environmental variables associated with it, e.g. pressure: PV = NRT. If the resolution on the Pressure does not match the resolution of the Temperature then there can negative impacts on the control algorithm.

Finally, update rate: how often are the simulated environment variables updated? The data associated with the simulated variables should be updated with a rate that is consistent with the behavior in the real world.

The condition

The one conditional here; if a general use simulation is being created then the design needs to take into account the operating needs of all the target systems. For the ease of testing, the cross controller development should be limited to insure the highest quality simulation and testing environment.

3 small tips for better performance

Small, smart habits in modeling add up; while a local efficiency may not seem like much, when compounded over the full design, they will result in interestingly large improvements.

Data types, bit field, enums & #defines

Selecting a data type for your variable directly impacts the amount of memory required by your program. In general you should select the smallest sized variable that covers the range of the variable. But that is just the first step in how to save memory.

When you have multiple bits of On/Off state data, packaging the data into a single variable will save memory. For example, if you have 5 On/Off variables you could either use 5 Booleans (5 * 8 = 40 bytes) or a single unsigned 8 bit integer.

The final tip has to do with constant values. If the data needs to be tuneable then the data needs to be declared as a parameter. However, for non-tunable parameters, #defines or enumerated data types should be used; they provide the same “non-literal” implementation while being inlined in the generated code.

Exit early

When creating if-elseif-else logic, order the “if-elseif-else” in order of likelihood of occurrence (e.g., put the most likely first). This prevents the need to do unneeded comparisons in n% of the time.

Additionally, consider inlining calculations into the if/elseif logic if they are only used by a subset of the comparison.

resOf = highlyComplexMatheMaticalFunction
if (lightIsOn)
elseif (lightIsOff)
elseif (resOf)
end

The “highlyComplexMatheMaticalFunction” in this example is only used in one place so the requirement to calculate it for all paths is wasted. In contrast, if you have a complex calculation that is used by multiple branches you should consider pre-calculating the results.

The final suggestion here is to provide “early exits.” If you have if/then else logic or for/while loops consider having a “break” command in the loop / logic if the desired results are reached prior to the completion of the loop.

Resample your data

Real-world data is messy, often sampled at inconsistent intervals and both over and under-sampled at some domain points. For table data, resample your data into uniform intervals that cover your full range while maintaining the required accuracy from the data. If sections of the data are “flat” consider increasing the intervals in that section while decreasing the intervals on sections with rapid changes.

Better safe than sorry: error prediction

Undecidability of the Halting Problem - YouTube

Error detection is not a halting problem,(1) which is to say that we can halt before crashing.(2) The objective of error detection is to determine operating regimes where

  • Continued operation presents
    • a risk to the operator
    • a risk to the those in the environment
    • a risk to the device

When those conditions are detected the device should provide feedback to the user and/or migrate into a different operating mode.

How do you know when faults are possible?

There are three primary ways in which errors can be predicted:

  1. Known “dangerous” operating conditions: (fever condition)
    In some cases the fault condition is known; when your temperature goes above X you have a fever and should get treatment. In the same fashion for most devices, there are known conditions where continued operations are a risk.
  2. Interpolated error: (crystal ball)
    In some cases, the current conditions with forward prediction can predict that the device will at some near point in time enter into a known dangerous condition. In this case a combination of how far off is the condition and how long have you been trending towards the condition should inform when the alert is raised.
  3. Regression/ML/AI error: (history book)
    A statistical approach to predictive maintenance and error detection can be taken e.g., when conditions that historically have led to a fault are detected, then the alert can be raised. This differs from the crystal ball in that the root cause may not be understood.

The Most Important Bobcat Fault Code List
I know but(3)

How to respond

Objects in the mirror are closer than they appear. - Far Side | Animated  cartoons, Funny cartoons, Cartoon

How you respond should depend on the fault severity and the timeline of the fault. The higher the severity of the fault the higher the response; the closer the fault is to “activating,” the faster the response.

Low Alert

The lowest level of alert is the diagnostic message. For a low impact, delayed incident, this is acceptable. However, the end user may ignore this so…

What is adaptive cruise control, and how does it work? - ExtremeTech

Medium Alert

For a mid-level response, the device may reduce the performance. For example, adaptive cruise control could apply braking if the vehicle in front is too close.

High Alert

At the final end of things (high alert), the device should deploy / respond / act in such a way to protect the users and the people in the environment.

Toyota, Honda recall over 60 lakh vehicles due to glitches in air bag

Better late than never; better early than late

As the section above shows, as the response increases the recovery is more significant. By the time you hit the high alert, repairing the device is a greater expense than a simple “check engine” light.”(4) Use the three prediction methods and you(5) can prevent faults.

Careful With Those Birthday Candles, Smokey: Beloved Bear Turns 75 | NPR  News | ideastream

Footnotes

  1. I was surprised that there are no good engineering cartoons on the halting problem; perhaps they haven’t been finished yet.
  2. For many engineers with error detection and fault handling, they treat the perfect as the enemy of the good. It is always better to error out gracefully than to continue operation at risk.
  3. I know this is a fault screen for the small backhoe of a BobCat, but I like to imagine this is a nature documentary following around a wildcat (i.e., some of you may remember the early ’90’s commercials featuring a live bobcat).
  4. The migration from Adaptive cruise control to air bags is clear; if you brake, you don’t crash.
  5. In this case the “you” is you and your whole engineering team.

Interpolation: Yes but Beware!

In the movie version of the book “Charlotte’s Web” there is a song “Zuckermans Famous Pig” which features the lyrics

Fine swine wish he was mine

What if he’s not so big

Seeing this cartoon while in graduate school(1) and while taking a numerical methods course led to the parody song

Fine Spline, coefficients are prime
What if it grows too big?


Interpolation: When to use it

There are three general categories when interpolation is used

  • Sampled real-world data does not cover the full range (regression case)
    If the interpolation covers points inside of the data set this is generally a “safe” scenario e.g., the interpolated data will be smooth within the range. If the interpolation goes outside of the data set then the values predicted by the interpolation should be checked against real-world expectations. In general, 10~15% beyond the sampled range (for smooth data) is reasonable to interpolate.
  • To reduce calculation cost (speed or memory)
    Some calculations are memory or FLOPs intensive; interpolations (especially polynomial interpolation) can show a significant reduction in the total number of operations.(2)
  • Handle discontinuities (piecewise interpolation)
    For equations with discontinuities, an interpolation can be used to provide a non-infinite transition between the operating realms.

Interpolation: When not to use it

The “when not to use” is the mirror image of the “when to use.”

  • Mister toad’s wild ride:(3) (Sampled and Discontinuities):
    In some instances, the curvature of the equations is so severe that interpolations cannot accurately capture the data
  • Flip(4)side: The real thing is cheaper:
    Depending on the equation, and your target processor, the real calculation may be less intensive. In general when I hit a polynomial of order 6 or greater I start to question the value; (Taylor series after 3 terms).
  • Integer data: Gear 1.3
    The class interpolation failure is when integer data is interpolated to floating-point values. My first exposure to this was when I interpolated a non-CVT vehicle into the 1.3rd gear.(5)
Happy Smiling Man Looking In Mirror. Vector Simple Reflection.. Royalty  Free Cliparts, Vectors, And Stock Illustration. Image 151722210.

Follow these tips and you will know if you can “Pig out or Pig In” with your interpolation.

Footnotes

  1. Back at this time, campus television had a limited number of channels. I would estimate that about 50% of my classmates, like me, had it on in the background the day before.
  2. When performing polynomial interpolation save your powers, e.g.
    x2 = x*x;
    x3 = x2 * x;
    x4 = x2 * x2;
  3. Continuing with the children’s story theme
  4. I hope these puns don’t get you off on the wrong foot with my Flip-FLOPS.
  5. Interestingly enough it was seeing that (a good decade before CVT’s were common) that I understood the impact that a CVT could have on fuel economy. If you are interested in fuel economy take a look at this series I’m writing.

Everyday engineering: MBD & MPG (Part-3 F=MA)

Sometimes simple is good enough.  Let’s start by reviewing what goes into fuel consumption for a car; e.g., what are the forces acting on it?

  • Acceleration: getting the car from 0 to 55 (and above) requires force
  • Aerodynamic drag: the faster you go the higher it is
  • Gravity: uphill or down, it has a way of changing your speed

Using these three forces acting on the vehicle, we can (when we add in losses) calculate the energy needed to get from point A to point B. (And if you are curious about how the data for A to B is collected check out this previous post)

Simple models

Our simple model uses the Lat / Lon / Elevation and Speed data we downloaded as part of the last blog post for our points A to B. 

Counting your losses

In a frictionless, lossless world, my car with regenerative braking(1) could reach 100% efficiency.  However, your car-not (2) able to do this in the real world. Our first pass of “driving the route” will make the following assumptions:

  • We hit every stoplight(3)
  • We drive at the speed limit(4)
  • There is no traffic(5)
  • Standard profile acceleration and deceleration between speed zones(6)
  • 20% energy recapture on braking.

The first route: To the Tech Center!

My first working commute was from Farmington Hills Michigan to GM’s Warren(7) Tech Center.  If we break it down by distance and stops we get the following table.(8)

PointDistance
(miles)
Speed
MPH
0.265 15
1.53035
2.0845
322.7265
4.545
51.545
Farmington Hills to Warren Tech Center

Between each point, there is a deceleration to stop(9) followed by an acceleration to the target speed.  If we put this information into the Simulink model we get the following energy usage profile.  There is an interesting modeling point between points 1, 2 and 3; it is a short stretch of the road section of road where the car does not have time to get up to speed before you have to slow down.  I’ve included the Stateflow chart that I created to solve this look ahead in the footnotes.(10)

Full Trip
Regen Effects

Because this trip was mainly highway there was very little chance for regenerative breaking;(11) in contrast, my ADI (Applied Dynamics International) commute had many more start-stop moments with more regenerative braking events.

ADI Route

City Commute

Reviewing the data from these two routes reinforces some basic knowledge:

  • Total distance is only one factor in energy usage
  • Energy usage goes up with rate of travel
  • Start-stop events (with acceleration) have a large impact on energy use (e.g. steady speed is better)
  • Yellow is an odd default choice for plotting color.

In next week’s post we will add in a “human” driver model to improve the accel and decel behavior of these models.

Footnote

  1. For a short introduction to the efficiency of regenerative braking I recommend this link.  In short, there are two limitations to capturing energy from regenerative braking. First, a portion of the brake force is applied through conventional brake pads. Second, the torque/speed of the wheels at braking cannot be tuned for optimal energy capture.  As a first pass approximation, we will assume that 20% of the energy is re-captured during brake implementation.
  2. To instructors of Thermodynamics courses, please feel free to use this joke under a GPL Open Source License.
  3. It only ever seems this way when you are driving.
  4. Generally this is true with the exception of 25 MPH zones which always seem way slower than anyone drives.
  5. Ok, so that never happens, but one could dream.
  6. The first pass approximation of this is 10 mph/sec on acceleration, and 20 mph/sec on deceleration.
  7. GM Technical Center is a large complex with tunnels connecting all the buildings; I often thought that I was working in a “rabbit’s warren.”
  8. For the first pass, we are assuming a “flat” drive.  In southeast Michigan, this is generally true.
  9. The stops require a “look ahead” model, e.g., we have to know when to start stopping.
  10. I implemented this as a Stateflow chart with the intention that additional logic will be added to account for the driver behavior model in subsequent updates.  For now, it is a simple accel / deccel / hold calculation.
  1. And because this was in 1995 there was zero chance for regenerative braking.  GM had the EV1 then but I did not own one.