Big Data (the other kind)

Back when I swam competitively(1) there was a saying, “train like you compete,” which is to say the habits you develop when training are the behaviors you exhibit when you compete. The same principle holds when developing a model: the block and data architecture you select during development carries into production. So… onto “big data.”

Three little bears

Like porridge or beds, we want our data to be “just right,” neither too big nor too small. Here, size means how much RAM or ROM the data consumes. Basic data types use 8, 16, 32 or 64 bits. Depending on your processor the minimum and maximum data size may be different.(3) In designing your algorithm, select the smallest data type that accurately represents your data and use simulation to verify that this data type is sufficient for the full system.
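As a rough sketch of the “just right” idea, the helper below picks the smallest standard signed width that covers a value range. The function name and the `target_min_bits` parameter are my own illustration (they stand in for the processor limit described in footnote 3), and the range itself is something you would still confirm in simulation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: smallest standard signed width (in bits) that can
   hold every value in [lo, hi]. target_min_bits models a processor whose
   smallest data type is wider than 8 bits (see footnote 3). */
static int smallest_signed_bits(int64_t lo, int64_t hi, int target_min_bits)
{
    int bits;

    assert(lo <= hi); /* the range must be well formed */

    if (lo >= INT8_MIN && hi <= INT8_MAX) {
        bits = 8;
    } else if (lo >= INT16_MIN && hi <= INT16_MAX) {
        bits = 16;
    } else if (lo >= INT32_MIN && hi <= INT32_MAX) {
        bits = 32;
    } else {
        bits = 64;
    }

    /* Never report a width narrower than the target can actually store. */
    return (bits < target_min_bits) ? target_min_bits : bits;
}
```

For example, a temperature signal limited to -40 … 125 fits in 8 bits, but on a target whose smallest type is 16 bits the helper reports 16, so the 8 bit choice would save nothing.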

Let’s get real. Talking about complex data

When I write about complex data I am not being imaginary; that is a hole pole apart.(4) Rather, in this instance I am thinking of vectors and structures. The rule of thumb for vectors is simple: allocate enough entries for the data that needs to be stored. Structures require a rule of hand.(5)

Structures provide a method for grouping different variables and data types together into a single package. From a data organization perspective this provides a clear advantage as all the related data can be found in one location. When thinking about size of data the question becomes “how closely are you related?”

A concrete example

If I were creating a physical model of a road I would want to know the attributes of the concrete I was pouring. Attributes include tensile and compressive strength, porosity, density, thermal conductivity, cost per cubic foot and many more. My full physical model will require all of those attributes, but not every part of my model requires every attribute.

  • Construction “strength” model: tensile & compressive coefficients & density
  • “Wear and tear” model: porosity, thermal conductivity
  • Lifetime cost model: cost per cubic foot…

Megastructures

If I created one “megastructure” including all the data then I would pass the following data (for this simple example):

  • 3 unneeded data fields to “strength”
  • 4 unneeded data fields to “wear and tear”
  • 5 unneeded data fields to “cost”

Over time, and in more realistic examples, this unnecessary data quickly adds up, consuming processor memory.
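To put numbers on the concrete example, here is a minimal sketch in C. The type names, the field names, and the choice of `float` for every attribute are assumptions made for illustration; a real model would pick each type on its own merits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "megastructure": every concrete attribute in one package. */
typedef struct {
    float tensile_strength;
    float compressive_strength;
    float density;
    float porosity;
    float thermal_conductivity;
    float cost_per_cubic_foot;
} ConcreteAll;

/* Smaller structures holding only what each sub-model reads. */
typedef struct {
    float tensile_strength;
    float compressive_strength;
    float density;
} ConcreteStrength;

typedef struct {
    float porosity;
    float thermal_conductivity;
} ConcreteWear;

typedef struct {
    float cost_per_cubic_foot;
} ConcreteCost;

/* Number of float fields a sub-model would receive but never read if it
   were handed the megastructure instead of its own structure. */
static size_t unneeded_fields(size_t needed_bytes)
{
    assert(needed_bytes <= sizeof(ConcreteAll));
    return (sizeof(ConcreteAll) - needed_bytes) / sizeof(float);
}
```

Calling `unneeded_fields(sizeof(ConcreteStrength))` gives the count for the “strength” model, and likewise for the other two structures. The trade-off is that the related data no longer lives in a single location, which is the “how closely are you related?” question again.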

Footnotes

  1. H2Okies(2)
  2. The fact that the swim team name had a chemistry pun embedded endeared me to the team right away.
  3. If your smallest data type is 16 bits, creating an 8 bit variable does not save any memory and may in some cases result in different behaviors between Simulation and the deployed code. Setting the target hardware configuration for your model addresses this issue.
  4. Complex data can also mean data with real and imaginary parts (e.g., A = 1 + 3i). More often than not it is plotted in polar graphs with “poles and holes” for the real and imaginary components.
  5. The most famous “rule of hand” would be the “right hand rule.” In this case I am just thinking that, unlike rules of thumb, I have 4 more points…

Super step!

In today’s video blog I am going to cover the often overlooked Stateflow Super Step semantic and how it interacts with Stateflow function calls and Simulink subsystems. This is the first in a two part series; in the second part I will show how I validate the behavior of the chart.

Free Body Problems

By background I am an Aerospace Engineer;(1) by starting in computational mathematics and CFD, and later physical modeling, I became a “programmer” through effort.(2) Engineers are taught to think of systems as Free-Body Diagrams, i.e., that the system can be understood as a sum of the input forces.

FBD / ICD

With software we think of the system through the Interface Control Document (ICD) and the system architecture. A common mistake for engineers who are working in programming is to treat an ICD like an FBD, as if it were the full summation of the system. The correct analogy is to look at the system architecture and ICD together with the system call tree.

Σ (ICD,FNC) = FBD

The primary or key observation should be that units of software are, more often than not, analogous to individual forces in your system. It is only in the summation of these units that you have the full system. The system architecture is the portion of the environment that defines when and where the calculations are performed.

Footnotes

  1. In truth my earliest “work” was as a programmer at age 8, writing the program “guess it”, where the computer would guess the number you had selected.
  2. Learning to write a program in C / Simulink is not the same thing as being a Computer Scientist; that comes when you start to think about the structure and format of your code / models. Unfortunately, there is often confusion as to what the roles of an engineer versus a CS person should be in the development of software.

When do we know what we know?

Within the simulated world we have the ability to act as an omniscient god, all knowing and all seeing, able to respond as needed in any situation. This is appropriate when we model physical systems; however, when we model human reactions, “how would they respond?” becomes an important question.

The human condition

When it comes to responses, people make a lot of mistakes. It could be a simple delayed response, a missed event or an incorrect interpretation of the information. Regardless of the type of error it manifests in your simulation as a “non-optimal” response.

Simulating humans

Fortunately in our efforts to simulate humans we only have to emulate a limited set of response types. This effort looks much like our earlier post on “noise,” e.g., we can model delay, error and crossed signals.

Probabilistic humans…

There is a last type of human error, the “wrong response,” and it is the trickiest to model. For every input event there are multiple incorrect responses, and there are secondary responses after the initial response. Take for example the case of a kid running out into traffic. The correct response is to apply the brakes and steer away from the child (assuming there is no one in the other lanes). Incorrect responses could include:

  • Applying the accelerator (1% chance)
  • Swerving into oncoming traffic (1% chance)
  • Asking Siri what to do (0.01% chance)
    • Since Car Talk is no longer on the air
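One way to fold these odds into a test scenario is to map a uniform random draw onto a cumulative probability table. The sketch below is only an illustration: the enum and function names are hypothetical, and it assumes that all of the remaining probability goes to the correct response, which the list above does not actually state.

```c
#include <assert.h>

/* Hypothetical response set for the "kid runs into traffic" event. */
typedef enum {
    RESPONSE_BRAKE_AND_STEER,  /* the correct response */
    RESPONSE_ACCELERATE,       /* 1% chance */
    RESPONSE_SWERVE_ONCOMING,  /* 1% chance */
    RESPONSE_ASK_SIRI          /* 0.01% chance */
} DriverResponse;

/* Map a uniform draw u in [0, 1) to a response by walking the cumulative
   probabilities. Everything not claimed by an incorrect response is
   assumed to be the correct one. */
static DriverResponse pick_response(double u)
{
    assert(u >= 0.0 && u < 1.0);

    if (u < 0.01) {
        return RESPONSE_ACCELERATE;
    }
    if (u < 0.02) {
        return RESPONSE_SWERVE_ONCOMING;
    }
    if (u < 0.0201) {
        return RESPONSE_ASK_SIRI;
    }
    return RESPONSE_BRAKE_AND_STEER;
}
```

Passing the draw in as an argument, rather than calling a random number generator inside the function, keeps the logic repeatable in a test: a harness can supply something like `rand() / (RAND_MAX + 1.0)` for Monte Carlo runs or a fixed value to force a specific wrong response.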

Once we know what sort of incorrect responses people can make, we need to determine when we should include them in our test scenarios…

My heater doesn’t care about your brakes

The obvious answer is “include human error in the applicable systems.” In the case above I would want to model human error when I am working on adaptive braking for an ADAS level 2 system. I may want to include incorrect acceleration decisions in a fuel economy study, but I would not include them in an EPA fuel cycle test as they would add unneeded test cases to the system.

The Calendar Problem…

Today I’m coining the term “the calendar problem” to describe any system that has irregular rules,(1) which require complex logic to implement or check. My goal when creating any system is to bound any calendar problem logic to make validation as simple as possible.

Can’t be avoided; can be minimized

In systems, calendar problems will exist but they can be minimized through a 4 step process.

  1. Determine the core “default” rules: this is the behavior of the system unless an exception occurs.
  2. Determine the exceptions: what is the logic for when the core rules do not hold.
  3. Determine the exceptions to the exceptions: this is both when the exceptions do not hold and when there is a different result for the exception.
  4. Look for back propagation: do the exceptions change any of the earlier results?

Calendar example

We will walk through the calendar to show how this can be applied.

  • Core / Default
    • 365 days in a year
    • Jan, March, May, July, Aug, Oct, Dec : 31 days
    • Sept, April, June, Nov : 30 days
    • Feb : 28 days
  • Modifications:
    • Every 4th year Feb has 29 days
    • The year has 366 days.
  • Modifications to modifications
    • If the year is divisible by 100 it is not a leap year
    • If the year is divisible by 400 it is a leap year
  • Back propagation
    • None
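The layering above translates fairly directly into code. This is a minimal sketch, assuming the Gregorian rules listed and positive years; the function names are my own.

```c
#include <assert.h>
#include <stdbool.h>

/* Exception (every 4th year) plus the exceptions to the exception
   (divisible by 100, then divisible by 400), applied in order. */
static bool is_leap_year(int year)
{
    bool leap = (year % 4 == 0);   /* modification: every 4th year */

    if (year % 100 == 0) {
        leap = false;              /* modification to the modification */
    }
    if (year % 400 == 0) {
        leap = true;               /* and the exception to that */
    }
    return leap;
}

/* Core / default rules live in the table; only February is adjusted. */
static int days_in_month(int year, int month)
{
    static const int default_days[12] = {
        31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31
    };

    assert(month >= 1 && month <= 12);

    if (month == 2 && is_leap_year(year)) {
        return 29;
    }
    return default_days[month - 1];
}

static int days_in_year(int year)
{
    return is_leap_year(year) ? 366 : 365;
}
```

Keeping the defaults in a table and the irregular logic in one small function bounds the “calendar problem” to a single place, so validation only needs a handful of years (for example 1900, 2000, 2023 and 2024) to exercise every rule.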

Humans and irregularities….

Irregularities come in when you have human interactions with your system; physics is always regular. The key is to look for situations where the person in the loop has conflicting needs or may not know there is an issue (so they act in ways that are not in alignment with what they want).

Footnote

  1. The “knuckles / spaces” method for knowing which months have 28 / 29 / 30 or 31 days was something I just found today. What is interesting is that it shows the problem even as it offers help; the darn 28 / 29 February (not even getting into the 4, 100, 400 year issues here).

Doing your “bit”, or “don’t byte off more than you can chew”

In the US, the traditional “thank you” for a friend who helps you pack up your bits and move is payment through a bite of pizza.(1) When moving data in computer programs, we can pack our bits into bytes.(2)

In a recent post I wrote about minimizing the data used by your model. I want to expand on that concept by talking about data packing using masks. In this case it is a three way balance between minimizing data usage, efficiency of operations, and clarity of the data.(3)

Basic Bit

Masking a byte(4) is when a specific bit in the byte is associated with a Boolean value. For example:

#define VOLTERRORSTATUS (1u << 3) /* bit 3 carries the volt error */
if (voltError) {
    errorFlags |= VOLTERRORSTATUS; /* bitwise OR sets the bit */
}

In this example the variable errorFlags is bitwise OR-ed with the mask to add the volt error onto the generic error flags message.(5) Later, the status of the error can be unmasked to take action.

if ((errorFlags & VOLTERRORSTATUS) || (errorFlags & CURRENTERRORSTATUS)) {
  /* Power problems!  Do something */
}

The total number of status flags that can be set using this method is dependent on the word size; a 32 bit integer could store 32 status flags. The drawback is that errorFlags, on its own, requires decoding before you know what the issue is.
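Pulling the pieces together, here is a small self-contained sketch of setting, clearing and testing packed flags. The helper function names and the bit position chosen for CURRENTERRORSTATUS are assumptions for illustration. Note that in C the `^` operator is exclusive-or, not exponentiation, so masks are normally written with shifts, and that `|` and `&` (not `||` and `&&`) are the bitwise operators.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VOLTERRORSTATUS    (1u << 3)  /* bit 3 */
#define CURRENTERRORSTATUS (1u << 4)  /* hypothetical bit position */

/* Set the masked bit(s) without disturbing the others. */
static uint32_t set_flag(uint32_t flags, uint32_t mask)
{
    return flags | mask;
}

/* Clear the masked bit(s) without disturbing the others. */
static uint32_t clear_flag(uint32_t flags, uint32_t mask)
{
    return flags & ~mask;
}

/* True when at least one of the masked bits is set. */
static bool flag_is_set(uint32_t flags, uint32_t mask)
{
    assert(mask != 0u); /* a zero mask tests nothing */
    return (flags & mask) != 0u;
}

/* Decode step: either power-related bit signals a power problem. */
static bool power_problem(uint32_t flags)
{
    return flag_is_set(flags, VOLTERRORSTATUS | CURRENTERRORSTATUS);
}
```

Wrapping the mask operations in small named helpers is one way to buy back some of the clarity that packing costs; whether the extra function calls are acceptable is the efficiency leg of the three way balance.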

Footnotes

  1. Sadly, these are the moments when you learn your friends’ horrible taste in pizza toppings.
  2. Despite the stereotypes, programmers do not get paid in pizza, though they may use their pay for pizza.
  3. The image on the right is a perfect example of this sort of balance; each point has some resilience and movement in any direction affects the others. As a secondary note, to the best of my knowledge no child ever liked this playground item.
  4. The image search for “mask” has very different returns from a year ago. Please help protect yourself and others and wear a mask when you go out.
  5. In this example “VOLTERRORSTATUS” is a #define constant rather than a variable. This is, again, to save on memory.

What is Model-Based Design?

Many of my posts are prompted by questions from readers or issues that have arisen from my clients over the years. This one comes from an engineering student: “What should I study if I want to go into Model-Based Design?”

One “post video” thought. Strictly speaking, the study of puns is not required to work in the field of Model-Based Design, but study of the polymorphic nature of language can help you with the polymorphic behavior of models.

Some roses are red…(1)

“What’s in a name? That which we call a rose
By any other name would smell as sweet.”
Romeo and Juliet: Act 2 Scene 2

Names are labels we give to things to provide mental holding places for what a thing is; I am Michael, author of this blog and you, dear folks, are my readers.(2) Humans are flexible and we can assign different names to the same thing and still know it is a rose. Computer programs however….

Naming conventions

Many moons ago I wrote a post on naming conventions;(3) this is the first stage in the software “name game.” For software, names have to be descriptive (with naming conventions) and unique.

Where things go wrong: Scope

Error, stop, fault, or startup: If you look into almost any set of code you will find these common variable names. If the variables are scoped to just the given module, things are okay, but often things “leak out.”

If you look at the interface variables, e.g. the bus (structures in C), enumerations, and all global data, you will see that enumerations, given their “state” nature, commonly reuse the same names.
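In C this leak is easy to hit, because every enumerator shares one namespace: two enumerations that both declare ERROR or STARTUP will not compile in the same scope. A minimal sketch of the usual workaround, prefixing each enumerator with its owner, is below. The pump and valve components and all of the names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Each enumerator is prefixed with its component, so STARTUP and ERROR
   can exist for both without colliding at project scope. */
typedef enum {
    PUMP_STARTUP,
    PUMP_RUN,
    PUMP_STOP,
    PUMP_ERROR
} PumpState;

typedef enum {
    VALVE_STARTUP,
    VALVE_OPEN,
    VALVE_CLOSED,
    VALVE_ERROR
} ValveState;

/* True when either component reports its own error state. */
static bool any_error(PumpState pump, ValveState valve)
{
    assert(pump <= PUMP_ERROR);
    assert(valve <= VALVE_ERROR);
    return (pump == PUMP_ERROR) || (valve == VALVE_ERROR);
}
```

The prefix makes the name unique, but the compiler will still accept a PumpState where a ValveState was meant, since both are just integers underneath; that gap is where data dictionary and integrity checks earn their keep.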

ICD and Data Dictionaries

First, a note on the image. Since this is a blog on names, the use of acronyms is especially important. When I write “ICD” I am thinking of Interface Control Documents; but it can also mean “Implantable Cardioverter Defibrillator.” But enough sidebars…

Our goal, to have unique names (at the project scope), is aided by two things: ICDs and Data Dictionaries. The objective here is to augment these base tools with integrity checking tools to validate full systems.

  • Tag the scope of data in the data dictionary: Enable scope detection for the data in the system.
  • Enforce naming rules: For both root and sub-data names.

Footnotes

  1. It has always bothered me that the poem goes “Roses are red, violets are blue”; properly speaking violets are by their very name the color violet.
  2. At some point in the future your role will change and you will have another name.
  3. The image to the right is exactly what a naming convention looks like; everyone comes together and picks names.

Diving into graphical models

Proponents of text based languages often point to the ease of developing / debugging their code: “I put a breakpoint on a line(1) and…” While at first blush it may seem easier to debug textual models, today I am going to walk you through(2) how to debug graphical models.

What is debugging?

First let us come to an agreement on what it means to “debug” a model / code. A model has a “bug” when the model either

  • Fails to run
    or
  • Runs but produces incorrect results

A bug is not

  • Excessive memory use or slow execution(3)

Failure to run is easy to detect; incorrect results may be more difficult to detect if unit tests for regression do not exist.

Sometimes…

You get lucky and a simple differencing of your model against recent versions can determine where the error crept in.(4) To do this, you need to…

  • Review the blocks added to the model
  • Review block parameter changes
  • Review the changes to connections
  • Review changes to data

Breakpoints and probes

In textual debugging there are often enough “printf” statements to make you think you are in Project Gutenberg;(5) graphical models use probes and scopes to accomplish the same objective.

Breakpoints can be inserted into the model in any location; for advanced debugging techniques, conditional breakpoints can be used, e.g., break when signal > MyVal.

Sometimes when you debug you want to go in reverse(6)

Break it down

Finally, a powerful method available in Simulink is the ability to create test harnesses for subsystems; e.g., once you have isolated the issue to a given subsystem you can create a test harness that will run that section of the model independently, enabling a faster debugging experience.

Final thoughts

Once you have gone through your debugging exercise…

  • Check in any test models that you have created, to prevent regression of this bug
  • Note any incorrect block usage for future best practices

Footnotes

  1. Why yes, that is MATLAB code with simple breakpoints; it is the “text” portion of the Simulink / Stateflow / MATLAB triumvirate.
  2. After a month of video posts walking you through Model-Based Design this will be a short walk in the park.
  3. Many of the techniques for debugging a model can be used for these situations; however those are optimization problems, not function problems. The general best practice is to ensure full system compliance with requirements before starting to optimize your system.
  4. As a sidebar, the phrase “low hanging fruit” is an interesting phrase. It covers the “things that are easy to get”. However, if you talk to people who climb into trees to pick fruit they will tell you that you grab the low hanging fruit last so you do not climb up the tree with extra weight. As with many things in life, your long term objectives should shape your short term actions.
  5. Project Gutenberg brings public domain books to electronic format.
  6. Point-Break ==> reverse Break-Point