Difference reports are a staple of software development; they allow you to quickly see what has changed. Model-Based Design, with its graphical environment, presents some unique differencing issues; we will examine them in this post.
Differencing best practices
The following best practices hold regardless of the development environment:
- Reasonable diffing cadence: Finding the difference between two objects is easy if the number of changes is small; however, if large structural or multiple-point changes have occurred, difference operations become difficult if not impossible. Perform differencing operations when no more than 5% of the object has changed.
- Diff functional changes / ignore visual changes: Visual changes, such as the use of indents in textual languages or block placement in graphical ones, do not impact the behavior of the code. While these changes may violate modeling guidelines, they should be ignored during the diff operation.
- Include the authors: Have both the original author and updating person take part in the review for complex issues.
- Test after diffing: After the difference operations have been run, the files should be run through their test suites to ensure that the changes did not have a negative impact.
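As a sketch of the "ignore visual changes" idea in a textual setting, the Python snippet below (with invented code fragments) strips all whitespace before diffing, so indentation-only edits disappear from the report:

```python
import difflib
import re

def normalize(lines):
    """Drop blank lines and all whitespace so cosmetic edits vanish."""
    return [re.sub(r"\s+", "", ln) for ln in lines if ln.strip()]

def functional_diff(old, new):
    """Return only the added/removed lines after cosmetic normalization."""
    diff = difflib.unified_diff(normalize(old), normalize(new), lineterm="")
    return [d for d in diff
            if d.startswith(("+", "-")) and not d.startswith(("+++", "---"))]

old = ["x = a+b;", "  y=x*2;"]
new = ["x = a + b;", "y = x * 2;", "z = y - 1;"]  # one real change: z
```

With these inputs only the genuinely new line `z = y - 1;` survives the report; the spacing changes are filtered out.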
Don’t judge a model by its cover
In text-based languages, changes in the text are the whole story; with Model-Based Design environments, what matters is the combination of the following:
- Block connections: How the blocks are connected to each other determines execution order and data flow.
- Block parameters: How the block is parameterized determines the functionality of the block
- Model configuration: The model can have parameters that influence the behavior system beyond the data set by the blocks
It is possible to diff the underlying text files that define graphical models; at one point in time this was the only method that was possible. However, doing this presents multiple problems and should be avoided.
- Determining what is significant: The text representations often encode information into parameters / structures, and it is often not obvious which changes are significant and which are cosmetic.
- Block changes: The text representation can have the same functionality "shifted" to a new location in the file depending on where it shows up in the model; however, this is often not a functional change.
- Interpretation: When reviewing the text changes the reviewers then have to interpret what the text means.
Table lookup algorithms provide a powerful tool for the modeling of systems. The following are some basic tips for making your table lookups fast and accurate.
Pre-process your data
Table data often comes from real-world sources. This presents two issues
- Errors in the data: Validate that any anomalous readings are removed from the data
- Non-uniform input axis: Real-world measurements frequently have jitter in the exact "place" of measurement. Data should be renormalized to a uniform input axis.
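A minimal sketch of the renormalization step, assuming simple linear interpolation is acceptable for the data (pure Python; the measurement values are invented for illustration):

```python
def interp(x, xs, ys):
    """Piecewise-linear interpolation of (xs, ys) at x; xs must be ascending."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(len(xs) - 1):
        if x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])

def renormalize(xs, ys, n):
    """Resample jittery measurements onto a uniform n-point input axis."""
    lo, hi = xs[0], xs[-1]
    grid = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return grid, [interp(g, xs, ys) for g in grid]

# Jittery axis from a hypothetical measurement of y = 2x
grid, vals = renormalize([0.0, 0.9, 2.1, 3.0], [0.0, 1.8, 4.2, 6.0], 4)
```

After resampling, the table can use a uniform axis (0, 1, 2, 3), which also enables faster "even spacing" lookup algorithms.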
Know your boundaries
With tables, there are always limits on the input axes range. Understanding the behavior of the table outside of the known range is important. There are two basic options
- Hold last value: In this instance, the value at the table boundary is used for all data outside of the axes
- Interpolate data: In this case, the output data is extended beyond the known boundary. For data near the boundaries this is generally acceptable; however, the accuracy can quickly break down as inputs exceed the known range.
There are three approaches to handling this issue
- Expand the range of valid data: This is the ideal solution but is often not possible due to sampling reasons.
- Pre-interpolate the data: Create data outside of the range with “safe” values based on engineering knowledge.
- Limit the input data range: Create a “hard stop” at the data edge.
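The "hold last value" and extrapolation behaviors can be sketched as follows (Python, with made-up table values); note that the "hard stop" option amounts to clamping the input before the lookup, which produces the same output as holding the boundary value:

```python
def lookup(x, xs, ys, mode="hold"):
    """1-D linear table lookup with two out-of-range policies.

    mode="hold":        clamp to the boundary value (safe default)
    mode="extrapolate": extend the last segment's slope beyond the axis
    """
    if x < xs[0]:
        if mode == "hold":
            return ys[0]
        slope = (ys[1] - ys[0]) / (xs[1] - xs[0])
        return ys[0] + slope * (x - xs[0])
    if x > xs[-1]:
        if mode == "hold":
            return ys[-1]
        slope = (ys[-1] - ys[-2]) / (xs[-1] - xs[-2])
        return ys[-1] + slope * (x - xs[-1])
    for i in range(len(xs) - 1):
        if x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])

xs = [0.0, 1.0, 2.0]
ys = [0.0, 10.0, 15.0]
```

At x = 3.0, "hold" returns the boundary value 15.0, while "extrapolate" continues the last segment's slope to 20.0; whether that extension is "safe" is an engineering judgment.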
Reduce, reuse, recycle…
It is common for multiple tables to share the same input axis; in this case, sharing the index lookup across multiple tables is one method for reducing the total number of calculations required by the algorithm.
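One way to sketch the shared-index idea in Python (block-diagram tools provide dedicated prelookup/interpolation blocks for this; the axis and table values below are invented):

```python
def prelookup(x, xs):
    """Find the segment index i and fraction f once; reuse them for every
    table that shares the input axis xs."""
    if x <= xs[0]:
        return 0, 0.0
    if x >= xs[-1]:
        return len(xs) - 2, 1.0
    for i in range(len(xs) - 1):
        if x <= xs[i + 1]:
            return i, (x - xs[i]) / (xs[i + 1] - xs[i])

def interp_using(i, f, ys):
    """Interpolate a table using a previously computed index/fraction pair."""
    return ys[i] + f * (ys[i + 1] - ys[i])

rpm_axis = [1000.0, 2000.0, 3000.0]   # shared input axis
torque   = [80.0, 120.0, 110.0]       # table 1
fuel     = [2.0, 5.0, 9.0]            # table 2

i, f = prelookup(2500.0, rpm_axis)    # the search happens only once
```

The axis search, the expensive part for large tables, runs once; each additional table costs only one multiply-add.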
Data types and discontinuities
Lookup algorithms are not well suited to data with discontinuities. When working with such data, either use a piecewise approach or add data points in the region of the discontinuity to handle the sharp change.
A similar issue arises when working with integer-based data and interpolation. When the outputs from the table are integer values, either interpolation should not be enabled or all possible input coordinates should be part of the input axes.
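A toy illustration of the integer pitfall (Python integer division standing in for fixed-point arithmetic): with integer outputs 0 and 3, the true midpoint is 1.5, which the integer result silently truncates, unless every input of interest is itself a breakpoint:

```python
xs, ys = [0, 2], [0, 3]   # integer-valued table

def interp_int(x):
    """Integer-only linear interpolation (floor division mimics fixed point)."""
    return ys[0] + (ys[1] - ys[0]) * (x - xs[0]) // (xs[1] - xs[0])

def interp_float(x):
    """Reference floating-point interpolation for comparison."""
    return ys[0] + (ys[1] - ys[0]) * (x - xs[0]) / (xs[1] - xs[0])
```

The float version returns 1.5 at x = 1; the integer version returns 1, and the half-unit error is invisible unless you test between breakpoints.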
So there you go, a few quick suggestions for working with tables!
There is an old saying, “measure twice, cut once”; wise words of caution before you take an irrevocable action. However, if you have a faulty measuring tape, then measuring twice will just produce the same error twice.
Orthogonal redundancy is an approach to safety-critical software where the same dependent variable is calculated using two or more methods. In the woodworking example, this could be done by using a standard tape measure the first time and a laser guide the second.
Achieving software orthogonality
There are three basic approaches to software orthogonality, listed in terms of “strength”
- Unique algorithms
- Common algorithm: unique parameterization
- Common algorithm: unique inputs
Using a unique algorithm has the advantage that it removes the chance of a common point of failure; e.g. if one algorithm can overflow, the second doesn’t. (Mind you, you should catch the overflow problem.) The downside to this approach is that you need to create validation test cases for each unique algorithm.
Common algorithm: unique parameterization
In this case, the same algorithm is used; however, the parameterization is different for each instance. This is commonly seen with hardware sensors, such as unique scaling on a set of analog input sensors. For example, as in the image shown, a simple linear equation (y = m*x + b) can be used to determine the throttle angle; however, the coefficients “m” and “b” are different for each sensor.
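A sketch of the idea in Python: the slopes and offsets below are invented for illustration, representing two redundant throttle sensors wired so the same physical angle produces different (here, mirrored) voltages:

```python
def throttle_angle(volts, m, b):
    """Shared algorithm: angle = m * volts + b; only the calibration differs."""
    return m * volts + b

# Hypothetical calibrations: sensor B's output is mirrored relative to A's
SENSOR_A = {"m": 20.0, "b": -10.0}   # 0.5 V -> 0 deg, 5.0 V -> 90 deg
SENSOR_B = {"m": -20.0, "b": 100.0}  # 0.5 V -> 90 deg, 5.0 V -> 0 deg
```

Because the parameterizations differ, a wiring fault or scaling error that corrupts one channel will not reproduce itself identically in the other, which is the point of the redundancy.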
Common algorithm: unique inputs
This final approach is used when the input source data is suspect or can fail. The solution, in this case, is to create multiple input sources for the same data. The throttle-body example above, with redundant sensors, is an example of this; a more robust version would use two different types of sensors.
When and how?
Using orthogonal algorithms requires additional execution steps and memory, both for the algorithms and for the validation of results against each other. Because of this, their use should be limited to safety-critical calculations.
The standard way to use multiple results is triple redundancy. The results are compared with each other; as long as they are in agreement (within tolerance), the result is passed on. If only two of the three agree, that value is used. If there is no agreement, the results are flagged as an error.
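A minimal two-out-of-three voter sketch in Python (the tolerance value and the choice to average the agreeing pair are design decisions for illustration, not part of any standard):

```python
def vote(a, b, c, tol=1e-3):
    """Return a value at least two of the three readings agree on (within tol);
    raise if no pair agrees, so the caller can flag the error."""
    for x, y in ((a, b), (a, c), (b, c)):
        if abs(x - y) <= tol:
            return (x + y) / 2.0   # average of the agreeing pair
    raise ValueError("no two readings agree: flag as error")
```

A single wild reading is outvoted; three mutually inconsistent readings surface as an explicit fault rather than a silently wrong value.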
In scenario-based testing, the test author configures the input data to “move” the system through a set of desired states or maneuvers. A simple example of this would be a “wide-open throttle” scenario for automobiles:
- Key set to crank
- Hold .1 second
- Key set to run
- Brake to 100%
- PRNDL to Drive
- Brake to 0%
- Throttle to 100%
Measure time to 60 MPH
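The steps above can be encoded as data; here is a hypothetical Python encoding where each step sets one signal and optionally holds for some time (the signal names and the driver function are invented stand-ins for the real harness):

```python
# (signal, value, hold_seconds) -- mirrors the wide-open-throttle list above
WOT = [
    ("key",      "crank", 0.1),
    ("key",      "run",   0.0),
    ("brake",    100,     0.0),
    ("prndl",    "drive", 0.0),
    ("brake",    0,       0.0),
    ("throttle", 100,     0.0),
]

def apply_scenario(steps):
    """Replay the steps into a stand-in for the plant model's input state."""
    state, elapsed = {}, 0.0
    for signal, value, hold in steps:
        state[signal] = value
        elapsed += hold
    return state, elapsed
```

Keeping the scenario as plain data makes it easy to swap the test-specific tail while reusing the common startup steps.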
In scenario-based testing, there are often common “starting points”. In our example above, the common starting point would be the first three steps. In the example that follows, using a Test Sequence block, I implemented the common steps as a single, grouped step.
When the common setup is complete, the sequence block then moves on to another “grouped” step which implements the specific test. Optionally, a second or third “common setup” sequence could be defined before the actual test begins.
Another method of reusing scenarios is to “chain” data together. In this case, a series of input data files is concatenated and played in sequence into the simulation. To continue this example, the first three steps would be one data file, and then either the “WOT” or “idleWarmUp” data would make up the next data file. (Please imagine the image is of a fat cat named “Nate”: Con-Cat-Ton-Nate.)
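The chaining idea can be sketched as simple concatenation of timestamped segments, offsetting each segment so it starts where the previous one ended (Python, with invented data):

```python
def chain(*segments):
    """Concatenate (time, value) segments into one continuous input series."""
    out, offset = [], 0.0
    for seg in segments:
        for t, v in seg:
            out.append((t + offset, v))
        offset = out[-1][0]          # next segment starts where this one ended
    return out

startup = [(0.0, 0), (1.0, 10)]      # common first steps
wot     = [(0.0, 10), (2.0, 100)]    # scenario-specific continuation
```

Any number of scenario tails can be chained onto the same startup segment without duplicating the startup data.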
Why perform scenario-based testing?
Scenario-based testing allows us to validate that models behave in the way we expect and the way we designed: if I “put the pedal to the metal,” the car accelerates quickly. Most frequently I use it to
- Validate state machine behavior
- Exercise diagnostic code
- Validate requirements
- Debug models under development
Every once in a while I take a dip into my design philosophy. This video is one of those posts: a short meditation on the link between the systems that we design and inhabit. The image I show in the video is the same as the header and the image shown below. If you have a chance, go to the Detroit Institute of Art (DIA) and see it in person!
We all know what it is:
Bubble wrap syndrome (BWS): The overwhelming desire to “pop” cells
in a sheet of bubble wrap. See also Packing Peanut Popping (PPP).
Even if it isn’t a real psychiatric diagnosis it makes for a good, slightly humorous, starting point of this post. Bubble wrap is designed to protect items in shipping. When we “pop” bubbles we decrease the effectiveness of the wrap. In the software domain, this corresponds to the slow creep of “features” into the diagnostic code.
The road to heck…
By design, diagnostic code is isolated from the functional code. Because of this, diagnostic code will sometimes duplicate calculations that exist in the functional code. It can be tempting to interleave (reuse) those calculations for both the functional and diagnostic code.
There are three problems with this
- Unit test interference: by mixing the functional and diagnostic code you prevent unit testing of the components.
- Development independence / common failure mode: by mixing the two components together you run the risk of introducing a common failure mode in the analysis of the data.
- Link to a common processor: for some safety critical systems diagnostic code runs on a separate processor or core.
Where are the boundaries?
In this example, the “Proposed Mines” are the possible infestation of our pristine “boundary waters” diagnostic code. The question is “how close is too close?” The following rules of thumb can be applied:
- The functional code should not depend on any output from the diagnostic beyond the diagnostic flags.
- The diagnostic code should not make calls to functional code
- The diagnostic code should not perform filtering, table lookups or integration operations
- The diagnostic code should be non-interruptible.
Welcome to my latest blog; I have tried for a “dynamic” whiteboard experience today. I apologize in advance for the occasional motion blur.
The focus of this video is how to exchange data between multiple tools; the recommended approach is to settle on a common data format that all tools can read and write. A list of common data formats can be found here:
First a definition:
A software bug is an error, flaw, failure or fault
in a computer program or system that causes it to
produce an incorrect or unexpected result,
or to behave in unintended ways.
This is in contrast to incomplete development where the program is not yet performing the intended function.
There are three types of bugs:
- Change induced: these are bugs that arise when part or all of the program is changed. These can often be resolved by doing comparisons against the earlier version of the program.
- Corner case bugs: these bugs are due to missed behavioral cases; for instance not accounting for overflow in a button counter.
- Incorrect library/function usage: these bugs arise from the use of library functions incorrectly; for instance passing a double to an integer function.
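The button-counter corner case mentioned above is easy to reproduce; here is a Python stand-in for an unsigned 8-bit counter, wrapping exactly where the missed case bites:

```python
def press(count):
    """Increment an 8-bit button counter; wraps from 255 back to 0,
    the overflow corner case that is easy to miss in testing."""
    return (count + 1) % 256
```

A test suite that never drives the count past 255 will never see the wrap, which is why corner case bugs survive into the field.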
DIF: Detect, Isolate, Fix
In debugging, the first step is to trace the issue to its root: in Simulink, this is normally a subsystem; in Stateflow, a set of state transitions. In either case, the issue could be due to changes in parameterization, so…
- Review/compare parameter data: inspect the parameter data that specifies the behavior of the system. Try reverting to earlier versions of the data.
- Introduce data logging: the simplest level of debugging is the introduction of intermediary data logging points. If this is a change induced bug this is often enough to determine the problem.
- Simulate by unit: where possible decompose the full model into components and simulate them in isolation to determine the subsystem behavior.
- Introduce breakpoints: both Simulink and Stateflow allow for the introduction of breakpoints. Conditional breakpoints, where the simulation halts for a given configuration, add additional debugging power.
- Use formal methods: use of formal method tools such as Simulink Design Verifier to detect dead logic and overflows/underflows can automatically determine the location of some bugs.
- Second eyes: Bring another person in to talk about your model, what you expect and what it is doing.
Common “bugs” and simple fixes
The following are common development bugs
- Misaligned ports: verify that the inputs to a Model or Library correctly map to the calling subsystem. This issue arises when the referenced model/library is changed.
- Never reached: dead code due to logic that is never activated. This is found using the coverage report or through SLDV.
- NaN: NaN, or Not a Number, occurs when you have a divide-by-zero operation. To detect this, set the Simulink diagnostic to flag this condition.
- Interpolation data and tables: by default blocks in Simulink will interpolate outside of the specified range. This can cause problems if
- The data is of integer type and the result is a float
- The data is not valid outside of the specified range
- Saturation/limiters: frequently people put limit blocks into systems during development. These blocks can “prevent” issues but also introduce errors. Inspect the data going into and out of limit blocks (and limits on integrators).
- Synchronization: in some instances, the behavior depends on synchronized data; errors appear if the signals fall out of alignment due to either the introduction of unit delays or the sample rate of the system. Look for cases where transitions depend on the status of two highly transient variables at the same time.
I would love to hear about your common bugs and debugging techniques.
I recently had a conversation with a client about how they instantiated constants for their model. Their approach was to group common parameters together into a structure. What I pictured was something like this
In this instance, we have a small structure, one layer deep with 4 elements. This would be enough information to perform the calculations required to transform a throttle sensor voltage into a throttle position.
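A sketch of that shallow structure in Python (the field names and calibration values are illustrative only, not taken from any real calibration):

```python
# One-layer parameter structure: everything needed for the voltage-to-angle map
throttle_params = {
    "gain":   20.0,   # degrees per volt
    "offset": -10.0,  # degrees at 0 V
    "v_min":  0.5,    # valid sensor range, volts
    "v_max":  5.0,
}

def throttle_position(volts, p):
    """Clamp the raw voltage to the valid range, then apply the linear map."""
    v = max(p["v_min"], min(p["v_max"], volts))
    return p["gain"] * v + p["offset"]
```

At this size, grouping related parameters is a clear win: one name, four fields, and everything the calculation needs travels together.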
However, what they showed me was something quite different. In their instance, they had a hierarchical structure that was, in some places, 7 layers deep. The single structure contained not only all the parameters required for a single component but for multiple models.
This isn’t the first time I have seen a structure like this; in general, they grow organically. As multiple people work on a project they want an “easy” way to share data and, at first, when it is small, the method works well. However, as the structure grows in size, several problems start to emerge.
- Where is my data: the first problem with a large data structure is finding the data. Even the most logically organized structure becomes difficult to search as it grows.
- Waste of space: deep structures inevitably end up with unused data.
- Repeated data: Going along with “where is my data” is repeated data: people will add the same data to multiple locations.
- Customization: with a large data structure you have to configure the whole data structure as one.
The argument against flattening structures was “If we break them up we will have thousands of parameters.” While that was factually correct, it missed the fact that they already had thousands of parameters, just in a deep structure. The advantages of the flat format are
- Ability to easily find parameters
- Ability to customize on a parameter by parameter basis
- Only used parameters are in the generated code
There are some disadvantages related to how the parameters are stored in files: a single structure can be stored easily in a single file, while with multiple parameters a storage format needs to be determined. Standard approaches include
- Use of MATLAB files
- Use of .mat files
- Use of Simulink Data Dictionary
- Use of an external database
Any of these approaches can be used to organize the data.
For those of you reading in the distant future, e.g. more than a month from now, let me set the stage. It is July 2018 and, as an American, I would say World Cup Soccer is in full swing; if I were anyplace else in the world it would just be “The World Cup”. However, with either name you know what I am talking about; this is what is called a “one-to-one” mapping.
In natural (e.g. spoken) languages, these “one-to-one” mappings (or near mappings) are common and can often be understood through the context of the conversation. However, in software these mappings are problematic.
A rose by any other name… may still have thorns
Multiple names come into existence for multiple reasons.
- Multiple developers: This is the most common reason: multiple developers working in separate models/files each “need” to use a variable, so they each create a name for it.
- Units: For physical quantities, such as vehicle speed, it is common to see VehSpeedMPH and VehSpeedKPH. While this may seem like a good idea, e.g. the units are known, this is perhaps the most problematic duplicate, as you will often see the two instances out of sync with each other.
- Reusable functions: In this case, the same function is used in multiple places. The key observation is to use meaningful generic names inside the function.
- Common usage: For a subset of cases the reuse of names should be encouraged. This is the scoped data with common usage. For example “cnt” can be used for a counter in multiple models/functions.
The reconciliation project
First, determine if it is worth doing. While consistency is important, code that is fully developed and validated may not warrant the work required to update it. Once the decision has been made to update the code/model base, the following steps should be taken.
- Identification: find where the same data concept is used with different names.
- Select a common name: Use of a descriptive name is important in these cases to smooth the transition process.
- Check for “unit” issues: If the different names existed due to desired units, validate that the change to the common units is accounted for downstream.
- Update documentation/test cases: (You do have those, right?) Documentation and test cases will often reference the “old” names. Update them to reflect the “new” names.
- Validate the behavior: After the updates have been performed, the full regression test suite should be run to validate the behavior of the models.