Best practices for model cleanup

In this blog I have written a lot about “mushroom” and “spaghetti” code; today I’m going to write about the best practices for updating and cleaning up those models

Should I update?

Before you start you should ask yourself three questions

  1. Beyond cleanup are there additional modifications needed to the model? (No)
  2. Is the model, as written, performing it’s intended function? (Yes)
  3. Do I have tests cases that cover the full range of behavior of the model? (Yes)

If you answered as indicated (no,yes,yes) then stop. Spend time on another part of your code that does not meet those criteria(1). Otherwise lets start…

Baselining the model

The first step in cleaning up code is baselining the model. This activity consists of N steps

  1. Back up the model’s current state: Ideally this is already handled by your version control software but….
  2. Generate baseline test vectors: To the degree possible create baseline tests, these could be auto-generated.
  3. Generate baseline metrics: Generate the baseline metrics for the model, ram / rom usage, execution time, model coverage…
  4. Create the “Difference Harness”: The difference harness compares the original model to the update model by passing in the initial test vectors and comparing the outputs.

What is different about today?

The next question to ask in your refactoring is “do I re-factor or do I redo”? Depending on the state of the model there are times when simply re-doing the model from scratch is the better choice. This is often the case when the model was created before requirements existed and, as a result, does not meet them; that would make for a very short article though so let us assume that you are refactoring. First figure out what needs and what should change. To do that ask the following questions.

  • Review the requirements: what parts of the requirements are met, which are incorrect and which are missing?
    • Prioritize missing and incorrect requirements
  • Is it possible to decompose the model into sub-components: In most cases, the answer is no, or yes but it is tangled. It wouldn’t be mushroom code if you could.
    • Create partitioning to enable step-based modifications
  • Identify global data and complex routing: Minimization of global data should be an objective of update, complex routing is an indication that the model is not “conceptually” well decomposed
    • Move sections of the model to minimize signal routing and use of global data
  • Identify the “problem” portions of the model: Which sections of the model most frequently has bugs?
    • Squash them.

Once you have asked these questions you understand your priorities in updating the model

Begin modification

First understand the intent of the section of the model, either through inspection or through review of the requirements . Once you understand what the intention is you can start to simplify and clarify.

  • Simplifying logical statements / state charts
    • Run tool such as Simulink Design Verifier to check for dead branches, trim or fix
    • Look for redundant logical checks (multiple transitions all using the same “root” condition check)
    • Look for redundant states (multiple states exist all with the same entry and exit conditions)
  • Mathematical equations
    • Did they create blocks to replicate built in blocks? (Tables, sine, transfer functions)
      • Replace them with built-in blocks
    • Are complex equations being modeled as Simulink blocks?
      • Replace them with a MATLAB function
  • Size (to big or to small)
  • Partitioning rationale

Footnotes

  1. With mushroom code it is highly unlikely that you have tests cases that cover the full range of behavior of the model; model (or code) coverage should not be confused with full behavioral coverage since it is possible to auto-generate tests cases that would cover the full model without every understanding what that coverage means
  2. One advantage of having this blog for 3+ years is I can mine back article for information. Hopefully you will as well. What I mine is yours, nuggets of MBD wisdom.

Interface control documents and data dictionaries

Interface control documents (ICD) and data dictionaries are two parts of a mature MBD infrastructure. The question I often hear is “what is the boundary between the two artifacts”? First a high-level refresher:

  • The Data Dictionary: an artifact used to share a set of common data definitions external to the model and codebase.
    • Objective: provide common and consistent data definition between developers
  • The ICD: an artifact used to share interface information between components external to the model and codebase; often derived from or part of the requirements document set.
    • Objective: provide a common interface definition to simplify the integration of components when multiple people are working on a project.

An example of an ICD spec is

function namemyIncredibleFunction
function prototype(double mintGum, single *thought, something *else)
Call rateevent-driven
Multi-thread interruptibleyes
Function information
VariableTypeDimensionpassby
mintGumdouble1value
thoughtsingle4reference
somethingstructure10reference
function specification

And here is where the boundary question comes up. In specifying the data type and dimension in the ICD I am duplicating information that exists in the data dictionary; violating the single source of truth objective.

Duplication can be dangerous

So what is the flow of information here? I would suggest something like this…

  • The ICD document is created as part of the initial requirement specifications
  • The data interface request is used to inform the initial creation of data in the data dictionary
  • Once created the data is owned by the data dictionary

Infrastructure: making your artifacts work for you

Data dictionaries serve an obvious purpose, they are a repository for your data. On the other hand, interface control documents can seem like a burdensome overhead; which it will be without proper supporting infrastructure. If you remember the objective of the ICD, to simplify integration, then the need for tool support becomes obvious. When a developer checks in a new component it should be

  • Checked against its own ICD
  • Checked against the ICD for functions it calls and is called by
  • Its ICD should be checked against the data dictionary to validate the interface definition

With those three checks in place, early detection of invalid interfaces will be detected and integration issues can easily be avoided.

ICDs and the MATLAB / Simulink environment

Recently MathWorks released the System Composer tool. While I have not had a chance to try it out yet it offers some of the functionality desired above. I would be interested to learn of anyone’s experience with the tool