Best practices for model cleanup

In this blog I have written a lot about “mushroom” and “spaghetti” code; today I’m going to write about the best practices for updating and cleaning up those models

Should I update?

Before you start you should ask yourself three questions

  1. Beyond cleanup are there additional modifications needed to the model? (No)
  2. Is the model, as written, performing it’s intended function? (Yes)
  3. Do I have tests cases that cover the full range of behavior of the model? (Yes)

If you answered as indicated (no,yes,yes) then stop. Spend time on another part of your code that does not meet those criteria(1). Otherwise lets start…

Baselining the model

The first step in cleaning up code is baselining the model. This activity consists of N steps

  1. Back up the model’s current state: Ideally this is already handled by your version control software but….
  2. Generate baseline test vectors: To the degree possible create baseline tests, these could be auto-generated.
  3. Generate baseline metrics: Generate the baseline metrics for the model, ram / rom usage, execution time, model coverage…
  4. Create the “Difference Harness”: The difference harness compares the original model to the update model by passing in the initial test vectors and comparing the outputs.

What is different about today?

The next question to ask in your refactoring is “do I re-factor or do I redo”? Depending on the state of the model there are times when simply re-doing the model from scratch is the better choice. This is often the case when the model was created before requirements existed and, as a result, does not meet them; that would make for a very short article though so let us assume that you are refactoring. First figure out what needs and what should change. To do that ask the following questions.

  • Review the requirements: what parts of the requirements are met, which are incorrect and which are missing?
    • Prioritize missing and incorrect requirements
  • Is it possible to decompose the model into sub-components: In most cases, the answer is no, or yes but it is tangled. It wouldn’t be mushroom code if you could.
    • Create partitioning to enable step-based modifications
  • Identify global data and complex routing: Minimization of global data should be an objective of update, complex routing is an indication that the model is not “conceptually” well decomposed
    • Move sections of the model to minimize signal routing and use of global data
  • Identify the “problem” portions of the model: Which sections of the model most frequently has bugs?
    • Squash them.

Once you have asked these questions you understand your priorities in updating the model

Begin modification

First understand the intent of the section of the model, either through inspection or through review of the requirements . Once you understand what the intention is you can start to simplify and clarify.

  • Simplifying logical statements / state charts
    • Run tool such as Simulink Design Verifier to check for dead branches, trim or fix
    • Look for redundant logical checks (multiple transitions all using the same “root” condition check)
    • Look for redundant states (multiple states exist all with the same entry and exit conditions)
  • Mathematical equations
    • Did they create blocks to replicate built in blocks? (Tables, sine, transfer functions)
      • Replace them with built-in blocks
    • Are complex equations being modeled as Simulink blocks?
      • Replace them with a MATLAB function
  • Size (to big or to small)
  • Partitioning rationale


  1. With mushroom code it is highly unlikely that you have tests cases that cover the full range of behavior of the model; model (or code) coverage should not be confused with full behavioral coverage since it is possible to auto-generate tests cases that would cover the full model without every understanding what that coverage means
  2. One advantage of having this blog for 3+ years is I can mine back article for information. Hopefully you will as well. What I mine is yours, nuggets of MBD wisdom.

Interface control documents and data dictionaries

Interface control documents (ICD) and data dictionaries are two parts of a mature MBD infrastructure. The question I often hear is “what is the boundary between the two artifacts”? First a high-level refresher:

  • The Data Dictionary: an artifact used to share a set of common data definitions external to the model and codebase.
    • Objective: provide common and consistent data definition between developers
  • The ICD: an artifact used to share interface information between components external to the model and codebase; often derived from or part of the requirements document set.
    • Objective: provide a common interface definition to simplify the integration of components when multiple people are working on a project.

An example of an ICD spec is

function namemyIncredibleFunction
function prototype(double mintGum, single *thought, something *else)
Call rateevent-driven
Multi-thread interruptibleyes
Function information
function specification

And here is where the boundary question comes up. In specifying the data type and dimension in the ICD I am duplicating information that exists in the data dictionary; violating the single source of truth objective.

Duplication can be dangerous

So what is the flow of information here? I would suggest something like this…

  • The ICD document is created as part of the initial requirement specifications
  • The data interface request is used to inform the initial creation of data in the data dictionary
  • Once created the data is owned by the data dictionary

Infrastructure: making your artifacts work for you

Data dictionaries serve an obvious purpose, they are a repository for your data. On the other hand, interface control documents can seem like a burdensome overhead; which it will be without proper supporting infrastructure. If you remember the objective of the ICD, to simplify integration, then the need for tool support becomes obvious. When a developer checks in a new component it should be

  • Checked against its own ICD
  • Checked against the ICD for functions it calls and is called by
  • Its ICD should be checked against the data dictionary to validate the interface definition

With those three checks in place, early detection of invalid interfaces will be detected and integration issues can easily be avoided.

ICDs and the MATLAB / Simulink environment

Recently MathWorks released the System Composer tool. While I have not had a chance to try it out yet it offers some of the functionality desired above. I would be interested to learn of anyone’s experience with the tool

Modular testing environments

Foundations define and limit the structures we create; this is as true in Model-Based Design as it is in architecture.  With that in mind, I want to use this post to discuss the concept of modular testing environments (MTE).  First, I will point to an earlier blog post “Testing is software“, before I drill deeper into the concept of MTE.

What is a modular testing environment?

A modular testing environment consists of 5 parts

  1. Test manager:test manager provides the framework for running, evaluating and reporting on one or more test cases. Further, the test manager provides a single hook for the automation process.
  2. Test harnesses: a test harness is the software construct that “wraps” the unit under test.  Ideally, the test harness does not change the unit under test in any fashion; e.g. it allows ‘black box’ testing.
  3. Evaluation primitives: the evaluation primitives are a set of routines that are commonly used to evaluate the results of the test.  Evaluation primitives range from a simple comparison against an expected value to complex evaluations of a sequence of events.
  4. Reporting: there are two types of reports, human and machine readable.  The human readable reports are used as part of the qualification and review process.  Machine-readable reports are used for tracking of data across the project development.
  5. Data management: testing requires multiple types of data, inputs, outputs, parameters and expected results.

Why is a modular testing environment important?

Having helped hundreds of customers develop testing environments the 5 most common issues that I have encountered are

  1. Reinventing the wheel, wrong:  Even the simplest evaluation primitive can have unexpected complexities.  When people rewrite the same evaluation multiple times mistakes are bound to occur.
  2. Tell me what happened:  When tests are pulled together in an individual fashion it is common for there to be limited or inconsistent reporting methods.
  3. Fragile tests: A fragile test is one where if the inputs change in a significant fashion the test has to be completely rewritten.
  4. “Bob” has left the company:  Often tests are written by an individual and when that person leaves the information required to maintain those tests leaves with them.
  5. It takes too much time:  When engineers have to build up tests from scratch, versus assembling from components, it does take more time to create a test.  Hence, tests are not written.

Final thoughts

Verification and validation activities are central to any software development project, Model-Based Design or otherwise.  The easier you make the system to use the more your developers will embrace them.

Managing data

In previous posts, I have covered data attributes and data usage.  In this post, I cover data management.  Within the Model-Based Design workflow, and traditional hand coding environments, there is a concept of model scoped and common data.  This blog post will use Simulink specific concepts for Data Dictionaries to show how scoped data can be achieved.

What is in common?

Deciding what goes118eb73e994c025de7f60b0689c4de10 into the common versus the model specific data dictionary is the primary question that needs to be asked at both the start of the project and throughout the model elaboration process.  There is always a temptation to “dump” data into the common data dictionary to “simplify” data access.  While in the short run it simplifies access, in the long run, doing so creates unmanageable data repositories.  So, again, the question is “what goes in there?”

Common data type specification

commonDataTypesThe common data types consist of four primary entries, each of which is created as a separate sub-dictionary.

  • Structure definitions
  • Enumerated data types
  • Data type aliases
  • Model configurations

In all 4 cases, these bits of information should be used in a global scope.  For example, structures used as an interface definition between two models or an enumerated data type that is used for modal control across multiple models.  In contrast, structures that are local to a single model should not be part of the common data types sub-dictionary.

Common data

Like the common data types, the commoncommonData data consists of sub-dictionaries.  In this case, there are three.

  • Physical constants
  • Conversion factors
  • Common parameters

The first two are simple to understand; instead of having the engineer put in 9.81 (m/s) for each instance of the force of acceleration a physical constant (accelGravMetric) can be defined.  Likewise, instead of hard coding 0.51444 you could have a parameter Knots_to_meter_p_sec.  (Note: in the first case, 9.81 is a value that most engineers would know off the top of their head.  The second case most people will not recognize and it results in “magic numbers” in the code.  This is compounded when people “compact” multiple conversion factors into a single conversion calculation and the information is lost)

The final sub-dictionary, common parameters, is the most difficult to scope.  Ideally, it should be limited to parameters that are used in more than one model; or more than one integration model.  To prevent the “mushroom growth” of data in the common parameter data dictionary regular pruning should be applied.

Pruning your data

Pruning data is the process of examining entries in a data dictionary and determining if they are needed in the common data or in a model specific dictionary.  Within the Simulink environment, this can be accomplished using the model explorer or programmatically


Model and integration model data dictionaries

In the section on model architecture, we discussed the concept of “integration models.”  An integration model consists of multiple sub-models, which, in turn, may contain sub-models.


The pattern for the integration model data dictionary mirrors the pattern that was shown in the initial diagram; the “twig” of the model tree references the branches, which in turn reference all the way back to the root.


Final thoughts

The use of scoped data dictionaries allows users to logically organize their data while minimizing the amount of work that individual contributors need to take to maintain the data.  This approach does not eliminate the need for data maintenance however it does provide tools to aid in the work.




Assessing your current state

During the exploration phase of adoption, the adoption team should have become familiar with the fundamentals of Model-Based Design e.g. model architecture, data dictionaries, testing, formal methods, report generation and so on. In that this blog is focused on groups adopting MBD it is a fair assumption that the current state is “introductory” (1) however the following sections can be used to identify where additional support is needed.(2)


The following tasks should be understood…


  • Encapsulation of models:  How to define each model in a format with defined interfaces and data.
  • Integration of models: How to integrate individual models into a larger integration model
  • Integration of existing code artifacts: How to integrate artifacts from the models into existing code artifacts and/or integrate code artifacts into the models

Data management

The following tasks should be understood…


  • Creation of data artifacts for use in the model: How to create data that the model can reference
  • Management of data artifacts: How to store and reference data artifacts in a scalable way.
  • Harmonizing data between model and existing code artifacts: How to reuse data between models and existing code artifacts


The following tasks should be understood…

  • Creation of test harnesses: How to create a test harness that will exercise the model in a “stand alone” method.
  • Creation of data: How to create the data used by the test harness both for input and output evaluation.
  • Creation of test evaluation methods: Creation of methods for evaluating the outputs from tests

Supporting tools

The following tasks should be understood…


  • Basic use of version control: The users should understand which files need to be placed under version control.

Final thoughts

The obvious question arises “how do I move from “introductory” to “ready?”  There are three primary methods

  1. Training: There are multiple training courses (industry) out there allowing you to learn about Model-Based Design (university).
  2. Papers: Both academic and industry papers exist to help you learn about MBD.
  3. Outside help:  Outside help can come from either hiring people with MBD experience or hiring outside consultants.
  4. All of the above…


(1) Having an introductory level of knowledge of Model-Based Design methodologies is not the same thing as an introductory level of knowledge of software development.  Most groups that adopt MBD have a strong software development background.
(2) Most of the “Why is my state so [XXX]” I understand.  However, I don’t know why Pennsilvina and Connecticut got “haunted” as their “Why.”

KPI for initial project…

What will and what were you measuring?

During the initial phase of the project, the key performance indicators (KPI) are, generally, not measured.  However, it is at this stage of the project when you start thinking about what you should measure and how you will measure it(1).

First a word of warning, metrics are useful but they rarely provide the full picture(2).  That being said there are metrics that can be monitored

Bugs found at stage X…

One of the benefits of a Model-Based Design approach is the ability to detect bugs earlier on in the development process.bugs However, one side effect is the observation “we find so many bugs when following an MBD process.(3)”  Hence the metric that should be tracked is the number and severity of bugs found at each stage of the development process.  To determine the impact of finding the bugs early in the development a multiplier can be applied to the cost of the bug…

cost = Severity * StageConstant

Test and requirements coverage

Model-Based Design allows users to track the coveragePolyspacerequirements and testing coverage through automated tools(4).  With respect to test coverage; there are two types of test coverage.  The first is requirements based test coverage; do tests exist to cover the requirements.  The second are formal metrics coverage such as MCDC coverage.

The objective with coverage tracking is to see a steady increase in the percentage coverage over the development cycle.

Integration and development time

The final “primary” metrics are the development and integration time.  The development time is straight forward, how long from receipt of requirements to final check in of a model that satisfies the requirements (and test cases).  The integration time is a bit more interesting.

In an ideal workflow for Model-Based Design, there is integrationHandsan integration model that is created at the start of the development cycle.  Individual developers check their models against that integration model to ensure that there are no conflicts.  Hence in an ideal flow, the integration time should be near zero.

However, since there will be changes as the project develops, the integration model will change and the systems engineer will need to validate those changes.  Like the bug detection finding integration issues is done further upstream in an MBD process.  Again the metric should use a weighted value based on the stage of where the integration issue is found.

Final thoughts

This post covered what can be measured, not how to measure them; this will be covered in future posts.  Additional metrics can be covered however take care in having too many metrics and frustrating those who are developing with a heavy “tracking debit”.


(1) For this post, I am assuming that you do not currently have metrics in place.  If you have existing metrics they can be adapted to the MBD environment.
(2)Trying to capture all activities in development can, in fact, be detrimental in that it takes away time from development.  Always try to automate metric collection and, when not possible, simplify the process of creating this data.
(3) I have gotten this comment on many engagements; people mistake finding bugs in a process for not having bugs in their earlier process.  While there will be new bugs due to adopting a new process it is rare that an old process did not have any issues.
(4) Setting up the automation tools is something that is done in future steps of the adoption.

Model-Based Design foundational concepts

In previous blog posts, I have gone into some depth on testing, architecture and data management for models.  With this post, I will cover how these three activities for the foundation of any Model-Based Design process.

Foundations and core competencies

At later stages in the adoption of Model-Based Design crawl-walk-run-fly-300x191
processes task-specific groups will emerge (development, verification, and systems.)  However, at the start of the Model-BAsed Design process users from all groups need to determine the “common language” that will be used to develop their project (1).

Architecture and data sure, but why testing?

The identification of architecture and data as foundational concepts is generally well understood.modelcentricgifslow  Combined they define how people will develop the model through interfaces and clear communication.  So why testing in the trinity?  It returns to the core concept of Model-Based Design that models are at the center of the development process.  To ensure that the model can be used consistently through the development process they need to be “locked down”(2) with test cases.

Driven by this objective, the testing environment is designed at the start of the development process.  The requirements of the test environment should be addressed within the architectural and data infrastructure.  The good news is that best practices for the three “legs”(2) of the MBD stool are already in the well defined; it is a matter of honing them to your specific project and environment.

Final thoughts

This post is not intended to be technical; rather it is to remind us as we develop new processes to start out with the “best path forward” from the start.  In the section about the validation project, I will discuss the next round of tools that are commonly adopted.


(1) I may start using this graphic as my tag for MBD adoption.  Crawl = investigate.  Walk = initial.  Run = validated.  Flying = optimizing.
(2) Tests cases are elaborated as the model is developed.  The “lock down” is achieved through the use of a continuous build and test server.
(3) The metaphor of a tripod or stool can be overused.  But, to push it one last time, this is your stepping stool to the next round of MBD tools and processes.  Build it well and it forms a strong foundation.