A few thoughts on the measurement of coffee?

I like a good cup of coffee, my wife Deborah is partial to tea, and we both enjoy sharing a cup of hot cocoa on cold winter nights.(1) Recently we needed to buy a new kettle for our hot brews, so we purchased one with a built-in thermometer. This enabled brewing at the proper temperature, but we couldn’t tell what difference it was really making.

In the field, or in the model?

For experimental data it is common to “over log”; that is, to log as many data channels as your system can handle. This ensures that, if there is an unexpected event,(2) you have the best chance of capturing and understanding it. By contrast, in simulation the environment(3) is controlled and repeatable, so unknowns are of low probability.(4) This means that “over logging” just slows down the simulation.

How to determine what to log?

In an ideal world you can understand the system from first-principles physics.(5) However, it is often the case that the system as a whole has so many interconnected models that writing out the full set of equations is not realistic. In that case, how do you determine what to log?

The approach I recommend in this case is “self and nearest neighbors”. In other words, if you cannot determine the full set of equations that define your whole system, break the system down into components (perhaps at the model reference or lower level) and determine the inputs and outputs of those components. Take the inputs and outputs of your Unit Under Test (UUT) and of the units directly connected to the UUT, and use those to determine what to measure.
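As a rough sketch of the rule (the component names and connection map below are invented; in practice they would come from your model architecture), this amounts to: log the UUT's own I/O plus the I/O of any component that shares a signal with it.

```python
# Hypothetical sketch of the "self and nearest neighbors" logging rule.
connections = {
    "BatteryMgmt":  {"inputs": ["PackVoltage", "PackCurrent"], "outputs": ["ChargeCmd"]},
    "ThermalModel": {"inputs": ["ChargeCmd", "AmbientTemp"],   "outputs": ["CellTemp"]},
    "FaultMonitor": {"inputs": ["CellTemp", "PackCurrent"],    "outputs": ["FaultFlag"]},
}

def signals_to_log(uut, connections):
    """Return the UUT's own I/O plus the I/O of any directly connected component."""
    own = set(connections[uut]["inputs"]) | set(connections[uut]["outputs"])
    log = set(own)
    for name, io in connections.items():
        if name == uut:
            continue
        neighbor = set(io["inputs"]) | set(io["outputs"])
        if own & neighbor:          # shares at least one signal with the UUT
            log |= neighbor
    return sorted(log)

print(signals_to_log("ThermalModel", connections))
```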

Back to coffee

I’ve started experimenting with coffee (okay, not as rigorously as the milk first/tea first tea experiments) to determine the factors that produce the optimal cup. There are 4 primary factors in the outcome of a cup of coffee:

  • Water temperature
  • Rate of extraction
  • Coffee dose to water ratio
  • Coffee quality

The question, then, is what relative weight to assign to each variable:

GoodCup = β1 * WaterTemp + β2 * ExtractRate + β3 * CoffeeDose + β4 * CoffeeQual

Through a few simple experiments I learned that my personal weights lean heavily towards β3 and β4; that is, the temperature effect was minimal.(6) In the same way, when designing models, think twice before measuring once.(7)
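If you want to put numbers to your own weights, here is a minimal sketch (the cups and scores below are made up, not my actual kitchen data) of estimating the β values by least squares:

```python
# Estimate the GoodCup weights from a handful of scored cups.
import numpy as np

# Columns: WaterTemp (C), ExtractRate, CoffeeDose (g/ml), CoffeeQual (1-5); invented values.
X = np.array([
    [96.0, 1.0, 0.060, 4],
    [90.0, 1.1, 0.055, 4],
    [96.0, 0.9, 0.070, 5],
    [85.0, 1.0, 0.050, 2],
    [90.0, 1.0, 0.065, 5],
])
good_cup = np.array([7.5, 7.3, 9.0, 4.0, 8.6])   # subjective score for each cup

beta, *_ = np.linalg.lstsq(X, good_cup, rcond=None)
for name, b in zip(["WaterTemp", "ExtractRate", "CoffeeDose", "CoffeeQual"], beta):
    print(f"beta_{name}: {b:.3f}")
```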

Footnotes

  1. Perhaps one day we will buy this chocolate teapot for our hot cocoa
  2. The “unexpected” is what you most want to capture; expected data can often be calculated. It is when the dice roll snake eyes that you learn the most.
  3. As an interesting side note, I often make 2 “plant” models. One that models the real world (the environment) and one that models the device I am controlling (e.g. the road and the car, air and an airplane, human veins and the I.V. system).
  4. Unlike the real world where unknowns are random events, the unknowns in simulation arise from modeling errors, and when that occurs, adding in additional logging is important.
  5. I was amused that the image of the book cover read “Note: this is not the actual book cover”! The use of a classical cover for fundamental physics seemed spot-on.
  6. There is a side benefit to brewing at the correct temperature, fewer cases of “wow that’s hot” on the first sip of coffee
  7. Unless you are cutting wood in which case it is design once; check your design and measure twice, cut once.

Testing your testing infrastructure

Ah, tests! Those silent protectors of development’s integrity, always watching over us on the great continuous integration (CI) system in the clouds. Praise be to them and the eternal vigilance they provide; except… what happens to your test cases if your test infrastructure is incorrect?

Quis custodiet ipsos custodes?(1)

There are 4 ways in which testing infrastructure can fail, from best to worst:

  1. Crashing: This is the best way your test infrastructure can fail. If this happens the test ends and you know it didn’t work.
  2. False failure: In this case, the developer will be sent a message saying “fix X”. The developer will look into it and say “your infrastructure is broken.”(2)
  3. Hanging: In this case the test never completes; eventually this will be flagged and you will get to the root of the problem.
  4. False pass: This is the bane of testing. The test passes so it is never checked out.

False passes

Prevention of false passes should be a primary objective in the creation of testing infrastructure; the question is “how do you do that?”

Design reviews are a critical part of preventing false passes. Remember, your testing infrastructure is one of the most heavily reused components you will ever create.

While it does not prevent false passes, adherence to standards and guidelines in the creation of test infrastructure will reduce common known problems and make it easier to review the object.

There are 3 primary types of “self test” (plus one last resort):

  1. Golden data: the most common type of self test is to pass in known data that either passes or fails the test (see the sketch after this list). This shows if the infrastructure is behaving as expected but can miss edge cases.(3)
  2. Coverage testing: Use another tool to generate coverage tests. If this is done, then for each test vector provided by the tool, provide the correct “pass or fail” result.
  3. Stress and concurrency testing: For software running in the cloud, verify that running in the cloud does not itself cause errors.(4)
  4. Time: Please, don’t let this be the way you catch things… Eventually, because other things fail, false passes are found through root cause analysis.
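As a minimal sketch of the golden-data idea applied to the infrastructure itself (the tolerance checker below is a stand-in for whatever comparison routine your infrastructure really uses): feed it vectors whose verdicts are already known, and assert that it reports them correctly.

```python
# Self test of the test infrastructure: verify the checker with known verdicts.
def check_within_tolerance(expected, actual, tol):
    return all(abs(e - a) <= tol for e, a in zip(expected, actual))

def self_test_checker():
    golden_cases = [
        # (expected, actual, tol, verdict the checker *must* return)
        ([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], 0.0, True),    # exact match must pass
        ([1.0, 2.0, 3.0], [1.0, 2.5, 3.0], 0.1, False),   # out-of-tolerance must fail
        ([0.0, 0.0],      [0.05, -0.05],   0.1, True),    # edge case: values near zero
    ]
    for expected, actual, tol, verdict in golden_cases:
        assert check_within_tolerance(expected, actual, tol) == verdict, \
            "Test infrastructure self test failed"
    print("Checker self test passed")

self_test_checker()
```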

Final thoughts

In the same way that nobody(5) notices the waterworks until they fail, it is common to ignore testing infrastructure. Having a dedicated team supporting it is critical to a smooth development process.

Footnotes

  1. In this I think we all need to take a note from Sir Samuel Vimes and watch ourselves.
  2. There is an issue here; frequently developers will blame the infrastructure before checking out what they did. Over time the infrastructure developers “tune out” the development engineers.
  3. Sometimes the “edge cases” that golden data tests miss are mainstream but since they were not reported in the test specification document, they are overlooked by the infrastructure developers.
  4. The type of errors seen here are normally multiple data read / writes to the same variable or licensing issues with tools in use.
  5. And if you look at it with only one eye, failures will slip past.

This is “only” a test

In the last blog I introduced the best practices for designing scenario-based tests. Today I am going to cover the non-Herculean(1) task of generating test vectors.

Good vector definitions have resolution down to the smallest time step

The “giddy” set-UP

Starting off happily, let’s consider 3 things: the unit under test, the test harness, and the analysis method.

  • Unit Under Test (UUT): The UUT is what you are testing. For the test to be valid, the unit must be fully encapsulated by the test harness, i.e. all inputs and outputs to the UUT come through the test harness.(2)
  • The test harness:(3) Provides the interface to the UUT, supplying the inputs and reading/logging the outputs. Test harnesses can be black, white, or grey box. Test harnesses can be dynamic or static.(4)
  • Analysis method: Dynamic or static; how the results of the test execution are evaluated.

Not to put the cart before the horse, but: we start with a test scenario; for the scenario we need test vectors; to have test vectors, we need a test harness; and to have a test harness, we need a well-defined interface.(5)

Within the software testing domain (which includes MBD), a well-defined interface means the following:

  • All the inputs and outputs of the system are known: Normally this is through a function interface (in C) or the root level inputs / outputs in a model
  • Type and timing are known: The execution rate (or trigger) for the UUT is known as are all of the data types and dimensions of the I/O.
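To make that concrete, here is one hedged way a well-defined interface could be written down (Python used purely as a notation; the signal names are borrowed from the ABS example later in this post, and the types and sample time are assumptions):

```python
# A sketch of an interface specification: all I/O, types, dimensions, and timing.
from dataclasses import dataclass

@dataclass
class SignalSpec:
    name: str
    dtype: str          # e.g. "single", "int16", "boolean"
    dimensions: tuple   # e.g. (1,) for a scalar, (4,) for a vector
    direction: str      # "input" or "output"

@dataclass
class InterfaceSpec:
    uut_name: str
    sample_time_s: float        # execution rate (or trigger) of the UUT
    signals: list               # list of SignalSpec

abs_interface = InterfaceSpec(
    uut_name="ABS_Controller",
    sample_time_s=0.01,
    signals=[
        SignalSpec("WheelSpeed",  "single",  (4,), "input"),
        SignalSpec("WheelTqCmd",  "single",  (4,), "input"),
        SignalSpec("ABS_PWM",     "single",  (4,), "output"),
        SignalSpec("ABS_Fault",   "boolean", (1,), "output"),
    ],
)
```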

Time to saddle up!

No more horsing around: once you have your interface designed, it is time to create your test harness. Given that we are working in the domain of Model-Based Design, the ideal objective is to automatically generate a test harness. (To all the neigh-sayers out there.)

A well defined interface!

Signal time!

There are four basic methods for creating signals:

  • Manually: Ah…good old-fashioned hand-crafted test vectors. These take the most time but are where we normally start (a small sketch follows this list).
  • Automatically (general constraint): The next step up is to create test vectors using an auto generation tool. These tools generally allow for basic “types of tests” to be specified such as range, dead code, MCDC.
  • Automatically (constraints specified): The final approach is to use a test vector generation tool and apply constraints to the test vectors.
  • From device: Perhaps this is cheating, but a good percentage of input test vectors come from real-world test data. These come with all the pros and cons(6) of noise and random data; they may not hit what you are looking for, but…
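As a tiny illustration of the “manual” method, and of defining a vector down to the smallest time step, something like the following sketch works (the 10 ms rate and the step change are invented stimuli):

```python
# A hand-crafted test vector defined at the base time step of the UUT.
import numpy as np

dt = 0.01                                   # assumed base sample time (10 ms)
t = np.arange(0.0, 2.0 + dt, dt)            # time column for the vector
wheel_speed = np.where(t < 0.5, 30.0, 5.0)  # step drop to provoke slip detection

test_vector = np.column_stack((t, wheel_speed))
np.savetxt("manual_wheel_speed_vector.csv", test_vector,
           delimiter=",", header="time_s,WheelSpeed", comments="")
```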

UUT and constraints

In this example we have the UUT and a “Test Assessment Block” as our method for imposing constraints. What we program into the Assessment Block is what we want to happen, not what we are checking against(7). For example, we could specify that the input vectors for WheelSpeed, WheelTqCmd and SlipRatioDetected are at given values and that the output vector is ABS_PWM. The automatic test vector generation would then create a set of tests that meet those conditions. You could then check for the cases where ABS_Fault should be active.
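A tool-agnostic sketch of that idea follows; the ABS behavior and the brute-force random search are inventions for illustration only, since a real generator (e.g. Simulink Design Verifier) analyzes the model directly rather than sampling it:

```python
# Constraints ("what we want to happen") versus checks ("what we verify afterwards").
import random

def uut_abs(wheel_speed, wheel_tq_cmd, slip_detected):
    """Placeholder for the real UUT; invented behavior for illustration only."""
    abs_pwm = 0.8 if slip_detected and wheel_speed > 10.0 else 0.0
    abs_fault = wheel_tq_cmd > 250.0
    return abs_pwm, abs_fault

def constraint(slip_detected, abs_pwm):
    # The scenario we want the generated vectors to reach (not the pass/fail check).
    return slip_detected and abs_pwm > 0.0

candidates = []
for _ in range(10000):
    ws, tq, slip = random.uniform(0, 50), random.uniform(0, 300), random.random() > 0.5
    pwm, fault = uut_abs(ws, tq, slip)
    if constraint(slip, pwm):
        candidates.append((ws, tq, slip, pwm, fault))

# The pass/fail check is applied afterwards, e.g. is ABS_Fault active when expected?
print(f"{len(candidates)} candidate vectors satisfy the constraint")
```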

COVID-19 Acceleration: issues with “from the device”

When you social distance from your co-workers you are, more often than not, social distancing from your physical hardware. This directly impacts the ability to gather “real world” test data. My prediction is that we will see 4 trends as a result.

  1. Greater use of existing real-world data / public domain data sets: Let’s be honest, there are times that data is gathered because it is easy to do so; go to the lab, run the widget, collect the data, and go. However, there is no doubt a wealth of existing data within your company, and within government and university databases, that will match what you need at the 90% level.
  2. Increased automation of test data collection: To some extent being in a lab or in a vehicle will always be required for collecting data, however many of the processes around setup, data collection and data transmission can be automated to reduce both the time on site and the frequency of the time on site.
  3. Improved physical models: I know what you are thinking, this is about collecting real world data! What sort of trick is this(8)! What I am suggesting is that the collection of physical data will be prioritized for the creation of better physical models, to reduce the net time in the lab.
  4. In-use collection: The next step will be the transmission of data from existing objects in the field back to the manufacturer. The model “IC-2021” freezer in the field will, most likely, share 95% of the same hardware and software. This means you have a lab in the field.

All of these methods will be used going forward to supplement traditional real-world data collection. With the physical modeling approach, I am going to dive into how to select the data to collect in order to rapidly improve the models. With the “in the field” approach, we will take our first look at big data methods.

Final thoughts

Test vectors are just one part of the overall testing infrastructure, but they are the necessary starting point. We are going to keep looking at all the points along the Verification and Validation process, both in depth and at the impact that COVID conditions continue to have.

Footnotes

  1. With the use of one last Greek hero of antiquity, I hope to build a metaphor for the 12 labors of Hercules as applied to testing (with far fewer labors)
  2. We will look at how large the UUT should be in another blog post. For now, we will give the ballpark that a UUT should be linked to 5 ~ 8 related requirements. Each requirement will have multiple tests associated with it.
  3. A good test harness should be like the harness for a horse, e.g. it provides a secure connection to the horse (software) enabling it to run fully, has the minimum number of attachment points (e.g. don’t overload with test points) and connects without chafing (crashing or changing the behavior of the code).
  4. A dynamic test harness has the test validation software as part of the test harness, e.g. the UUT is evaluated as the test is run. A static test harness simply logs the data for post processing.
  5. Step 1 is to swallow a fly, today you will learn why!
  6. Noise is, and is not, a problem. Since it will exist in the real world, you should welcome noise into your test cases; that is what you will find once you deploy your product.
  7. As an example of what we want to happen, we may want to get a dessert (objective) but not want one with coconut flavor (test).
  8. Not a very good trick, and 8! is 40,320.

Modular testing environments

Foundations define and limit the structures we create; this is as true in Model-Based Design as it is in architecture.  With that in mind, I want to use this post to discuss the concept of modular testing environments (MTE).  First, I will point to an earlier blog post, “Testing is software,” before I drill deeper into the concept of MTE.

What is a modular testing environment?

A modular testing environment consists of 5 parts

  1. Test manager: the test manager provides the framework for running, evaluating and reporting on one or more test cases.  Further, the test manager provides a single hook for the automation process.
  2. Test harnesses: a test harness is the software construct that “wraps” the unit under test.  Ideally, the test harness does not change the unit under test in any fashion; e.g. it allows ‘black box’ testing.
  3. Evaluation primitives: the evaluation primitives are a set of routines that are commonly used to evaluate the results of the test.  Evaluation primitives range from a simple comparison against an expected value to complex evaluations of a sequence of events.
  4. Reporting: there are two types of reports, human and machine readable.  The human readable reports are used as part of the qualification and review process.  Machine-readable reports are used for tracking of data across the project development.
  5. Data management: testing requires multiple types of data: inputs, outputs, parameters and expected results.
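To show how the five parts hang together, here is a minimal skeleton, assuming nothing about any particular tool (the class names and the unit-gain UUT are placeholders):

```python
# Minimal modular testing environment: manager, harness, primitive, data, report.
class TestHarness:
    """Wraps the unit under test without modifying it (black box)."""
    def __init__(self, uut):
        self.uut = uut
    def run(self, inputs):
        return [self.uut(u) for u in inputs]        # log outputs for evaluation

def within_tolerance(expected, actual, tol):        # evaluation primitive
    return all(abs(e - a) <= tol for e, a in zip(expected, actual))

class TestManager:
    """Single hook for automation: runs cases, evaluates, reports."""
    def __init__(self, data_store):
        self.data_store = data_store                # data management: inputs/expected
        self.cases = []
    def add_case(self, name, harness, tol):
        self.cases.append((name, harness, tol))
    def run_all(self):
        results = {}
        for name, harness, tol in self.cases:
            data = self.data_store[name]
            actual = harness.run(data["inputs"])
            results[name] = within_tolerance(data["expected"], actual, tol)
        return results                              # machine-readable report

# Example use with an invented UUT (a unit gain):
store = {"gain_test": {"inputs": [1.0, 2.0], "expected": [1.0, 2.0]}}
mgr = TestManager(store)
mgr.add_case("gain_test", TestHarness(lambda x: x), tol=1e-6)
print(mgr.run_all())
```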

Why is a modular testing environment important?

Having helped hundreds of customers develop testing environments, the 5 most common issues that I have encountered are:

  1. Reinventing the wheel, wrong:  Even the simplest evaluation primitive can have unexpected complexities.  When people rewrite the same evaluation multiple times mistakes are bound to occur.
  2. Tell me what happened:  When tests are pulled together in an individual fashion it is common for there to be limited or inconsistent reporting methods.
  3. Fragile tests: A fragile test is one where if the inputs change in a significant fashion the test has to be completely rewritten.
  4. “Bob” has left the company:  Often tests are written by an individual and when that person leaves the information required to maintain those tests leaves with them.
  5. It takes too much time:  When engineers have to build up tests from scratch, versus assembling from components, it does take more time to create a test.  Hence, tests are not written.

Final thoughts

Verification and validation activities are central to any software development project, Model-Based Design or otherwise.  The easier you make the system to use, the more your developers will embrace it.

Video Blog: Fault detection

This video blog looks at fault detection and error handling.  The included images of State Machines show templates for how I generally model fault detection algorithms.

In this first example there are two things to note:

  1. Debounce protection:  Returning from “move to fault” to “no fault”: the signal needs to fall below the trigger level minus a delta.  This prevents “jitter” in the signal.  (Green circle.)
  2. Temporal logic:  The move to “in fault” only takes place after the fault condition has held true for a set period of time. (Orange circle and black circle.)

[Figure: faultPattern – basic fault detection state chart]
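For readers who prefer code to charts, here is a rough Python rendering of the same pattern; the thresholds, the 0.5 s hold time, and the sample rate are placeholders, and recovery from the “in fault” state is omitted for brevity:

```python
# Debounce + temporal logic fault detection pattern (sketch of the chart above).
class FaultDetector:
    def __init__(self, trigger, delta, hold_time, dt):
        self.trigger, self.delta = trigger, delta
        self.hold_steps = round(hold_time / dt)
        self.state = "no_fault"
        self.counter = 0

    def step(self, signal):
        if self.state == "no_fault":
            if signal > self.trigger:
                self.state, self.counter = "move_to_fault", 0
        elif self.state == "move_to_fault":
            if signal < self.trigger - self.delta:     # debounce on the way back
                self.state = "no_fault"
            else:
                self.counter += 1
                if self.counter >= self.hold_steps:    # temporal logic
                    self.state = "in_fault"
        return self.state

det = FaultDetector(trigger=100.0, delta=5.0, hold_time=0.5, dt=0.1)
for temp in [90, 101, 102, 103, 104, 105, 96, 90]:
    print(temp, det.step(temp))
```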

The next example is more complex; in this example, a single variable, “engine temp,” can result in two different error modes: “High Temp” or “Critical High Temp.”  In reality, the pattern is a slight variation on the previous version; however, it shows how the approach can be expanded to more complex fault conditions.

[Figure: criticalTemp – fault pattern with “High Temp” and “Critical High Temp” modes]

User-friendly testing environments: Analysis and testing

Within a software development organization, whether for embedded code or a desktop application, there are distinct roles: the controls engineer, the system architect, and the quality engineer.  Depending on the size of the development team, some of these roles may be filled by a single person.

Analysis versus testing

During the development phase of a project, the controls engineer should perform analysis tasks on the model.  These analysis tasks enable the controls engineer to determine if the algorithm they are developing is functionally correct and in compliance with the requirements.

It is common for these analysis tasks to be performed in an informal fashion; engineers simulate a model and then view graphs of the outputs to determine if they have correctly implemented the algorithm.

The differentiating word in this description is informal.  When comparing analysis with testing we see that testing (either verification or validation) requires a formalized and “locked down” framework.  How then can the informal analysis be used during the formal testing?

Transitioning from analysis to testing

Ideally, the transition from informal analysis to functional testing would flow seamlessly.   However, it is often the case that the work done in the analysis phase is thrown away in the transition to the testing phase.   This is understandable in a non-MBD environment, but with the single-truth approach of MBD the analysis results should not be thrown away.   This is where the idea of “golden data” comes into use.  Golden data is a set of data, both inputs and outputs, that an experienced engineer verifies as meeting the requirements of the algorithm.

Enabling the use of golden data to create test cases

The easiest way to enable the use of golden data is to provide controls engineers with a simple interface in which they can provide the analysis data set and the information that transforms it into a testing data set.

Analysis data is transformed into test data by providing a method for “locking down” the results.  To lock down the data, the controls engineer needs to provide the test engineer with information on what is expected from the analysis data.  This information could include the following types of golden data tests.

  • Strict:  The output data from testing must match the golden output data exactly.  This is normally done for Boolean or integer outputs.
  • Tolerance:  The output data from testing must match the golden output data within some bounded tolerance.  Tolerances can be absolute, or percentage.  Note special care needs to be taken with data with values around zero for percentage based tolerances.
  • Temporal: The output from testing must match the golden output data within some time duration.  These tests can also include tolerance and strict conditions.
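A minimal sketch of the three comparison types follows (the function names and tolerances are placeholders the controls engineer would supply):

```python
# Strict, tolerance, and temporal golden-data comparisons.
import numpy as np

def strict_match(golden, actual):
    """Boolean/integer outputs: must match exactly."""
    return np.array_equal(golden, actual)

def tolerance_match(golden, actual, abs_tol=None, rel_tol=None):
    """Supply exactly one of abs_tol / rel_tol.
    Note: percentage tolerance is unreliable when golden values are near zero."""
    if abs_tol is not None:
        return np.all(np.abs(golden - actual) <= abs_tol)
    return np.all(np.abs(golden - actual) <= rel_tol * np.abs(golden))

def temporal_match(golden, actual, max_shift_steps, abs_tol):
    """Allow the actual output to lead/lag the golden output by a few samples
    (wrap-around at the ends is ignored in this sketch)."""
    for shift in range(-max_shift_steps, max_shift_steps + 1):
        if np.all(np.abs(golden - np.roll(actual, shift)) <= abs_tol):
            return True
    return False
```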

In addition to the type of golden data tests to run, the controls engineer should include information on which requirements the test maps onto.

Formal tests in support of development

In the same way that golden data can support testing, formal testing can support the controls engineers by informing them of the constraints that the requirements place on their design.  This can only be achieved if the tests are easy for the controls engineers to run.

What is “user-friendly?”

User-friendly interfaces for testing are defined by the following characteristics:

  1. Data is accepted in “natural” format:  Any formatting or interpolation of the data is performed by the testing environment.
  2. Test results are presented in “human readable” format:  The results from the tests should be provided both in a summary format (pass/fail) and with detailed data, such as graphs and tabular data.
  3. Selection and execution of tests should be simple: Tests should be launchable from a user interface that provides a list of the valid tests and enables the running of tests in either single or batch modes.
  4. Test files should be automatically associated:  The management of test data (inputs and results) should be handled by the test manager.


Final thoughts

This blog post has described how information should be shared and how tests should be run.  In an upcoming post, I will cover the basics of modular test design.


Testing is software

This blog is a re-posting of early work from LinkedIn; I will be re-posting this week while I am at the Software Design for Medical Devices Europe conference in Munich.

Enabling the adoption of Model-Based Design

Test early, test often, test against requirements, and test using formal methods. This is the mantra that developers (hopefully) hear. But what does it mean in practice? How do you produce effective and maintainable tests? I will argue that the first step is to think of test development in the same light as software development. Good testing infrastructure has requirements, is built from reusable components, and is written in a clear fashion to facilitate extension and debugging efforts.

Why should you care?

In my 20+ years working in software, 2/3 of it in a consultative role, the most common problem I am called in to work on is mushroom code(1). Mushroom code is the end result of unstructured development: new algorithms are added on top of existing algorithms with little understanding of what they are feeding on. The result is an organic mess that is hard to sort out. This is prevalent in algorithmic development and even more common in testing, which is often done “late and under the gun.”

Testing components

A fully developed testing infrastructure consists of 5 components: a manager, execution methods, harnesses, reporting methods, and evaluation methods.

1.) Evaluation methods: use the data created through the execution of the test to determine the pass / fail / percentage complete status of the test:

Example a.) For an MCDC test, the evaluation would determine the percentage of conditions covered.

Example b.) A regression test could compare the output values between the baseline version of the code and the current release.

2.) Reporting methods: take the data from the evaluation methods and generate both human readable and history reports. The history reports are used to track overall trends in the software development process.(2)

3.) Harness: the harness provides a method for calling the unit under test (UUT) without modifying the UUT. Note that test harnesses facilitate black box testing, i.e. the internal states of the unit under test are not known. However, if internal states of the UUT are outputs at the root level of the model, then white box testing can be done using the unit under test.(3)

4.) Execution methods: how the test is run. This could be the simulation of a Simulink model, the execution of a .exe file, static testing (as with Polyspace), or the real-time execution(4) of the code.

As the name implies, there is more than one “execution method.” They should be developed as a general class that allows the same method (e.g. simulation) to be applied to multiple harnesses. Each instance of an execution method applied to a harness is considered a test case.

5.) Test manager: is where all of these components come together. The test manager:

  • Holds a list of the test cases
  • Automates the loading of associated test data
  • Triggers the execution of the test
  • Triggers the evaluation of the results
  • Triggers the generation of the test report

Sadly it will not yet fetch you a cold beverage.

Notes

1.) Mushroom code and spaghetti code are similar in that they develop due to a lack of planning. Spaghetti code is characterized by convoluted calling structures; mushroom code is the accumulation of code on top of code.

2.) An interesting list of what should go into a report can be found here.

3.) Any model can be turned into white box testing if global data is used. However, the use of global data potentially introduces additional failure cases.

4.) Yes, this blog retreads work from 6 months ago; however, it is good to review these issues.

Model-Based Design: Projects of interest

Early in my career one of my mentors made the statement

“If we understand the system we can model it.
If we can model it we can make predictions.
If we can make predictions we can make improvements”

In the past 20+ years, I have not heard a better statement of the driving ethos behind Model-Based Design.

If we understand the system, we can model it (what we need to do): when the system is understood, it can be described mathematically.  This could be a derived first-principles model or a statistical model; the important thing is that the confidence in the model fidelity is understood.

If we can model it, we can make predictions (what we can do): once the model is known, it can be used.  The model can be used in the design of a controller, in predicting rainfall, or embedded within a system to allow the system to respond with better insight.

If we can make predictions, we can make improvements (why we do it): this last part is the heart of Model-Based Design.  Once we can make accurate predictions, we can use that information to improve what we are doing.

Model and equation…

Models build on a foundation of equations to provide a dynamic, time-variant representation of real-world phenomena.  Moreover, those equations work as part of a system; you leverage models when you move into complex systems with multiple interdependent equations.  Within the Model-Based Design world, we most often think of these systems as closed-loop systems.  Similar examples can be seen in the social sciences, in biology, and in chemistry.

Understanding from a sewage treatment plant…

Coming from an aerospace background, and starting my working career in the automotive industry, the general nature of models sank in during one of my earliest consulting engagements: helping a customer model a sewage treatment plant to determine optimal processing steps against a set of formal requirements.

  • Requirements
    • The plant may not discharge more than N% of water in untreated state
    • The plant’s physical size cannot exceed Y square miles
  •  Objectives
    • Minimize the total processing cost of sewage treatment (weight: ω)
    • Minimize the total processing time of sewage (weight: λ)
    • Maximize the production of  energy from bio-gas (weight: Φ)
    • ….
  • Variants of inputs
    • Sewage inflow base rate has +/- 15% flow rate change
    • Extreme storm conditions can increase flow rate by 50%
    • ….
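As a rough illustration (my reconstruction, not the plant's actual cost function), the weighted objectives can be folded into a single quantity to minimize, subject to the requirements:

PlantObjective = ω * ProcessingCost + λ * ProcessingTime − Φ * BioGasEnergy

minimize PlantObjective, subject to UntreatedDischarge ≤ N% and PlantArea ≤ Y square miles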

The final system model included bio-chemical reactions, fluid dynamic models, statistical “flush rates,” and many other domains that I have now forgotten.  The final model was not able to answer all of the questions that the engineers had; however, it did allow them to design a plant with significantly lower untreated discharge rates and lower sewage processing costs.  This was possible because of the models.  This was the project that showed me just how expansive Model-Based Design is.