Most years the “year in review post” seems like a good time for a lighthearted review of what has been and what is to come; with the events of this year some amount of gravity seems in order.
A safety critical life
To one of my earliest mentees I described the job of an engineer as “trying to find the minimum in Murphy’s equation and its first derivative”; that is to say, design for the best outcomes and protect against the worst. The blog posts this year have focused heavily on Verification and Validation methodologies (finding the minimum) and writing requirements (protecting against the worst). It is easy to see the parallels between these tasks for Model-Based Design and for life during Covid times.
The pendulum swings
With the introduction of COVID-19 vaccines there is an understandable desire to “go back to normal”; in fact, some studies suggest that after events like these we swing in the opposite direction. Depending on how you look at it, this could be seen as embracing life or throwing caution to the wind. The key, as I see it, is to embrace life while remembering the lessons learned, as the pendulum will swing back.
Looking forward to 2021
If hindsight is 20/20, and yes I have been waiting all year to make that joke, as a ratio (20/21; 20/22;…20/99) our view into the future is less clear. In 2021 this blog will be moving to a twice-weekly update (Tuesday / Thursday) and I will be writing more “multi-part” posts to go into greater depth on topics of interest.
I want to wish everyone a safe and happy 2021; I look forward to your comments in the coming year.
Welcome to the home stretch! In today’s post we finish up the engineering process for the fault insertion. We left the last post with three outstanding questions; in today’s post, I will walk through the design decisions that I made in implementing this solution.
How much of the tool should be automated?
What is the end user responsible for implementing?
How do we make this scalable?
Automation and scalability
The weak point in this solution revolves around naming and maintaining the fault injection data; e.g., as the end user adds new “faults” into the system there is a need to create and define the global data, ensure that it is unique, and add it into the test harness.
Check and define
The first two tasks are accomplished using a masked subsystem and a setup script; when the user double-clicks on the fault subsystem they are prompted for a fault name. Closing the dialog will then launch a script that checks and defines the fault data.
The first step is to check to see if the variable already exists; this is done by evaluating the base workspace to see if the data is already defined. If it is, a warning dialog is generated.
If the data does not exist, we assign the names of our fault data to the Data Store Read and Data Store Write blocks inside the library subsystem.
Finally, we assign the data to the base workspace. As part of this task we inspect the incoming signals to determine the data type and dimension.
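The check-and-define steps above can be sketched in a few lines. This is only an illustration in Python, not the setup script itself: a plain dictionary stands in for the base workspace, and the function name and the fields stored for each entry are assumptions made for the example. The two data names follow the faultLineActive_<NAME> / faultLineValue_<NAME> pattern used by the fault subsystem.

```python
import warnings

def define_fault_data(workspace, fault_name, signal_dtype, signal_dims):
    """Check for, then define, the two data items used by one fault block.

    `workspace` is a dict standing in for the base workspace (an assumption
    for this sketch). Returns the two data names, or None if they exist.
    """
    active_name = f"faultLineActive_{fault_name}"
    value_name = f"faultLineValue_{fault_name}"

    # Step 1: check whether the data is already defined; warn if it is.
    if active_name in workspace or value_name in workspace:
        warnings.warn(f"Fault data for '{fault_name}' is already defined")
        return None

    # Step 2: these are the names that would be assigned to the data store
    # blocks inside the library subsystem.
    # Step 3: define the data, taking type and dimension from the signal.
    workspace[active_name] = {"dtype": "boolean", "dims": 1}
    workspace[value_name] = {"dtype": signal_dtype, "dims": signal_dims}
    return active_name, value_name
```

Calling the function a second time with the same fault name leaves the workspace untouched and raises the warning, which is the uniqueness check described above.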
This code takes care of what happens when you add a block. However, in the long term you are just opening up your model with the data already in place. The next step is to provide a method for saving off the data and adding it to the test harness. In the design stage I tried out methods for auto-triggering this save and add step; however, all of them had potential data conflicts. In the end the solution was to provide a script that the user could run to generate the data for themselves.
At this point with this code, our end users have the basic functionality that is required. Additionally, because of how we implemented it, the fault blocks can be inserted anywhere in the model, not just between referenced models, which was one of our “stretch goals.” What remains now is the “error checking” or ruggedization tasks required to have a product out in the field standing on its own. That, however, will be a future set of posts.
Having ended the last post on a cliffhanger, I’m going to reward your return with an answer to two questions that no doubt plagued your thoughts: “Why Fault Lines?” and “What changed in the problem statement?” The first answer is because I am inserting a “fault” on the Simulink “line.”(1)
Changes to the problem statement
At the end of the last post I had the problem statement:
For a system of models, create a method for simulating dropped signals for models not directly connected to the root level I/O
As I started working through possible solutions to this problem I hit on a further problem: how do we keep this method from impacting the code generation process for the full system?
The prototype solution
The normal method for creating tests that do not change the models’ released behavior is to create a test harness that “wraps” the unit under test. However, in this instance, since our objective is to work with models inside the system, this approach will not work. Fortunately, Simulink provides utilities to enable this.
The Simulink Block “Environment Controller” is the first key; we can have a default path (e.g., what is deployed) and the fault path (e.g., what is simulated). When code is generated the blocks in the “faultStuff” subsystem are optimized out of the generated code.
Command from the top?
Having solved our first (new) problem,(2) how do we get the data from the driving test harness to the “faultStuff” subsystem? The solution, in this case, was to use global data.
While as a general practice the use of global data is not recommended, in this case it was considered an acceptable solution for 3 reasons:
The global data would only exist in the simulation version of the model
Creation of the global data would be automated and not available to the developer
Isolation of the global data could be validated as part of the fault-line tool
Two new tasks and a way forward
In developing my rationale in support of global data I created two new tasks for myself (reasons 2 & 3).(3) However, for the prototype I just needed to create the example version of the faultLine.
There are three parts to this solution:
The fault subsystem: this subsystem consists of two data store read blocks and one data store write block. Based on the value of “faultLineActive_<NAME>” either the model path data or the fault value (faultLineValue_<NAME>) will be passed onto the simulation.
The data definition: to use the global data, the Data Store Variables must be declared as part of a data dictionary. The full definition of the data (e.g., type and size) must be provided.
Excitation: the final part is the “driver” of the test; in this example, a simple test sequence that turns the fault on and off at set times.
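The three parts above can be mimicked in plain Python to show how they interact. This is a sketch under stated assumptions, not the Simulink implementation: a dictionary stands in for the data dictionary, the signal name "speed", the on and off times, and the function names are hypothetical, and the subsystem's data store write block is not modeled.

```python
# Data definition: stand-in for the global data declared in the data dictionary.
global_data = {
    "faultLineActive_speed": False,
    "faultLineValue_speed": 0.0,
}

def fault_subsystem(model_path_value, name, data=global_data):
    """Pass the fault value when the fault is active, else the model path data."""
    if data[f"faultLineActive_{name}"]:          # first data store read
        return data[f"faultLineValue_{name}"]    # second data store read
    return model_path_value

def excitation(t, name, on_time=1.0, off_time=2.0, data=global_data):
    """Driver: switch the fault on at on_time and off again at off_time."""
    data[f"faultLineActive_{name}"] = on_time <= t < off_time

def simulate(times, model_path_value=10.0, name="speed"):
    """Run the driver and the fault subsystem at each time step."""
    outputs = []
    for t in times:
        excitation(t, name)
        outputs.append(fault_subsystem(model_path_value, name))
    return outputs
```

Because the driver and the subsystem only share the global data, the driver can sit at the top of the test harness while the subsystem sits anywhere inside the system of models.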
From prototype to release
Moving from a prototype to a released version of the tool asks a new set of questions.
How much of the tool should be automated?
What is the end user responsible for implementing?
How do we make this scalable?
As with the last post, I will leave you on a cliffhanger… What are the design decisions that take this from prototype to reality?
(1) Perhaps not as earthshaking as you hoped, but well, I do my best.
(2) The process of creating a final released product is often one of iterative problem detection and solution, with each subsequent problem being smaller than the previous one.
(3) And these new tasks fit in with the “iterative problem detection” description of footnote 2.
Whether you prefer fishing(1) or gardening(2) metaphors, for the next few posts I’m going to take you into the process I follow when solving a problem to show you how I fish.
The problem statement
As is often the case, I will pull a problem from work I have done across multiple customers. I do this for two reasons. First, I have vetted the solution “in the field,”(4) and second I know that this is an issue that is commonly faced.
How do we simulate dropped and corrupted signals inside our system?
The first step
The first step in designing a solution is to understand the problem. The questions that I have are…
Does a solution already exist?
What does “inside our system” mean?
In the deployment code where and how can signals be corrupted?
The critical question to answer is “where and how?” There are three primary ways that signals and data can be corrupted. The first is through incorrectly written code; since there are other methods for determining this issue it is usually dropped from the analysis. The second is through interrupt events (e.g., the execution of a function is put on hold while a higher priority function executes); the value of data changes during the interrupt, resulting in corrupted data. This is of potential interest; however, it results in a very large testing scope, as the interrupt can occur at any point in the function. As a result, solutions to this problem are placed at a lower priority.(5)
The final method is through corruption of transmitted signals (e.g., messages sent over CAN / LIN / RS-232).(6) Thinking about this issue, we get the solution to our second question: what does “inside the system” mean? Transmitted errors occur at the boundary of functions, and from there we have an answer to the first question: does a solution already exist…
What we already have…
For individual models we have a solution; test vectors can be created that simulate dropped and corrupted signals. However for a system of models, Simulink does not have a built-in method for dropping signals for models on the “inside” of the system.
With this understanding I came up with a new problem statement, one that reflected the design I would build towards. As you will see in the upcoming post, there is further iteration on the new problem statement as the life cycle issues are considered.(7)
For a system of models create a method for simulating dropped signals for models not directly connected to the root level I/O
(1) E.g. give a person a fish, they eat for a day, but teach a person to fish and they eat for life.
(2) A teacher is a gardener who plants the seeds of learning in your mind.(3)
(3) That both of these metaphors are about foundational tasks of humanity gives an idea how old this concept is.
(4) Since one of my customers was an agricultural company it was literally validated in multiple fields.
(5) That we talk about priorities with interrupts is only appropriate, as the priorities determine whether the interrupt can occur.
(6) No interface, no matter how ancient, ever fully goes away.
(7) This is my “cliffhanger” to bring you back to the next post.
Over the past few weeks I have been thinking about how to test systems while introducing faults at lower levels within the system. At the topmost level the problem is simple; faults can be treated as just a signal. At a lower level either the model would need to be instrumented or the computational chain would need to be understood.
Where are the faults?
Within controls design a fault is defined as an interruption, a delay, or a “crossed” signal in the data flow. Our first question is “where and how do faults occur?” Faults occur at the boundaries of systems, either through direct I/O or communication protocols (CAN/LIN/Ethernet…) or through errors in global memory (pointer corruption).
In this example we wrap our UUT (unit under test) as a referenced model. In the test environment we point to a secondary referenced model that includes a fault-injection block. Taking this approach we can instrument the UUT without changing the UUT.
The actual “injection” block can be composed of almost anything. In this case I used a simple Test Sequence block to short the signal after 5 seconds.
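As a rough picture of what that injection does to the signal, here is a minimal Python sketch. It is not the Test Sequence block itself; the function names, the choice of 0.0 as the “shorted” value, and treating the fault as starting exactly at 5 seconds are assumptions for illustration.

```python
def short_after(t, signal_value, fault_time=5.0, shorted_value=0.0):
    """Pass the signal through until fault_time, then replace it with shorted_value."""
    return shorted_value if t >= fault_time else signal_value

def run(times, source):
    """Apply the injection to a source signal sampled at the given times."""
    return [short_after(t, source(t)) for t in times]
```

For a constant source of 1.0 sampled at 0, 4, 5 and 6 seconds, the first two samples pass through unchanged and the last two are replaced by the shorted value.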
When to fault test?
Fault based testing is an edge case for testing; not all systems require or merit fault testing. A general rule of thumb is…
If the system is safety critical and
The system directly communicates with external systems and
The system does not have redundant communication then…
…the system should have fault testing implemented.
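Written as a predicate, the rule of thumb is a simple conjunction. The function and parameter names below are hypothetical, and the sketch only restates the three conditions; it is not a substitute for a safety analysis.

```python
def needs_fault_testing(safety_critical, talks_to_external_systems, has_redundant_comms):
    """Rule of thumb: all three conditions must hold before fault testing is called for."""
    return safety_critical and talks_to_external_systems and not has_redundant_comms
```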
At the heart of every controls system is a feedback loop. An event occurs, it is measured, there is a response and then the disturbances are measured again; continue until “controlled.” Without thinking, on a daily basis we engage with feedback loops; someone says something to you, you respond. If your answer clarifies things you move closer to a “closed loop,” if not, there is an error detected and you and your conversational partner resolve the difference.
Within Simulink models the closing of a loop is realized using a state variable; often a unit delay that provides data from the last pass to the controller.
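A small Python loop shows the same idea outside of Simulink. This is a sketch under assumptions: the proportional update, the gain values, and the names are chosen for illustration, with one variable playing the role of the unit delay by carrying the last pass's output into the next pass.

```python
def run_loop(setpoint, gain, steps):
    """Closed loop where previous_output plays the role of the unit delay."""
    previous_output = 0.0  # state: the output from the last pass
    history = []
    for _ in range(steps):
        error = setpoint - previous_output        # measure against the last pass
        output = previous_output + gain * error   # respond to the error
        history.append(output)
        previous_output = output                  # store for the next pass
    return history
```

With a setpoint of 1.0 and a gain of 0.5, the error halves on every pass and the output settles toward the setpoint. With a gain of 2.5, each pass overshoots by more than the last and the loop diverges.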
Being a controls engineer requires two things: understanding physically what a control loop is and understanding programmatically how it is implemented. Today I will give 3 examples of the everyday ways in which we engage in feedback loops…
As you drive along the road, you see the cars in front of you. As you get closer you either slow down or switch lanes to pass. In this example your speed is the controlled variable in response to the distance between you and the other cars.
In the heat of the summer you may open up your home at night to let the cool evening air in; both freshening up your dwelling and bringing the temperature down. As the sun rises you close up the window and draw the blinds to keep the heat from making your home too hot. Opening and closing blinds and shades; this is your response to the heating input.
Scratching an Itch
There is nothing quite like that feeling when you hit the spot that itches; you scratch for a few moments and the issue goes away. However, this is a good example of a potential problem with undamped feedback; if you don’t stop you end up doing more harm than good.
Now it is your chance. Drop me a line in the feedback section to help me “tune” this blog.