Big Data (the other kind)

Back when I swam competitively(1) there was a saying: “train like you compete.” The habits you develop in training are the behaviors you exhibit in competition. The same principle holds when developing a model: the block and data architecture you select during development carries into production. So… on to “big data.”

Three little bears

Like porridge or beds, we want our data to be “just right”: neither too big nor too small. Here we are talking about how much RAM or ROM the data consumes. Basic data types use 8, 16, 32, or 64 bits, and depending on your processor the minimum and maximum data sizes may differ.(3) When designing your algorithm, select the smallest data type that accurately represents your data, and use simulation to verify that the type is sufficient for the full system.
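As a minimal sketch of the idea in C (the signal name and its range are hypothetical; the same reasoning applies when typing signals in a model): suppose simulation shows a wheel-speed value never leaves the range 0 to 255, so 8 bits suffice.

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical example: simulation showed this signal stays in
     * 0..255, so uint8_t is sufficient; storing it as uint32_t would
     * waste 3 bytes per sample. */
    typedef uint8_t wheel_speed_t;

    void record_sample(int32_t raw_speed)
    {
        /* Guard the assumption the type choice rests on. */
        assert(raw_speed >= 0 && raw_speed <= UINT8_MAX);
        wheel_speed_t speed = (wheel_speed_t)raw_speed;
        (void)speed; /* ... store or process the sample ... */
    }

The assert is the code-level analog of the simulation check above: it makes the range assumption explicit instead of leaving it implicit in the type choice.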

Let’s get real. Talking about complex data

When I write (about complex data) I am not being imaginary; that is a hole pole apart.(4) Rather, in this instance I am thinking of vectors and structures. The rule of thumb for vectors is simple: allocate just enough entries for the data that needs to be stored. Structures require a rule of hand.(5)
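For vectors, a hedged C sketch (the filter and its length are made up for illustration): if the algorithm only ever needs the last 8 samples, allocate 8 entries, not a “just in case” 64.

    #include <stdint.h>

    #define FILTER_TAPS 8U  /* allocate exactly what the algorithm needs */

    /* Hypothetical moving average over the last FILTER_TAPS samples.
     * Sizing the buffer "just in case" at 64 entries would waste
     * 56 * sizeof(uint16_t) = 112 bytes of RAM. */
    static uint16_t window[FILTER_TAPS];

    uint16_t moving_average(uint16_t new_sample)
    {
        static uint8_t idx = 0U;
        uint32_t sum = 0U;

        window[idx] = new_sample;
        idx = (uint8_t)((idx + 1U) % FILTER_TAPS);

        for (uint8_t i = 0U; i < FILTER_TAPS; i++) {
            sum += window[i];
        }
        return (uint16_t)(sum / FILTER_TAPS);
    }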

Structures provide a method for grouping different variables and data types into a single package. From a data organization perspective this is a clear advantage: all of the related data can be found in one location. When thinking about the size of the data, the question becomes “how closely are you related?”

A concrete example

If I were creating a physical model of a road, I would want to know the attributes of the concrete I was pouring. Attributes include tensile and compressive strength, porosity, density, thermal conductivity, cost per cubic foot, and many more. My full physical model requires all of those attributes, but each part of my model needs only a subset (see the sketch after the list below).

  • Construction “strength” model: tensile & compressive strength, density
  • “Wear and tear” model: porosity, thermal conductivity
  • Lifetime cost model: cost per cubic foot…
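A C sketch of that split (the field names and units are my own placeholders, not from a real model); each sub-model’s function signature then carries only the fields it actually uses:

    /* Hypothetical groupings: each model gets only its related fields. */
    typedef struct {
        double tensile_strength;      /* psi */
        double compressive_strength;  /* psi */
        double density;               /* lb per cubic foot */
    } StrengthProps;

    typedef struct {
        double porosity;              /* fraction, 0..1 */
        double thermal_conductivity;  /* BTU/(hr*ft*degF) */
    } WearProps;

    typedef struct {
        double cost_per_cubic_foot;   /* USD */
    } CostProps;

    /* The strength model sees 3 fields, not all 6. */
    double strength_margin(const StrengthProps *p);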

Megastructures

If I created one “megastructure” including all of the data, then (for this simple example) I would pass the following:

  • 3 unneeded data fields to the “strength” model
  • 4 unneeded data fields to the “wear and tear” model
  • 5 unneeded data fields to the “cost” model

Over time, and in more realistic examples, this unnecessary data quickly adds up, consuming processor memory; the sketch below shows the arithmetic.
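To make that concrete (pun intended), here is a hedged C comparison, assuming the six attributes are double-precision fields: every consumer of the megastructure pays for all six, even the cost model that needs only one.

    #include <stdio.h>

    /* The "megastructure": every consumer pays for all six fields. */
    typedef struct {
        double tensile_strength;
        double compressive_strength;
        double porosity;
        double density;
        double thermal_conductivity;
        double cost_per_cubic_foot;
    } ConcreteMega;

    /* What the cost model actually needs. */
    typedef struct {
        double cost_per_cubic_foot;
    } CostModelInput;

    int main(void)
    {
        /* On a typical 64-bit target: 48 bytes vs 8 bytes per instance. */
        printf("megastructure: %zu bytes\n", sizeof(ConcreteMega));
        printf("cost model:    %zu bytes\n", sizeof(CostModelInput));
        return 0;
    }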

Footnotes

  1. H2Okies(2)
  2. The fact that the swim team’s name had a chemistry pun embedded in it endeared the team to me right away.
  3. If your smallest data type is 16 bits, creating an 8-bit variable does not save any memory and may in some cases result in different behaviors between simulation and the deployed code. Setting the target hardware configuration for your model addresses this issue.
  4. Complex data can also mean data with real and imaginary parts (e.g., A = 1 + 3i). More often than not it is plotted in polar graphs with “poles and holes” for the real and imaginary components.
  5. The most famous “rule of hand” would be the “right-hand rule.” In this case I am just thinking that, unlike rules of thumb, I have four more points…
