Tuesday, March 31, 2020

Don't It Make My Brown Eyes Blue

The Role of Ambiguity in Real Life and Relational Database Applications

For a long time I have maintained that ambiguity is not good in a relational database application even though it is required in humor, poetry, politics, and as it turns out, in song lyrics—"Don't it make my brown eyes blue". That line depends on the dual, i.e. ambiguous,  meaning of "blue". To get it, you have to be competent not only with the dictionary meaning of blue as a color, but the metaphorical meaning of blue as an emotional state. That ambiguity adds spice to the song.

Ambiguity is not so good, however, in a relational database application. We can't trust data if it might mean one thing today and something else tomorrow. Color is color, emotion is emotion. Sure, relational database applications are not as much fun as puns, but far more useful in providing reliable information.

Therefore, in order to build a useful set of relational database tables, we have to be ruthless in eliminating ambiguity by making sure our understanding of each element is restricted to one, and only one, possible interpretation. Such attributes are immutable.

Speaking of Eye Color, for example, the appropriate formulation in a relational database application is:
Color (X) is an immutable attribute of Human Eyes (Y).
Having claimed that, of course, I hasten to acknowledge that often a baby's eye color changes in its early days. But, by and large, if you have brown eyes as an adult, you're not likely to wake up to blue eyes one morning, even if your sweetheart leaves for greener pastures.

On the other hand, there IS a way metaphorically the current color of your eyes can change. In the words of the Crystal Gay classic, sometimes your brown eyes can "be blue". Being blue—a metaphor for sorrow—is a good way to capture the feeling of sorrow we all experience from time to time. That's entirely valid in the proper context, i.e. poetry, humor, and music.

But unlike in song lyrics and puns, we can't tolerate dual meanings in a relational database application.

In order to formalize the difference between an immutable attribute and a variable status in a relational database application, we need to add an X and Y rhetorical structure to reflect that some attributes are temporal.

Sorrow (X) is a variable status of a person's emotional state (Y).
In the words of a country singer, "Blue" is a stand-in for "Sad", which is one possible status of a person's emotional state. Other states might be "Happy", or "Angry" or "Confused". By the way, in the database world, there might well be a relational database application in which a person's emotional state would be an attribute of interest, say in recording patients' responses to different experiences in a clinical therapist's practice.

A key difference between a variable status and an immutable attribute is mutability or immutability. Eyes can metaphorically change from brown to blue, but not physically. Emotional state, on the other hand, is variable and even volatile at times. We can be happy, angry, sad and confused all on the same day.

Generalizing the Principle

The same kind of analysis can be applied to any attribute we consider for inclusion in a relational database application. We can examine candidates for entities (or tables) and attributes (or fields) in light of their mutability. Depending on how we decide to handle them, the tables and fields needed to support them will need to be set up differently. The important thing to keep in mind is that we do have some principles, formulated as X and Y structures, to guide us.

Roles and Statuses are Variable

Mutability, or variability, means we can handle relational tables for status in the same we handle relational tables for roles. Specifically, there is a temporal, or sequential, or historical, component to both. That means if we need to capture the history of changing statuses for an entity, we need multiple, date flagged records in a history table. In the relatively more rare situation that we are only interested in current status, of course, we can create an updateable attribute field in the parent table.

Let's consider some examples of such statuses for our hypothetical Human Resources relational database application.

One status commonly tracked by HR for is whether an employee is paid
  • an hourly wage. 
  • a monthly salary
  • a piece rate
These statuses might be called "Compensation" or something similar.

For example, employees who drive forklifts in a warehouse are usually paid hourly wages, while the warehouse manager is usually paid a monthly salary. The line workers making the widgets that go into the boxes moved by forklifts may be paid according to the number of widgets they produce in a shift.

It's tempting to say, on first look, to say "Compensation" is an attribute of an employee, that's how each one is paid.
X (compensation) is A VARIABLE STATUS OF Y (employee).
It's also wrong. This particular situation has to do with employee roles as well as the employees themselves.

Remember, employees can, and do, move from position to position. The forklift driver can be promoted to supervising a shift in the warehouse and move, at the same time, from an hourly wage to a monthly salary. A line worker might go to training and learn to drive a forklift. And so on.

So, it's clear that "Compensation" is an attribute of the role which the employee holds, even though the common reference is to "hourly employees" or "salaried employees".

The correct X and Y structure is, therefore.
X (compensation) is AN IMMUTABLE ATTRIBUTE OF Y (Employee Role).
 And that calls for a field for "Compensation" in the Role table.

We now have four ways to analyze candidates for inclusion as tables and fields in a relational database application.
X is A KIND of Y
  • Human Resources is a Department within Company A 
  • Hire Date is an immutable attribute of an employee. 
  • Full-time is an employment status of an employee.
  • Forklift driver is a variable role of an employee
 The first two X and Y formulations are materialized in a relational database application as fields and tables in those fields, reflecting their non-variable nature.

The second two X and Y formulations are usually materialized in a relational database application as two (or more) tables, one of which holds the history of the variable status or role.