Tuesday, April 14, 2020

COVID-19 Supplies Tracking Database

My friend, Juan Soto, at IT Impact, has created a free tool for small, isolated teams working to combat COVID-19 in their communities. His team put the tool together over the past couple of weeks and he's ready to make it available now. Go to COVID-19 Supplies Tracking Database and download it.

Once you've installed it from the internet download, it will run locally on your computer or laptop without an internet connection, making it ideal for all sorts of situations.

Please share the link and take care

Tuesday, March 31, 2020

Don't It Make My Brown Eyes Blue

The Role of Ambiguity in Real Life and Relational Database Applications

For a long time I have maintained that ambiguity is not good in a relational database application even though it is required in humor, poetry, politics, and as it turns out, in song lyrics—"Don't it make my brown eyes blue". That line depends on the dual, i.e. ambiguous,  meaning of "blue". To get it, you have to be competent not only with the dictionary meaning of blue as a color, but the metaphorical meaning of blue as an emotional state. That ambiguity adds spice to the song.

Ambiguity is not so good, however, in a relational database application. We can't trust data if it might mean one thing today and something else tomorrow. Color is color, emotion is emotion. Sure, relational database applications are not as much fun as puns, but far more useful in providing reliable information.

Therefore, in order to build a useful set of relational database tables, we have to be ruthless in eliminating ambiguity by making sure our understanding of each element is restricted to one, and only one, possible interpretation. Such attributes are immutable.

Speaking of Eye Color, for example, the appropriate formulation in a relational database application is:
X is AN IMMUTABLE ATTRIBUTE OF Y.
Color (X) is an immutable attribute of Human Eyes (Y).
Having claimed that, of course, I hasten to acknowledge that often a baby's eye color changes in its early days. But, by and large, if you have brown eyes as an adult, you're not likely to wake up to blue eyes one morning, even if your sweetheart leaves for greener pastures.

On the other hand, there IS a way metaphorically the current color of your eyes can change. In the words of the Crystal Gay classic, sometimes your brown eyes can "be blue". Being blue—a metaphor for sorrow—is a good way to capture the feeling of sorrow we all experience from time to time. That's entirely valid in the proper context, i.e. poetry, humor, and music.

But unlike in song lyrics and puns, we can't tolerate dual meanings in a relational database application.

In order to formalize the difference between an immutable attribute and a variable status in a relational database application, we need to add an X and Y rhetorical structure to reflect that some attributes are temporal.

X is A VARIABLE STATUS OF Y.
Sorrow (X) is a variable status of a person's emotional state (Y).
In the words of a country singer, "Blue" is a stand-in for "Sad", which is one possible status of a person's emotional state. Other states might be "Happy", or "Angry" or "Confused". By the way, in the database world, there might well be a relational database application in which a person's emotional state would be an attribute of interest, say in recording patients' responses to different experiences in a clinical therapist's practice.

A key difference between a variable status and an immutable attribute is mutability or immutability. Eyes can metaphorically change from brown to blue, but not physically. Emotional state, on the other hand, is variable and even volatile at times. We can be happy, angry, sad and confused all on the same day.

Generalizing the Principle

The same kind of analysis can be applied to any attribute we consider for inclusion in a relational database application. We can examine candidates for entities (or tables) and attributes (or fields) in light of their mutability. Depending on how we decide to handle them, the tables and fields needed to support them will need to be set up differently. The important thing to keep in mind is that we do have some principles, formulated as X and Y structures, to guide us.

Roles and Statuses are Variable

Mutability, or variability, means we can handle relational tables for status in the same we handle relational tables for roles. Specifically, there is a temporal, or sequential, or historical, component to both. That means if we need to capture the history of changing statuses for an entity, we need multiple, date flagged records in a history table. In the relatively more rare situation that we are only interested in current status, of course, we can create an updateable attribute field in the parent table.

Let's consider some examples of such statuses for our hypothetical Human Resources relational database application.

One status commonly tracked by HR for is whether an employee is paid
  • an hourly wage. 
  • a monthly salary
  • a piece rate
These statuses might be called "Compensation" or something similar.

For example, employees who drive forklifts in a warehouse are usually paid hourly wages, while the warehouse manager is usually paid a monthly salary. The line workers making the widgets that go into the boxes moved by forklifts may be paid according to the number of widgets they produce in a shift.

It's tempting to say, on first look, to say "Compensation" is an attribute of an employee, that's how each one is paid.
X (compensation) is A VARIABLE STATUS OF Y (employee).
It's also wrong. This particular situation has to do with employee roles as well as the employees themselves.

Remember, employees can, and do, move from position to position. The forklift driver can be promoted to supervising a shift in the warehouse and move, at the same time, from an hourly wage to a monthly salary. A line worker might go to training and learn to drive a forklift. And so on.

So, it's clear that "Compensation" is an attribute of the role which the employee holds, even though the common reference is to "hourly employees" or "salaried employees".

The correct X and Y structure is, therefore.
X (compensation) is AN IMMUTABLE ATTRIBUTE OF Y (Employee Role).
 And that calls for a field for "Compensation" in the Role table.

We now have four ways to analyze candidates for inclusion as tables and fields in a relational database application.
X is A KIND of Y
  • Human Resources is a Department within Company A 
X is AN IMMUTABLE ATTRIBUTE of Y
  • Hire Date is an immutable attribute of an employee. 
X is A VARIABLE STATUS of Y
  • Full-time is an employment status of an employee.
X is A VARIABLE ROLE of Y
  • Forklift driver is a variable role of an employee
 The first two X and Y formulations are materialized in a relational database application as fields and tables in those fields, reflecting their non-variable nature.

The second two X and Y formulations are usually materialized in a relational database application as two (or more) tables, one of which holds the history of the variable status or role.

Friday, March 27, 2020

I'm Not a Manager, I Just Play One in the EmployeeRoles Table

Last time out I outlined a couple of different ways to understand tables (the names of things) and fields (attributes of those things) in terms of our natural language understanding of common rhetorical structures.

I explained two of the X and Y rhetorical structures at work.
X is ONE KIND OF Y.
X is AN ATTRIBUTE OF Y. 
I offered the examples of departments within an organization and employees working for the organization. However, we need to drill further down in order to get a more complete and accurate picture of the real world we're trying to model in a relational database application. Let's do that next.

First, it is true that "Human Resources" is one kind of department, along with Information Technology, Marketing and so on. It is also true that "size" or "number of employees" is an attribute of departments.

However, I also went on to describe the different roles played by employees as attributes of those employees, although that formulation is not quite precise enough. I think it's more useful to propose a third X and Y rhetorical structure to account for the changeable nature of roles.
X CURRENTLY FILLS THE ROLE OF Y.
In my defense, it's not totally inaccurate to say that Role is an attribute of an employee of importance to the organization. On the other hand, I think it's more useful to recognize that employees can and do move more or less fluidly between roles.  And that means it's not simply the same thing as the immutable attributes of a person, like their birthdate.

Current Role is fundamentally different from some of the other attributes we can identify for employees. For example, the original Date of Hire for an employee doesn't change. That's an immutable attribute of the employee's relationship to the organization. People's names--for the most part--are also immutable attributes of people whether they work for the organization or not, and so on.

So, rhetorically speaking, we need to analyze candidates for inclusion in a relational database application as tables and as fields in those tables in at least the following three ways.

X is ONE KIND OF Y.
X is AN IMMUTABLE ATTRIBUTE OF Y.
X is currently IN THE ROLE OF Y. 
 I was initially a bit hesitant to include "immutable" as part of the second X and Y formula, because it's always possible that many such attributes can actually change. Women often change their last names after marriage, for example. Even a person's gender can be changed. So, although I am sure that it is useful to think of immutable and changeable as differentiators between attributes and roles, I do recognize that it's not totally unambiguous.

In relational database application design terms, we need a better way to think about the way we create tables and fields to handle these three structures.

As we saw last time, a table is the name of one thing: a noun, in other words.

At a general level, we can identify different attributes of those nouns: adjectives, in other words.

Now, though, we have a better, more granular, way to differentiate between those attributes, using the X and Y rhetorical structures.

X is ONE KIND OF Y.
  • Example: Human Resources is a Department at Company ABC. 
X is AN IMMUTABLE ATTRIBUTE OF Y.
  • Example: John Smith was hired in the Human Resources Department at Company ABC on June 1, 2010. 
X is CURRENTLY IN THE ROLE OF Y. 
  • Example: As of September 12, 2018, John Smith is Head of the Compensation Team in the Human Resources Department at Company ABC.
Because there is a fundamental difference between the unchangeable initial hire date (an immutable attribute) and the current Role Date (a current role), we may need to structure our tables differently. In other words, the important thing here is NOT the simple fact that we have to store "a date". The important thing is that different kinds of dates can have different implications about how we need to store the data.

Whether the current role is set up as an field in the Employee table depends on how Company ABC uses their data.
  • Is it important to know which roles John Smith has held and when he held them? 
  • Or do we only need to know John Smith's current role? 
 In most cases I would imagine that the historical record would be important here. And that, in turn, calls for a separate table: EmployeeRole. In this table, we store information about the employee, the role or roles they have held in the organization, and the begin and end dates of each of those roles.

I suppose that there are cases where only a current role or current attribute is important. For example, it could be argued that the employee's current name is all we ever need. The fact that Darlene changed her last name from Jones to Barker when she got married may not be relevant for our purposes. All we need to know here at Company ABC is what name to print on her paycheck.

Decisions of that sort can only be made within the context of the specific project. On the other hand, knowing that it is a decision to be made certainly is important.

If the history of an employee's different roles within the company is important, we need to create a table to hold it. That history is a thing, an entity separate from the employee.

Summing it all up.

For this hypothetical relational database application, I now see a Department Table, with at least one field for "DepartmentName".

I also see an Employee table, with fields for employee First, Middle and Last Names and HireDate. As noted, those four attributes are essentially immutable and, therefore, qualify to be attached to the Employee in that table.

However, "Current Role" is not an immutable attribute of an employee. Partly for that reason and partly because of decision about the purpose of the relational database application, it becomes a candidate for another table entirely: EmployeeRoleHistory. I like to call such tables "History" tables.

For now, we can sum up by identifying three X and Y rhetorical structures and their implications in designing a relational database application.

X is one kind of Y.
  • These are implemented as tables (Y) and fields (X) in those tables to hold the names of those kinds of things. 
X is an immutable attribute of Y.
  •  These are also implemented as tables (Y) and fields (X) in those tables to hold the domain of possible values of the attribute.
X is a current role held by Y.
  • These can be implemented as tables (Y) and fields (X) in those tables if, and only if, the CURRENT role is the only role of importance.
  • If the HISTORY of roles is important, these must be implemented as two tables (Y1 for Employee and Y2 for Role) in order to link up the role (X2) and the holder of that role (X1).

In my next installment, I'll dig deeper into the nature of the "X is a current role held by Y" structure and of the EmployeeRoleHistory table we need to properly support it in a relational database application.

Thursday, March 26, 2020

You Say Whatchamacallit, I Say Department

Last time out I tried to explain some of my thinking behind selecting the names of things we need to track in a relational database application. Specifically, I likened them to the nouns of a natural language as a way to help differentiate between entities themselves and the attributes that qualify them. It's more complex than that, though. Let's drill into one aspect of that complexity using the example of departments within an organization.

In terms of the Discourse Blocs analysis originated in Will Pitkin's original formulation, the specific structure here is:
X is one kind of Y.
The relationship between X and Y is that Y is a higher level group of similar X's, a hierarchy. X does not modify Y, which would be another possible X and Y rhetorical structure.
X is an attribute of Y.
Let's go back and look at two of the proposed tables I outlined in the previous post.

  • 2018 Budget
  • 2019 Budget
  • 2020 Budget
  • Current Employee Assignment
  • Department
  • Human Resources
  • Location
  • Employee
  • Payroll
  • Sick Days
  • Vacation Days

We're focusing on "Human Resources" and  "Department".

It's obvious to users of our natural language, that the rhetorical structure here is that "Human Resources" is ONE KIND OF "Department". (At least I think it's obvious to most of us.) We recognize that "Human Resources" is a qualifier (also referred to as an adjective) for the noun "Department". But the underlying rhetorical structure is that Human Resources is ONE KIND of Department, which is different in rhetorical meaning from, say, "Large Department" or "Small Department". "Large" and "Small" are attributes of ANY department.

In company A, Human Resources is a large Department.
In company B, Human Resources is a small Department.

There are rhetorically speaking, two X and Y structures at work.

  • X is ONE KIND of Y.
    Human Resources is a Department in Company A and in Company B.
  • X is AN ATTRIBUTE of Y.
    Human Resources is large in Company A and small in Company B.

Turning to the way we apply this understanding to the way we design tables in a relational database application, we can make the following conclusions.

In a relational database application that means we have one table called "Department", i.e. the noun, and that "Human Resources" is one of the records in that table  along with "Operations",  "Marketing", etc. They are all "One Kind Of", and that means they are all records in the Department table.

In other words, the specific terms differentiate between instances of "Department". Each such instance becomes one record. (I'll get to another aspect of this principle later, when I talk more about the problem of repeating groups and spreadsheet style tables.)

In the previous post, I also described the relationship between Employees and Managers along the same X and Y principle.

A "Manager" is ONE KIND OF "Employee".

Note that unlike Departments, we don't normally attach descriptors like "Manager" to nouns like "Employees" even though the X is one Kind of Y structure is the same. I.e. we don't say "John is a Manager Employee." We DO recognize, though, that the underlying rhetorical concept is consistent, i.e. "X is a kind of Y".

 Pitkin explored this important concept in his original work: most rhetorical structures can be expressed in more than one way. It's important to look for and identify those rhetorical relationships and not focus exclusively on the specific words in a block of discourse.

I realize that I'm pounding hard on this point. I'm doing so because it's so important to my way of thinking about table and relationship design.

It's not just the specifics of the words in a phrase that we need to understand. We also have to understand the principles by which we describe one thing in terms of another. At the risk of over-simplification, what I'm proposing is that it's not enough to notice that we can, and do, refer to a "Human Resources Department", even though we do not refer to "Manager Employee" or "Salesperson Employee". It's not just the actual syntax that matters, it's also the semantics behind that syntax which matters.

And now we have a better way to understand that meaning in terms of the X and Y structures.

Let me illustrate it this way. One of these statements is true, one is not. It has to do with whether the X and Y relationship is ONE KIND OF or AN ATTRIBUTE OF.

Once a Human Resources Department, always a Human Resources Department.
Once a receptionist, always a receptionist.

If we go to work on Monday in the Human Resources Department, we don't expect to go into work on Tuesday and find that the Human Resources Department is now in charge of Sales instead.

If we go to work on Monday as a receptionist, it's quite possible that we can go back to work on Tuesday as a Marketing Specialist due to a transfer or promotion.

And that's the kind of logic we need to bring to our tables in a relational database application.

I'll sum it up by saying that, if you decide you are looking at a potential component of your database, and you decide that it is an example of ONE KIND OF thing, it becomes a record in a table. If you decide it is an example of AN ATTRIBUTE OF, it becomes an attribute, or field, in a table.


Records and Attributes in a Relational Table





Thursday, March 19, 2020

I Need One of Those Whatchamacallits

Tables—The "Nouns" of the Database Language

The first rule of normalization specifies how fields and columns in a table should be set up. The other rules add greater specificity. However, in my opinion, before we can even get to that point, we have to have a clear idea of what the table itself should look like. I like to make the analogy that the tables in a relational database application play the role played by nouns in a natural language. In other words, they are the names of the things we want to talk about. But we have to be very disciplined in selecting those names and the things they represent.

Let's consider a more or less random list of names of things that you might want to materialize as tables in a relational database application. At this point, they are still candidates for inclusion, still in the realm of natural language, because we are still just thinking about the things we see and talk about in our day to day work; things we may want to create tables for in a relational database application.

As I review them, I'll try to explain why some of them ARE good candidates to become tables, and some ARE NOT. If we can codify the thinking behind those decisions, we can reuse that decision-making to create other tables for other databases. Also, we'll identify at least one table that is missing from our initial list of candidates.
  • 2018 Budget
  • 2019 Budget
  • 2020 Budget
  • Current Employee Assignment
  • Department
  • Employee
  • Human Resources 
  • Location
  • Manager
  • Payroll
  • Sick Days
  • Vacation Days

Human Resources Functions, I Presume?

First, we can deduce that these might be tables you'd find in a Human Resources relational database application. In fact, Human Resources is one of the names in the list of candidates to be materialized as a table. So let's see what else we know about Human Resources. 

Most organizations organize themselves into  departments; Human Resources is one of them along with Operations, Marketing, Information Services, and usually others. People who work in Human Resources are tasked with managing the people who work for the organizations, how and how much they get paid, which departments they work in and so on. People working in Operations are responsible for organizing the way the organization does its work, Marketing employees develop programs to sell the organization's goods and services, and so on.

The first thing that jumps out to me is that the list of candidates includes Department and Human Resources. Obviously, we are primarily concerned about HR functions here, so it's understandable that HR would be proposed as one of the table candidates. However, what we already know about how organizations work in the normal world tells us this is not going to work in a relational database application.

Egos aside, does it make sense to lump other departments into one category while isolating the HR department into another, separate category at the same level? From a natural language perspective, if you were going to create a presentation to the board of directors, would this rhetorical structure make sense?
The XYZ Company consists of two divisions: Human Resources and Other Departments.
Is it really a binary choice between two categories? Probably not.

You might see a rhetorical organization like this, though:
The XYZ Company consists of numerous divisions, including Human Resources, which is the subject of our presentation today.
According to Discourse Bloc theory, the basic rhetorical structure of these two formations is very similar. Both follow one of the common Bloc structures found throughout technical discourse. Specifically, that would be:
Bloc X consists of one or more units of Bloc Y.
For example:
Animal species (Bloc X) include (or consists of) mammals, reptiles, insects, and others (Bloc Y). 
 In our relational database application, the rhetorical formulation is this:
Bloc X (the XYZ Company) is organized into multiple Bloc Y parts (divisions or departments). 
  • The first formulation artificially limits the Y number of parts to two UNEQUAL parts. 
  • The second formulation recognizes that there are actually multiple, EQUAL parts.
You probably agree with me that that the first formulation would make no sense. In fact, it might get you in trouble with the other departments represented at that meeting. However, the second formulation is semantically, as well as rhetorically, sound because it a) correctly identifies the relationship between whole (XYZ Corporation) and its parts ( XYZ Departments), b) acknowledges the parity of Human Resources with other departments, and c) identifies the primary focus of the presentation.

In our database language, then, the noun we choose for our table name should reflect the same logic. We need tables which accurately reflect the real world.  I suspect the logic is a lot easier to recognize "in the real world" because that's where we spend most of our time. We know that there are many  more or less equal departments even if we only want to talk about one of them. In a relational database application, sometimes we lose sight of that.

In the database we need one table, called Department (the whole). One of the departments (the parts) listed in that table will be Human Resources. That's true even though all of the other things we name and talk about in this database all pertain to that one department. We're tracking the functions of HR, but we're not ignoring reality to do so. In this relational database application, we're not tracking what the Operations or Training Departments do, we're tracking the people, i.e. the Human Resources, who work in all of the organization's many departments.

Conclusion: We eliminate "Human Resources" from further consideration to become a table. We retain "Department" as a table candidate. Department is the name of the entity we want to track in our database. It's the proper noun.

The Things Human Resources Must Track

So, if we know that these are all related to Human Resources activities, the next task is to decide which of these proposed tables should be included, and if they are included, should they be used "as is" or further refined.

Managers are People Too

There is a proposed table for "Managers" and a proposed table for "Employees". Does that make sense? I suggest not, for this reason.

The people who work for XYZ Corportation  includes all of those people who are employed to "do something" on behalf of the organization. The Discourse Bloc for this is similar to the one we saw for departments, Bloc X consists of multiple Bloc Ys.
Bloc X (people who work for XYZ) fills several Bloc Y roles (managers, sales people, IT Developers, assembly line workers, and so on).
Formulated in that X/Y structure, it's pretty easy to see that we need a higher level, Bloc X category for all of the Bloc Y roles. The obvious choice is Employee. All managers are employees; all delivery drivers are employees, all software engineers are employees.

Once again, we drop the lower level, Y Bloc, terminology from consideration as a table. We should have an employee table; we should have no manager table.

Y Blocs Don't Disappear

At this point, we need to point out that removing "Human Resources" and "Manager" from consideration as tables in themselves does not mean they aren't relevant. They are. They are "Bloc Y" members of the two remaining "Bloc X" categories. And that means they'll become records in their respective tables, along with any other Bloc Y units that catch our attention.

Another way to say this is that "Department" and "Employee" are two of the nouns we need to include in order to talk logically about Human Resources activities at the XYZ Corporation.

We can continue review and analysis of the other candidates on the list; some will make into the Bloc X group that become tables, some into the Bloc Y group that will become records in those tables. I'll take up that task in my next installment.