Thursday, March 19, 2020

I Need One of Those Whatchamacallits

Tables—The "Nouns" of the Database Language

The first rule of normalization specifies how fields and columns in a table should be set up. The other rules add greater specificity. However, in my opinion, before we can even get to that point, we have to have a clear idea of what the table itself should look like. I like to make the analogy that the tables in a relational database application play the role played by nouns in a natural language. In other words, they are the names of the things we want to talk about. But we have to be very disciplined in selecting those names and the things they represent.

Let's consider a more or less random list of names of things that you might want to materialize as tables in a relational database application. At this point, they are still candidates for inclusion, still in the realm of natural language, because we are still just thinking about the things we see and talk about in our day to day work; things we may want to create tables for in a relational database application.

As I review them, I'll try to explain why some of them ARE good candidates to become tables, and some ARE NOT. If we can codify the thinking behind those decisions, we can reuse that decision-making to create other tables for other databases. Also, we'll identify at least one table that is missing from our initial list of candidates.
  • 2018 Budget
  • 2019 Budget
  • 2020 Budget
  • Current Employee Assignment
  • Department
  • Employee
  • Human Resources 
  • Location
  • Manager
  • Payroll
  • Sick Days
  • Vacation Days

Human Resources Functions, I Presume?

First, we can deduce that these might be tables you'd find in a Human Resources relational database application. In fact, Human Resources is one of the names in the list of candidates to be materialized as a table. So let's see what else we know about Human Resources. 

Most organizations organize themselves into  departments; Human Resources is one of them along with Operations, Marketing, Information Services, and usually others. People who work in Human Resources are tasked with managing the people who work for the organizations, how and how much they get paid, which departments they work in and so on. People working in Operations are responsible for organizing the way the organization does its work, Marketing employees develop programs to sell the organization's goods and services, and so on.

The first thing that jumps out to me is that the list of candidates includes Department and Human Resources. Obviously, we are primarily concerned about HR functions here, so it's understandable that HR would be proposed as one of the table candidates. However, what we already know about how organizations work in the normal world tells us this is not going to work in a relational database application.

Egos aside, does it make sense to lump other departments into one category while isolating the HR department into another, separate category at the same level? From a natural language perspective, if you were going to create a presentation to the board of directors, would this rhetorical structure make sense?
The XYZ Company consists of two divisions: Human Resources and Other Departments.
Is it really a binary choice between two categories? Probably not.

You might see a rhetorical organization like this, though:
The XYZ Company consists of numerous divisions, including Human Resources, which is the subject of our presentation today.
According to Discourse Bloc theory, the basic rhetorical structure of these two formations is very similar. Both follow one of the common Bloc structures found throughout technical discourse. Specifically, that would be:
Bloc X consists of one or more units of Bloc Y.
For example:
Animal species (Bloc X) include (or consists of) mammals, reptiles, insects, and others (Bloc Y). 
 In our relational database application, the rhetorical formulation is this:
Bloc X (the XYZ Company) is organized into multiple Bloc Y parts (divisions or departments). 
  • The first formulation artificially limits the Y number of parts to two UNEQUAL parts. 
  • The second formulation recognizes that there are actually multiple, EQUAL parts.
You probably agree with me that that the first formulation would make no sense. In fact, it might get you in trouble with the other departments represented at that meeting. However, the second formulation is semantically, as well as rhetorically, sound because it a) correctly identifies the relationship between whole (XYZ Corporation) and its parts ( XYZ Departments), b) acknowledges the parity of Human Resources with other departments, and c) identifies the primary focus of the presentation.

In our database language, then, the noun we choose for our table name should reflect the same logic. We need tables which accurately reflect the real world.  I suspect the logic is a lot easier to recognize "in the real world" because that's where we spend most of our time. We know that there are many  more or less equal departments even if we only want to talk about one of them. In a relational database application, sometimes we lose sight of that.

In the database we need one table, called Department (the whole). One of the departments (the parts) listed in that table will be Human Resources. That's true even though all of the other things we name and talk about in this database all pertain to that one department. We're tracking the functions of HR, but we're not ignoring reality to do so. In this relational database application, we're not tracking what the Operations or Training Departments do, we're tracking the people, i.e. the Human Resources, who work in all of the organization's many departments.

Conclusion: We eliminate "Human Resources" from further consideration to become a table. We retain "Department" as a table candidate. Department is the name of the entity we want to track in our database. It's the proper noun.

The Things Human Resources Must Track

So, if we know that these are all related to Human Resources activities, the next task is to decide which of these proposed tables should be included, and if they are included, should they be used "as is" or further refined.

Managers are People Too

There is a proposed table for "Managers" and a proposed table for "Employees". Does that make sense? I suggest not, for this reason.

The people who work for XYZ Corportation  includes all of those people who are employed to "do something" on behalf of the organization. The Discourse Bloc for this is similar to the one we saw for departments, Bloc X consists of multiple Bloc Ys.
Bloc X (people who work for XYZ) fills several Bloc Y roles (managers, sales people, IT Developers, assembly line workers, and so on).
Formulated in that X/Y structure, it's pretty easy to see that we need a higher level, Bloc X category for all of the Bloc Y roles. The obvious choice is Employee. All managers are employees; all delivery drivers are employees, all software engineers are employees.

Once again, we drop the lower level, Y Bloc, terminology from consideration as a table. We should have an employee table; we should have no manager table.

Y Blocs Don't Disappear

At this point, we need to point out that removing "Human Resources" and "Manager" from consideration as tables in themselves does not mean they aren't relevant. They are. They are "Bloc Y" members of the two remaining "Bloc X" categories. And that means they'll become records in their respective tables, along with any other Bloc Y units that catch our attention.

Another way to say this is that "Department" and "Employee" are two of the nouns we need to include in order to talk logically about Human Resources activities at the XYZ Corporation.

We can continue review and analysis of the other candidates on the list; some will make into the Bloc X group that become tables, some into the Bloc Y group that will become records in those tables. I'll take up that task in my next installment.