Wednesday, January 1, 2020

How Long Has This Been Going On?

It is axiomatic that an organization invests time and other resources in acquiring a well designed Relational Database Application because it needs and wants answers to common questions about their organization and its operations.

Things like:
  • How many people work here?
  • How long has Bob worked here?
  • Where does Nancy work, in Home Office or Regional Office One?
  • What's our best-selling line of shoes?
  • What is our profit margin on men's socks?
Dozens of other questions can be asked about nearly every organization. The job of your Relational Database Application is to gather, store, protect and analyze the data needed to answer all of those questions. And that means the  tables in that Relational Database Application have to be designed accordingly. It's the "Relational" part of the Database Application.

Here's an example of two ways to design tables, depending which questions need to be answered.

History vs Current Status

There are two ways to store Salary information for employees. One would be to store only the Current Salary paid to employees. The other would be to store the entire Salary History for those employees. The first requires only one table, the second requires two tables.

So, before you even start building out the tables for this employee tracking Relational Database Application, you need to know which question(s) the organization wants to be able to answer. This is a partial list of the kinds of questions Human Resources might ask about employees' salaries.

  1. How much does Jose, in Sales, make each year?
  2. How much did Anastasia, in IT, make in her previous position in the Shipping Department?
  3. When did Mark, in Customer Support, last get a raise?

Current Status

The first question can be answered from a single "Current Status" field in one table. It's a point in time question, and it refers only to THIS point in time. So we need only one value for Current Salary at a time.*

Dynamic vs Static Status

Note, however, that this does NOT mean the value of Current Salary is static. It can change from time to time. But regardless of those changes over time, the only question a Current Status field can answer is "How much is it when I ask the question?"

A static value, such as the employee's Date Of Birth, doesn't change. It too can be stored in a single "Current Status" field in one table.

If the organization decides, as a matter of policy, that they are concerned only with that Current Status, this "Current Status" field in a single table design is adequate. As a matter of fact, one can think of a number of data points for which we need to store only Current Status. Phone number comes to mind. If a customer changes their phone number, for example, we need to update our records to that new number, but there's no point to retaining the old, obsolete, phone number. You couldn't call it anyway!


Many other kinds of questions we typically ask in a Relational Database Application, though, require knowledge of the history beyond the current status. "When did this happen last?", "How Long has this been the case?", "What was it like Before and what is it like After?" questions all require us to keep track of the current status as well as each of the previous statuses along with the dates when those statuses change.

In order to answer the question about Anastasia's previous salary, we actually have to keep that data point in a table, along with the beginning and ending dates for that salary and her tenure in the Shipping Department.

Operational and Technical Decisions

Answering questions about "Current Status" requires only one data point, the current salary. That means a single field in a single table can store that data point. So, when it comes time to design the employee table for our organization, we can add a field called "Salary" to the Employee table and update it from time to time as required.

Because answering "History" questions requires a different approach to storing data, we also have to take a different approach to designing the tables to support it. We can't put a new field in a table for each salary change that occurs for every employee.  Think about how awkward that would become. Let's say our organization has ten employees on day one. We store their "StartingSalary" in the employee table along with other pertinent data points.

One month later, Anika gets promoted and receives a pay raise. If we want to keep the salary history, and if we're trying to store that history in the same table, we have to add a new field for Anika's new salary and call it something "Salary_2020_03_22" to reflect the date it happened. But that means the other nine employees now have the same field, "Salary_2020_03_22",  but nothing to put in it; they didn't yet get a raise.

Then, six months later, four other employees get raises and that means another new field, "Salary_2020_09_12". So, now Anika has values under "StartingSalary" and "Salary_2020_03_22". Four employees also have "StartingSalaries", but they have "Salary_2020_09_12" salaries. And five other employees have only "StartingSalaries" and nothing in the other two salary fields.

That scenario sounds far-fetched, of course, and it's obvious (I hope!) that it would be a terrible way to design and maintain a Relational Database Application**. Unfortunately, we see that kind thing all too often. It's not always as easy as this to see how ridiculous it is, but it always leads to problems.

Because we want to answer a different kind of question, we have to design our tables differently.

In this case, we have a one-to-many relationship between Employees and Salary (history). One employee has one or more salaries OVER TIME.

The proper design, therefore, is an employee table and a related EmployeeSalary table.

These screenshots illustrate how it has to work:
One Table Current Status of Salary vs Two Table of Employee Salary History
Note that Salary History requires at least a "from_date" in order to establish the proper sequence of values.
History of Salaries for One Employee
The query on the left is based on the table called "employee_salary". The query on the right is based on the two related tables, "employee" and "employee_salary_history."

 Look Before You Leap

Before you leap into creating a new Relational Database Application, make sure you know which questions the organization will need to answer, this year, next year and beyond.

* The third question MIGHT be answered from a "Current Status" field in one table if we also include the "As Of Date" for that status.

**On the other hand, it's pretty much the only way you can do it in an Excel Worksheet, isn't it? 😀