Tuesday, April 14, 2020

COVID-19 Supplies Tracking Database

My friend, Juan Soto, at IT Impact, has created a free tool for small, isolated teams working to combat COVID-19 in their communities. His team put the tool together over the past couple of weeks and he's ready to make it available now. Go to COVID-19 Supplies Tracking Database and download it.

Once you've installed it from the internet download, it will run locally on your computer or laptop without an internet connection, making it ideal for all sorts of situations.

Please share the link and take care

Tuesday, March 31, 2020

Don't It Make My Brown Eyes Blue

The Role of Ambiguity in Real Life and Relational Database Applications

For a long time I have maintained that ambiguity is not good in a relational database application even though it is required in humor, poetry, politics, and as it turns out, in song lyrics—"Don't it make my brown eyes blue". That line depends on the dual, i.e. ambiguous,  meaning of "blue". To get it, you have to be competent not only with the dictionary meaning of blue as a color, but the metaphorical meaning of blue as an emotional state. That ambiguity adds spice to the song.

Ambiguity is not so good, however, in a relational database application. We can't trust data if it might mean one thing today and something else tomorrow. Color is color, emotion is emotion. Sure, relational database applications are not as much fun as puns, but far more useful in providing reliable information.

Therefore, in order to build a useful set of relational database tables, we have to be ruthless in eliminating ambiguity by making sure our understanding of each element is restricted to one, and only one, possible interpretation. Such attributes are immutable.

Speaking of Eye Color, for example, the appropriate formulation in a relational database application is:
X is AN IMMUTABLE ATTRIBUTE OF Y.
Color (X) is an immutable attribute of Human Eyes (Y).
Having claimed that, of course, I hasten to acknowledge that often a baby's eye color changes in its early days. But, by and large, if you have brown eyes as an adult, you're not likely to wake up to blue eyes one morning, even if your sweetheart leaves for greener pastures.

On the other hand, there IS a way metaphorically the current color of your eyes can change. In the words of the Crystal Gay classic, sometimes your brown eyes can "be blue". Being blue—a metaphor for sorrow—is a good way to capture the feeling of sorrow we all experience from time to time. That's entirely valid in the proper context, i.e. poetry, humor, and music.

But unlike in song lyrics and puns, we can't tolerate dual meanings in a relational database application.

In order to formalize the difference between an immutable attribute and a variable status in a relational database application, we need to add an X and Y rhetorical structure to reflect that some attributes are temporal.

X is A VARIABLE STATUS OF Y.
Sorrow (X) is a variable status of a person's emotional state (Y).
In the words of a country singer, "Blue" is a stand-in for "Sad", which is one possible status of a person's emotional state. Other states might be "Happy", or "Angry" or "Confused". By the way, in the database world, there might well be a relational database application in which a person's emotional state would be an attribute of interest, say in recording patients' responses to different experiences in a clinical therapist's practice.

A key difference between a variable status and an immutable attribute is mutability or immutability. Eyes can metaphorically change from brown to blue, but not physically. Emotional state, on the other hand, is variable and even volatile at times. We can be happy, angry, sad and confused all on the same day.

Generalizing the Principle

The same kind of analysis can be applied to any attribute we consider for inclusion in a relational database application. We can examine candidates for entities (or tables) and attributes (or fields) in light of their mutability. Depending on how we decide to handle them, the tables and fields needed to support them will need to be set up differently. The important thing to keep in mind is that we do have some principles, formulated as X and Y structures, to guide us.

Roles and Statuses are Variable

Mutability, or variability, means we can handle relational tables for status in the same we handle relational tables for roles. Specifically, there is a temporal, or sequential, or historical, component to both. That means if we need to capture the history of changing statuses for an entity, we need multiple, date flagged records in a history table. In the relatively more rare situation that we are only interested in current status, of course, we can create an updateable attribute field in the parent table.

Let's consider some examples of such statuses for our hypothetical Human Resources relational database application.

One status commonly tracked by HR for is whether an employee is paid
  • an hourly wage. 
  • a monthly salary
  • a piece rate
These statuses might be called "Compensation" or something similar.

For example, employees who drive forklifts in a warehouse are usually paid hourly wages, while the warehouse manager is usually paid a monthly salary. The line workers making the widgets that go into the boxes moved by forklifts may be paid according to the number of widgets they produce in a shift.

It's tempting to say, on first look, to say "Compensation" is an attribute of an employee, that's how each one is paid.
X (compensation) is A VARIABLE STATUS OF Y (employee).
It's also wrong. This particular situation has to do with employee roles as well as the employees themselves.

Remember, employees can, and do, move from position to position. The forklift driver can be promoted to supervising a shift in the warehouse and move, at the same time, from an hourly wage to a monthly salary. A line worker might go to training and learn to drive a forklift. And so on.

So, it's clear that "Compensation" is an attribute of the role which the employee holds, even though the common reference is to "hourly employees" or "salaried employees".

The correct X and Y structure is, therefore.
X (compensation) is AN IMMUTABLE ATTRIBUTE OF Y (Employee Role).
 And that calls for a field for "Compensation" in the Role table.

We now have four ways to analyze candidates for inclusion as tables and fields in a relational database application.
X is A KIND of Y
  • Human Resources is a Department within Company A 
X is AN IMMUTABLE ATTRIBUTE of Y
  • Hire Date is an immutable attribute of an employee. 
X is A VARIABLE STATUS of Y
  • Full-time is an employment status of an employee.
X is A VARIABLE ROLE of Y
  • Forklift driver is a variable role of an employee
 The first two X and Y formulations are materialized in a relational database application as fields and tables in those fields, reflecting their non-variable nature.

The second two X and Y formulations are usually materialized in a relational database application as two (or more) tables, one of which holds the history of the variable status or role.

Friday, March 27, 2020

I'm Not a Manager, I Just Play One in the EmployeeRoles Table

Last time out I outlined a couple of different ways to understand tables (the names of things) and fields (attributes of those things) in terms of our natural language understanding of common rhetorical structures.

I explained two of the X and Y rhetorical structures at work.
X is ONE KIND OF Y.
X is AN ATTRIBUTE OF Y. 
I offered the examples of departments within an organization and employees working for the organization. However, we need to drill further down in order to get a more complete and accurate picture of the real world we're trying to model in a relational database application. Let's do that next.

First, it is true that "Human Resources" is one kind of department, along with Information Technology, Marketing and so on. It is also true that "size" or "number of employees" is an attribute of departments.

However, I also went on to describe the different roles played by employees as attributes of those employees, although that formulation is not quite precise enough. I think it's more useful to propose a third X and Y rhetorical structure to account for the changeable nature of roles.
X CURRENTLY FILLS THE ROLE OF Y.
In my defense, it's not totally inaccurate to say that Role is an attribute of an employee of importance to the organization. On the other hand, I think it's more useful to recognize that employees can and do move more or less fluidly between roles.  And that means it's not simply the same thing as the immutable attributes of a person, like their birthdate.

Current Role is fundamentally different from some of the other attributes we can identify for employees. For example, the original Date of Hire for an employee doesn't change. That's an immutable attribute of the employee's relationship to the organization. People's names--for the most part--are also immutable attributes of people whether they work for the organization or not, and so on.

So, rhetorically speaking, we need to analyze candidates for inclusion in a relational database application as tables and as fields in those tables in at least the following three ways.

X is ONE KIND OF Y.
X is AN IMMUTABLE ATTRIBUTE OF Y.
X is currently IN THE ROLE OF Y. 
 I was initially a bit hesitant to include "immutable" as part of the second X and Y formula, because it's always possible that many such attributes can actually change. Women often change their last names after marriage, for example. Even a person's gender can be changed. So, although I am sure that it is useful to think of immutable and changeable as differentiators between attributes and roles, I do recognize that it's not totally unambiguous.

In relational database application design terms, we need a better way to think about the way we create tables and fields to handle these three structures.

As we saw last time, a table is the name of one thing: a noun, in other words.

At a general level, we can identify different attributes of those nouns: adjectives, in other words.

Now, though, we have a better, more granular, way to differentiate between those attributes, using the X and Y rhetorical structures.

X is ONE KIND OF Y.
  • Example: Human Resources is a Department at Company ABC. 
X is AN IMMUTABLE ATTRIBUTE OF Y.
  • Example: John Smith was hired in the Human Resources Department at Company ABC on June 1, 2010. 
X is CURRENTLY IN THE ROLE OF Y. 
  • Example: As of September 12, 2018, John Smith is Head of the Compensation Team in the Human Resources Department at Company ABC.
Because there is a fundamental difference between the unchangeable initial hire date (an immutable attribute) and the current Role Date (a current role), we may need to structure our tables differently. In other words, the important thing here is NOT the simple fact that we have to store "a date". The important thing is that different kinds of dates can have different implications about how we need to store the data.

Whether the current role is set up as an field in the Employee table depends on how Company ABC uses their data.
  • Is it important to know which roles John Smith has held and when he held them? 
  • Or do we only need to know John Smith's current role? 
 In most cases I would imagine that the historical record would be important here. And that, in turn, calls for a separate table: EmployeeRole. In this table, we store information about the employee, the role or roles they have held in the organization, and the begin and end dates of each of those roles.

I suppose that there are cases where only a current role or current attribute is important. For example, it could be argued that the employee's current name is all we ever need. The fact that Darlene changed her last name from Jones to Barker when she got married may not be relevant for our purposes. All we need to know here at Company ABC is what name to print on her paycheck.

Decisions of that sort can only be made within the context of the specific project. On the other hand, knowing that it is a decision to be made certainly is important.

If the history of an employee's different roles within the company is important, we need to create a table to hold it. That history is a thing, an entity separate from the employee.

Summing it all up.

For this hypothetical relational database application, I now see a Department Table, with at least one field for "DepartmentName".

I also see an Employee table, with fields for employee First, Middle and Last Names and HireDate. As noted, those four attributes are essentially immutable and, therefore, qualify to be attached to the Employee in that table.

However, "Current Role" is not an immutable attribute of an employee. Partly for that reason and partly because of decision about the purpose of the relational database application, it becomes a candidate for another table entirely: EmployeeRoleHistory. I like to call such tables "History" tables.

For now, we can sum up by identifying three X and Y rhetorical structures and their implications in designing a relational database application.

X is one kind of Y.
  • These are implemented as tables (Y) and fields (X) in those tables to hold the names of those kinds of things. 
X is an immutable attribute of Y.
  •  These are also implemented as tables (Y) and fields (X) in those tables to hold the domain of possible values of the attribute.
X is a current role held by Y.
  • These can be implemented as tables (Y) and fields (X) in those tables if, and only if, the CURRENT role is the only role of importance.
  • If the HISTORY of roles is important, these must be implemented as two tables (Y1 for Employee and Y2 for Role) in order to link up the role (X2) and the holder of that role (X1).

In my next installment, I'll dig deeper into the nature of the "X is a current role held by Y" structure and of the EmployeeRoleHistory table we need to properly support it in a relational database application.

Thursday, March 26, 2020

You Say Whatchamacallit, I Say Department

Last time out I tried to explain some of my thinking behind selecting the names of things we need to track in a relational database application. Specifically, I likened them to the nouns of a natural language as a way to help differentiate between entities themselves and the attributes that qualify them. It's more complex than that, though. Let's drill into one aspect of that complexity using the example of departments within an organization.

In terms of the Discourse Blocs analysis originated in Will Pitkin's original formulation, the specific structure here is:
X is one kind of Y.
The relationship between X and Y is that Y is a higher level group of similar X's, a hierarchy. X does not modify Y, which would be another possible X and Y rhetorical structure.
X is an attribute of Y.
Let's go back and look at two of the proposed tables I outlined in the previous post.

  • 2018 Budget
  • 2019 Budget
  • 2020 Budget
  • Current Employee Assignment
  • Department
  • Human Resources
  • Location
  • Employee
  • Payroll
  • Sick Days
  • Vacation Days

We're focusing on "Human Resources" and  "Department".

It's obvious to users of our natural language, that the rhetorical structure here is that "Human Resources" is ONE KIND OF "Department". (At least I think it's obvious to most of us.) We recognize that "Human Resources" is a qualifier (also referred to as an adjective) for the noun "Department". But the underlying rhetorical structure is that Human Resources is ONE KIND of Department, which is different in rhetorical meaning from, say, "Large Department" or "Small Department". "Large" and "Small" are attributes of ANY department.

In company A, Human Resources is a large Department.
In company B, Human Resources is a small Department.

There are rhetorically speaking, two X and Y structures at work.

  • X is ONE KIND of Y.
    Human Resources is a Department in Company A and in Company B.
  • X is AN ATTRIBUTE of Y.
    Human Resources is large in Company A and small in Company B.

Turning to the way we apply this understanding to the way we design tables in a relational database application, we can make the following conclusions.

In a relational database application that means we have one table called "Department", i.e. the noun, and that "Human Resources" is one of the records in that table  along with "Operations",  "Marketing", etc. They are all "One Kind Of", and that means they are all records in the Department table.

In other words, the specific terms differentiate between instances of "Department". Each such instance becomes one record. (I'll get to another aspect of this principle later, when I talk more about the problem of repeating groups and spreadsheet style tables.)

In the previous post, I also described the relationship between Employees and Managers along the same X and Y principle.

A "Manager" is ONE KIND OF "Employee".

Note that unlike Departments, we don't normally attach descriptors like "Manager" to nouns like "Employees" even though the X is one Kind of Y structure is the same. I.e. we don't say "John is a Manager Employee." We DO recognize, though, that the underlying rhetorical concept is consistent, i.e. "X is a kind of Y".

 Pitkin explored this important concept in his original work: most rhetorical structures can be expressed in more than one way. It's important to look for and identify those rhetorical relationships and not focus exclusively on the specific words in a block of discourse.

I realize that I'm pounding hard on this point. I'm doing so because it's so important to my way of thinking about table and relationship design.

It's not just the specifics of the words in a phrase that we need to understand. We also have to understand the principles by which we describe one thing in terms of another. At the risk of over-simplification, what I'm proposing is that it's not enough to notice that we can, and do, refer to a "Human Resources Department", even though we do not refer to "Manager Employee" or "Salesperson Employee". It's not just the actual syntax that matters, it's also the semantics behind that syntax which matters.

And now we have a better way to understand that meaning in terms of the X and Y structures.

Let me illustrate it this way. One of these statements is true, one is not. It has to do with whether the X and Y relationship is ONE KIND OF or AN ATTRIBUTE OF.

Once a Human Resources Department, always a Human Resources Department.
Once a receptionist, always a receptionist.

If we go to work on Monday in the Human Resources Department, we don't expect to go into work on Tuesday and find that the Human Resources Department is now in charge of Sales instead.

If we go to work on Monday as a receptionist, it's quite possible that we can go back to work on Tuesday as a Marketing Specialist due to a transfer or promotion.

And that's the kind of logic we need to bring to our tables in a relational database application.

I'll sum it up by saying that, if you decide you are looking at a potential component of your database, and you decide that it is an example of ONE KIND OF thing, it becomes a record in a table. If you decide it is an example of AN ATTRIBUTE OF, it becomes an attribute, or field, in a table.


Records and Attributes in a Relational Table





Thursday, March 19, 2020

I Need One of Those Whatchamacallits

Tables—The "Nouns" of the Database Language

The first rule of normalization specifies how fields and columns in a table should be set up. The other rules add greater specificity. However, in my opinion, before we can even get to that point, we have to have a clear idea of what the table itself should look like. I like to make the analogy that the tables in a relational database application play the role played by nouns in a natural language. In other words, they are the names of the things we want to talk about. But we have to be very disciplined in selecting those names and the things they represent.

Let's consider a more or less random list of names of things that you might want to materialize as tables in a relational database application. At this point, they are still candidates for inclusion, still in the realm of natural language, because we are still just thinking about the things we see and talk about in our day to day work; things we may want to create tables for in a relational database application.

As I review them, I'll try to explain why some of them ARE good candidates to become tables, and some ARE NOT. If we can codify the thinking behind those decisions, we can reuse that decision-making to create other tables for other databases. Also, we'll identify at least one table that is missing from our initial list of candidates.
  • 2018 Budget
  • 2019 Budget
  • 2020 Budget
  • Current Employee Assignment
  • Department
  • Employee
  • Human Resources 
  • Location
  • Manager
  • Payroll
  • Sick Days
  • Vacation Days

Human Resources Functions, I Presume?

First, we can deduce that these might be tables you'd find in a Human Resources relational database application. In fact, Human Resources is one of the names in the list of candidates to be materialized as a table. So let's see what else we know about Human Resources. 

Most organizations organize themselves into  departments; Human Resources is one of them along with Operations, Marketing, Information Services, and usually others. People who work in Human Resources are tasked with managing the people who work for the organizations, how and how much they get paid, which departments they work in and so on. People working in Operations are responsible for organizing the way the organization does its work, Marketing employees develop programs to sell the organization's goods and services, and so on.

The first thing that jumps out to me is that the list of candidates includes Department and Human Resources. Obviously, we are primarily concerned about HR functions here, so it's understandable that HR would be proposed as one of the table candidates. However, what we already know about how organizations work in the normal world tells us this is not going to work in a relational database application.

Egos aside, does it make sense to lump other departments into one category while isolating the HR department into another, separate category at the same level? From a natural language perspective, if you were going to create a presentation to the board of directors, would this rhetorical structure make sense?
The XYZ Company consists of two divisions: Human Resources and Other Departments.
Is it really a binary choice between two categories? Probably not.

You might see a rhetorical organization like this, though:
The XYZ Company consists of numerous divisions, including Human Resources, which is the subject of our presentation today.
According to Discourse Bloc theory, the basic rhetorical structure of these two formations is very similar. Both follow one of the common Bloc structures found throughout technical discourse. Specifically, that would be:
Bloc X consists of one or more units of Bloc Y.
For example:
Animal species (Bloc X) include (or consists of) mammals, reptiles, insects, and others (Bloc Y). 
 In our relational database application, the rhetorical formulation is this:
Bloc X (the XYZ Company) is organized into multiple Bloc Y parts (divisions or departments). 
  • The first formulation artificially limits the Y number of parts to two UNEQUAL parts. 
  • The second formulation recognizes that there are actually multiple, EQUAL parts.
You probably agree with me that that the first formulation would make no sense. In fact, it might get you in trouble with the other departments represented at that meeting. However, the second formulation is semantically, as well as rhetorically, sound because it a) correctly identifies the relationship between whole (XYZ Corporation) and its parts ( XYZ Departments), b) acknowledges the parity of Human Resources with other departments, and c) identifies the primary focus of the presentation.

In our database language, then, the noun we choose for our table name should reflect the same logic. We need tables which accurately reflect the real world.  I suspect the logic is a lot easier to recognize "in the real world" because that's where we spend most of our time. We know that there are many  more or less equal departments even if we only want to talk about one of them. In a relational database application, sometimes we lose sight of that.

In the database we need one table, called Department (the whole). One of the departments (the parts) listed in that table will be Human Resources. That's true even though all of the other things we name and talk about in this database all pertain to that one department. We're tracking the functions of HR, but we're not ignoring reality to do so. In this relational database application, we're not tracking what the Operations or Training Departments do, we're tracking the people, i.e. the Human Resources, who work in all of the organization's many departments.

Conclusion: We eliminate "Human Resources" from further consideration to become a table. We retain "Department" as a table candidate. Department is the name of the entity we want to track in our database. It's the proper noun.

The Things Human Resources Must Track

So, if we know that these are all related to Human Resources activities, the next task is to decide which of these proposed tables should be included, and if they are included, should they be used "as is" or further refined.

Managers are People Too

There is a proposed table for "Managers" and a proposed table for "Employees". Does that make sense? I suggest not, for this reason.

The people who work for XYZ Corportation  includes all of those people who are employed to "do something" on behalf of the organization. The Discourse Bloc for this is similar to the one we saw for departments, Bloc X consists of multiple Bloc Ys.
Bloc X (people who work for XYZ) fills several Bloc Y roles (managers, sales people, IT Developers, assembly line workers, and so on).
Formulated in that X/Y structure, it's pretty easy to see that we need a higher level, Bloc X category for all of the Bloc Y roles. The obvious choice is Employee. All managers are employees; all delivery drivers are employees, all software engineers are employees.

Once again, we drop the lower level, Y Bloc, terminology from consideration as a table. We should have an employee table; we should have no manager table.

Y Blocs Don't Disappear

At this point, we need to point out that removing "Human Resources" and "Manager" from consideration as tables in themselves does not mean they aren't relevant. They are. They are "Bloc Y" members of the two remaining "Bloc X" categories. And that means they'll become records in their respective tables, along with any other Bloc Y units that catch our attention.

Another way to say this is that "Department" and "Employee" are two of the nouns we need to include in order to talk logically about Human Resources activities at the XYZ Corporation.

We can continue review and analysis of the other candidates on the list; some will make into the Bloc X group that become tables, some into the Bloc Y group that will become records in those tables. I'll take up that task in my next installment.


Friday, March 13, 2020

Answering the Big Questions with Simple Answers

Mathematical and Linguistic Logic — The Rules of Normalization and the Rules of Rhetoric.

Relational database design is based on predicate logic mathematics, more specifically, the mathematical theory of sets. Unfortunately for many folks, the mathematics involved tend to be dauntingly complex. And I'm speaking from personal experience here.
At least partly for that reason, people have codified a set of design principles, or rules, to make it more accessible to a wider audience. We know these principles as the Rules of Normalization.
There are at least five Rules of Normalization, although only the first three are considered crucial to sound database design in many, if not most, cases. To make matters more challenging, different writers have come up with different formulations of the rules!  Here's a fairly typical overview of one of those formulations.
For a table to be in the First Normal Form, it should follow these 4 rules:
  1. It should only have single (atomic) valued attributes/columns.
  2. Values stored in a column should be of the same domain
  3. All the columns in a table should have unique names.
  4. The order in which data is stored does not matter.
For a table to be in the Second Normal Form,
  1. It should be in the First Normal form.
  2. It should not have Partial Dependency.
A table is said to be in the Third Normal Form when,
  1. It is in the Second Normal form.
  2. It doesn't have Transitive Dependency.

Do What Now?

See what I mean about accessibility? What are atomic valued attributes? What is a partial dependency and how does it differ from a transitive dependency and how do you know either of them when you see them? It's enough to make a grown person cry, right?
Seriously, though, while it is does take dedication and study, nearly anyone can understand and implement the Rules of Normalization. As a matter of fact, if you intend to work with MS Access to build relational database applications for yourself, your organization or clients and customers, you must do so.
As self-identified MS Access trainer and advisor, one of the ways I've tried to penetrate that layer of complexity on behalf of new developers is to offer analogies between the natural languages we all speak and the "language of the database" inherent in the Rules of Normalization. In a recent post, for example, I compared Tables, Fields and Relationships  between tables in a database to the Nouns, Adjectives and Verbs of a natural language.
I would like to expand on that further in this post.

Natural Language and Rhetoric, Database Language and Normalization

While the range of potential topics for discussion in natural languages is open-ended, we rely on a commonly understand set of rhetorical patterns for all of them. We use the same language structures to talk about an endless variety of topics: poetry, politics, business, science, wisdom, war, disease, culture, family, religion, and on and on and on. The important concept at work here is that there is considerable overlap both in vocabulary and—more importantly—in the rhetorical structures that organize our discussions of each of those subjects. That is the central thesis of Will Pitkin's PhD Dissertation, A Pedagogical Model of Discourse in which he argued that virtually all "discourse blocs" follow the same clearly identifiable rhetorical patterns. Understanding how discourse is composed of those building blocs is an essential skill. We seldom pay attention to that fact, but it is crucial to our ability to communicate complex subject matter. There are, as I see it, "Rules of Normalization" for rhetoric and we all intuitively rely on them to communicate.

Databases, in my opinion, are very much like that too. Any given database uses the same building blocks regardless of the subject matter it contains. A college professor, for example, might have a database to organize her library of poetry and literature, but it would probably not have a component devoted to rare mammals of the tropics. The scientist studying those mammals, on the other hand, would have tables devoted to species and habitats, but probably nothing about rhyme and meter. Yet poet and scientist both rely on the components of a database—tables, attributes and relationships between tables—in exactly the same ways.

The Role of Ambiguity

The problem is that natural languages tend to be a lot more forgiving of ambiguity.
Another way in which we can contrast natural languages and database languages is to recognize that ambiguity is an inherent, and sometimes even a desirable, element of natural language discourse. Poetry relies on it, as do politics and humor. The interplay between two or more possible interpretations of a phrase or sentence delights us, confuses us, and even enrages or mislead us. All of us have employed ambiguity at one time or another to achieve one or more of those goals. And the more skillful the writer, the more enjoyable it can be.

Technical discourse—which is the closest natural language analogue to the relational database applications of interest to us—is the most likely rhetoric to try to limit ambiguity.

To be useful, the language of a database must exclude ambiguity to the greatest extent possible. That is, after all, one of the primary reasons the scientist goes to the trouble of using it to classify the specimens collected for study. If you can't differentiate between those specimens, what's the point? It matters a great deal that the attributes of a small furry mammal are clearly and unambiguously stored in a table.
So how does this relate to our theory of database design? Let's take a look at the three components needed to "describe" any subject in the language of the database.

The Language of the Database

  • Tables are the Nouns of the Database Language
  • Fields are the Adjectives of the Database Language
  • Relationships are the Verbs of the Database Language

Tables—The "Nouns" of the Database Language

Tables are the objects in which we store data about the subjects of concern to that database. In a database of poetry, we'll need to talk about things like "poets" and "poems", at the very least. If the database is intended to support analysis of poems, then it needs to talk about things like "rhyme" and "meter" and so on. If it's intended to support biographical history of poets, it will need still other tables. Each of those topics—or entities to use a more database-friendly term— is a candidate to become one table. And the names of those things correspond to the names of the tables. A table called Poet", a table called "Poem", a table called "Rhyme", and a table called "Meter" would appear in this database. More importantly, only those things identified as "poems" can be stored in the poem table. On the surface, this seems obvious when we are in the natural language realm. A poem is a poem, not a helicopter. But it's not always that simple, unfortunately. Is James Joyce's Ulysses intended to be a novel or a poem? It's certainly much longer than other poems like a novel is, but the language and style of it set it apart from most novels. It's been compared to Homer's Odyssey, itself an extended poetic tour de force hundreds of pages long. So, poem or novel, or a bit of both? That's where ambiguity fits nicely in a natural language, but much less so in a database. You can talk about Ulysses without ever classifying definitively. You can't store it in a database both as a poem and as a novel.
And that means we have to identify and define the characteristics by which we classify "Poems" in this database. By identifying the appropriate attributes, we can safely store data without ambiguity.

Attributes—The "Adjectives" of the Database Language

To make it clear what is, and what isn't, a Poem, the Poem table needs to store attributes, or characteristics, that pertain to Poems and ONLY to Poems. Poems may or may not have rhyme schemes, for example. And if they do, that rhyme scheme will be one of those previously defined,  iambic pentameter, for example, or blank verse. That means one of the fields needed for the Poem table is "RhymeScheme". There are many different possible values for that attribute, but there's only one category, or field, for them all. I'll provide more information on this in a later blog.

Relationships Between Tables—The "Verbs" of the Database Language

Poets WRITE poetry. Poems ARE WRITTEN BY poets. No other relationship exists. Poems are not BAKED BY poets. Poets don't DRIVE poems. In order for the database of poetry to make sense, then, there is one and only one possible relationship between the tables for Poets and Poems.
The verb "WRITE" in this natural language formulation corresponds to the logical relationship between the Poet table and the Poem table.
Let's foreshadow a future discussion by pointing out that Poets seldom limit themselves to a single poem in their entire career, at least they don't do so by design. So, while the relationship itself is between "poet" and "poem", the reality is that we have a table of poets and a table of poems, there will be one OR MORE poems in the poem table for each of the poets.
I think that's enough for now. I'll continue my thoughts on the Language of Databases in future installments.

Tuesday, March 10, 2020

Speaking of Relationships, How You Doin' ?

Many years ago I first started thinking about relationships. At that point, I was studying language and linguistics in preparation for what I thought would be a career in teaching English. That relationship didn't work out.

Many years later I discovered MS Access and fell in love all over again, starting another relationship that's lasted quite a bit longer.

What I didn't realize at the time, though, is that much of what I'd learned about rhetorical competence way back in the 70's was applicable to understanding normalization in developing relational database applications some 20 years later. And it's still relevant today as a I look back over several decades of a (mostly) enjoyable career.

Specifically, the linear legacy in both careers is that relationships are at the heart of everything.

The concepts behind Bloc Discourse* included the realization that comprehension of the content of any individual block of text is only one of the skills needed to understand expository discourse. There is a higher level principle at work. That principle depends on understanding the relationships between various units of discourse--sentences and paragraphs as well as longer blocs.

These days, when I try to explain Normalization to new developers, I often call upon a metaphor that combines both relational database design and rhetorical competence.

First, I like to think of Entities as the database counterpart of Nouns and phrases in a natural language. They are the names of things of interest.

Second, Entities and nouns don't stand alone. We need Attributes to provide the detail that makes our Entities meaningful within a context. They are the counterparts of adjectives in a natural language. They provide the nuance and flavor that distinguish one instance of an entity from another. The red car as opposed to the blue car. In a table of cars, therefore, we have a field called "CarColor" to hold that attribute of each car.

And third, in order to construct meaningful sentences, we need the linking words, verbs in a natural language and relationships in a database language. "Customers PURCHASE cars." "Dealers SELL cars."

All three components are required to communicate meaningfully, whether we are talking about a technical article describing the structure of discourse or a relational database application describing the sales operation of Red and Blue Motor Car Company.






*A Pedagogical Model of Discourse, Willis J. Pitkin, 1973

Monday, March 9, 2020

I May Have Been Ahead of Myself By 20 Years

First, a bit of background is in order. Recently, my ex-wife and her new husband suggested, if that's the right word, I might want to remove a few boxes of my things from their garage. They've been patient for a while, so even though I have no space for it all here, I went and picked them up.

One of those boxes contains papers I wrote while working on an advanced degree in English in the early 1970's, so maybe 45 years ago or there about.

I pulled one out and re-read it. It was on the general topic of rhetorical competence and the ability to understand technical papers by recognizing the structural relationships between units of discourse in those papers. My conclusions included the observation that, while content is important to reading competently, understanding those relationships and the writers' intent in establishing them was an equally important skill. In other words, the relationships between blocs* of text are just as important as the semantic meaning of those blocs of text.

So, as I read this paragraph, I couldn't help but think I was foreshadowing the days when I moved on to the development of relational database applications and returned to that fundamental concept: relationships between entities.

And finally, during the course of this analysis, it became clear that there were different kinds of relationships involved between certain units Specifically, it is possible to identify units from a structural viewpoint exclusively, in terms of bloc signals and sequencing of units. But in many cases, the analysis here resorted to what is more properly termed an "operational" analysis, wherein other considerations were made in deciding what the relationship with and between blocs should be. (emphasis added)
Forty five years ago, the key role of relationships was already being developed in how I understood effective technical communication. Twenty years later, when I first encountered Database Normalization, I guess it's fair to say that I was already primed to "get it".

And for that I will always be grateful to the late Will Pitkin and the other members of my thesis committee, John E. Lackstrom and Karl J. Krahnke.




=====================
*I used the spelling "blocs" because much of the thinking behind this approach came from a PhD Dissertation written by one of my professors, Dr. Willis J. Pitkin, who used that term in his work.

Thursday, January 2, 2020

All At Once or One At a Time?

Like most Access developers, I pay a lot of attention to the technical aspects of creating tables in a Relational Database Application. Making sure all of the relevant entities are identified and tables are built for them. Identifying relationships between those entities and defining Primary and Foreign Keys to support and enforce referential integrity within those relationships. Sometimes, though, it's not enough to identify relationship as one-to-one, one-to-many or many-to-many. You also have to take into account the logic, or business rules, that apply to the business itself. That's what I want to talk about today.

Concurrent Status Vs Sequential Status

When designing and building a new Relational Database Application it's rather important that you understand whether a relationship is "All at Once" (i.e. concurrent) or "One at a Time" (i.e. sequential). Initially, they can look very much the same, but they're not. And that has implications for the design of the tables and the relationships between them. Let's look at a couple of examples, and see if we can sort out a workable solution.

One at a Time

These may sound like dumb questions, but I'm trying to make a point I'll explore later.

How many employees can drive the same company car at the same time? 
How many company cars can the same employee drive at the same time?


Pretty obviously ( I hope), only one person can physically drive a car at the same time, and one person can only drive one car at a time. Those logical restriction are based on the nature of "driving a car". Over time many different people can take turns driving different cars, of course. (If you happen to have a teen-age driver in your household and only one car, I suppose that could seem like more of a hypothetical proposition đŸ˜‰.)

It also makes sense that an employee can drive different cars on different days. And different employees can drive one specific company car on different days. This can be thought of as a "One at a Time" scenario.

All at Once


Here's another set of questions that might help understand why it's important to know how it works before you start building tables in a Relational Database Application.

How many classes is Yoshiko enrolled in this semester?
How many students are enrolled in the Math 101 class this semester?

Again, it should be obvious that a person can enroll in one or more classes each semester. There's an upper limit to the course load any one person can handle, but the exact number varies from person to person and from semester to semester. Moreover, because a Class can enroll one or more persons, there is no logical restriction on the number of concurrent enrollments by different students. The only restriction here is that one student can't take the same class more than once in a given semester. This can be thought of as an "All at Once" scenario.

Concurrent Vs Sequential History

At first glance, one might assume that both of these scenarios represent Many-to-Many relationships, which means, in turn, that they require a junction table like this one, which illustrates a concurrent status for students and classes.

Students Concurrently Enrolled in Classes
Each student can concurrently enroll in one or more classes each semester. Each class can concurrently be taken by one or more students each Semester that class is offered by the school.

Our junction table for students and classes includes two foreign keys, one for each student and one for each class. In addition, to enforce the requirement that students can enroll in a class only once each semester, the StudentClass junction table includes the SemesterStartDate as part of the three field composite Primary Key. That allows students to enroll in a class more than once, sequentially. That might happen if a student failed the class on the first attempt and need to retake it.

For Company Cars assigned to Employees, we initially might want to design our tables the same way because we know that one or more employees can check out a company car, and that a company car can be assigned to one or more employees.
Company Cars Assigned to Employees
Our first attempt at a junction table for employees and company cars includes two foreign keys, one for employees and one for company cars. In addition, the CheckOutDate is included in the composite key. That means an employee can only check out a Company Car once on any given date. That is, in fact, what we want.

However, it turns out that this table design actually allows us to create records that would violate our rule about Sequential, or One at a Time, assignments. It is possible, with this design, to assign an Employee to two or more different Company Cars on the same date, or to assign the same Company Car to two or more different Employees on the same date. See the sample records in this query.
One Company Car Assigned to Two Employees on the Same Day

Both Bezalel Simmel and Georgi Facello end up being assigned the Ford Focus on the 2nd of January, 2020. And that is fully supported by the Junction Table. Obviously, this is not logically acceptable. We need a different kind of table design. Here's how the first attempt at resolving this problem might look.
Composite Primary Key on Company Car and Checkout Date
 This approach seems like it might work. The Ford Focus can only be assigned once per day, which means that Giorgio gets it on the 2nd, Bezalel gets it on the 3rd, and Parto gets it on the 5th. 

However, there's still another problem. Parto gets two different Company Cars on the 5th. The Primary Key restraint doesn't prevent that; it only prevents assigning the same car twice on the same day.

The same problem would appear, though, if we tried to enforce the Primary Key restraint on EmployeeID and CheckoutDate. Each employee could get one car per day, but the same car could be assigned twice (or more) on that same day.

Our next step in resolving this problem will be to add an additional unique index to the junction table, like this.
Unique Index on Two Fields Not in Composite Primary Key


By adding a second unique index on the other two fields we need to restrict, we can now prevent duplicate assignments to the same employee on the same date, in addition to preventing duplicate assignments of the same company car on the same date. 

Note: the desired restriction is enforced by the Unique property. The Primary Key is unique by default. The second index we created, on EmployeeID and CheckoutDate, could be applied without the unique constraint. However, because it is possible to create that second index and make it unique, we can achieve the desired restriction on Employees and Company Cars as "One at a Time".

There Is More than One Way to Get the Right Answer

It's worth pointing out that there are other ways to go about handling this situation. For example, the composite primary key could be on EmployeeID and Checkout date while the second, unique index could be set up on CompanyCarID and CheckoutDate. 

Not only that, we could assign a new AutoNumber Primary Key to this table, and make two other unique indexes which would achieve the same result.
Surrogate Primary Key and Two Unique Indexes
The key features of this approach to designing the tables for a Sequential Type of Many-to-Many Relationship include the following:


  1. It limits records for each combination to "One at a Time"
  2. It supports a history of transactions in which two entities (in this case, employees and company cars) are involved over time.
  3. It enforces all of the restrictions at the table level. That means no additional wads of code have to be written.

 

Wednesday, January 1, 2020

Singing (and Dancing) With Access

Dancing with the Stars, Singing with the Idols


I like to think I can make MS Access sing and dance. Give me a task and most of the time I'll find a way to accomplish it. Once in a while it takes longer to navigate through a new kind of process, but so far I've never failed (well there was that once when I couldn't mimic Excel exactly in a datasheet view in a form, but it was on the clients "it would be nice..." list anyway). However, there are limits to what even Access can do. When I find myself stumped, I tend to want to blame Access. But that's just an excuse, on several levels. It shames me when I resort to blaming someone else for my shortcomings. That makes me human, I guess. It doesn't make me right.

Que SerĂ¡ SerĂ¡.

 Access is what it is. How trite and tautological, right? But one of the things that alternately frustrates and amuses me is that Microsoft designed Access to do exactly what it does, no more and no less. I refuse to admit defeat until I've exhausted all of the options. Sometimes I'm quite delighted to learn there is a way to do something I would have thought impossible. Often those tasks relate to working with URLs or other things webbie. It's something I know little about. However, I've learned never to say never, except when I've spent enough time Bingoogling to conclude I'm not going to find an answer. Then it's time for a long walk and maybe a fresh cuppa Costa Rican dark roast.

The corollary to that, of course, is that Access isn't what it isn't. VBA is a rich, powerful coding language that works under Windows on the desktop. It's not a web coding language. It never was, and despite all the Sturm und Drang about it over recent years, there's little chance it ever will be (I'd say zero chance, but, you know, never say never). You want to make web pages, go learn a web coding language. There are lots of them and you are smart enough to figure it out, if you must. Just don't waste my time and yours whining about something that nunca serĂ¡.

 Mommas Don't Let Your Babies Grow Up to Be Cowboy Coders

I run into this from time to time on the on-line forums. The question comes in the form of "How do I write a function (or maybe some SQL) to do X, Y, and Z? And then, when the answer is along the lines of "Use the built-in Access feature that does X, Y and Z, the response comes back, "But I want to do it this way. How do I do that?"

One common example is form design. Bound forms are the heart and soul of nearly every Access Relational Database Application. Writing a lot of code to handle unbound forms seems like a real waste of time and resources in all but a few very specific situations. If you want to ride off into the sunset on a white mustang, be my guest. But don't lose sight of the fact that the end of the road for most cowboys includes a lot of isolation.

If you don't want to do it "the Access Way", that's your call, of course. But  don't complain when it turns out to be harder, less reliable and less flexible than "the Access Way".

Know Your Limitations

 Access, for example, doesn't do a great job with data security. It has other strengths. Keeping your data absolutely protected is not one of them. If that's a requirement for your Relational Database Application, find a different tool. Access is, and always be, my favored tool for creating Relational Database Applications. On the other hand, just like I wouldn't haul pigs to market in a Cadillac I am pretty sure there are better tools for a lot of tasks.


Access and I are going to be dancing partners for a long time to come, I hope.We just have to avoid stepping on each others toes.

How Long Has This Been Going On?

It is axiomatic that an organization invests time and other resources in acquiring a well designed Relational Database Application because it needs and wants answers to common questions about their organization and its operations.

Things like:
  • How many people work here?
  • How long has Bob worked here?
  • Where does Nancy work, in Home Office or Regional Office One?
  • What's our best-selling line of shoes?
  • What is our profit margin on men's socks?
Dozens of other questions can be asked about nearly every organization. The job of your Relational Database Application is to gather, store, protect and analyze the data needed to answer all of those questions. And that means the  tables in that Relational Database Application have to be designed accordingly. It's the "Relational" part of the Database Application.

Here's an example of two ways to design tables, depending which questions need to be answered.

History vs Current Status

There are two ways to store Salary information for employees. One would be to store only the Current Salary paid to employees. The other would be to store the entire Salary History for those employees. The first requires only one table, the second requires two tables.

So, before you even start building out the tables for this employee tracking Relational Database Application, you need to know which question(s) the organization wants to be able to answer. This is a partial list of the kinds of questions Human Resources might ask about employees' salaries.

  1. How much does Jose, in Sales, make each year?
  2. How much did Anastasia, in IT, make in her previous position in the Shipping Department?
  3. When did Mark, in Customer Support, last get a raise?

Current Status

The first question can be answered from a single "Current Status" field in one table. It's a point in time question, and it refers only to THIS point in time. So we need only one value for Current Salary at a time.*

Dynamic vs Static Status

Note, however, that this does NOT mean the value of Current Salary is static. It can change from time to time. But regardless of those changes over time, the only question a Current Status field can answer is "How much is it when I ask the question?"

A static value, such as the employee's Date Of Birth, doesn't change. It too can be stored in a single "Current Status" field in one table.

If the organization decides, as a matter of policy, that they are concerned only with that Current Status, this "Current Status" field in a single table design is adequate. As a matter of fact, one can think of a number of data points for which we need to store only Current Status. Phone number comes to mind. If a customer changes their phone number, for example, we need to update our records to that new number, but there's no point to retaining the old, obsolete, phone number. You couldn't call it anyway!

History

Many other kinds of questions we typically ask in a Relational Database Application, though, require knowledge of the history beyond the current status. "When did this happen last?", "How Long has this been the case?", "What was it like Before and what is it like After?" questions all require us to keep track of the current status as well as each of the previous statuses along with the dates when those statuses change.

In order to answer the question about Anastasia's previous salary, we actually have to keep that data point in a table, along with the beginning and ending dates for that salary and her tenure in the Shipping Department.

Operational and Technical Decisions

Answering questions about "Current Status" requires only one data point, the current salary. That means a single field in a single table can store that data point. So, when it comes time to design the employee table for our organization, we can add a field called "Salary" to the Employee table and update it from time to time as required.

Because answering "History" questions requires a different approach to storing data, we also have to take a different approach to designing the tables to support it. We can't put a new field in a table for each salary change that occurs for every employee.  Think about how awkward that would become. Let's say our organization has ten employees on day one. We store their "StartingSalary" in the employee table along with other pertinent data points.

One month later, Anika gets promoted and receives a pay raise. If we want to keep the salary history, and if we're trying to store that history in the same table, we have to add a new field for Anika's new salary and call it something "Salary_2020_03_22" to reflect the date it happened. But that means the other nine employees now have the same field, "Salary_2020_03_22",  but nothing to put in it; they didn't yet get a raise.

Then, six months later, four other employees get raises and that means another new field, "Salary_2020_09_12". So, now Anika has values under "StartingSalary" and "Salary_2020_03_22". Four employees also have "StartingSalaries", but they have "Salary_2020_09_12" salaries. And five other employees have only "StartingSalaries" and nothing in the other two salary fields.

That scenario sounds far-fetched, of course, and it's obvious (I hope!) that it would be a terrible way to design and maintain a Relational Database Application**. Unfortunately, we see that kind thing all too often. It's not always as easy as this to see how ridiculous it is, but it always leads to problems.

Because we want to answer a different kind of question, we have to design our tables differently.

In this case, we have a one-to-many relationship between Employees and Salary (history). One employee has one or more salaries OVER TIME.

The proper design, therefore, is an employee table and a related EmployeeSalary table.

These screenshots illustrate how it has to work:
One Table Current Status of Salary vs Two Table of Employee Salary History
Note that Salary History requires at least a "from_date" in order to establish the proper sequence of values.
History of Salaries for One Employee
The query on the left is based on the table called "employee_salary". The query on the right is based on the two related tables, "employee" and "employee_salary_history."

 Look Before You Leap

Before you leap into creating a new Relational Database Application, make sure you know which questions the organization will need to answer, this year, next year and beyond.

--------------------------------------------------------------------------------
* The third question MIGHT be answered from a "Current Status" field in one table if we also include the "As Of Date" for that status.

**On the other hand, it's pretty much the only way you can do it in an Excel Worksheet, isn't it? đŸ˜€