Tuesday, March 31, 2020

Don't It Make My Brown Eyes Blue

The Role of Ambiguity in Real Life and Relational Database Applications

For a long time I have maintained that ambiguity is not good in a relational database application even though it is required in humor, poetry, politics, and as it turns out, in song lyrics—"Don't it make my brown eyes blue". That line depends on the dual, i.e. ambiguous,  meaning of "blue". To get it, you have to be competent not only with the dictionary meaning of blue as a color, but the metaphorical meaning of blue as an emotional state. That ambiguity adds spice to the song.

Ambiguity is not so good, however, in a relational database application. We can't trust data if it might mean one thing today and something else tomorrow. Color is color, emotion is emotion. Sure, relational database applications are not as much fun as puns, but far more useful in providing reliable information.

Therefore, in order to build a useful set of relational database tables, we have to be ruthless in eliminating ambiguity by making sure our understanding of each element is restricted to one, and only one, possible interpretation. Such attributes are immutable.

Speaking of Eye Color, for example, the appropriate formulation in a relational database application is:
X is AN IMMUTABLE ATTRIBUTE OF Y.
Color (X) is an immutable attribute of Human Eyes (Y).
Having claimed that, of course, I hasten to acknowledge that often a baby's eye color changes in its early days. But, by and large, if you have brown eyes as an adult, you're not likely to wake up to blue eyes one morning, even if your sweetheart leaves for greener pastures.

On the other hand, there IS a way metaphorically the current color of your eyes can change. In the words of the Crystal Gay classic, sometimes your brown eyes can "be blue". Being blue—a metaphor for sorrow—is a good way to capture the feeling of sorrow we all experience from time to time. That's entirely valid in the proper context, i.e. poetry, humor, and music.

But unlike in song lyrics and puns, we can't tolerate dual meanings in a relational database application.

In order to formalize the difference between an immutable attribute and a variable status in a relational database application, we need to add an X and Y rhetorical structure to reflect that some attributes are temporal.

X is A VARIABLE STATUS OF Y.
Sorrow (X) is a variable status of a person's emotional state (Y).
In the words of a country singer, "Blue" is a stand-in for "Sad", which is one possible status of a person's emotional state. Other states might be "Happy", or "Angry" or "Confused". By the way, in the database world, there might well be a relational database application in which a person's emotional state would be an attribute of interest, say in recording patients' responses to different experiences in a clinical therapist's practice.

A key difference between a variable status and an immutable attribute is mutability or immutability. Eyes can metaphorically change from brown to blue, but not physically. Emotional state, on the other hand, is variable and even volatile at times. We can be happy, angry, sad and confused all on the same day.

Generalizing the Principle

The same kind of analysis can be applied to any attribute we consider for inclusion in a relational database application. We can examine candidates for entities (or tables) and attributes (or fields) in light of their mutability. Depending on how we decide to handle them, the tables and fields needed to support them will need to be set up differently. The important thing to keep in mind is that we do have some principles, formulated as X and Y structures, to guide us.

Roles and Statuses are Variable

Mutability, or variability, means we can handle relational tables for status in the same we handle relational tables for roles. Specifically, there is a temporal, or sequential, or historical, component to both. That means if we need to capture the history of changing statuses for an entity, we need multiple, date flagged records in a history table. In the relatively more rare situation that we are only interested in current status, of course, we can create an updateable attribute field in the parent table.

Let's consider some examples of such statuses for our hypothetical Human Resources relational database application.

One status commonly tracked by HR for is whether an employee is paid
  • an hourly wage. 
  • a monthly salary
  • a piece rate
These statuses might be called "Compensation" or something similar.

For example, employees who drive forklifts in a warehouse are usually paid hourly wages, while the warehouse manager is usually paid a monthly salary. The line workers making the widgets that go into the boxes moved by forklifts may be paid according to the number of widgets they produce in a shift.

It's tempting to say, on first look, to say "Compensation" is an attribute of an employee, that's how each one is paid.
X (compensation) is A VARIABLE STATUS OF Y (employee).
It's also wrong. This particular situation has to do with employee roles as well as the employees themselves.

Remember, employees can, and do, move from position to position. The forklift driver can be promoted to supervising a shift in the warehouse and move, at the same time, from an hourly wage to a monthly salary. A line worker might go to training and learn to drive a forklift. And so on.

So, it's clear that "Compensation" is an attribute of the role which the employee holds, even though the common reference is to "hourly employees" or "salaried employees".

The correct X and Y structure is, therefore.
X (compensation) is AN IMMUTABLE ATTRIBUTE OF Y (Employee Role).
 And that calls for a field for "Compensation" in the Role table.

We now have four ways to analyze candidates for inclusion as tables and fields in a relational database application.
X is A KIND of Y
  • Human Resources is a Department within Company A 
X is AN IMMUTABLE ATTRIBUTE of Y
  • Hire Date is an immutable attribute of an employee. 
X is A VARIABLE STATUS of Y
  • Full-time is an employment status of an employee.
X is A VARIABLE ROLE of Y
  • Forklift driver is a variable role of an employee
 The first two X and Y formulations are materialized in a relational database application as fields and tables in those fields, reflecting their non-variable nature.

The second two X and Y formulations are usually materialized in a relational database application as two (or more) tables, one of which holds the history of the variable status or role.

Friday, March 27, 2020

I'm Not a Manager, I Just Play One in the EmployeeRoles Table

Last time out I outlined a couple of different ways to understand tables (the names of things) and fields (attributes of those things) in terms of our natural language understanding of common rhetorical structures.

I explained two of the X and Y rhetorical structures at work.
X is ONE KIND OF Y.
X is AN ATTRIBUTE OF Y. 
I offered the examples of departments within an organization and employees working for the organization. However, we need to drill further down in order to get a more complete and accurate picture of the real world we're trying to model in a relational database application. Let's do that next.

First, it is true that "Human Resources" is one kind of department, along with Information Technology, Marketing and so on. It is also true that "size" or "number of employees" is an attribute of departments.

However, I also went on to describe the different roles played by employees as attributes of those employees, although that formulation is not quite precise enough. I think it's more useful to propose a third X and Y rhetorical structure to account for the changeable nature of roles.
X CURRENTLY FILLS THE ROLE OF Y.
In my defense, it's not totally inaccurate to say that Role is an attribute of an employee of importance to the organization. On the other hand, I think it's more useful to recognize that employees can and do move more or less fluidly between roles.  And that means it's not simply the same thing as the immutable attributes of a person, like their birthdate.

Current Role is fundamentally different from some of the other attributes we can identify for employees. For example, the original Date of Hire for an employee doesn't change. That's an immutable attribute of the employee's relationship to the organization. People's names--for the most part--are also immutable attributes of people whether they work for the organization or not, and so on.

So, rhetorically speaking, we need to analyze candidates for inclusion in a relational database application as tables and as fields in those tables in at least the following three ways.

X is ONE KIND OF Y.
X is AN IMMUTABLE ATTRIBUTE OF Y.
X is currently IN THE ROLE OF Y. 
 I was initially a bit hesitant to include "immutable" as part of the second X and Y formula, because it's always possible that many such attributes can actually change. Women often change their last names after marriage, for example. Even a person's gender can be changed. So, although I am sure that it is useful to think of immutable and changeable as differentiators between attributes and roles, I do recognize that it's not totally unambiguous.

In relational database application design terms, we need a better way to think about the way we create tables and fields to handle these three structures.

As we saw last time, a table is the name of one thing: a noun, in other words.

At a general level, we can identify different attributes of those nouns: adjectives, in other words.

Now, though, we have a better, more granular, way to differentiate between those attributes, using the X and Y rhetorical structures.

X is ONE KIND OF Y.
  • Example: Human Resources is a Department at Company ABC. 
X is AN IMMUTABLE ATTRIBUTE OF Y.
  • Example: John Smith was hired in the Human Resources Department at Company ABC on June 1, 2010. 
X is CURRENTLY IN THE ROLE OF Y. 
  • Example: As of September 12, 2018, John Smith is Head of the Compensation Team in the Human Resources Department at Company ABC.
Because there is a fundamental difference between the unchangeable initial hire date (an immutable attribute) and the current Role Date (a current role), we may need to structure our tables differently. In other words, the important thing here is NOT the simple fact that we have to store "a date". The important thing is that different kinds of dates can have different implications about how we need to store the data.

Whether the current role is set up as an field in the Employee table depends on how Company ABC uses their data.
  • Is it important to know which roles John Smith has held and when he held them? 
  • Or do we only need to know John Smith's current role? 
 In most cases I would imagine that the historical record would be important here. And that, in turn, calls for a separate table: EmployeeRole. In this table, we store information about the employee, the role or roles they have held in the organization, and the begin and end dates of each of those roles.

I suppose that there are cases where only a current role or current attribute is important. For example, it could be argued that the employee's current name is all we ever need. The fact that Darlene changed her last name from Jones to Barker when she got married may not be relevant for our purposes. All we need to know here at Company ABC is what name to print on her paycheck.

Decisions of that sort can only be made within the context of the specific project. On the other hand, knowing that it is a decision to be made certainly is important.

If the history of an employee's different roles within the company is important, we need to create a table to hold it. That history is a thing, an entity separate from the employee.

Summing it all up.

For this hypothetical relational database application, I now see a Department Table, with at least one field for "DepartmentName".

I also see an Employee table, with fields for employee First, Middle and Last Names and HireDate. As noted, those four attributes are essentially immutable and, therefore, qualify to be attached to the Employee in that table.

However, "Current Role" is not an immutable attribute of an employee. Partly for that reason and partly because of decision about the purpose of the relational database application, it becomes a candidate for another table entirely: EmployeeRoleHistory. I like to call such tables "History" tables.

For now, we can sum up by identifying three X and Y rhetorical structures and their implications in designing a relational database application.

X is one kind of Y.
  • These are implemented as tables (Y) and fields (X) in those tables to hold the names of those kinds of things. 
X is an immutable attribute of Y.
  •  These are also implemented as tables (Y) and fields (X) in those tables to hold the domain of possible values of the attribute.
X is a current role held by Y.
  • These can be implemented as tables (Y) and fields (X) in those tables if, and only if, the CURRENT role is the only role of importance.
  • If the HISTORY of roles is important, these must be implemented as two tables (Y1 for Employee and Y2 for Role) in order to link up the role (X2) and the holder of that role (X1).

In my next installment, I'll dig deeper into the nature of the "X is a current role held by Y" structure and of the EmployeeRoleHistory table we need to properly support it in a relational database application.

Thursday, March 26, 2020

You Say Whatchamacallit, I Say Department

Last time out I tried to explain some of my thinking behind selecting the names of things we need to track in a relational database application. Specifically, I likened them to the nouns of a natural language as a way to help differentiate between entities themselves and the attributes that qualify them. It's more complex than that, though. Let's drill into one aspect of that complexity using the example of departments within an organization.

In terms of the Discourse Blocs analysis originated in Will Pitkin's original formulation, the specific structure here is:
X is one kind of Y.
The relationship between X and Y is that Y is a higher level group of similar X's, a hierarchy. X does not modify Y, which would be another possible X and Y rhetorical structure.
X is an attribute of Y.
Let's go back and look at two of the proposed tables I outlined in the previous post.

  • 2018 Budget
  • 2019 Budget
  • 2020 Budget
  • Current Employee Assignment
  • Department
  • Human Resources
  • Location
  • Employee
  • Payroll
  • Sick Days
  • Vacation Days

We're focusing on "Human Resources" and  "Department".

It's obvious to users of our natural language, that the rhetorical structure here is that "Human Resources" is ONE KIND OF "Department". (At least I think it's obvious to most of us.) We recognize that "Human Resources" is a qualifier (also referred to as an adjective) for the noun "Department". But the underlying rhetorical structure is that Human Resources is ONE KIND of Department, which is different in rhetorical meaning from, say, "Large Department" or "Small Department". "Large" and "Small" are attributes of ANY department.

In company A, Human Resources is a large Department.
In company B, Human Resources is a small Department.

There are rhetorically speaking, two X and Y structures at work.

  • X is ONE KIND of Y.
    Human Resources is a Department in Company A and in Company B.
  • X is AN ATTRIBUTE of Y.
    Human Resources is large in Company A and small in Company B.

Turning to the way we apply this understanding to the way we design tables in a relational database application, we can make the following conclusions.

In a relational database application that means we have one table called "Department", i.e. the noun, and that "Human Resources" is one of the records in that table  along with "Operations",  "Marketing", etc. They are all "One Kind Of", and that means they are all records in the Department table.

In other words, the specific terms differentiate between instances of "Department". Each such instance becomes one record. (I'll get to another aspect of this principle later, when I talk more about the problem of repeating groups and spreadsheet style tables.)

In the previous post, I also described the relationship between Employees and Managers along the same X and Y principle.

A "Manager" is ONE KIND OF "Employee".

Note that unlike Departments, we don't normally attach descriptors like "Manager" to nouns like "Employees" even though the X is one Kind of Y structure is the same. I.e. we don't say "John is a Manager Employee." We DO recognize, though, that the underlying rhetorical concept is consistent, i.e. "X is a kind of Y".

 Pitkin explored this important concept in his original work: most rhetorical structures can be expressed in more than one way. It's important to look for and identify those rhetorical relationships and not focus exclusively on the specific words in a block of discourse.

I realize that I'm pounding hard on this point. I'm doing so because it's so important to my way of thinking about table and relationship design.

It's not just the specifics of the words in a phrase that we need to understand. We also have to understand the principles by which we describe one thing in terms of another. At the risk of over-simplification, what I'm proposing is that it's not enough to notice that we can, and do, refer to a "Human Resources Department", even though we do not refer to "Manager Employee" or "Salesperson Employee". It's not just the actual syntax that matters, it's also the semantics behind that syntax which matters.

And now we have a better way to understand that meaning in terms of the X and Y structures.

Let me illustrate it this way. One of these statements is true, one is not. It has to do with whether the X and Y relationship is ONE KIND OF or AN ATTRIBUTE OF.

Once a Human Resources Department, always a Human Resources Department.
Once a receptionist, always a receptionist.

If we go to work on Monday in the Human Resources Department, we don't expect to go into work on Tuesday and find that the Human Resources Department is now in charge of Sales instead.

If we go to work on Monday as a receptionist, it's quite possible that we can go back to work on Tuesday as a Marketing Specialist due to a transfer or promotion.

And that's the kind of logic we need to bring to our tables in a relational database application.

I'll sum it up by saying that, if you decide you are looking at a potential component of your database, and you decide that it is an example of ONE KIND OF thing, it becomes a record in a table. If you decide it is an example of AN ATTRIBUTE OF, it becomes an attribute, or field, in a table.


Records and Attributes in a Relational Table





Thursday, March 19, 2020

I Need One of Those Whatchamacallits

Tables—The "Nouns" of the Database Language

The first rule of normalization specifies how fields and columns in a table should be set up. The other rules add greater specificity. However, in my opinion, before we can even get to that point, we have to have a clear idea of what the table itself should look like. I like to make the analogy that the tables in a relational database application play the role played by nouns in a natural language. In other words, they are the names of the things we want to talk about. But we have to be very disciplined in selecting those names and the things they represent.

Let's consider a more or less random list of names of things that you might want to materialize as tables in a relational database application. At this point, they are still candidates for inclusion, still in the realm of natural language, because we are still just thinking about the things we see and talk about in our day to day work; things we may want to create tables for in a relational database application.

As I review them, I'll try to explain why some of them ARE good candidates to become tables, and some ARE NOT. If we can codify the thinking behind those decisions, we can reuse that decision-making to create other tables for other databases. Also, we'll identify at least one table that is missing from our initial list of candidates.
  • 2018 Budget
  • 2019 Budget
  • 2020 Budget
  • Current Employee Assignment
  • Department
  • Employee
  • Human Resources 
  • Location
  • Manager
  • Payroll
  • Sick Days
  • Vacation Days

Human Resources Functions, I Presume?

First, we can deduce that these might be tables you'd find in a Human Resources relational database application. In fact, Human Resources is one of the names in the list of candidates to be materialized as a table. So let's see what else we know about Human Resources. 

Most organizations organize themselves into  departments; Human Resources is one of them along with Operations, Marketing, Information Services, and usually others. People who work in Human Resources are tasked with managing the people who work for the organizations, how and how much they get paid, which departments they work in and so on. People working in Operations are responsible for organizing the way the organization does its work, Marketing employees develop programs to sell the organization's goods and services, and so on.

The first thing that jumps out to me is that the list of candidates includes Department and Human Resources. Obviously, we are primarily concerned about HR functions here, so it's understandable that HR would be proposed as one of the table candidates. However, what we already know about how organizations work in the normal world tells us this is not going to work in a relational database application.

Egos aside, does it make sense to lump other departments into one category while isolating the HR department into another, separate category at the same level? From a natural language perspective, if you were going to create a presentation to the board of directors, would this rhetorical structure make sense?
The XYZ Company consists of two divisions: Human Resources and Other Departments.
Is it really a binary choice between two categories? Probably not.

You might see a rhetorical organization like this, though:
The XYZ Company consists of numerous divisions, including Human Resources, which is the subject of our presentation today.
According to Discourse Bloc theory, the basic rhetorical structure of these two formations is very similar. Both follow one of the common Bloc structures found throughout technical discourse. Specifically, that would be:
Bloc X consists of one or more units of Bloc Y.
For example:
Animal species (Bloc X) include (or consists of) mammals, reptiles, insects, and others (Bloc Y). 
 In our relational database application, the rhetorical formulation is this:
Bloc X (the XYZ Company) is organized into multiple Bloc Y parts (divisions or departments). 
  • The first formulation artificially limits the Y number of parts to two UNEQUAL parts. 
  • The second formulation recognizes that there are actually multiple, EQUAL parts.
You probably agree with me that that the first formulation would make no sense. In fact, it might get you in trouble with the other departments represented at that meeting. However, the second formulation is semantically, as well as rhetorically, sound because it a) correctly identifies the relationship between whole (XYZ Corporation) and its parts ( XYZ Departments), b) acknowledges the parity of Human Resources with other departments, and c) identifies the primary focus of the presentation.

In our database language, then, the noun we choose for our table name should reflect the same logic. We need tables which accurately reflect the real world.  I suspect the logic is a lot easier to recognize "in the real world" because that's where we spend most of our time. We know that there are many  more or less equal departments even if we only want to talk about one of them. In a relational database application, sometimes we lose sight of that.

In the database we need one table, called Department (the whole). One of the departments (the parts) listed in that table will be Human Resources. That's true even though all of the other things we name and talk about in this database all pertain to that one department. We're tracking the functions of HR, but we're not ignoring reality to do so. In this relational database application, we're not tracking what the Operations or Training Departments do, we're tracking the people, i.e. the Human Resources, who work in all of the organization's many departments.

Conclusion: We eliminate "Human Resources" from further consideration to become a table. We retain "Department" as a table candidate. Department is the name of the entity we want to track in our database. It's the proper noun.

The Things Human Resources Must Track

So, if we know that these are all related to Human Resources activities, the next task is to decide which of these proposed tables should be included, and if they are included, should they be used "as is" or further refined.

Managers are People Too

There is a proposed table for "Managers" and a proposed table for "Employees". Does that make sense? I suggest not, for this reason.

The people who work for XYZ Corportation  includes all of those people who are employed to "do something" on behalf of the organization. The Discourse Bloc for this is similar to the one we saw for departments, Bloc X consists of multiple Bloc Ys.
Bloc X (people who work for XYZ) fills several Bloc Y roles (managers, sales people, IT Developers, assembly line workers, and so on).
Formulated in that X/Y structure, it's pretty easy to see that we need a higher level, Bloc X category for all of the Bloc Y roles. The obvious choice is Employee. All managers are employees; all delivery drivers are employees, all software engineers are employees.

Once again, we drop the lower level, Y Bloc, terminology from consideration as a table. We should have an employee table; we should have no manager table.

Y Blocs Don't Disappear

At this point, we need to point out that removing "Human Resources" and "Manager" from consideration as tables in themselves does not mean they aren't relevant. They are. They are "Bloc Y" members of the two remaining "Bloc X" categories. And that means they'll become records in their respective tables, along with any other Bloc Y units that catch our attention.

Another way to say this is that "Department" and "Employee" are two of the nouns we need to include in order to talk logically about Human Resources activities at the XYZ Corporation.

We can continue review and analysis of the other candidates on the list; some will make into the Bloc X group that become tables, some into the Bloc Y group that will become records in those tables. I'll take up that task in my next installment.


Friday, March 13, 2020

Answering the Big Questions with Simple Answers

Mathematical and Linguistic Logic — The Rules of Normalization and the Rules of Rhetoric.

Relational database design is based on predicate logic mathematics, more specifically, the mathematical theory of sets. Unfortunately for many folks, the mathematics involved tend to be dauntingly complex. And I'm speaking from personal experience here.
At least partly for that reason, people have codified a set of design principles, or rules, to make it more accessible to a wider audience. We know these principles as the Rules of Normalization.
There are at least five Rules of Normalization, although only the first three are considered crucial to sound database design in many, if not most, cases. To make matters more challenging, different writers have come up with different formulations of the rules!  Here's a fairly typical overview of one of those formulations.
For a table to be in the First Normal Form, it should follow these 4 rules:
  1. It should only have single (atomic) valued attributes/columns.
  2. Values stored in a column should be of the same domain
  3. All the columns in a table should have unique names.
  4. The order in which data is stored does not matter.
For a table to be in the Second Normal Form,
  1. It should be in the First Normal form.
  2. It should not have Partial Dependency.
A table is said to be in the Third Normal Form when,
  1. It is in the Second Normal form.
  2. It doesn't have Transitive Dependency.

Do What Now?

See what I mean about accessibility? What are atomic valued attributes? What is a partial dependency and how does it differ from a transitive dependency and how do you know either of them when you see them? It's enough to make a grown person cry, right?
Seriously, though, while it is does take dedication and study, nearly anyone can understand and implement the Rules of Normalization. As a matter of fact, if you intend to work with MS Access to build relational database applications for yourself, your organization or clients and customers, you must do so.
As self-identified MS Access trainer and advisor, one of the ways I've tried to penetrate that layer of complexity on behalf of new developers is to offer analogies between the natural languages we all speak and the "language of the database" inherent in the Rules of Normalization. In a recent post, for example, I compared Tables, Fields and Relationships  between tables in a database to the Nouns, Adjectives and Verbs of a natural language.
I would like to expand on that further in this post.

Natural Language and Rhetoric, Database Language and Normalization

While the range of potential topics for discussion in natural languages is open-ended, we rely on a commonly understand set of rhetorical patterns for all of them. We use the same language structures to talk about an endless variety of topics: poetry, politics, business, science, wisdom, war, disease, culture, family, religion, and on and on and on. The important concept at work here is that there is considerable overlap both in vocabulary and—more importantly—in the rhetorical structures that organize our discussions of each of those subjects. That is the central thesis of Will Pitkin's PhD Dissertation, A Pedagogical Model of Discourse in which he argued that virtually all "discourse blocs" follow the same clearly identifiable rhetorical patterns. Understanding how discourse is composed of those building blocs is an essential skill. We seldom pay attention to that fact, but it is crucial to our ability to communicate complex subject matter. There are, as I see it, "Rules of Normalization" for rhetoric and we all intuitively rely on them to communicate.

Databases, in my opinion, are very much like that too. Any given database uses the same building blocks regardless of the subject matter it contains. A college professor, for example, might have a database to organize her library of poetry and literature, but it would probably not have a component devoted to rare mammals of the tropics. The scientist studying those mammals, on the other hand, would have tables devoted to species and habitats, but probably nothing about rhyme and meter. Yet poet and scientist both rely on the components of a database—tables, attributes and relationships between tables—in exactly the same ways.

The Role of Ambiguity

The problem is that natural languages tend to be a lot more forgiving of ambiguity.
Another way in which we can contrast natural languages and database languages is to recognize that ambiguity is an inherent, and sometimes even a desirable, element of natural language discourse. Poetry relies on it, as do politics and humor. The interplay between two or more possible interpretations of a phrase or sentence delights us, confuses us, and even enrages or mislead us. All of us have employed ambiguity at one time or another to achieve one or more of those goals. And the more skillful the writer, the more enjoyable it can be.

Technical discourse—which is the closest natural language analogue to the relational database applications of interest to us—is the most likely rhetoric to try to limit ambiguity.

To be useful, the language of a database must exclude ambiguity to the greatest extent possible. That is, after all, one of the primary reasons the scientist goes to the trouble of using it to classify the specimens collected for study. If you can't differentiate between those specimens, what's the point? It matters a great deal that the attributes of a small furry mammal are clearly and unambiguously stored in a table.
So how does this relate to our theory of database design? Let's take a look at the three components needed to "describe" any subject in the language of the database.

The Language of the Database

  • Tables are the Nouns of the Database Language
  • Fields are the Adjectives of the Database Language
  • Relationships are the Verbs of the Database Language

Tables—The "Nouns" of the Database Language

Tables are the objects in which we store data about the subjects of concern to that database. In a database of poetry, we'll need to talk about things like "poets" and "poems", at the very least. If the database is intended to support analysis of poems, then it needs to talk about things like "rhyme" and "meter" and so on. If it's intended to support biographical history of poets, it will need still other tables. Each of those topics—or entities to use a more database-friendly term— is a candidate to become one table. And the names of those things correspond to the names of the tables. A table called Poet", a table called "Poem", a table called "Rhyme", and a table called "Meter" would appear in this database. More importantly, only those things identified as "poems" can be stored in the poem table. On the surface, this seems obvious when we are in the natural language realm. A poem is a poem, not a helicopter. But it's not always that simple, unfortunately. Is James Joyce's Ulysses intended to be a novel or a poem? It's certainly much longer than other poems like a novel is, but the language and style of it set it apart from most novels. It's been compared to Homer's Odyssey, itself an extended poetic tour de force hundreds of pages long. So, poem or novel, or a bit of both? That's where ambiguity fits nicely in a natural language, but much less so in a database. You can talk about Ulysses without ever classifying definitively. You can't store it in a database both as a poem and as a novel.
And that means we have to identify and define the characteristics by which we classify "Poems" in this database. By identifying the appropriate attributes, we can safely store data without ambiguity.

Attributes—The "Adjectives" of the Database Language

To make it clear what is, and what isn't, a Poem, the Poem table needs to store attributes, or characteristics, that pertain to Poems and ONLY to Poems. Poems may or may not have rhyme schemes, for example. And if they do, that rhyme scheme will be one of those previously defined,  iambic pentameter, for example, or blank verse. That means one of the fields needed for the Poem table is "RhymeScheme". There are many different possible values for that attribute, but there's only one category, or field, for them all. I'll provide more information on this in a later blog.

Relationships Between Tables—The "Verbs" of the Database Language

Poets WRITE poetry. Poems ARE WRITTEN BY poets. No other relationship exists. Poems are not BAKED BY poets. Poets don't DRIVE poems. In order for the database of poetry to make sense, then, there is one and only one possible relationship between the tables for Poets and Poems.
The verb "WRITE" in this natural language formulation corresponds to the logical relationship between the Poet table and the Poem table.
Let's foreshadow a future discussion by pointing out that Poets seldom limit themselves to a single poem in their entire career, at least they don't do so by design. So, while the relationship itself is between "poet" and "poem", the reality is that we have a table of poets and a table of poems, there will be one OR MORE poems in the poem table for each of the poets.
I think that's enough for now. I'll continue my thoughts on the Language of Databases in future installments.

Tuesday, March 10, 2020

Speaking of Relationships, How You Doin' ?

Many years ago I first started thinking about relationships. At that point, I was studying language and linguistics in preparation for what I thought would be a career in teaching English. That relationship didn't work out.

Many years later I discovered MS Access and fell in love all over again, starting another relationship that's lasted quite a bit longer.

What I didn't realize at the time, though, is that much of what I'd learned about rhetorical competence way back in the 70's was applicable to understanding normalization in developing relational database applications some 20 years later. And it's still relevant today as a I look back over several decades of a (mostly) enjoyable career.

Specifically, the linear legacy in both careers is that relationships are at the heart of everything.

The concepts behind Bloc Discourse* included the realization that comprehension of the content of any individual block of text is only one of the skills needed to understand expository discourse. There is a higher level principle at work. That principle depends on understanding the relationships between various units of discourse--sentences and paragraphs as well as longer blocs.

These days, when I try to explain Normalization to new developers, I often call upon a metaphor that combines both relational database design and rhetorical competence.

First, I like to think of Entities as the database counterpart of Nouns and phrases in a natural language. They are the names of things of interest.

Second, Entities and nouns don't stand alone. We need Attributes to provide the detail that makes our Entities meaningful within a context. They are the counterparts of adjectives in a natural language. They provide the nuance and flavor that distinguish one instance of an entity from another. The red car as opposed to the blue car. In a table of cars, therefore, we have a field called "CarColor" to hold that attribute of each car.

And third, in order to construct meaningful sentences, we need the linking words, verbs in a natural language and relationships in a database language. "Customers PURCHASE cars." "Dealers SELL cars."

All three components are required to communicate meaningfully, whether we are talking about a technical article describing the structure of discourse or a relational database application describing the sales operation of Red and Blue Motor Car Company.






*A Pedagogical Model of Discourse, Willis J. Pitkin, 1973

Monday, March 9, 2020

I May Have Been Ahead of Myself By 20 Years

First, a bit of background is in order. Recently, my ex-wife and her new husband suggested, if that's the right word, I might want to remove a few boxes of my things from their garage. They've been patient for a while, so even though I have no space for it all here, I went and picked them up.

One of those boxes contains papers I wrote while working on an advanced degree in English in the early 1970's, so maybe 45 years ago or there about.

I pulled one out and re-read it. It was on the general topic of rhetorical competence and the ability to understand technical papers by recognizing the structural relationships between units of discourse in those papers. My conclusions included the observation that, while content is important to reading competently, understanding those relationships and the writers' intent in establishing them was an equally important skill. In other words, the relationships between blocs* of text are just as important as the semantic meaning of those blocs of text.

So, as I read this paragraph, I couldn't help but think I was foreshadowing the days when I moved on to the development of relational database applications and returned to that fundamental concept: relationships between entities.

And finally, during the course of this analysis, it became clear that there were different kinds of relationships involved between certain units Specifically, it is possible to identify units from a structural viewpoint exclusively, in terms of bloc signals and sequencing of units. But in many cases, the analysis here resorted to what is more properly termed an "operational" analysis, wherein other considerations were made in deciding what the relationship with and between blocs should be. (emphasis added)
Forty five years ago, the key role of relationships was already being developed in how I understood effective technical communication. Twenty years later, when I first encountered Database Normalization, I guess it's fair to say that I was already primed to "get it".

And for that I will always be grateful to the late Will Pitkin and the other members of my thesis committee, John E. Lackstrom and Karl J. Krahnke.




=====================
*I used the spelling "blocs" because much of the thinking behind this approach came from a PhD Dissertation written by one of my professors, Dr. Willis J. Pitkin, who used that term in his work.