Sunday, March 23, 2008

Normalization for Normal People, Part I--The "Language" of the Database

In my last post, I promised to explain my approach to the process of normalization. Today, I'll start that explanation, but before we dive into the normalization process itself, it might be helpful to consider a metaphor about the language, or perhaps, the meta-language, of the database. As I present my view of the concepts of normalization, I'll return to this language metaphor from time to time as a way of relating those concepts to you in a more accessible way.

Think of it this way, a database is a tool with which you can talk about the information important to your organization. With this database tool, you can talk about the history of your organization, describe the processes it uses to complete its work, and explain the results of that work. To do all that, you need a language--that's what I mean by the language of the database, and here's how I think it works.

A good data model language is similar to, and based on, natural human language. Human languages are composed of, and built up from, nouns, verbs, adjectives, conjunctions, and prepositions.

Unlike natural human language, though, the language of the database really only requires three components: Nouns, Adjectives and Verbs.

This week I'll talk a little bit about the nouns. In future blogs, I'll touch on adjectives and verbs.

Nouns or "Entities"

Entities, which we'll discuss in more detail in a later blog entry, are the “Nouns” of this database language. Nouns are important because, when you start recording the names of things you want to track in a database, you are listing the nouns in your data model language.

Database folk refer to these "nouns as "entities". You'll see that term a lot in formal discussions. Just remember that, in order to talk about anything of importance to your organization, you need to know its name. If you don't already have a name for all of those things, you'll need to define one before you can begin to create a database to talk about it.

Later, when you create tables in your relational database, you’ll need one table for each entity you want to track. Tables, of course, are named for the entity they contain. And that's why I call them the nouns of the database language.

Ambiguity--the enemy of all databases
In a natural language like English, of course, some nouns are more ambiguous than others. Moreover, humans have the unique ability to select words and construct them into phrases so as to foster ambiguity. We tolerate, and even encourage, ambiguity in natural languages for a number of reasons. It can be downright beneficial to leave the exact meaning of words open to interpretation. Politicians, poets, and comedians, in particular, all take advantage of that fact.

However, ambiguity is one of the great enemies of well-designed databases. Therefore, the first step in creating a well-designed database is to make sure--and I mean very, very sure--that the nouns you choose to describe the things of importance to your organization are as concrete, precise and unambiguous as possible. These nouns, or entities, can refer to one and only one thing and they can not mean different things to different people at different times or in different circumstances.

The practical application of that fact is in two parts. First, when you create a table to house those entities, you must be able to assign every entity you encounter in your workflow uniquely to the ONE table in which it belongs. Conversely, every entity recorded into a table must be clearly identifiable as an example of that kind of thing.

There should never be a case when the same entity gets stored in two or more tables in your database. Save layered meanings for your poetry and double-entendres for your jokes.

Consider a very trivial and obvious example, a database is built to track assignment of company pool cars to company employees. In this database, you have two entities of importance to track: Employees and Cars. We all know what an employees is and what a car is. The nouns you chose to describe them, and the tables into which they are inserted, are easily defined and created. You need a table of Employees and a table of Cars. The names you give to these tables, Employee and Car, clearly and unambiguously identify each. No one using this database would be tempted, I hope, to record “John Smith” in the car table, or “Ford Focus” in the Employee table.

As the workflows you wish to describe become more complex, the language you need to describe them also becomes more complex, of course. Nonetheless, it is crucial to your success that you always start by naming, or defining the names of, everything single entity of importance so that you can assign one and only one name to everything in your database.

With the nouns of your database language in place, you're ready to start choosing adjectives to describe them.

Next Up--the Adjectives of the database language

Next time, I'll tell you how I look at the adjectives of the database language. I'll tell you why I think they are just as important as the nouns, and why the danger of ambiguity is just as great when selecting those adjectives.