Sunday, November 2, 2008

Normalization for Normal People -- How Much is That Doggie in the WIndow?

Many, perhaps even most, databases involve tracking one or more kinds of transactions. That's because, in a general sense, a transaction is more than the exchange of goods for money. Purchases, of course, are transactions, and we can start our investigation of transactions by looking at the components of a purchase as a way to get to an understanding of how transactions work and how the database must be designed to handle them. Next time I'll try to expand that to non-sales transactions.
I Do Hope that Doggie's For Sale
What do I have to do to buy that dog? And, when we go to record the sale in our database, what attributes of the purchase do we need to record?

When we are talking about purchase transactions, the answer is we need at least six attributes:

Product or Service

Sale Price
Sale Date

The vendor, of course, is the entity who owns a product or who provides a service and wants to exchange it for money.
The purchaser is the entity who has money and wants to exchange it for a product or service.
The product or service is the entity passed from vendor to purchaser.
The Sale Price is the money passed from the purchaser to the vendor.
The Sale Date is the calendar Date (and time, in most cases) when this sale was completed.

These components are the minimal, invariant attributes required to fully describe a sales transaction. I can also envision other attributes that might be required by some businesses--the identity of the sales agent representing the seller, for example. Therefore, any analysis of a particular organization's sales transaction database requirements should start with these six. It should also include an evaluation of other potential attributes to the extent they are relevant to the organization's business.

Note also, as we defined the terms in a previous installment, these attributes are sequentially invariable. They don't change over time, they can't change. The last thing you'd ever want to do is change the details of a sale after it's consummated. Some folks who HAVE tried to do that have ended up doing hard time in the Big House, in fact.

From the BIg House to the Big Brick House on the Corner
That said, though, it is possible for one or more of these attributes to be concurrently variable. That is, you can have two or more values for some of the components of a sale. Let's just take the commonplace example of a home purchased by one couple, John and Mary Smith, from another couple, Jorge and Maria Gonzalez. Obviously, John and Mary are two people, not one. They participate equally in the purchase. Jorge and Maria are also two people. They participate equally in the sale.

Here's where the business rules that govern your organization come into play. Although the values for both the vendor and the purchaser are concurrently variable (there's more than one of each involved at the same time) you may choose to either record them as a single entity involved in the transaction or you may choose to record them as separate entities. I'm not going to go too far into the arcanity of real estate transactions because, among other things, I don't know how it works at that level of detail.

From the point of view of the database designer creating a tracking tool for the real estate agency, though, that point is crucial.

Why? Because if you decide to treat the buyer entity and the seller entity participating in a sale as concurrently variable (i.e., you want to track details on each separately) then you have to design accordingly, and that means a second, detail, table to handle them. Remember, any attribute (such as the identity of the buyer) which can have one or more values either sequentially or concurrently, has to have its own separate table. You can't put variable attribute values into the same table as the thing they describe.

On the other hand, if you decide to consider John and Mary, and Jorge and Maria, to be one entity and to track their identities as a single value, then you can do so by recording those attribute values (Seller Identity and Buyer Identity) as fields in the main transaction table.

Do You Want a Garage Door with That?
Okay, so both vendors and purchasers can either be invariable or variable attributes of a sale. The business rules that govern your organization determine how you handle them. But what about the product or service involved in the transaction? Well, the answer is the same as it usually is out there in the real world--it depends.

Are we talking about home purchases or visits to the local box store for "stuff" to maintain that new home?

Chances are pretty good that most residences are sold as a unit. I'm not aware of any situations where buying the garage on the property was considered an optional part of a sale. You buy the house -- the garage is part of the deal. In terms of recording the sale in a database, therefore, it's not likely you'll have to set up a variable attribute -- and create a separate table for it -- for the item sold.

However, if your business is about selling home improvement products, exactly the opposite business rule applies. Almost every sale involves multiple products. You get a hammer and the nails. You get a blue plastic tarp and some rope to tie it down. And by now, it should be completely obvious that a database designed for this business will have a sales detail table in which those variable attributes of the sale can be recorded.

The next two components of the sales transaction to consider include sales price and sales date. Not surprisingly, deciding whether these are invariable or variable attributes is anything but a simple decision, despite what we may want to think.

Let's Make a Deal
Let's consider the price. Obviously, once the transaction is recorded into the books, it can't be changed. But deciding WHAT price to record, and where to record it, can be, but is not always, simple. For example, if we're buying a house, we have a single price, negotiated and agreed to by the parties involved during the negotiations preceding the sale. That's basically an invariable attribute of that sale and gets recorded as part of it. No problems there.

But there are TWO factors that make other sales more complex.

One is existence of a "list price" of a product or service.
The other is the sale which involves multiple products or services in a single transaction.

Do You Remember the Cost in September?
As you may recall, one of the foundations of good table design is that we want to record things "one time in one place". The list price of a product or service definitely falls into that category of things. But it is equally true that list price can be, and almost always is, sequentially variable. That is, prices for products or services in a standard catalog of products or services change over time. I remember when the price of regular gas went over 99 cents a gallon. My father owned a gas station at the time, and the reader board out front with the current price only had room for two digits. He had to climb a ladder and tape the "1" in front of the price with duct tape until a more permanent solution could be implemented.

The point here is that the price for which you sell a product or service is often based on a list price, but that list price changes over time. And that means we CAN NOT rely on the principle of "one time in one place" for recording sales prices. Why? Because the sale took place on one date, at one time and the sales price was based on the list price AS OF that date and time. No matter what happens with the list price in the days, weeks, months or years following the sale, that sales price is always the same as it was on that day. We can't, therefore, look into the product table for the list price except at exact the time of the sale.

That is why I included sales price among the six components of a sales transaction,. Even when it's not negotiated as part of the sales process itself, it is a unique fact about the sale, separate from the list price which may have been the basis for it. And given that understanding, you can see that the table which records your sales transaction can--and almost always should--have a field for "Sales Price" in addition to the "List Price" in the product table.

What'll It Take to Get You Into this Fine Car Today?Now that you understand the nature of the Sales Price and List Price, the concept of the discounted or surcharged Sales Price should be completely obvious. These are merely special cases of the Sales Price/List Price distinction. The only thing that makes them stand out from other sales prices is that discounts or surcharges immediately differ from the list price even as they are entered into the sales table.
Round 'Em Up and Move 'Em OutI hope you're ahead of me on this one because by now, you should have a pretty good handle on the notions of sequential and concurrent varience that impact the way we track values in tables.
What do you do about Sales Price when you're selling (or buying) multiple products or services in a single sales transaction? I hope your answer is, "You move the sales prices out of the transaction table because there's more than one of them--they're concurrently variable, and besides they don't belong to the transaction, they belong to the product."
Yes? You said something like that? Good, we're making excellent progress. The fact of the matter is that each product or service has it's own sales price, and those have to be attached to that product or service in the table which stores them.
Something's Not Adding Up HereSo, if we move the individual Sales Prices on down into the detail table which tracks the products or services involved in the sales transaction, what about the total Sales Price for the entire transaction? Does it go in the main transaction table? Absolutely, positively not. Calculations don't belong in tables, transaction or otherwise. If for no other reason: "one time--one place". If you want to know the total of all the sales prices for a receipt, just add the calculation to the receipt itself.
Storing calculated values is the subject of a whole other blog. For now, all you really need to know is that it's generally not a very useful thing on which to spend your resources.
Thank You, Sir. Can I have Another?One last point (I promise, one) for now. I mentioned the six components of a sales transaction and listed quantity as one of them. At this point in the discussion, it shouldn't take much to see that the same considerations apply to this attribute as to the others we've discussed. This attribute either belongs in the sales transaction table itself--if the transaction is the kind that involves only one thing at a time--or it belongs in the detail table with the Products or Services that are components of the multi-part transaction.