Wednesday, March 30, 2016

One Size Fits All, Or All For One. NOT!


One of the most valuable aspects of Access Database development is that it is almost entirely custom.
That means nearly every new Access application is different in some non-trivial way from every one that preceded it--ever.

Need a database to track family addresses and phone numbers? Great. You'll find templates all over for that. But no two will be exactly alike and that's a good thing, IMO.

However, for a lot of new Access developers, that turns out to be a stumbling block to getting off the ground. If all you want to do is copy out a template and go to work, it's almost never possible to do so. Naming conventions, at the very least, are those chosen by the template designer. They may or may not be the same as those used in your organization.

And that’s just the trivial part.

You want to track one or more phone numbers for each contact. No problem, that’s a job for a related table of Phone Numbers. But what if the template maker decided every contact gets exactly three Phone Numbers (Home, Work and Mobile), no more and no less? That’s a limitation you’ll have to accept as a compromise. Or, you can modify the template.
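To make the "related table" idea concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module rather than Access, purely so it can be run anywhere, and the table and field names (Contact, PhoneNumber, PhoneType and so on) are my own assumptions, not taken from any particular template.

```python
import sqlite3

# In-memory database; all table and field names here are hypothetical.
contacts_db = sqlite3.connect(":memory:")
contacts_db.executescript("""
CREATE TABLE Contact (
    ContactID INTEGER PRIMARY KEY,
    FullName  TEXT NOT NULL
);
-- One row per phone number, so a contact can have none, one, or many.
CREATE TABLE PhoneNumber (
    PhoneNumberID INTEGER PRIMARY KEY,
    ContactID     INTEGER NOT NULL REFERENCES Contact(ContactID),
    PhoneType     TEXT NOT NULL,
    Number        TEXT NOT NULL
);
""")

contacts_db.execute(
    "INSERT INTO Contact (ContactID, FullName) VALUES (1, 'Pat Example')"
)
contacts_db.executemany(
    "INSERT INTO PhoneNumber (ContactID, PhoneType, Number) VALUES (?, ?, ?)",
    [
        (1, "Home", "555-0100"),
        (1, "Work", "555-0101"),
        (1, "Mobile", "555-0102"),
        (1, "Fax", "555-0103"),  # a fourth number needs no design change
    ],
)


def phone_count(contact_id: int) -> int:
    """Return how many phone numbers are stored for one contact."""
    row = contacts_db.execute(
        "SELECT COUNT(*) FROM PhoneNumber WHERE ContactID = ?",
        (contact_id,),
    ).fetchone()
    return row[0]


print(phone_count(1))
```

A template that hard-codes three phone fields on the contact record would have no place for that fourth number. With the related table, it is just one more row.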

At some point, such trade-offs and compromises become more of a hindrance to getting on with the task than starting from scratch with your own design would be. And the more complex the business process, the more likely it will be that any template you find won’t stretch to fit it.

All of that can be summed up in the saying which is the title of this post. One Size Does Not Fit All.

You’ve been warned.

Wednesday, March 16, 2016

It's all about the data.


That's so obvious it seems silly to even comment on it in a page that's all about database design, doesn't it? But, every day, while addressing Access questions on my favorite Access forum, Utter Access, I run across questions that can only be answered by looking at the specific data involved, not at the code used to manipulate it.

A recent example might help explain what I’m talking about. I’ve rephrased the question so as to avoid making it too easy to identify the source.

“My query raises a division by zero error. The query includes two calculated fields. The data is from three subqueries. The SQL for the final is shown below. How do I avoid the division by zero issue?”

Not picking on anyone, but the basic mathematician in me says “this ain’t a database question, it’s a math question. You avoid division by zero by not including zeros in the divisor.” But that simply hadn’t occurred to the questioner, I guess. He or she was looking for “an Access solution” to the math problem.

There are, of course, two answers to this.

First, if records in the underlying tables have either Nulls or zeroes in one of the fields going into the calculation, then those values have to be resolved before you even START writing queries against that data. Exclude those records from the selection before you try to do any math on the remaining records.

There is also, in that sense, “an Access solution” to the question, "How do you handle Nulls and Zeroes in SQL so that they don’t blow up calculations?" The answers, of course, involve functions to convert Nulls and zeroes, as needed, when they appear. But that’s not the point of today’s comments.

Second, there is an even earlier, more fundamental, question to be addressed. If you are doing math (and a division by zero error definitely signals that math is being attempted), then you have to decide whether Null is valid for the data at hand, and whether Zero is valid for the data at hand. If so, then why would you want to do math on either of those values, knowing that the calculation cannot be valid under any circumstances?

For example, if you want to calculate the average number of days between the date an order is placed and the date it is shipped, you have to decide, right at the beginning, if you want to try to include orders without a ship date. Those orders are still being processed and not yet shipped. I submit the answer to that one is obvious. You must exclude them because it’s only meaningful to ask about average processing days for orders that were actually processed and shipped. So, by the time you write that SQL with a division in it, there are no nulls to fret over.
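Here is a rough sketch of that decision written down as a query. It uses Python's built-in sqlite3 module rather than Access so that it is easy to run, and the Orders table, its field names, and the sample dates are all made up for illustration.

```python
import sqlite3

# Hypothetical Orders table; ShipDate stays NULL until the order ships.
orders_db = sqlite3.connect(":memory:")
orders_db.execute(
    "CREATE TABLE Orders ("
    " OrderID INTEGER PRIMARY KEY,"
    " OrderDate TEXT NOT NULL,"
    " ShipDate TEXT)"
)
orders_db.executemany(
    "INSERT INTO Orders (OrderID, OrderDate, ShipDate) VALUES (?, ?, ?)",
    [
        (1, "2016-03-01", "2016-03-04"),  # shipped after 3 days
        (2, "2016-03-02", "2016-03-07"),  # shipped after 5 days
        (3, "2016-03-10", None),          # still being processed
    ],
)


def average_processing_days(connection):
    """Average days from order to shipment, over shipped orders only.

    Returns None when no order has shipped yet, because there is
    nothing meaningful to average in that case.
    """
    row = connection.execute(
        """
        SELECT AVG(julianday(ShipDate) - julianday(OrderDate))
        FROM Orders
        WHERE ShipDate IS NOT NULL  -- the business rule, stated up front
        """
    ).fetchone()
    return row[0]


print(average_processing_days(orders_db))
```

The unshipped order never reaches the arithmetic. The WHERE clause is where the decision about the data gets recorded, before any math happens.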

Another example would have to do with calculating error rates in a manufacturing process. Let’s say you want to calculate a percentage of errors detected in Quality Control versus errors reported by customers after products are delivered. Unfortunately, if the calculation is QA Errors divided by Customer Reported Errors, it’s entirely possible (and one would have to hope, likely) to have one or more errors found in QA and Zero reported by customers for a particular product. See what’s going to happen there? Yup, a divide by Zero error.

The answer to that one is equally obvious, I think. You don’t do the arithmetic that way in the first place. You come up with a more appropriate, mathematically valid, way to calculate this metric. And how you do that is as much a business rule as it is a math problem. Maybe the next approach might be, well, okay, let’s SUM the QA and Customer Reported errors first and then divide the QA errors by that total to get a ratio of QA errors. Uh uh! If there are no errors of either sort, then you’re still dividing by Zero.

At this point, I will step in and acknowledge that there is an Access way to handle it, but you can only get there by understanding the math—and the business rule—behind it.
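As one possible sketch of what "understanding the business rule first" looks like in code, here is a small Python function (not Access, just for illustration). The function name and the choice to return "no value" when a product has no errors at all are my assumptions. Your business might decide that case should count as a perfect score, or be left out of the report entirely.

```python
from typing import Optional


def qa_error_share(qa_errors: int, customer_errors: int) -> Optional[float]:
    """Share of all known errors that were caught in QA.

    Assumed business rule: if no errors of either kind were recorded,
    the ratio is undefined, so return None rather than a made-up number.
    """
    total_errors = qa_errors + customer_errors
    if total_errors == 0:
        return None  # nothing to divide by; the metric does not apply
    return qa_errors / total_errors


print(qa_error_share(3, 0))  # every known error was caught in QA
print(qa_error_share(1, 3))  # most errors got through to customers
print(qa_error_share(0, 0))  # no errors at all, so no ratio
```

The division only runs once the zero case has been decided on its own terms. The code for that is one line. Deciding what that line should return is the real work.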

It's all about understanding your data first. After that, it's all just code.