Sunday, March 20, 2011

No, Excel is NOT a Database

So, what's got my dander up today? I’m glad you asked.

Excel is NOT a database. You can't import Excel spreadsheets into Access and call that a database. You can't create "spreadsheets" directly in Access and call that a database. I wish more people understood that. It would make my blood pressure go down, probably down a lot.

Hold off on the Pitchforks and Torches, Please

Excel is a great tool for data analysis and for charting. I love it for those purposes. When it comes time to create a nice presentation with charts or summaries or pivot tables, exporting your data into Excel is one of those "no-brainers"; there’s no better tool for those jobs.

But Excel spreadsheets are simply not set up to be tables in a relational database. More importantly, wishing it were that way simply can't make it so. Sorry.

I've been posting at Utter Access (http://www.utteraccess.com/) for years. I love getting questions from new users. New users’ questions are usually right in my wheel house, to use a cliché. Recently, though, I've seen a large number of posts where a new poster is trying to figure out to make a spreadsheet work in Access. I'm less excited about posts like that because they often lead to trouble.

Many new Access users have been using Excel successfully for a long time. Eventually, though, many of them realize Excel just doesn't quite do the full job and they move on to Access. Unfortunately, though, they often try to drag everything bad about Excel along with them. Trying to help them sort it all out can be frustrating and annoying for the person trying to explain how to do it “the right way”. And all too often, the questioner “just doesn’t get it” and feels like Utter Access members—like me—are picking on them for “not following the rules”. Rules that seem arbitrary and even capricious to the neophyte. That’s no fun for anyone. I wish it didn’t happen as often as it does.


Welcome to My World, Won't You Come On In

In my opinion, we have three primary reasons for striving to create more Normalized table designs:

1. Improve the integrity of our data.
2. Reduce ambiguity in data values.
3. Reduce redundancy in data storage.

Data integrity is better served by building a properly structured relational database. Inserts, updates and deletes must follow the rules of the tables and relations defined for the database. It’s always possible to make bad data entries, of course, but that’s different from allowing users to put essentially any value, of any kind, in any cell, anywhere in a spreadsheet. Talk about lack of integrity! Properly designed tables are, perhaps, a relative advantage, but it’s an important one.

Ambiguity creeps into a database when a value in a field (cell in a spreadsheet) can be interpreted in more than one way. Spreadsheets make it easy to create ambiguous data entries. Properly designed tables make it harder to do. Again, a relative, but important, advantage.

Redundancy is practically guaranteed in most spreadsheets; it’s inherent in the format of a “flat” sheet of rows and columns. Properly designed tables can eliminate redundancy.

Three Coins in a Fountain

So, why do people insist on retaining their spreadsheet view of the world when they move to Access?

I think there are three lines of thought, each seductive but completely illusive.

  • “I can keep everything in one table. That’s so much simpler.”
Well, yes, a single table does LOOK simpler—until you actually try to use it for something. Everything from data entry to reporting requires needlessly convoluted queries and forms to put the “right” data in the right places and then more complexity when you try to get it back out. Sorting, filtering and aggregating data is often quite out of the question with a flat, Excel type table. In other words, the very reasons most people want to move data into Access are foiled by the “ONE BIG TABLE” view of data.

  • “Normalization is too complicated. I just want to get something going.”
Proper table design takes more work, to be sure. You have to analyze your data, identify the ENTITIES of importance to the business purpose behind the database, list all of the relevant ATTRIBUTES of those ENTITIES, and finally, figure out how those ENTITIES RELATE to one another. Effort and understanding are required.

In previous blogs entries, I’ve argued that designing a database involves using the Language of Relational Databases to describe your data. ENTITIES, the things of importance to the database, are the NOUNS. ATTRIBUTES, the characteristics of importance for each ENTITY, are the ADJECTIVES. RELATIONSHIPS between ENTITIES are the VERBS. Together, they describe the flow of information into and out of the database in meaningful ways. You can "talk" about the data in an intelligent way.

  • “My data (or the reason for collecting this data) is unique. It doesn’t fit the 'Normalization' process.”
Wrong, wrong, wrong. Unless the only purpose for putting data into such tables is to keep it safe until you come back for it after you have a snack and a nice nap, this one just doesn’t hold water. It’s the difference between a “big ol’ barrel of scared King Crabs” and dinner at Chandler's Crab House in Seattle. (Your data is "raw" vs. being "cooked". [insert your bad pun groan here])

Let's Give Them Something to Talk About

I have no lock on the concepts of table design, nor the theory and practice of Normalization. I think I understand it all pretty well, of course, or I wouldn’t even try to talk about it. Still, I recognize that there are many others who know it better than me. Therefore, I’d like to solicit your input. If you read this blog post, and feel like you have something to contribute—a comment, a criticism, an addition, a correction, an example or just a kibitz—please feel free to do so. I’d like to think we can make the world a safer place to new database developers if we can dispel some of the misconceptions about spreadsheets in Access.