Common Mistakes to Avoid in Your Database Design

Technician in the sever room

Erik Isakson / Getty Images

Whether you are working with a database that holds hundreds of records or millions of records, proper database design is always important. Not only will it make retrieving the information much easier, but it will also simplify expanding the database in the future. Unfortunately, it's easy to fall into a few traps that can make things difficult in the future.

There are entire books written on the subject of normalizing a database, but if you simply avoid the common mistakes shown here, you will be on the right track to good database design.

Repeating Fields in a Table

A basic rule of thumb for good database design is to recognize repeating data and to put those repeating columns in their own table. Repeating fields in a table is common for those who have come from the world of spreadsheets, but while spreadsheets tend to be flat by design, databases should be relational. It's like going from 2D to 3D.

Luckily, repetitive fields are usually easy to spot. Just take a look at this table:

OrderID Product1 Product2 Product3
1 Teddy Bears Jelly Beans
2 Jelly Beans

What happens when an order contains four products? We would need to add another field to the table to support more than three products. And if we've built a client application around the table to help us input data, we may need to modify it with the new product field. And how do we find all of the orders with Jellybeans in the order? We would be forced to query every product field in the table with an SQL statement that might look like: SELECT * FROM Products WHERE Product1='Jelly Beans' OR Product2='Jelly Beans' OR Product3='Jelly Beans'.

Instead of having a single table that stuffs all the information together, we should have three tables that each hold a distinct piece of information. In this example, we would want an Orders table with information about the order itself, a Products table with all of our products and a ProductOrders tablet that linked products to the order.

OrderID CustomerID Order Date Total
1 7 1/24/17 19.99
2 9 1/25/17 24.99
ProductID Product Count
1 Teddy Bears 1
2 Jelly Beans 100
ProductOrderID ProductID OrderID
101 1 1
102 2 1

Notice how each table has its own unique ID field. This is the primary key. We link tables by using a primary key value as a foreign key in another table.

Embedding a Table in a Table 

This is another common mistake, but it doesn't always stand out quite as much as repetitive fields. When designing a database, you want to make sure all of the data in a table relates to itself. It's like that child's game about spotting what is different. If you have a banana, a strawberry, a peach, and a television set, the television set probably belongs somewhere else.

Along the same lines, if you have a table of salespeople, all of the information in that table should relate specifically to that salesperson. Any extra information that isn't unique to that salesperson may belong somewhere else in your database. 

SalesID First Last Address PhoneNumber Office OfficeNumber
1 Sam Elliot 118 Main St, Austin, TX (215) 555-5858 Austin Downtown (212) 421-2412
2 Alice Smith 504 2nd Street, New York, NY (211) 122-1821 New York (East) (211) 855-4541
3 Joe Parish 428 Aker St, Austin, TX (215) 545-5545 Austin Downtown (212) 421-2412

While this table might look like it is all related to the individual salesperson, it actually has a table embedded within the table. Notice how the Office and OfficeNumber repeat with "Austin Downtown". What if an office phone number changes? You would need to update a whole set of data for one single piece of information changing, which is never a good thing. These fields should be moved to their own table.

SalesID First Last Address PhoneNumber OfficeID
1 Sam Elliot 118 Main St, Austin, TX (215) 555-5858 1
2 Alice Smith 504 2nd Street, New York, NY (211) 122-1821 2
3 Joe Parish 428 Aker St, Austin, TX (215) 545-5545 1
OfficeID Office OfficeNumber
1 Austin Downtown (212) 421-2412
2 New York (East) (211) 855-4541

This type of design also gives you the ability to add additional information to the Office table without creating a nightmare of clutter in the salesperson table. Imagine how much work it would be to simply keep track of the street address, city, state, and zip code if all of that information was in the salesperson table!

Putting Two or More Pieces of Information Into a Single Field

Embedding the office information into the sales person table wasn't the only problem with that database. The address field contained three pieces of information: the street address, the city, and the state. Each field in the database should only contain one single piece of information. When you have multiple pieces of information in a single field, it can become harder to query the database for information.

For example, what if we wanted to run a query on all salespeople from Austin? We would need to search within the address field, which is not only inefficient but can return bad information. After all, what happens if someone lived on Austin street in Portland, Oregon? 

Here's what the table should look like:

SalesID First Last Address1 Address2 City State Zip Phone
1 Sam Elliot 118 Main St Austin TX 78720 2155555858
2 Alice Smith 504 2nd St New York NY 10022 2111221821
3 Joe Parish 428 Aker St Apt 304 Austin TX 78716 2155455545

There are a couple of things to note here. First, "Address1" and "Address2" would seem to fall under the repetitive fields mistake. However, in this case, they are referring to separate pieces of data that relate directly to the salesperson rather than a repeating group of data that should go in its own table.

Also, as a bonus mistake to avoid, notice how the formatting for the phone number has been stripped out of the table. You should avoid storing the format of fields when at all possible. In the case of phone numbers, there are multiple ways people write a phone number: 215-555-5858 or (215) 555-5858. This would make searching for a salesperson by their phone number or doing a search of salespeople in the same area code more difficult.

Not Using a Correct Primary Key

In most instances, you will want to use an automatically incrementing number or some other generated number or alphanumeric for your primary key. You should avoid using any actual information for the primary key even if it sounds like it would make a good identifier.

For example, we each have our own individual social security number, so using the social security number for an employee database might sound like a good idea. But while rare, it's possible for even a social security number to change, and we never want our primary key to change.

And that is the problem with using actual information as a key value. It can change.

Not Using a Naming Convention

This might not sound like a big deal when you first get started designing your database, but once you get to the point of writing queries against the database to retrieve information, having a naming convention will help as you memorize field names. Just imagine how much more difficult that process would be if names were stored as FirstName, LastName in one table and first_name, last_name in another table.

The two most popular naming conventions are capitalizing the first letter of every word in the field or separating words using an underscore. You may also see some developers capitalizing the first letter of every word except the first word: firstName, lastName.

You will also want to decide on using singular table names or plural table names. Is it an Order table or an Orders table? Is it a Customer table or Customers table? Again, you don't want to be stuck with an Order table and a Customers table.

The naming convention you choose isn't as important as the process of actually choosing and sticking to a naming convention.

Improper Indexing

Indexing is one of the hardest things to get right, especially for those new at database design. All primary keys and foreign keys should be indexed. These are what link tables together, so without an index, you will see very poor performance out of your database.

But what are too often missed are the other fields. These are the "WHERE" fields. If you are often going to narrow your search by using a field in a WHERE clause, you want to think about putting an index on that field. However, you don't want to overly index the table, which can also hurt performance.

How to decide? This is part of the art of database design. There are no hard limits on how many indexes you should put on a table. Primarily, you want to index any field that is frequently used in a WHERE clause.