Posted On: Wednesday, February 28th, 2007 (Databases)
Posted by: Paul Lefebvre

In this article we will discuss some basic database design skills such as how to identify entities, attributes, and relationships. We will also learn what it means to normalize a database. Let’s get started!

Database Analysis

This is the first step when designing a relational database. In the analysis phase, we are looking to identify entities and attributes. When the database is eventually built, the entities generally become tables and their attributes become columns within the tables.

In our last article we quickly created tables to store the schedule for our local Softball League. In this column we will step back and consider the initial design of these tables. To refresh you memory, here is the original problem:

We want to store the schedule for our local Softball League that consists of 8 teams, each with a coach. Each team plays 10 games. The teams are the Ducks, Penguins, Bears, Tigers, Lions, Wolves, Dolphins, and Marlins.

Entities are typically considered things, concepts or objects that contain some descriptive information. Here are some definitions to keep in mind as we work through the design:

  • Entities are typically considered things, concepts or objects that contain some descriptive information
  • Attributes help to describe entities
  • Relationships describe associations between entities

Now grab a pencil and some paper. We’re going to draw out our design for the Softball problem. As I re-read the problem, I see that team is referenced multiple times with specific details that, based on our definition above, suggest to me it is an entity. To indicate this, draw a small box and name it team. Notice that the team has a coach. Since that is describing one of our entities, we should save it as an attribute. Write it inside the box you drew for team. It is also helpful to identify potential data types of attributes, so mark that coach is a string. There are 8 teams listed, so it looks to me like the team name is another attribute; add that to the team entity. It is helpful to know when an attribute is unique because this can help identify a primary key. I think it makes sense to assume that team name is unique. Identify unique attributes by prefixing them with a “U”.

The Softball problem also indicates that we need to store a schedule and that the teams play games. Knowing a little bit about softball, I know that a schedule consists of games, so let’s create another entity and call it game. At this time, I don’t see any specific attributes for games, but I think that there could be some. We’ll get back to that in a moment. One thing that we all know about softball games is that there must be two teams in order to play a game: a home team and an away team. This is not specifically stated in the problem, but it is implied. Now, don’t start writing home team and away team as attributes because they are not. They are actually describing an association to another entity (in this case team). From our definitions above, we know that associations such as these are considered relationships. To indicate this in our drawing, draw two lines between game and team. Name one line away team and the other line home team.

We also want to indicate something about the nature of the relationships, specifically their multiplicity. Multiplicity describes how many of each entity is related to a single instance of another entity. In the case of these two relationships we know that one team may play 10 games. This is called a 1 to many relationship. This is what our drawing looks like up to this point:

Databases 102: Figure 1

I’m still thinking about other attributes for games. To me this is indicating an incomplete problem description. We need to follow-up with a qualifying question:

“Is there any information about games that you would like to capture?”

An answer may be:

“Yes. We play on several fields, so I need to know which one is being use for each game and whether it has lights. I also need to store the date and time for each game.”

Now I think I see some attributes for game: field, lights, date, and time. Add these to the game entity. Lastly, for the game entity we need to identify uniqueness. For this example it looks to me like the combination of field, date and time is unique.

This is what our drawing looks like now:
Databases 102: Figure 2

Normalization

Normalization describes techniques that can be applied to the entities and attributes in order to eliminate redundant data. There are 3 common steps to normalization.

First Normal Form (1NF) - Eliminate Repeating Groups

Our example does not have any repeating groups. If we had instead created our example as a single entity with attributes for team name, team coach, game 1, game 2, game 3, game 4, etc. then we would be violating 1NF because the game information would be repeated. This rule is commonly broken by people used to working with spreadsheets. Remember, Excel is not a database:

Databases 102: Figure 3

Second Normal Form (2NF) - Eliminate Redundant Data

Redundant (or duplicate) data is best avoided. It can make adding, deleting and changing data in the database much more complicated than necessary. Sometimes additional entities are used to achieve 2NF. In our example, we run the risk of violating 2NF because the team name is the unique identifier for the team entity. We don’t have duplicate data now, but we could when we map these entities to tables because we would have the team name in both the team and game tables. We’ll look at this more closely when we do the entity to table mapping.

Third Normal Form (3NF) - Eliminate columns not dependent on Unique ID

The game entity has a problem: field and lights really are not dependent on the game itself. If we add lights to a field we don’t want to update all the games. If we were to start tracking the field address we don’t want to store that along with the game information. To solve this problem we need another entity (location) and a relationship. This is our final Entity/Relationship diagram:

Databases 102: Figure 4

Applying normalization rules frequently results in additional entities. For design purposes you should strive to reach third normal form. However, as you implement your design you may find situations where you knowingly need to violate some of these rules. In my experience the most frequent reasons for violating normalization rules are to achieve extra retrieval performance or to simplify reporting.

1 Star2 Stars3 Stars4 Stars5 Stars (No Ratings Yet)
Loading ... Loading ...

This entry was posted on Wednesday, February 28th, 2007 at 10:25 am and is filed under Databases. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

Leave a Reply