One of the most difficult tasks for any new database manager is designing tables, schemes, and stored procedures. Database design is usually a one-shot deal – whether the design is good or bad, you're stuck with it. Redesigning a database is not an option for most large businesses. You need to get it right the first time. The problem is that most people don't learn pitfalls until it's too late. This article helps you understand what to do, and what not to do.
What Not to Do
Most programming can be trial and error for developers, but database design is critical for application success. It determines if your application is scalable, if it will perform well, and if your data is secure. So much integrity and success relies on database design that you don't want to disregard its importance.
If you're faced with designing a database, let's first take a look at what you shouldn't do.
First, as we mentioned, don't disregard performance. Most of your application's performance relies on a good database design. As a new designer, you might not realize that your application can be crippled by poor database development. The MySQL Workbench has a performance dashboard, performance reports and query statistics. You'll need these tools to identify any poor performance due to database design and query programming.
Second, you shouldn't just assign any value type to a table column. For instance, if you know that you only need a number from one to ten, you shouldn't just use a BIGINT data type. You shouldn't assign a VARCHAR data type to a column that holds only integer values. Assigning data types to column values define the efficiency and performance of your tables.
One common mistake that new MySQL designers make is the lack of primary and foreign keys. Primary keys are unique values assigned to each record. Think of a primary key as the unique way to find a record. MySQL lets you create auto-incrementing numeric keys, or you can choose your own. For instance, your car's VIN number is a unique key that could be the primary key for a Car table.
Another mistake is forgetting a foreign key. MySQL is a relational database, which means tables are linked together using primary and foreign keys. A foreign key is basically a copy of the primary key in a secondary table. For instance, you have a list of customers. You create an auto-incrementing numeric value as the customer identification and primary key. You create a second table called Order. How do you link the Customer with the associated Order? You place the customer ID in the Order table. The customer ID column in the Order table is the foreign key, and you use the primary-foreign key combination regularly in your queries.
Don't forget to focus on data quality and integrity. You shouldn't have tables designed to store the same information in multiple locations. You don't want to have an application that updates data in one place, but then the developer forgets to update it in the second and third place. Data integrity suffers, and your application starts returning incorrect information. The result can be devastating if your application calculates revenue, marketing analysis or customer information. You should design tables based on data integrity and preserving quality. Databases follow the old programming rule "garbage in, garbage out."
Backups are often forgotten when a company first starts. Backups are critical if your database fails. While you can probably get away with avoiding backups in the beginning, you might need them weeks, months or years in the future. A good backup system will save you from losing all of your data, which usually translates to a loss in customers and sales.
It might seem counterintuitive, but you also want to avoid deleting data. Instead of deleting data, activate and deactivate it. You can also choose to archive it and update a record with changes. Deleting data can lead to valuable data being lost. If you don't have relationships set up, you risk creating orphaned records.
Finally, always consider security as a prime importance in your database design. Don't use root for applications, and restrict access to critical tables. Authentication and authorization are important implementations when protecting your data.
The Database Design Process
Now that you know what not to do, it's time to flesh out your database design. You first need to break down your ideas and application into compartments. We used a Customer and Order tables as examples in the previous section. When you first start designing databases, it can be difficult to understand how to break down an application into components that match database tables.
An easy way to start is to separate your application into screens. Each screen usually displays a part of your application that can be designed into a table. For instance, suppose you have an ecommerce store. You have a products page, a customer profile page, a place where customers can view orders, and perhaps a returns page. All of these pages could be designed into tables.
After you figure out your application components, draw out your tables. You can use database tools that help you design tables, or you can draw tables on paper. Some designers draw the design on a whiteboard. Using our ecommerce example, you'd draw out a square (table) for each page. Your table names would be Customer, Order, Return, and Product.
If you work on larger applications, you might have business requirements that you need to follow. Some developers provide a simple technical document that you then turn into a database design.
Something to remember when you create your tables is that data should only be recorded once. This is important for data integrity. We mentioned in the previous section on "what not to do" that storing data in multiple locations harms your data integrity. For instance, let's say you have a Customer table and store an address, but you also store the address in the Order table. You duplicate the location of the customer's address and risk accidentally updating it in one table but not the other. You also put an unnecessary load on the database, because it needs to update multiple tables for each address update.
Your tables should also make sense for your application. Each component of the application should match up with one or more tables. You'll later learn how to query this information, but your tables should be logically linked to application components. Remember that database tables are the back-end power for your applications.
After you determine the tables that you need, it's time to get more granular. Remember, a table can be visualized as a spreadsheet. Each table would be your spreadsheet, but now you need to set up your columns. Columns are sometimes referred to as "fields" in database design. These fields determine what values are stored in the table. Just like tables, field names should make logical sense when you determine data for each field.
For instance, you probably set up a Customer table. The Customer table holds the customer's first name, last name, address, phone number, and zip code. Each of these values would be a field or column in your table.
As you determine fields for your tables, you also need to consider relationships. MySQL is a relational database, which means that tables are linked using foreign keys. Foreign keys are fields that hold the relational data. For instance, suppose you have a Customer and Order table. You need a way to link these two tables. The Order table would contain the Customer ID for each order record.
In the "what not to do" section, we mentioned primary keys. A primary key is a unique ID for the record. MySQL has an auto-incrementing numerical column option that lets you create a unique key automatically with each record, but many database designers are against this type of design. Instead, they prefer designers to use composite keys if there is not one unique column. A social security number is a unique ID, but not every company needs to store a social security. Composite keys combine at least two columns and use the combination as a unique primary key.
New designers sometimes get confused with which table should contain the foreign key. You might make the mistake of putting the order ID in the customer record, but think about data integrity and the number of customer records you would have versus order records. You only need one customer record, but a customer can have several orders. If you put the order ID in the customer table, you'd need several customer records to account for each order. This means you'd duplicate the data, and database design must focus on storing data once.
This type of relationship is called a one-to-many relationship. You have one customer record, and then a customer can have many orders. This is how you think about table design. Each order can have several products, but only one order can have one return. When there is only one record that can only have a relationship with one other record in another table, this is called a one-to-one relationship.
There are three types of relationships you'll need to know:
Many-to-many relationships are when database design gets tricky. This relationship introduces the concept of normalization. Normalization is the process of structuring your table design and relationships to limit data duplication. It ensures that data is only stored once between tables, but in our example, an Order can be associated with many products, and products can be associated with many orders. This is a many-to-many relationship.
Many-to-many relationships use an intermediate container table. Let's use the Order and Product table. Since this is a many-to-many relationship, you want to normalize the information. You create an intermediate table that contains the primary key from the Order table, and then the primary key from the Product table. Instead of duplicating your data in the Order and Product table, you use this intermediate table to link records and keep data from being duplicated.
Normalization is a difficult concept to master until you start designing tables. When you start to understand the way data is stored and queried, you will better understand normalization. Normalization has several sections of rules. If your database design follows the first set of rules, it's said to be first normal form. If it follows section two, it's said to be second normal form.
Here are normalization rules.
First normal form
Eliminate repeat information
Create separate tables for relational data
Use primary keys to identify each record
Second normal form
Create separate tables for each set of related data. This is where many-to-many relationships are normalized.
Use foreign keys to link related tables
Third normal form
Eliminate fields that don't relate to the primary key
There are other normalization forms, but the first three are the main focus for new database designers.
After you have your data tables normalized, it's time to decide on data types. A data type determines the value that stores in a field. Relational databases put restrictions on the type of data you can store when you define a data type. For instance, you can't store a string "Mark" in a field that's marked as a numeric value. Dates are stored in date data types, and decimals store decimal values. If you attempt to store the wrong data type in a field, the MySQL engine returns an error. This can create bugs in your application, so you must map out the right data types for your fields.
The final note on table design is sticking to a standard naming convention. This makes it easier to query tables. Some database designers use underscores for spaces, and others use capital letters for each word instead of underscores. For instance, you could name the customer identification record "CustomerId" or "Customer_Id" or "customer_id." MySQL is not case sensitive with columns, so using capital letters doesn't matter when you create queries. If you use underscores, multi-case naming schemes, or use some kind of other standard, just make sure it stays uniform throughout the entire design. Standard naming conventions also make it easier for a second designer to understand your design without reading any documentation.
One final note on design: remember that MySQL is a dynamic database solution. You'll need user variables to capture user data and use it to insert or manipulate data changes. These user variables can be lowercase or all upper case. Standards in the industry ask that user variables are lowercase. Each variable has the @ symbol as its prefix.
You'll also need views. Views help improve data performance by querying large sets of data and returning them for your queries to use. You can use views in the same way that you use a table.
Views, user variables, table design, constraints, and several other factors should be designed into your database infrastructure.
Data modeling and table design are sometimes used interchangeably. Data modeling ties in with software development. Data modeling uses images to represent database designs, and it helps give designers a shorthand way to represent the way a table relates to other tables in the database.
Several data modeling tools are available online. Even developer tools such as Visual Studio have data modeling tools. These tools let you design tables as objects. If you're familiar with programming, an object is a class that represents a component in your application. This is where data modeling ties in with application development. For instance, your Customer table object is then mapped to a Customer class object in the application. Developers are then able to link methods and properties with your tables.
Data modeling extends from design to overall database relationship and security. Remember that a database is more than just a table design. It also includes security, views, stored procedures, and functions. You also need to index your columns, which helps the database order records for faster queries.
Stored procedures must be modeled properly to query data within a very small amount of time. Think of Facebook. This enormous social network uses MySQL for queries. You're able to find people on Facebook using their search tools. When you enter a name into Facebook's search, you're given several records in what seems like an instant result set. Some people think that it's from Facebook's massive data centers, which does factor in to quick results. However, Facebook's quick queries are also a result of a good database design and model. They have the right table design, security, stored procedures, views and functions that keep data integrity and speed up performance of their application.
Even after you create a design and model, you still need to fine tune the layout. This is where data modeling tools come in handy. They let you delete and redesign a table, insert another table and re-link relationships without using paper and pen. These tools also let you export data models into classes. These classes are then used by application developers when they integrate tables into the data layer. Data model tools also create scripts that make it easier to import the model into your database without manually creating tables. In other words, they save you and your developer time.
Database design is a skill that's learned after practice. This chapter focused on table design, but once you have your design finished, you will spend most of your time creating querying components. This includes building stored procedures with simple and advanced concepts. Put tables and programming together, and you have a fully functional database that you can use with your application.