Went Blog: Tips on Optimizing SQL Server Database Design

Bad logical database design results in bad physical database design, and generally results in poor database performance. So, if it is your responsibility to design a database from scratch, be sure you take the necessary time and effort to get the logical database design right. Once the logical design is right, then you also need to take the time to get the physical design right.
Both the logical and physical design must be right before you can expect to get good performance out of your database. If the logical design is not right before you begin developing your application, it is usually too late to fix it once the application has been implemented. No amount of fast, expensive hardware can fix the poor performance caused by poor logical database design. [6.5, 7.0, 2000] Updated 6-27-2006
*****
One strong indicator of poor database design is the use of SELECT DISTINCT or UNION (not UNION ALL) in your queries. A properly designed database almost always negates the need for using either of these statements.
Besides this, SELECT DISTINCT and UNION use up a lot of resources that could be better spent elsewhere speeding up your SQL Server. [7.0, 2000] Added 12-6-2005
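As a rough illustration of the difference between UNION and UNION ALL, the sketch below uses Python's built-in sqlite3 module as a stand-in for SQL Server. That substitution is an assumption made only so the example is self-contained, and the table names are hypothetical. UNION has to de-duplicate the combined result, while UNION ALL simply concatenates the two result sets.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders_2005 (customer_id INTEGER);
    CREATE TABLE orders_2006 (customer_id INTEGER);
    INSERT INTO orders_2005 VALUES (1), (2), (3);
    INSERT INTO orders_2006 VALUES (3), (4);
""")

# UNION removes duplicates, which costs extra work on the combined rows.
union_rows = conn.execute(
    "SELECT customer_id FROM orders_2005 "
    "UNION SELECT customer_id FROM orders_2006"
).fetchall()

# UNION ALL returns every row from both inputs without de-duplicating.
union_all_rows = conn.execute(
    "SELECT customer_id FROM orders_2005 "
    "UNION ALL SELECT customer_id FROM orders_2006"
).fetchall()

print(len(union_rows), len(union_all_rows))
```

If the two inputs cannot overlap by design, UNION ALL returns the same rows without the de-duplication step.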
*****
Following standard database normalization recommendations when designing OLTP databases can greatly improve a database's performance. Here's why:
Helps to reduce the total amount of redundant data in the database. The less data there is, the less work SQL Server has to perform, speeding its performance.
Helps to reduce the use of NULLs in the database. The use of NULLs in a database can greatly reduce database performance, especially in WHERE clauses.
Helps to reduce the number of columns in tables, which means that more rows can fit on a single data page, which helps to boost SQL Server read performance.
Helps to reduce the amount of Transact-SQL code that needs to be written to deal with non-normalized data. The less code there is, the less that has to run, speeding your application's performance.
Helps to maximize the use of clustered indexes, the most powerful and useful type of index available to SQL Server. The more data is separated into multiple tables because of normalization, the more clustered indexes become available to help speed up data access.
Helps to reduce the total number of indexes in your database. The fewer columns tables have, the less need there is for multiple indexes to retrieve the data. And the fewer indexes there are, the smaller the negative performance effect of INSERTs, UPDATEs, and DELETEs.
[6.5, 7.0, 2000] Updated 12-6-2005
*****
If normalizing your OLTP database forces you to create queries with many joins (four or more), you may want to consider denormalizing some of the tables in order to reduce the number of required joins. Denormalization is the process of selectively taking normalized tables and re-combining the data in them in order to reduce the number of joins needed to produce the necessary query results. Sometimes the addition of a single column of redundant data to a table from another table can reduce a 4-way join into a 2-way join, significantly boosting performance by reducing the time it takes to perform the join.
While denormalization can boost join performance, it can also have negative effects. For example, by adding redundant data to tables, you risk the following problems:
More data means SQL Server has to read more data pages than otherwise needed, hurting performance.
Redundant data can lead to data anomalies and bad data.
In many cases, extra code will have to be written to keep redundant data in separate tables in sync, which adds to database overhead.
As you consider whether to denormalize a database to speed joins, be sure you first consider if you have the proper indexes on the tables to be joined. It is possible that your join performance problem is more a matter of missing indexes than of joining too many tables.
Before you decide to denormalize a properly normalized database, be sure you thoroughly consider all of the implications and test performance both before and after you denormalize to see if your efforts have really bought you anything. [6.5, 7.0, 2000] Updated 12-6-2005
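To make the "4-way join into a 2-way join" idea concrete, here is a minimal sketch using Python's sqlite3 module as a stand-in for SQL Server (an assumption for the sake of a runnable example). All table and column names are hypothetical. The orders table carries a redundant copy of region_name, so the second query gets the same answer with two fewer joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for SQL Server here
conn.executescript("""
    CREATE TABLE regions     (region_id INTEGER PRIMARY KEY, region_name TEXT);
    CREATE TABLE customers   (customer_id INTEGER PRIMARY KEY, region_id INTEGER);
    CREATE TABLE orders      (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                              region_name TEXT);  -- redundant, denormalized copy
    CREATE TABLE order_items (order_id INTEGER, quantity INTEGER);

    INSERT INTO regions VALUES (1, 'Europe'), (2, 'Asia');
    INSERT INTO customers VALUES (10, 1), (11, 2);
    INSERT INTO orders VALUES (100, 10, 'Europe'), (101, 11, 'Asia');
    INSERT INTO order_items VALUES (100, 5), (100, 2), (101, 7);
""")

# Fully normalized path: four tables must be joined to reach region_name.
four_way = conn.execute("""
    SELECT r.region_name, SUM(i.quantity)
    FROM order_items AS i
    JOIN orders    AS o ON o.order_id = i.order_id
    JOIN customers AS c ON c.customer_id = o.customer_id
    JOIN regions   AS r ON r.region_id = c.region_id
    GROUP BY r.region_name ORDER BY r.region_name
""").fetchall()

# Denormalized path: the redundant column removes two of the joins.
two_way = conn.execute("""
    SELECT o.region_name, SUM(i.quantity)
    FROM order_items AS i
    JOIN orders AS o ON o.order_id = i.order_id
    GROUP BY o.region_name ORDER BY o.region_name
""").fetchall()

print(four_way == two_way)
```

The price is the one described above: if a customer moves to another region, orders.region_name has to be updated by extra code, or the two queries will start to disagree.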
*****
When you are first designing your application, don't automatically assume you have to de-normalize your data. Your first goal in the design process should be to normalize your data. Next, you can test your design with realistic data and transactions. At this point, if you see that de-normalization will help, then by all means do so. But don't assume you need to de-normalize data until you can prove (through testing) that de-normalizing is the right thing to do. [6.5, 7.0, 2000] Updated 6-27-2006
*****
In SQL Server 6.5, when a null was stored in a fixed-length column, the column was actually treated as a variable-length column. This was done to conserve space. But in SQL Server 7.0 and SQL 2000, this is no longer true. If a null is stored in a fixed-length column, then the entire column (even though it is null) still takes up the same amount of fixed space as established when the column was created. For example, a CHAR(100) column takes up 100 bytes of storage, even if there is a null in the column.
This change has several implications. If you are upgrading a database from SQL Server 6.5 to 7.0 or 2000, then you may find that your database grows in size, sometimes a lot. The extra storage requirements require more data pages, which in turn reduces I/O performance. The same goes if you follow the same design style in your SQL Server 7.0 and SQL 2000 databases as you did in SQL Server 6.5.
To prevent the above "data explosion" problem, you will need to change your table design tactics. You will need to make your columns that hold many nulls variable-length instead of fixed-length, assuming the length of the column varies considerably. Of course, if the column length is not great, or if nulls to be stored are few, then using fixed-length data types will work fine. [6.5, 7.0, 2000] Updated 6-27-2006
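A back-of-the-envelope estimate shows how large the difference can be. The sketch below is plain arithmetic, not a measurement. The row count, the share of nulls, the average value length, and the 2-byte per-value overhead for variable-length data are all illustrative assumptions, and other per-row overhead is ignored.

```python
def fixed_length_bytes(rows: int, column_width: int) -> int:
    """CHAR(n)-style column: every row reserves the full width, null or not."""
    return rows * column_width


def variable_length_bytes(rows: int, null_percent: int, avg_length: int,
                          per_value_overhead: int = 2) -> int:
    """VARCHAR(n)-style column: only non-null values consume space.

    per_value_overhead is an assumed bookkeeping cost per stored value.
    """
    non_null_rows = rows * (100 - null_percent) // 100
    return non_null_rows * (avg_length + per_value_overhead)


rows = 1_000_000
char_bytes = fixed_length_bytes(rows, 100)
varchar_bytes = variable_length_bytes(rows, null_percent=90, avg_length=40)

print(char_bytes, varchar_bytes)
```

With these assumed numbers, the fixed-length column reserves roughly 100 MB while the variable-length column needs about 4 MB, which is why columns that are mostly null are better declared variable-length.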
*****
One option database designers have is to put all of their data for their application in a single large database, or to place their data in multiple smaller databases. Generally, if the data is highly similar and used the same way (e.g. the database is used only for OLTP, not OLAP), then it is better to use one large database rather than several smaller databases.
For example, let's say you have customers all over the world and you are designing an OLTP application. Except for the fact that your customers are spread throughout the world, the application works the same for all of them. Under this scenario, it would be more efficient to use one database instead of creating separate smaller databases for each country or region of the world. The biggest reason you would want one database instead of multiple databases is to take advantage of SQL Server's ability to cache data, such as the data used in lookup tables.
If you use multiple databases instead of a single database, then each database would have to cache the same lookup tables, over and over for each database. The same applies to common stored procedures used in each database. This would present an inefficient use of SQL Server's memory and could potentially reduce performance. [6.5, 7.0, 2000] Updated 6-27-2006
*****
If you are designing a database that potentially could be very large, holding millions or billions of rows, consider the option of horizontally partitioning your large tables. Horizontal partitioning divides what would typically be a single table into multiple tables, creating many smaller tables instead of a single, large table. The advantage of this is that it is generally much faster to query a single small table than a single large table.
For example, if you expect to add 10 million rows a year to a transaction table, after five years it will contain 50 million rows. You may find that most queries on the table (although not all) will be for data from a single year. If this is the case, and you partition the table into a separate table for each year of transactions, then you can significantly reduce the overhead of the most common queries.
Taking advantage of horizontal partitioning requires early planning as the various queries you develop for your application need to be designed to take advantage of your database's design. It is much easier to implement horizontal partitioning in the early design stage than it is to implement after your application has been put into production. [6.5, 7.0, 2000] Updated 6-27-2006
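The yearly split described above can be sketched as follows. This is an illustration only: it uses Python's sqlite3 module in place of SQL Server, and the table, view, and helper names are hypothetical. Each year gets its own table, a CHECK constraint documents which rows belong where, and a UNION ALL view serves the occasional multi-year query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for SQL Server here
years = (2004, 2005, 2006)

# One physical table per year of transactions.
for year in years:
    conn.execute(
        f"CREATE TABLE transactions_{year} ("
        f" txn_id INTEGER PRIMARY KEY,"
        f" txn_year INTEGER NOT NULL CHECK (txn_year = {year}),"
        f" amount INTEGER NOT NULL)"
    )

# A view stitches the yearly tables back together for multi-year queries.
conn.execute(
    "CREATE VIEW transactions_all AS "
    + " UNION ALL ".join(f"SELECT * FROM transactions_{y}" for y in years)
)


def insert_transaction(txn_id: int, year: int, amount: int) -> None:
    """Route a row to the table for its year (the application must know the scheme)."""
    conn.execute(
        f"INSERT INTO transactions_{int(year)} VALUES (?, ?, ?)",
        (txn_id, year, amount),
    )


insert_transaction(1, 2004, 50)
insert_transaction(2, 2005, 75)
insert_transaction(3, 2006, 20)
insert_transaction(4, 2006, 30)

# The common single-year query touches only the small 2006 table.
total_2006 = conn.execute("SELECT SUM(amount) FROM transactions_2006").fetchone()[0]
# The rarer all-years query goes through the view.
total_all = conn.execute("SELECT SUM(amount) FROM transactions_all").fetchone()[0]

print(total_2006, total_all)
```

Note how the insert helper and the queries both have to be written with the partitioning scheme in mind, which is why the scheme is easier to adopt at design time than after the application is in production.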
*****
When designing columns to store formatted data, such as Social Security numbers or phone numbers, you have two choices. You can choose to store the data with no formatting or with formatting. Each choice has its pros and cons. If you store the data with formatting, then your disk space requirements will be slightly higher than if you don't store the data with formatting. If you store the data without formatting, then you will have to strip the formatting each time you INSERT or UPDATE a record and re-apply it each time you display the data, which adds to CPU and memory overhead on the server. In other words, the choice you make affects your server's load.
Which option you should choose in order to maximize performance is not always obvious, but many database designers choose to store formatted data because they feel the savings in CPU time and memory overhead (because formatting isn't performed on the fly) is more important than saving some space on the disk drive. You will have to closely examine your application to determine, in your particular case, which option will save your application the most overhead. [6.5, 7.0, 2000] Updated 7-24-2006
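The trade-off can be seen in a few lines of code. The helpers below are hypothetical and only sketch the work that the unformatted approach requires on every write and every display. The sample Social Security number is made up.

```python
import re


def strip_formatting(value: str) -> str:
    """Keep only the digits, as you would before storing an unformatted value."""
    return re.sub(r"\D", "", value)


def format_ssn(digits: str) -> str:
    """Re-apply the NNN-NN-NNNN display format to a 9-digit string."""
    if len(digits) != 9 or not digits.isdigit():
        raise ValueError("expected exactly 9 digits")
    return f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"


stored = strip_formatting("123-45-6789")   # what the unformatted column would hold
displayed = format_ssn(stored)             # what the user sees again

print(len(stored), len(displayed))
```

Storing the value unformatted saves two characters per row in this case, at the cost of running both helpers somewhere in the application or the database.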
*****
If you expect your database to be accessed simultaneously by many different users, consider what would happen to your database's scalability and performance if you designed your database with just a few tables holding virtually all the data. For example, say you are designing a sales-based application that could potentially have over 10,000 new rows inserted into it every day. And also assume that as many as 1% of these rows would need to be updated or deleted daily. And also assume that as many as 100 people could be accessing the same table at the same time. Can you see the problem this could cause? As a rule of thumb, design your database to minimize potential user contention. This will boost both the scalability and the performance of your application. [6.5, 7.0, 2000] Updated 7-24-2006
*****
SQL Server 7.0 and 2000 data pages are 8K (8192 bytes) in size. Of the 8192 bytes in each data page, at most 8060 bytes can be used to store a row. The rest is used for overhead.
So how does this information affect database design and the performance of your application? To optimize SQL Server performance, you should design your rows in such a way as to maximize the number of rows that can fit into a single data page. The more densely rows are stored in data pages, the less I/O SQL Server has to perform when reading data pages from disk, and the more rows it can squeeze into the buffer. The more rows you can fit into the buffer, the greater the likelihood that the data you need is in the buffer and not on the disk, saving even more valuable I/O resources.
For example, if you have a row that is 4031 bytes long, then only one row can fit into a data page, and the remaining 4029 bytes in the page are left empty. (Keep in mind that a row can be up to 8060 bytes, and half of this is 4030 bytes.) This is a great waste of space which can affect the I/O performance of the server.
If you run into a situation like this, try to redesign the row, if possible, so that the row is 4030 bytes or less. This way you would get two rows in each page and I/O performance would be greatly enhanced. This not only applies to this particular example. It can also apply to cases where you are trying to fit three, four, or more records efficiently into a single data page. [7.0, 2000] Updated 7-24-2006
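The arithmetic behind this tip is easy to check. The sketch below uses the simplified model from the text, in which 8060 bytes of each page are available for rows. Real rows carry some additional per-row overhead, so treat the results as estimates. The function names are hypothetical.

```python
MAX_ROW_BYTES = 8060  # usable bytes per page, as quoted in the text


def rows_per_page(row_bytes: int) -> int:
    """How many whole rows of the given size fit in one data page."""
    if not 0 < row_bytes <= MAX_ROW_BYTES:
        raise ValueError("row size must be between 1 and 8060 bytes")
    return MAX_ROW_BYTES // row_bytes


def wasted_bytes(row_bytes: int) -> int:
    """Bytes left empty in a page once it is full of rows."""
    return MAX_ROW_BYTES - rows_per_page(row_bytes) * row_bytes


def pages_needed(total_rows: int, row_bytes: int) -> int:
    """Pages required to hold total_rows rows (ceiling division)."""
    per_page = rows_per_page(row_bytes)
    return -(-total_rows // per_page)


for size in (4031, 4030):
    print(size, rows_per_page(size), wasted_bytes(size), pages_needed(1_000_000, size))
```

Under this model, trimming a single byte from a 4031-byte row halves the number of pages needed for a million rows, from 1,000,000 to 500,000.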
*****
By default, TEXT, NTEXT, and IMAGE data is stored separately from the rest of the data in a table. The table itself (in the appropriate columns) contains a 16-byte pointer that points to separate data pages that contain the TEXT, NTEXT, or IMAGE data. The reason this is done is to enhance performance.
For example, if these datatypes hold large quantities of data, and they were stored in a table with the rest of the data, this would require SQL Server to work harder when accessing the table. The larger the table, the more I/O SQL Server has to perform in order to accomplish its tasks.
But if the TEXT, NTEXT, and IMAGE data is stored separately from the table, then SQL Server uses less I/O to access it, assuming that the TEXT, NTEXT, or IMAGE data is not being accessed in the current operation. On the other hand, if the TEXT, NTEXT, and IMAGE data does need to be accessed, SQL Server has to go to extra work to retrieve it because it has to locate the pointer in the table, and then find the necessary data from the data pages. Assuming the TEXT, NTEXT, and IMAGE data does not have to be accessed frequently, this is a good performance tradeoff.
But in some cases, storing these datatypes away from the table (in their own data pages) is not the most efficient method. This is especially true if the TEXT, NTEXT, or IMAGE data is not large and if the data is frequently accessed. In this case, it is often better to store this data in the table itself, instead of separate data pages, because it is faster for SQL Server to retrieve data directly from the table than to follow a pointer to separate data pages, as described above.
Starting with SQL Server 2000, you now have the option to store smaller TEXT, NTEXT, and IMAGE data directly in a table instead of in separate data pages. If you want to turn this option on, you will have to set the "text in row" table option to permit this. For example:
EXEC sp_tableoption 'tablename', 'text in row', 'ON'
or
EXEC sp_tableoption 'tablename', 'text in row', 'size'
Where 'size' is a value ranging from 24 to 7,000 bytes. If you specify 'ON' instead of a size, the limit defaults to 256 bytes.
Using this option in the right circumstances can greatly help your application's performance. On the other hand, if used incorrectly, it can hurt performance. [2000] Updated 7-24-2006
*****
If you have been using the "text in row" option in SQL Server 2000, but have now decided to turn it off, SQL Server will automatically move any TEXT, NTEXT, and IMAGE data from regular data pages to a new area in the database where other similar data is stored, leaving only 16-byte pointers behind in the original data page. Depending on the amount of data that was stored using this "text in row" option in data pages, it may take a while for SQL Server to move all of the data. You may want to schedule such a task during a time when SQL Server is not too busy. [2000] Updated 7-24-2006
*****
Ideally, all tables should have a primary key in order to enforce entity integrity. This of course requires the use of a unique index. One method that is commonly used by database designers is to add an identity column to each of their tables in order to ensure uniqueness. In many cases, this is a good idea. But, if the rows in a table are naturally unique, then the use of an identity column is not needed. In fact, doing so can reduce the performance of your database because it may require the use of unnecessary joins, and because it uses up data pages unnecessarily, which requires SQL Server to perform more I/O to read your table.
So don't automatically add identity columns to your tables. Only add them if there is no column (or columns if you want to create a composite primary key) in your table that can guarantee uniqueness. [6.5, 7.0, 2000] Updated 10-16-2006
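As a small illustration of naturally unique rows, the sketch below declares a single-column natural key and a composite natural key, with no identity column in either table. It uses Python's sqlite3 module as a stand-in for SQL Server (an assumption), and the tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for SQL Server here
conn.executescript("""
    CREATE TABLE countries (
        iso_code TEXT PRIMARY KEY,            -- naturally unique, no identity needed
        name     TEXT NOT NULL
    );
    CREATE TABLE order_items (
        order_id    INTEGER NOT NULL,
        line_number INTEGER NOT NULL,
        quantity    INTEGER NOT NULL,
        PRIMARY KEY (order_id, line_number)   -- composite natural key
    );
    INSERT INTO countries VALUES ('DE', 'Germany');
    INSERT INTO order_items VALUES (1, 1, 5), (1, 2, 3);
""")


def try_insert(sql: str, params: tuple) -> bool:
    """Return True if the row was accepted, False if the key rejected it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_country = try_insert("INSERT INTO countries VALUES (?, ?)", ("DE", "Deutschland"))
duplicate_line = try_insert("INSERT INTO order_items VALUES (?, ?, ?)", (1, 2, 9))
new_line = try_insert("INSERT INTO order_items VALUES (?, ?, ?)", (1, 3, 9))

print(duplicate_country, duplicate_line, new_line)
```

Both keys enforce entity integrity on their own, so an extra surrogate column would only add width to every row.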
*****
If your plan is to use replication as part of your SQL Server-based application, and your goal is high scalability and performance, then this factor should be considered when the database is first designed. The fact that you will be using replication, and the type of replication you plan to use, can directly affect how your databases should be designed. [6.5, 7.0, 2000] Updated 10-16-2006
*****
Don't reinvent the wheel when it comes to database design; in other words, take advantage of SQL Server's built-in features to enforce data integrity, instead of writing your own. These built-in integrity features use much less overhead and perform faster than the ones you can create.
Examples of the built-in SQL Server integrity features you should take advantage of include the CHECK, DEFAULT, NOT NULL, PRIMARY KEY, UNIQUE, and FOREIGN KEY constraints. They also include IDENTITY and TIMESTAMP columns. You can also take advantage of the ability to create your own user-defined data types.
Not only are these built-in data integrity features faster than ones you create, they also enforce these rules completely and consistently. [6.5, 7.0, 2000] Updated 10-16-2006
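The sketch below shows several of these declarative constraints doing their job with no hand-written validation code. It is an illustration under assumptions: Python's sqlite3 module stands in for SQL Server, the tables are hypothetical, and the PRAGMA line is a SQLite-specific switch that SQL Server does not need.

```python
import sqlite3

conn = sqlite3.connect(":memory:")            # SQLite stands in for SQL Server here
conn.execute("PRAGMA foreign_keys = ON")      # SQLite-only: enable FOREIGN KEY enforcement
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        email       TEXT NOT NULL UNIQUE
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers (customer_id),
        quantity    INTEGER NOT NULL CHECK (quantity BETWEEN 1000 AND 10000),
        status      TEXT NOT NULL DEFAULT 'new'
    );
    INSERT INTO customers VALUES (1, 'a@example.com');
""")


def rejected(sql: str) -> bool:
    """Return True if a built-in constraint refused the statement."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


insert_sql = "INSERT INTO orders (order_id, customer_id, quantity) VALUES ({}, {}, {})"

valid_rejected = rejected(insert_sql.format(1, 1, 5000))    # satisfies every rule
default_status = conn.execute(
    "SELECT status FROM orders WHERE order_id = 1"
).fetchone()[0]                                             # filled in by DEFAULT
check_violation = rejected(insert_sql.format(2, 1, 50))     # quantity below 1000
fk_violation = rejected(insert_sql.format(3, 99, 5000))     # no such customer
unique_violation = rejected("INSERT INTO customers VALUES (2, 'a@example.com')")

print(valid_rejected, default_status, check_violation, fk_violation, unique_violation)
```

Every rule is stated once in the table definition and is applied to every statement, whichever application issues it.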
*****
The Query Optimizer can use check constraints to help speed up query processing in some cases. For example, if you have a check constraint on a column of the integer data type that specifies that the lowest allowable number is 1,000, and that the highest allowable number is 10,000, the Query Optimizer can use this information to help it make better decisions when optimizing the query and selecting the correct index(es) to use.
Consider adding check constraints to all columns where they are applicable. This not only helps to ensure "proper" data is entered into your tables, it can also boost the overall performance of some queries. [7.0, 2000] Updated 11-6-2006
*****
Generally, most SQL Server-related development is done on a "development" SQL Server box, then moved to a "testing" SQL Server box, and then is eventually moved to a "production" SQL Server box. One step you can take to help ensure that your new application will perform as expected once it goes into production is to perform regression testing and trend-analysis when moving any code (in the application or the database) from one box to another. Doing so helps to prevent unexpected performance surprises.
For example, you run a query on the development box and notice that it performs at sub-second speed. Because it is fast, you don't think that it needs performance tuning. But when you move it from the development box to the test box, you notice that the same query now takes 45 seconds. After a little research, you find out that the tables on the two boxes hold different numbers of rows. In one key table on the development server, you discover that it only has 5,000 rows, but the same key table on the test server has 1,600,100 rows. In addition, when you take a look at the query plan for the query in question, you discover that the query is performing a table scan on both servers. So right away, you realize that the query, which appeared to be fast on the development server, is only fast because it only has to scan 5,000 rows. Obviously, the query needs tuning (or an appropriate index added).
Problems such as the one described above can sneak up on you and hurt your application's performance. That's why it is very important to do performance regression testing and trend-analysis whenever you move code from one server to another during the development process. [6.5, 7.0, 2000] Updated 11-6-2006
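A check like the one in the example can be partly automated by comparing query plans rather than timings alone. The sketch below is an assumption-laden illustration: it uses Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN instead of SQL Server's plan output, the table and helper names are hypothetical, and the "test box" holds 50,000 rows rather than 1,600,100 so that it runs quickly.

```python
import sqlite3


def plan_for(conn, sql, params=()):
    """Return the plan steps SQLite reports for a query."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


def build_box(row_count):
    """Create an in-memory 'server' whose key table holds row_count rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, customer_id INTEGER)")
    conn.executemany(
        "INSERT INTO sales (customer_id) VALUES (?)",
        ((i % 1000,) for i in range(row_count)),
    )
    return conn


query = "SELECT COUNT(*) FROM sales WHERE customer_id = ?"

dev_box = build_box(5_000)     # small development data set
test_box = build_box(50_000)   # larger test data set

# Same plan on both boxes: a full scan that only looks fast on the small table.
dev_plan = plan_for(dev_box, query, (42,))
test_plan = plan_for(test_box, query, (42,))

# After adding an appropriate index, the plan switches from a scan to a search.
test_box.execute("CREATE INDEX idx_sales_customer ON sales (customer_id)")
tuned_plan = plan_for(test_box, query, (42,))

print(dev_plan, test_plan, tuned_plan)
```

Recording the plan and the row counts at each move from box to box makes it obvious when a fast query is fast only because the table is small.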
*****
When you have a choice of using a constraint or a trigger to perform the same task, always choose the constraint. The same goes if you have the option of using either a constraint or a rule, or a constraint or a default. Constraints require less overhead than triggers, rules, and defaults, reducing the load on SQL Server, and boosting overall performance. [6.5, 7.0, 2000] Updated 6-27-2006
*****
Don't implement redundant integrity features in your database. For example, if you are using primary key and foreign key constraints to enforce referential integrity, don't add unnecessary overhead by also adding a trigger that performs the same function. The same goes for using both constraints and defaults or constraints and rules that perform redundant work. While this may sound obvious, it is not uncommon to find these kinds of problems in SQL Server databases. [6.5, 7.0, 2000] Updated 6-27-2006

Pasted from <http://www.sql-server-performance.com/database_design.asp>
