Creating the Physical Database Schema

IN THIS CHAPTER
■ Creating the database files
■ Creating tables
■ Creating primary and foreign keys
■ Configuring constraints
■ Creating the user data columns
■ Documenting the database schema
■ Creating indexes
The longer I work with databases, the more convinced I become that the real
magic is in the physical schema design. No aspect of application
development has more potential to derail an application, or to enable it to soar,
than the physical schema, not even indexing. (This idea is crucial to my view
of Smart Database Design as expressed in Chapter 2, ‘‘Data Architecture.’’)
The primary features of the application are designed at the data schema level.
If the data schema supports a feature, then the code will readily bring the feature
to life; but if the feature is not designed into the tables, then the client application
can jump through as many hoops as you can code and it will never work right.
The logical database schema, discussed in Chapter 3, ‘‘Relational Database
Design,’’ is a necessary design step to ensure that the business requirements
are well understood. However, a logical design has never stored or served up
any data.
In contrast, the physical database schema is an actual data store that must meet
the Information Architecture Principle’s call to make information ‘‘readily available
in a usable format for daily operations and analysis by individuals, groups, and
processes.’’ It’s the physical design that meets the database objectives of usability,
scalability, integrity, and extensibility.
This chapter first discusses designing the physical database schema and then
focuses on the actual implementation of the physical design:
■ Creating the database files
■ Creating the tables
■ Creating the primary and foreign keys
■ Creating the data columns
■ Adding data-integrity constraints
Similarly, while this chapter covers the syntax and mechanics of creating indexes, Chapter 64, ‘‘Indexing
Strategies,’’ explores how to fine-tune indexes for performance.
Part III, ‘‘Beyond Relational,’’ has three chapters related to database design. Chapters 17–19 discuss
modeling and working with data that’s not traditionally thought of as relational: hierarchical data, spatial
data, XML data, full-text searches, and storing BLOBs using Filestream.
Implementing a database often involves partitioning the data, which is covered in Chapter 68,
‘‘Partitioning.’’
And whereas this chapter focuses on building relational databases, Chapter 70, ‘‘BI Design,’’ covers
creating data warehouses.
What’s New with the Physical Schema?
SQL Server 2008 supports several new data types, and an entirely new way of storing data in the data
pages: sparse columns. Read on!
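As a quick taste, a column is declared sparse right in the table definition. Here’s a minimal sketch; the table and column names are hypothetical, not from a sample database:

CREATE TABLE ProductAttribute (
 ProductAttributeID INT IDENTITY(1,1) PRIMARY KEY,
 ProductID INT NOT NULL,
 -- Sparse columns store NULLs for free, at a small storage
 -- cost for the non-NULL values, so rarely populated
 -- attributes are good candidates:
 Voltage VARCHAR(20) SPARSE NULL,
 ScreenSize VARCHAR(20) SPARSE NULL
 );

Sparse columns must be nullable, which fits their purpose: they pay off when the overwhelming majority of rows hold NULL for that column.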
Designing the Physical Database Schema
If there’s one area where I believe the SQL Server community is lacking in skills and emphasis, it’s
translating a logical design into a decent physical schema.
When creating the physical design, the design team should begin with a clean logical design and/or
well-understood and documented business rules, and then brainstorm until a simple, flexible design
emerges: one that performs excellently, is extensible, has appropriate data integrity, and is
usable by all those who will consume the data. I firmly believe it’s not a question of compromising one
database attribute for another; all these goals can be met by a well-designed, elegant physical schema.
Translating the logical database schema into a physical database schema may involve the following
changes:
■ Converting complex logical designs into simpler, more agile table structures
■ Converting logical many-to-many relationships to two physical one-to-many relationships with
an associative, or junction, table (see the sketch after this list)
■ Converting logical composite primary keys to surrogate (computer-generated) single-column primary keys
■ Converting the business rules into constraints or triggers, or, better yet, into data-driven designs
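To make the second and third conversions concrete, here is a minimal sketch using hypothetical Student and Class tables. The junction table StudentClass implements the many-to-many relationship as two one-to-many relationships, with a surrogate primary key standing in for the logical composite key:

CREATE TABLE Student (
 StudentID INT IDENTITY(1,1) PRIMARY KEY,
 StudentName VARCHAR(50) NOT NULL
 );
CREATE TABLE Class (
 ClassID INT IDENTITY(1,1) PRIMARY KEY,
 ClassName VARCHAR(50) NOT NULL
 );
-- The junction table: each row pairs one student with one class
CREATE TABLE StudentClass (
 StudentClassID INT IDENTITY(1,1) PRIMARY KEY, -- surrogate key
 StudentID INT NOT NULL REFERENCES Student(StudentID),
 ClassID INT NOT NULL REFERENCES Class(ClassID),
 -- The logical composite key survives as a unique constraint:
 CONSTRAINT uqStudentClass UNIQUE (StudentID, ClassID)
 );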
Logical to physical options
Every project team develops the physical database schema drawing from these two disciplines (logical
data modeling and physical schema design) in one of the following possible combinations:
■ A logical database schema is designed and then implemented without the benefit of
physical schema development. This plan is a sure way to develop a slow and unwieldy database
schema. The application code will be frustrating to write, and the code will not be able to
overcome the performance limitations of the design.
■ A logical database schema is developed to ensure that the business requirements are
understood. Based on the logical design, the database development team then develops a physical
database schema. This method can result in a fast, usable schema.
Developing the schema in two stages is a good plan if the development team is large enough
that one team designs and collects the business requirements while another team
develops the physical database schema. Make sure that having a completed logical database
schema does not squelch the team’s creativity as the physical database schema is designed.
■ The third combination of logical and physical design methodologies combines the two into
a single development step, as the database development team develops a physical database
schema directly from the business requirements. This method can work well provided that
the design team fully understands logical database modeling, physical database modeling, and
advanced query design.
The key task in designing a physical database schema is brainstorming multiple possible designs, each
of which meets the user requirements and ensures data integrity. Each design is evaluated based on its
simplicity, the performance of possible query paths, its flexibility, and its maintainability.
Refining the data patterns
The key to simplicity is refining the entity definition with a lot of team brainstorming so that each table
does more work, rearranging the data patterns until an elegant and simple pattern emerges. This is
where a broad repertoire of database experience aids the design process.
Often the solution is to view the data from multiple angles, finding the commonality between them.
Users are too close to the data, and they seldom correctly identify the true entities. What a user might
see as multiple entities, a database design team might model as a single entity with dynamic roles
(see the sketch below). Combining this quest for simplicity with some data-driven design methods can
yield normalized databases with higher data integrity, more flexibility and agility, and dramatically fewer tables.
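For example, where users describe separate customer, vendor, and employee lists, the design team might model one Contact entity whose roles are data. A hypothetical sketch:

CREATE TABLE Contact (
 ContactID INT IDENTITY(1,1) PRIMARY KEY,
 ContactName VARCHAR(100) NOT NULL
 );
-- Roles are rows of data, not separate tables, so adding
-- a new role requires no schema change:
CREATE TABLE ContactRole (
 ContactRoleID INT IDENTITY(1,1) PRIMARY KEY,
 ContactID INT NOT NULL REFERENCES Contact(ContactID),
 RoleName VARCHAR(25) NOT NULL -- 'Customer', 'Vendor', 'Employee'
 );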
Designing for performance
A normalized logical database design without the benefit of physical database schema optimization will
perform poorly, because the logical design alone doesn’t consider performance. Issues such as lock
contention, composite keys, excessive joins for common queries, and table structures that are difficult to
update are just some of the problems that a logical design might bring to the database.
Designing for performance is greatly influenced by the simplicity or complexity of the design. Each
unnecessary complexity requires additional code and extra joins, and breeds even more complexity.
A popular myth is that the primary task of translating a logical design into a physical schema is
denormalization. Denormalization, purposefully breaking the normal forms, is the technique of duplicating
data within the database to make it easier to retrieve. Interestingly, the Microsoft Word spell checker
suggests replacing ‘‘denormalization’’ with ‘‘demoralization.’’ Within the context of a transactional, OLTP
database, I couldn’t agree more.
Normalization is described in Chapter 3, ‘‘Relational Database Design.’’
Consider some examples of denormalizing a data structure: including the customer name in an [Order] table
would enable retrieving the customer name when querying an order without joining to the Customer
table. Or, including the CustomerID in a ShipDetail table would enable joining directly from
the ShipDetail table to the Customer table while bypassing the OrderDetail and [Order]
tables. Both of these examples violate normalization because the attributes don’t depend on the
primary key.
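In code, the first example might look like the following sketch (hypothetical tables; every denormalized copy also needs code to keep it synchronized with the Customer table):

-- Duplicate the customer name into the order itself:
ALTER TABLE [Order]
 ADD CustomerName VARCHAR(100) NULL;

-- The name can now be retrieved without joining to Customer:
SELECT OrderID, OrderDate, CustomerName
 FROM [Order];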
Some developers regularly denormalize portions of the database in an attempt to improve performance.
While it might seem that this would improve performance because it reduces the number of joins, I
have found that in practice the additional code (procedures, triggers, constraints, etc.) required to keep
the data consistent, or to renormalize data for use in set-based queries, actually costs performance. In
my consulting practice, I’ve tested a normalized design vs. a denormalized design several times. In every
case the normalized design was about 15% faster than the denormalized design.
Best Practice
There’s a common saying in the database field: ‘‘Normalize ‘til it hurts, then denormalize ‘til it works.’’
Poppycock! That’s a saying of data modelers who don’t know how to design an efficient physical schema.
I never denormalize apart from the two cases identified here as responsible denormalization:
■ Denormalize aggregate data, such as account balances or inventory-on-hand quantities within OLTP databases, for performance, even though such data could be calculated from the inventory transaction table or the account transaction ledger table. These aggregates may be maintained using a trigger or a persisted computed column (a trigger sketch follows this list).
■ If the data is not original and is primarily there for OLAP or reporting purposes, data consistency is not the primary concern. For performance, denormalization is then a wise move.
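As an illustration of the first case, a trigger on a hypothetical InventoryTransaction table might maintain the on-hand quantity. This is only a sketch of the technique (a production version would also handle updates and deletes):

CREATE TRIGGER trgInventoryTransactionInsert
 ON InventoryTransaction
 AFTER INSERT
AS
BEGIN
 SET NOCOUNT ON;
 -- Fold the newly inserted transactions into the
 -- denormalized aggregate column:
 UPDATE Inventory
 SET QuantityOnHand = Inventory.QuantityOnHand + i.Quantity
 FROM Inventory
 JOIN (SELECT InventoryID, SUM(Quantity) AS Quantity
 FROM inserted
 GROUP BY InventoryID) AS i
 ON Inventory.InventoryID = i.InventoryID;
END;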
The architecture of the databases, and which databases or tables are being used for which purpose, are
the driving factors in any decision to denormalize a part of the database.
If the database requires both OLTP and OLAP, the best solution might just be to create a few tables that
duplicate data for their own distinct purposes. The OLTP side might need its own tables to maintain the
data, but the reporting side might need that same data in a single, wide, fast table from which it can
retrieve data without any joins or locking concerns. The trick is to correctly populate the denormalized
data in a timely manner.
Indexed views are basically denormalized clustered indexes. Chapter 64, ‘‘Indexing
Strategies,’’ discusses setting up an indexed view. Chapter 70, ‘‘BI Design,’’ includes advice on
creating a denormalized reporting database and data warehouse.
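As a preview of Chapter 64, an indexed view is a schema-bound view materialized by a unique clustered index. Here is a minimal sketch with hypothetical tables; note that indexed views carry restrictions, such as requiring COUNT_BIG(*) alongside GROUP BY and non-nullable expressions in SUM:

CREATE VIEW dbo.vCustomerOrderTotals
 WITH SCHEMABINDING
AS
 SELECT O.CustomerID,
 SUM(OD.Quantity * OD.Price) AS OrderTotal,
 COUNT_BIG(*) AS RowCnt -- required with GROUP BY
 FROM dbo.[Order] AS O
 JOIN dbo.OrderDetail AS OD
 ON O.OrderID = OD.OrderID
 GROUP BY O.CustomerID;
GO
-- The unique clustered index materializes (denormalizes)
-- the aggregate to disk:
CREATE UNIQUE CLUSTERED INDEX ivCustomerOrderTotals
 ON dbo.vCustomerOrderTotals (CustomerID);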
Designing for extensibility
Maintenance over the life of the application will cost significantly more than the initial development.
Therefore, during the initial development process, a primary objective should be making the physical
design, code, and data as easy as possible to maintain. The following techniques may reduce
the cost of database maintenance:
■ Enforce a strong T-SQL based abstraction layer
■ Always normalize the schema
■ Data-driven designs are more flexible, and therefore more extensible, than rigid designs
■ Use a consistent naming convention
■ Avoid data structures that are overly complex, as well as unwieldy data structures, when
simpler data structures will suffice
■ Develop with scripts instead of using Management Studio’s UI
■ Enforce the data integrity constraints from the beginning. Polluted data is a bear to clean up
after even a short time of loose data-integrity rules (see the sketch after this list)
■ Develop the core feature first, and once that’s working, then add the bells and whistles
■ Document not only how the procedure works, but also why it works
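For instance, enforcing integrity from the beginning is often just a few extra lines in the CREATE TABLE script. A hypothetical sketch:

CREATE TABLE Account (
 AccountID INT IDENTITY(1,1) PRIMARY KEY,
 AccountCode CHAR(6) NOT NULL UNIQUE,
 Balance MONEY NOT NULL
 CONSTRAINT ckAccountBalance CHECK (Balance >= 0),
 Created DATETIME NOT NULL
 CONSTRAINT dfAccountCreated DEFAULT GETDATE()
 );

Declaring the constraints with the table costs nothing now; retrofitting them onto polluted data later means finding and fixing every violating row first.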
Creating the Database
The database is the physical container for all database schemas, data, and server-side programming. SQL
Server’s database is a single logical unit, even though it may exist in several files.
Database creation is one of those areas in which SQL Server requires little administrative work, but you
may decide instead to fine-tune the database files with more sophisticated techniques.
The Create DDL command
Creating a database using the default parameters is very simple. The following data definition language
(DDL) command is taken from the Cape Hatteras Adventures sample database:
CREATE DATABASE CHA2;
The CREATE command will create a data file with the name provided and an .mdf file extension, as well
as a transaction log with an .ldf extension.
By default, the files are created in the server’s default data and log locations, which can be changed in the Database Settings page of the Server Properties dialog.
While these defaults might be acceptable for a sample or development database, they are sorely
inadequate for a production database. Better alternatives are explained as the CREATE DATABASE command is
covered.
Using Object Explorer, creating a new database requires only that the database name be entered in
the New Database form, as shown in Figure 20-1. Use the New Database menu command from the
Databases node’s context menu to open the New Database form.
FIGURE 20-1
The simplest way to create a new database is by entering the database name in Object Explorer’s
New Database page.
The New Database page includes several individual subpages: the General, Options, and Filegroups
pages, as shown in Table 20-1. For existing databases, the Files, Permissions, Extended Properties,
Mirroring, and Log Shipping pages are added to the Database Properties page (not shown).
TABLE 20-1
Database Property Pages

General
 New Database: Create a new database, setting the name, owner, collation, recovery model, full-text indexing, and data file properties.
 Existing Database: View (read-only) general properties: name, last backup, size, collation, recovery model, full-text indexing, and database files.

Filegroups
 New Database: View and modify filegroup information.
 Existing Database: View and modify filegroup information.

Options
 New Database: View and modify database options such as auto shrink, ANSI settings, page verification method, and single-user access.
 Existing Database: View and modify database options such as auto shrink, ANSI settings, page verification method, and single-user access.

Permissions
 New Database: n/a
 Existing Database: View and modify server roles, users, and permissions. See Chapter 50, ‘‘Authorizing Securables,’’ for more details.

Extended Properties
 New Database: n/a
 Existing Database: View and modify extended properties.

Mirroring
 New Database: n/a
 Existing Database: View and configure database mirroring, covered in Chapter 47, ‘‘Mirroring.’’

Transaction Log Shipping
 New Database: n/a
 Existing Database: View and configure log shipping, covered in Chapter 46, ‘‘Log Shipping.’’
Database-file concepts
A database consists of two files (or two sets of files): the data file and the transaction log. The data file
contains all system and user tables, indexes, views, stored procedures, user-defined functions, triggers,
and security permissions. The write-ahead transaction log is central to SQL Server’s design: all updates
to the data file are first written and verified in the transaction log, ensuring that all data updates are
written to two places.
Never store the transaction log on the same disk subsystem as the data file. For the sake
of the transactional-integrity ACID properties and the recoverability of the database, it’s
critical that a failing disk subsystem not be able to take out both the data file and the transaction file.
The transaction log also plays a key role in the backup and recovery plan, as discussed in Chapter 41, ‘‘Recovery Planning.’’ How SQL Server uses the transaction log within transactions is covered in Chapter 66, ‘‘Managing Transactions, Locking, and
Blocking.’’
Configuring file growth
Prior to SQL Server version 7, the data files required manual size adjustment to handle additional data
Fortunately, for about a decade now, SQL Server can automatically grow thanks to the following options
(see Figure 20-2):
■ Enable Autogrowth: As the database begins to hold more data, the file size must grow. If autogrowth is not enabled, an observant DBA will have to manually adjust the size. If autogrowth is enabled, SQL Server automatically adjusts the size according to one of the following growth parameters:
■ In percent: When the data file needs to grow, this option will expand it by the percent specified. Growing by percent is the best option for smaller databases. With very large files, this option may add too much space in one operation and hurt performance while the data file is being resized. For example, adding 10 percent to a 5GB data file will add 500MB;
writing 500MB could take a while.
■ In megabytes: When the data file needs to grow, this option will add the specified number of megabytes to the file. Growing by a fixed size is a good option for larger data files.
Best Practice
The default setting is to grow the data file by 1MB. Autogrow events require database locks, which
severely impact performance. Imagine a database that grows by a couple of gigabytes: it will have to
endure 2,048 tiny autogrow events. On the other hand, a very large autogrowth event will consume more time.
The best solution is to turn autogrow off and manually increase the file size during the database maintenance
window. However, if that’s not expedient, then I recommend setting autogrow to a reasonable mid-size
growth. For my open-source O/R DBMS product, Nordic, I set both the initial size and the autogrow to
100MB.
■ Maximum file size: Setting a maximum size can prevent the data file or transaction log file from filling the entire disk subsystem, which would cause trouble for the operating system.
The maximum size for a data file is 16 terabytes, and log files are limited to 2 terabytes. This does not
limit the size of the database, because a database can include multiple files.
FIGURE 20-2
With Management Studio’s New Database form, a new database is configured for automatic file
growth and a maximum size of 20GB.
Automatic file growth can be specified in code by adding the file options to the CREATE DATABASE
DDL command. File size can be specified in kilobytes (KB), megabytes (MB), gigabytes (GB), or
terabytes (TB); megabytes is the default. File growth can be set to a size or a percent. The following
code creates the NewDB database with an initial data-file size of 10MB, a maximum size of 200GB, and
a file growth of 100MB. The transaction log file is initially 5MB, with a maximum size of 10GB and a
growth of 100MB:
CREATE DATABASE NewDB
ON
PRIMARY
 (NAME = NewDB,
 FILENAME = 'c:\SQLData\NewDB.mdf',
 SIZE = 10MB,
 MAXSIZE = 200GB,
 FILEGROWTH = 100)
LOG ON
 (NAME = NewDBLog,
 FILENAME = 'd:\SQLLog\NewDBLog.ldf',
 SIZE = 5MB,
 MAXSIZE = 10GB,
 FILEGROWTH = 100);
All the code in this chapter and all the sample databases are available for download from
the book’s website, www.sqlserverbible.com. In addition, there are extensive queries
using the catalog views that relate to this chapter.
If autogrowth is not enabled, then the files require manual adjustment if they are to handle additional
data. File size can be adjusted in Management Studio by editing it in the Database Properties form.
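The same manual adjustment can be scripted, which fits the develop-with-scripts practice mentioned earlier. A minimal sketch that grows the NewDB data file created above:

ALTER DATABASE NewDB
 MODIFY FILE
 (NAME = NewDB, -- the logical file name
 SIZE = 500MB); -- must be larger than the current size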
An easy way to determine the files and file sizes from code is to query the
sys.database_files catalog view, which reports on the current database (sys.master_files covers every database on the server).
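For example, the following sketch reports each file’s size and growth settings for the current database; note that sys.database_files stores size as a count of 8KB pages:

SELECT name,
 physical_name,
 size * 8 / 1024 AS size_mb, -- size is in 8KB pages
 max_size,
 growth,
 is_percent_growth
 FROM sys.database_files;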