  DatePosition DATE NOT NULL
);
INSERT dbo.Dept (DeptName, RaiseFactor)
VALUES ('Engineering', 1.2),
       ('Sales', .8),
       ('IT', 2.5),
       ('Manufacturing', 1.0);
INSERT dbo.Employee (DeptID, LastName, FirstName,
    Salary, PerformanceRating, DateHire, DatePosition)
VALUES (1, 'Smith', 'Sam', 54000, 2.0, '19970101', '19970101'),
       (1, 'Nelson', 'Slim', 78000, 1.5, '19970101', '19970101'),
       (2, 'Ball', 'Sally', 45000, 3.5, '19990202', '19990202'),
       (2, 'Kelly', 'Jeff', 85000, 2.4, '20020625', '20020625'),
       (3, 'Guelzow', 'Jo', 120000, 4.0, '19991205', '19991205'),
       (3, 'Anderson', 'Missy', 95000, 1.8, '19980201', '19980201'),
       (4, 'Reagan', 'Sam', 75000, 2.9, '20051215', '20051215'),
       (4, 'Adams', 'Hank', 34000, 3.2, '20080501', '20080501');
When developing complex queries, I work from the inside out. The first step performs the date math;
it selects the data required for the raise calculation, assuming June 25, 2009, is the effective date of the
raise, and ensures that the performance rating doesn’t count if it’s below 2:
SELECT EmployeeID, Salary,
    CAST(CAST(DATEDIFF(d, DateHire, '20090625')
        AS DECIMAL(7, 2)) / 365.25 AS INT)
      AS YrsCo,
    CAST(CAST(DATEDIFF(d, DatePosition, '20090625')
        AS DECIMAL(7, 2)) / 365.25
        * 12 AS INT)
      AS MoPos,
    CASE WHEN Employee.PerformanceRating >= 2
         THEN Employee.PerformanceRating
         ELSE 0
    END AS Perf,
    Dept.RaiseFactor
  FROM dbo.Employee
    JOIN dbo.Dept
      ON Employee.DeptID = Dept.DeptID;
Result:
EmployeeID  Salary    YrsCo  MoPos  Perf  RaiseFactor
----------  --------  -----  -----  ----  -----------
4           85000.00  7      84     2.40  0.80
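To see the date math in isolation, the following sketch runs the same expressions against literal dates instead of table columns. The variable names are illustrative assumptions; the hire date is the one belonging to employee 4 in the sample data.

```sql
-- Stand-alone check of the date math (variable names are hypothetical)
DECLARE @Effective DATE = '20090625';   -- effective date of the raise
DECLARE @Hire      DATE = '20020625';   -- DateHire for employee 4 in the sample data

SELECT
    -- raw number of days between the two dates
    DATEDIFF(d, @Hire, @Effective) AS DaysElapsed,
    -- whole years: days / 365.25, truncated by the cast to INT
    CAST(CAST(DATEDIFF(d, @Hire, @Effective)
        AS DECIMAL(7, 2)) / 365.25 AS INT) AS YrsCo,
    -- whole months: fractional years * 12, truncated by the cast to INT
    CAST(CAST(DATEDIFF(d, @Hire, @Effective)
        AS DECIMAL(7, 2)) / 365.25 * 12 AS INT) AS MoPos;
```

For this pair of dates DATEDIFF returns 2557 days, which works out to 7 whole years and 84 whole months, in line with the YrsCo and MoPos values in the sample result row. One likely reason for working in days is that DATEDIFF counts the date-part boundaries crossed rather than whole elapsed periods, so DATEDIFF(yy, ...) on its own can overstate the number of completed years.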
The next step in developing this query is to add the raise calculation. The simplest way to see the
calculation is to pull the values already generated from a subquery:
SELECT EmployeeID, Salary,
    (2 + ((YearsCompany * .1) + (MonthPosition * .02)
        + (Performance * .5)) * RaiseFactor) / 100 AS EmpRaise
  FROM (SELECT EmployeeID, FirstName, LastName, Salary,
            CAST(CAST(DATEDIFF(d, DateHire, '20090625')
                AS DECIMAL(7, 2)) / 365.25 AS INT) AS YearsCompany,
            CAST(CAST(DATEDIFF(d, DatePosition, '20090625')
                AS DECIMAL(7, 2)) / 365.25 * 12 AS INT) AS MonthPosition,
            CASE WHEN Employee.PerformanceRating >= 2
                 THEN Employee.PerformanceRating
                 ELSE 0
            END AS Performance,
            Dept.RaiseFactor
          FROM dbo.Employee
            JOIN dbo.Dept
              ON Employee.DeptID = Dept.DeptID) AS SubQuery;
Result:
EmployeeID  Salary     EmpRaise
----------  ---------  -----------
5           120000.00  0.149500000
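As a hand check of the formula, the sketch below plugs in literal values for employee 5. The inputs are worked out from the sample data (9 whole years with the company, 114 whole months in position, a performance rating of 4.0, and the IT raise factor of 2.5), and the weights .1, .02, and .5 are the ones that reproduce the published result.

```sql
-- Hand check of the raise formula for employee 5 (all inputs are literals)
SELECT (2                     -- base raise of 2 percent
        + ((9 * .1)           -- years with company
         + (114 * .02)        -- months in position
         + (4.0 * .5))        -- performance rating (counts because it is >= 2)
        * 2.5                 -- department raise factor (IT)
       ) / 100 AS EmpRaise;
```

The arithmetic is 2 + (0.9 + 2.28 + 2.0) * 2.5 = 14.95, and dividing by 100 gives 0.1495, which agrees with the EmpRaise value shown for employee 5.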
The subquery version was relatively easy to read, but there’s no logical reason for the subquery. The query
could be rewritten combining the date calculations and the case expression into the raise formula:
SELECT EmployeeID, Salary,
    (2
      -- years with company
      + ((CAST(CAST(DATEDIFF(d, DateHire, '20090625')
            AS DECIMAL(7, 2)) / 365.25 AS INT) * .1)
      -- months in position
      + (CAST(CAST(DATEDIFF(d, DatePosition, '20090625')
            AS DECIMAL(7, 2)) / 365.25 * 12 AS INT) * .02)
      -- Performance Rating minimum
      + (CASE WHEN Employee.PerformanceRating >= 2
              THEN Employee.PerformanceRating
              ELSE 0
         END * .5))
      -- Raise Factor
      * RaiseFactor) / 100 AS EmpRaise
  FROM dbo.Employee
    JOIN dbo.Dept
      ON Employee.DeptID = Dept.DeptID;
It’s easy to verify that this query gets the same result, but which is the better query? From a
performance perspective, both queries generate the exact same query execution plan. When considering
maintenance and readability, I’d probably go with the second query, carefully formatted and commented.
The final step is to convert the query into an UPDATE command. The hard part is already done; it
just needs the UPDATE verb at the front of the query:
UPDATE dbo.Employee SET Salary = Salary *
    (1 + ((2
      -- years with company
      + ((CAST(CAST(DATEDIFF(d, DateHire, '20090625')
            AS DECIMAL(7, 2)) / 365.25 AS INT) * .1)
      -- months in position
      + (CAST(CAST(DATEDIFF(d, DatePosition, '20090625')
            AS DECIMAL(7, 2)) / 365.25 * 12 AS INT) * .02)
      -- Performance Rating minimum
      + (CASE WHEN Employee.PerformanceRating >= 2
              THEN Employee.PerformanceRating
              ELSE 0
         END * .5))
      -- Raise Factor
      * RaiseFactor) / 100 ))
  FROM dbo.Employee
    JOIN dbo.Dept
      ON Employee.DeptID = Dept.DeptID;
A quick check of the data confirms that the update was successful:
SELECT FirstName, LastName, Salary FROM dbo.Employee
Result:
FirstName  LastName  Salary
---------  --------  ---------
Slim       Nelson    83472.48
Missy      Anderson  105972.50
The final step of the exercise is to clean up the sample tables:
DROP TABLE dbo.Employee, dbo.Dept;
This sample code pulls together techniques from many of the previous chapters: creating and dropping
tables, CASE expressions, joins, and date scalar functions, not to mention the inserts and updates from
this chapter. The example is long because it demonstrates more than just the UPDATE statement. It also
shows the typical process of developing a complex UPDATE, which includes the following:
1. Checking the available data: The first SELECT joins Employee and Dept, and lists all the
columns required for the formula.
2. Testing the formula: The second SELECT is based on the initial SELECT and assembles the
formula from the required columns. From this data, a couple of rows can be hand-tested against
the specs, and the formula verified.
3. Performing the update: Once the formula is constructed and verified, the formula is edited
into an UPDATE statement and executed.
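One way to rehearse the third step before committing to it is to run the UPDATE inside an explicit transaction, inspect the result, and roll back. This is a minimal sketch of that idea and not part of the exercise itself; the #Emp scratch table and the 5 percent placeholder formula are assumptions made for illustration.

```sql
-- Hypothetical scratch table standing in for dbo.Employee
CREATE TABLE #Emp (
    EmployeeID INT NOT NULL PRIMARY KEY,
    Salary DECIMAL(9, 2) NOT NULL
);
INSERT #Emp (EmployeeID, Salary) VALUES (1, 50000), (2, 60000);

BEGIN TRANSACTION;

    -- Placeholder formula, not the chapter's raise calculation
    UPDATE #Emp
       SET Salary = Salary * 1.05;

    -- Inspect the would-be result while the change is still uncommitted
    SELECT EmployeeID, Salary FROM #Emp;

-- Undo the trial run; switch to COMMIT TRANSACTION once the numbers check out
ROLLBACK TRANSACTION;

-- After the rollback the original salaries are back
SELECT EmployeeID, Salary FROM #Emp;

DROP TABLE #Emp;
```

The trial run holds locks on the updated rows until the rollback, so on a busy table it is best kept short.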
The SQL UPDATE command is powerful. I have replaced terribly complex record sets and nested loops
that were painfully slow and error-prone with UPDATE statements and creative joins that worked
well, and I have seen execution times reduced from hours to a few seconds. I cannot overemphasize
the importance of approaching the selection and updating of data in terms of data sets, rather than
data rows.
Deleting Data
The DELETE command is dangerously simple. In its basic form, it deletes all the rows from a table.
Because the DELETE command is a row-based operation, it doesn’t require specifying any column
names. The first FROM is optional, as are the second FROM and the WHERE conditions. However,
although the WHERE clause is optional, it is the primary subject of concern when you’re using the
DELETE command. Here’s an abbreviated syntax for the DELETE command:
DELETE [FROM] schema.Table
[FROM data sources]
[WHERE condition(s)];
Notice that everything is optional except the actual DELETE command and the table name. The
following command would delete all data from the Product table, with no questions asked and no
second chances:
DELETE FROM OBXKites.dbo.Product;
SQL Server has no inherent “undo” command. Once a transaction is committed, that’s it. That’s why the
WHERE clause is so important when you’re deleting.
By far, the most common use of the DELETE command is to delete a single row. The primary key is
usually the means of selecting the row:
USE OBXKites;
DELETE FROM dbo.Product
  WHERE ProductID = 'DB8D8D60-76F4-46C3-90E6-A8648F63C0F0';
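Because a committed delete cannot be undone, it can help to preview the rows with a SELECT that uses the same WHERE clause, and then to guard the DELETE with a row-count check. The sketch below assumes a cut-down #Product scratch table; the real dbo.Product table has more columns, and the second GUID and the product names are made up for the example.

```sql
-- Hypothetical scratch table standing in for dbo.Product
CREATE TABLE #Product (
    ProductID UNIQUEIDENTIFIER NOT NULL PRIMARY KEY,
    ProductName VARCHAR(50) NOT NULL
);
INSERT #Product (ProductID, ProductName)
VALUES ('DB8D8D60-76F4-46C3-90E6-A8648F63C0F0', 'Sample product'),
       ('11111111-2222-3333-4444-555555555555', 'Another product');

DECLARE @ProductID UNIQUEIDENTIFIER = 'DB8D8D60-76F4-46C3-90E6-A8648F63C0F0';

-- 1. Preview: the same WHERE clause, but read-only
SELECT ProductID, ProductName
  FROM #Product
  WHERE ProductID = @ProductID;

-- 2. Delete, and commit only if exactly one row was affected
BEGIN TRANSACTION;

DELETE FROM #Product
  WHERE ProductID = @ProductID;

IF @@ROWCOUNT = 1
    COMMIT TRANSACTION;
ELSE
    ROLLBACK TRANSACTION;

-- Only 'Another product' should remain
SELECT ProductID, ProductName FROM #Product;

DROP TABLE #Product;
```

@@ROWCOUNT must be read immediately after the DELETE, because most subsequent statements reset it.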
Referencing multiple data sources while deleting
There are two techniques for referencing multiple data sources while deleting rows: the double FROM
clause and subqueries.
The UPDATE command uses the FROM clause to join the updated table with other tables for more
flexible row selection. The DELETE command can use the exact same technique. When using this method,
the first optional FROM can make it look confusing. To improve readability and consistency, I
recommend that you omit the first FROM in your code.
For example, the following DELETE statement omits the first FROM clause and uses the second FROM
clause to join Product with ProductCategory so that the WHERE clause can filter the DELETE
based on the ProductCategoryName. This query removes all videos from the Product table:
DELETE dbo.Product
  FROM dbo.Product
    JOIN dbo.ProductCategory
      ON Product.ProductCategoryID
        = ProductCategory.ProductCategoryID
  WHERE ProductCategory.ProductCategoryName = 'Video';
The second method looks more complicated at first glance, but it’s ANSI standard and the preferred
method. A correlated subquery actually selects the rows to be deleted, and the DELETE command just
picks up those rows for the delete operation. It’s a very clean query:
DELETE FROM dbo.Product
  WHERE EXISTS
    (SELECT *
       FROM dbo.ProductCategory AS pc
       WHERE pc.ProductCategoryID = Product.ProductCategoryID
         AND pc.ProductCategoryName = 'Video');
In terms of performance, both methods generate the exact same query execution plan.
As with the UPDATE command’s FROM clause, the DELETE command’s second FROM clause is
not an ANSI SQL standard If portability is important to your project, then use a subquery
to reference additional tables.
Cascading deletes
Referential integrity (RI) refers to the idea that no secondary row foreign key should point to a primary
row primary key unless that primary row does in fact exist. This means that an attempt to delete a
primary row will fail if a foreign-key value somewhere points to that primary row.
For more information about referential integrity and when to use it, turn to Chapter 3,
“Relational Database Design,” and Chapter 20, “Creating the Physical Database Schema.”
When implemented correctly, referential integrity will block any delete operation that would result in a
foreign key value without a corresponding primary key value. The way around this is to first delete the
secondary rows that point to the primary row, and then delete the primary row. This technique is called
a cascading delete. In a complex database schema, the cascade might bounce down several levels before
working its way back up to the original row being deleted.
There are two ways to implement a cascading delete: manually with triggers or automatically with
declared referential integrity (DRI) via foreign keys.
Implementing cascading deletes manually is a lot of work. Triggers are significantly slower than foreign
keys (which are checked as part of the query execution plan), and trigger-based cascading deletes
usually also handle the foreign key checks. While this was commonplace a decade ago, today trigger-based
cascading deletes are very rare and might only be needed with a very complex nonstandard foreign key
design that includes business rules in the foreign key. If you’re doing that, then you’re either very new
at this or very, very good.
Fortunately, SQL Server offers cascading deletes as a function of the foreign key. Cascading deletes may
be enabled via Management Studio, in the Foreign Key Relationship dialog, or in SQL code.
The sample script that creates the Cape Hatteras Adventures version 2 database
(CHA2_Create.sql) provides a good example of setting the cascade-delete option for referential
integrity. In this case, if either the event or the guide is deleted, then the rows in the event-guide
many-to-many table are also deleted. The ON DELETE CASCADE foreign-key option is what actually
specifies the cascade action:
CREATE TABLE dbo.Event_mm_Guide (
  EventGuideID
    INT IDENTITY NOT NULL PRIMARY KEY,
  EventID
    INT NOT NULL
    FOREIGN KEY REFERENCES dbo.Event ON DELETE CASCADE,
  GuideID
    INT NOT NULL
    FOREIGN KEY REFERENCES dbo.Guide ON DELETE CASCADE,
  LastName
    VARCHAR(50) NOT NULL
  )
  ON [PRIMARY];
As a caution, cascading deletes, or even referential integrity, are not suitable for every relationship. It
depends on the permanence of the secondary row. If deleting the primary row makes the secondary row
moot or meaningless, then cascading the delete makes good sense; but if the secondary row is still a
valid row after the primary row is deleted, then referential integrity and cascading deletes would cause
the database to break its representation of reality.
As an example of determining the usefulness of cascading deletes from the Cape Hatteras
Adventures database, consider that if a tour is deleted, then all scheduled events for that tour become
meaningless, as do the many-to-many schedule tables between event and customer, and between
event and guide. Conversely, a tour must have a base camp, so referential integrity is required on the
Tour.BaseCampID foreign key. However, if a base camp is deleted, then the tours originating from
that base camp might still be valid (if they can be rescheduled to another base camp), so cascading a
base-camp delete down to the tour is not a reasonable action. If RI is on and cascading deletes are off,
then a base camp with tours cannot be deleted until all tours for that base camp are either manually
deleted or reassigned to other base camps.
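The difference between the two settings can be sketched with three small tables. The table names, columns, and data below are simplified assumptions made for illustration and are not the actual CHA2 schema.

```sql
USE tempdb;

-- Hypothetical, cut-down stand-ins for the base camp, tour, and event tables
CREATE TABLE dbo.BaseCampDemo (
    BaseCampID INT NOT NULL PRIMARY KEY,
    BaseCampName VARCHAR(50) NOT NULL
);
CREATE TABLE dbo.TourDemo (
    TourID INT NOT NULL PRIMARY KEY,
    BaseCampID INT NOT NULL
        FOREIGN KEY REFERENCES dbo.BaseCampDemo (BaseCampID)
        ON DELETE NO ACTION      -- RI on, cascade off (also the default)
);
CREATE TABLE dbo.EventDemo (
    EventID INT NOT NULL PRIMARY KEY,
    TourID INT NOT NULL
        FOREIGN KEY REFERENCES dbo.TourDemo (TourID)
        ON DELETE CASCADE        -- an event is meaningless without its tour
);

INSERT dbo.BaseCampDemo (BaseCampID, BaseCampName) VALUES (1, 'Demo Camp');
INSERT dbo.TourDemo (TourID, BaseCampID) VALUES (10, 1);
INSERT dbo.EventDemo (EventID, TourID) VALUES (100, 10), (101, 10);

-- Blocked: tour 10 still references base camp 1
BEGIN TRY
    DELETE dbo.BaseCampDemo WHERE BaseCampID = 1;
END TRY
BEGIN CATCH
    SELECT ERROR_NUMBER() AS ErrorNumber, ERROR_MESSAGE() AS ErrorMessage;
END CATCH;

-- Allowed: deleting the tour also removes events 100 and 101
DELETE dbo.TourDemo WHERE TourID = 10;
SELECT COUNT(*) AS RemainingEvents FROM dbo.EventDemo;

-- With no tours left, the base camp can now be deleted
DELETE dbo.BaseCampDemo WHERE BaseCampID = 1;

DROP TABLE dbo.EventDemo, dbo.TourDemo, dbo.BaseCampDemo;
```

The blocked delete should surface as a foreign key conflict (error 547), and RemainingEvents should be 0 after the tour is deleted.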
Alternatives to physically deleting data
Some database developers choose to completely avoid deleting data. Instead, they build systems to
remove the data from the user’s view while retaining the data for safekeeping (like dBase II did). This
can be done in several different ways:
■ A logical-delete bit flag, or nullable MomentDeleted column, in the row can indicate that the
  row is deleted. This makes deleting or restoring a single row a straightforward matter of
  setting or clearing a bit. However, because a relational database involves multiple related
  tables, there’s more work to it than that. All queries must check the logical-delete flag and
  filter out logically deleted rows. This means that a bit column (with extremely poor selectivity)
  is probably an important index for every query. While SQL Server 2008’s new filtered indexes
  are a perfect fit, it’s still a performance killer.
  To make matters worse, because the rows still physically exist in SQL Server, and SQL Server’s
  declarative referential integrity does not know about the logical-delete flag, custom referential
  integrity and cascading of logical delete flags are also required. Restoring, or undeleting,
  cascaded logical deletes can become a nightmare.
  The cascading logical deletes method is complex to code and difficult to maintain. This is a
  case of complexity breeding complexity, and I no longer recommend this method.
■ Another alternative to physically deleting rows is to archive the deleted rows in an archive or
  audit table. This method is best implemented by an INSTEAD OF trigger that copies the data
  to the alternative location and then physically deletes the rows from the production database.
  This method offers several advantages. Data is physically removed from the database, so
  there’s no need to artificially modify SELECT queries or index on a bit column. Physically
  removing the data enables SQL Server referential integrity to remain in effect. In addition, the
  database is not burdened with unnecessary data. Retrieving archived data remains relatively
  straightforward and can be easily accomplished with a view that selects data from the archive
  location.
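The archive approach can be sketched as follows. This is a minimal illustration under stated assumptions and not the audit system itself: the ProductDemo tables, their columns, and the trigger name are hypothetical, and GO is the batch separator recognized by Management Studio and sqlcmd (CREATE TRIGGER must be the first statement in its batch).

```sql
USE tempdb;
GO
-- Hypothetical production table and matching archive table
CREATE TABLE dbo.ProductDemo (
    ProductID INT NOT NULL PRIMARY KEY,
    ProductName VARCHAR(50) NOT NULL
);
CREATE TABLE dbo.ProductDemoArchive (
    ProductID INT NOT NULL,
    ProductName VARCHAR(50) NOT NULL,
    MomentDeleted DATETIME NOT NULL DEFAULT GETDATE()
);
GO
CREATE TRIGGER dbo.ProductDemo_ArchiveDelete
ON dbo.ProductDemo
INSTEAD OF DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- Copy the rows being deleted to the archive location
    INSERT dbo.ProductDemoArchive (ProductID, ProductName)
    SELECT ProductID, ProductName
      FROM deleted;

    -- Then perform the physical delete that the trigger intercepted
    DELETE P
      FROM dbo.ProductDemo AS P
        JOIN deleted AS D
          ON D.ProductID = P.ProductID;
END;
GO
INSERT dbo.ProductDemo (ProductID, ProductName)
VALUES (1, 'Kite'), (2, 'Video');

-- The delete is redirected through the trigger
DELETE dbo.ProductDemo WHERE ProductID = 2;

SELECT ProductID, ProductName FROM dbo.ProductDemo;                        -- row 1 only
SELECT ProductID, ProductName, MomentDeleted FROM dbo.ProductDemoArchive;  -- row 2
GO
DROP TABLE dbo.ProductDemo, dbo.ProductDemoArchive;
```

The DELETE inside the trigger does not fire the INSTEAD OF trigger again, so the rows are physically removed once the archive copy has been written.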
Chapter 53, “Data Audit Triggers,” details how to automatically generate the audit system
discussed here that stores, views, and recovers deleted rows.
Merging Data
An upsert operation is a logical combination of an insert and an update. If the data isn’t already in the
table, the upsert inserts the data; if the data is already in the table, then the upsert updates with the
differences. Ignoring for a moment the new MERGE command in SQL Server 2008, there are a few ways to
code an upsert operation with T-SQL:
■ The most common method is to attempt to locate the data with an IF EXISTS; and if the row
  was found, UPDATE, otherwise INSERT.
■ If the most common use case is that the row exists and the UPDATE was needed, then the best
  method is to do the update, and if @@RowCount = 0, then the row was new and the insert
  should be performed.
■ If the overwhelming use case is that the row would be new to the database, then TRY to
  INSERT the new row; if a unique index blocked the INSERT and fired an error, then CATCH
  the error and UPDATE instead.
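The three patterns can be sketched side by side. The #SeatAssignment scratch table, its columns, and the seat values are hypothetical and exist only for this illustration.

```sql
-- Hypothetical scratch table, for illustration only
CREATE TABLE #SeatAssignment (
    PassengerID INT NOT NULL PRIMARY KEY,
    Seat CHAR(3) NOT NULL
);

DECLARE @PassengerID INT = 1,
        @Seat CHAR(3) = '9F';

-- Method 1: look first with IF EXISTS, then UPDATE or INSERT
IF EXISTS (SELECT * FROM #SeatAssignment WHERE PassengerID = @PassengerID)
    UPDATE #SeatAssignment SET Seat = @Seat WHERE PassengerID = @PassengerID;
ELSE
    INSERT #SeatAssignment (PassengerID, Seat) VALUES (@PassengerID, @Seat);

-- Method 2: UPDATE first; if no row was touched, the row is new, so INSERT
SET @Seat = '7A';
UPDATE #SeatAssignment SET Seat = @Seat WHERE PassengerID = @PassengerID;
IF @@ROWCOUNT = 0
    INSERT #SeatAssignment (PassengerID, Seat) VALUES (@PassengerID, @Seat);

-- Method 3: TRY the INSERT; CATCH a duplicate-key error and UPDATE instead
SET @Seat = '2A';
BEGIN TRY
    INSERT #SeatAssignment (PassengerID, Seat) VALUES (@PassengerID, @Seat);
END TRY
BEGIN CATCH
    -- 2627 = primary key or unique constraint violation, 2601 = unique index violation
    IF ERROR_NUMBER() IN (2627, 2601)
        UPDATE #SeatAssignment SET Seat = @Seat WHERE PassengerID = @PassengerID;
    ELSE
    BEGIN
        -- Anything else is unexpected, so pass the message back to the caller
        DECLARE @Msg NVARCHAR(2048) = ERROR_MESSAGE();
        RAISERROR('%s', 16, 1, @Msg);
    END;
END CATCH;

-- One row for passenger 1, now holding the seat set by the last method
SELECT PassengerID, Seat FROM #SeatAssignment;

DROP TABLE #SeatAssignment;
```

In this run method 1 inserts the row, method 2 updates it, and method 3 hits the primary key and falls through to the update. None of the three is safe against concurrent sessions on its own; in production code each would typically run inside a transaction with suitable locking.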
All three methods are potentially obsolete with the new MERGE command. The MERGE command is very
well done by Microsoft; it solves a complex problem well with a clean syntax and good performance.
First, it’s called “merge” because it does more than an upsert. Upsert only inserts or updates; merge can
be directed to insert, update, and delete all in one command.
In a nutshell, MERGE sets up a join between the source table and the target table, and can then perform
operations based on matches between the two tables.
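As a generic illustration of that join-then-act shape, the following sketch merges one scratch table into another. The #Target and #Source tables and their contents are hypothetical and are unrelated to the check-in scenario.

```sql
-- Hypothetical scratch tables, for illustration only
CREATE TABLE #Target (ID INT NOT NULL PRIMARY KEY, Val VARCHAR(10) NOT NULL);
CREATE TABLE #Source (ID INT NOT NULL PRIMARY KEY, Val VARCHAR(10) NOT NULL);

INSERT #Target (ID, Val) VALUES (1, 'a'), (2, 'b'), (3, 'c');
INSERT #Source (ID, Val) VALUES (2, 'bb'), (3, 'c'), (4, 'd');

MERGE #Target AS T
  USING #Source AS S
    ON T.ID = S.ID
  -- in both tables, but the value differs: update the target
  WHEN MATCHED AND T.Val <> S.Val THEN
    UPDATE SET T.Val = S.Val
  -- in the source only: insert into the target
  WHEN NOT MATCHED BY TARGET THEN
    INSERT (ID, Val) VALUES (S.ID, S.Val)
  -- in the target only: delete from the target
  WHEN NOT MATCHED BY SOURCE THEN
    DELETE;

-- Expect rows 2 ('bb'), 3 ('c'), and 4 ('d'); row 1 is gone
SELECT ID, Val FROM #Target ORDER BY ID;

DROP TABLE #Target, #Source;
```

A MERGE statement must be terminated with a semicolon, and each WHEN clause is optional, so the same statement shape also covers a plain upsert.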
To walk through a merge scenario, the following example sets up an airline flight check-in scenario. The
main work table is FlightPassengers, which holds data about reservations. It’s updated as travelers
check in, and by the time the flight takes off, it has the actual final passenger list and seat assignments.
In the sample scenario, four passengers are scheduled to fly SQL Server Airlines flight 2008 (Denver to
Seattle) on March 1, 2009. Poor Jerry, he has a middle seat on the last row of the plane, the row that
doesn’t recline:
USE tempdb;
-- Merge Target Table
CREATE TABLE FlightPassengers (
  FlightID INT NOT NULL
    IDENTITY PRIMARY KEY,
  LastName VARCHAR(50) NOT NULL,
  FirstName VARCHAR(50) NOT NULL,
  FlightCode CHAR(6) NOT NULL,
  FlightDate DATE NOT NULL,
  Seat CHAR(3) NOT NULL
  );
INSERT FlightPassengers
    (LastName, FirstName, FlightCode, FlightDate, Seat)
VALUES ('Nielsen', 'Paul', 'SS2008', '20090301', '9F'),
       ('Jenkins', 'Sue', 'SS2008', '20090301', '7A'),
       ('Smith', 'Sam', 'SS2008', '20090301', '19A'),
       ('Nixon', 'Jerry', 'SS2008', '20090301', '29B');
The day of the flight, the check-in counter records all the passengers as they arrive, and their seat
assignments, in the CheckIn table. One passenger doesn’t show, a new passenger buys a ticket, and
Jerry decides today is a good day to burn an upgrade coupon:
-- Merge Source table
CREATE TABLE CheckIn (
  LastName VARCHAR(50),
  FirstName VARCHAR(50),
  FlightCode CHAR(6),
  FlightDate DATE,
  Seat CHAR(3)
  );
INSERT CheckIn (LastName, FirstName, FlightCode, FlightDate, Seat)
VALUES ('Nielsen', 'Paul', 'SS2008', '20090301', '9F'),
       ('Jenkins', 'Sue', 'SS2008', '20090301', '7A'),
       ('Nixon', 'Jerry', 'SS2008', '20090301', '2A'),
       ('Anderson', 'Missy', 'SS2008', '20090301', '4B');
Before the MERGE command is executed, the next three queries look for differences in the data.
The first set-difference query returns any no-show passengers. A LEFT OUTER JOIN between the
FlightPassengers and CheckIn tables finds every passenger with a reservation joined with their
CheckIn row if the row is available. If no CheckIn row is found, then the LEFT OUTER JOIN fills
in the CheckIn columns with nulls. Filtering for the null returns only those passengers who made a
reservation but didn’t make the flight:
-- NoShows
SELECT F.FirstName + ' ' + F.LastName AS Passenger, F.Seat
  FROM FlightPassengers AS F
    LEFT OUTER JOIN CheckIn AS C
      ON C.LastName = F.LastName
        AND C.FirstName = F.FirstName
        AND C.FlightCode = F.FlightCode
        AND C.FlightDate = F.FlightDate
  WHERE C.LastName IS NULL;
Result:
Passenger  Seat
---------  ----
Sam Smith  19A
The walk-up check-in query uses a LEFT OUTER JOIN and an IS NULL in the WHERE clause to locate
any passengers who are in the CheckIn table but not in the FlightPassengers table:
-- Walk Up CheckIn
SELECT C.FirstName + ' ' + C.LastName AS Passenger, C.Seat
  FROM CheckIn AS C
    LEFT OUTER JOIN FlightPassengers AS F
      ON C.LastName = F.LastName
        AND C.FirstName = F.FirstName
        AND C.FlightCode = F.FlightCode
        AND C.FlightDate = F.FlightDate
  WHERE F.LastName IS NULL;
Result:
Passenger       Seat
--------------  ----
Missy Anderson  4B
The last difference query lists any seat changes, including Jerry’s upgrade to first class. This query uses
an inner join because it’s searching for passengers who both had previous seat assignments and now are
boarding with a seat assignment. The query compares the Seat columns from the FlightPassengers
and CheckIn tables using a not-equal comparison, which finds any passengers with a different seat
than previously assigned. Go Jerry!
-- Seat Changes
SELECT C.FirstName + ' ' + C.LastName AS Passenger,
    F.Seat AS 'previous seat', C.Seat AS 'final seat'
  FROM CheckIn AS C
    INNER JOIN FlightPassengers AS F
      ON C.LastName = F.LastName
        AND C.FirstName = F.FirstName
        AND C.FlightCode = F.FlightCode
        AND C.FlightDate = F.FlightDate
        AND C.Seat <> F.Seat
  WHERE F.Seat IS NOT NULL;
Result:
Passenger    previous seat  final seat
-----------  -------------  ----------
Jerry Nixon  29B            2A
For another explanation of set difference queries, flip over to Chapter 10, “Merging Data
with Joins and Unions.”
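The three difference queries can also be folded into a single pass with a FULL OUTER JOIN, which keeps unmatched rows from both sides. This is an alternative sketch and not one of the queries above; it assumes the FlightPassengers and CheckIn tables from the scenario still exist in tempdb, and the Status labels are made up for the example.

```sql
USE tempdb;

-- Classify every passenger in one pass (assumes the scenario tables exist)
SELECT COALESCE(C.FirstName, F.FirstName) + ' '
         + COALESCE(C.LastName, F.LastName) AS Passenger,
       F.Seat AS PreviousSeat,
       C.Seat AS FinalSeat,
       CASE
         WHEN C.LastName IS NULL THEN 'No-show'      -- reserved, never checked in
         WHEN F.LastName IS NULL THEN 'Walk-up'      -- checked in, no reservation
         WHEN C.Seat <> F.Seat   THEN 'Seat change'  -- both, different seat
         ELSE 'Unchanged'                            -- both, same seat
       END AS Status
  FROM FlightPassengers AS F
    FULL OUTER JOIN CheckIn AS C
      ON C.LastName = F.LastName
        AND C.FirstName = F.FirstName
        AND C.FlightCode = F.FlightCode
        AND C.FlightDate = F.FlightDate;
```

With the sample data this should return five rows: two unchanged passengers, one no-show, one walk-up, and one seat change.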
With the scenario’s data in place and verified with set-difference queries, it’s time to merge the check-in
data into the FlightPassengers table.