Best Practices for Database Programming
Software development is not just a practical discipline performed by coders, but also an area of academic research and theory. There is now a great body of knowledge concerning software development, and lengthy academic papers have been written to propose, dissect, and discuss different approaches to development. Various methodologies have emerged, including test-driven development (TDD), agile and extreme programming (XP), and defensive programming, and there have been countless arguments concerning the benefits afforded by each of these schools of thought.
The practices described in this chapter, and the approach taken throughout the rest of this book, are most closely aligned with the philosophy of defensive programming. However, the topics discussed here can be applied just as readily in any environment. While software theorists may argue over the finer differences between methodologies (and undoubtedly, they do differ in some respects), when it comes down to it, the underlying features of good programming remain the same whatever methodology you apply.
I do not intend to provide an exhaustive, objective guide as to what constitutes best practice, but rather to highlight some of the standards that I believe demonstrate the level of professionalism that database developers require in order to do a good job. I will present the justification for each argument from a defensive point of view, but remember that the arguments are generally equally valid in other environments.
Defensive Programming
Defensive programming is a methodology used in software development that suggests that developers should proactively anticipate and make allowances for (or “defend against”) unforeseen future events. The objective of defensive programming is to create applications that can remain robust and effective, even when faced with unexpected situations.
Defensive programming essentially involves taking a pessimistic view of the world: if something can go wrong, it will. Network resources will become unavailable halfway through a transaction; required files will be absent or corrupt; users will input data in any number of ways different from that expected; and so on. Rather than leave anything to chance, a defensive programmer will have predicted the possibility of these eventualities, and will have written appropriate handling code to check for and deal with these situations. This means that potential error conditions can be detected and handled before an […] cases, it may be possible to identify and isolate a particular component responsible for a failure, allowing the rest of the application to continue functioning.
There is no definitive list of defensive programming practices, but adopting a defensive stance to development is generally agreed to include the following principles:
• Keep things simple (or KISS: keep it simple, stupid). Applications are not made powerful and effective by their complexity, but by their elegant simplicity. Complexity allows bugs to be concealed, and should be avoided in both application design and in coding practice itself.
• “If it ain’t broke, fix it anyway.” Rather than waiting for things to break, defensive programming encourages continuous, proactive testing and future-proofing of an application against possible breaking changes.
• Be challenging, thorough, and cautious at all stages of development. “What if?” analyses should be conducted in order to identify possible exceptional scenarios that might occur during normal (and abnormal) application usage.
• Extensive code reviews and testing should be conducted with different peer groups, including other developers or technical teams, consultants, end users, and management. Each of these different groups may have different implicit assumptions that might not be considered by a closed development team.
• Assumptions should be avoided wherever possible. If an application requires a certain condition to be true in order to function correctly, there should be an explicit assertion to this effect, and relevant code paths should be inserted to check and act accordingly based on the result.
• Applications should be built from short, highly cohesive, loosely coupled modules. Modules that are well encapsulated in this way can be thoroughly tested in isolation, and then confidently reused throughout the application. Reusing specific code modules, rather than duplicating functionality, reduces the chances of introducing new bugs.
Throughout the remainder of this chapter, I'll be providing simple examples of what I believe to be best practices demonstrating each of these principles, and these concepts will be continually reexamined in later chapters of this book.
Attitudes to Defensive Programming
The key advantages of taking a defensive approach to programming are essentially twofold:
• Defensive applications are typically robust and stable, require fewer essential bug fixes, and are more resilient to situations that may otherwise lead to expensive failures or crashes. As a result, they have a long expected lifespan and relatively cheap ongoing maintenance costs.
• In many cases, defensive programming can lead to an improved user experience. By actively foreseeing and allowing for exceptional circumstances, potential errors can be caught before they occur, rather than having to be handled afterward. Exceptions can be isolated and handled with a minimal negative effect on user experience, rather than propagating into an entire system failure. Even if extreme unexpected conditions are encountered, the system can still degrade gracefully and act according to documented behavior.
However, as with any school of thought, defensive programming is not without its opponents. Some of the criticisms commonly made of defensive coding are listed below, and in each case I’ve tried to give a reasoned response.
Defensive code takes longer to develop
It is certainly true that following a defensive methodology can result in a longer up-front development time when compared to applications developed following other software practices. Defensive programming places a strong emphasis on the initial requirements-gathering and architecture design phases, which may be longer and more involved than in some methodologies. Coding itself takes longer because additional code paths may need to be added to handle checks and assertions of assumptions. Code must be subjected to an extensive review that is both challenging and thorough, and then must undergo rigorous testing. All these factors contribute to the fact that the overall development and release cycle for defensive software is longer than in other approaches.
There is a particularly stark contrast between defensive programming and so-called “agile” development practices, which focus on releasing frequent iterative changes on a very accelerated development and release cycle. However, this does not necessarily mean that defensive code takes longer to develop when considered over the full life cycle of an application. The additional care and caution invested in code at the initial stages of development are typically paid back over the life of the project, because there is less need for code fixes to be deployed once the project has gone live.
Writing code that anticipates and handles every possible scenario makes defensive applications bloated
Code bloat suggests that an application contains unnecessary, inefficient, or wasteful code. Defensive code protects against events that may be unlikely to happen, but that certainly doesn’t mean that they can’t happen. Taking actions to explicitly test for and handle exceptional circumstances up front can save many hours of tracing and debugging later. Defensive applications may contain more total lines of code than other applications, but all of that code should be well designed, with a clear purpose. Note that the label of “defensive programming” is sometimes misused: the addition of unnecessary checks at every opportunity without consideration or justification is not defensive programming. Such actions lead to code that is both complex and rigid. Remember that true defensive programming promotes simplicity, modularization, and code reuse, which actually reduces code bloat.
Defensive programming hides bugs that then go unfixed, rather than making them visible
This is perhaps the most common misconception about defensive practices, and it stems from a failure to understand the fundamental attitude toward errors in defensive applications. By explicitly […] to handle them. To demonstrate this in practical terms, consider the following code listing, which describes a simple stored procedure to divide one number by another:
CREATE PROCEDURE Divide (
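A complete procedure of this kind might look like the following sketch. It is an illustration of the pattern under stated assumptions, not a definitive listing: it assumes two decimal(18,2) parameters named @x and @y (the names and type used in the discussion that follows), and it assumes the error is hidden by a TRY...CATCH block that simply returns NULL. CREATE OR ALTER requires SQL Server 2016 SP1 or later.

```sql
-- Sketch: a division procedure that hides errors rather than preventing them.
-- Requires SQL Server 2016 SP1 or later for CREATE OR ALTER.
CREATE OR ALTER PROCEDURE Divide (
  @x decimal(18,2),
  @y decimal(18,2)
)
AS
BEGIN
  BEGIN TRY
    -- If this fails (for example, when @y is 0), control passes to CATCH
    SELECT @x / @y;
  END TRY
  BEGIN CATCH
    -- The error is swallowed: the caller gets NULL instead of an error
    SELECT NULL;
  END CATCH;
END;
GO
```

With this sketch, EXEC Divide 10, 0; raises no error at all; the caller simply receives NULL and has no way of knowing that anything went wrong.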
However, it is important to realize that the preceding code listing is not defensive: it does nothing to prevent the exceptional circumstance from occurring, and its only effect is to allow the system to continue operating, pretending that nothing bad has happened. Exception hiding such as this can be very dangerous, and makes it almost impossible to ensure the correct functioning of an application. The defensive approach would be to check explicitly, before attempting the division, that all the requirements for that operation to succeed are met. This means asserting such things as making sure that values for @x and @y are supplied (i.e., they are not NULL), that @y is not equal to zero, that the supplied values lie within the range that can be stored within the decimal(18,2) datatype, and so on. The following code listing provides a simplified defensive approach to this same procedure:
ALTER PROCEDURE Divide (
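As a hedged sketch of what such a defensive version could look like, the following procedure checks two of the assertions listed above (that both values are supplied and that @y is non-zero) before dividing, and reports a failed assertion with PRINT. The message wording and the use of RETURN to stop early are assumptions, and the range check is left out for brevity. CREATE OR ALTER requires SQL Server 2016 SP1 or later.

```sql
-- Sketch: a defensive version that checks its assumptions first.
-- Requires SQL Server 2016 SP1 or later for CREATE OR ALTER.
CREATE OR ALTER PROCEDURE Divide (
  @x decimal(18,2),
  @y decimal(18,2)
)
AS
BEGIN
  -- Assertion: both values must be supplied
  IF @x IS NULL OR @y IS NULL
  BEGIN
    PRINT 'Please supply values for both @x and @y';
    RETURN;
  END;

  -- Assertion: the divisor must not be zero
  IF @y = 0
  BEGIN
    PRINT '@y cannot be equal to 0';
    RETURN;
  END;

  -- (A range check on the inputs or the result could be added here.)

  -- Every assertion holds, so the division can safely proceed
  SELECT @x / @y;
END;
GO
```

Calling EXEC Divide 1, 0; with this version prints the message about @y and returns no result set, so the problem is reported rather than silently replaced by NULL.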
For the purposes of the preceding example, each assertion was accompanied by a simple PRINT statement to advise which of the conditions necessary for the procedure to execute failed. In real life, these code paths may handle such assertions in a number of ways, typically logging the error, reporting a message to the user, and attempting to continue system operation if it is possible to do so. In doing so, they prevent the kind of unpredictable behavior associated with an exception that has not been expected.
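To illustrate what logging the error and reporting a message to the user might involve, the sketch below replaces PRINT with a row written to a log table, an informational message, and a non-zero return code. The dbo.ErrorLog table, its columns, and the return-code convention are all hypothetical. RAISERROR is used with severity 10 because messages at that level are informational and do not interrupt the calling batch.

```sql
-- Sketch: dbo.ErrorLog and its columns are hypothetical.
IF OBJECT_ID('dbo.ErrorLog') IS NULL
  CREATE TABLE dbo.ErrorLog(
    LogID int IDENTITY(1,1) PRIMARY KEY,
    ProcName sysname,
    LogMessage nvarchar(400),
    LoggedAt datetime DEFAULT GETDATE());
GO
-- Requires SQL Server 2016 SP1 or later for CREATE OR ALTER
CREATE OR ALTER PROCEDURE Divide (
  @x decimal(18,2),
  @y decimal(18,2)
)
AS
BEGIN
  IF @x IS NULL OR @y IS NULL OR @y = 0
  BEGIN
    -- Log the failed assertion for later investigation
    INSERT INTO dbo.ErrorLog(ProcName, LogMessage)
    VALUES (OBJECT_NAME(@@PROCID),
            N'Invalid arguments: @x and @y must be supplied and @y must not be 0');
    -- Severity 10 is informational: the user sees the message,
    -- but the calling batch is not interrupted
    RAISERROR(N'Divide: invalid arguments; no result returned.', 10, 1);
    -- A non-zero return code tells the caller the division did not run
    RETURN 1;
  END;

  SELECT @x / @y;
  RETURN 0;
END;
GO
```

A caller can then check the return value (EXEC @rc = Divide 1, 0;) and decide whether the rest of its work can carry on.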
Defensive programming can be contrasted with the fail-fast methodology, which focuses on immediate recognition of any errors encountered by causing the application to halt whenever an exception occurs. Just because the defensive approach doesn’t espouse ringing alarm bells and flashing lights doesn’t mean that it hides errors; it just reports them more elegantly to the end user and, if possible, continues operation of the core part of the system.
Why Use a Defensive Approach to Database Development?
As stated previously, defensive programming is not the only software development methodology that can be applied to database development. Other common approaches include TDD, XP, and fail-fast development. So why have I chosen to focus on just defensive programming in this chapter, and throughout this book in general? I believe that defensive programming is the most appropriate approach for database development for the following reasons:
• Database applications tend to have a longer expected lifespan than other software applications. Although it may be an overused stereotype to suggest that database professionals are the sensible, fastidious people of the software development world, the fact is that database development tends to be more slow-moving and cautious than other technologies. Web applications, for example, may be revised and relaunched on a nearly annual basis, in order to take advantage of whatever technology is current at the time. In contrast, database development tends to be slow and steady, and a database application may remain current for many years without any need for updating from a technological point of view. As a result, it is easier to justify the greater up-front development cost associated with defensive programming. The benefits of reliability and bug resistance will typically […]
• […] the habit of hitting Ctrl+Alt+Delete to reset their machine when a web browser hangs, or because some application fails to shut down correctly. However, the same tolerance that is shown to personal desktop software is not typically extended to corporate database applications. Recent highly publicized scandals in which bugs have been exploited in the systems of several governments and large organizations have further heightened the general public’s sensitivity toward anything that might present a risk to database integrity.
• Any bugs that do exist in database applications can have more severe consequences than in other software. It can be argued that people are absolutely right to be more worried about database bugs than bugs in other software. An unexpected error in a desktop application may lead to a document or file becoming corrupt, which is a nuisance and might lead to unnecessary rework. But an unexpected error in a database may lead to important personal, confidential, or sensitive data being placed at risk, which can have rather more serious consequences. The nature of data typically stored in a database warrants a cautious, thorough approach to development, such as defensive programming provides.
Designing for Longevity
Consumer software applications have an increasingly short expected shelf life, with compressed release cycles pushing out one release barely before the predecessor has hit the shelves. However, this does not have to be the case. Well-designed, defensively programmed applications can continue to operate for many years. In one organization I worked for, a short-term tactical management information data store was created so that essential business reporting functions could continue while the organization’s systems went through an integration following a merger. Despite only being required for an immediate post-merger period, the (rather unfortunately named) Short Term Management Information database continued to be used for up to ten years, as it remained more reliable and robust than subsequent attempted replacements.
And let that be a lesson in choosing descriptive names for your databases that won’t age with time!
Best Practice SQL Programming Techniques
Having looked at some of the theory behind different software methodologies, and in particular the defensive approach to programming, you’re now probably wondering how to put this into practice. As in any methodology, defensive programming is more concerned with the mindset with which you should approach development than with prescribing a definitive set of rules to follow. As a result, this section will only provide examples that illustrate the overall concepts involved, and should not be treated as an exhaustive list. I’ll try to keep the actual examples as simple as possible in every case, so that you can concentrate on the reasons I consider these to be best practices, rather than the code itself.
Identify Hidden Assumptions in Your Code
One of the core tenets of defensive programming is to identify all of the assumptions that lie behind the proper functioning of your code. Once these assumptions have been identified, the code can either be adjusted to remove the dependency on them, or be made to test each condition explicitly and make provisions should it not hold true. In some cases, “hidden” assumptions exist as a result of code failing to be sufficiently explicit.
To demonstrate this concept, consider the following code listing, which creates and populates Customers and Orders tables:
CREATE TABLE Customers(
CustID int,
Name varchar(32),
Address varchar(255));
INSERT INTO Customers(CustID, Name, Address) VALUES
(1, 'Bob Smith', 'Flat 1, 27 Heigham Street'),
(2, 'Tony James', '87 Long Road');
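A minimal sketch of an Orders table and a join query consistent with the results that follow is shown here. The OrderID column name and the order rows are assumptions chosen to match those results; the point of the example is the deliberately unqualified SELECT list. It expects the Customers table created in the preceding listing.

```sql
-- Sketch: OrderID and the order rows are assumptions chosen to match the
-- results below; Customers is the table created in the preceding listing.
CREATE TABLE Orders(
  CustID int,
  OrderID int);

INSERT INTO Orders(CustID, OrderID) VALUES
  (1, 1), (1, 2), (2, 3);
GO

-- The join condition must be qualified (both tables have CustID),
-- but the columns in the SELECT list are not
SELECT Name, Address, OrderID
FROM Customers c
JOIN Orders o ON c.CustID = o.CustID
ORDER BY OrderID;
GO
```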
The query executes successfully and we get the results expected:
Bob Smith Flat 1, 27 Heigham Street 1
Bob Smith Flat 1, 27 Heigham Street 2
Tony James 87 Long Road 3
But what is the hidden assumption? The column names listed in the SELECT query were not qualified with table names, so what would happen if the table structure were to change in the future? Suppose that an Address column were added to the Orders table to enable a separate delivery address to be attached to each order, rather than relying on the address in the Customers table:
ALTER TABLE Orders ADD Address varchar(255);
GO
The unqualified column name, Address, specified in the SELECT query, is now ambiguous, and if we attempt to run the original query again we receive an error:
Msg 209, Level 16, State 1, Line 1
Ambiguous column name 'Address'.
Because the hidden assumption contained in the original code was not recognized and corrected, the query subsequently broke when the additional column was added to the Orders table. The simple practice that could have prevented this error would have been to ensure that all column names were prefixed with the appropriate table name or alias:
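Applied to the join query, that practice might look like the following sketch. OrderID is an assumed column name; the other names come from the listings in this section.

```sql
-- Every column is tied to a table alias, so an Address column added to
-- Orders no longer makes the SELECT list ambiguous
SELECT c.Name, c.Address, o.OrderID
FROM Customers c
JOIN Orders o ON c.CustID = o.CustID
ORDER BY o.OrderID;
GO
```

Because each column names its source table, this form returns the same rows whether or not Orders has an Address column of its own.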
Suppose that you had a table, MainData, containing some simple values, as shown in the following code listing:
CREATE TABLE MainData(
ID int,
Value char(3));
GO
INSERT INTO MainData(ID, Value) VALUES
(1, 'abc'), (2, 'def'), (3, 'ghi'), (4, 'jkl');
GO
Now suppose that every change made to the MainData table was to be recorded in an associated ChangeLog table. The following code demonstrates this structure, together with a mechanism to automatically populate the ChangeLog table by means of an UPDATE trigger attached to the MainData table:
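CREATE TABLE ChangeLog(
ChangeID int IDENTITY(1,1),
-- Column types below are assumed to match the variables used in the trigger
RowID int,
OldValue varchar(32),
NewValue varchar(32),
ChangeDate datetime);
GO
CREATE TRIGGER DataUpdate ON MainData
FOR UPDATE
AS
DECLARE @ID int;
SELECT @ID = ID FROM INSERTED;
DECLARE @OldValue varchar(32);
SELECT @OldValue = Value FROM DELETED;
DECLARE @NewValue varchar(32);
SELECT @NewValue = Value FROM INSERTED;
INSERT INTO ChangeLog(RowID, OldValue, NewValue, ChangeDate)
VALUES(@ID, @OldValue, @NewValue, GetDate());
GO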
We can test the trigger by running a simple UPDATE query against the MainData table:
UPDATE MainData SET Value = 'aaa' WHERE ID = 1;
GO
The query appears to be functioning correctly—SQL Server Management Studio reports the following:
(1 row(s) affected)
(1 row(s) affected)
And, as expected, we find that one row has been updated in the MainData table:
ID Value
1 aaa
2 def
3 ghi
4 jkl
and an associated row has been created in the ChangeLog table:
ChangeID RowID OldValue NewValue ChangeDate
1 1 abc aaa 2009-06-15 14:11:09.770
However, once again, there is a hidden assumption in the code. Within the trigger logic, the variables @ID, @OldValue, and @NewValue are assigned values that will be inserted into the ChangeLog table. Clearly, each of these scalar variables can only be assigned a single value, so what would happen if you were to attempt to update two or more rows in a single statement?
UPDATE MainData SET Value = 'zzz' WHERE ID IN (2,3,4);
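The result in this case is that all three rows affected by the UPDATE statement have been changed in the MainData table:
ID Value
1 aaa
2 zzz
3 zzz
4 zzz
but only one of the three updates has been logged (and which one is not guaranteed, because the order in which rows are assigned to the variables is undefined):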
ChangeID RowID OldValue NewValue ChangeDate
1 1 abc aaa 2009-06-15 14:11:09.770
2 2 def zzz 2009-06-15 15:18:11.007
The failure to foresee the possibility of multiple rows being updated in a single statement led to a silent failure on this occasion, which is much more dangerous than the overt error given in the previous example. Had this scenario been actively considered, it would have been easy to recode the trigger to deal with such an event by making a subtle alteration to its syntax, as shown here:
ALTER TRIGGER DataUpdate ON MainData
FOR UPDATE
AS
INSERT INTO ChangeLog(RowID, OldValue, NewValue, ChangeDate)
SELECT i.ID, d.Value, i.Value, GetDate()
FROM INSERTED i JOIN DELETED d ON i.ID = d.ID;
GO
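A quick way to confirm that the revised trigger handles multi-row statements is to repeat a multi-row UPDATE and inspect the log entries it produces. This sketch assumes the MainData, ChangeLog, and DataUpdate objects created in this section; the new value 'yyy' is arbitrary.

```sql
-- Sketch: check that one ChangeLog row is written per updated row.
-- Assumes MainData, ChangeLog and the revised DataUpdate trigger exist.
DECLARE @LastChange int = ISNULL((SELECT MAX(ChangeID) FROM ChangeLog), 0);

-- Update three rows in a single statement
UPDATE MainData SET Value = 'yyy' WHERE ID IN (2,3,4);

-- Each of IDs 2, 3 and 4 should now have its own log entry
SELECT ChangeID, RowID, OldValue, NewValue, ChangeDate
FROM ChangeLog
WHERE ChangeID > @LastChange
ORDER BY RowID;
GO
```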
Don’t Take Shortcuts
It is human nature to want to take shortcuts if we believe that they will allow us to avoid work that we feel is unnecessary. In programming terms, there are often shortcuts that provide a convenient, concise way of achieving a given task in fewer lines of code than other, more standard methods. However, these shortcut methods can come with associated risks. Most commonly, shortcut methods require less code […] change between different versions of SQL Server. Taking shortcuts therefore reduces the portability of code, and introduces assumptions that can break in the future.
To demonstrate, consider what happens when you CAST a value to a varchar datatype without explicitly declaring the appropriate data length:
SELECT CAST ('This example seems to work ok' AS varchar);
GO
The query appears to work correctly, and results in the following output:
This example seems to work ok
It seems to be a common misunderstanding among some developers that omitting the length for the varchar type as the target of a CAST operation results in SQL Server dynamically assigning a length sufficient to accommodate all of the characters of the input. However, this is not the case, as demonstrated in the following code listing:
SELECT CAST ('This demonstrates the problem of relying on default datatype length'
AS varchar);
GO
This demonstrates the problem
If no length is specified when CASTing to a character datatype, SQL Server defaults to a length of 30 characters. In the second example, the input string is silently truncated to 30 characters, even though there is no obvious indication in the code to this effect. If this were the intention, it would have been much clearer to state varchar(30) explicitly to draw attention to the fact that this was a planned truncation, rather than simply omitting the data length.
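Stating the length removes the guesswork in both directions. In this sketch, varchar(100) is an arbitrary length chosen to be large enough for the sample string, and varchar(30) marks a deliberate truncation.

```sql
-- Keep the whole string: the length is stated, so nothing is lost
SELECT CAST('This demonstrates the problem of relying on default datatype length'
            AS varchar(100)) AS FullText;

-- Truncate on purpose: varchar(30) makes the intent visible in the code
SELECT CAST('This demonstrates the problem of relying on default datatype length'
            AS varchar(30)) AS PlannedTruncation;
GO
```

The default also depends on context: varchar with no length means 30 characters in CAST and CONVERT, but only 1 character in a variable declaration or column definition, which is another reason to always state it.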
Another example of a shortcut sometimes made is to rely on implicit CASTs between datatypes. Consider the following code listing:
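A minimal sketch of such a listing follows. It assumes, from the discussion that follows, two integer values @x = 5 and @y = 9, a rate calculated as 1.9 * @x / @y, and a rate applied to a value of 1000; the decimal(18,2) type for @Rate is an additional assumption.

```sql
-- Sketch: names and values inferred from the discussion; @Rate's type is assumed
DECLARE @x int = 5,
        @y int = 9,
        @Rate decimal(18,2);

-- 1.9 is a decimal literal, so 1.9 * @x is evaluated as decimal (9.5),
-- and dividing by @y stays in decimal: 9.5 / 9 = 1.0555...
SET @Rate = 1.9 * @x / @y;

-- Assigning to decimal(18,2) rounds @Rate to 1.06, so this returns 1060.00
SELECT 1000 * @Rate;
GO
```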
Now let’s suppose that management makes a decision to change the calculation used to determine @Rate, and increases the scale factor from 1.9 to 2. The obvious (but incorrect) solution would be to amend the code as follows:
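Under the same assumptions (integer @x = 5 and @y = 9, a decimal(18,2) @Rate), the amended listing might look like this; only the literal in the calculation changes.

```sql
DECLARE @x int = 5,
        @y int = 9,
        @Rate decimal(18,2);

-- Every operand is now an integer, so the whole expression uses integer
-- arithmetic: 2 * 5 = 10, then 10 / 9 = 1 (the fraction is discarded)
SET @Rate = 2 * @x / @y;

-- @Rate is 1.00, so this returns 1000.00 and the rate has no effect
SELECT 1000 * @Rate;
GO
```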
Rather than increasing the rate as intended, the change has actually negated the effect of applying any rate to the supplied value of 1000. The problem now is that the expression used to determine @Rate is a purely integer calculation, 2 * 5 / 9. Integer division discards the fractional part, so 10 / 9 evaluates to 1. In the previous example, the hard-coded value of 1.9 caused an implicit cast of both @x and @y to the decimal type, so the calculation was performed with decimal precision.
This example may seem trivial when considered in isolation, but it can be a source of unexpected behavior and unnecessary bug-chasing when nested deep in the belly of some complex code. To avoid these complications, it is always best to explicitly state the type and precision of any parameters used in a calculation, and avoid implicit CASTs between them.
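One way to apply that advice to this example, offered as a sketch rather than the only possible fix, is to declare @x and @y with an explicit decimal type so that the calculation no longer depends on whether a literal happens to contain a decimal point. The names, values, and decimal(18,2) type are the same assumptions as before.

```sql
-- Explicit types: the calculation is decimal whatever literal is used
DECLARE @x decimal(18,2) = 5,
        @y decimal(18,2) = 9,
        @Rate decimal(18,2);

-- 2 * 5.00 / 9.00 = 1.111..., which rounds to 1.11 in decimal(18,2)
SET @Rate = 2 * @x / @y;

-- Returns 1110.00: the increased scale factor now takes effect
SELECT 1000 * @Rate;
GO
```

If @x and @y must remain integers (for example, because they are parameters of an existing procedure), writing CAST(@x AS decimal(18,2)) inside the calculation achieves the same result while keeping the conversion visible.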
Another problem with using shortcuts is that they can obscure the purpose the developer intended for the code. If we cannot tell what a line of code is meant to do, it is incredibly hard to test whether it is achieving its purpose or not. Consider the following code listing:
DECLARE @Date datetime = '03/05/1979';
SELECT @Date + 365;
At first sight, this seems fairly innocuous: take a specific date and add 365. But there are actually several shortcuts used here that add ambiguity as to what the intended purpose of this code is:
• The first shortcut is the implicit CAST from the string value '03/05/1979' to a datetime. As I’m sure you know, there are numerous ways of presenting dates around the world, and 03/05/1979 is ambiguous. In the United Kingdom it means the 3rd of May, but to American readers it means the 5th of March. The result of the implicit cast will depend on the language and DATEFORMAT settings of the session in which the code runs (by default, those of the login’s language), not on anything stated in the code itself.
• Even if the dd/mm/yyyy or mm/dd/yyyy ordering is resolved, there is still ambiguity regarding the input value. The datatype chosen is datetime, which stores both a date and time component, but the value assigned to @Date does not specify a