Best Practices for Database Programming

Software development is not just a practical discipline performed by coders, but also an area of academic research and theory. There is now a great body of knowledge concerning software development, and lengthy academic papers have been written to propose, dissect, and discuss different approaches to development. Various methodologies have emerged, including test-driven development (TDD), agile and extreme programming (XP), and defensive programming, and there have been countless arguments concerning the benefits afforded by each of these schools of thought.

The practices described in this chapter, and the approach taken throughout the rest of this book, are most closely aligned with the philosophy of defensive programming. However, the topics discussed here can be applied just as readily in any environment. While software theorists may argue the finer differences between different methodologies (and undoubtedly, they do differ in some respects), when it comes down to it, the underlying features of good programming remain the same whatever methodology you apply.

I do not intend to provide an exhaustive, objective guide as to what constitutes best practice, but rather to highlight some of the standards that I believe demonstrate the level of professionalism that database developers require in order to do a good job. I will present the justification of each argument from a defensive point of view, but remember that they are generally equally valid in other environments.

Defensive Programming

Defensive programming is a methodology used in software development that suggests that developers should proactively anticipate and make allowances for (or “defend against”) unforeseen future events. The objective of defensive programming is to create applications that can remain robust and effective, even when faced with unexpected situations.

Defensive programming essentially involves taking a pessimistic view of the world: if something can go wrong, it will. Network resources will become unavailable halfway through a transaction; required files will be absent or corrupt; users will input data in any number of ways different from that expected, and so on. Rather than leave anything to chance, a defensive programmer will have predicted the possibility of these eventualities, and will have written appropriate handling code to check for and deal with these situations. This means that potential error conditions can be detected and handled before an

cases, it may be possible to identify and isolate a particular component responsible for a failure, allowing the rest of the application to continue functioning.

There is no definitive list of defensive programming practices, but adopting a defensive stance to development is generally agreed to include the following principles:

• Keep things simple (or KISS: keep it simple, stupid). Applications are not made powerful and effective by their complexity, but by their elegant simplicity. Complexity allows bugs to be concealed, and should be avoided in both application design and in coding practice itself.

• “If it ain’t broke, fix it anyway.” Rather than waiting for things to break, defensive programming encourages continuous, proactive testing and future-proofing of an application against possible breaking changes in the future.

• Be challenging, thorough, and cautious at all stages of development. “What if?” analyses should be conducted in order to identify possible exceptional scenarios that might occur during normal (and abnormal) application usage.

• Extensive code reviews and testing should be conducted with different peer groups, including other developers or technical teams, consultants, end users, and management. Each of these different groups may have different implicit assumptions that might not be considered by a closed development team.

• Assumptions should be avoided wherever possible. If an application requires a certain condition to be true in order to function correctly, there should be an explicit assertion to this effect, and relevant code paths should be inserted to check and act accordingly based on the result.

• Applications should be built from short, highly cohesive, loosely coupled modules. Modules that are well encapsulated in this way can be thoroughly tested in isolation, and then confidently reused throughout the application. Reusing specific code modules, rather than duplicating functionality, reduces the chances of introducing new bugs.

Throughout the remainder of this chapter, I'll be providing simple examples of what I believe to be best practices demonstrating each of these principles, and these concepts will be continually reexamined in later chapters of this book.

Attitudes to Defensive Programming

The key advantages of taking a defensive approach to programming are essentially twofold:

• Defensive applications are typically robust and stable, require fewer essential bug fixes, and are more resilient to situations that may otherwise lead to expensive failures or crashes. As a result, they have a long expected lifespan, and relatively cheap ongoing maintenance costs.

• In many cases, defensive programming can lead to an improved user experience. By actively foreseeing and allowing for exceptional circumstances, errors can be caught before they occur, rather than having to be handled afterward. Exceptions can be isolated and handled with a minimum negative effect on user experience, rather than propagating an entire system failure. Even in the case of extreme unexpected conditions being encountered, the system can still degrade gracefully and act according to documented behavior.

However, as with any school of thought, defensive programming is not without its opponents. Some of the criticisms commonly made of defensive coding are listed below. For each, I’ve tried to give a reasoned response.

Defensive code takes longer to develop

It is certainly true that following a defensive methodology can result in a longer up-front development time when compared to applications developed following other software practices. Defensive programming places a strong emphasis on the initial requirements-gathering and architecture design phases, which may be longer and more involved than in some methodologies. Coding itself takes longer because additional code paths may need to be added to handle checks and assertions of assumptions. Code must be subjected to an extensive review that is both challenging and thorough, and then must undergo rigorous testing. All these factors contribute to the fact that the overall development and release cycle for defensive software is longer than in other approaches.

There is a particularly stark contrast between defensive programming and so-called “agile” development practices, which focus on releasing frequent iterative changes on a very accelerated development and release cycle. However, this does not necessarily mean that defensive code takes longer to develop when considered over the full life cycle of an application. The additional care and caution invested in code at the initial stages of development are typically paid back over the life of the project, because there is less need for code fixes to be deployed once the project has gone live.

Writing code that anticipates and handles every possible scenario makes defensive applications bloated

Code bloat suggests that an application contains unnecessary, inefficient, or wasteful code. Defensive code protects against events that may be unlikely to happen, but that certainly doesn’t mean that they can’t happen. Taking actions to explicitly test for and handle exceptional circumstances up front can save many hours that might otherwise be spent tracing and debugging in the future. Defensive applications may contain more total lines of code than other applications, but all of that code should be well designed, with a clear purpose. Note that the label of “defensive programming” is sometimes misused: the addition of unnecessary checks at every opportunity without consideration or justification is not defensive programming. Such actions lead to code that is both complex and rigid. Remember that true defensive programming promotes simplicity, modularization, and code reuse, which actually reduces code bloat.

Defensive programming hides bugs that then go unfixed, rather than making them visible

This is perhaps the most common misconception applied to defensive practices, and it stems from a failure to understand the fundamental attitude toward errors in defensive applications. By explicitly

to handle them. To demonstrate this in practical terms, consider the following code listing, which describes a simple stored procedure to divide one number by another:

CREATE PROCEDURE Divide (

However, it is important to realize that the preceding code listing is not defensive: it does nothing to prevent the exceptional circumstance from occurring, and its only effect is to allow the system to continue operating, pretending that nothing bad has happened. Exception hiding such as this can be very dangerous, and makes it almost impossible to ensure the correct functioning of an application.

The defensive approach would be, before attempting to perform the division, to explicitly check that all the requirements for that operation to be successful are met. This means asserting such things as making sure that values for @x and @y are supplied (i.e., they are not NULL), that @y is not equal to zero, that the supplied values lie within the range that can be stored within the decimal(18,2) datatype, and so on. The following code listing provides a simplified defensive approach to this same procedure:

ALTER PROCEDURE Divide (
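The checks described above might be sketched as follows. This is a minimal illustration under stated assumptions, not the original listing: the procedure name DivideChecked is hypothetical (chosen so that it does not depend on an existing Divide procedure), only the NULL and zero-divisor assertions are shown, and each failed assertion simply prints a message and returns.

```sql
-- Hypothetical name; a sketch of the assertions described in the text.
CREATE PROCEDURE DivideChecked (
    @x decimal(18,2),
    @y decimal(18,2)
)
AS
BEGIN
    -- Assert that both inputs were supplied.
    IF @x IS NULL OR @y IS NULL
    BEGIN
        PRINT 'Please supply values for @x and @y';
        RETURN;
    END;

    -- Assert that the divisor is not zero.
    IF @y = 0
    BEGIN
        PRINT '@y cannot be equal to 0';
        RETURN;
    END;

    -- The preconditions checked above hold, so perform the division.
    SELECT @x / @y AS Result;
END;
GO

-- Example calls: the first returns a result, the second prints a message.
EXEC DivideChecked @x = 10, @y = 4;
EXEC DivideChecked @x = 10, @y = 0;
GO
```

The point of the design is that the zero divisor is detected before the division is attempted, so no divide-by-zero exception is ever raised and nothing has to be caught or suppressed afterward.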

For the purposes of the preceding example, each assertion was accompanied by a simple PRINT statement to advise which of the conditions necessary for the procedure to execute failed. In real life, these code paths may handle such assertions in a number of ways, typically logging the error, reporting a message to the user, and attempting to continue system operation if it is possible to do so. In doing so, they prevent the kind of unpredictable behavior associated with an exception that has not been expected.
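As one possible illustration of such handling, the sketch below logs a failed assertion and reports it to the caller with RAISERROR instead of PRINT. The procedure name DivideReported and the ErrorLogDemo table are hypothetical, and a real application would choose its own logging destination, message wording, and return codes.

```sql
-- Hypothetical log table used only for this sketch.
CREATE TABLE ErrorLogDemo (
    LoggedAt datetime,
    Message  varchar(255)
);
GO

-- Hypothetical procedure name; combines logging with reporting to the caller.
CREATE PROCEDURE DivideReported (
    @x decimal(18,2),
    @y decimal(18,2)
)
AS
BEGIN
    IF @x IS NULL OR @y IS NULL OR @y = 0
    BEGIN
        -- Record the failed assertion for later investigation.
        INSERT INTO ErrorLogDemo (LoggedAt, Message)
        VALUES (GETDATE(), 'DivideReported called with a NULL argument or a zero divisor');

        -- Severity 16 marks a general error that the caller can correct.
        RAISERROR('Cannot divide: both values are required and the divisor must be nonzero.', 16, 1);

        RETURN 1;  -- nonzero return code signals failure to the caller
    END;

    SELECT @x / @y AS Result;
    RETURN 0;
END;
GO
```

A caller can then inspect the return code, or wrap the call in its own TRY...CATCH block, and decide whether the rest of the operation can continue.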

Defensive programming can be contrasted with the fail-fast methodology, which focuses on immediate recognition of any errors encountered by causing the application to halt whenever an exception occurs. Just because the defensive approach doesn’t espouse ringing alarm bells and flashing lights doesn’t mean that it hides errors; it just reports them more elegantly to the end user and, if possible, continues operation of the core part of the system.

Why Use a Defensive Approach to Database Development?

As stated previously, defensive programming is not the only software development methodology that can be applied to database development. Other common approaches include TDD, XP, and fail-fast development. So why have I chosen to focus on just defensive programming in this chapter, and throughout this book in general? I believe that defensive programming is the most appropriate approach for database development for the following reasons:

• Database applications tend to have a longer expected lifespan than other software applications. Although it may be an overused stereotype to suggest that database professionals are the sensible, fastidious people of the software development world, the fact is that database development tends to be more slow-moving and cautious than other technologies. Web applications, for example, may be revised and relaunched on a nearly annual basis, in order to take advantage of whatever technology is current at the time. In contrast, database development tends to be slow and steady, and a database application may remain current for many years without any need for updating from a technological point of view. As a result, it is easier to justify the greater up-front development cost associated with defensive programming. The benefits of reliability and bug resistance will typically

the habit of hitting Ctrl+Alt+Delete to reset their machine when a web browser hangs, or because some application fails to shut down correctly. However, the same tolerance that is shown to personal desktop software is not typically extended to corporate database applications. Recent highly publicized scandals in which bugs have been exploited in the systems of several governments and large organizations have further heightened the general public’s ultrasensitivity toward anything that might present a risk to database integrity.

• Any bugs that do exist in database applications can have more severe consequences than in other software. It can be argued that people are absolutely right to be more worried about database bugs than bugs in other software. An unexpected error in a desktop application may lead to a document or file becoming corrupt, which is a nuisance and might lead to unnecessary rework. But an unexpected error in a database may lead to important personal, confidential, or sensitive data being placed at risk, which can have rather more serious consequences. The nature of data typically stored in a database warrants a cautious, thorough approach to development, such as defensive programming provides.

Designing for Longevity

Consumer software applications have an increasingly short expected shelf life, with compressed release cycles pushing out one release barely before the predecessor has hit the shelves. However, this does not have to be the case. Well-designed, defensively programmed applications can continue to operate for many years. In one organization I worked for, a short-term tactical management information data store was created so that essential business reporting functions could continue while the organization’s systems went through an integration following a merger. Despite only being required for an immediate post-merger period, the (rather unfortunately named) Short Term Management Information database continued to be used for up to ten years, as it remained more reliable and robust than subsequent attempted replacements.

And let that be a lesson in choosing descriptive names for your databases that won’t age with time!

Best Practice SQL Programming Techniques

Having looked at some of the theory behind different software methodologies, and in particular the defensive approach to programming, you’re now probably wondering about how to put this into practice. As in any methodology, defensive programming is more concerned with the mindset with which you should approach development than prescribing a definitive set of rules to follow. As a result, this section will only provide examples that illustrate the overall concepts involved, and should not be treated as an exhaustive list. I’ll try to keep the actual examples as simple as possible in every case, so that you can concentrate on the reasons I consider these to be best practices, rather than the code itself.

Identify Hidden Assumptions in Your Code

One of the core tenets of defensive programming is to identify all of the assumptions that lie behind the proper functioning of your code. Once these assumptions have been identified, the function can either be adjusted to remove the dependency on them, or explicitly test each condition and make provisions should it not hold true. In some cases, “hidden” assumptions exist as a result of code failing to be sufficiently explicit.

To demonstrate this concept, consider the following code listing, which creates and populates a Customers and an Orders table:

CREATE TABLE Customers(
    CustID int,
    Name varchar(32),
    Address varchar(255));

INSERT INTO Customers(CustID, Name, Address) VALUES
    (1, 'Bob Smith', 'Flat 1, 27 Heigham Street'),
    (2, 'Tony James', '87 Long Road');

The query executes successfully and we get the results expected:

Bob Smith     Flat 1, 27 Heigham Street     1
Bob Smith     Flat 1, 27 Heigham Street     2
Tony James    87 Long Road                  3

But what is the hidden assumption? The column names listed in the SELECT query were not qualified with table names, so what would happen if the table structure were to change in the future? Suppose that an Address column were added to the Orders table to enable a separate delivery address to be attached to each order, rather than relying on the address in the Customers table:

ALTER TABLE Orders ADD Address varchar(255);
GO

The unqualified column name, Address, specified in the SELECT query, is now ambiguous, and if we attempt to run the original query again we receive an error:

Msg 209, Level 16, State 1, Line 1
Ambiguous column name 'Address'.

By not recognizing and correcting the hidden assumption contained in the original code, the query subsequently broke as a result of the additional column being added to the Orders table. The simple practice that could have prevented this error would have been to ensure that all column names were prefixed with the appropriate table name or alias:
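A minimal sketch of a fully qualified query is given here. The Orders definition and its OrderID column are assumptions made for illustration, as are the temporary table names, which are used only so that the example runs on its own without touching the tables created earlier.

```sql
-- Assumed schema: #Orders and its OrderID column are illustrative guesses.
CREATE TABLE #Customers (CustID int, Name varchar(32), Address varchar(255));
CREATE TABLE #Orders (OrderID int, CustID int, Address varchar(255));

INSERT INTO #Customers (CustID, Name, Address) VALUES
    (1, 'Bob Smith', 'Flat 1, 27 Heigham Street'),
    (2, 'Tony James', '87 Long Road');

INSERT INTO #Orders (OrderID, CustID, Address) VALUES
    (1, 1, NULL),
    (2, 1, NULL),
    (3, 2, NULL);

-- Every column carries its table alias, so the Address column on the
-- orders table cannot make the reference to the customer address ambiguous.
SELECT c.Name, c.Address, o.OrderID
FROM #Customers AS c
JOIN #Orders AS o ON c.CustID = o.CustID;

DROP TABLE #Orders;
DROP TABLE #Customers;
```

Because both tables here already contain an Address column, the same query written with unqualified column names would fail with the ambiguity error shown earlier.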

Suppose that you had a table, MainData, containing some simple values, as shown in the following code listing:

CREATE TABLE MainData(
    ID int,
    Value char(3));
GO

INSERT INTO MainData(ID, Value) VALUES
    (1, 'abc'), (2, 'def'), (3, 'ghi'), (4, 'jkl');
GO

Now suppose that every change made to the MainData table was to be recorded in an associated ChangeLog table. The following code demonstrates this structure, together with a mechanism to automatically populate the ChangeLog table by means of an UPDATE trigger attached to the MainData table:

CREATE TABLE ChangeLog(
    ChangeID int IDENTITY(1,1),

DECLARE @ID int;
SELECT @ID = ID FROM INSERTED;

DECLARE @OldValue varchar(32);
SELECT @OldValue = Value FROM DELETED;

DECLARE @NewValue varchar(32);
SELECT @NewValue = Value FROM INSERTED;

INSERT INTO ChangeLog(RowID, OldValue, NewValue, ChangeDate)
VALUES(@ID, @OldValue, @NewValue, GetDate());
GO

We can test the trigger by running a simple UPDATE query against the MainData table:

UPDATE MainData SET Value = 'aaa' WHERE ID = 1;
GO

The query appears to be functioning correctly—SQL Server Management Studio reports the following:

(1 row(s) affected)

And, as expected, we find that one row has been updated in the MainData table:

and an associated row has been created in the ChangeLog table:

ChangeID  RowID  OldValue  NewValue  ChangeDate
1         1      abc       aaa       2009-06-15 14:11:09.770

However, once again, there is a hidden assumption in the code. Within the trigger logic, the variables @ID, @OldValue, and @NewValue are assigned values that will be inserted into the ChangeLog table. Clearly, each of these scalar variables can only be assigned a single value, so what would happen if you were to attempt to update two or more rows in a single statement?

UPDATE MainData SET Value = 'zzz' WHERE ID IN (2,3,4);

The result in this case is that all three rows affected by the UPDATE statement have been changed in the MainData table:

but only the first update has been logged:

ChangeID  RowID  OldValue  NewValue  ChangeDate
1         1      abc       aaa       2009-06-15 14:11:09.770
2         2      def       zzz       2009-06-15 15:18:11.007

The failure to foresee the possibility of multiple rows being updated in a single statement led to a silent failure on this occasion, which is much more dangerous than the overt error given in the previous example. Had this scenario been actively considered, it would have been easy to recode the trigger to deal with such an event by making a subtle alteration to its syntax, as shown here:

ALTER TRIGGER DataUpdate ON MainData
FOR UPDATE
AS
    INSERT INTO ChangeLog(RowID, OldValue, NewValue, ChangeDate)
    SELECT i.ID, d.Value, i.Value, GetDate()
    FROM INSERTED i JOIN DELETED d ON i.ID = d.ID;
GO
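To try the set-based trigger end to end, something like the following self-contained sketch could be run in a scratch database. The object names carry a Demo suffix and are hypothetical, chosen to avoid clashing with the tables above, and the column types of the log table are assumptions, since only its column names appear in the text. The join on ID also assumes that ID uniquely identifies a row and is not itself changed by the UPDATE.

```sql
-- Hypothetical demo objects; the log table's column types are assumed.
CREATE TABLE MainDataDemo (ID int, Value char(3));
CREATE TABLE ChangeLogDemo (
    ChangeID   int IDENTITY(1,1),
    RowID      int,
    OldValue   char(3),
    NewValue   char(3),
    ChangeDate datetime
);
GO

CREATE TRIGGER DataUpdateDemo ON MainDataDemo
FOR UPDATE
AS
    -- Writes one log row per updated row, however many rows the
    -- triggering statement touched.
    INSERT INTO ChangeLogDemo (RowID, OldValue, NewValue, ChangeDate)
    SELECT i.ID, d.Value, i.Value, GETDATE()
    FROM INSERTED AS i
    JOIN DELETED AS d ON i.ID = d.ID;
GO

INSERT INTO MainDataDemo (ID, Value) VALUES
    (1, 'abc'), (2, 'def'), (3, 'ghi'), (4, 'jkl');

-- A single statement that updates three rows.
UPDATE MainDataDemo SET Value = 'zzz' WHERE ID IN (2, 3, 4);

-- The log now holds three rows, one each for IDs 2, 3, and 4.
SELECT RowID, OldValue, NewValue
FROM ChangeLogDemo
ORDER BY RowID;
GO
```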

Don’t Take Shortcuts

It is human nature to want to take shortcuts if we believe that they will allow us to avoid work that we feel is unnecessary. In programming terms, there are often shortcuts that provide a convenient, concise way of achieving a given task in fewer lines of code than other, more standard methods. However, these shortcut methods can come with associated risks. Most commonly, shortcut methods require less code

change between different versions of SQL Server. Taking shortcuts therefore reduces the portability of code, and introduces assumptions that can break in the future.

To demonstrate, consider what happens when you CAST a value to a varchar datatype without explicitly declaring the appropriate data length:

SELECT CAST ('This example seems to work ok' AS varchar);
GO

The query appears to work correctly, and results in the following output:

This example seems to work ok

It seems to be a common misunderstanding among some developers that omitting the length for the varchar type as the target of a CAST operation results in SQL Server dynamically assigning a length sufficient to accommodate all of the characters of the input. However, this is not the case, as demonstrated in the following code listing:

SELECT CAST ('This demonstrates the problem of relying on default datatype length' AS varchar);
GO

This demonstrates the problem

When CASTing to a character datatype without an explicitly specified length, SQL Server defaults to a length of 30 characters. In the second example, the input string is silently truncated to 30 characters, even though there is no obvious indication in the code to this effect. If this were the intention, it would have been much clearer to explicitly state varchar(30) to draw attention to the fact that this was a planned truncation, rather than simply omitting the data length.
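The difference can be made visible by comparing lengths directly, as in the following sketch. The variable name and the explicit length of 100 are arbitrary choices for illustration. DATALENGTH is used rather than LEN because LEN ignores trailing spaces, and the truncated string happens to end in one.

```sql
DECLARE @Input varchar(100) =
    'This demonstrates the problem of relying on default datatype length';

SELECT
    DATALENGTH(@Input)                       AS OriginalBytes,
    -- No length given: the CAST default of 30 characters applies.
    DATALENGTH(CAST(@Input AS varchar))      AS DefaultCastBytes,
    -- Explicit length large enough for the input: nothing is lost.
    DATALENGTH(CAST(@Input AS varchar(100))) AS ExplicitCastBytes;
```

Stating the length makes the intent reviewable: a reader can see at a glance whether truncation is possible, and a deliberate truncation is documented by the code itself.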

Another example of a shortcut sometimes made is to rely on implicit CASTs between datatypes. Consider the following code listing:

Now let’s suppose that management makes a decision to change the calculation used to determine @Rate, and increases the scale factor from 1.9 to 2. The obvious (but incorrect) solution would be to amend the code as follows:

Rather than increasing the rate as intended, the change has actually negated the effect of applying any rate to the supplied value of 1000. The problem now is that the sum used to determine @Rate is a purely integer calculation, 2 * 5 / 9. In integer mathematics, this equates to 1. In the previous example, the hard-coded value of 1.9 caused an implicit cast of both @x and @y parameters to the decimal type, so the sum was calculated with decimal precision.

This example may seem trivial when considered in isolation, but can be a source of unexpected behavior and unnecessary bug-chasing when nested deep in the belly of some complex code. To avoid these complications, it is always best to explicitly state the type and precision of any parameters used in a calculation, and avoid implicit CASTs between them.
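The effect can be reproduced with a short sketch. The declarations here are assumptions inferred from the calculation quoted above (integer inputs of 5 and 9), and the @Factor variable is a hypothetical name introduced to carry the scale factor with an explicit type.

```sql
-- Assumed inputs: the text implies integer values of 5 and 9.
DECLARE @x int = 5, @y int = 9;

-- Hypothetical variable: the scale factor with its type stated explicitly.
DECLARE @Factor decimal(18,2) = 2;

SELECT
    -- All operands are integers, so 10 / 9 is evaluated as integer division.
    2 * @x / @y AS IntegerRate,
    -- Every operand is explicitly decimal, so the fraction is preserved.
    @Factor * CAST(@x AS decimal(18,2)) / CAST(@y AS decimal(18,2)) AS DecimalRate;
```

With the types written out, changing the scale factor from 1.9 to 2 no longer changes the kind of arithmetic being performed.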

Another problem with using shortcuts is that they can obscure what the developer intended the purpose of the code to be. If we cannot tell what a line of code is meant to do, it is incredibly hard to test whether it is achieving its purpose or not. Consider the following code listing:

DECLARE @Date datetime = '03/05/1979';
SELECT @Date + 365;

At first sight, this seems fairly innocuous: take a specific date and add 365. But there are actually several shortcuts used here that add ambiguity as to what the intended purpose of this code is:

• The first shortcut is in the implicit CAST from the string value '03/05/1979' to a datetime. As I’m sure you know, there are numerous ways of presenting date formats around the world, and 03/05/1979 is ambiguous. In the United Kingdom it means the 3rd of May, but to American readers it means the 5th of March. The result of the implicit cast will depend upon the language and DATEFORMAT settings in effect where the code is run.

• Even if the dd/mm/yyyy or mm/dd/yyyy ordering is resolved, there is still ambiguity regarding the input value. The datatype chosen is datetime, which stores both a date and time component, but the value assigned to @Date does not specify a
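One way to remove the ambiguity in the date literal is to use the unseparated yyyymmdd form, which SQL Server interprets the same way for datetime regardless of language or DATEFORMAT settings. The sketch below assumes that the intended date is 3 May 1979, and uses DATEADD so that the unit being added is stated explicitly. Both choices are illustrative assumptions rather than the author's own listing.

```sql
-- Assumption: the intended date is 3 May 1979.
-- The unseparated yyyymmdd literal does not depend on regional settings.
DECLARE @Date datetime = '19790503';

SELECT
    -- Adds exactly 365 days, which is what @Date + 365 does implicitly.
    DATEADD(day, 365, @Date) AS Plus365Days,
    -- Adds one calendar year; this differs from 365 days here because
    -- 29 February 1980 falls inside the interval.
    DATEADD(year, 1, @Date)  AS PlusOneYear;
```

Naming the unit forces the author to decide whether "a year later" or "365 days later" was meant, which in turn gives a tester something definite to verify.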
