CHAPTER 10
Thinking in SQL
“It ain’t so much the things we don’t know that get us into trouble. It’s the
thing we know that just ain’t so.”
—Artemus Ward (Charles Farrar Browne),
American humorist (1834–1867)
THE BIGGEST HURDLE in learning SQL is thinking in sets and logic, instead of in sequences and processes. I just gave you a list of heuristics in the previous chapter, but let’s take a little time to analyze why mistakes were made. You now have some theory, but can you do diagnostics?
I tried to find common errors that new programmers make, but perhaps the most difficult thing to learn is thinking in sets. Consider the classic puzzle shown in Figure 10.1.
The usual mistake people make is trying to count the 1 × 1 × 2 bricks one at a time. This requires the ability to make a three-dimensional mental model of the boxes, which is really difficult for most of us.
The right approach is to look at the whole block as if it were completely filled in. It is 4 × 5 × 5 = 100 unit cubes, and since each 1 × 1 × 2 brick fills two unit cubes, a full block holds 50 bricks. The corner that is knocked off is 3 bricks, which we can count individually, so we must have 47 bricks in the block. The arrangement inside the block does not matter at all.
All of these examples are based on actual postings in a newsgroup that have been translated into SQL/PSM to remove proprietary features. In some cases, I have cleaned up the data element names, and in others I have left them. Obviously, I am guessing at the motivation for each example, but I think I can defend my reasoning.
10.1 Bad Programming in SQL and Procedural Languages
As an example of not learning any relational approaches to a problem, consider a posting in the comp.databases.ms-sqlserver newsgroup in January 2005. The title was “How to Find a Hole in Records,” which already tells you that the poster is thinking in terms of a file system and not an RDBMS.
The original table declaration had the usual newbie “id” column, without a key or any constraints. The table modeled a year’s worth of rows identified by a week-within-year number (1 to 53) and a day-of-the-week number (1 to 7). Thus, we started with a table that looked more or less like this, after the names were cleaned up:
CREATE TABLE WeeklyReport
(id INTEGER AUTONUMBER NOT NULL, -- not valid SQL!
 week_nbr INTEGER NOT NULL,
 day_nbr INTEGER NOT NULL);
Figure 10.1 Classic block puzzle.
By removing the useless, proprietary id column and adding constraints, we then had the following table:
CREATE TABLE WeeklyReport
(week_nbr INTEGER NOT NULL
   CHECK (week_nbr BETWEEN 1 AND 53),
 day_nbr INTEGER NOT NULL
   CHECK (day_nbr BETWEEN 1 AND 7),
 PRIMARY KEY (week_nbr, day_nbr));
Despite giving some constraints in the narrative specification, the poster never bothered to apply them to the table declaration. Newbies think of a table as a file, not as a set. The only criterion for putting data into a file is that it is written to that file. The file cannot validate anything. The proprietary auto-number acts as a replacement for the nonrelational record number of a sequential file system.
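To make the difference between a file and a constrained table concrete, here is a minimal sketch that loads the revised WeeklyReport declaration into an in-memory SQLite database and attempts a few inserts. SQLite and Python's standard sqlite3 module are stand-ins chosen only so the example can be run; the helper name try_insert is hypothetical, and other SQL products will report the same violations with different error codes.

```python
import sqlite3

# The revised declaration from the text, unchanged.
DDL = """
CREATE TABLE WeeklyReport
(week_nbr INTEGER NOT NULL
   CHECK (week_nbr BETWEEN 1 AND 53),
 day_nbr INTEGER NOT NULL
   CHECK (day_nbr BETWEEN 1 AND 7),
 PRIMARY KEY (week_nbr, day_nbr));
"""


def try_insert(conn, week_nbr, day_nbr):
    """Hypothetical helper: True if the row is accepted, False if a
    declared constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO WeeklyReport (week_nbr, day_nbr) VALUES (?, ?)",
            (week_nbr, day_nbr),
        )
        return True
    except sqlite3.IntegrityError:
        # SQLite raises IntegrityError for CHECK, NOT NULL and key violations.
        return False


conn = sqlite3.connect(":memory:")
conn.execute(DDL)

results = {
    "valid row": try_insert(conn, 1, 1),     # accepted
    "duplicate": try_insert(conn, 1, 1),     # rejected by PRIMARY KEY
    "day 8": try_insert(conn, 1, 8),         # rejected by CHECK on day_nbr
    "week 54": try_insert(conn, 54, 1),      # rejected by CHECK on week_nbr
    "null day": try_insert(conn, 1, None),   # rejected by NOT NULL
}
print(results)
```

Only the first insert succeeds. A file would have accepted all five records, which is the point of putting the rules in the declaration instead of in the narrative.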
The problem was to find the earliest missing day within each week for inserting a new row. If there were some other value or measurement for that date being recorded, it was not in the specifications. The poster’s own T-SQL solution translated into SQL/PSM like this, with some name changes:
CREATE FUNCTION InsertNewWeekDay (IN my_week_nbr INTEGER)
RETURNS INTEGER
LANGUAGE SQL
BEGIN
DECLARE my_day_nbr INTEGER;
DECLARE result_day_nbr INTEGER;
SET my_day_nbr = 1;
xx:
WHILE my_day_nbr < 8
DO IF NOT EXISTS
      (SELECT *
         FROM WeeklyReport
        WHERE day_nbr = my_day_nbr
          AND week_nbr = my_week_nbr)
   THEN BEGIN
        SET result_day_nbr = my_day_nbr;
        LEAVE xx; -- hidden GOTO out of the loop
        END;
   ELSE BEGIN
        SET my_day_nbr = my_day_nbr + 1;
        ITERATE xx; -- hidden GOTO back to the top of the loop
        END;
   END IF;
END WHILE;
RETURN result_day_nbr;
END;
This is a classic imitation of a FOR loop, or counting loop, used in all 3GL programming languages. However, if you look at it for two seconds, you will see that this is bad procedural programming! SQL will not make up for a lack of programming skills. In fact, the bad effects of mimicking 3GL languages in SQL are magnified. The optimizers and compilers in SQL engines are not designed to look for procedural code optimizations.
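For contrast with the counting loop, the same question can be phrased as a single query over sets: take the seven possible day numbers, discard those already present for the week, and keep the smallest survivor. The sketch below is one possible formulation, not the poster's or necessarily the best one. It runs against an in-memory SQLite database through Python's standard sqlite3 module purely so that it can be executed; the Days table expression, the first_missing_day helper, and the sample rows are all assumptions made for illustration.

```python
import sqlite3

# One set-oriented way to ask "what is the earliest missing day in this week?"
# Days is a derived table of the seven legal day numbers (an assumption of
# this sketch; a permanent auxiliary table would serve equally well).
FIRST_MISSING_DAY = """
WITH Days (day_nbr) AS (VALUES (1), (2), (3), (4), (5), (6), (7))
SELECT MIN(D.day_nbr)
  FROM Days AS D
 WHERE NOT EXISTS
       (SELECT *
          FROM WeeklyReport AS W
         WHERE W.week_nbr = ?
           AND W.day_nbr = D.day_nbr);
"""


def first_missing_day(conn, week_nbr):
    """Hypothetical helper: earliest day_nbr absent from the given week,
    or None (SQL NULL) when all seven days are present."""
    return conn.execute(FIRST_MISSING_DAY, (week_nbr,)).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE WeeklyReport
(week_nbr INTEGER NOT NULL CHECK (week_nbr BETWEEN 1 AND 53),
 day_nbr INTEGER NOT NULL CHECK (day_nbr BETWEEN 1 AND 7),
 PRIMARY KEY (week_nbr, day_nbr));
""")

# Made-up sample data: week 10 is missing days 3 and 5 through 7,
# week 11 is complete, and week 12 has no rows at all.
rows = [(10, 1), (10, 2), (10, 4)] + [(11, d) for d in range(1, 8)]
conn.executemany("INSERT INTO WeeklyReport VALUES (?, ?)", rows)

print(first_missing_day(conn, 10))  # 3
print(first_missing_day(conn, 11))  # None
print(first_missing_day(conn, 12))  # 1
```

There is no loop counter, no local variable, and no jump. MIN() over an empty set yields NULL, so a week with all seven days present falls out of the query without a special case.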
By removing the redundant local variables and getting rid of the hidden GOTO statements in favor of a simple, classic structured design, the poster should have written this:
CREATE FUNCTION InsertNewWeekDay (IN my_week_nbr INTEGER)
RETURNS INTEGER
LANGUAGE SQL
BEGIN
DECLARE answer_nbr INTEGER;
SET answer_nbr = 1;
WHILE answer_nbr < 8
DO IF NOT EXISTS
      (SELECT *
         FROM WeeklyReport
        WHERE day_nbr = answer_nbr
          AND week_nbr = my_week_nbr)
   THEN RETURN answer_nbr;
   ELSE SET answer_nbr = answer_nbr + 1;
   END IF;
END WHILE;
RETURN CAST (NULL AS INTEGER); -- cause an error
END;
This points out another weakness in this posting. We were not told how to handle a week that has all seven days represented. In the original table design, any integer value would have been accepted because of the lack of constraints. In the revised DDL, any weekday value not between 1 and 7 will cause a constraint violation, and the NULL returned here cannot be inserted into a primary-key column at all. This is not the best solution,