Consider this actual problem, which appeared on CompuServe’s ORACLE forum some years ago. A pharmaceutical company has an inventory table and a price changes table that look like this:
CREATE TABLE Drugs
(drug_nbr INTEGER NOT NULL PRIMARY KEY,
 drug_name CHAR(30) NOT NULL,
 drug_qty INTEGER NOT NULL
   CONSTRAINT positive_quantity
   CHECK(drug_qty >= 0));
CREATE TABLE Prices
(drug_nbr INTEGER NOT NULL,
 start_date DATE NOT NULL,
 end_date DATE NOT NULL
   CONSTRAINT started_before_endded
   CHECK(start_date <= end_date),
 price DECIMAL(8,2) NOT NULL,
 PRIMARY KEY (drug_nbr, start_date));
Every order has to use the order date to find what the selling price was when the order was placed. The current price will have an end_date value of “eternity” (a dummy date set so high that it will not be reached, such as ‘9999-12-31’). The (end_date + INTERVAL '1' DAY) of one price will be equal to the start_date of the next price for the same drug. While this is normalized, performance was bad. Every report, invoice, or query will have a JOIN between Drugs and Prices. The trick might be to add more columns to the Drugs table, like this:
CREATE TABLE Drugs
(drug_nbr INTEGER PRIMARY KEY,
 drug_name CHAR(30) NOT NULL,
 drug_qty INTEGER NOT NULL
   CONSTRAINT positive_quantity
   CHECK(drug_qty >= 0),
 current_start_date DATE NOT NULL,
 current_end_date DATE NOT NULL,
 CONSTRAINT current_start_before_endded
   CHECK(current_start_date <= current_end_date),
 current_price DECIMAL(8,2) NOT NULL,
 prior_start_date DATE NOT NULL,
 prior_end_date DATE NOT NULL,
 CONSTRAINT prior_start_before_endded
   CHECK(prior_start_date <= prior_end_date
         AND current_start_date = prior_end_date + INTERVAL '1' DAY),
 prior_price DECIMAL(8,2) NOT NULL);
This covered more than 95% of the orders in the actual company, because very few orders have more than two price changes before they are taken out of stock. The odd exception was trapped by a procedural routine.
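The two lookups can be compared side by side. What follows is only a minimal sketch, run through Python's built-in sqlite3 module so that it can be tried anywhere. SQLite has no DATE type, so ISO-8601 text stands in for the dates. The DenormDrugs table name, the sample rows, and both function names are assumptions made for illustration; they are not part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Drugs
(drug_nbr INTEGER NOT NULL PRIMARY KEY,
 drug_name CHAR(30) NOT NULL,
 drug_qty INTEGER NOT NULL CHECK (drug_qty >= 0));

CREATE TABLE Prices
(drug_nbr INTEGER NOT NULL,
 start_date TEXT NOT NULL,   -- ISO-8601 text stands in for DATE
 end_date TEXT NOT NULL CHECK (start_date <= end_date),
 price DECIMAL(8,2) NOT NULL,
 PRIMARY KEY (drug_nbr, start_date));

-- Hypothetical denormalized version: current and prior price in one row.
CREATE TABLE DenormDrugs
(drug_nbr INTEGER NOT NULL PRIMARY KEY,
 drug_name CHAR(30) NOT NULL,
 current_start_date TEXT NOT NULL,
 current_end_date TEXT NOT NULL,
 current_price DECIMAL(8,2) NOT NULL,
 prior_start_date TEXT NOT NULL,
 prior_end_date TEXT NOT NULL,
 prior_price DECIMAL(8,2) NOT NULL);

-- Made-up sample data: one drug with two consecutive price periods.
INSERT INTO Drugs VALUES (1, 'Aspirin', 100);
INSERT INTO Prices VALUES (1, '2001-01-01', '2001-06-30', 10.00);
INSERT INTO Prices VALUES (1, '2001-07-01', '9999-12-31', 12.50);
INSERT INTO DenormDrugs
VALUES (1, 'Aspirin', '2001-07-01', '9999-12-31', 12.50,
                      '2001-01-01', '2001-06-30', 10.00);
""")

def price_normalized(drug_nbr, order_date):
    """Normalized lookup: JOIN Drugs to Prices on the date range."""
    row = conn.execute(
        """SELECT D.drug_name, P.price
             FROM Drugs AS D
                  JOIN Prices AS P ON P.drug_nbr = D.drug_nbr
            WHERE D.drug_nbr = ?
              AND ? BETWEEN P.start_date AND P.end_date""",
        (drug_nbr, order_date)).fetchone()
    return row[1] if row else None

def price_denormalized(drug_nbr, order_date):
    """Denormalized lookup: no JOIN, but only two price periods are visible."""
    row = conn.execute(
        """SELECT CASE
                  WHEN :d BETWEEN current_start_date AND current_end_date
                  THEN current_price
                  WHEN :d BETWEEN prior_start_date AND prior_end_date
                  THEN prior_price
                  END   -- NULL when the order predates both periods
             FROM DenormDrugs
            WHERE drug_nbr = :n""",
        {"d": order_date, "n": drug_nbr}).fetchone()
    return row[0] if row else None

print(price_normalized(1, "2001-03-15"), price_denormalized(1, "2001-03-15"))
```

An order dated before the prior period comes back as NULL from the denormalized lookup. That is the "odd exception" that would have to be handed to a procedural routine.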
The other method is to add CHECK() constraints that will enforce the rules destroyed by denormalization. We will discuss this later, but the overhead for insertion, updating, and deleting to the table is huge. In fact, in many cases denormalized tables cannot be changed until a complete set of columns is built outside the table. Furthermore, while one set of queries is improved, all others are damaged.
Today, however, only data warehouses should be denormalized. JOINs are far cheaper than they were, and the overhead of handling exceptions with procedural code is far greater than any extra database overhead.
2.11.5 Row Sorting
On May 27, 2001, Fred Block posted a problem on the SQL Server Newsgroup. I will change the problem slightly, but the idea was that he had a table with five character string columns that had to be sorted alphabetically within each row. This “flatten table” denormalization is a very common one that might involve months of the year as columns, or other things that are acting as repeating groups in violation of 1NF. Let’s declare the table and dive into the problem:
CREATE TABLE Foobar
(key_col INTEGER NOT NULL PRIMARY KEY,
c1 VARCHAR(20) NOT NULL,
c2 VARCHAR(20) NOT NULL,
c3 VARCHAR(20) NOT NULL,
c4 VARCHAR(20) NOT NULL,
c5 VARCHAR(20) NOT NULL);
This means that we want this condition to hold:
CHECK ((c1 <= c2) AND (c2 <= c3) AND (c3 <= c4) AND (c4 <= c5))
Obviously, if he had added this constraint to the table in the first place, we would be fine. Of course, that would have pushed the problem to the front end, and I would not have a topic for this section.
What was interesting was how everyone who read this newsgroup posting immediately envisioned a stored procedure that would take the five values, sort them, and return them to their original row in the table. The only way to make this approach work for the whole table was to write an update cursor and loop through all the rows of the table. Itzik Ben-Gan posted a simple procedure that loaded the values into a temporary table, then pulled them out in sorted order, starting with the minimum value, using a loop.
Another trick is the Bose-Nelson sort, which I wrote about in Dr. Dobb’s Journal back in 1985 (“Bose-Nelson Sort,” Dr. Dobb’s Journal, September 1985, pp. 282-296). This sort is a recursive procedure that takes an integer and then generates swap pairs for a vector of that size. A swap pair is a pair of position numbers from 1 to (n) in the vector that need to be exchanged if they are out of order. These swap pairs are also related to sorting networks in the literature (see Donald Knuth, The Art of Computer Programming, Volume 3: Sorting and Searching, 2nd Edition, April 24, 1998, ISBN: 0-201-89685-0).
You are probably thinking that this method is a bit weak, because the results are only good for sorting a fixed number of items. But a table only has a fixed number of columns, so that is not a problem in denormalized SQL.
You can set up a sorting network that will sort five items, with the minimal number of exchanges, nine swaps, like this:
Swap(c1, c2);
Swap(c4, c5);
Swap(c3, c5);
Swap(c3, c4);
Swap(c1, c4);
Swap(c1, c3);
Swap(c2, c5);
Swap(c2, c4);
Swap(c2, c3);
You might want to deal yourself a hand of five playing cards in one suit to see how it works. Put the cards face down on the table and pick up the pairs, swapping them if required, then turn over the row to see that it is in sorted order when you are done.
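If you do not have a deck of cards handy, the same experiment can be run mechanically. The following Python sketch applies the nine swap pairs listed above to every ordering of five distinct values and checks that each one comes out sorted. The names SWAPS_5, apply_network, and sorts_everything are made up for this illustration.

```python
from itertools import permutations

# The nine swap pairs listed above, as 1-based positions.
SWAPS_5 = [(1, 2), (4, 5), (3, 5), (3, 4), (1, 4),
           (1, 3), (2, 5), (2, 4), (2, 3)]

def apply_network(values, swaps=SWAPS_5):
    """Run the compare-and-exchange steps in order and return a new list."""
    v = list(values)
    for lo, hi in swaps:
        if v[lo - 1] > v[hi - 1]:          # exchange only when out of order
            v[lo - 1], v[hi - 1] = v[hi - 1], v[lo - 1]
    return v

def sorts_everything(swaps=SWAPS_5, n=5):
    """True if the network sorts all n! orderings of n distinct values."""
    return all(apply_network(p, swaps) == sorted(p)
               for p in permutations(range(n)))

print(apply_network([4, 1, 5, 3, 2]))   # [1, 2, 3, 4, 5]
print(sorts_everything())               # True
```

Note that the sequence of comparisons never depends on the data. That is what makes it possible to turn each step into a set-oriented statement later.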
In theory, the minimum number of swaps needed to sort (n) items is CEILING(log2(n!)), and as (n) increases, this approaches O(n*log2(n)). Computer science majors will remember this “Big O” expression as the expected performance of the best sorting algorithms, such as Quicksort. The Bose-Nelson method is very good for small values of (n). If (n < 9), then it is perfect, actually. But as things get bigger, Bose-Nelson approaches O(n ^ 1.585). In English, this method is good for a fixed-size list of 16 or fewer items, but it goes to Hell after that.
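The CEILING(log2(n!)) figure is easy to tabulate. The sketch below does so in Python with integer arithmetic; the function name comparison_lower_bound is an assumption made for this example. It is worth keeping in mind that this bound applies to sorts that are free to choose each comparison based on earlier results. A fixed network, whose swap pairs are decided in advance, can need more steps: the five-item network above uses nine, while the bound for five items is seven.

```python
from math import factorial

def comparison_lower_bound(n):
    """CEILING(log2(n!)), computed with integers to avoid rounding error."""
    # For m >= 1, (m - 1).bit_length() equals CEILING(log2(m)).
    return (factorial(n) - 1).bit_length()

for n in range(2, 9):
    print(n, comparison_lower_bound(n))
```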
You can write a version of the Bose-Nelson procedure that will output the SQL code for a given value of (n). The obvious direct way to do a Swap() is to write a chain of UPDATE statements. Remember that in SQL, the SET clause assignments happen in parallel, so you can easily write a SET clause that exchanges the two items when they are out of order. Using the above swap chain, we get this block of code:
BEGIN ATOMIC
-- Swap(c1, c2);
UPDATE Foobar
   SET c1 = c2, c2 = c1
 WHERE c1 > c2;
-- Swap(c4, c5);
UPDATE Foobar
   SET c4 = c5, c5 = c4
 WHERE c4 > c5;
-- Swap(c3, c5);
UPDATE Foobar
   SET c3 = c5, c5 = c3
 WHERE c3 > c5;
-- Swap(c3, c4);
UPDATE Foobar
   SET c3 = c4, c4 = c3
 WHERE c3 > c4;
-- Swap(c1, c4);
UPDATE Foobar
   SET c1 = c4, c4 = c1
 WHERE c1 > c4;
-- Swap(c1, c3);
UPDATE Foobar
   SET c1 = c3, c3 = c1
 WHERE c1 > c3;
-- Swap(c2, c5);
UPDATE Foobar
   SET c2 = c5, c5 = c2
 WHERE c2 > c5;
-- Swap(c2, c4);
UPDATE Foobar
   SET c2 = c4, c4 = c2
 WHERE c2 > c4;
-- Swap(c2, c3);
UPDATE Foobar
   SET c2 = c3, c3 = c2
 WHERE c2 > c3;
END;
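To see the chain work, here is a small sketch that generates the nine UPDATE statements from the swap pairs and runs them against an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in: it has no BEGIN ATOMIC block, so a single transaction plays that role here. The helper names swap_update and sort_rows, and the sample fruit names, are assumptions for this example.

```python
import sqlite3

# Swap pairs from the chain above, as 1-based column numbers.
SWAPS_5 = [(1, 2), (4, 5), (3, 5), (3, 4), (1, 4),
           (1, 3), (2, 5), (2, 4), (2, 3)]

def swap_update(a, b):
    """Build the UPDATE that exchanges columns c<a> and c<b> when out of order."""
    return (f"UPDATE Foobar SET c{a} = c{b}, c{b} = c{a} "
            f"WHERE c{a} > c{b};")

def sort_rows(rows):
    """Load rows of five strings into Foobar, run the chain, return the rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE Foobar
        (key_col INTEGER NOT NULL PRIMARY KEY,
         c1 VARCHAR(20) NOT NULL,
         c2 VARCHAR(20) NOT NULL,
         c3 VARCHAR(20) NOT NULL,
         c4 VARCHAR(20) NOT NULL,
         c5 VARCHAR(20) NOT NULL)""")
    conn.executemany("INSERT INTO Foobar VALUES (?, ?, ?, ?, ?, ?)",
                     [(key, *row) for key, row in enumerate(rows, start=1)])
    with conn:  # one transaction, standing in for BEGIN ATOMIC ... END
        for a, b in SWAPS_5:
            conn.execute(swap_update(a, b))
    return conn.execute(
        "SELECT c1, c2, c3, c4, c5 FROM Foobar ORDER BY key_col").fetchall()

print(swap_update(1, 2))
print(sort_rows([("pear", "fig", "plum", "apple", "kiwi")]))
```

Each UPDATE touches every row that needs that particular exchange, so the whole table is sorted row by row in nine statements, with no cursor and no loop over rows.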
This is fully portable, Standard SQL code, and it can be machine-generated. But that parallelism is useful. It is worthwhile to combine some of the UPDATE statements, but you have to be careful not to change the effective sequence of the swap operations.
If you look at the first two UPDATE statements, you can see that they do not overlap. This means you could roll them into one statement like this:
-- Swap(c1, c2) AND Swap(c4, c5);
UPDATE Foobar
   SET c1 = CASE WHEN c1 <= c2 THEN c1 ELSE c2 END,
       c2 = CASE WHEN c1 <= c2 THEN c2 ELSE c1 END,
       c4 = CASE WHEN c4 <= c5 THEN c4 ELSE c5 END,
       c5 = CASE WHEN c4 <= c5 THEN c5 ELSE c4 END
 WHERE c4 > c5 OR c1 > c2;
The advantage of doing this is that you have to execute only one UPDATE statement, not two. Updating a table, even on nonkey columns, usually locks the table and prevents other users from getting to the data. If you could roll the statements into one single UPDATE, you would have the best of all possible worlds, but I doubt that the code would be easy to read.
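Before trusting a hand-merged statement, it is cheap to check that it behaves like the two statements it replaces. The sketch below runs both versions in SQLite through Python's sqlite3 module, over every arrangement of two values in c1, c2, c4, and c5, and compares the results. The names SEPARATE, COMBINED, ROWS, and run are made up for this check, and it assumes the product evaluates all SET expressions against the old row, as Standard SQL requires.

```python
import sqlite3

SEPARATE = [
    "UPDATE Foobar SET c1 = c2, c2 = c1 WHERE c1 > c2",
    "UPDATE Foobar SET c4 = c5, c5 = c4 WHERE c4 > c5",
]
COMBINED = ["""
    UPDATE Foobar
       SET c1 = CASE WHEN c1 <= c2 THEN c1 ELSE c2 END,
           c2 = CASE WHEN c1 <= c2 THEN c2 ELSE c1 END,
           c4 = CASE WHEN c4 <= c5 THEN c4 ELSE c5 END,
           c5 = CASE WHEN c4 <= c5 THEN c5 ELSE c4 END
     WHERE c4 > c5 OR c1 > c2"""]

# Every arrangement of 'a'/'b' in c1, c2, c4, c5, with c3 held fixed.
ROWS = [(p, q, "m", r, s)
        for p in "ab" for q in "ab" for r in "ab" for s in "ab"]

def run(statements, rows):
    """Load the rows into a fresh Foobar, run the statements, return the rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE Foobar
        (key_col INTEGER NOT NULL PRIMARY KEY,
         c1 VARCHAR(20) NOT NULL, c2 VARCHAR(20) NOT NULL,
         c3 VARCHAR(20) NOT NULL, c4 VARCHAR(20) NOT NULL,
         c5 VARCHAR(20) NOT NULL)""")
    conn.executemany("INSERT INTO Foobar VALUES (?, ?, ?, ?, ?, ?)",
                     [(key, *row) for key, row in enumerate(rows, start=1)])
    for stmt in statements:
        conn.execute(stmt)
    return conn.execute(
        "SELECT c1, c2, c3, c4, c5 FROM Foobar ORDER BY key_col").fetchall()

print(run(COMBINED, ROWS) == run(SEPARATE, ROWS))
```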
We can see this same pattern in the pair of statements:
Swap(c1, c3);
Swap(c2, c5);
But there are other patterns, so you can write general templates for them Consider this one:
Swap(x, y);
Swap(x, z);
Write out all possible triplets and apply these two operations on them, thus:
(x, y, z) => (x, y, z)
(x, z, y) => (x, z, y)
(y, x, z) => (x, y, z)
(y, z, x) => (x, z, y)
(z, x, y) => (x, z, y)
(z, y, x) => (x, z, y)
The result of this pattern is that x is the lowest of the three values, and y and z are left either in sorted order or out of order relative to each other. Properly sorting them would have the advantage of saving exchanges later and also of reducing the size of the subset being operated upon by each UPDATE statement. With a little thought, we can write the following symmetric piece of code:
-- Swap(x, y) AND Swap(x, z);
UPDATE Foobar
   SET x = CASE WHEN x BETWEEN y AND z THEN y
                WHEN z BETWEEN y AND x THEN y
                WHEN y BETWEEN z AND x THEN z
                WHEN x BETWEEN z AND y THEN z
                ELSE x END,
       y = CASE WHEN x BETWEEN y AND z THEN x
                WHEN x BETWEEN z AND y THEN x
                WHEN z BETWEEN x AND y THEN z
                WHEN z BETWEEN y AND x THEN z
                ELSE y END,
       z = CASE WHEN x BETWEEN z AND y THEN y
                WHEN z BETWEEN x AND y THEN y
                WHEN y BETWEEN z AND x THEN x
                WHEN z BETWEEN y AND x THEN x
                ELSE z END
 WHERE x > z OR x > y;
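A template like this is easy to get subtly wrong, so it pays to test it exhaustively on a small domain. The sketch below runs the three-column UPDATE in SQLite, via Python's sqlite3 module, over all 27 triples drawn from the values 1, 2, and 3, ties included. The table name Triples and the function name three_way are assumptions for this example; the columns are literally named x, y, and z.

```python
import sqlite3

THREE_WAY = """
UPDATE Triples
   SET x = CASE WHEN x BETWEEN y AND z THEN y
                WHEN z BETWEEN y AND x THEN y
                WHEN y BETWEEN z AND x THEN z
                WHEN x BETWEEN z AND y THEN z
                ELSE x END,
       y = CASE WHEN x BETWEEN y AND z THEN x
                WHEN x BETWEEN z AND y THEN x
                WHEN z BETWEEN x AND y THEN z
                WHEN z BETWEEN y AND x THEN z
                ELSE y END,
       z = CASE WHEN x BETWEEN z AND y THEN y
                WHEN z BETWEEN x AND y THEN y
                WHEN y BETWEEN z AND x THEN x
                WHEN z BETWEEN y AND x THEN x
                ELSE z END
 WHERE x > z OR x > y
"""

def three_way(triples):
    """Apply the combined UPDATE to each (x, y, z) triple; return the results."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE Triples
        (key_col INTEGER NOT NULL PRIMARY KEY,
         x INTEGER NOT NULL, y INTEGER NOT NULL, z INTEGER NOT NULL)""")
    conn.executemany("INSERT INTO Triples VALUES (?, ?, ?, ?)",
                     [(key, *t) for key, t in enumerate(triples, start=1)])
    conn.execute(THREE_WAY)
    return conn.execute(
        "SELECT x, y, z FROM Triples ORDER BY key_col").fetchall()

ALL_TRIPLES = [(a, b, c) for a in (1, 2, 3) for b in (1, 2, 3) for c in (1, 2, 3)]
for before, after in zip(ALL_TRIPLES, three_way(ALL_TRIPLES)):
    print(before, "=>", after)
```

Rows picked up by the WHERE clause come out fully sorted. Rows where x is already the lowest value are left alone, even if y and z are out of order.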
While it is very tempting to write more and more of these pattern templates, it might be more trouble than it is worth, because of the increased maintenance burden and reduced readability.
Here is an SQL/PSM program for the Bose-Nelson sort, based on the version given in Frederick Hegeman’s “Sorting Networks” article for The C/C++ User’s Journal (Hegeman 1993). It assumes that you have a procedure called PRINT() for output to a text file. You can translate it into the programming language of your choice easily, as long as it supports recursion.
CREATE PROCEDURE BoseSort (IN i INTEGER, IN j INTEGER)
LANGUAGE SQL
DETERMINISTIC
BEGIN
DECLARE m INTEGER;
-- sort positions i through j: split, sort each half, merge
IF j > i
THEN SET m = i + (j-i+1)/2 - 1;
     CALL BoseSort(i, m);
     CALL BoseSort(m+1, j);
     CALL BoseMerge(i, m, m+1, j);
END IF;
END;
CREATE PROCEDURE BoseMerge
(IN i1 INTEGER, IN i2 INTEGER, IN j1 INTEGER, IN j2 INTEGER)
LANGUAGE SQL
DETERMINISTIC
BEGIN
DECLARE i_mid INTEGER;
DECLARE j_mid INTEGER;
-- merge the sorted runs i1..i2 and j1..j2
IF i2 = i1 AND j2 = j1
THEN CALL PRINT('swap', i1, j1);
ELSEIF i2 = i1+1 AND j2 = j1
THEN CALL PRINT('swap', i1, j1);
     CALL PRINT('swap', i2, j1);
ELSEIF i2 = i1 AND j2 = j1+1
THEN CALL PRINT('swap', i1, j2);
     CALL PRINT('swap', i1, j1);
ELSE SET i_mid = i1 + (i2-i1+1)/2 - 1;
     -- where the second run is split depends on the parity of the first run's length
     IF MOD(i2-i1+1, 2) = 1
     THEN SET j_mid = j1 + (j2-j1+1)/2 - 1;
     ELSE SET j_mid = j1 + (j2-j1+2)/2 - 1;
     END IF;
     CALL BoseMerge(i1, i_mid, j1, j_mid);
     CALL BoseMerge(i_mid+1, i2, j_mid+1, j2);
     CALL BoseMerge(i_mid+1, i2, j1, j_mid);
END IF;
END;
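As a translation into another language, here is a Python sketch of the same recursion. It is written in the commonly circulated form that passes a run's starting position and length rather than inclusive bounds, and it collects the swap pairs in a list instead of calling PRINT(). The function names bose_nelson, sort_run, and merge are assumptions for this example, not names from the Hegeman article.

```python
def bose_nelson(n):
    """Return the Bose-Nelson swap pairs (1-based positions) for n items."""
    pairs = []

    def sort_run(i, m):
        # Sort the run of length m that starts at position i.
        if m > 1:
            a = m // 2
            sort_run(i, a)
            sort_run(i + a, m - a)
            merge(i, a, i + a, m - a)

    def merge(i, x, j, y):
        # Merge the sorted run (start i, length x) with (start j, length y).
        if x == 1 and y == 1:
            pairs.append((i, j))
        elif x == 1 and y == 2:
            pairs.append((i, j + 1))
            pairs.append((i, j))
        elif x == 2 and y == 1:
            pairs.append((i, j))
            pairs.append((i + 1, j))
        else:
            a = x // 2
            # The split of the second run depends on the parity of x.
            b = y // 2 if x % 2 else (y + 1) // 2
            merge(i, a, j, b)
            merge(i + a, x - a, j + b, y - b)
            merge(i + a, x - a, j, b)

    sort_run(1, n)
    return pairs

for i, j in bose_nelson(5):
    print(f"Swap(c{i}, c{j});")
```

For five items this prints the same nine swaps, in the same order, as the chain used earlier in this section.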
CHAPTER 3
Numeric Data in SQL
SQL is not a computational or procedural language; the arithmetic capability of SQL is weaker than that of any other language you have ever used. But there are some tricks that you need to know when working with numbers in SQL and when passing them to a host program. Much of the arithmetic and the functions are defined by implementations, so you should experiment with your particular product and make notes on the defaults, precision, and tools in the math library of your database.
You should also read Chapter 21, which deals with the related topic of aggregate functions. This chapter deals with the arithmetic that you would use across a row, instead of down a column; they are not quite the same.
3.1 Numeric Types
The SQL Standard has a wide range of numeric types. The idea is that any host language can find an SQL numeric type that matches one of its own.
You will also find some vendor extensions in the numeric data types, the most common of which is MONEY. This is really a DECIMAL or NUMERIC data type, which also accepts and displays currency symbols in input and output.