Joe Celko's SQL for Smarties - Advanced SQL Programming


Consider this actual problem, which appeared on CompuServe’s ORACLE forum some years ago. A pharmaceutical company has an inventory table and a price changes table that look like this:

CREATE TABLE Drugs
(drug_nbr INTEGER NOT NULL PRIMARY KEY,
 drug_name CHAR(30) NOT NULL,
 drug_qty INTEGER NOT NULL
   CONSTRAINT positive_quantity
   CHECK(drug_qty >= 0)
);

CREATE TABLE Prices
(drug_nbr INTEGER NOT NULL,
 start_date DATE NOT NULL,
 end_date DATE NOT NULL
   CONSTRAINT started_before_endded
   CHECK(start_date <= end_date),
 price DECIMAL(8,2) NOT NULL,
 PRIMARY KEY (drug_nbr, start_date));

Every order has to use the order date to find what the selling price was when the order was placed. The current price will have an end_date of “eternity” (a dummy date set so high that it will not be reached, such as ‘9999-12-31’). The (end_date + INTERVAL '1' DAY) of one price will be equal to the start_date of the next price for the same drug. While this is normalized, performance was bad. Every report, invoice, or query will have a JOIN between Drugs and Prices. The trick might be to add more columns to the Drugs table, like this:

CREATE TABLE Drugs
(drug_nbr INTEGER PRIMARY KEY,
 drug_name CHAR(30) NOT NULL,
 drug_qty INTEGER NOT NULL
   CONSTRAINT positive_quantity
   CHECK(drug_qty >= 0),
 current_start_date DATE NOT NULL,
 current_end_date DATE NOT NULL,
 CONSTRAINT current_start_before_endded
   CHECK(current_start_date <= current_end_date),
 current_price DECIMAL(8,2) NOT NULL,
 prior_start_date DATE NOT NULL,
 prior_end_date DATE NOT NULL,
 CONSTRAINT prior_start_before_endded
   CHECK(prior_start_date <= prior_end_date
         AND current_start_date = prior_end_date + INTERVAL '1' DAY),
 prior_price DECIMAL(8,2) NOT NULL
);

This covered more than 95% of the orders in the actual company, because very few orders have more than two price changes before they are taken out of stock. The odd exception was trapped by a procedural routine.
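To make the trade-off concrete, here is a rough sketch of the price lookup under each design. The Orders table is hypothetical and exists only for this illustration; the two queries assume the normalized and the widened version of Drugs respectively, so they would not run against the same schema.

```sql
-- Hypothetical Orders table, introduced only for this illustration.
CREATE TABLE Orders
(order_nbr INTEGER NOT NULL PRIMARY KEY,
 drug_nbr INTEGER NOT NULL,
 order_date DATE NOT NULL,
 order_qty INTEGER NOT NULL);

-- Normalized design: every lookup is a date-range join to Prices.
SELECT O.order_nbr, O.drug_nbr, P.price
  FROM Orders AS O
       INNER JOIN Prices AS P
       ON P.drug_nbr = O.drug_nbr
      AND O.order_date BETWEEN P.start_date AND P.end_date;

-- Widened Drugs table: the two most recent prices sit in the drug's own row.
-- An order older than prior_start_date gets a NULL price here and would be
-- left for the procedural exception routine.
SELECT O.order_nbr, O.drug_nbr,
       CASE WHEN O.order_date BETWEEN D.current_start_date
                                  AND D.current_end_date
            THEN D.current_price
            WHEN O.order_date BETWEEN D.prior_start_date
                                  AND D.prior_end_date
            THEN D.prior_price
            ELSE NULL END AS price
  FROM Orders AS O
       INNER JOIN Drugs AS D
       ON D.drug_nbr = O.drug_nbr;
```

The second query avoids the Prices table entirely, which is where the speed-up came from, at the cost of the NULL case that has to be handled elsewhere.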

The other method is to add CHECK() constraints that will enforce the rules destroyed by denormalization. We will discuss this later, but the overhead for insertion, updating, and deleting to the table is huge. In fact, in many cases denormalized tables cannot be changed until a complete set of columns is built outside the table. Furthermore, while one set of queries is improved, all others are damaged.

Today, however, only data warehouses should be denormalized. JOINs are far cheaper than they were, and the overhead of handling exceptions with procedural code is far greater than any extra database overhead.

2.11.5 Row Sorting

On May 27, 2001, Fred Block posted a problem on the SQL Server Newsgroup. I will change the problem slightly, but the idea was that he had a table with five character string columns that had to be sorted alphabetically within each row. This “flatten table” denormalization is a very common one that might involve months of the year as columns, or other things that are acting as repeating groups in violation of 1NF. Let’s declare the table and dive into the problem:

CREATE TABLE Foobar
(key_col INTEGER NOT NULL PRIMARY KEY,
 c1 VARCHAR(20) NOT NULL,
 c2 VARCHAR(20) NOT NULL,
 c3 VARCHAR(20) NOT NULL,
 c4 VARCHAR(20) NOT NULL,
 c5 VARCHAR(20) NOT NULL);

This means that we want this condition to hold:


CHECK ((c1 <= c2) AND (c2 <= c3) AND (c3 <= c4) AND (c4 <= c5))

Obviously, if he had added this constraint to the table in the first place, we would be fine. Of course, that would have pushed the problem to the front end, and I would not have a topic for this section.
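A constraint like this can only be declared once the existing rows satisfy it. As a minimal sketch, assuming the five-column Foobar table above (the constraint name is made up for illustration), one could first look for offending rows and add the rule afterwards:

```sql
-- Rows whose columns are not yet in left-to-right order.
SELECT key_col, c1, c2, c3, c4, c5
  FROM Foobar
 WHERE NOT ((c1 <= c2) AND (c2 <= c3) AND (c3 <= c4) AND (c4 <= c5));

-- Once that query returns no rows, the rule can be declared.
-- The constraint name is illustrative only.
ALTER TABLE Foobar
  ADD CONSTRAINT columns_in_sorted_order
  CHECK ((c1 <= c2) AND (c2 <= c3) AND (c3 <= c4) AND (c4 <= c5));
```

Because all five columns are NOT NULL, the NOT (...) test cannot evaluate to UNKNOWN, so the first query does not miss any rows.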

What was interesting was how everyone who read this newsgroup posting immediately envisioned a stored procedure that would take the five values, sort them, and return them to their original row in the table. The only way to make this approach work for the whole table was to write an update cursor and loop through all the rows of the table. Itzik Ben-Gan posted a simple procedure that loaded the values into a temporary table, then pulled them out in sorted order, starting with the minimum value, using a loop.

Another trick is the Bose-Nelson sort (“Bose-Nelson Sort,” Dr. Dobb’s Journal, September 1985, pp. 282-296), which I had written about in Dr. Dobb’s Journal back in 1985. This sort is a recursive procedure that takes an integer and then generates swap pairs for a vector of that size. A swap pair is a pair of position numbers from 1 to (n) in the vector that need to be exchanged if they are out of order. These swap pairs are also related to sorting networks in the literature (see Donald Knuth, The Art of Computer Programming, Volume 3: Sorting and Searching, 2nd Edition, April 24, 1998, ISBN: 0-201-89685-0).

You are probably thinking that this method is a bit weak, because the results are only good for sorting a fixed number of items. But a table only has a fixed number of columns, so that is not a problem in denormalized SQL.

You can set up a sorting network that will sort five items, with the minimal number of exchanges, nine swaps, like this:

Swap(c1, c2);

Swap(c4, c5);

Swap(c3, c5);

Swap(c3, c4);

Swap(c1, c4);

Swap(c1, c3);

Swap(c2, c5);

Swap(c2, c4);

Swap(c2, c3);


You might want to deal yourself a hand of five playing cards in one suit to see how it works. Put the cards face down on the table and pick up the pairs, swapping them if required, then turn over the row to see that it is in sorted order when you are done.

In theory, at least CEILING(log2(n!)) comparisons are needed to sort (n) items, and as (n) increases, this approaches O(n*log2(n)). Computer science majors will remember this “Big O” expression as the expected performance of the best sorting algorithms, such as Quicksort. The Bose-Nelson method is very good for small values of (n). If (n < 9) then it is perfect, actually. But as things get bigger, Bose-Nelson approaches O(n ^ 1.585). In English, this method is good for a fixed size list of 16 or fewer items, but it goes to Hell after that.
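As a quick check of these figures for the five-column case, the arithmetic can be worked out by hand. This only evaluates the formula quoted in the text; the comparison with the nine-swap network is an added observation.

```latex
\lceil \log_2(5!) \rceil \;=\; \lceil \log_2 120 \rceil \;=\; 7,
\qquad \text{since } 2^{6} = 64 < 120 \le 128 = 2^{7}.
```

So the formula gives seven for five items, while the fixed network above spends nine compare-and-swap steps. The formula bounds sorts that may choose each comparison based on earlier outcomes, whereas a sorting network has to commit to its whole sequence in advance. The exponent 1.585 quoted for Bose-Nelson agrees numerically with log2(3), which is about 1.58496.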

You can write a version of the Bose-Nelson procedure that will output the SQL code for a given value of (n). The obvious direct way to do a Swap() is to write a chain of UPDATE statements. Remember that in SQL, the SET clause assignments happen in parallel, so you can easily write a SET clause that exchanges the two items when they are out of order. Using the above swap chain, we get this block of code:

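BEGIN ATOMIC
-- Swap(c1, c2);
UPDATE Foobar
   SET c1 = c2, c2 = c1
 WHERE c1 > c2;
-- Swap(c4, c5);
UPDATE Foobar
   SET c4 = c5, c5 = c4
 WHERE c4 > c5;
-- Swap(c3, c5);
UPDATE Foobar
   SET c3 = c5, c5 = c3
 WHERE c3 > c5;
-- Swap(c3, c4);
UPDATE Foobar
   SET c3 = c4, c4 = c3
 WHERE c3 > c4;
-- Swap(c1, c4);
UPDATE Foobar
   SET c1 = c4, c4 = c1
 WHERE c1 > c4;
-- Swap(c1, c3);
UPDATE Foobar
   SET c1 = c3, c3 = c1
 WHERE c1 > c3;
-- Swap(c2, c5);
UPDATE Foobar
   SET c2 = c5, c5 = c2
 WHERE c2 > c5;
-- Swap(c2, c4);
UPDATE Foobar
   SET c2 = c4, c4 = c2
 WHERE c2 > c4;
-- Swap(c2, c3);
UPDATE Foobar
   SET c2 = c3, c3 = c2
 WHERE c2 > c3;
END;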
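To see the chain at work, here is a hand trace for one sample row. The key value and the letters are made up for illustration; each comment line shows the row after the corresponding step of the nine-swap chain listed earlier.

```sql
-- Hypothetical test row, stored in reverse order.
INSERT INTO Foobar (key_col, c1, c2, c3, c4, c5)
VALUES (1, 'e', 'd', 'c', 'b', 'a');

-- Row contents (c1 .. c5) after each swap:
--   start          e d c b a
--   Swap(c1, c2)   d e c b a
--   Swap(c4, c5)   d e c a b
--   Swap(c3, c5)   d e b a c
--   Swap(c3, c4)   d e a b c
--   Swap(c1, c4)   b e a d c
--   Swap(c1, c3)   a e b d c
--   Swap(c2, c5)   a c b d e
--   Swap(c2, c4)   a c b d e   (c <= d, so nothing moves)
--   Swap(c2, c3)   a b c d e

-- After the whole chain has run, no row should fail the ordering test.
SELECT key_col
  FROM Foobar
 WHERE NOT ((c1 <= c2) AND (c2 <= c3) AND (c3 <= c4) AND (c4 <= c5));
```

Note that the first four swaps leave (c1, c2) and (c3, c4, c5) each in order, and the last five merge the two runs.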

This is fully portable, Standard SQL code, and it can be machine-generated. But that parallelism is useful. It is worthwhile to combine some of the UPDATE statements, but you have to be careful not to change the effective sequence of the swap operations.

If you look at the first two UPDATE statements, you can see that they do not overlap. This means you could roll them into one statement like this:

-- Swap(c1, c2) AND Swap(c4, c5);
UPDATE Foobar
   SET c1 = CASE WHEN c1 <= c2 THEN c1 ELSE c2 END,
       c2 = CASE WHEN c1 <= c2 THEN c2 ELSE c1 END,
       c4 = CASE WHEN c4 <= c5 THEN c4 ELSE c5 END,
       c5 = CASE WHEN c4 <= c5 THEN c5 ELSE c4 END
 WHERE c4 > c5 OR c1 > c2;


The advantage of doing this is that you have to execute only one UPDATE statement, not two. Updating a table, even on nonkey columns, usually locks the table and prevents other users from getting to the data. If you could roll the statements into one single UPDATE, you would have the best of all possible worlds, but I doubt that the code would be easy to read.

We can see this same pattern in the pair of statements:

Swap(c1, c3);

Swap(c2, c5);
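Since these two swaps are adjacent in the chain and touch disjoint columns, they can be merged in the same way as the first pair. A sketch that follows the same template:

```sql
-- Swap(c1, c3) AND Swap(c2, c5);
UPDATE Foobar
   SET c1 = CASE WHEN c1 <= c3 THEN c1 ELSE c3 END,
       c3 = CASE WHEN c1 <= c3 THEN c3 ELSE c1 END,
       c2 = CASE WHEN c2 <= c5 THEN c2 ELSE c5 END,
       c5 = CASE WHEN c2 <= c5 THEN c5 ELSE c2 END
 WHERE c1 > c3
    OR c2 > c5;
```

Each CASE expression reads the pre-update values, which is why the two halves of a swap can use the same test without interfering with each other.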

But there are other patterns, so you can write general templates for them. Consider this one:

Swap(x, y);

Swap(x, z);

Write out all possible triplets and apply these two operations to them. Each triplet on the left shows which of three values, with x < y < z, sits in the first, second, and third positions; the swaps compare the first position with the second, and then the first with the third:

(x, y, z) => (x, y, z)
(x, z, y) => (x, z, y)
(y, x, z) => (x, y, z)
(y, z, x) => (x, z, y)
(z, x, y) => (x, z, y)
(z, y, x) => (x, z, y)

The result of this pattern is that x, the lowest of the three values, always ends up in the first position, while y and z may or may not end up in sorted order. Properly sorting them at the same time would have the advantage of saving exchanges later and also of reducing the size of the subset being operated upon by each UPDATE statement. With a little thought, we can write the following symmetric piece of code:

-- Swap(x, y) AND Swap(x, z);
UPDATE Foobar
   SET x = CASE WHEN x BETWEEN y AND z THEN y
                WHEN z BETWEEN y AND x THEN y
                WHEN y BETWEEN z AND x THEN z
                WHEN x BETWEEN z AND y THEN z
                ELSE x END,
       y = CASE WHEN x BETWEEN y AND z THEN x
                WHEN x BETWEEN z AND y THEN x
                WHEN z BETWEEN x AND y THEN z
                WHEN z BETWEEN y AND x THEN z
                ELSE y END,
       z = CASE WHEN x BETWEEN z AND y THEN y
                WHEN z BETWEEN x AND y THEN y
                WHEN y BETWEEN z AND x THEN x
                WHEN z BETWEEN y AND x THEN x
                ELSE z END
 WHERE x > z OR x > y;

While it is very tempting to write more and more of these pattern templates, it might be more trouble than it is worth, because of the increased maintenance effort and reduced readability.

Here is an SQL/PSM program for the Bose-Nelson sort, based on the version given in Frederick Hegeman’s “Sorting Networks” article for The C/C++ User’s Journal (Hegeman 1993). It assumes that you have a procedure called PRINT() for output to a text file. You can translate it into the programming language of your choice easily, as long as it supports recursion.

CREATE PROCEDURE BoseSort (IN i INTEGER, IN j INTEGER)
LANGUAGE SQL
DETERMINISTIC
BEGIN
-- generate swap pairs that sort positions i through j;
-- integer division is assumed to truncate
DECLARE m INTEGER;
IF j > i
THEN SET m = i + (j-i+1)/2 - 1;
     CALL BoseSort(i, m);
     CALL BoseSort(m+1, j);
     CALL BoseMerge(i, m, m+1, j);
END IF;
END;

CREATE PROCEDURE BoseMerge
(IN i1 INTEGER, IN i2 INTEGER, IN j1 INTEGER, IN j2 INTEGER)
LANGUAGE SQL
DETERMINISTIC
BEGIN
-- merge sorted positions i1..i2 with sorted positions j1..j2
DECLARE i_mid INTEGER;
DECLARE j_mid INTEGER;
IF i2 = i1 AND j2 = j1
THEN -- one item against one item
     CALL PRINT('swap', i1, j1);
ELSEIF i2 = i1 AND j2 = j1+1
THEN -- insert one item into a sorted pair on the right
     CALL PRINT('swap', i1, j2);
     CALL PRINT('swap', i1, j1);
ELSEIF i2 = i1+1 AND j2 = j1
THEN -- insert one item into a sorted pair on the left
     CALL PRINT('swap', i1, j1);
     CALL PRINT('swap', i2, j1);
ELSE -- split both runs and merge the pieces recursively
     SET i_mid = i1 + (i2-i1+1)/2 - 1;
     IF MOD((i2-i1+1), 2) = 0 AND i2-i1 <> j2-j1
     THEN SET j_mid = j1 + (j2-j1+2)/2 - 1;
     ELSE SET j_mid = j1 + (j2-j1+1)/2 - 1;
     END IF;
     CALL BoseMerge(i1, i_mid, j1, j_mid);
     CALL BoseMerge(i_mid+1, i2, j_mid+1, j2);
     CALL BoseMerge(i_mid+1, i2, j1, j_mid);
END IF;
END;
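The text leaves PRINT() unspecified. One way to capture the output, sketched here under the assumption that the generated statements are collected in a table rather than a text file, is a small helper that BoseMerge could call in place of PRINT('swap', ...). The table GeneratedCode and the procedure EmitSwap are hypothetical names, and the column names c1, c2, and so on follow the Foobar example.

```sql
-- Hypothetical holding table for the generated statements.
CREATE TABLE GeneratedCode
(line_nbr INTEGER NOT NULL PRIMARY KEY,
 sql_text VARCHAR(200) NOT NULL);

-- Hypothetical stand-in for PRINT(): turns one swap pair into an UPDATE.
CREATE PROCEDURE EmitSwap (IN i INTEGER, IN j INTEGER)
LANGUAGE SQL
MODIFIES SQL DATA
BEGIN
DECLARE ci VARCHAR(12);
DECLARE cj VARCHAR(12);
DECLARE next_nbr INTEGER;
-- Build the column names from the position numbers, e.g. 1 -> 'c1'.
SET ci = 'c' || TRIM(CAST(i AS VARCHAR(10)));
SET cj = 'c' || TRIM(CAST(j AS VARCHAR(10)));
-- Number the lines in the order in which the pairs are generated.
SET next_nbr = (SELECT COALESCE(MAX(line_nbr), 0) + 1
                  FROM GeneratedCode);
INSERT INTO GeneratedCode (line_nbr, sql_text)
VALUES (next_nbr,
        'UPDATE Foobar SET ' || ci || ' = ' || cj || ', '
        || cj || ' = ' || ci
        || ' WHERE ' || ci || ' > ' || cj || ';');
END;

-- With EmitSwap wired in, a five-column run would be started like this:
CALL BoseSort(1, 5);

SELECT sql_text
  FROM GeneratedCode
 ORDER BY line_nbr;
```

If the procedures follow Hegeman's algorithm, the pairs for five columns should come out in the order used earlier in this section: (1,2), (4,5), (3,5), (3,4), (1,4), (1,3), (2,5), (2,4), (2,3). Procedures that call a data-modifying helper like this one would themselves need to be declared MODIFIES SQL DATA.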

CHAPTER 3

Numeric Data in SQL

SQL is not a computational or procedural language; the arithmetic capability of SQL is weaker than that of any other language you have ever used. But there are some tricks that you need to know when working with numbers in SQL and when passing them to a host program. Much of the arithmetic and the functions are defined by implementations, so you should experiment with your particular product and make notes on the defaults, precision, and tools in the math library of your database.
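Because so much is implementation-defined, a few probe queries are a quick way to start those notes. This is only a sketch; the results are deliberately not shown, since they differ from product to product, and some products want a SELECT with a FROM clause or a dummy table instead of a bare VALUES.

```sql
-- Does integer division truncate, round, or produce a fraction?
VALUES (7 / 2, -7 / 2);

-- What precision and scale come back from exact numeric arithmetic?
VALUES (CAST(1 AS DECIMAL(8,2)) / CAST(3 AS DECIMAL(8,2)),
        CAST(12345.67 AS DECIMAL(8,2)) * CAST(0.015 AS DECIMAL(4,3)));

-- When the target has less scale, is the value rounded or truncated?
VALUES (CAST(2.675 AS DECIMAL(5,2)), CAST(-2.675 AS DECIMAL(5,2)));
```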

You should also read Chapter 21, which deals with the related topic of aggregate functions. This chapter deals with the arithmetic that you would use across a row, instead of down a column; they are not quite the same.

3.1 Numeric Types

The SQL Standard has a wide range of numeric types. The idea is that any host language can find an SQL numeric type that matches one of its own.

You will also find some vendor extensions in the numeric data types, the most common of which is MONEY. This is really a DECIMAL or NUMERIC data type, which also accepts and displays currency symbols in input and output.
