Listing 15.4 defines the sequence shown in Figure 15.4. You can use a sequence generator in a few ways. The SQL standard provides the built-in function NEXT VALUE FOR to increment a sequence value, as in:
INSERT INTO shipment(
  part_num,
  description,
  quantity)
VALUES(
  NEXT VALUE FOR part_seq,
  'motherboard',
  5);
If you’re creating a column of unique values, you can use the keyword IDENTITY to define a sequence right in the CREATE TABLE statement:
CREATE TABLE parts (
  part_num INTEGER
    GENERATED ALWAYS AS IDENTITY
    (START WITH 1 INCREMENT BY 1
     MINVALUE 1 MAXVALUE 10000
     NO CYCLE),
  description VARCHAR(100),
  quantity INTEGER
);
This table definition lets you omit NEXT VALUE FOR when you insert a row:
INSERT INTO parts(
  description,
  quantity)
VALUES(
  'motherboard',
  5);
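SQLite, used here as a convenient stand-in through Python’s built-in sqlite3 module, has no GENERATED ... AS IDENTITY clause. A rough analog, assumed here for illustration only, is an INTEGER PRIMARY KEY AUTOINCREMENT column, which SQLite fills in when the INSERT omits it. Unlike the standard clause, it offers no MAXVALUE or NO CYCLE options. The second row ('power supply') is made-up sample data.

```python
import sqlite3

# In-memory SQLite database standing in for a DBMS with IDENTITY columns.
conn = sqlite3.connect(":memory:")

# AUTOINCREMENT is SQLite's closest analog to an IDENTITY column.
conn.execute("""
    CREATE TABLE parts (
        part_num    INTEGER PRIMARY KEY AUTOINCREMENT,
        description VARCHAR(100),
        quantity    INTEGER
    )""")

# As with the IDENTITY example, neither INSERT mentions part_num.
conn.execute("INSERT INTO parts (description, quantity) VALUES ('motherboard', 5)")
conn.execute("INSERT INTO parts (description, quantity) VALUES ('power supply', 2)")

rows = conn.execute(
    "SELECT part_num, description, quantity FROM parts ORDER BY part_num"
).fetchall()
print(rows)  # [(1, 'motherboard', 5), (2, 'power supply', 2)]
```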
SQL also provides ALTER SEQUENCE and DROP SEQUENCE to change and remove sequence generators.
Listing 15.4 Create a sequence generator for the consecutive integers 1 to 10,000. See Figure 15.4 for the result.
CREATE SEQUENCE part_seq
  INCREMENT BY 1
  MINVALUE 1
  MAXVALUE 10000
  START WITH 1
  NO CYCLE;
1 2 3 … 9998 9999 10000
Figure 15.4 The sequence that Listing 15.4 generates.
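To make the CREATE SEQUENCE options concrete, here is a minimal Python model of how an ascending sequence generator behaves. The class SequenceGenerator and its method next_value() are hypothetical names, not any DBMS’s API. Each call mimics NEXT VALUE FOR: it returns the current value and advances by INCREMENT BY. Once MAXVALUE is passed, NO CYCLE raises an error, whereas CYCLE wraps back to MINVALUE.

```python
class SequenceGenerator:
    """Hypothetical model of CREATE SEQUENCE options (ascending sequences only)."""

    def __init__(self, start=1, increment=1, minvalue=1, maxvalue=10000, cycle=False):
        if increment <= 0:
            raise ValueError("this sketch only models ascending sequences")
        self.increment = increment
        self.minvalue = minvalue
        self.maxvalue = maxvalue
        self.cycle = cycle
        self._next = start  # value that the next call will return

    def next_value(self):
        """Mimic NEXT VALUE FOR: return the current value, then advance."""
        if self._next > self.maxvalue:
            if not self.cycle:
                raise OverflowError("sequence exhausted (NO CYCLE)")
            self._next = self.minvalue  # CYCLE wraps around to MINVALUE
        value = self._next
        self._next += self.increment
        return value


# Same options as Listing 15.4: 1 to 10,000, NO CYCLE.
part_seq = SequenceGenerator(start=1, increment=1, minvalue=1, maxvalue=10000)
values = [part_seq.next_value() for _ in range(10000)]
print(values[:3], values[-3:])  # [1, 2, 3] [9998, 9999, 10000]
```

An 10,001st call on part_seq raises OverflowError. A generator built with cycle=True and maxvalue=3 would instead yield 1, 2, 3, 1, 2, ….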
✔ Tip
■ Oracle, DB2, and PostgreSQL support CREATE SEQUENCE, ALTER SEQUENCE, and DROP SEQUENCE. In Oracle, use NOCYCLE instead of NO CYCLE. See your DBMS documentation to learn how sequences are used in your system. Most DBMSs don’t support IDENTITY columns because they have other (pre-SQL:2003) ways to define columns with unique values; see Table 3.18 in “Unique Identifiers” in Chapter 3. PostgreSQL’s generate_series() function offers a quick way to generate numbered rows.
A one-column table containing a sequence of consecutive integers makes it easy to solve problems that would otherwise be difficult with SQL’s limited computational power. Sequence tables aren’t really part of the data model; they’re auxiliary tables that are adjuncts to queries and other “real” tables.
You can create a sequence table by using one of the methods just described. Alternatively, you can create one by using Listing 15.5, which creates the sequence table seq by cross-joining four instances of the intermediate table temp09. The CAST expression concatenates digit characters into sequential numbers and then casts them as integers. You can drop temp09 after seq is created. Figure 15.5 shows the result. The table seq contains the integer sequence 0, 1, 2, …, 9999. You can shrink or grow this sequence by changing the SELECT and FROM expressions in the INSERT INTO seq statement.
Listing 15.5 Create a one-column table that contains consecutive integers. See Figure 15.5 for the result.
CREATE TABLE temp09 (
i CHAR(1) NOT NULL PRIMARY KEY
);
INSERT INTO temp09 VALUES('0');
INSERT INTO temp09 VALUES('1');
INSERT INTO temp09 VALUES('2');
INSERT INTO temp09 VALUES('3');
INSERT INTO temp09 VALUES('4');
INSERT INTO temp09 VALUES('5');
INSERT INTO temp09 VALUES('6');
INSERT INTO temp09 VALUES('7');
INSERT INTO temp09 VALUES('8');
INSERT INTO temp09 VALUES('9');
CREATE TABLE seq (
i INTEGER NOT NULL PRIMARY KEY
);
INSERT INTO seq
SELECT CAST(t1.i || t2.i ||
t3.i || t4.i AS INTEGER)
FROM temp09 t1, temp09 t2,
temp09 t3, temp09 t4;
DROP TABLE temp09;
i
----
0
1
2
3
4
…
9996
9997
9998
9999
Figure 15.5 Result of Listing 15.5.
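As a quick check that Listing 15.5 does what the text says, the sketch below runs it unchanged (apart from issuing the ten temp09 INSERTs in a loop) against an in-memory SQLite database via Python’s sqlite3 module. SQLite is only an assumed stand-in here; it happens to accept the standard || operator and CAST used in the listing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# temp09 holds the ten digit characters '0' through '9'.
conn.execute("CREATE TABLE temp09 (i CHAR(1) NOT NULL PRIMARY KEY)")
conn.executemany("INSERT INTO temp09 VALUES (?)", [(str(d),) for d in range(10)])

conn.execute("CREATE TABLE seq (i INTEGER NOT NULL PRIMARY KEY)")

# Four-way cross join: 10 * 10 * 10 * 10 = 10,000 strings '0000' .. '9999',
# each cast to an integer.
conn.execute("""
    INSERT INTO seq
    SELECT CAST(t1.i || t2.i || t3.i || t4.i AS INTEGER)
    FROM temp09 t1, temp09 t2, temp09 t3, temp09 t4""")
conn.execute("DROP TABLE temp09")

count, lo, hi = conn.execute("SELECT COUNT(*), MIN(i), MAX(i) FROM seq").fetchone()
print(count, lo, hi)  # 10000 0 9999
```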
A sequence table is especially useful for enumerative and datetime functions.
Listing 15.6 lists the 95 printable characters in the ASCII character set (if that’s the character set in use). See Figure 15.6 for the result.
Listing 15.7 adds monthly intervals to today’s date (7-March-2005) for the next six months. See Figure 15.7 for the result. This example works on Microsoft SQL Server; the other DBMSs have similar functions that increment dates.
Sequence tables are handy for normalizing data that you’ve imported from a non-relational environment such as a spreadsheet. Suppose that you have the following non-normalized table, named au_orders, showing the order of the authors’ names on each book’s cover:
title_id author1 author2 author3
-------- ------- ------- -------
T01      A01     NULL    NULL
T02      A01     NULL    NULL
T03      A05     NULL    NULL
T04      A03     A04     NULL
T05      A04     NULL    NULL
T06      A02     NULL    NULL
T07      A02     A04     NULL
T08      A06     NULL    NULL
T09      A06     NULL    NULL
T10      A02     NULL    NULL
T11      A06     A03     A04
T12      A02     NULL    NULL
T13      A01     NULL    NULL
Listing 15.8 cross-joins au_orders with seq to produce Figure 15.8. You can DELETE the result rows with nulls in the column au_id, leaving the result set looking like the table title_authors in the sample database.
Note that Listing 15.8 does the reverse of Listing 8.18 in Chapter 8.
Listing 15.6 List the characters associated with a set of character codes. See Figure 15.6 for the result.
SELECT
  i AS CharCode,
  CHR(i) AS Ch
FROM seq
WHERE i BETWEEN 32 AND 126;
CharCode Ch
-------- --
32
33       !
34       "
35       #
36       $
37       %
38       &
39       '
40       (
41       )
42       *
43       +
44       ,
45       -
46       .
47       /
48       0
49       1
50       2
51       3
52       4
…
Figure 15.6 Result of Listing 15.6.
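Here is Listing 15.6 run against SQLite through Python’s sqlite3 module, as an assumed stand-in DBMS. Two assumptions to note: SQLite spells the code-to-character function char() rather than CHR(), and the seq table is filled directly from Python (codes 0 to 199) instead of with Listing 15.5, to keep the sketch short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A small sequence table; only codes 32..126 are needed here.
conn.execute("CREATE TABLE seq (i INTEGER NOT NULL PRIMARY KEY)")
conn.executemany("INSERT INTO seq VALUES (?)", [(n,) for n in range(200)])

# SQLite's char() takes code points and returns the corresponding characters.
rows = conn.execute("""
    SELECT i AS CharCode, char(i) AS Ch
    FROM seq
    WHERE i BETWEEN 32 AND 126
    ORDER BY i""").fetchall()

print(len(rows))  # 95
print(rows[:4])   # [(32, ' '), (33, '!'), (34, '"'), (35, '#')]
```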
Listing 15.7 Increment today’s date to six months hence, in one-month intervals. See Figure 15.7 for the result.
SELECT
  i AS MonthsAhead,
  DATEADD(month, i, CURRENT_TIMESTAMP)
    AS FutureDate
FROM seq
WHERE i BETWEEN 1 AND 6;
MonthsAhead FutureDate
----------- ----------
1           2005-04-07
2           2005-05-07
3           2005-06-07
4           2005-07-07
5           2005-08-07
6           2005-09-07
Figure 15.7 Result of Listing 15.7.
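DATEADD is specific to Microsoft SQL Server. As one example of the “similar functions” other DBMSs have, this sketch assumes SQLite (through Python’s sqlite3 module), whose date() function accepts a modifier string such as '+1 months'. To reproduce Figure 15.7, “today” is pinned to 2005-03-07; in practice you would pass 'now' instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seq (i INTEGER NOT NULL PRIMARY KEY)")
conn.executemany("INSERT INTO seq VALUES (?)", [(n,) for n in range(10)])

today = "2005-03-07"  # fixed date so the output matches Figure 15.7

# Build the modifier '+1 months', '+2 months', ... from each seq value.
rows = conn.execute("""
    SELECT i AS MonthsAhead,
           date(?, '+' || i || ' months') AS FutureDate
    FROM seq
    WHERE i BETWEEN 1 AND 6
    ORDER BY i""", (today,)).fetchall()

for months_ahead, future_date in rows:
    print(months_ahead, future_date)
```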
Listing 15.8 Normalize the table au_orders. See Figure 15.8 for the result.
SELECT title_id,
(CASE WHEN i=1 THEN '1'
WHEN i=2 THEN '2'
WHEN i=3 THEN '3'
END) AS au_order,
(CASE WHEN i=1 THEN author1
WHEN i=2 THEN author2
WHEN i=3 THEN author3
END) AS au_id
FROM au_orders, seq
WHERE i BETWEEN 1 AND 3
ORDER BY title_id, i;
title_id au_order au_id
-------- -------- -----
T01      1        A01
T01      2        NULL
T01      3        NULL
T02      1        A01
T02      2        NULL
T02      3        NULL
T03      1        A05
T03      2        NULL
T03      3        NULL
T04      1        A03
T04      2        A04
T04      3        NULL
T05      1        A04
T05      2        NULL
T05      3        NULL
T06      1        A02
T06      2        NULL
T06      3        NULL
T07      1        A02
T07      2        A04
T07      3        NULL
T08      1        A06
T08      2        NULL
T08      3        NULL
T09      1        A06
T09      2        NULL
T09      3        NULL
T10      1        A02
T10      2        NULL
T10      3        NULL
T11      1        A06
T11      2        A03
T11      3        A04
T12      1        A02
T12      2        NULL
T12      3        NULL
T13      1        A01
T13      2        NULL
T13      3        NULL
Figure 15.8 Result of Listing 15.8.
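A sketch of the full normalization round trip, assuming SQLite through Python’s sqlite3 module as the DBMS. It loads the au_orders data shown earlier, runs Listing 15.8 as written, and then, instead of deleting rows afterward, wraps the listing in a derived table (named x here purely for illustration) that filters out the NULL padding rows in a single query.

```python
import sqlite3

AU_ORDERS = [  # the non-normalized au_orders table shown earlier
    ("T01", "A01", None, None), ("T02", "A01", None, None),
    ("T03", "A05", None, None), ("T04", "A03", "A04", None),
    ("T05", "A04", None, None), ("T06", "A02", None, None),
    ("T07", "A02", "A04", None), ("T08", "A06", None, None),
    ("T09", "A06", None, None), ("T10", "A02", None, None),
    ("T11", "A06", "A03", "A04"), ("T12", "A02", None, None),
    ("T13", "A01", None, None),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seq (i INTEGER NOT NULL PRIMARY KEY)")
conn.executemany("INSERT INTO seq VALUES (?)", [(n,) for n in range(10)])
conn.execute("CREATE TABLE au_orders "
             "(title_id CHAR(3), author1 CHAR(3), author2 CHAR(3), author3 CHAR(3))")
conn.executemany("INSERT INTO au_orders VALUES (?, ?, ?, ?)", AU_ORDERS)

# Listing 15.8 without its ORDER BY, so it can also serve as a derived table.
LISTING_15_8 = """
    SELECT title_id,
           (CASE WHEN i=1 THEN '1' WHEN i=2 THEN '2' WHEN i=3 THEN '3' END) AS au_order,
           (CASE WHEN i=1 THEN author1 WHEN i=2 THEN author2
                 WHEN i=3 THEN author3 END) AS au_id
    FROM au_orders, seq
    WHERE i BETWEEN 1 AND 3"""

full = conn.execute(LISTING_15_8 + " ORDER BY title_id, i").fetchall()

# Keep only real author slots, which yields title_authors-style rows.
normalized = conn.execute(
    "SELECT title_id, au_order, au_id FROM (" + LISTING_15_8 + ") AS x "
    "WHERE au_id IS NOT NULL ORDER BY title_id, au_order").fetchall()

print(len(full), len(normalized))  # 39 17
print([row for row in normalized if row[0] == "T11"])
```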
✔ Tips
■ If you have a column of sequential integers that’s missing some numbers, you can find the gaps by EXCEPTing the column from a sequence column. See “Finding Different Rows with EXCEPT” earlier in this chapter.
■ To run Listing 15.5 in Microsoft Access and Microsoft SQL Server, change the CAST expression to:
t1.i + t2.i + t3.i + t4.i
To run Listing 15.5 in MySQL, change the CAST expression to:
CONCAT(t1.i, t2.i, t3.i, t4.i)
To run Listing 15.6 in Microsoft SQL Server and MySQL, change CHR(i) to CHAR(i).
To run Listing 15.8 in Microsoft Access, change the CASE expressions to Switch() function calls (see the DBMS Tip in “Evaluating Conditional Values with CASE” in Chapter 5):
(Switch(i=1, '1', i=2, '2', i=3, '3')) AS au_order,
(Switch(i=1, author1, i=2, author2, i=3, author3)) AS au_id
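The first Tip above (finding gaps with EXCEPT) can be sketched as follows, assuming SQLite through Python’s sqlite3 module. The table invoices and its id values are hypothetical. The query takes the sequence values within the column’s own range and EXCEPTs the values that are actually present, leaving exactly the missing numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seq (i INTEGER NOT NULL PRIMARY KEY)")
conn.executemany("INSERT INTO seq VALUES (?)", [(n,) for n in range(100)])

# Hypothetical table whose id column should run 1..10 but has gaps.
conn.execute("CREATE TABLE invoices (id INTEGER NOT NULL PRIMARY KEY)")
conn.executemany("INSERT INTO invoices VALUES (?)",
                 [(n,) for n in (1, 2, 3, 5, 6, 9, 10)])

# Every sequence value in the column's range, minus the values present.
missing = [row[0] for row in conn.execute("""
    SELECT i FROM seq
    WHERE i BETWEEN (SELECT MIN(id) FROM invoices)
                AND (SELECT MAX(id) FROM invoices)
    EXCEPT
    SELECT id FROM invoices
    ORDER BY 1""")]

print(missing)  # [4, 7, 8]
```

To fill the gaps, you could feed the same EXCEPT query to an INSERT INTO invoices ... SELECT statement.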
Calendar Tables
Another useful auxiliary table is a calendar table. One type of calendar table has one row for each calendar date (past and future), with the date in a primary-key column, and other columns that indicate the date’s attributes: business day, holiday, international holiday, fiscal-month end, fiscal-year end, Julian date, business-day offsets, and so on. Another type of calendar table stores the starting and ending dates of events (in the columns event_id, start_date, and end_date, for example). Spreadsheets have more date-arithmetic functions than DBMSs, so it might be easier to build a calendar table in a spreadsheet and then import it as a database table.
Even if your DBMS has plenty of date-arithmetic functions, it might be faster to look up data in a calendar table than to call these functions in a query.
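Here is a minimal sketch of the first kind of calendar table, generated from a sequence table rather than from a spreadsheet. It assumes SQLite through Python’s sqlite3 module, and the table name calendar and its columns cal_date and is_business_day are hypothetical. Only weekends are flagged. Real holidays, fiscal periods, and so on would need extra columns filled from your own rules. Once the table is built, the business-day question becomes a plain lookup instead of date arithmetic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seq (i INTEGER NOT NULL PRIMARY KEY)")
conn.executemany("INSERT INTO seq VALUES (?)", [(n,) for n in range(366)])

conn.execute("""
    CREATE TABLE calendar (
        cal_date        DATE    NOT NULL PRIMARY KEY,
        is_business_day INTEGER NOT NULL  -- 1 = Monday..Friday, 0 = weekend
    )""")

# One row per day of 2005. strftime('%w') returns '0' (Sunday) .. '6' (Saturday).
conn.execute("""
    INSERT INTO calendar
    SELECT d, CASE WHEN strftime('%w', d) IN ('0', '6') THEN 0 ELSE 1 END
    FROM (SELECT date('2005-01-01', '+' || i || ' days') AS d FROM seq)
    WHERE d < '2006-01-01'""")

# Weekday count for March 2005, answered by a lookup rather than arithmetic.
business_days = conn.execute("""
    SELECT COUNT(*) FROM calendar
    WHERE cal_date BETWEEN '2005-03-01' AND '2005-03-31'
      AND is_business_day = 1""").fetchone()[0]

print(business_days)  # 23
```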
Finding Sequences, Runs, and Regions
A sequence is a series of consecutive values without gaps. A run is like a sequence, but the values don’t have to be consecutive, just increasing (that is, gaps are allowed). A region is an unbroken series of values that are all equal.
Finding these series requires a table that has at least two columns: a primary-key column that holds a sequence of consecutive integers and a column that holds the values of interest. The table temps (Listing 15.9 and Figure 15.9) shows a series of high temperatures over 15 days.
As a set-oriented language, SQL isn’t a good choice for finding series of values. The following queries won’t run very fast, so if you have a lot of data to analyze, you might consider exporting it to a statistical package or using a procedural host language.
✔ Tip
■ These queries are based on the ideas in David Rozenshtein, Anatoly Abramovich, and Eugene Birger’s Optimizing Transact-SQL: Advanced Programming Techniques (SQL Forum Press). You can use the queries’ common framework to create similar queries that find other series of values.
Listing 15.9 List all the columns in the table temps. See Figure 15.9 for the result.
SELECT *
FROM temps;
id hi_temp
-- -------
1  49
2  46
3  48
4  50
5  50
6  50
7  51
8  52
9  53
10 50
11 50
12 47
13 50
14 51
15 52
Figure 15.9 Result of Listing 15.9.
Listing 15.10 finds all the sequences in temps and lists each sequence’s start position, end position, and length. See Figure 15.10 for the result. This query is a lot to take in at first glance, but it’s easier to understand if you look at it piecemeal. Then you’ll be able to understand the rest of the queries in this section.
The subquery’s WHERE clause subtracts id from hi_temp, yielding (internally):
id hi_temp diff
-- ------- ----
1  49      48
2  46      44
3  48      45
4  50      46
5  50      45
6  50      44
7  51      44
8  52      44
9  53      44
10 50      40
11 50      39
12 47      35
13 50      37
14 51      37
15 52      37
In the column diff, note that successive differences are constant for sequences (50 - 6 = 44, 51 - 7 = 44, and so on). To find neighboring rows, the outer query cross-joins two instances of the same table (t1 and t2), as described in “Calculating Running Statistics” earlier in this chapter. The condition
WHERE (t1.id < t2.id)
guarantees that any t1 row represents an element with an index (id) lower than the corresponding t2 row.
Listing 15.10 List the starting point, ending point, and length of each sequence in the table temps. See Figure 15.10 for the result.
SELECT
  t1.id AS StartSeq,
  t2.id AS EndSeq,
  t2.id - t1.id + 1 AS SeqLen
FROM temps t1, temps t2
WHERE (t1.id < t2.id)
  AND NOT EXISTS(
    SELECT *
    FROM temps t3
    WHERE (t3.hi_temp - t3.id <> t1.hi_temp - t1.id
           AND t3.id BETWEEN t1.id AND t2.id)
       OR (t3.id = t1.id - 1
           AND t3.hi_temp - t3.id = t1.hi_temp - t1.id)
       OR (t3.id = t2.id + 1
           AND t3.hi_temp - t3.id = t1.hi_temp - t1.id)
  );
StartSeq EndSeq SeqLen
-------- ------ ------
6        9      4
13       15     3
Figure 15.10 Result of Listing 15.10.
The subquery detects sequence breaks with the condition
t3.hi_temp - t3.id <> t1.hi_temp - t1.id
The third instance of temps (t3) in the subquery checks whether each row in a candidate sequence has the same difference as the sequence’s first row (t1). If every row does, they’re all sequence members. If any row doesn’t, the candidate pair (t1 and t2) is rejected.
The last two OR conditions determine whether the candidate sequence’s borders can expand: they look for a row just before t1 or just after t2 that has the same difference. If such a row exists, the current candidate can be extended, so it’s rejected in favor of the longer sequence.
✔ Tip
■ To find only sequences of n or more rows, add the WHERE condition
AND (t2.id - t1.id) >= n - 1
To change Listing 15.10 to find all sequences of four or more rows, for example, replace
WHERE (t1.id < t2.id)
with
WHERE (t1.id < t2.id)
AND (t2.id - t1.id) >= 3
The result is:
StartSeq EndSeq SeqLen
-------- ------ ------
6        9      4
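To see Listing 15.10 and the length filter from the preceding Tip working together, the sketch below loads the temps data from Figure 15.9 into SQLite (via Python’s sqlite3 module, an assumed stand-in DBMS). It binds n as a query parameter. The helper name find_sequences is hypothetical, and ORDER BY is added only to make the row order deterministic. With the default n = 2, the filter is always true, so the query behaves exactly like Listing 15.10.

```python
import sqlite3

HI_TEMPS = [49, 46, 48, 50, 50, 50, 51, 52, 53, 50, 50, 47, 50, 51, 52]  # Figure 15.9

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temps (id INTEGER NOT NULL PRIMARY KEY, hi_temp INTEGER NOT NULL)")
conn.executemany("INSERT INTO temps VALUES (?, ?)", enumerate(HI_TEMPS, start=1))

LISTING_15_10 = """
    SELECT t1.id AS StartSeq, t2.id AS EndSeq, t2.id - t1.id + 1 AS SeqLen
    FROM temps t1, temps t2
    WHERE (t1.id < t2.id)
      AND (t2.id - t1.id) >= ? - 1                -- the Tip's length filter
      AND NOT EXISTS(
        SELECT * FROM temps t3
        WHERE (t3.hi_temp - t3.id <> t1.hi_temp - t1.id   -- break inside
               AND t3.id BETWEEN t1.id AND t2.id)
           OR (t3.id = t1.id - 1                          -- extends left
               AND t3.hi_temp - t3.id = t1.hi_temp - t1.id)
           OR (t3.id = t2.id + 1                          -- extends right
               AND t3.hi_temp - t3.id = t1.hi_temp - t1.id))
    ORDER BY StartSeq"""


def find_sequences(min_len=2):
    """Run Listing 15.10, keeping only sequences of min_len or more rows."""
    return conn.execute(LISTING_15_10, (min_len,)).fetchall()


print(find_sequences())   # [(6, 9, 4), (13, 15, 3)]
print(find_sequences(4))  # [(6, 9, 4)]
```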
Listing 15.11 finds all the runs in temps and lists each run’s start position, end position, and length. See Figure 15.11 for the result.
The logic of this query is similar to that of the preceding one but accounts for run values needing only to increase, not (necessarily) be consecutive. The fourth instance of temps (t4) is needed because there doesn’t have to be a constant difference between id and hi_temp values. The subquery cross-joins t3 and t4 to check rows in the middle of a candidate run, whose borders are t1 and t2. For every element between t1 and t2 (limited by BETWEEN), t3 and its predecessor t4 are compared to see whether their values are increasing.
Listing 15.11 List the starting point, ending point, and length of each run in the table temps. See Figure 15.11 for the result.
SELECT
  t1.id AS StartRun,
  t2.id AS EndRun,
  t2.id - t1.id + 1 AS RunLen
FROM temps t1, temps t2
WHERE (t1.id < t2.id)
  AND NOT EXISTS(
    SELECT *
    FROM temps t3, temps t4
    WHERE (t3.hi_temp <= t4.hi_temp
           AND t4.id = t3.id - 1
           AND t3.id BETWEEN t1.id + 1 AND t2.id)
       OR (t3.id = t1.id - 1
           AND t3.hi_temp < t1.hi_temp)
       OR (t3.id = t2.id + 1
           AND t3.hi_temp > t2.hi_temp)
  );
StartRun EndRun RunLen
-------- ------ ------
2        4      3
6        9      4
12       15     4
Figure 15.11 Result of Listing 15.11.
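The following sketch runs Listing 15.11 against the Figure 15.9 data in SQLite (through Python’s sqlite3 module, an assumed stand-in DBMS). The only changes are an added ORDER BY and comments marking what each branch of the NOT EXISTS subquery rejects. The output reproduces Figure 15.11.

```python
import sqlite3

HI_TEMPS = [49, 46, 48, 50, 50, 50, 51, 52, 53, 50, 50, 47, 50, 51, 52]  # Figure 15.9

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temps (id INTEGER NOT NULL PRIMARY KEY, hi_temp INTEGER NOT NULL)")
conn.executemany("INSERT INTO temps VALUES (?, ?)", enumerate(HI_TEMPS, start=1))

runs = conn.execute("""
    SELECT t1.id AS StartRun, t2.id AS EndRun, t2.id - t1.id + 1 AS RunLen
    FROM temps t1, temps t2
    WHERE (t1.id < t2.id)
      AND NOT EXISTS(
        SELECT * FROM temps t3, temps t4
        WHERE (t3.hi_temp <= t4.hi_temp          -- a middle row fails to increase
               AND t4.id = t3.id - 1              -- t4 is t3's predecessor
               AND t3.id BETWEEN t1.id + 1 AND t2.id)
           OR (t3.id = t1.id - 1                  -- run could extend left
               AND t3.hi_temp < t1.hi_temp)
           OR (t3.id = t2.id + 1                  -- run could extend right
               AND t3.hi_temp > t2.hi_temp))
    ORDER BY StartRun""").fetchall()

print(runs)  # [(2, 4, 3), (6, 9, 4), (12, 15, 4)]
```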
Listing 15.12 finds all regions in temps with a high temperature of 50 and lists each region’s start position, end position, and length. See Figure 15.12 for the result.
✔ Tips
■ To rank regions by length, add an ORDER BY clause to the outer query:
ORDER BY t2.id - t1.id DESC
■ To list the individual ids that fall in a region (with value 50), type:
SELECT DISTINCT t1.id
FROM temps t1, temps t2
WHERE t1.hi_temp = 50
  AND t2.hi_temp = 50
  AND ABS(t1.id - t2.id) = 1;
The standard function ABS(), which all DBMSs support, returns the absolute value of its argument. The result is:
id
--
4
5
6
10
11
Listing 15.12 List the starting point, ending point, and length of each region (with value 50) in the table temps. See Figure 15.12 for the result.
SELECT
t1.id AS StartReg,
t2.id AS EndReg,
t2.id - t1.id + 1 AS RegLen
FROM temps t1, temps t2
WHERE (t1.id < t2.id)
AND NOT EXISTS(
SELECT *
FROM temps t3
WHERE (t3.hi_temp <> 50
AND t3.id BETWEEN
t1.id AND t2.id)
OR (t3.id = t1.id - 1
AND t3.hi_temp = 50)
OR (t3.id = t2.id + 1
AND t3.hi_temp = 50)
);
StartReg EndReg RegLen
-------- ------ ------
4        6      3
10       11     2
Figure 15.12 Result of Listing 15.12.
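This sketch assumes SQLite through Python’s sqlite3 module and runs two things against the Figure 15.9 data: Listing 15.12 with the ranking ORDER BY from the Tips, and the Tips’ ABS() query for individual region members (with an ORDER BY added for a stable order). Note that id 13 also has a high of 50 but is excluded from both results. It has no neighbor with the same value, and Listing 15.12’s t1.id < t2.id condition only admits regions of two or more rows.

```python
import sqlite3

HI_TEMPS = [49, 46, 48, 50, 50, 50, 51, 52, 53, 50, 50, 47, 50, 51, 52]  # Figure 15.9

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temps (id INTEGER NOT NULL PRIMARY KEY, hi_temp INTEGER NOT NULL)")
conn.executemany("INSERT INTO temps VALUES (?, ?)", enumerate(HI_TEMPS, start=1))

# Listing 15.12 plus the Tip's ORDER BY, which ranks regions longest first.
regions = conn.execute("""
    SELECT t1.id AS StartReg, t2.id AS EndReg, t2.id - t1.id + 1 AS RegLen
    FROM temps t1, temps t2
    WHERE (t1.id < t2.id)
      AND NOT EXISTS(
        SELECT * FROM temps t3
        WHERE (t3.hi_temp <> 50 AND t3.id BETWEEN t1.id AND t2.id)  -- break inside
           OR (t3.id = t1.id - 1 AND t3.hi_temp = 50)               -- extends left
           OR (t3.id = t2.id + 1 AND t3.hi_temp = 50))              -- extends right
    ORDER BY t2.id - t1.id DESC""").fetchall()
print(regions)  # [(4, 6, 3), (10, 11, 2)]

# The Tip's query: ids whose value is 50 and that have a neighbor valued 50.
ids = [row[0] for row in conn.execute("""
    SELECT DISTINCT t1.id
    FROM temps t1, temps t2
    WHERE t1.hi_temp = 50
      AND t2.hi_temp = 50
      AND ABS(t1.id - t2.id) = 1
    ORDER BY t1.id""")]
print(ids)  # [4, 5, 6, 10, 11]
```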