Joe Celko's SQL for Smarties - Advanced SQL Programming



572 CHAPTER 24: REGIONS, RUNS, GAPS, SEQUENCES, AND SERIES

AND I3.n > 1);

SELECT DISTINCT C1.x, C1.y
  FROM Cover AS C1
 WHERE NOT EXISTS
       (SELECT *
          FROM Cover AS C2
         WHERE C2.x <= C1.x
           AND C2.y >= C1.y
           AND (C1.x <> C2.x OR C1.y <> C2.y))
 ORDER BY C1.x;

Finally, try this approach. Assume we have the usual Sequence auxiliary table. Now we find all the holes in the range of the intervals and put them in a VIEW or a WITH clause derived table.

CREATE VIEW Holes (hole)
AS
SELECT seq_nbr
  FROM Sequence
 WHERE seq_nbr <= (SELECT MAX(y) FROM Intervals)
   AND NOT EXISTS
       (SELECT * FROM Intervals WHERE seq_nbr BETWEEN x AND y)
UNION VALUES (0) -- left sentinel value
UNION (SELECT MAX(y) + 1 FROM Intervals); -- right sentinel value

The query picks start and end pairs that are on the edge of a hole and counts the number of holes inside that range. A covering has no holes inside its range.

SELECT Starts.x, Ends.y
  FROM Intervals AS Starts, Intervals AS Ends,
       Sequence AS S -- usual auxiliary table
 WHERE S.seq_nbr BETWEEN Starts.x AND Ends.y -- restrict seq_nbr numbers
   AND S.seq_nbr < (SELECT MAX(hole) FROM Holes)
   AND S.seq_nbr NOT IN (SELECT hole FROM Holes) -- not a hole
   AND Starts.x - 1 IN (SELECT hole FROM Holes) -- on a left cusp
   AND Ends.y + 1 IN (SELECT hole FROM Holes) -- on a right cusp
 GROUP BY Starts.x, Ends.y
HAVING COUNT(DISTINCT seq_nbr) = Ends.y - Starts.x + 1; -- no holes
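A minimal sketch of the hole-based query, run against an in-memory SQLite database through Python's sqlite3 module. The three sample Intervals rows and the 1-to-20 Sequence table are assumptions made for illustration. The view is also adapted to SQLite's compound-select syntax (SELECT 0 in place of VALUES (0), no parentheses around the last branch, column named with AS); the logic is otherwise the one described in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sequence (seq_nbr INTEGER NOT NULL PRIMARY KEY)")
conn.execute("CREATE TABLE Intervals (x INTEGER NOT NULL, y INTEGER NOT NULL,"
             " PRIMARY KEY (x, y))")
conn.executemany("INSERT INTO Sequence VALUES (?)", [(n,) for n in range(1, 21)])
# Sample data (assumed): [1,3] and [2,5] overlap; [7,9] stands alone.
conn.executemany("INSERT INTO Intervals VALUES (?, ?)", [(1, 3), (2, 5), (7, 9)])

conn.execute("""
CREATE VIEW Holes AS
SELECT seq_nbr AS hole
  FROM Sequence
 WHERE seq_nbr <= (SELECT MAX(y) FROM Intervals)
   AND NOT EXISTS
       (SELECT * FROM Intervals WHERE seq_nbr BETWEEN x AND y)
UNION SELECT 0                           -- left sentinel value
UNION SELECT MAX(y) + 1 FROM Intervals   -- right sentinel value
""")

holes = [row[0] for row in conn.execute("SELECT hole FROM Holes ORDER BY hole")]

coverings = conn.execute("""
SELECT Starts.x, Ends.y
  FROM Intervals AS Starts, Intervals AS Ends, Sequence AS S
 WHERE S.seq_nbr BETWEEN Starts.x AND Ends.y
   AND S.seq_nbr < (SELECT MAX(hole) FROM Holes)
   AND S.seq_nbr NOT IN (SELECT hole FROM Holes)   -- not a hole
   AND Starts.x - 1 IN (SELECT hole FROM Holes)    -- on a left cusp
   AND Ends.y + 1 IN (SELECT hole FROM Holes)      -- on a right cusp
 GROUP BY Starts.x, Ends.y
HAVING COUNT(DISTINCT seq_nbr) = Ends.y - Starts.x + 1  -- no holes
 ORDER BY Starts.x
""").fetchall()

print(holes)      # the gap at 6 plus the two sentinels
print(coverings)  # one row per maximal covered range
```

With this sample data the only interior hole is 6, so the candidate range from 1 to 9 is rejected by the HAVING clause, while 1 to 5 and 7 to 9 survive.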


CHAPTER 25

Arrays in SQL

Arrays cannot be represented directly in SQL-92, but they are a common vendor language extension that became part of SQL-99. Arrays violate the rules of First Normal Form (1NF) required for a relational database, which say that the tables have no repeating groups in any column. A repeating group is a data structure that is not scalar; examples of repeating groups include linked lists, arrays, records, and even tables within a column.

The reason they are not allowed is that a repeating group would have to define a column like a data type. There is no obvious way to JOIN a column that contains an array to other columns, since there are no comparison operators or conversion rules. There is no obvious way to display or transmit a column that contains an array as a result set. Different languages and different compilers for the same language store arrays in column-major or row-major order, so there is no standard. There is no obvious way to write constraints on nonscalar values.

The goal of SQL was to be a database language that would operate with a wide range of host languages. To meet that goal, the scalar data types are as varied as possible to match the host language data types, but as simple in structure as they can be to make the transfer of data to the host language as easy as possible. The extensions after SQL-92 ruin all of these advantages, so it is a good thing they are not widely implemented in products.


25.1 Arrays via Named Columns

An array in other programming languages has a name and subscripts by which the array elements are referenced. The array elements are all of the same data type, and the subscripts are all sequential integers. Some languages start numbering at zero, some start numbering at one, and some let the user set the upper and lower bounds. For example, a Pascal array declaration would look like this:

foobar : ARRAY [1..5] OF INTEGER;

and would have integer elements foobar[1], foobar[2], foobar[3], foobar[4], and foobar[5]. The same structure is most often mapped into an SQL declaration as:

CREATE TABLE Foobar1
(element1 INTEGER NOT NULL,
 element2 INTEGER NOT NULL,
 element3 INTEGER NOT NULL,
 element4 INTEGER NOT NULL,
 element5 INTEGER NOT NULL);

The elements cannot be accessed by the use of a subscript in this table, as they can in a true array. That is, to set the array elements equal to zero in Pascal takes one statement with a FOR loop in it:

FOR i := 1 TO 5 DO foobar[i] := 0;

The same action in SQL would be performed with the following statement:

UPDATE Foobar1 SET element1 = 0, element2 = 0, element3 = 0, element4 = 0, element5 = 0;

This is because there is no subscript that can be iterated in a loop. Any access must be based on column names, not on subscripts. These pseudosubscripts lead to building column names on the fly in dynamic SQL, giving code that is both slow and dangerous. Even worse, some users will use the same approach in table names, and destroy their logical data model.
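To make the pseudosubscript problem concrete, here is a small sketch using Python's sqlite3 module and the Foobar1 table. The helper name set_element and the sample values are hypothetical. A column name cannot be passed as a bound parameter, so the helper has to splice the subscript into the statement text, which is the kind of dynamic SQL the text warns against; the hand-written range check is the only guard against a bad or malicious subscript.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Foobar1
(element1 INTEGER NOT NULL,
 element2 INTEGER NOT NULL,
 element3 INTEGER NOT NULL,
 element4 INTEGER NOT NULL,
 element5 INTEGER NOT NULL)""")
conn.execute("INSERT INTO Foobar1 VALUES (10, 20, 30, 40, 50)")  # sample row


def set_element(conn, i, value):
    """Hypothetical helper that emulates foobar[i] := value on Foobar1."""
    # The subscript has to be validated in application code; the DBMS
    # has no idea that 'element' followed by a digit is meant as an index.
    if type(i) is not int or not 1 <= i <= 5:
        raise ValueError(f"subscript out of range: {i!r}")
    # The column name is built on the fly (dynamic SQL);
    # only the value itself can be a bound parameter.
    conn.execute(f"UPDATE Foobar1 SET element{i} = ?", (value,))


set_element(conn, 3, 0)
row = conn.execute("SELECT * FROM Foobar1").fetchone()
print(row)
```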

Let’s assume that we design an Employee table with separate columns for the names of four children, and we start with an empty table and then try to use it.

1. What happens if we hire a man with fewer than four children?

We can fire him immediately or make him have more children. We can restructure the table to allow for fewer children. The usual, and less drastic, solution is to put NULLs in the columns for the nonexistent children. We then have all of the problems associated with NULLs to handle.

2. What happens if we hire a man with five children?

We can fire him immediately or order him to kill one of his children. We can restructure the table to allow five children. We can add a second row to hold the information on children 5 through 8; however, this destroys the uniqueness of the emp_id, so it cannot be used as a key. We can overcome that problem by adding a new column for record number, which will form a two-column key with the emp_id. This leads to needless duplication in the table.

3. What happens if the employee dies?

We will delete all his children’s data along with his, even if the company owes benefits to the survivors.

4. What happens if the child of an employee dies?

We can fire him or order him to get another child immediately. We can restructure the table to allow only three children. We can overwrite the child’s data with NULLs and get all of the problems associated with NULL values. This one is the most common decision. But what if we had used the multiple-row trick and this employee had a fifth child? Should that child be brought up into the vacant slot in the current row, and the second row of the set be deleted?

5. What happens if the employee replaces a dead child with a new one?

Should the new child’s data overwrite the NULLs in the dead child’s data? Should the new child’s data be put in the next available slot and overwrite the NULLs in those columns?

Some of these choices involve rebuilding the database. Others are simply absurd attempts to restructure reality to fit the database. The real point is that each insertion or deletion of a child involves a different procedure, depending on the size of the group to which he belongs. File systems had variant records that could change the size of their repeating groups.

Consider, instead, a table of employees and another table for their children:

CREATE TABLE Employees
(emp_id INTEGER NOT NULL PRIMARY KEY,
 emp_name CHAR(30) NOT NULL,
 ...);

CREATE TABLE Children
(emp_id INTEGER NOT NULL
   REFERENCES Employees(emp_id)
   ON UPDATE CASCADE,
 child_name CHAR(30) NOT NULL,
 PRIMARY KEY (emp_id, child_name),
 birthday DATE NOT NULL,
 sex CHAR(1) NOT NULL);

To add a child, you insert a row into Children. To remove a child, you delete a row from Children. There is nothing special about the fourth or fifth child that requires the database system to use special procedures. There are no NULLs in either table.

The trade-off is that the number of tables in the database schema increases, but the total amount of storage used will be smaller, because you will keep data only on children who exist, rather than using NULLs to hold space. The goal is to have data in the simplest possible format, so any host program can use it.
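The two-table design can be exercised with a short sketch in Python's sqlite3 module. The employee names, child names, and birthdays are made-up sample data, and the Employees table is reduced to the two columns shown in the text. The point is only that the fifth child and the removed child need no special handling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only on request
conn.executescript("""
CREATE TABLE Employees
(emp_id INTEGER NOT NULL PRIMARY KEY,
 emp_name CHAR(30) NOT NULL);

CREATE TABLE Children
(emp_id INTEGER NOT NULL
   REFERENCES Employees(emp_id)
   ON UPDATE CASCADE,
 child_name CHAR(30) NOT NULL,
 birthday DATE NOT NULL,
 sex CHAR(1) NOT NULL,
 PRIMARY KEY (emp_id, child_name));
""")

# Sample data (assumed): one employee with five children, one with none.
conn.executemany("INSERT INTO Employees VALUES (?, ?)",
                 [(1, "Smith"), (2, "Jones")])
conn.executemany("INSERT INTO Children VALUES (1, ?, ?, ?)",
                 [("Ann", "2001-03-04", "F"),
                  ("Bob", "2003-05-06", "M"),
                  ("Cal", "2005-07-08", "M"),
                  ("Dee", "2007-09-10", "F"),
                  ("Eve", "2009-11-12", "F")])  # the fifth child is just another row

# Removing a child is a plain DELETE; no slots to shift, no NULLs to write.
conn.execute("DELETE FROM Children WHERE emp_id = 1 AND child_name = 'Cal'")

counts = conn.execute("""
SELECT E.emp_id, COUNT(C.child_name)
  FROM Employees AS E
       LEFT OUTER JOIN Children AS C
       ON C.emp_id = E.emp_id
 GROUP BY E.emp_id
 ORDER BY E.emp_id
""").fetchall()
print(counts)
```

The childless employee shows up with a count of zero through the outer join, without a single NULL being stored in either base table.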

Gabrielle Wiorkowski, in her excellent DB2 classes, uses an example of a table for tracking the sales made by salespersons during the past year. That table could be defined as:


CREATE TABLE AnnualSales1
(salesman CHAR(15) NOT NULL PRIMARY KEY,
 jan DECIMAL(5, 2),
 feb DECIMAL(5, 2),
 mar DECIMAL(5, 2),
 apr DECIMAL(5, 2),
 may DECIMAL(5, 2),
 jun DECIMAL(5, 2),
 jul DECIMAL(5, 2),
 aug DECIMAL(5, 2),
 sep DECIMAL(5, 2),
 oct DECIMAL(5, 2),
 nov DECIMAL(5, 2),
 "dec" DECIMAL(5, 2) -- DEC[IMAL] is a reserved word
);

We have to allow for NULLs in the monthly sales_amts in the first version of the table, but the table is actually quite a bit smaller than it would be if we were to declare it as:

CREATE TABLE AnnualSales2
(salesman CHAR(15) NOT NULL,
 sale_month CHAR(3) NOT NULL
   CONSTRAINT valid_month_abbrev
   CHECK (sale_month IN ('Jan', 'Feb', 'Mar', 'Apr',
                         'May', 'Jun', 'Jul', 'Aug',
                         'Sep', 'Oct', 'Nov', 'Dec')),
 sales_amt DECIMAL(5, 2) NOT NULL,
 PRIMARY KEY (salesman, sale_month));

In Wiorkowski’s actual example in DB2, the break-even point for DASD storage was April; that is, the storage required for AnnualSales1 and AnnualSales2 is about the same in April of the given year.

Queries that deal with individual salespersons will run much faster against the AnnualSales1 table than queries based on the AnnualSales2 table, because all the data is in one row in the AnnualSales1 table. These tables may be a bit messy and they may require function calls to handle possible NULL values, but they are not very complex.

The only reason for using AnnualSales1 is that you have a data warehouse and all you want to see is summary information, grouped into years. This design is not acceptable in an OLTP system.
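The "function calls to handle possible NULL values" can be illustrated with a sketch in Python's sqlite3 module. The salesman 'Able' and his two monthly amounts are assumed sample data, and AnnualSales2 is declared here with a single two-column key. The sketch compares a year-to-date total computed from each design.

```python
import sqlite3

MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", '"dec"']  # "dec" stays quoted

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE AnnualSales1 (salesman CHAR(15) NOT NULL PRIMARY KEY, "
    + ", ".join(f"{m} DECIMAL(5, 2)" for m in MONTHS) + ")")
conn.execute("""
CREATE TABLE AnnualSales2
(salesman CHAR(15) NOT NULL,
 sale_month CHAR(3) NOT NULL
   CONSTRAINT valid_month_abbrev
   CHECK (sale_month IN ('Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                         'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec')),
 sales_amt DECIMAL(5, 2) NOT NULL,
 PRIMARY KEY (salesman, sale_month))""")

# Sample data (assumed): sales recorded for January and February only.
conn.execute("INSERT INTO AnnualSales1 (salesman, jan, feb) "
             "VALUES ('Able', 100.00, 200.50)")
conn.executemany("INSERT INTO AnnualSales2 VALUES (?, ?, ?)",
                 [("Able", "Jan", 100.00), ("Able", "Feb", 200.50)])

# One NULL month makes a plain sum of columns NULL.
naive = conn.execute("SELECT jan + feb + mar FROM AnnualSales1").fetchone()[0]

# Every column has to be wrapped in COALESCE to get a usable total.
coalesced = " + ".join(f"COALESCE({m}, 0)" for m in MONTHS)
total1 = conn.execute(
    f"SELECT {coalesced} FROM AnnualSales1 WHERE salesman = 'Able'"
).fetchone()[0]

# The row-per-month design needs only an aggregate.
total2 = conn.execute(
    "SELECT SUM(sales_amt) FROM AnnualSales2 WHERE salesman = 'Able'"
).fetchone()[0]
print(naive, total1, total2)
```

Note that the column list for AnnualSales1 had to be generated in the host program, which is the same name-building habit that named-column arrays encourage.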


25.2 Arrays via Subscript Columns

Another approach to faking a multidimensional array is to map arrays into a table with an integer column for each subscript, thus:

CREATE TABLE Foobar
(i INTEGER NOT NULL PRIMARY KEY
   CONSTRAINT valid_array_index
   CHECK (i BETWEEN 1 AND 5),
 element REAL NOT NULL);

This looks more complex than the first approach, but it is closer to what the original Pascal declaration was doing behind the scenes. Subscripts resolve to unique physical addresses, so it is not possible to have two values for foobar[i]; hence, i is a key. The Pascal compiler will check to see that the subscripts are within the declared range; hence the CHECK() clause.
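A brief sketch, again in Python's sqlite3 module, of how the subscript-column table behaves. The initial element values are arbitrary sample data. It shows the Pascal FOR loop collapsing into one set-oriented UPDATE, and the key and CHECK() constraints doing the work of the compiler's address and range checks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Foobar
(i INTEGER NOT NULL PRIMARY KEY
   CONSTRAINT valid_array_index
   CHECK (i BETWEEN 1 AND 5),
 element REAL NOT NULL)""")
conn.executemany("INSERT INTO Foobar (i, element) VALUES (?, ?)",
                 [(i, i * 1.5) for i in range(1, 6)])  # sample values

# Equivalent of: FOR i := 1 TO 5 DO foobar[i] := 0;
conn.execute("UPDATE Foobar SET element = 0")
elements = [row[0] for row in
            conn.execute("SELECT element FROM Foobar ORDER BY i")]

# foobar[6] is outside the declared range, so the CHECK() rejects it.
try:
    conn.execute("INSERT INTO Foobar (i, element) VALUES (6, 0)")
    out_of_range_rejected = False
except sqlite3.IntegrityError:
    out_of_range_rejected = True

# A second value for foobar[3] violates the key.
try:
    conn.execute("INSERT INTO Foobar (i, element) VALUES (3, 1.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(elements, out_of_range_rejected, duplicate_rejected)
```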

The first advantage of this approach is that multidimensional arrays are easily handled by adding another column for each subscript. The Pascal declaration:

ThreeD : ARRAY [1..3, 1..4, 1..5] OF REAL;

is mapped over to:

CREATE TABLE ThreeD
(i INTEGER NOT NULL
   CONSTRAINT valid_i CHECK (i BETWEEN 1 AND 3),
 j INTEGER NOT NULL
   CONSTRAINT valid_j CHECK (j BETWEEN 1 AND 4),
 k INTEGER NOT NULL
   CONSTRAINT valid_k CHECK (k BETWEEN 1 AND 5),
 element REAL NOT NULL,
 PRIMARY KEY (i, j, k));

Obviously, SELECT statements with GROUP BY clauses on the subscript columns will produce row and column totals, thus:


SELECT i, j, SUM(element) -- sum across the k columns
  FROM ThreeD
 GROUP BY i, j;

SELECT i, SUM(element) -- sum across the j and k columns
  FROM ThreeD
 GROUP BY i;

SELECT SUM(element) -- sum the entire array
  FROM ThreeD;
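The three projections can be checked with a small sketch in Python's sqlite3 module. As an assumption for illustration, every cell of the 3 by 4 by 5 array is filled with 1.0, so each sum simply counts the cells it covers.

```python
import sqlite3
from itertools import product

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE ThreeD
(i INTEGER NOT NULL
   CONSTRAINT valid_i CHECK (i BETWEEN 1 AND 3),
 j INTEGER NOT NULL
   CONSTRAINT valid_j CHECK (j BETWEEN 1 AND 4),
 k INTEGER NOT NULL
   CONSTRAINT valid_k CHECK (k BETWEEN 1 AND 5),
 element REAL NOT NULL,
 PRIMARY KEY (i, j, k))""")

# Sample data (assumed): one row per cell, each holding 1.0.
cells = [(i, j, k, 1.0)
         for i, j, k in product(range(1, 4), range(1, 5), range(1, 6))]
conn.executemany(
    "INSERT INTO ThreeD (i, j, k, element) VALUES (?, ?, ?, ?)", cells)

# Sum across the k subscript.
by_ij = conn.execute(
    "SELECT i, j, SUM(element) FROM ThreeD GROUP BY i, j ORDER BY i, j"
).fetchall()

# Sum across the j and k subscripts.
by_i = conn.execute(
    "SELECT i, SUM(element) FROM ThreeD GROUP BY i ORDER BY i"
).fetchall()

# Sum the entire array.
total = conn.execute("SELECT SUM(element) FROM ThreeD").fetchone()[0]

print(len(by_ij), by_i, total)
```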

If the original one element/one column approach were used, the table declaration would have 60 columns (3 × 4 × 5), named element_111 through element_345. There are too many names in this example to handle in any reasonable way; you would not be able to use the GROUP BY clauses for array projection, either.

Another advantage of this approach is that the subscripts can be data types other than integers. DATE and TIME data types are often useful, but CHARACTER and approximate numerics have their uses too.
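As a sketch of a non-integer subscript, the following uses a DATE column as the index of a one-dimensional array, again through Python's sqlite3 module. The table name DailyReadings, its columns, and the values are hypothetical. SQLite keeps these dates as ISO 8601 text, which compares in calendar order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE DailyReadings          -- hypothetical table
(reading_date DATE NOT NULL PRIMARY KEY,
 element REAL NOT NULL)""")
conn.executemany("INSERT INTO DailyReadings VALUES (?, ?)",
                 [("2024-01-01", 1.5),
                  ("2024-01-02", 2.5),
                  ("2024-01-03", 4.0)])   # sample values

# "Subscript" lookup: the element for one date.
one = conn.execute(
    "SELECT element FROM DailyReadings WHERE reading_date = '2024-01-02'"
).fetchone()[0]

# Array slice: total over a range of dates.
slice_sum = conn.execute("""
SELECT SUM(element)
  FROM DailyReadings
 WHERE reading_date BETWEEN '2024-01-02' AND '2024-01-03'
""").fetchone()[0]
print(one, slice_sum)
```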

25.3 Matrix Operations in SQL

A matrix is not quite the same thing as an array. Matrices are mathematical structures with particular properties. We cannot take the time to discuss them here; you can find the necessary information in a college freshman algebra book. Though it is possible to do many matrix operations in SQL, it is not a good idea; such queries and operations will eat up resources and run much too long. SQL was never meant to be a language for calculations.

Let us assume that we have two-dimensional arrays that are declared as tables using two columns for subscripts, and that all columns are declared with a NOT NULL constraint.

The presence of NULLs is not defined in linear algebra, and I have no desire to invent a three-valued linear algebra of my own. Another problem is that a matrix has rows and columns that are not the same as the rows and columns of an SQL table; as you read the rest of this section, be careful not to confuse the two.

CREATE TABLE MyMatrix
(element INTEGER NOT NULL, -- could be any numeric data type
 i INTEGER NOT NULL CHECK (i > 0),
