Advanced SQL Functions in Oracle 10g, Part 4 (PPT)


The value of nr here is 20 (20 rows).

Row by row, the CUME_DIST calculation is:

CNAME TEMP RANK rownum cr calculation CD - - - - - -

The PERCENT_RANK and CUME_DIST functions are very specialized and far less common than RANK or ROW_NUMBER. Also, in our examples we have depicted only one grouping, that is, one partition. A PARTITION BY clause may be added to the analytic clause of the function, and sub-groupings with their own PERCENT_RANK and CUME_DIST values may also be reported.
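The two calculations can be sketched outside the database. The snippet below is a minimal illustration in Python, assuming the usual definitions: PERCENT_RANK is (rank - 1)/(nr - 1), and CUME_DIST is the number of rows whose value is less than or equal to the current one, divided by nr. The temperature values and the function name are hypothetical, not taken from the tables above.

```python
def percent_rank_and_cume_dist(values):
    """Return (value, rank, percent_rank, cume_dist) per row, in ascending order."""
    ordered = sorted(values)
    nr = len(ordered)  # number of rows in the (single) partition
    rows = []
    for v in ordered:
        # RANK(): one more than the number of rows strictly below; ties share a rank
        rank = 1 + sum(1 for other in ordered if other < v)
        pr = (rank - 1) / (nr - 1) if nr > 1 else 0.0
        # CUME_DIST: rows at or below the current value, as a fraction of nr
        cd = sum(1 for other in ordered if other <= v) / nr
        rows.append((v, rank, pr, cd))
    return rows


temps = [50, 60, 60, 70, 80]  # hypothetical data
result = percent_rank_and_cume_dist(temps)
for row in result:
    print(row)
```

Note how the two tied rows share both a rank and a cumulative distribution value, while the lowest row has a PERCENT_RANK of 0 but a CUME_DIST of 1/nr.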


For example, using our Employee table with PERCENT_RANK and CUME_DIST:

SELECT empno, ename, region,
  RANK() OVER(PARTITION BY region ORDER BY curr_salary)

of PERCENT_RANK and CUME_DIST as per the previous algorithms.


“SQL for Analysis in Data Warehouses,” Oracle Corporation, Redwood Shores, CA, Oracle9i Data Warehousing Guide, Release 2 (9.2), Part Number A96520-01.

For an excellent discussion of how Oracle 10g has improved querying, see “DSS Performance in Oracle Database 10g,” an Oracle white paper, September 2003. This article shows how the Optimizer has been improved in 10g.


Chapter 4

Aggregate Functions Used as Analytical Functions (Analytical Functions II)

The Use of Aggregate Functions in SQL

Many of the common aggregate functions can be used as analytical functions: SUM, AVG, COUNT, STDDEV, VARIANCE, MAX, and MIN. The aggregate functions used as analytical functions offer the advantage of partitioning and ordering as well. As an example, say you want to display each person’s employee number, name, original salary, and the average salary of all employees. This cannot be done with a query like the following because you cannot mix aggregates and row-level results.


SELECT empno, ename, orig_salary, AVG(orig_salary)
FROM employee
ORDER BY ename

Gives:

SELECT empno, ename, orig_salary,
       *
ERROR at line 1:
ORA-00937: not a single-group group function

But we can use a Cartesian product/virtual table like this:

SELECT e.empno, e.ename, e.orig_salary, x.aos "Avg salary"
FROM employee e,
  (SELECT AVG(orig_salary) aos FROM employee) x
ORDER BY ename

This type of query is borderline cumbersome and may be done far more easily using AVG in an analytical function:


AVG(orig_salary) OVER() "Avg salary"

This display looks off-balance due to the decimal points in the average salary. We can modify the displayed result using the analytical function nested inside an ordinary row-level function; a better version of the query with a ROUND function added would be:

ROUND(AVG(orig_salary) OVER()) "Avg salary"


The aggregate/analytical function uses an argument to specify which column is aggregated/analyzed (orig_salary). It should also be noted that there is a null OVER clause. When the OVER clause is null, as it is here, the function is said to be a reporting function and applies to the entire dataset.
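To see that the reporting form and the Cartesian-product form return the same rows, both can be run side by side. The sketch below uses Python's sqlite3 module as a stand-in, on the assumption that the bundled SQLite is version 3.25 or later (the first release with window functions); it is not Oracle, and the three employee rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (empno INTEGER, ename TEXT, orig_salary REAL)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(101, "John", 35000), (102, "Stephanie", 35000), (104, "Christina", 43000)],
)

# Reporting function: the empty OVER () makes the whole result set the window.
analytic = conn.execute(
    """SELECT empno, ename, orig_salary,
              ROUND(AVG(orig_salary) OVER ()) AS avg_salary
       FROM employee
       ORDER BY ename"""
).fetchall()

# Equivalent Cartesian product with a one-row inline view.
joined = conn.execute(
    """SELECT e.empno, e.ename, e.orig_salary, ROUND(x.aos) AS avg_salary
       FROM employee e, (SELECT AVG(orig_salary) AS aos FROM employee) x
       ORDER BY e.ename"""
).fetchall()

for row in analytic:
    print(row)
```

Every row carries the same overall average, which is what distinguishes a reporting function from a windowed one.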

We can use partitioning in the OVER clause of the aggregate-analytical function like this:

SELECT empno, ename, orig_salary, region,
  ROUND(AVG(orig_salary) OVER(PARTITION BY region)) "Avg Salary"
FROM employee
ORDER BY region, ename

In this version of the query, we now have the average by region reported along with the other ordinary row data for an individual.

The result of the row-level reporting may be used in arithmetic in the result set. Suppose we wanted to see the difference between a person’s salary and the average for his or her region. This example shows that query:


SELECT empno, ename, region, curr_salary, orig_salary,
  ROUND(AVG(orig_salary) OVER(PARTITION BY region)) "Avg-group",
  ROUND(orig_salary - AVG(orig_salary) OVER(PARTITION BY region)) "Diff."
FROM employee
ORDER BY region, ename
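The partitioned average and the per-row difference can be tried on a small table. This is a sketch using Python's sqlite3 module (assuming SQLite 3.25 or later) rather than Oracle; the names, regions, and salaries are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (ename TEXT, region TEXT, orig_salary REAL)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [
        ("Ann", "E", 30000), ("Bob", "E", 40000),                      # region E
        ("Cal", "W", 20000), ("Dee", "W", 50000), ("Eve", "W", 80000), # region W
    ],
)

# The same partitioned average is used twice: once reported, once in arithmetic.
rows = conn.execute(
    """SELECT ename, region, orig_salary,
              AVG(orig_salary) OVER (PARTITION BY region) AS avg_group,
              orig_salary - AVG(orig_salary) OVER (PARTITION BY region) AS diff
       FROM employee
       ORDER BY region, ename"""
).fetchall()

for row in rows:
    print(row)
```

Within each region the differences sum to zero, which is a quick sanity check that the average was taken over the partition and not over the whole table.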

COLUMN portion FORMAT 99.9999
SELECT ename, curr_salary,
  curr_salary/SUM(curr_salary) OVER() Portion
FROM employee
ORDER BY curr_salary


Notice that the PORTION column adds up to 100%:

COLUMN total FORMAT 9.9999
SELECT sum(o.portion) Total
FROM
  (SELECT i.ename, i.curr_salary,
     i.curr_salary/SUM(i.curr_salary) OVER() Portion
   FROM employee i
   ORDER BY i.curr_salary) o

Gives:

  TOTAL
-------
 1.0000

The above query showing the fraction of salary apportioned to each individual can be done in one step with an analytical function called RATIO_TO_REPORT, which is used like this:

COLUMN portion2 LIKE portion
SELECT ename, curr_salary,
  curr_salary/SUM(curr_salary) OVER() Portion,
  RATIO_TO_REPORT(curr_salary) OVER() Portion2
FROM employee
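The arithmetic behind both columns is the same: each value divided by the total of the reporting window. A minimal Python sketch, with hypothetical salaries and a helper named after the Oracle function purely for readability:

```python
def ratio_to_report(values):
    """Each value as a fraction of the sum of all values (the OVER() case)."""
    total = sum(values)
    return [v / total for v in values]


salaries = [20000, 30000, 50000]  # hypothetical curr_salary values
portions = ratio_to_report(salaries)
print(portions)
print(sum(portions))
```

Because every portion shares the same denominator, the portions necessarily add up to 1 (up to floating-point rounding).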


SELECT ename, curr_salary, region,
  curr_salary/SUM(curr_salary) OVER(PARTITION BY Region) Portion,
  RATIO_TO_REPORT(curr_salary) OVER(PARTITION BY Region) Portion2


Notice that the portion amounts add to 1.000 in each region:

SELECT ename, curr_salary, region,
  curr_salary/SUM(curr_salary) OVER(PARTITION BY Region) Portion,
  RATIO_TO_REPORT(curr_salary) OVER(PARTITION BY Region) Portion2
FROM employee
UNION
SELECT null, TO_NUMBER(null), region, sum(P1), sum(p2)
FROM
  (SELECT ename, curr_salary, region,
     curr_salary/SUM(curr_salary) OVER(PARTITION BY Region) P1,
     RATIO_TO_REPORT(curr_salary) OVER(PARTITION BY Region) P2
   FROM employee)
GROUP BY region
ORDER BY 3,2


A similar report can be had without the UNION workaround with the following SQL*Plus formatting commands included in a script:

BREAK ON region
COMPUTE sum of portion ON region
SELECT ename, curr_salary, region,
  curr_salary/SUM(curr_salary) OVER(PARTITION BY Region) Portion,

Christina 55000 302197802 302197802

******


Windowing Subclauses with Physical Offsets in Aggregate Analytical Functions

A windowing subclause is a way of capturing several rows of a result set (i.e., a “window”) and reporting the result in one “window row.” An example of this technique would be in applications where one wants to smooth data by finding a moving average. Moving averages are most often calculated based on sorted data and on a physical offset of rows. Once we have established how the physical (row) offsets function, we will explore logical (range) offsets. To illustrate the moving average using physical offsets, suppose we have some observations that have these values:

Suppose further we know that the data is noisy; that is, it contains a random factor that is added or subtracted from what we might consider a “true” value. One way to smooth out the data and remove some of the random noise is to use a moving average on ordered data by taking an average using n physical rows above and below each row. A moving average will operate in a window so that if the moving average is based on, say, three numbers (n = 3), the windows and their reported window rows would be:


These calculations result in this display of the data:

Time Value Moving Average

of readings with only the “inside” numbers smoothed.
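A by-hand moving average of this kind, where only the “inside” readings are smoothed, can be sketched in a few lines of Python. The readings below are hypothetical, the function name is an invention for this illustration, and the window size n is assumed to be odd so that the window is centered on the current row.

```python
def centered_moving_average(values, n=3):
    """Average each value with its neighbours in a window of n rows.

    End points that lack a full window are reported as None (left unsmoothed).
    """
    half = n // 2
    out = []
    for i in range(len(values)):
        if i < half or i + half >= len(values):
            out.append(None)  # not enough rows on one side of the current row
        else:
            window = values[i - half : i + half + 1]
            out.append(sum(window) / n)
    return out


readings = [12, 10, 14, 9, 7]  # hypothetical noisy observations, ordered by time
smoothed = centered_moving_average(readings)
print(smoothed)
```

The first and last readings have no smoothed value here, which is the behavior that Oracle's windowed aggregates do not share.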

In Oracle’s analytical functions, the way the aggregate functions work is that the end points are reported, but they are based on averages that include nulls in rows preceding and past the data points. In Oracle, nulls in calculations involving aggregate functions are ignored. Consider, for example, this query:

SELECT ename, curr_salary
FROM empwnulls
UNION
SELECT 'The average ', average
FROM
  (SELECT avg(curr_salary) average
   FROM empwnulls)
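The effect of nulls on an average can be checked quickly. The sketch below uses Python's sqlite3 module, which follows the same standard-SQL rule of skipping nulls in aggregates; it is a stand-in for Oracle, and the four rows of empwnulls are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE empwnulls (ename TEXT, curr_salary REAL)")
conn.executemany(
    "INSERT INTO empwnulls VALUES (?, ?)",
    [("Ann", 40000), ("Bob", None), ("Cal", 60000), ("Dee", None)],
)

# COUNT(column) counts only non-null values; COUNT(*) counts every row.
avg_salary, counted, total_rows = conn.execute(
    "SELECT AVG(curr_salary), COUNT(curr_salary), COUNT(*) FROM empwnulls"
).fetchone()

print(avg_salary, counted, total_rows)
```

The average is taken over the two non-null salaries only, so the divisor is 2 and not 4.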

Returning to our simple example and the moving averages we have computed thus far:

Time Value Moving Average


The end points would be calculated as follows:

AVG(attribute1) OVER (ORDER BY attribute2
  ROWS BETWEEN x PRECEDING AND y FOLLOWING)

where attribute1 and attribute2 do not have to be the same attribute. Attribute2 defines the ordering of the window, and attribute1 defines the value on which to operate. The designation of “ROWS” means we will use a physical offset. The x and y values are the row limits: the number of physical rows preceding and following the current row that are included in the window. (Later, we will look at another way to do these problems using a logical offset, RANGE, instead of ROWS.)
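How the x and y limits select rows, including at the end points, can be sketched in Python. The helper names and the five values are hypothetical, and the list is assumed to be already sorted by the ordering attribute.

```python
def rows_between(values, x, y):
    """Yield (current, window) for ROWS BETWEEN x PRECEDING AND y FOLLOWING.

    The window is clipped at the first and last rows, so end points
    simply see fewer rows.
    """
    for i, current in enumerate(values):
        start = max(0, i - x)              # clip at the first row
        end = min(len(values), i + y + 1)  # clip at the last row
        yield current, values[start:end]


def moving_average(values, x, y):
    """Average over whatever rows actually fall inside each window."""
    return [sum(window) / len(window) for _, window in rows_between(values, x, y)]


mvalues = [12, 10, 14, 9, 7]  # hypothetical values, already ordered
windows = [window for _, window in rows_between(mvalues, 1, 1)]
ma = moving_average(mvalues, 1, 1)
print(windows)
print(ma)
```

The first and last windows hold two rows instead of three, so their averages divide by 2.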


The ORDER BY in the analytical clause is absolutely necessary when a windowing subclause is used, and the examples here order on a single attribute. Also, only numeric or date data types would make sense in calculations of aggregates. Here is the above example in SQL using physical offsets for the moving average on a table called Testma:

SELECT * FROM testma;

Which gives:

MTIME MVALUE - -

ORDER BY mtime

Gives:

MTIME MVALUE MA - - -


If the ordering subclause is changed, then the row ordering is done first and then the moving average is computed:

SELECT mtime, mvalue,
  AVG(mvalue) OVER(ORDER BY mvalue
    ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) ma

COLUMN ma FORMAT 99.999
COLUMN sum LIKE ma
COLUMN "sum/3" LIKE ma
SELECT mtime, mvalue,
  AVG(mvalue) OVER(ORDER BY mtime
    ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) ma,
  SUM(mvalue) OVER(ORDER BY mtime
    ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) sum,
  (SUM(mvalue) OVER(ORDER BY mtime
    ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING))/3 "Sum/3"
FROM testma
ORDER BY mtime


Also, we can use the COUNT aggregate analytical function to show how many rows are included in each window like this:

SELECT mtime, mvalue,
  COUNT(mvalue) OVER(ORDER BY mtime
    ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) Howmanyrows
FROM testma
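Putting AVG, SUM divided by 3, and COUNT next to each other shows why the end points differ from the interior rows. The sketch below runs the three expressions through Python's sqlite3 module (assuming SQLite 3.25 or later, as a stand-in for Oracle); the five testma rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE testma (mtime INTEGER, mvalue REAL)")
conn.executemany(
    "INSERT INTO testma VALUES (?, ?)",
    [(0, 12), (1, 10), (2, 14), (3, 9), (4, 7)],  # hypothetical readings
)

rows = conn.execute(
    """SELECT mtime, mvalue,
              AVG(mvalue) OVER (ORDER BY mtime
                  ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS ma,
              SUM(mvalue) OVER (ORDER BY mtime
                  ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) / 3 AS sum_over_3,
              COUNT(mvalue) OVER (ORDER BY mtime
                  ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS howmanyrows
       FROM testma
       ORDER BY mtime"""
).fetchall()

for row in rows:
    print(row)
```

On interior rows the window holds three values, so the average equals the sum divided by 3. On the first and last rows the window holds only two values, so AVG divides by 2 and no longer matches the sum divided by 3.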


An Expanded Example of a Physical Window

We will need some additional data to look at more examples of windowing functions. Let us consider the following data of some fictitious stock whose symbol is FROG:

COLUMN price FORMAT 9999.99
SELECT *
FROM stock
WHERE symb like 'FR%'
ORDER BY symb desc, dte

Which gives:

SYMB DTE          PRICE
---- --------- --------
FROG 06-JAN-06    63.13
FROG 09-JAN-06    63.52
FROG 10-JAN-06    64.30
FROG 11-JAN-06    65.11
FROG 12-JAN-06    65.07
FROG 13-JAN-06    65.67
FROG 16-JAN-06    65.60
FROG 17-JAN-06    65.99
FROG 18-JAN-06    66.11
FROG 19-JAN-06    66.26
FROG 20-JAN-06    67.03
FROG 23-JAN-06    67.51
FROG 24-JAN-06    67.23
FROG 25-JAN-06    67.43
FROG 26-JAN-06    67.27
FROG 27-JAN-06    66.85
FROG 30-JAN-06    66.95
FROG 31-JAN-06    67.82
FROG 01-FEB-06    68.21
FROG 02-FEB-06    68.60
FROG 03-FEB-06    68.76
FROG 06-FEB-06    69.55
FROG 07-FEB-06    69.89
FROG 08-FEB-06    70.18
FROG 09-FEB-06    70.18

28 rows selected

To see how the moving average window can expand, we can change the clause ROWS BETWEEN x PRECEDING AND y FOLLOWING to have different values for x and y. In fact, x and y do not have to be the same value at all. For example, suppose we let x = 3 and y = 1, which gives more weight to three days before the row-window date and less to the one day after. The query and result look like this:

COLUMN ma FORMAT 99.999
SELECT dte, price,
  AVG(price) OVER(ORDER BY dte
    ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) ma
FROM stock
WHERE symb like 'FR%'
ORDER BY dte

Giving:

DTE          PRICE       MA
--------- -------- --------
03-JAN-06    62.45   62.835
04-JAN-06    63.22   62.827
05-JAN-06    62.81   62.903
06-JAN-06    63.13   63.325
09-JAN-06    63.52   63.650
10-JAN-06    64.30   64.015
11-JAN-06    65.11   64.226
12-JAN-06    65.07   64.734
13-JAN-06    65.67   65.150
16-JAN-06    65.60   65.488


The trailing end is done similarly:

02-FEB-06    68.60   68.068
03-FEB-06    68.76   68.588
06-FEB-06    69.55   69.002
07-FEB-06    69.89   69.396   (68.60 + 68.76 + 69.55 + 69.89 + 70.18)/5
08-FEB-06    70.18   69.712   (68.76 + 69.55 + 69.89 + 70.18 + 70.18)/5
09-FEB-06    70.18   69.950   (69.55 + 69.89 + 70.18 + 70.18)/4
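The trailing-end arithmetic can be reproduced directly from the listed prices. This Python sketch takes the last seven FROG prices from the listing (01-FEB-06 through 09-FEB-06) and applies a window of 3 preceding and 1 following; the function name is hypothetical. Only the last three results are compared with the display, since the first rows of this short excerpt lack the earlier prices that the full table would supply.

```python
# Prices for 01-FEB-06 .. 09-FEB-06, copied from the FROG listing
prices = [68.21, 68.60, 68.76, 69.55, 69.89, 70.18, 70.18]


def moving_average(values, preceding, following):
    """AVG over ROWS BETWEEN preceding PRECEDING AND following FOLLOWING."""
    out = []
    for i in range(len(values)):
        # Slicing past the end of the list is harmless, so only the start is clipped.
        window = values[max(0, i - preceding) : i + following + 1]
        out.append(round(sum(window) / len(window), 3))
    return out


ma = moving_average(prices, 3, 1)
print(ma[-3:])  # 07-FEB-06, 08-FEB-06, 09-FEB-06
```

The last row has no following row, so its window shrinks to four prices and the divisor drops from 5 to 4, as in the display.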


We can clarify the demonstration a bit by displaying which rows are used in these moving average calculations with two other analytical functions: FIRST_VALUE and LAST_VALUE. These two functions tell us which rows are used in the calculation of the window function for each row.

COLUMN first FORMAT 9999.99
COLUMN last LIKE first
SELECT dte, price,
  AVG(price) OVER(ORDER BY dte
    ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) ma,
  FIRST_VALUE(price) OVER(ORDER BY dte
    ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) first,
  LAST_VALUE(price) OVER(ORDER BY dte
    ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) last
FROM stock
WHERE symb like 'F%'
ORDER BY dte

Giving:

DTE          PRICE       MA    FIRST     LAST
--------- -------- -------- -------- --------
03-JAN-06    62.45   62.835    62.45    63.22
04-JAN-06    63.22   62.827    62.45    62.81
05-JAN-06    62.81   62.903    62.45    63.13
06-JAN-06    63.13   63.325    63.13    63.52
09-JAN-06    63.52   63.650    63.13    64.30
10-JAN-06    64.30   64.015    63.13    65.11
11-JAN-06    65.11   64.226    63.13    65.07
12-JAN-06    65.07   64.734    63.52    65.67
13-JAN-06    65.67   65.150    64.30    65.60
16-JAN-06    65.60   65.488    65.11    65.99
17-JAN-06    65.99   65.688    65.07    66.11
18-JAN-06    66.11   65.926    65.67    66.26
19-JAN-06    66.26   66.198    65.60    67.03
20-JAN-06    67.03   66.580    65.99    67.51

25-JAN-06    67.43   67.294    67.03    67.27
26-JAN-06    67.27   67.258    67.51    66.85
27-JAN-06    66.85   67.146    67.23    66.95
30-JAN-06    66.95   67.264    67.43    67.82
31-JAN-06    67.82   67.420    67.27    68.21
01-FEB-06    68.21   67.686    66.85    68.60
02-FEB-06    68.60   68.068    66.95    68.76
03-FEB-06    68.76   68.588    67.82    69.55
06-FEB-06    69.55   69.002    68.21    69.89
07-FEB-06    69.89   69.396    68.60    70.18
08-FEB-06    70.18   69.712    68.76    70.18
09-FEB-06    70.18   69.950    69.55    70.18
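The FIRST and LAST columns for the final rows can be reproduced with a small run of the same query shape. This is a sketch using Python's sqlite3 module (assuming SQLite 3.25 or later) instead of Oracle; dates are stored as ISO strings so that text ordering matches date ordering, the aliases first_price and last_price are renamed for this sketch, and only the last seven FROG rows from the listing are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (symb TEXT, dte TEXT, price REAL)")
conn.executemany(
    "INSERT INTO stock VALUES ('FROG', ?, ?)",
    [
        ("2006-02-01", 68.21), ("2006-02-02", 68.60), ("2006-02-03", 68.76),
        ("2006-02-06", 69.55), ("2006-02-07", 69.89), ("2006-02-08", 70.18),
        ("2006-02-09", 70.18),
    ],
)

rows = conn.execute(
    """SELECT dte, price,
              FIRST_VALUE(price) OVER (ORDER BY dte
                  ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS first_price,
              LAST_VALUE(price) OVER (ORDER BY dte
                  ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS last_price
       FROM stock
       WHERE symb LIKE 'F%'
       ORDER BY dte"""
).fetchall()

for row in rows[-3:]:
    print(row)
```

For the final row there is no following row, so LAST_VALUE is the row's own price, while FIRST_VALUE still reaches back three rows.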

Displaying a Running Total Using SUM as an Analytical Function

As we noted earlier, the aggregate function SUM may be used as an analytical function (as may AVG, MAX, MIN, COUNT, STDDEV, and VARIANCE). The SUM function is most easily seen when using a cumulative total calculation. For example, suppose we have the following receipts for a cash register application for several weeks ordered by date and location (DTE, LOCATION):

SELECT * FROM store
ORDER BY dte, location

Giving:

LOCATION DTE         RECEIPTS
-------- --------- ----------
MOBILE   07-JAN-06      724.6
PROVO    07-JAN-06     969.61
MOBILE   08-JAN-06      88.76
PROVO    08-JAN-06     662.45
MOBILE   09-JAN-06     705.47
PROVO    09-JAN-06     928.37
MOBILE   10-JAN-06     217.26
PROVO    10-JAN-06      664.9
MOBILE   11-JAN-06      16.13
PROVO    11-JAN-06     694.51
MOBILE   12-JAN-06     421.59
PROVO    12-JAN-06     413.12
MOBILE   13-JAN-06     403.95
PROVO    13-JAN-06     645.78
MOBILE   14-JAN-06     831.12
PROVO    14-JAN-06     678.41
MOBILE   15-JAN-06     783.57
PROVO    15-JAN-06     491.05
MOBILE   16-JAN-06     878.15
PROVO    16-JAN-06     635.75
MOBILE   17-JAN-06     968.89
PROVO    17-JAN-06     378.25
MOBILE   18-JAN-06        351
PROVO    18-JAN-06     882.51
MOBILE   19-JAN-06     975.73
PROVO    19-JAN-06      24.52
MOBILE   20-JAN-06        191
PROVO    20-JAN-06      542.2
MOBILE   21-JAN-06     462.92
PROVO    21-JAN-06     294.19
MOBILE   22-JAN-06     707.57
PROVO    22-JAN-06     729.92
MOBILE   23-JAN-06     919.61
PROVO    23-JAN-06     272.24
MOBILE   24-JAN-06     217.91
PROVO    24-JAN-06     554.12

Now, suppose we’d like to have a running total of the receipts regardless of the location. One way to obtain this display is to use SUM and a slightly different physical offset. Previously we used this analytical function:


ROWS UNBOUNDED PRECEDING

This means that we will start with the first row and use all rows up to the current row of the window.

COLUMN "Running total" FORMAT 99,999.99
SELECT dte "Date", location, receipts,
  SUM(receipts) OVER(ORDER BY dte
    ROWS BETWEEN UNBOUNDED PRECEDING
    AND CURRENT ROW) "Running total"
FROM store
WHERE dte < '10-Jan-2006'
ORDER BY dte, location


Date      LOCATION   RECEIPTS Running total
--------- -------- ---------- -------------
07-JAN-06 MOBILE        724.6        724.60
07-JAN-06 PROVO        969.61      1,694.21
08-JAN-06 MOBILE        88.76      1,782.97
08-JAN-06 PROVO        662.45      2,445.42
09-JAN-06 MOBILE       705.47      3,150.89
09-JAN-06 PROVO        928.37      4,079.26
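The running total can be reproduced from the first receipts in the store listing. The sketch below uses Python's sqlite3 module (assuming SQLite 3.25 or later) as a stand-in for Oracle, with dates stored as ISO strings. One assumption is added here: the window is ordered by dte and then location, so that the two rows sharing a date accumulate in a fixed order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE store (location TEXT, dte TEXT, receipts REAL)")
conn.executemany(
    "INSERT INTO store VALUES (?, ?, ?)",
    [
        ("MOBILE", "2006-01-07", 724.6), ("PROVO", "2006-01-07", 969.61),
        ("MOBILE", "2006-01-08", 88.76), ("PROVO", "2006-01-08", 662.45),
        ("MOBILE", "2006-01-09", 705.47), ("PROVO", "2006-01-09", 928.37),
        ("MOBILE", "2006-01-10", 217.26),  # filtered out by the WHERE clause
    ],
)

rows = conn.execute(
    """SELECT dte, location, receipts,
              SUM(receipts) OVER (ORDER BY dte, location
                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
       FROM store
       WHERE dte < '2006-01-10'
       ORDER BY dte, location"""
).fetchall()

running = [round(r[3], 2) for r in rows]
print(running)
```

Each row's total is the previous total plus the row's own receipts, and the final value is the grand total of the filtered rows.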

UNBOUNDED FOLLOWING

The clause UNBOUNDED FOLLOWING is used for the end of the window Such a command is used like this:

SELECT dte "Date", location, receipts,
  SUM(receipts) OVER(ORDER BY dte
    ROWS BETWEEN CURRENT ROW
    AND UNBOUNDED FOLLOWING) "Running total"
FROM store
WHERE dte < '10-Jan-2006'
ORDER BY dte, location

Which results in:

Date      LOCATION   RECEIPTS Running total
--------- -------- ---------- -------------
07-JAN-06 MOBILE        724.6       4079.26
07-JAN-06 PROVO        969.61       3354.66
08-JAN-06 MOBILE        88.76       2385.05
08-JAN-06 PROVO        662.45       2296.29
09-JAN-06 MOBILE       705.47       1633.84
09-JAN-06 PROVO        928.37        928.37
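With CURRENT ROW as the start and UNBOUNDED FOLLOWING as the end, each row reports the sum of itself and everything after it. The Python sketch below reproduces the column from the six receipts in the display by accumulating from the last row backwards; the variable names are hypothetical.

```python
from itertools import accumulate

# Receipts in display order (dte, location), copied from the result above
receipts = [724.6, 969.61, 88.76, 662.45, 705.47, 928.37]

# CURRENT ROW .. UNBOUNDED FOLLOWING: cumulative sums taken from the bottom up,
# then flipped back into display order.
remaining = [round(total, 2) for total in accumulate(reversed(receipts))][::-1]
print(remaining)
```

The first row therefore shows the grand total and the last row shows only its own receipts, the mirror image of the UNBOUNDED PRECEDING running total.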

Title	Advanced SQL Functions in Oracle 10g Part 4 PPT
School	Binghamton University
Major	Database Management / SQL
Type	Lecture Presentation
City	Binghamton

Format
Pages	42
File size	602.87 KB