data, but it is not appropriate for all applications. As a general rule, a relational analysis should be the first approach taken when modeling a system. Only if it proves inappropriate should one resort to nonrelational structures. Applications where the relational model has proven highly effective include virtually all Online Transaction Processing (OLTP) systems and Decision Support Systems (DSS). The relational paradigm can be demanding in its hardware requirements and in the skill needed to develop applications around it, but if the data fits, it has proved to be the most versatile model. There can be, for example, problems caused by the need to maintain the indexes that support the links between tables, and by the space needed to keep multiple copies of the indexed data, both in the indexes themselves and in the tables in which the columns reside. Nonetheless, relational design is in most circumstances the optimal model.
A number of software publishers have produced database management systems that conform (with varying degrees of accuracy) to the relational paradigm; Oracle is only one. IBM was perhaps the first company to commit major resources to it, but their product (which later developed into DB2) was not ported to non-IBM platforms for many years. Microsoft’s SQL Server is another relational database that has been limited by the platforms on which it runs. Oracle databases, by contrast, have always been ported to every major platform from the first release. It may be this that gave Oracle the edge in the RDBMS marketplace.
A note on terminology: confusion can arise when discussing relational databases with people used to working with Microsoft products. SQL is a language and SQL Server is a database, but in the Microsoft world, the term SQL is often used to refer to either.
Data Normalization
The process of modeling data into relational tables is known as normalization and can be studied at university level for years. There are commonly said to be three levels of normalization: the first, second, and third normal forms. There are higher levels of normalization: fourth and fifth normal forms are well defined, but any normal data analyst (and certainly any normal human being) will not need to be concerned with them. It is possible for a SQL application to address un-normalized data, but this will usually be inefficient, as that is not what the language is designed to do. In most cases, data stored in a relational database and accessed with SQL should be normalized to the third normal form.
There are often several possible normalized models for an application. It is important to use the most appropriate: if the systems analyst gets this wrong, the implications can be serious for performance, storage needs, and development effort.
As an example of normalization, consider an un-normalized table called BOOKS that stores details of books, authors, and publishers, using the ISBN as the primary key. A primary key is the one attribute (or set of attributes) that can uniquely identify a record. These are two entries:
ISBN   Title                                         Authors                        Publisher
12345  Oracle 11g OCP SQL Fundamentals 1 Exam Guide  John Watson, Roopesh Ramklass  McGraw-Hill, Spear Street, San Francisco, CA 94105
67890  Oracle 11g New Features Exam Guide            Sam Alapati                    McGraw-Hill, Spear Street, San Francisco, CA 94105
Storing the data in this table gives rise to several anomalies. First, here is the insertion anomaly: it is impossible to enter details of authors who are not yet published, because there is no ISBN under which to store them.
SCENARIO & SOLUTION

Scenario: Your organization is designing a new application. Who should be involved?
Solution: Everyone! The project team must involve business analysts (who model the business processes), systems analysts (who model the data), system designers (who decide how to implement the models), developers (you), database administrators, system administrators, and (most importantly) end users.

Scenario: It is possible that relational structures may not be suitable for a particular application. How can this be determined, and what should be done next? Can Oracle help?
Solution: Attempt to normalize the data into two-dimensional tables, linked with one-to-many relationships. If this really cannot be done, consider other paradigms. Oracle may well be able to help. For instance, maps and other geographical data really don’t work relationally. Neither does text data (such as word processing documents). But the Spatial and Text database options can be used for these purposes. There is also the possibility of using user-defined objects to store nontabular data.
Second, a book cannot be deleted without losing the details of the publisher: a deletion anomaly. Third, if a publisher’s address changes, it will be necessary to update the rows for every book that publisher has published: an update anomaly. Furthermore, it will be very difficult to identify every book written by one author. The fact that a book may have several authors means that the “author” field must be multivalued, and a search will have to search all the values. Related to this is the problem of having to restructure the table if a book comes along with more authors than the original design can handle. Also, the storage is very inefficient due to replication of address details across rows, and the possibility of error as this data is repeatedly entered is high. Normalization should solve all these issues.
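The anomalies described above can be reproduced in a few lines. This is a minimal sketch, not Oracle code: it assumes SQLite (through Python's standard sqlite3 module) as a stand-in database, and the lowercase column names and the replacement address are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE books (
        isbn      TEXT PRIMARY KEY,
        title     TEXT NOT NULL,
        authors   TEXT,   -- multivalued: comma-separated names
        publisher TEXT    -- publisher name and address in one field
    )
""")
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?, ?)",
    [
        ("12345", "Oracle 11g OCP SQL Fundamentals 1 Exam Guide",
         "John Watson, Roopesh Ramklass",
         "McGraw-Hill, Spear Street, San Francisco, CA 94105"),
        ("67890", "Oracle 11g New Features Exam Guide",
         "Sam Alapati",
         "McGraw-Hill, Spear Street, San Francisco, CA 94105"),
    ],
)

# Update anomaly: a change of address has to touch every book row.
new_address = "McGraw-Hill, 1 Hypothetical Plaza, Anytown"  # made-up address
updated = conn.execute(
    "UPDATE books SET publisher = ? WHERE publisher LIKE 'McGraw-Hill,%'",
    (new_address,),
).rowcount

# Retrieval problem: finding one author means pattern-matching a text field.
watson_books = conn.execute(
    "SELECT isbn FROM books WHERE authors LIKE '%John Watson%'"
).fetchall()

# Deletion anomaly: once the books are gone, so are the publisher details.
conn.execute("DELETE FROM books")
publishers_left = conn.execute(
    "SELECT COUNT(DISTINCT publisher) FROM books"
).fetchone()[0]
```

Here `updated` counts one changed row per book rather than one per publisher, and `publishers_left` shows that nothing about the publisher survives the deletion of its books.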
The first normal form is to remove the repeating groups, in this case the multiple authors: pull them out into a separate table called AUTHORS. The data structures will now look like the following.
Two rows in the BOOKS table:
12345  Oracle 11g OCP SQL Fundamentals 1 Exam Guide  McGraw-Hill, Spear Street, San Francisco, California
67890  Oracle 11g New Features Exam Guide            McGraw-Hill, Spear Street, San Francisco, California
And three rows in the AUTHORS table:
John Watson       12345
Roopesh Ramklass  12345
Sam Alapati       67890
The one row in the BOOKS table is now linked to two rows in the AUTHORS table. This solves the insertion anomaly (there is no reason not to insert as many unpublished authors as necessary), the retrieval problem of identifying all the books by one author (one can search the AUTHORS table on just one name), and the problem of a fixed maximum number of authors for any one book (simply insert as many or as few AUTHORS as are needed).
This is the first normal form: no repeating groups.
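A first-normal-form version of the example might be sketched as follows. SQLite via Python's sqlite3 module again stands in for Oracle, the column names are assumptions, and the unpublished author's name is invented purely to show the insertion.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books (
        isbn      TEXT PRIMARY KEY,
        title     TEXT NOT NULL,
        publisher TEXT
    );
    CREATE TABLE authors (
        name TEXT NOT NULL,
        isbn TEXT            -- left NULL for an author not yet published
    );
    INSERT INTO books VALUES
        ('12345', 'Oracle 11g OCP SQL Fundamentals 1 Exam Guide',
         'McGraw-Hill, Spear Street, San Francisco, California'),
        ('67890', 'Oracle 11g New Features Exam Guide',
         'McGraw-Hill, Spear Street, San Francisco, California');
    INSERT INTO authors VALUES
        ('John Watson', '12345'),
        ('Roopesh Ramklass', '12345'),
        ('Sam Alapati', '67890');
""")

# Insertion anomaly solved: an author can exist without any book.
conn.execute("INSERT INTO authors (name, isbn) VALUES (?, NULL)",
             ("A. N. Unpublished",))  # hypothetical author

# One book row links to as many author rows as are needed.
authors_of_12345 = [row[0] for row in conn.execute(
    "SELECT name FROM authors WHERE isbn = ? ORDER BY name", ("12345",))]

# Retrieval by author is now an equality search on a single-valued column.
titles_by_watson = [row[0] for row in conn.execute(
    "SELECT b.title FROM books b JOIN authors a ON a.isbn = b.isbn "
    "WHERE a.name = ?", ("John Watson",))]
```

The publisher details are still repeated in every BOOKS row at this stage; that is what the next normal form addresses.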
The second normal form removes columns from the table that are not dependent on the primary key. In this example, that is the publisher’s address details: these are dependent on the publisher, not the ISBN. The BOOKS table and a new PUBLISHERS table will then look like this:
BOOKS
12345  Oracle 11g OCP SQL Fundamentals 1 Exam Guide  McGraw-Hill
67890  Oracle 11g New Features Exam Guide            McGraw-Hill
PUBLISHERS
McGraw-Hill  Spear Street  San Francisco  California
All the books published by one publisher will now point to a single record in PUBLISHERS. This solves the problem of storing the address many times, and also solves the consequent update anomalies and the data consistency errors caused by inaccurate multiple entries.
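The effect on the update anomaly can be illustrated with a short sketch. As before, SQLite through Python's sqlite3 module is only a stand-in for Oracle, and the column names and the new street name are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE publishers (
        publisher TEXT PRIMARY KEY,
        street    TEXT,
        city      TEXT,
        state     TEXT
    );
    CREATE TABLE books (
        isbn      TEXT PRIMARY KEY,
        title     TEXT NOT NULL,
        publisher TEXT NOT NULL REFERENCES publishers (publisher)
    );
    INSERT INTO publishers VALUES
        ('McGraw-Hill', 'Spear Street', 'San Francisco', 'California');
    INSERT INTO books VALUES
        ('12345', 'Oracle 11g OCP SQL Fundamentals 1 Exam Guide', 'McGraw-Hill'),
        ('67890', 'Oracle 11g New Features Exam Guide', 'McGraw-Hill');
""")

# The address is stored once, so a change of address is a one-row update.
rows_changed = conn.execute(
    "UPDATE publishers SET street = ? WHERE publisher = ?",
    ("Hypothetical Avenue", "McGraw-Hill"),  # made-up street
).rowcount

# Every book sees the same, single address through the join.
streets = conn.execute(
    "SELECT DISTINCT p.street FROM books b "
    "JOIN publishers p ON p.publisher = b.publisher"
).fetchall()
```

Because the address lives in one place, it cannot drift out of step between books.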
Third normal form removes all columns that are interdependent. In the PUBLISHERS table, this means the address columns: the street exists in only one city, and the city can be in only one state; one column should do, not three. This could be achieved by adding an address code, pointing to a separate address table:
PUBLISHERS
ADDRESSES
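One way the address code could be wired up is sketched below. The address code value 1 and the column names are hypothetical (the original values are not available here), and SQLite via Python's sqlite3 module stands in for Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE addresses (
        address_code INTEGER PRIMARY KEY,
        street       TEXT,
        city         TEXT,
        state        TEXT
    );
    CREATE TABLE publishers (
        publisher    TEXT PRIMARY KEY,
        address_code INTEGER REFERENCES addresses (address_code)
    );
    -- The code 1 is an arbitrary illustrative value.
    INSERT INTO addresses VALUES (1, 'Spear Street', 'San Francisco', 'California');
    INSERT INTO publishers VALUES ('McGraw-Hill', 1);
""")

# PUBLISHERS now carries one column for the address; the details are joined in.
full_address = conn.execute(
    "SELECT a.street, a.city, a.state "
    "FROM publishers p JOIN addresses a ON a.address_code = p.address_code "
    "WHERE p.publisher = ?",
    ("McGraw-Hill",),
).fetchone()
```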
of primary keys and foreign keys. A primary key is the unique identifier of a row in a table, either one column or a concatenation of several columns (known as a composite key). Every table should have a primary key defined. This is a requirement of the relational paradigm. Note that the Oracle database deviates from this standard: it is possible to define tables without a primary key, though it is usually not a good idea, and some other RDBMSs do not permit this.
A foreign key is a column (or a concatenation of several columns) that can be used to identify a related row in another table. A foreign key in one table will match a primary key in another table. This is the basis of the many-to-one relationship. A many-to-one relationship is a connection between two tables, where many rows in one table refer to a single row in another table. This is sometimes called a parent-child relationship: one parent can have many children. In the BOOKS example so far, the keys are as follows:
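Table       Keys
BOOKS       Primary key: ISBN
            Foreign key: Publisher
AUTHORS     Primary key: Name + ISBN
            Foreign key: ISBN
PUBLISHERS  Primary key: Publisher
            Foreign key: Address code
ADDRESSES   Primary key: Address code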
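What the keys buy in practice is that the database itself rejects rows that break them. The sketch below assumes SQLite (via Python's sqlite3 module) rather than Oracle; note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`, a detail specific to SQLite. The helper `try_insert` and the rejected rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: off by default
conn.executescript("""
    CREATE TABLE publishers (publisher TEXT PRIMARY KEY);
    CREATE TABLE books (
        isbn      TEXT PRIMARY KEY,
        title     TEXT NOT NULL,
        publisher TEXT NOT NULL REFERENCES publishers (publisher)
    );
    INSERT INTO publishers VALUES ('McGraw-Hill');
    INSERT INTO books VALUES
        ('12345', 'Oracle 11g OCP SQL Fundamentals 1 Exam Guide', 'McGraw-Hill');
""")

def try_insert(isbn, title, publisher):
    """Insert a book; return None on success or the constraint error text."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO books VALUES (?, ?, ?)",
                         (isbn, title, publisher))
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)

# Primary key: a second row with the same ISBN is refused.
duplicate_key = try_insert("12345", "Another Title", "McGraw-Hill")

# Foreign key: a child row must point at an existing parent row.
orphan_child = try_insert("99999", "Hypothetical Book", "No Such Publisher")

# Many children may share one parent.
valid_child = try_insert("67890", "Oracle 11g New Features Exam Guide",
                         "McGraw-Hill")

book_count = conn.execute("SELECT COUNT(*) FROM books").fetchone()[0]
```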
These keys define relationships such as that one book can have several authors. There are various standards for documenting normalized data structures, developed by different organizations as structured formal methods. Generally speaking, it really doesn’t matter which method one uses as long as everyone reading the documents understands it. Part of the documentation will always include a listing of the attributes that make up each entity (also known as the columns that make up each table) and an entity-relationship diagram representing graphically the connections from foreign keys to primary keys. A widely used standard is as follows:
■ Primary key columns identified with a hash (#)
■ Foreign key columns identified with a backslash (\)
■ Mandatory columns (those that cannot be left empty) with an asterisk (*)
■ Optional columns with a lowercase “o”
The BOOKS tables can now be described as follows:
Table BOOKS
\*   Publisher     Foreign key, link to the PUBLISHERS table
Table AUTHORS
#*   Name          Together with the ISBN, the primary key
#\o  ISBN          Part of the primary key, and a foreign key to the BOOKS table. Optional, because some authors may not yet be published
Table PUBLISHERS
\o   Address code  Foreign key, link to the ADDRESSES table
Table ADDRESSES
#*   Address code  Primary key
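The marker notation is compact enough to decode mechanically. The following is a small illustrative sketch in Python; the function names are invented, and the mapping of mandatory and optional columns to NOT NULL and NULL is an assumption about how the notation would typically be implemented, not something the notation itself prescribes.

```python
# Marker characters from the notation above, mapped to their meaning.
MARKERS = {
    "#": "primary key",
    "\\": "foreign key",
    "*": "mandatory",
    "o": "optional",
}

def decode(marker: str) -> list[str]:
    """Expand a marker string into the list of properties it encodes."""
    return [MARKERS[ch] for ch in marker]

def null_rule(marker: str) -> str:
    """Assumed mapping: mandatory columns are NOT NULL, optional ones NULL."""
    return "NOT NULL" if "*" in marker else "NULL"

# The ISBN column of AUTHORS: part of the primary key, a foreign key, optional.
isbn_in_authors = decode("#\\o")

# The Publisher column of BOOKS: a mandatory foreign key.
publisher_in_books = decode("\\*")
```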
The second necessary part of documenting the normalized data model is the entity-relationship diagram. This represents the connections between the tables graphically. There are different standards for these; Figure 1-3 shows the entity-relationship diagram for the BOOKS example using a very simple notation limited to showing the direction of the one-to-many relationships, using what are often called crow’s feet to indicate which sides of the relationship are the many and the one. It can be seen that one BOOK can have multiple AUTHORS, and one PUBLISHER can publish many books. Note that the diagram also states that both AUTHORS and PUBLISHERS have exactly one ADDRESS. More complex notations can be used to show whether the link is required or optional, information which will match that given in the table columns listed previously.
FIGURE 1-3 An entity-relationship diagram
If one author were to write several books, this would require multiple values in the ISBN column of the AUTHORS table. That would be a repeating group, which would have to be removed because repeating groups break the rule for first normal form. A major exercise with data normalization is ensuring that the structures can handle all possibilities.
A table in a real-world application may have hundreds of columns and dozens of foreign keys. The standards for notation vary across organizations; the example given is very basic. Entity-relationship diagrams for applications with hundreds or thousands of entities can be challenging to interpret.
EXERCISE 1-2
Perform an Extended Relational Analysis
This is a paper-based exercise, with no specific solution.
Consider the situation where one author can write many books, and one book can have many authors. This is a many-to-many relationship, which cannot be represented directly in the relational model. Sketch out data structures that demonstrate the problem, and develop another structure that would solve it. Following is a possible solution. The un-normalized table of books with many authors could look like this:
BOOKS
There could be two rows in this table:
11g SQL Fundamentals Exam Guide  John Watson, Roopesh Ramklass
10g DBA Exam Guide               John Watson, Damir Bersinic
And that of authors could look like this:
AUTHORS
There could be three rows in this table:
John Watson       11g SQL Fundamentals Exam Guide, 10g DBA Exam Guide
Roopesh Ramklass  11g SQL Fundamentals Exam Guide
Damir Bersinic    10g DBA Exam Guide
This many-to-many relationship needs to be resolved into many-to-one relationships by taking the repeating groups out of the two tables and storing them in a separate books-per-author table. It will also become necessary to introduce some codes, such as ISBNs to identify books and social security numbers to identify authors. This is a possible normalized structure:
BOOKS
AUTHORS
BOOKAUTHORS
#\*  ISBN  Part of the primary key and a foreign key to BOOKS
#\*  SSNO  Part of the primary key and a foreign key to AUTHORS
The rows in these normalized tables would be as follows:
BOOKS
AUTHORS
SSNO   Name
11111  John Watson
22222  Damir Bersinic
33333  Roopesh Ramklass
BOOKAUTHORS
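The resolved structure can be sketched as three tables, with BOOKAUTHORS holding one row per book-author pair. SQLite via Python's sqlite3 module stands in for Oracle; the SSNO values are those listed above, while the ISBN values ('ISBN-A', 'ISBN-B') are placeholders invented for this sketch, as are the helper function names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: off by default
conn.executescript("""
    CREATE TABLE books   (isbn TEXT PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE authors (ssno TEXT PRIMARY KEY, name  TEXT NOT NULL);
    CREATE TABLE bookauthors (
        isbn TEXT NOT NULL REFERENCES books (isbn),
        ssno TEXT NOT NULL REFERENCES authors (ssno),
        PRIMARY KEY (isbn, ssno)      -- composite key: one row per pairing
    );
    INSERT INTO books VALUES          -- placeholder ISBNs
        ('ISBN-A', '11g SQL Fundamentals Exam Guide'),
        ('ISBN-B', '10g DBA Exam Guide');
    INSERT INTO authors VALUES
        ('11111', 'John Watson'),
        ('22222', 'Damir Bersinic'),
        ('33333', 'Roopesh Ramklass');
    INSERT INTO bookauthors VALUES
        ('ISBN-A', '11111'), ('ISBN-A', '33333'),
        ('ISBN-B', '11111'), ('ISBN-B', '22222');
""")

def titles_by(name):
    """All titles written (or co-written) by the named author."""
    return [row[0] for row in conn.execute(
        "SELECT b.title FROM authors a "
        "JOIN bookauthors ba ON ba.ssno = a.ssno "
        "JOIN books b ON b.isbn = ba.isbn "
        "WHERE a.name = ? ORDER BY b.title", (name,))]

def authors_of(title):
    """All authors of the named book."""
    return [row[0] for row in conn.execute(
        "SELECT a.name FROM books b "
        "JOIN bookauthors ba ON ba.isbn = b.isbn "
        "JOIN authors a ON a.ssno = ba.ssno "
        "WHERE b.title = ? ORDER BY a.name", (title,))]

watson_titles = titles_by("John Watson")
dba_guide_authors = authors_of("10g DBA Exam Guide")
```

Neither BOOKS nor AUTHORS contains a repeating group any more: each side has a plain one-to-many relationship with BOOKAUTHORS.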
Figure 1-4 shows the entity-relationship diagram for the original un-normalized structure, followed by the normalized structure.
As a further exercise, consider the possibility that one publisher could have offices at several addresses, and one address could have offices for several companies. Authors will also have addresses, and this connection too needs to be defined. These enhancements can be added to the example worked through previously.
FIGURE 1-4 Un-normalized and normalized data models
First, an un-normalized many-to-many relationship: BOOKS, AUTHORS
The many-to-many relationship resolved, by interposing another entity: BOOKS, BOOKAUTHORS, AUTHORS
CERTIFICATION OBJECTIVE 1.03
Summarize the SQL Language
SQL is defined, developed, and controlled by international bodies. Oracle Corporation does not have to conform to the SQL standard but chooses to do so. The language itself can be thought of as being very simple (there are only 16 commands), but in practice SQL coding can be phenomenally complicated. That is why a whole book is needed to cover the bare fundamentals.
SQL Standards
Structured Query Language (SQL) was first invented by an IBM research group in the 1970s, but in fact Oracle Corporation (then trading as Relational Software, Inc.) claims to have beaten IBM to market by a few weeks with the first commercial implementation: Oracle 2, released in 1979. Since then the language has evolved enormously and is no longer driven by any one organization. SQL is now an international standard. It is managed by committees from ISO and ANSI. ISO is the Organisation Internationale de Normalisation, based in Geneva; ANSI is the American National Standards Institute, based in Washington, DC. The two bodies cooperate, and their SQL standards are identical.
Earlier releases of the Oracle database used an implementation of SQL that had some significant deviations from the standard. This was not because Oracle was being deliberately different: it was usually because Oracle implemented features that were ahead of the standard, and when the standard caught up, it used different syntax. An example is the outer join (detailed in Chapter 8), which Oracle implemented long before standard SQL; when standard SQL introduced an outer join, Oracle added support for the new join syntax while retaining support for its own proprietary syntax. Oracle Corporation ensures future compliance by placing personnel on the various ISO and ANSI committees and is now assisting with driving the SQL standard forward.
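The two outer join styles can be put side by side. In this sketch the standard syntax is actually executed, against SQLite via Python's sqlite3 module as a stand-in; the Oracle proprietary form, which marks the optional side with `(+)`, is shown only as a string for comparison because SQLite does not accept it. The table contents and the unpublished author's name are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books   (isbn TEXT PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE authors (name TEXT NOT NULL, isbn TEXT);
    INSERT INTO books VALUES
        ('12345', 'Oracle 11g OCP SQL Fundamentals 1 Exam Guide');
    INSERT INTO authors VALUES
        ('John Watson', '12345'),
        ('A. N. Unpublished', NULL);   -- hypothetical author with no book
""")

# Standard (ANSI/ISO) syntax: keep every author, matched to a book or not.
ansi_outer_join = """
    SELECT a.name, b.title
    FROM authors a LEFT OUTER JOIN books b ON b.isbn = a.isbn
    ORDER BY a.name
"""

# Oracle's older proprietary equivalent, for comparison only (not run here):
# the (+) goes on the column of the table whose rows may be missing.
oracle_legacy_outer_join = """
    SELECT a.name, b.title
    FROM authors a, books b
    WHERE a.isbn = b.isbn (+)
    ORDER BY a.name
"""

rows = conn.execute(ansi_outer_join).fetchall()
```

The unpublished author appears in `rows` with a missing title (`None` in Python), which is exactly what an inner join would have dropped.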
SQL Commands
These are the 16 SQL commands, separated into commonly used groups: