Chapter 08: Physical Database Design
Database Design Process
[Figure: stages of the database design process: Conceptual Model, Logical Model, External Models, Internal Model, Physical Design]
Physical Database Design
• Many physical database design decisions are implicit in the technology adopted
– Also, organizations may have standards or an “information architecture” that specifies operating systems, DBMS, and data access languages, thus constraining the range of possible physical implementations.
• We will be concerned with some of the possible physical implementation issues
Physical Database Design
• The primary goal of physical database design is data processing efficiency
• We will concentrate on choices often available to optimize performance of database services
• Physical database design requires information gathered during earlier stages of the design process
Physical Design Information
• Information needed for physical file and database design includes:
– Normalized relations plus size estimates for them
– Definitions of each attribute
– Descriptions of where and when data are used: entered, retrieved, deleted, updated, and how often
– Expectations and requirements for response time, data security, backup, recovery, retention, and integrity
– Descriptions of the technologies used to implement the database
Physical Design Decisions
• There are several critical decisions that will affect the integrity and performance of the system
Storage Format
• Choosing the storage format of each field (attribute): the DBMS provides a set of data types that can be used for the physical storage of fields in the database
• The data type (format) is chosen to minimize storage space and maximize data integrity
Objectives of data type selection
• Minimize storage space
• Represent all possible values
• Improve data integrity
• Support all data manipulations
• The correct data type should, in minimal space, represent every possible value (but eliminate illegal values) for the associated attribute and support the required data manipulations (e.g., numerical or string operations)
Access Data Types
• Numeric (1, 2, 4, 8 bytes, fixed or float)
• OLE (limited only by disk space)
• Hyperlinks (up to 64000 chars)
Access Numeric types
• Byte
– Stores numbers from 0 to 255 (no fractions); 1 byte
• Integer
– Stores numbers from –32,768 to 32,767 (no fractions); 2 bytes
• Long Integer (Default)
– Stores numbers from –2,147,483,648 to 2,147,483,647 (no fractions); 4 bytes
• Replication ID
– Globally unique identifier (GUID); 16 bytes
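The "minimal space, every possible value" objective can be sketched as a small lookup over the whole-number types listed above. This is only an illustration: the ranges and sizes are copied from the slide, while the helper name `smallest_integer_type` is hypothetical.

```python
# Ranges and storage sizes copied from the slide; the helper is illustrative.
ACCESS_INTEGER_TYPES = [
    ("Byte", 0, 255, 1),
    ("Integer", -32_768, 32_767, 2),
    ("Long Integer", -2_147_483_648, 2_147_483_647, 4),
]


def smallest_integer_type(low, high):
    """Return (type name, bytes) of the smallest type covering [low, high]."""
    for name, type_min, type_max, size in ACCESS_INTEGER_TYPES:
        if type_min <= low and high <= type_max:
            return name, size
    raise ValueError("range does not fit any whole-number type listed")


# An attribute holding ages 0..120 needs only one byte per value.
age_type = smallest_integer_type(0, 120)
```

Because the types are listed from smallest to largest, the first match is also the cheapest one that still represents every legal value.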
Designing Physical Records
• A physical record is a group of fields stored in adjacent memory locations and retrieved together as a unit
• Fields may be fixed length or variable length
The Memory Hierarchy
Main Memory = Disk Cache
• Only sequential access
• Not for operational data
Processor Cache:
• Access time: 10 ns
• Size: 512 KB
Main Memory
• Fastest, most expensive (excluding cache)
• Today: 512 MB is common even on PCs
• Many databases could fit in memory
– New industry trend: main memory databases
– E.g., TimesTen
• Main issue is volatility
• A disk block is also called a disk page, or simply a page
• Used with a main memory buffer
of a block for the data is 100 bytes.
– What is the blocking factor?
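As a general illustration of the term, the blocking factor is the number of whole records that fit in one block: the usable block size divided by the record size, rounded down. The sketch below uses made-up numbers (4000 usable bytes, 120-byte records, 10,000 records), not the figures of the exercise above, and assumes fixed-length records that do not span blocks.

```python
import math


def blocking_factor(usable_block_bytes, record_bytes):
    """Whole records per block, assuming fixed-length, unspanned records."""
    return usable_block_bytes // record_bytes


def blocks_needed(n_records, bfr):
    """Blocks required to hold n_records at blocking factor bfr."""
    return math.ceil(n_records / bfr)


# Hypothetical values, chosen only for illustration.
bfr = blocking_factor(4000, 120)
n_blocks = blocks_needed(10_000, bfr)
```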
The Mechanics of Disk
Mechanical characteristics:
• Rotation speed (5400 RPM)
• Number of platters (1-30)
• Number of tracks (<= 10000)
• Number of sectors (256/track)
• Number of bytes per sector (2^9 = 512)
• Block size (2^12 = 4096)
[Figure: disk anatomy, labeling the platters, spindle, disk head, arm assembly, arm movement, tracks, sectors, and a cylinder]
Important Disk Access Characteristics
• Block access time = disk latency + transfer time
• Disk latency = seek time + rotational latency
• Seek time = time for the head to reach the right track
– 10 ms to 40 ms
• Rotational latency = rotation time to get to the right sector
– Time for one rotation = 10 ms
– Average rotational latency = 10 ms / 2 = 5 ms
• Transfer time = block size / transfer rate
– Transfer rate is typically 5-10 MB/s
• Disks read/write one block at a time (typically 4 KB)
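The formulas above can be combined into a rough estimate of the time to read one block. The sketch plugs in the figures from this slide, treats the 5-10 MB/s figure as a transfer rate, and assumes 1 MB = 10^6 bytes; the function name is illustrative.

```python
def block_access_time_ms(seek_ms, rotation_ms, block_bytes, transfer_bytes_per_s):
    """Block access time = seek time + average rotational latency + transfer time."""
    rotational_latency_ms = rotation_ms / 2           # on average, half a rotation
    disk_latency_ms = seek_ms + rotational_latency_ms
    transfer_ms = block_bytes / transfer_bytes_per_s * 1000
    return disk_latency_ms + transfer_ms


# Best case: 10 ms seek, 10 MB/s. Worst case: 40 ms seek, 5 MB/s.
best_ms = block_access_time_ms(10, 10, 4096, 10_000_000)
worst_ms = block_access_time_ms(40, 10, 4096, 5_000_000)
```

With these numbers the transfer itself takes well under a millisecond, so seek time and rotational latency dominate. This is why the later slides care so much about how many block accesses an access method needs.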
Representing Data Elements
• Relational database elements:
CREATE TABLE Product (
    pid INT PRIMARY KEY,
    name CHAR(20),
    description VARCHAR(200),
    maker CHAR(10) REFERENCES Company(name)
);
• A tuple is represented as a record
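Record Formats: Fixed Length
• Information about field types is the same for all records in a file; it is stored in the system catalogs.
• Finding the i'th field does not require a scan of the record: its address is the base address plus the lengths of the preceding fields.
• Note the importance of schema information!
• Example: with base address B and field lengths L1, L2, ..., the address of the third field = B + L1 + L2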
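The address formula B + L1 + L2 generalizes to any field of a fixed-length record. A minimal sketch follows, assuming the field lengths come from the schema; the lengths and base address used here are hypothetical.

```python
def field_address(base, field_lengths, i):
    """Address of field i (0-based): base plus the lengths of all earlier fields."""
    return base + sum(field_lengths[:i])


# Hypothetical schema: L1 = 4 bytes, L2 = 20 bytes, L3 = 10 bytes.
lengths = [4, 20, 10]
third_field = field_address(1000, lengths, 2)   # B + L1 + L2
```

The computation reads only the schema, never the record itself, which is what makes fixed-length formats cheap to navigate.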
Need the header because:
• The schema may change; for a while, new and old may coexist
• Records from different relations may coexist
Variable Length Records
[Figure: record layout with a header (record length plus other header information) followed by the fields]
• Place the fixed fields first: F1, F2
• Then the variable length fields: F3, F4
• Null values take 2 bytes only
• Sometimes they take 0 bytes (when at the end)
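One possible byte layout for such a record is sketched below. Everything specific in it is an assumption made for illustration: a 2-byte total length and one 2-byte length slot per variable field in the header, a 4-byte integer F1, a 20-byte string F2, and the sentinel `NULL_LEN` marking a null. The slide does not prescribe these choices.

```python
import struct

NULL_LEN = 0xFFFF  # hypothetical sentinel: this variable-length field is NULL
HEADER = "<H2H"    # total record length, then length slots for F3 and F4
FIXED = "<i20s"    # F1: 4-byte integer, F2: 20-byte string


def encode(f1, f2, f3, f4):
    """Pack header, then the fixed fields F1 and F2, then F3 and F4."""
    var_bytes = [b"" if v is None else v.encode("utf-8") for v in (f3, f4)]
    var_lens = [NULL_LEN if v is None else len(b)
                for v, b in zip((f3, f4), var_bytes)]
    body = struct.pack(FIXED, f1, f2.encode("utf-8")) + b"".join(var_bytes)
    total = struct.calcsize(HEADER) + len(body)
    return struct.pack(HEADER, total, *var_lens) + body


def decode(record):
    """Inverse of encode: returns (f1, f2, f3, f4)."""
    _total, len3, len4 = struct.unpack_from(HEADER, record, 0)
    pos = struct.calcsize(HEADER)
    f1, f2 = struct.unpack_from(FIXED, record, pos)
    pos += struct.calcsize(FIXED)
    values = []
    for n in (len3, len4):
        if n == NULL_LEN:
            values.append(None)          # a NULL costs only its 2-byte slot
        else:
            values.append(record[pos:pos + n].decode("utf-8"))
            pos += n
    return f1, f2.rstrip(b"\x00").decode("utf-8"), values[0], values[1]


rec = encode(1, "gizmo", "small widget", None)
```

Putting the fixed fields first means F1 and F2 sit at known offsets in every record, so only F3 and F4 need the length slots.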
Records With Referencing Fields
Storing Records in Blocks
• Blocks have fixed size (typically 4 KB)
[Figure: a block holding records R1, R2, R3, R4]
Spanning Records Across Blocks
• When records are very large
• Or even medium size: saves space in blocks
• Binary large objects (BLOBs)
• Supported by modern database systems
• E.g., images, sounds, etc.
• Storage: attempt to cluster blocks together
Modifications: Insertion
• File is unsorted:
– Add it to the end
• File is sorted:
– Is there space in the right block?
• Yes: we are lucky, store it there
– Is there space in a neighboring block?
• Look 1-2 blocks to the left/right, shift records
– If all else fails, create an overflow block
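The sorted-file case can be sketched as follows. This is a simplified model rather than a definitive implementation: blocks are in-memory lists of keys, only the immediate left and right neighbors are checked (the slide allows 1-2 blocks), and `overflow` is a hypothetical dictionary from block number to its overflow records.

```python
from bisect import insort


def insert_sorted(blocks, overflow, key, capacity):
    """Insert key into a sorted file held as a list of sorted, non-empty blocks."""
    # The "right block" is the first whose largest key is >= key,
    # or the last block when key is larger than everything stored.
    target = next((i for i, b in enumerate(blocks) if b[-1] >= key),
                  len(blocks) - 1)
    block = blocks[target]
    if len(block) < capacity:
        insort(block, key)
        return "stored in block"
    right = target + 1
    if right < len(blocks) and len(blocks[right]) < capacity:
        insort(block, key)
        blocks[right].insert(0, block.pop())   # shift the largest record right
        return "shifted right"
    left = target - 1
    if left >= 0 and len(blocks[left]) < capacity:
        insort(block, key)
        blocks[left].append(block.pop(0))      # shift the smallest record left
        return "shifted left"
    overflow.setdefault(target, []).append(key)
    return "overflow block"


blocks = [[10, 20], [30, 40], [50]]
overflow = {}
first = insert_sorted(blocks, overflow, 25, capacity=2)
second = insert_sorted(blocks, overflow, 35, capacity=2)
```

In the example, 25 belongs in the full middle block, so one record is shifted into the right neighbor. By the time 35 arrives, every nearby block is full and it lands in an overflow block.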
Modifications: Deletions
• Free space in block, shift records
• May be able to eliminate an overflow block
Modifications: Updates
• If the new record is shorter than the previous one, easy
• If it is longer, need to shift records or create overflow blocks
– The cylinder number
– The track number
– The block within the track
– For records: an offset in the block
• sometimes this is in the block’s header
Logical Addresses
• Logical address: a string of bytes (10-16)
• More flexible: can move blocks/records around
• But need translation table:
Logical address → Physical address
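A translation table of this kind can be sketched as a plain dictionary. The class name, the 12-byte logical address, and the (cylinder, track, block, offset) tuple used as the physical address are all illustrative assumptions.

```python
class TranslationTable:
    """Maps logical addresses (byte strings) to physical addresses."""

    def __init__(self):
        self._map = {}

    def assign(self, logical, physical):
        # Used both for new records and for records that have moved.
        self._map[logical] = physical

    def resolve(self, logical):
        return self._map[logical]


table = TranslationTable()
record_id = bytes.fromhex("00a1b2c3d4e5f60718293a4b")  # hypothetical 12-byte address

table.assign(record_id, (3, 1, 7, 128))   # (cylinder, track, block, offset)
before = table.resolve(record_id)

# Moving the record changes only the table entry, not the references to it.
table.assign(record_id, (5, 0, 2, 0))
after = table.resolve(record_id)
```

Anything that stores `record_id` keeps working after the move. The price is one extra table lookup on every access.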
Main Memory Address
• When the block is read into main memory, it receives a main memory address
• Buffer manager has another translation table
Designing Physical/Internal Model
• Overview
• Terminology
• Access methods
Physical Design
• Internal Model/Physical Model
[Figure: path of a user request through the layers User request, DBMS, External Model, Internal Model Access Methods, Operating System Access Methods, and Data Base, with Interface 1, Interface 2, and Interface 3 marked between layers]
Physical Design
• The user presents a query, and the DBMS determines which physical DBs are needed to resolve the query
• The DBMS uses an internal model access method to access the data stored in a logical database
• The internal model access methods and OS access methods access the physical records of the database
Physical File Design
• A physical file is a portion of secondary storage (disk space) allocated for the purpose of storing physical records
• Pointers - a field of data that can be used to locate a related field or record of data
• Access Methods - an operating system algorithm for storing and locating data in secondary storage
• Pages - the amount of data read or written in one disk input or output operation
Internal Model Access Methods
• Many types of access methods:
Physical Sequential
• Key values of the physical records are in logical sequence
• Main use is for “dump” and “restore”
• Access method may be used for storage as well as retrieval
• Storage Efficiency is near 100%
• Access Efficiency is poor (unless fixed size physical records)
Indexed Sequential
• Key values of the physical records are in logical sequence
• Access method may be used for storage and retrieval
• Index of key values is maintained with entries for the highest key values per block(s)
• Access efficiency depends on the levels of index, storage allocated for index, number of database records, and amount of overflow
• Storage efficiency depends on size of index and volatility of database
Index Sequential
[Figure: an index of addresses (block numbers 1, 2, 3) pointing into a data file of Block 1, Block 2, and Block 3; keys visible in the figure include Getta, Harty, Mobile, Sunoci, Texaci]
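Indexed Sequential: Two Levels
[Figure: a two-level index. One index block lists addresses 7, 8, 9; three further index blocks hold the entries below and point to the data blocks]
Address   | 1   | 2   | 3   | 4   | 5   | 6
Key Value | 150 | 385 | 536 | 678 | 785 | 805
Keys visible in the data blocks: 251 385; 455 480 536; 605 610 678; 705 710 785; 791 805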
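A lookup through a two-level index of this kind can be sketched with a binary search at each level. The key values (150, 385, 536, 678, 785, 805) and addresses 1-6 are taken from the figure. Treating 7, 8, 9 as the addresses of the second-level index blocks is an assumption, as are the function names.

```python
from bisect import bisect_left

# Second-level index blocks: (highest key in data block, data block address).
SECOND_LEVEL = {
    7: [(150, 1), (385, 2)],
    8: [(536, 3), (678, 4)],
    9: [(785, 5), (805, 6)],
}
# Top level: (highest key covered by an index block, index block address).
TOP_LEVEL = [(entries[-1][0], addr)
             for addr, entries in sorted(SECOND_LEVEL.items())]


def search_index(entries, key):
    """Address paired with the first highest-key >= key, or None if none."""
    highs = [high for high, _ in entries]
    i = bisect_left(highs, key)
    return entries[i][1] if i < len(entries) else None


def find_data_block(key):
    """Walk top level, then second level, to the data block that may hold key."""
    index_block = search_index(TOP_LEVEL, key)
    if index_block is None:
        return None   # key is larger than every key in the file
    return search_index(SECOND_LEVEL[index_block], key)


block_for_480 = find_data_block(480)
```

Each level costs one block access, so a lookup here touches two index blocks and one data block, however many records the data blocks hold.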
Indexed Random
• Key values of the physical records are not necessarily in logical sequence
• Index may be stored and accessed with Indexed Sequential Access Method
• Index has an entry for every database record. The index keys are in ascending (logical) sequence; the database records are not necessarily in ascending sequence.
• Access method may be used for storage and retrieval
Indexed Random
[Figure: an index in key order whose entries point to block numbers 2, 1, 3, 2, 1; keys visible in the figure include Adams, Dumpling, Getta, Hawkeyes, Hoosiers]
• Key values of the physical records are not necessarily in logical sequence
• Access method is better used for retrieval
• An index may be built for every field to be inverted
• Access efficiency depends on number of database records, levels of index, and storage allocated for index
[Figure: inverted index on Course Number over a data file in blocks 1, 2, 3; the index entry for CS 623 lists addresses 105, 106]
Student name | Course Number
Adams | CH145
Becker | CS201
Dumpling | CH145
Getta | CH145
Harty | CS623
Mobile | CS623
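Building an inverted index on one field can be sketched as below. The student names and course numbers come from the figure, but pairing them in order is an assumption. List positions stand in for record addresses, and the function name is hypothetical.

```python
from collections import defaultdict

# (student name, course number); the pairing is assumed from the figure order.
RECORDS = [
    ("Adams", "CH145"),
    ("Becker", "CS201"),
    ("Dumpling", "CH145"),
    ("Getta", "CH145"),
    ("Harty", "CS623"),
    ("Mobile", "CS623"),
]


def build_inverted_index(records, field):
    """Map each distinct value of a field to the addresses of its records."""
    index = defaultdict(list)
    for address, record in enumerate(records):
        index[record[field]].append(address)
    return dict(index)


by_course = build_inverted_index(RECORDS, field=1)
cs623_students = [RECORDS[a][0] for a in by_course["CS623"]]
```

Retrieval by course is now a single index lookup. Every insert or update must also maintain the index, which is why the method suits retrieval better than storage.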
• Key values of the physical records are not necessarily in logical sequence
• There is a one-to-one correspondence between a record key and the physical address of the record
• May be used for storage and retrieval
• Access efficiency always 1
• Storage efficiency depends on density of keys
• No duplicate keys permitted
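The one-to-one key-to-address mapping and its storage cost can be sketched as follows, assuming integer keys and fixed-size record slots. The formula, names, and numbers are illustrative.

```python
def direct_address(key, lowest_key, base_address, record_size):
    """Each key owns exactly one slot, so one access always finds the record."""
    return base_address + (key - lowest_key) * record_size


def storage_efficiency(keys):
    """Fraction of reserved slots actually used: depends on key density."""
    slots = max(keys) - min(keys) + 1
    return len(set(keys)) / slots


# Hypothetical keys 100, 101, 105 with 50-byte records starting at address 0.
addr_105 = direct_address(105, lowest_key=100, base_address=0, record_size=50)
density = storage_efficiency([100, 101, 105])
```

Three keys spread over a range of six reserve six slots, so half the space stays empty. Sparse key ranges make this method wasteful even though access is always a single step.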
• May be used for storage and retrieval
• Access efficiency depends on distribution of keys, algorithm for key transformation, and space allocated
• Storage efficiency depends on distribution of keys and algorithm used for key transformation
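How key distribution and allocated space affect access efficiency can be seen in a small model of a hashed file. The sketch assumes integer keys, `key % n_buckets` as the key transformation, and a single overflow block per bucket. The class and its parameters are illustrative.

```python
class HashedFile:
    """Toy hashed file: fixed-capacity primary buckets plus overflow blocks."""

    def __init__(self, n_buckets, bucket_capacity):
        self.n_buckets = n_buckets
        self.bucket_capacity = bucket_capacity
        self.primary = [[] for _ in range(n_buckets)]
        self.overflow = [[] for _ in range(n_buckets)]

    def address(self, key):
        # Key transformation: a simple modulo, chosen only for illustration.
        return key % self.n_buckets

    def insert(self, key):
        a = self.address(key)
        if len(self.primary[a]) < self.bucket_capacity:
            self.primary[a].append(key)
        else:
            self.overflow[a].append(key)   # collision beyond bucket capacity

    def find(self, key):
        """Return (found, block_accesses)."""
        a = self.address(key)
        if key in self.primary[a]:
            return True, 1
        if not self.overflow[a]:
            return False, 1
        return key in self.overflow[a], 2


f = HashedFile(n_buckets=4, bucket_capacity=2)
for k in (1, 5, 9, 2):      # 1, 5 and 9 all transform to address 1
    f.insert(k)
```

Keys that transform to the same address fill the bucket and spill into overflow, raising the access count from 1 to 2. More buckets or a better-spread key set lowers collisions at the cost of more reserved space.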
Comparative Access Methods
Factor | Sequential | Indexed | Hashed
Storage space | No wasted space | No wasted space for data but extra space for index | More space needed for addition and deletion of records after initial load
Sequential retrieval on primary key | Very fast | Moderately fast | Impractical
Random retrieval on primary key | Impractical | Moderately fast | Very fast
Multiple-key retrieval | Possible but needs a full scan | Very fast with multiple indexes | Not possible
Deleting records | Can create wasted space | OK if dynamic | Very easy
Adding records | Requires rewriting file | OK if dynamic | Very easy
Updating records | Usually requires rewriting file | Easy but requires maintenance of indexes | Very easy