Disk - File Structures - Hashing
Open addressing: the program checks the subsequent positions in order until an unused (empty) position is found
Chaining: overflow locations are used, with a pointer field added to each record position
Multiple hashing: the program applies a second hash function if the first results in a collision
Dynamic and extendible hashing do not require an overflow area
Linear hashing: blocks are split in linear order as the file expands
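Two of the collision resolution methods above can be sketched in a few lines. The table size M and both hash functions are illustrative assumptions, and multiple hashing is shown in its repeated (double-hashing) form:

```python
# Sketch of open addressing and multiple hashing. The table size M and
# both hash functions are illustrative assumptions, not from the source.
M = 7  # number of buckets (hypothetical)

def h1(key):
    """Primary hash function."""
    return key % M

def h2(key):
    """Second hash function, used by multiple hashing."""
    return 1 + (key % (M - 1))

def insert_open_addressing(table, key):
    """Check subsequent positions in order until an empty one is found."""
    start = h1(key)
    for i in range(M):
        slot = (start + i) % M
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("hash table full")

def insert_multiple_hashing(table, key):
    """Apply the second hash function when the first position collides."""
    for i in range(M):
        slot = (h1(key) + i * h2(key)) % M
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("hash table full")

table = [None] * M
insert_open_addressing(table, 3)   # lands in bucket 3
insert_open_addressing(table, 10)  # collides at 3, probes to bucket 4
```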
• Formulas: bfr = ⌊B/R⌋; b = ⌈r/bfr⌉. Multilevel index: the number of levels t is the smallest t with fo^t ≥ r1, i.e. t = ⌈log_fo(r1)⌉ (r1: number of first-level index entries; fo: bfr (fan-out) of the index)
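These formulas can be evaluated directly; the example sizes below are hypothetical, and the level count is computed as the smallest t with fo^t ≥ r1:

```python
import math

def blocking_factor(B, R):
    """bfr = floor(B / R): whole records that fit in one block."""
    return B // R

def num_blocks(r, bfr):
    """b = ceil(r / bfr): blocks needed to store r records."""
    return math.ceil(r / bfr)

def index_levels(r1, fo):
    """Smallest t with fo**t >= r1 (top index level fits in one block)."""
    t = 1
    while fo ** t < r1:
        t += 1
    return t

# Hypothetical sizes: 512-byte blocks, 100-byte records, 30000 records.
bfr = blocking_factor(512, 100)   # 5 records per block
b = num_blocks(30000, bfr)        # 6000 blocks
```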
? Disk: rd = (1/2) * (1/p) min = 30000/p msec; btt = B/tr msec; btr = (B/(B + G)) * tr bytes/msec; Trw = 2 * rd = 60000/p msec
If we have a track size of 50 Kbytes and p is 3600 rpm, then the transfer rate in bytes/msec is tr = (50 * 1000)/(60 * 1000/3600) = 3000 bytes/msec
The average time needed to find and transfer a block, given its block address, is estimated by (s + rd + btt) msec
Seek time: s msec; Rotational delay: rd msec; Block transfer time: btt msec; Rewrite time: Trw msec; Transfer rate: tr bytes/msec; Bulk transfer rate: btr bytes/msec; Block size: B bytes; Interblock gap size: G bytes; Disk speed: p rpm (revolutions per minute)
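The disk formulas above can be packaged as functions; the numbers reproduce the worked example in the text:

```python
# Disk parameter formulas from this section. Units follow the text:
# B and G in bytes, p in rpm, all times in msec.
def rotational_delay(p):
    """rd = (1/2) * (60000 / p) msec: half a revolution on average."""
    return 0.5 * 60000 / p

def transfer_rate(track_bytes, p):
    """tr = track size / time for one revolution, in bytes/msec."""
    return track_bytes / (60000 / p)

def block_transfer_time(B, tr):
    """btt = B / tr msec."""
    return B / tr

def bulk_transfer_rate(B, G, tr):
    """btr = (B / (B + G)) * tr bytes/msec."""
    return (B / (B + G)) * tr

def rewrite_time(p):
    """Trw = 2 * rd = 60000 / p msec: one full revolution."""
    return 60000 / p

# Worked example from the text: 50-Kbyte track at 3600 rpm.
tr = transfer_rate(50 * 1000, 3600)   # 3000.0 bytes/msec
```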
Index
• Primary: Orders the data file on a key field One index entry for each block Number of index entries = number of blocks Nondense
• Clustering: Orders the data file on a non-key field One index entry for each distinct value of the field (the entry points to the first data block containing records with that field value) Number of index entries = number of distinct indexing field values Nondense
At most one primary index or one clustering index but not
both
• Secondary: The index is an ordered file with two fields (indexing field + block pointer or record pointer) There can be many secondary indexes
? Secondary key (unique value): Number of index entries = number of records Dense
? Non-key: Option 1: duplicate index entries with the same K(i) value Option 2: keep a list of pointers <P(i,1), ..., P(i,k)> in the index entry for K(i) Option 3: one entry for each distinct index field value + an extra level of indirection to handle the multiple pointers
Summary: Ordered data file → Primary, Clustering; indexing field is a key → Primary, Secondary; indexing field is not a key → Clustering, Secondary; Dense → Secondary; Nondense → all
• Multi-Level: insertion and deletion of new index entries is a problem because the index is an ordered file → use a tree structure
? B-tree: (p * P) + ((p − 1) * (Pr + V)) ≤ B
? B+-tree: internal node: ⌈p/2⌉ ≤ q (tree pointers) ≤ p, with q − 1 search values: (p * P) + ((p − 1) * V) ≤ B; leaf node: ⌈pleaf/2⌉ ≤ q (data pointers) ≤ pleaf: (pleaf * (Pr + V)) + P ≤ B
Insert: leaf node full: the first j = ⌈(pleaf + 1)/2⌉ entries remain; internal node full: the first j − 1 entries remain, where j = ⌊(p + 1)/2⌋ (the j-th search value moves up)
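Solving each inequality for the largest order gives a small calculator. The block, pointer, and key sizes below are illustrative (B = 512-byte block, P = 6-byte tree pointer, Pr = 7-byte data pointer, V = 9-byte search key):

```python
# Largest node orders satisfying the B-tree / B+-tree inequalities above.
def btree_order(B, P, Pr, V):
    """Largest p with (p * P) + ((p - 1) * (Pr + V)) <= B."""
    return (B + Pr + V) // (P + Pr + V)

def bplus_internal_order(B, P, V):
    """Largest p with (p * P) + ((p - 1) * V) <= B."""
    return (B + V) // (P + V)

def bplus_leaf_order(B, P, Pr, V):
    """Largest p_leaf with (p_leaf * (Pr + V)) + P <= B."""
    return (B - P) // (Pr + V)

p_btree = btree_order(512, 6, 7, 9)           # 24
p_internal = bplus_internal_order(512, 6, 9)  # 34
p_leaf = bplus_leaf_order(512, 6, 7, 9)       # 31
```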
Query Processing and Optimization
• External sorting: sorting algorithms for large files that do not fit entirely in main memory
• Sort-Merge strategy: starts by sorting small subfiles (runs) and then merges the sorted runs
? Sorting phase: nR = ⌈b/nB⌉ (e.g. with nB = 3: read 3 blocks of the file → sort → a run of 3 blocks)
? Merging phase: dM = min(nB − 1, nR); nP = ⌈log_dM(nR)⌉ (Each step: 1. read one block from each of (nB − 1) runs into the buffer 2. merge → temp block 3. if the temp block is full, write it to the file 4. if any buffer block becomes empty, read the next block from the corresponding run)
nR: number of initial runs; b: number of file blocks; nB: available buffer space (in blocks); dM: degree of merging; nP: number of merge passes
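The run and pass counts can be checked with a short script, using the standard 2b(1 + nP) block-access cost (sort phase plus one read/write of the file per merge pass; the example figures are illustrative):

```python
import math

def external_sort_stats(b, nB):
    """Returns (nR, dM, nP, total block accesses) for sort-merge."""
    nR = math.ceil(b / nB)        # number of initial runs
    dM = min(nB - 1, nR)          # degree of merging
    nP, runs = 0, nR
    while runs > 1:               # nP = ceil(log_dM(nR)) merge passes
        runs = math.ceil(runs / dM)
        nP += 1
    cost = 2 * b * (1 + nP)       # read + write in sort phase and each pass
    return nR, dM, nP, cost

# Example: 1024 file blocks, 5 buffer blocks.
nR, dM, nP, cost = external_sort_stats(1024, 5)
# 205 initial runs, 4-way merging, 4 passes, 10240 block accesses
```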
• Implementing the SELECT Operation
? Using a secondary (B+-tree) index: on an equality comparison, retrieve a single record if the indexing field has unique values (is a key), or multiple records if the indexing field is not a key; can also be used for range comparisons (>, ≥, <, or ≤)
? Conjunctive (AND) selection: If an attribute involved in any single simple condition in the conjunctive condition has an access path that permits the use of one of the methods S2 to S6, use that condition to retrieve the records and then check whether each retrieved record satisfies the remaining simple conditions in the conjunctive condition
? Conjunctive selection using a composite index: If
two or more attributes are involved in equality conditions in the
conjunctive condition and a composite index (or hash structure)
exists on the combined field, we can use the index directly
Algorithms for SET operations: UNION, INTERSECTION (files must be sorted)
Steps in converting a query during heuristic optimization: 1. Initial (canonical) query tree 2. Move SELECT operations down the tree 3. Apply the more restrictive SELECT operations first 4. Replace Cartesian Product + SELECT pairs with JOIN operations 5. Move PROJECT operations down the query tree
• Cost estimate
? Information about the size of a file: number of records (tuples) (r), record size (R), number of blocks (b), blocking factor (bfr)
? Information about indexes and indexing attributes of a file: number of levels (x) of each multilevel index; number of first-level index blocks (bI1); number of distinct values (d) of an attribute; selectivity (sl) of an attribute; selection cardinality (s) of an attribute (s = sl * r)
? Linear search: CS1a = b; for an equality condition on a key: CS1b = b/2 if the record is found, else CS1a = b
? Binary search: CS2 = log2 b + ⌈s/bfr⌉ − 1; for an equality condition on a unique (key) attribute: CS2 = log2 b
? Using a primary index (S3a) or hash key (S3b) to retrieve a single record: CS3a = x + 1; CS3b = 1 for static or linear hashing; CS3b = 2 for extendible hashing
? Using an ordering index to retrieve multiple records: for a comparison condition on a key field with an ordering index, CS4 = x + b/2
? Using a clustering index to retrieve multiple records for an equality condition: CS5 = x + ⌈s/bfr⌉
? Using a secondary (B+-tree) index: equality comparison: CS6a = x + s (options 1 & 2) or CS6a = x + s + 1 (option 3); comparison condition such as >, <, ≥, ≤: CS6b = x + bI1/2 + r/2
? Conjunctive selection using a composite index:
Same as S3a, S5 or S6a, depending on the type of index
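The selection cost functions above can be compared side by side for a given file. All file and index parameters in this sketch are hypothetical:

```python
import math

def selection_costs(b, r, x, bI1, s, bfr, equality_on_key=False):
    """Block-access estimates for S1-S6 in the section's notation."""
    return {
        "S1 linear": b / 2 if equality_on_key else b,
        "S2 binary": math.log2(b) if equality_on_key
                     else math.log2(b) + math.ceil(s / bfr) - 1,
        "S3a primary index": x + 1,
        "S4 ordering index, range": x + b / 2,
        "S5 clustering index": x + math.ceil(s / bfr),
        "S6a secondary index, equality": x + s,
        "S6b secondary index, range": x + bI1 / 2 + r / 2,
    }

# Hypothetical file: 2000 blocks, 10000 records, bfr = 5; index with
# x = 2 levels, bI1 = 50 first-level blocks; selection cardinality 100.
c = selection_costs(2000, 10000, 2, 50, 100, 5)
```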
• Cost Functions for JOIN: |R ⋈c S| = js * |R| * |S|
? Nested-loop join: CJ1 = bR + (bR * bS) + ((js * |R| * |S|)/bfrRS) (use R for the outer loop)
? Single-loop join (index on join attribute B of S):
For a secondary index: CJ2a = bR + (|R| * (xB + sB)) + ((js * |R| * |S|)/bfrRS)
For a clustering index: CJ2b = bR + (|R| * (xB + ⌈sB/bfrB⌉)) + ((js * |R| * |S|)/bfrRS)
For a primary index: CJ2c = bR + (|R| * (xB + 1)) + ((js * |R| * |S|)/bfrRS)
If a hash key exists for one of the two join attributes — B of S: CJ2d = bR + (|R| * h) + ((js * |R| * |S|)/bfrRS) (h: the average number of block accesses to retrieve a record, given its hash key value; h ≥ 1)
? Sort-merge join: CJ3a = CS + bR + bS + ((js * |R| * |S|)/bfrRS) (CS: cost for sorting the files)
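A small sketch comparing three of the join cost functions above; all figures are illustrative, with js = 1/|S| as for a key-based join:

```python
import math

def join_costs(bR, bS, rR, rS, js, bfrRS, xB, sB, CS):
    """Block-access estimates for the join methods above; the result-
    writing term js*|R|*|S|/bfrRS is common to all of them."""
    write = math.ceil(js * rR * rS / bfrRS)
    return {
        "CJ1 nested-loop (R outer)": bR + bR * bS + write,
        "CJ2a single-loop, secondary index on S.B":
            bR + rR * (xB + sB) + write,
        "CJ3a sort-merge": CS + bR + bS + write,
    }

# Illustrative numbers: bR=2000, bS=16, |R|=10000, |S|=128, js=1/128,
# bfrRS=4, a 2-level index with sB=1 on S.B, and sort cost CS=23000.
c = join_costs(bR=2000, bS=16, rR=10000, rS=128, js=1/128,
               bfrRS=4, xB=2, sB=1, CS=23000)
```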
Transaction: a logical unit of database processing (an atomic unit of work) that includes one or more access operations (read – retrieval; write – insert or update; delete)
• Why Concurrency Control is needed
? The Lost Update Problem: two transactions that access the same database items have their operations interleaved in a way that makes the value of some database item incorrect
? The Temporary Update (or Dirty Read) Problem: one transaction updates a database item and then the transaction fails for some reason
? The Incorrect Summary Problem: one transaction is calculating an aggregate summary function on a number of records while other transactions are updating some of these records
? The Unrepeatable Read Problem: transaction T reads the same item twice and the item is changed by another transaction T' between the two reads
• Why recovery is needed
? A computer failure (system crash): a hardware or software error occurs in the computer system during transaction execution
? A transaction or system error: some operation in the transaction may cause it to fail, such as integer overflow or division by zero
? Local errors or exception conditions: certain conditions necessitate cancellation of the transaction; a programmed abort in the transaction causes it to fail
? Concurrency control enforcement: the concurrency control method may decide to abort the transaction, to be restarted later, because it violates serializability or because several transactions are in a state of deadlock
? Disk failure: some disk blocks may lose their data because of a read or write malfunction or because of a disk read/write head crash
? Physical problems and catastrophes: this refers to an endless list of problems that includes power or air-conditioning failure, fire, theft, sabotage, overwriting disks or tapes by mistake, and mounting of a wrong tape by the operator
• Types of log record: [start_transaction,T], [write_item,T,X,old_value,new_value], [read_item,T,X], [commit,T], [abort,T]
• Commit Point of a Transaction: all its operations that access the database have been executed successfully and the effect of all the transaction operations on the database has been recorded in the log
• Roll back of transactions: needed for transactions that have a [start_transaction,T] entry in the log but no commit entry [commit,T] in the log
• ACID properties
? Atomicity: either performed in its entirety or not performed at all (recovery)
? Consistency preservation: takes the database from one consistent state to another (programmer, concurrency control, recovery, integrity constraints)
? Isolation: the execution of a transaction should not be interfered with by any other transaction executing concurrently (concurrency control)
? Durability or permanency: changes must never be lost because of subsequent failure (recovery)
• Recoverable schedule: no transaction T in S commits until all transactions T' that have written an item that T reads have committed
• Cascadeless schedule: one where every transaction reads only items that were written by committed transactions
• Strict schedule: a transaction can neither read nor write an item X until the last transaction that wrote X has committed (or aborted)
Concurrency Control Techniques
• Two-Phase Locking Techniques
? Lock and Unlock are Atomic operations
? A transaction is well-formed if it locks a data item before it reads or writes it, does not lock an already locked data item, and does not try to unlock a free data item
? For a transaction, the locking and unlocking phases must be mutually exclusive: during the locking phase the unlocking phase must not start, and during the unlocking phase the locking phase must not begin
? Conservative: Prevents deadlock by locking all desired
data items before transaction begins execution
? Basic: the transaction locks data items incrementally; this may cause deadlock, which must be dealt with
? Strict: a transaction T does not release any of its exclusive (write) locks until after it commits or aborts
? Rigorous: a transaction T does not release any of its locks (exclusive or shared) until after it commits or aborts
• Dealing with Deadlock and Starvation
? Deadlock prevention A transaction locks all data items
it refers to before it begins execution
? Deadlock detection and resolution: the scheduler maintains a wait-for graph for detecting cycles
? Deadlock avoidance: the Wound-Wait (younger is allowed to wait for older) and Wait-Die (older is allowed to wait for younger) algorithms use timestamps to avoid deadlocks by rolling back a victim; both schemes also avoid starvation
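The two avoidance schemes reduce to a one-line decision each. This sketch assumes a transaction T requests an item held by another transaction, with smaller timestamp meaning older:

```python
# Decision rules for timestamp-based deadlock avoidance. Smaller
# timestamp = older transaction; the requester wants an item currently
# held by another transaction.
def wait_die(ts_requester, ts_holder):
    """Wait-Die: an older requester waits; a younger requester dies."""
    return "wait" if ts_requester < ts_holder else "die"

def wound_wait(ts_requester, ts_holder):
    """Wound-Wait: an older requester wounds (aborts) the younger
    holder; a younger requester waits."""
    return "wound holder" if ts_requester < ts_holder else "wait"

wait_die(1, 5)     # 'wait': the older transaction may wait for the younger
wound_wait(5, 1)   # 'wait': the younger transaction may wait for the older
```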
• Timestamp Ordering
? write_item(X): if TS(T) > write_TS(X), then delay T until the transaction T' that wrote X has terminated (committed or aborted)
? read_item(X): if TS(T) > write_TS(X), then delay T until the transaction T' that wrote X has terminated (committed or aborted)
This ensures the schedules are both strict and conflict serializable
• Thomas’s Write Rule: if read_TS(X) > TS(T), then abort and roll back T and reject the operation; if write_TS(X) > TS(T), then just ignore the write operation and continue execution, because it is already outdated and obsolete
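The write_item check under Thomas's write rule can be sketched as below; the Item class and return values are simplifying assumptions:

```python
# Sketch of write_item(X) under Thomas's write rule. The Item class
# and the string outcomes are illustrative assumptions.
class Item:
    def __init__(self):
        self.read_TS = 0
        self.write_TS = 0
        self.value = None

def write_item(T_ts, item, value):
    """Returns 'abort', 'ignore', or 'write' per Thomas's write rule."""
    if item.read_TS > T_ts:
        return "abort"    # a younger transaction has already read X
    if item.write_TS > T_ts:
        return "ignore"   # outdated write: skip it and continue
    item.value = value
    item.write_TS = T_ts
    return "write"

x = Item()
x.read_TS, x.write_TS = 5, 8
write_item(6, x, "v")   # read_TS=5 <= 6 but write_TS=8 > 6 → 'ignore'
```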
• Multiversion Concurrency Control Techniques: a read operation is never rejected; more storage (RAM and disk) is required, and a garbage collection mechanism must be run; a new version of X is created only by a write operation
• Multiversion Two-Phase Locking Using Certify Locks
? Allows a transaction T' to read a data item X while it is write-locked by a conflicting transaction T
? This is accomplished by maintaining two versions of each data item X: 1. one version must always have been written by some committed transaction (a write operation therefore always creates a new version of X) 2. the second version is created when a transaction acquires a write lock on the item
? Read and write operations from conflicting transactions can be processed concurrently
? May delay transaction commit because of obtaining certify locks on all its writes; it avoids cascading aborts, but as in the strict two-phase locking scheme, conflicting transactions may get deadlocked
Intention-shared (IS): indicates that a shared lock(s) will be requested on some descendant node(s)
Intention-exclusive (IX): indicates that an exclusive lock(s) will be requested on some descendant node(s)
Shared-intention-exclusive (SIX): indicates that the current node is locked in shared mode but an exclusive lock(s) will be requested on some descendant node(s)
A node N can be locked by a transaction T in S or IS mode only if the parent node is already locked by T in either IS or IX mode
A node N can be locked by T in X, IX, or SIX mode only if the parent of N is already locked by T in either IX or SIX mode
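These parent-mode rules fit in a lookup table; the dictionary encoding below is an illustrative sketch of the rules just stated:

```python
# Parent-mode check for multiple granularity locking, following the
# two rules above: S/IS need parent IS or IX; X/IX/SIX need parent
# IX or SIX.
REQUIRED_PARENT_MODES = {
    "IS":  {"IS", "IX"},
    "S":   {"IS", "IX"},
    "IX":  {"IX", "SIX"},
    "X":   {"IX", "SIX"},
    "SIX": {"IX", "SIX"},
}

def can_lock(child_mode, parent_mode):
    """True if a node may be locked in child_mode given the mode the
    same transaction already holds on its parent."""
    return parent_mode in REQUIRED_PARENT_MODES[child_mode]

can_lock("S", "IS")   # True: S needs parent IS or IX
can_lock("X", "IS")   # False: X needs parent IX or SIX
```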
Database Recovery Techniques
? The flushing is controlled by Modified and Pin-Unpin bits
• Data Update
? Immediate Update As soon as a data item is modified
in cache, the disk copy is updated
? Deferred Update: all modified data items in the cache are written either after a transaction ends its execution or after a fixed number of transactions have completed their execution
? Shadow update The modified version of a data item does not overwrite its disk copy but is written at a separate disk location
? In-place update The disk version of the data item is overwritten by the cache version
• Steal/No-Steal and Force/No-Force
? Steal: a cache page can be flushed before the transaction commits
? No-Steal: a cache page cannot be flushed before the transaction commits
? Force: the cache is immediately flushed (forced) to disk before the transaction commits
? No-Force: otherwise
? Steal/No-Force (Undo/Redo); Steal/Force (Undo/No-redo); No-Steal/No-Force (Redo/No-undo); No-Steal/Force (No-undo/No-redo)
• Deferred Update (No Undo/Redo): a set of transactions records their updates in the log At commit point, under the WAL scheme, these updates are saved on the database disk After reboot from a failure, the log is used to redo all the transactions affected by this failure No undo is required because no AFIM is flushed to disk before a transaction commits
? This environment requires some concurrency control mechanism to guarantee the isolation property of transactions In a system recovery, transactions recorded in the log after the last checkpoint are redone The recovery manager may scan some of the transactions recorded before the checkpoint to get the AFIMs
? Active table: all active transactions are entered in this table Commit table: transactions to be committed are entered in this table During recovery, all transactions in the commit table are redone and all transactions in the active table are ignored, since none of their AFIMs reached the database
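The redo-only recovery of deferred update can be sketched over the log-record format listed earlier; the tuple encoding of log records is an assumption for illustration:

```python
# Redo-only recovery for deferred update (NO-UNDO/REDO). Log records
# are tuples in the format listed earlier in the section.
def recover_deferred(log):
    """Redo the writes of committed transactions only."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    db = {}
    for rec in log:
        if rec[0] == "write_item" and rec[1] in committed:
            _, T, X, old_value, new_value = rec
            db[X] = new_value   # redo: apply the after-image (AFIM)
    return db

log = [
    ("start_transaction", "T1"),
    ("write_item", "T1", "A", 10, 20),
    ("commit", "T1"),
    ("start_transaction", "T2"),
    ("write_item", "T2", "B", 5, 7),   # T2 never committed: ignored
]
recover_deferred(log)   # {'A': 20}
```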
• Immediate Update
? Undo/Redo Algorithm (single-user environment): undo a transaction if it is in the active table; redo a transaction if it is in the commit table
? Undo/Redo Algorithm (concurrent execution): to minimize the work of the recovery manager, checkpointing is used
• ARIES Recovery Algorithm: Steal/No-Force (UNDO/REDO)
? Consists of three steps: 1. Analysis: identifies the dirty (updated) pages in the buffer and the set of transactions active at the time of the crash; the appropriate point in the log where redo is to start is also determined 2. Redo: the necessary redo operations are applied 3. Undo: the log is scanned backwards and the operations of transactions that were active at the time of the crash are undone in reverse order
? A log record is written for: data update, transaction commit, transaction abort, and undo (in the case of undo, a compensating log record is written)
• The following steps are performed for recovery
? Analysis phase: start at the begin_checkpoint record and proceed to the end_checkpoint record The transaction table and dirty page table appended to the end of the log are accessed Modify the transaction table and dirty page table: an end log record encountered for T → delete entry T from the transaction table; some other type of log record encountered for T' → insert an entry T' into the transaction table if not already present; a log record corresponds to a change for page P → insert an entry P (if not present) with the associated LSN into the dirty page table, or if present, modify its last LSN
? Redo phase: starts redoing at a point in the log where it is known that previous changes to dirty pages have already been applied to disk: find the smallest LSN M of all the dirty pages in the dirty page table No redo is needed when: a change recorded in the log pertains to a page P that is not in the dirty page table; a change recorded in the log (LSN = N) pertains to page P and the dirty page table entry for P has LSN > N; page P is read from disk and the LSN stored on that page is ≥ N
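The three "no redo" checks of the redo phase can be expressed as a single predicate. This is a simplified sketch: the dirty page table is a plain dict mapping page id to its recorded LSN, and the third check uses ≥, matching the standard ARIES rule that a page whose stored LSN already covers the log record needs no redo:

```python
# The three "no redo" checks of the ARIES redo phase, as a predicate.
# dirty_page_table maps page id -> LSN from the analysis phase;
# page_lsn_on_disk is the LSN stored in the page header on disk.
def needs_redo(log_lsn, page, dirty_page_table, page_lsn_on_disk):
    if page not in dirty_page_table:
        return False   # page was not dirty at the time of the crash
    if dirty_page_table[page] > log_lsn:
        return False   # a later change was already applied to disk
    if page_lsn_on_disk >= log_lsn:
        return False   # the page on disk already reflects this update
    return True

dpt = {"P1": 40, "P2": 90}
needs_redo(50, "P3", dpt, 0)    # False: P3 not in the dirty page table
needs_redo(50, "P2", dpt, 0)    # False: table entry LSN 90 > 50
needs_redo(50, "P1", dpt, 30)   # True: this update must be redone
```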