Point Access Method
Group 1:
Lâm Tuấn Anh
Nguyễn Đình Tân Anh
Lê Minh Châu
Point Access Method
1. Spatial Data
2. Main Memory Structure
3. Point Access Methods
Spatial Data
Characteristics of Spatial Data
• Complex structure
• Dynamic
• Spatial databases tend to be large
• There is no standard algebra defined on spatial data
• Many spatial operators are not closed
• Spatial database operators are more expensive than standard relational operators
• There is no total order among spatial objects
Queries in Spatial Data
• Exact Match Query (EMQ)
  • Condition: given an object o’ with spatial extent o’.G in d-dimensional Euclidean space
  • Target: find all objects o with the same spatial extent as o’
  • Query
• Point Query (PQ)
  • Condition: given a point p in d-dimensional Euclidean space
  • Target: find all objects o overlapping p
  • Query
Queries in Spatial Data
• Enclosure Query (EQ)
  • Condition: given an object o’ with spatial extent o’.G in d-dimensional Euclidean space
  • Target: find all objects o enclosing o’
  • Query
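The exact match, point, and enclosure queries can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that every spatial extent o.G is an axis-aligned box and that the database is a plain list scanned linearly; the names `Box`, `exact_match_query`, `point_query`, and `enclosure_query` are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, ...]


@dataclass(frozen=True)
class Box:
    """Axis-aligned d-dimensional box, used here as a stand-in for o.G."""
    lo: Point  # lower corner
    hi: Point  # upper corner

    def contains_point(self, p: Point) -> bool:
        return all(l <= x <= h for l, x, h in zip(self.lo, p, self.hi))

    def encloses(self, other: "Box") -> bool:
        return (all(a <= b for a, b in zip(self.lo, other.lo))
                and all(a >= b for a, b in zip(self.hi, other.hi)))


def exact_match_query(objects: List[Box], q: Box) -> List[Box]:
    """EMQ: all objects with the same spatial extent as q."""
    return [o for o in objects if o == q]


def point_query(objects: List[Box], p: Point) -> List[Box]:
    """PQ: all objects overlapping the point p."""
    return [o for o in objects if o.contains_point(p)]


def enclosure_query(objects: List[Box], q: Box) -> List[Box]:
    """EQ: all objects enclosing q."""
    return [o for o in objects if o.encloses(q)]


objects = [Box((0, 0), (4, 4)), Box((1, 1), (2, 2)), Box((3, 3), (5, 5))]
```

A linear scan is only a baseline: the access methods in the rest of the deck exist to answer these queries without touching every object.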
• Requirements for Multidimensional Access Methods
  • Dynamics
  • Secondary/tertiary storage management
  • Broad range of supported operations
  • Independence of the input data and insertion sequence
Main Memory Structure
Figure 9: Running example. Legend: p_i = ith point, r_i = ith polygon, c_i = ith centroid, m_i = ith minimum bounding box.
Figure 10: k-d tree construction.
Figure 11: k-d tree.
Main Memory Structure
• Designed for main-memory applications, where all data is available without accessing the disk
• Do not take secondary storage management into account explicitly
• In many spatial database applications, however, the amount of data to be managed is notoriously large
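As an illustration of a main-memory structure like the k-d tree of Figures 10 and 11, here is a minimal Python sketch. It assumes a static point set, a median split, and a discriminator that cycles through the dimensions; `KDNode`, `build_kdtree`, and `contains` are hypothetical names, and the points are made-up sample data rather than the running example from the figures.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


class KDNode:
    def __init__(self, point: Point, axis: int,
                 left: "Optional[KDNode]" = None,
                 right: "Optional[KDNode]" = None) -> None:
        self.point = point  # point stored at this node
        self.axis = axis    # dimension used to split at this node
        self.left = left    # points with a strictly smaller coordinate
        self.right = right  # points with a greater or equal coordinate


def build_kdtree(points: List[Point], depth: int = 0) -> Optional[KDNode]:
    """Build a k-d tree by splitting at the median, cycling through axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    # Move left past ties so the left subtree holds strictly smaller values.
    while mid > 0 and pts[mid - 1][axis] == pts[mid][axis]:
        mid -= 1
    return KDNode(pts[mid], axis,
                  build_kdtree(pts[:mid], depth + 1),
                  build_kdtree(pts[mid + 1:], depth + 1))


def contains(node: Optional[KDNode], p: Point) -> bool:
    """Exact match search: follow one root-to-leaf path."""
    while node is not None:
        if node.point == p:
            return True
        node = node.left if p[node.axis] < node.point[node.axis] else node.right
    return False


sample = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
root = build_kdtree(sample)
```

Every node here is an ordinary in-memory object reached by pointer chasing, which is exactly why such structures do not map well onto disk pages.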
Point Access Methods
• Multidimensional Hashing
• Hierarchical Access Method
Multidimensional Hashing
• There is no total order for objects in two- and higher-dimensional space that completely preserves spatial proximity
• Idea: construct hash functions that preserve proximity at least to some extent
• Goal: objects located close to each other in the original space should be likely to be stored close together on disk
• => minimizes the number of disk accesses per range query
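One well-known way to get a partially proximity-preserving mapping is bit interleaving (Z-order, or Morton order). The slides do not prescribe this particular function; the sketch below is only an illustration, assuming two non-negative integer coordinates, and `z_order` is a hypothetical name.

```python
def z_order(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into a single key.

    Bit i of x goes to position 2*i, bit i of y to position 2*i + 1.
    """
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


# The four cells of a 2x2 block receive four consecutive keys.
block = sorted(z_order(x, y) for x in (2, 3) for y in (4, 5))
```

The preservation is only partial: cells in the same aligned block get consecutive keys, but neighbours across a block boundary can end up far apart in key order. That is the "at least to some extent" caveat above.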
The Grid File
• Superimposes a d-dimensional orthogonal grid on the universe
• The grid is not necessarily regular; the resulting cells may have different shapes and sizes
• Each cell is associated with one bucket, but a bucket may contain several adjacent cells
• The grid directory maps cells to buckets; since the directory may grow large, it is usually kept on secondary storage
• To guarantee that data items are always found with no more than two disk accesses for exact match queries, the grid itself is kept in main memory, represented by d one-dimensional arrays called scales
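The two-step lookup of the grid file (scales first, then directory) can be sketched in Python as follows. This is a two-dimensional toy model under stated assumptions: the directory is a nested list in memory instead of disk pages, buckets are represented only by their names, and `GridFile2D`, `cell_of`, and `bucket_of` are hypothetical names.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


class GridFile2D:
    """Toy 2-D grid file: two scales plus a cell-to-bucket directory."""

    def __init__(self, x_scale: Sequence[float], y_scale: Sequence[float],
                 directory: List[List[str]]) -> None:
        # Scales: sorted split positions per dimension (kept in main memory).
        self.x_scale = list(x_scale)
        self.y_scale = list(y_scale)
        # directory[i][j] names the bucket of cell (i, j); in a real grid
        # file this array lives on secondary storage.
        assert len(directory) == len(self.x_scale) + 1
        assert all(len(col) == len(self.y_scale) + 1 for col in directory)
        self.directory = directory

    def cell_of(self, x: float, y: float) -> Tuple[int, int]:
        """Locate the grid cell using only the in-memory scales."""
        return bisect_right(self.x_scale, x), bisect_right(self.y_scale, y)

    def bucket_of(self, x: float, y: float) -> str:
        """Directory access (first disk access in a real grid file);
        reading the returned bucket would be the second one."""
        i, j = self.cell_of(x, y)
        return self.directory[i][j]


# Irregular grid: x is split at 10 and 20, y at 50.
# Bucket "A" covers two adjacent cells, and so does bucket "B".
grid = GridFile2D([10, 20], [50], [["A", "A"], ["B", "C"], ["B", "D"]])
```

Because the scales resolve the cell without any I/O, an exact match query costs one directory page read plus one bucket read, which is the two-access guarantee stated above.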
EXCELL
• Decomposes the universe regularly: all grid cells are of equal size
• Each new split halves all cells and therefore doubles the directory size
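The effect of regular decomposition on the directory can be shown with a short Python sketch. It assumes a universe of [0, 1) in every dimension and describes the grid by the number of halvings applied to each dimension; `cell_index` and `directory_size` are hypothetical helper names, not part of EXCELL itself.

```python
from typing import Sequence, Tuple


def cell_index(point: Sequence[float], halvings: Sequence[int],
               lo: float = 0.0, hi: float = 1.0) -> Tuple[int, ...]:
    """Cell address in a regular grid: dimension d has 2**halvings[d]
    equal-sized intervals, so the address needs no scales, only arithmetic."""
    index = []
    for x, h in zip(point, halvings):
        n = 2 ** h
        i = int((x - lo) / (hi - lo) * n)
        index.append(min(i, n - 1))  # clamp points on the upper border
    return tuple(index)


def directory_size(halvings: Sequence[int]) -> int:
    """Number of directory entries (one per grid cell)."""
    size = 1
    for h in halvings:
        size *= 2 ** h
    return size


before = directory_size((1, 1))  # 2 x 2 grid
after = directory_size((2, 1))   # one more split along x: 4 x 2 grid
```

A single overflowing bucket thus forces every cell to be halved, which is the price paid for not having to store scales.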
The Two-Level Grid File
• Uses a second grid file to manage the grid directory
• First level: the root directory
• Second level: the actual grid directory
• The root directory contains pointers to the directory pages of the lower level, which in turn contain pointers to the data pages
• Splits are often confined to the subdirectory regions without affecting the surroundings too much
• => slower directory growth
• Does not solve the problem of superlinear directory size
The Twin Grid File
• Increases space utilization by introducing a second grid file
• The relationship between the two grid files is not hierarchical but somewhat more balanced
• Both grid files span the whole universe
• The distribution of the data between the two files is performed dynamically
Hierarchical Access Method
• Based on a binary or multiway tree structure
• Like hashing, stores data in buckets
• Each bucket corresponds to a leaf node of the tree and to a disk page
• Interior nodes of the tree guide the search
• Search: top-down tree traversal
• Main difference between the methods: the characteristics of the regions
Hierarchical Access Method
• k-d-B-tree
  • Combination of the adaptive k-d-tree and the B-tree
  • Partitions the universe like an adaptive k-d-tree
  • Associates the resulting subspaces with tree nodes
  • Interior nodes correspond to intervals (boxes)
  • Regions of nodes on the same level are mutually disjoint
  • Perfectly balanced (like a B-tree)
  • Search: straightforward, as in a k-d-tree
  • Insertion: search for the right bucket; if it overflows, split it and move half of the data to the new bucket
  • Deletion: search, remove; if necessary, merge the node with its siblings
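The bucket split performed on insertion can be sketched in Python. This covers only the leaf-level step (splitting one overflowing bucket along one axis at the median) and deliberately leaves out how the split propagates to interior nodes; `split_bucket` is a hypothetical name and the median rule is an assumption made for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def split_bucket(points: List[Point],
                 axis: int) -> Tuple[float, List[Point], List[Point]]:
    """Split an overflowing bucket at the median coordinate along `axis`.

    Returns the split position and the two new buckets; their regions are
    disjoint because every point goes to exactly one side. If all points
    share the same coordinate on `axis`, the left bucket stays empty and
    another axis should be tried.
    """
    ordered = sorted(points, key=lambda p: p[axis])
    position = ordered[len(ordered) // 2][axis]
    left = [p for p in ordered if p[axis] < position]
    right = [p for p in ordered if p[axis] >= position]
    return position, left, right


position, left, right = split_bucket(
    [(1, 5), (2, 1), (3, 9), (4, 4), (5, 2)], axis=0)
```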
Hierarchical Access Method
• LSD tree
  • Directory is organized like an adaptive k-d-tree
  • Adapts better to the data distribution than fixed binary partitioning
  • External balancing property: the heights of external subtrees differ by at most one
  • Combines two split strategies to accommodate skewed data:
    • Data-dependent: based on the stored data; aims for the most balanced structure (equal number of data items on both sides of the split)
    • Distribution-dependent: splits at a fixed dimension and position (a known data distribution is assumed)
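The difference between the two split strategies shows up clearly on skewed data. The Python sketch below is an illustration only: it takes the median as the data-dependent position and, assuming a uniform distribution, the midpoint of the region as the distribution-dependent position. All function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def data_dependent_split(points: List[Point], axis: int) -> float:
    """Split position chosen from the data: the median coordinate."""
    values = sorted(p[axis] for p in points)
    return values[len(values) // 2]


def distribution_dependent_split(lo: float, hi: float) -> float:
    """Split position fixed in advance, independent of the stored data.
    Assumption: data is uniform on [lo, hi), so the midpoint is used."""
    return (lo + hi) / 2


def side_counts(points: List[Point], axis: int,
                position: float) -> Tuple[int, int]:
    """Number of points left (<) and right (>=) of the split position."""
    left = sum(1 for p in points if p[axis] < position)
    return left, len(points) - left


# Skewed sample: three of the four points crowd near x = 0.1.
skewed = [(0.10, 0.5), (0.12, 0.4), (0.15, 0.9), (0.90, 0.1)]
by_data = data_dependent_split(skewed, axis=0)
by_distribution = distribution_dependent_split(0.0, 1.0)
```

On this sample the data-dependent split yields two points per side, while the fixed midpoint leaves three on one side and one on the other.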
Hierarchical Access Method
• Buddy tree
  • Dynamic hashing scheme with a tree-structured directory (hybrid)
  • The tree is built by consecutive insertions
  • Cuts the universe into two equal parts with iso-oriented hyperplanes
  • Interior nodes: a partition and an interval (the minimum bounding box, MBB, of the points or intervals below the node)
  • Intervals of nodes on the same level are mutually disjoint
  • Leaves hold the data (as in the other trees)
  • Each directory node has at least two entries
    => the tree may not be balanced
  • When a node splits, the MBBs of the two resulting intervals are recomputed to reflect the current situation
    => aims for high selectivity at the directory level
  • Except for the root, exactly one pointer refers to each directory page
    => guarantees linear growth
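Recomputing the minimum bounding boxes after a buddy tree split can be sketched in Python as follows. The sketch assumes point data and a split that halves the region along one axis; `mbb` and `split_and_shrink` are hypothetical names.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, ...]
Interval = Tuple[Point, Point]  # (lower corner, upper corner)


def mbb(points: List[Point]) -> Optional[Interval]:
    """Minimum bounding box of a point set (None for an empty set)."""
    if not points:
        return None
    dims = range(len(points[0]))
    lo = tuple(min(p[d] for p in points) for d in dims)
    hi = tuple(max(p[d] for p in points) for d in dims)
    return lo, hi


def split_and_shrink(points: List[Point], axis: int, position: float):
    """Halve a node at `position` along `axis`, then shrink each half
    to the MBB of the points it actually contains."""
    left = [p for p in points if p[axis] < position]
    right = [p for p in points if p[axis] >= position]
    return (mbb(left), left), (mbb(right), right)


# Region [0, 8) x [0, 8) is cut in half at x = 4.
(left_box, left_pts), (right_box, right_pts) = split_and_shrink(
    [(1, 1), (2, 3), (6, 2), (7, 7)], axis=0, position=4)
```

The shrunken intervals no longer cover the empty parts of the universe, so a query falling into dead space can be rejected at the directory level without reading a data page.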
• BANG file (Balanced and Nested Grid)
  • A hybrid method
  • Divides the universe into intervals (boxes), similar to the grid file
  • Difference: bucket regions may intersect
  • Can form nonrectangular bucket regions by taking the geometric difference of two intervals (nesting)
  • Increases storage utilization by redistributing data between buckets during insertion
  • Uses a balanced search tree to manage the directory
Hierarchical Access Method
• BANG file (Balanced and Nested Grid)
• first 3 rectangles: R1: R2, R5, R6
• then R3 and R4 in R2 and R5
• representation as bit interleaving
• * = universe
• A point search may require traversing the entire directory in a depth-first manner
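Nested regions identified by bit-interleaved prefixes can be sketched in Python. This is a simplification under stated assumptions: the directory is a flat dictionary rather than a balanced search tree, keys and region identifiers are bit strings, and the empty prefix plays the role of the universe (*). The bucket names and `find_bucket` are hypothetical and do not reproduce the R1 to R6 layout of the figure.

```python
from typing import Dict


def find_bucket(key_bits: str, regions: Dict[str, str]) -> str:
    """Return the bucket of the smallest region enclosing the key.

    With nesting, a point belongs to the region with the longest matching
    prefix; the enclosing region minus its nested regions forms the
    (possibly nonrectangular) bucket region.
    """
    for length in range(len(key_bits), -1, -1):
        prefix = key_bits[:length]
        if prefix in regions:
            return regions[prefix]
    raise KeyError("directory has no entry for the universe")


# "" is the universe; "0" is nested in it, and "01" is nested in "0".
regions = {"": "bucket_A", "0": "bucket_B", "01": "bucket_C", "11": "bucket_D"}
```

Here `bucket_B` holds everything under prefix `0` except what falls under `01`, which is the geometric-difference idea of the BANG file in one dimension of bit strings.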
Hierarchical Access Method
• hB-tree
  • Related to the k-d-B-tree: uses k-d-trees to organize the space represented by its interior nodes
  • Difference in splitting: node splits are based on multiple attributes
  • As a result, regions need not be box-shaped
Hierarchical Access Method
• BV-tree
  • Attempts to generalize the B-tree to d dimensions
  • Idea: keep the major strengths of the B-tree by relaxing the requirements on balance and space utilization
  • The BV-tree is not completely balanced
  • Guarantees at least 33% space utilization (versus 50% for the B-tree)
• What is the form of a Point Query in spatial data?
• Which methods belong to Multidimensional Hashing?
  • Grid File
  • k-d tree
  • Linear Hashing
  • EXCELL