One-dimensional Indexing for P2P designs

In a DBMS with a one-dimensional index, the value of one field of a relational table is designated the “primary key” and is required to be distinct for all records. The B-tree index and the hashing index are two representative approaches to indexing data sets by primary key.

In the P2P world, recent DHT-based peer-to-peer systems such as CAN [5], Chord [7] and Tapestry [12] essentially provide a one-dimensional indexing mechanism that locates data in the system by implementing a Distributed Hash Table (DHT). A DHT is a data structure that stores (key, data) pairs in a distributed manner and allows the data to be located quickly when a key is given. As an example, it is possible to search in Gnutella for music by Celine Dion. In such a one-dimensional distributed indexing infrastructure, given a specific value of the key, the list of peers that store the requested data can be located with performance guarantees in terms of logical hops. While one-dimensional indexing provides efficient retrieval on the primary key, retrieval by a combination or correlation of several attributes requires querying several independent DHTs, one per attribute, and then combining the results in a database-like “join” operation.
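The cost of a multi-attribute query over one-dimensional indexes can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `ToyDHT` class, the `responsible_node` helper, the node names and the sample entries are hypothetical, and the hash-modulo placement stands in for the key-space partitioning and multi-hop routing that real systems such as CAN or Chord perform.

```python
import hashlib


def responsible_node(key, nodes):
    """Pick the node responsible for a key by hashing it.

    Simplification (assumption): hash modulo node count. Real DHTs use
    their own key-space partitioning and multi-hop routing.
    """
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


class ToyDHT:
    """Hypothetical one-dimensional index: one attribute value -> locations."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.storage = {node: {} for node in self.nodes}

    def put(self, key, location):
        node = responsible_node(key, self.nodes)
        self.storage[node].setdefault(key, set()).add(location)

    def get(self, key):
        node = responsible_node(key, self.nodes)
        return set(self.storage[node].get(key, ()))


nodes = ["n0", "n1", "n2", "n3"]
artist_index = ToyDHT(nodes)  # one independent DHT per attribute
genre_index = ToyDHT(nodes)

artist_index.put("Celine Dion", "peerA:track1")
artist_index.put("Celine Dion", "peerB:track7")
genre_index.put("pop", "peerA:track1")
genre_index.put("pop", "peerC:track3")

# Two independent lookups, then a "join" (here a set intersection).
by_artist = artist_index.get("Celine Dion")
by_genre = genre_index.get("pop")
matches = by_artist & by_genre
print(sorted(matches))
```

Each additional attribute in the query adds one more lookup and one more intersection, which is the overhead that a multi-dimensional index aims to avoid.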

2.3 Multi-dimensional indexing for P2P system

A multi-dimensional index is based on the notion that more than one field or attribute of a record type is specified as constituting the primary key set for records of that type, and that each key in the key set plays an equal role in determining the physical placement of a record.

With multi-dimensional indexing, uniqueness is required only for the whole set of selected attributes, and tuples can be retrieved on partially specified keys. In particular, in a P2P system with multi-dimensional indexing, a multi-dimensional query does not need to query several indexing infrastructures (i.e., DHTs) and thus avoids the expensive “join” operation, while retaining all the functionality of one-dimensional indexing. This is especially meaningful for practical P2P systems, in which users often issue multi-dimensional queries. In the database area, many multi-dimensional index methods have been proposed, such as the BANG file, the Grid file, Z-ordering and the R-tree.

A review of these methods is given by Gaede and Günther [14]. However, there is as yet no multi-dimensional indexing infrastructure for P2P systems.

2.4 Multi-dimensional Indexing using Hilbert Space Filling Curve

The space-filling curve emerged in the 19th century: it was originally proposed by Peano [15], and David Hilbert [16] gave its first geometrical representation. Fig. 2.1 reproduces the figures in Hilbert’s paper, which show the Hilbert curves of the first three orders in two-dimensional space, except that the sequence numbers of the points start from 0 instead of 1. For more recent developments and applications of the Hilbert space-filling curve, see Lawder [17].

Figure 2.1 Hilbert Curve in Two-dimensional Space: (a) first order, (b) second order, (c) third order

The Hilbert curve is produced recursively. Fig. 2.1 (a) shows the first-order curve in two-dimensional space, consisting of four points. The square is divided into four sub-squares, and the curve is likewise divided into four segments. The one-to-one mapping between sub-squares and curve segments ensures that adjacent line intervals always correspond to adjacent sub-squares. The line passing through the centers of the sub-squares defines their ordering. Fig. 2.1 (b) shows the next recursive step, the second-order curve. Comparing Fig. 2.1 (b) with Fig. 2.1 (a), we find that each sub-square and its corresponding line interval have been further divided. Fig. 2.1 (c) shows the third step in the sequence.
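The recursive construction can be sketched in Python. The sketch below uses a widely known iterative bit-manipulation formulation rather than anything taken from this text. The function names `hilbert_point` and `hilbert_curve` are hypothetical, and the orientation of the curve (which corner it starts in) is an assumption that may differ from the figure.

```python
def hilbert_point(order, d):
    """Return the (x, y) cell visited at step d by a Hilbert curve.

    The grid has 2**order cells per side; d ranges over 0 .. 4**order - 1.
    Each loop iteration handles one level of the recursion, from the
    finest sub-square outwards.
    """
    x = y = 0
    t = d
    s = 1
    while s < (1 << order):
        rx = 1 & (t // 2)   # which half in x at this level
        ry = 1 & (t ^ rx)   # which half in y at this level
        if ry == 0:
            # Rotate/flip the already-built sub-curve so segments connect.
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_curve(order):
    """List all cells of the grid in Hilbert-curve order."""
    return [hilbert_point(order, d) for d in range(4 ** order)]


first = hilbert_curve(1)
second = hilbert_curve(2)
third = hilbert_curve(3)
print(first)  # [(0, 0), (0, 1), (1, 1), (1, 0)]
```

With this formulation, the first-order curve visits four cells, and each higher order replaces every cell with a rotated copy of the first-order pattern, so consecutive sequence numbers always land in neighbouring cells.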

Because a point on the line can be viewed as the limit of an infinite sequence of nested line intervals, and a point in the square as the limit of an infinite sequence of nested sub-squares, continuing the recursion indefinitely establishes a mapping from the points on the line onto the points in the square. Further, because the points on the line have a fixed order, the corresponding points in the square are assigned an order. This is known as the Hilbert curve. In the limit the curve passes through every point of the space, hence the name space-filling curve; at any finite order it visits each sub-square exactly once and gives it a unique sequence number. For details about space-filling curves, see Sagan [18] and Lawder [17].
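At a finite order, the correspondence between grid cells and sequence numbers can be checked directly. The following Python sketch computes the sequence number of a cell with a widely known iterative formulation. The name `hilbert_index` is hypothetical, and the curve orientation is an assumption rather than something fixed by this text.

```python
def hilbert_index(order, x, y):
    """Return the sequence number of cell (x, y) on a Hilbert curve.

    The grid has 2**order cells per side, with 0 <= x, y < 2**order.
    """
    side = 1 << order
    d = 0
    s = side // 2
    while s > 0:
        rx = 1 if x & s else 0   # quadrant bit in x at this level
        ry = 1 if y & s else 0   # quadrant bit in y at this level
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:
            # Undo the rotation/flip applied to this quadrant's sub-curve.
            if rx == 1:
                x, y = side - 1 - x, side - 1 - y
            x, y = y, x
        s //= 2
    return d


order = 3
side = 1 << order
index_of = {(x, y): hilbert_index(order, x, y)
            for x in range(side) for y in range(side)}

# Every cell gets a distinct sequence number in 0 .. 4**order - 1.
is_one_to_one = sorted(index_of.values()) == list(range(4 ** order))

# Consecutive sequence numbers always fall in neighbouring cells.
cell_at = {d: cell for cell, d in index_of.items()}
is_adjacent = all(
    abs(cell_at[d][0] - cell_at[d + 1][0])
    + abs(cell_at[d][1] - cell_at[d + 1][1]) == 1
    for d in range(4 ** order - 1)
)
print(is_one_to_one, is_adjacent)
```

The second check runs in one direction only: consecutive sequence numbers are always spatial neighbours, but two neighbouring cells may still receive sequence numbers that are far apart.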

One important application of the Hilbert curve lies in embedding multi-dimensional points into one dimension while preserving spatial proximity to the largest possible extent, so that the points are clustered in groups of similar points