… of key word or text-based annotations to completely, consistently, and objectively describe the content of images. Although perceptual features such as color distributions and color layout often provide a poor characterization of the actual semantic content of the images, content-based query appears to be effective for indexing and rapidly accessing images based on the similarity of visual features.
11.1.1 Content-Based Query Systems
The seminal work on content-based query of image databases was carried out in the IBM query by image content (QBIC) project [2,8]. The QBIC project explored methods for searching for images based on the similarity of global image features of color, texture, and shape. The QBIC project also developed a novel method of prefiltering of queries that greatly reduces the number of target images searched in similarity queries [9]. The MIT Photobook project extended some of the early methods of content-based query by developing descriptors that provide effective matching as well as the ability to reconstruct the images and their features from the descriptors [5]. Smith and Chang developed a fully automated content-based query system called VisualSEEk, which further extended content-based querying of image databases by extracting regions and allowing searching based on their spatial layout [10]. Other content-based image database systems
such as WebSEEk [11] and ImageRover [12] have focused on indexing and searching of images on the World Wide Web. More recently, the MPEG-7 “Multimedia Content Description Interface” standard provides standardized descriptors for color, texture, shape, motion, and other features of audiovisual data to enable fast and effective content-based searching [13].
11.1.2 Content-Based Query-by-Color
The objective of content-based query-by-color is to return the images whose color features are most similar to the color features of a query image. Swain and Ballard investigated the use of color histogram descriptors for searching for color objects contained within the target images [3]. Stricker and Orengo developed color moment descriptors for fast similarity searching of large image databases [14]. Later, Stricker and Dimai developed a system for indexing of color images based on the color moments of different regions [15]. In the spatial and feature (SaFe) project, Smith and Chang designed a 166-bin color descriptor in HSV color space and developed methods for graphically constructing content-based queries that depict the spatial layout of color regions [7]. Each of these approaches to content-based query-by-color involves the design of color descriptors, including the selection of the color feature space and a distance metric for measuring the similarity of the color features.
11.1.3 Outline
This chapter investigates methods for content-based query of image databases based on the color features of images. In particular, the chapter focuses on the design and extraction of color descriptors and on the methods for matching them. The chapter is organized as follows. Section 11.2 analyzes the three main aspects of color feature extraction, namely, the choice of a color space, the selection of a quantizer, and the computation of color descriptors. Section 11.3 defines and discusses several similarity measures, and Section 11.4 evaluates their usefulness in content-based image-query tasks. Concluding remarks and comments on future directions are given in Section 11.5.
11.2 COLOR DESCRIPTOR EXTRACTION
Color is an important dimension of human visual perception that allows discrimination and recognition of visual information. Correspondingly, color features have been found to be effective for indexing and searching of color images in image databases. Generally, color descriptors are relatively easily extracted and matched and are therefore well suited for content-based query. Typically, the specification of a color descriptor1 requires fixing a color space and determining its partitioning.

1 In this chapter we use the term “feature” to mean a perceptual characteristic of images that signifies something to human observers, whereas “descriptor” means a numeric quantity that describes a feature.
Images can be indexed by mapping their pixels into the quantized color space and computing a color descriptor. Color descriptors such as color histograms can be extracted from images in different ways. For example, in some cases it is important to capture the global color distribution of an image. In other cases, it is important to capture the spatially localized apportionment of the colors to different regions. In either case, because the descriptors are ultimately represented as points in a multidimensional space, it is necessary to carefully define the metrics for determining descriptor similarity.
The design space for color descriptors, which involves specification of the color space, its partitioning, and the similarity metric, is therefore quite large. There are a few evaluation points that can be used to guide the design. The determination of the color space and its partitioning can be done using color experiments that perceptually gauge the intra- and interpartition distribution of colors. The determination of the color descriptors can be made using retrieval-effectiveness experiments in which the content-based query-by-color results are compared to known ground-truth results for benchmark queries. The image database system can be designed to allow the user to select from different descriptors based on the query at hand. Alternatively, the image database system can use relevance feedback to automatically weight the descriptors or select metrics based on user feedback [16].
11.2.1 Color Space

Color is perceived through three independent color receptors that have peak responses at approximately the red (r), green (g), and blue (b) wavelengths: λ_r = 700 nm, λ_g = 546.1 nm, and λ_b = 435.8 nm, respectively. By assigning to each primary color receptor a response function c_k(λ), where k ∈ {r, g, b}, the linear superposition of the c_k(λ)'s represents visible light F(λ) of any color or wavelength λ [17]. By normalizing the c_k(λ)'s to the reference white light W(λ) such that

W(λ) = c_r(λ) + c_g(λ) + c_b(λ),   (11.1)

the colored light F(λ) produces the tristimulus responses (R, G, B) such that

F(λ) = R c_r(λ) + G c_g(λ) + B c_b(λ).   (11.2)
As such, any color can be represented by a linear combination of the three primary colors (R, G, B). The space spanned by the R, G, and B values completely describes visible colors, which are represented as vectors in the 3D RGB color space. As a result, the RGB color space provides a useful starting point for representing color features of images. However, the RGB color space is not perceptually uniform. More specifically, equal distances in different areas and along different dimensions of the 3D RGB color space do not correspond to equal perception of color dissimilarity. The lack of perceptual uniformity results in the need to develop more complex vector quantization to satisfactorily partition the RGB color space to form the color descriptors. Alternative color spaces can be generated by transforming the RGB color space. However, as yet, no consensus has been reached regarding the optimality of different color spaces for content-based query-by-color. The problem originates from the lack of any known single perceptually uniform color space [18]. As a result, a large number of color spaces have been used in practice for content-based query-by-color.
In general, the RGB colors, represented by vectors v_c, can be mapped to different color spaces by means of a color transformation T_c. The notation w_c indicates the transformed colors. The simplest color transformations are linear. For example, linear transformations of the RGB color space produce a number of important color spaces that include YIQ (the NTSC composite color TV standard), YUV (the PAL and SECAM color television standards), YCrCb (the JPEG digital image coding standard and the MPEG digital video coding standard), and the opponent color space OPP [19]. Equation (11.3) gives matrices that transform an RGB vector into these color spaces. The YIQ, YUV, and YCrCb linear color transforms have been adopted in color picture coding systems. These linear transforms, each of which generates one luminance channel and two chrominance channels, were designed specifically to accommodate targeted display devices: YIQ for NTSC color television, YUV for PAL and SECAM color television, and YCrCb for color computer displays. Because none of these color spaces is uniform, color distance does not correspond well to perceptual color dissimilarity.
The opponent color space (OPP) was developed based on evidence that human color vision uses an opponent-color model by which the responses of the R, G, and B cones are combined into two opponent color pathways [20]. One benefit of the OPP color space is that it is obtained easily by a linear transform. The disadvantages are that it is neither uniform nor natural. The color distance in OPP color space does not provide a robust measure of color dissimilarity. One component of OPP, the luminance channel, indicates brightness. The two chrominance channels correspond to blue versus yellow and red versus green.
T_c^{YIQ} = | 0.299   0.587   0.114 |
            | 0.596  −0.274  −0.322 |
            | 0.211  −0.523   0.312 |

T_c^{OPP} = |  0.333   0.333   0.333 |
            | −0.500  −0.500   1.000 |
            |  0.500  −1.000   0.500 |        (11.3)
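To make the notation concrete, a linear color transform is simply a 3 × 3 matrix applied to each RGB pixel. The following minimal Python/NumPy sketch (the function name and the example values are illustrative, not taken from the chapter) applies the YIQ and opponent-color matrices of Eq. (11.3):

```python
import numpy as np

# Transform matrices from Eq. (11.3); each row produces one output channel.
T_YIQ = np.array([[0.299,  0.587,  0.114],
                  [0.596, -0.274, -0.322],
                  [0.211, -0.523,  0.312]])

T_OPP = np.array([[ 0.333,  0.333,  0.333],   # luminance
                  [-0.500, -0.500,  1.000],   # blue vs. yellow
                  [ 0.500, -1.000,  0.500]])  # red vs. green

def transform_colors(rgb_image: np.ndarray, T: np.ndarray) -> np.ndarray:
    """Apply a 3x3 linear color transform to an H x W x 3 RGB image.

    Each output pixel is w_c = T @ v_c, where v_c is the RGB column vector.
    """
    return rgb_image @ T.T  # matrix product along the last (color) axis

# Example: a mid-gray pixel maps to zero chrominance in both spaces.
gray = np.array([[[0.5, 0.5, 0.5]]])
print(transform_colors(gray, T_YIQ))   # [[[0.5, 0.0, 0.0]]] (approximately)
print(transform_colors(gray, T_OPP))   # [[[0.5, 0.0, 0.0]]] (approximately)
```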
Although these linear color transforms are the simplest, they do not generate natural or uniform color spaces. The Munsell color order system was designed to be natural, compact, and complete. The Munsell color order system organizes the colors according to natural attributes [21]. Munsell's Book of Color [22] contains 1,200 samples of color chips, each specified by its hue, value, and chroma. The chips are spatially arranged (in three dimensions) so that the steps between neighboring chips are perceptually equal.
The advantage of the Munsell color order system results from its ordering of a finite set of colors by perceptual similarity over an intuitive three-dimensional space. The disadvantage is that the color order system does not indicate how to transform or partition the RGB color space to produce the set of color chips. Although one transformation from RGB to Munsell HVC, named the mathematical transform to Munsell (MTM), was investigated for image data by Miyahara [23], there does not exist a simple mapping from color points in RGB color space to Munsell color chips. Although the Munsell space was designed to be compact and complete, it does not satisfy the property of uniformity. Furthermore, the color order system does not provide for the assessment of the similarity of color chips that are not neighbors.
Other color spaces, such as HSV, CIE 1976 (L*a*b*), and CIE 1976 (L*u*v*), are generated by nonlinear transformations of the RGB space. With the goal of deriving uniform color spaces, the CIE2 in 1976 defined the CIE 1976 (L*u*v*) and CIE 1976 (L*a*b*) color spaces [24]. These are generated by a linear transformation from the RGB to the XYZ color space, followed by different nonlinear transformations. The CIE color spaces represent, with equal emphasis, the three characteristics that best characterize color perceptually: hue, lightness, and saturation. However, the CIE color spaces are inconvenient because of the necessary nonlinearity of the transformations to and from the RGB color space.
Although the determination of the optimum color space is an open problem, certain color spaces have been found to be well suited for content-based query-by-color. In Ref. [25], Smith investigated one form of the hue, lightness, and saturation transform from RGB to HSV, given in Ref. [26], for content-based query-by-color. The transform to HSV is nonlinear, but it is easily invertible. The HSV color space is natural and approximately perceptually uniform. Therefore, the quantization of HSV can produce a collection of colors that is also compact and complete. Recognizing the effectiveness of the HSV color space for content-based query-by-color, MPEG-7 has adopted HSV as one of the color spaces for defining color descriptors [27].
2 Commission Internationale de l’Eclairage
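The exact HSV transform of Ref. [26] is not reproduced here; as a rough illustration only, the standard hexcone RGB-to-HSV conversion in Python's colorsys module behaves similarly (per pixel, with all components in [0, 1]):

```python
import colorsys

def rgb_to_hsv_pixel(r: float, g: float, b: float):
    """Convert one RGB pixel (each component in [0, 1]) to (h, s, v).

    h is returned as a fraction of a full turn; multiply by 360 for degrees.
    The transform is nonlinear but easily invertible (colorsys.hsv_to_rgb).
    """
    return colorsys.rgb_to_hsv(r, g, b)

h, s, v = rgb_to_hsv_pixel(1.0, 0.0, 0.0)   # pure red
print(h * 360, s, v)                         # 0.0 1.0 1.0
```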
11.2.2 Color Quantization
By far, the most common category of color descriptors is that of color histograms. Color histograms capture the distribution of colors within an image or an image region. When dealing with observations from distributions that are continuous or that can take a large number of possible values, a histogram is constructed by associating each bin with a set of observation values. Each bin of the histogram then contains the number of observations (i.e., the number of image pixels) that belong to the associated set. Color belongs to this category of random variables: for example, the color space of 24-bit images contains 2^24 distinct colors. Therefore, the partitioning of the color space is an important step in constructing color histogram descriptors.
As color spaces are multidimensional, they can be partitioned by multidimensional scalar quantization (i.e., by quantizing each dimension separately) or by vector quantization methods. By definition, a vector quantizer Q_c of dimension k and size M is a mapping from a vector in k-dimensional space into a finite set C that contains M outputs [28]. Thus, a vector quantizer is defined as the mapping Q_c: R^k → C, where C = {y_0, y_1, ..., y_{M−1}} and each y_m is a vector in the k-dimensional Euclidean space R^k. The set C is customarily called a codebook, and its elements are called code words. In the case of vector quantization of the color space, k = 3 and each code word y_m is an actual color point. Therefore, the codebook C represents a gamut or collection of colors.

The quantizer partitions the color space R^k into M disjoint sets R_m, one per code word, that completely cover it:

R_m ∩ R_n = ∅ for m ≠ n,   and   ∪_{m=0}^{M−1} R_m = R^k.   (11.4)
All the transformed color points w_c belonging to the same partition R_m are quantized to (i.e., represented by) the same code word y_m:

Q_c(w_c) = y_m   if w_c ∈ R_m.   (11.5)
In the case of the HSV color space, the partitioning can exploit the cylindrical structure of the space: the long axis represents the value, or brightness; the distance from the axis is the saturation, which indicates the amount of presence of a color; and the angle around the axis is the hue, indicating tint or tone. As the hue represents the most perceptually significant characteristic of color, it requires the finest quantization. As shown in Figure 11.1, the primaries red, green, and blue are separated by 120 degrees in the hue circle. A circular quantization at 20-degree steps separates the hues so that the three primaries and yellow, magenta, and cyan are each represented with three subdivisions. The other color dimensions are quantized more coarsely because the human visual system responds to them with less discrimination; we use three levels each for value and saturation. This quantization, Q_c^166, provides M = 166 distinct colors in the HSV color space, derived from 18 hues (H) × 3 saturations (S) × 3 values (V) + 4 grays [29].

Figure 11.1 The transformation T_c^{HSV} from RGB to HSV and the quantization Q_c^{166} give 166 HSV colors = 18 hues × 3 saturations × 3 values + 4 grays. A color version of this figure can be downloaded from ftp://wiley.com/public/sci_tech_med/image_databases.
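A minimal sketch of such a quantizer is shown below. The chapter does not give the exact thresholds separating the gray bins from the chromatic bins, so the cutoffs and the bin ordering used here are assumptions made for illustration; only the overall structure of 18 × 3 × 3 + 4 = 166 bins follows the text.

```python
import numpy as np

def quantize_hsv_166(h_deg: float, s: float, v: float) -> int:
    """Map an HSV color to one of 166 bins: 18 hues x 3 saturations x 3 values + 4 grays.

    h_deg is the hue in degrees [0, 360); s and v are in [0, 1].
    The gray threshold and the uniform saturation/value partitions are
    assumptions made for this sketch, not values taken from the chapter.
    """
    if s < 0.05 or v < 0.05:                 # (assumed) achromatic region -> 4 gray bins
        return 162 + min(int(v * 4), 3)      # bins 162..165
    hue_bin = int(h_deg / 20.0) % 18         # 18 hues at 20-degree steps
    sat_bin = min(int(s * 3), 2)             # 3 saturation levels
    val_bin = min(int(v * 3), 2)             # 3 value levels
    return hue_bin * 9 + sat_bin * 3 + val_bin   # chromatic bins 0..161

# Example: a saturated pure red maps to one of the chromatic bins.
print(quantize_hsv_166(0.0, 1.0, 1.0))       # 8 (hue bin 0, highest s and v)
```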
11.2.3 Color Descriptors
A color descriptor is a numeric quantity that describes a color feature of an image. As with texture and shape, it is possible to extract color descriptors from the image as a whole, producing a global characterization, or separately from different regions, producing a local characterization. Global descriptors capture the color content of the entire image but carry no information on the spatial layout, whereas local descriptors can be used in conjunction with the position and size of the corresponding regions to describe the spatial structure of the image color.
Most color descriptors are color histograms or quantities derived from them. As previously mentioned, mapping the image to an appropriate color space, quantizing the mapped image, and counting how many times each quantized color occurs produce a color histogram. Formally, if I denotes an image of size W × H, I_q(i, j) is the color of the quantized pixel at position (i, j), and y_m is the mth code word of the vector quantizer, the color histogram h_c has entries defined by

h_c[m] = Σ_{i=0}^{W−1} Σ_{j=0}^{H−1} δ(I_q(i, j), y_m),   (11.6)

where the Kronecker delta function, δ(·, ·), is equal to 1 if its two arguments are equal, and zero otherwise.
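As an illustration of Eq. (11.6), once each pixel has been replaced by its code-word index, the histogram entry h_c[m] is just a count of the pixels falling in bin m. The NumPy helper below is a sketch; the function name is not from the chapter.

```python
import numpy as np

def color_histogram(quantized: np.ndarray, num_bins: int = 166) -> np.ndarray:
    """Compute h_c[m] = number of pixels whose quantized color index equals m.

    `quantized` is a W x H array of integer code-word indices; counting per bin
    realizes the Kronecker-delta sum of Eq. (11.6).
    """
    return np.bincount(quantized.ravel(), minlength=num_bins)

# Example: a tiny 2 x 2 "image" with three pixels in bin 8 and one in bin 165.
q = np.array([[8, 8], [8, 165]])
h = color_histogram(q)
print(h[8], h[165], h.sum())   # 3 1 4
```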
The histogram computed using Eq. (11.6) does not define a distribution because the sum of its entries is not equal to 1 but to the total number of pixels of the image. This definition is not conducive to comparing color histograms of images having different sizes. To allow matching, the following class of normalizations can be used:

h_c^{(r)} = h_c / ||h_c||_r,   where ||h_c||_r = ( Σ_{m=0}^{M−1} h_c[m]^r )^{1/r}.   (11.7)

With r = 1, the normalized histogram defines a distribution whose entries sum to one. Histograms normalized with r = 2 are unit vectors in the M-dimensional Euclidean space, namely, they lie on the surface of the unit sphere. The similarity between two such histograms can be represented, for example, by the angle between the corresponding vectors, captured by their inner product.
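The following sketch illustrates the normalization of Eq. (11.7) and the inner-product (cosine) similarity between two L2-normalized histograms; the helper names are illustrative.

```python
import numpy as np

def normalize(h: np.ndarray, r: int = 1) -> np.ndarray:
    """Normalize a histogram by its L_r norm (r = 1 gives a distribution,
    r = 2 gives a unit vector on the M-dimensional unit sphere)."""
    norm = np.sum(np.abs(h.astype(float)) ** r) ** (1.0 / r)
    return h / norm if norm > 0 else h.astype(float)

def cosine_similarity(hq: np.ndarray, ht: np.ndarray) -> float:
    """Inner product of the L2-normalized histograms: cosine of the angle between them."""
    return float(np.dot(normalize(hq, r=2), normalize(ht, r=2)))

hq = np.array([4, 0, 0, 0]); ht = np.array([2, 2, 0, 0])
print(cosine_similarity(hq, ht))   # about 0.707
```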
A drawback of describing the color content of an image globally is that it does not take into account the spatial distribution of color across different areas of the image. A number of methods have been developed for integrating color and spatial information for content-based query. Stricker and Dimai developed a method for partitioning each image into five nonoverlapping spatial regions [15]. By extracting color descriptors from each of the regions, the matching can optionally emphasize some regions or can accommodate the matching of rotated or flipped images. Similarly, Hsu and coworkers developed a method for extracting color descriptors from local regions by imposing a spatial grid on the images [30]. Jacobs and coworkers developed a method for extracting color descriptors from wavelet-transformed images, which allows fast matching of the images based on the location of color [31]. Figure 11.2 illustrates examples of extracting localized color descriptors in ways similar to those explored in [15] and [30], respectively. The basic approach involves partitioning the image into multiple regions and extracting a color descriptor for each region. Corresponding region-based color descriptors are then compared in order to assess the similarity of two images.
Figure 11.2a shows a partitioning of the image into five regions, r0–r4, in which a single center region, r0, captures the color features of any center object. Figure 11.2b shows a partitioning of the image into sixteen uniformly spaced regions, g0–g15. The dissimilarity of images based on the color spatial descriptors can be measured by computing the weighted sum of individual region dissimilarities as follows:

D(q, t) = Σ_{m} w_m d(h_q^m, h_t^m),   (11.8)

where d(·, ·) is a distance between color descriptors, h_q^m is the color descriptor of region m of the query image, h_t^m is the color descriptor of region m of the target image, and w_m is the weight of the m-th distance and satisfies Σ_m w_m = 1.
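A minimal sketch of the weighted region-based dissimilarity of Eq. (11.8), using a uniform grid, per-region histograms, and an L1 distance as the underlying d(·, ·); the grid size and the uniform weights are illustrative assumptions.

```python
import numpy as np

def region_histograms(quantized: np.ndarray, grid: int = 4, num_bins: int = 166):
    """Split a quantized image into grid x grid regions; return one histogram per region."""
    rows = np.array_split(quantized, grid, axis=0)
    return [np.bincount(block.ravel(), minlength=num_bins)
            for row in rows for block in np.array_split(row, grid, axis=1)]

def region_distance(regions_q, regions_t, weights=None) -> float:
    """D(q, t) = sum_m w_m * d(h_q^m, h_t^m), with an L1 distance per region."""
    m = len(regions_q)
    weights = np.full(m, 1.0 / m) if weights is None else np.asarray(weights)
    dists = [np.abs(hq.astype(float) - ht.astype(float)).sum()
             for hq, ht in zip(regions_q, regions_t)]
    return float(np.dot(weights, dists))

# Example with two random 64 x 64 "quantized images".
q_img = np.random.randint(0, 166, size=(64, 64))
t_img = np.random.randint(0, 166, size=(64, 64))
print(region_distance(region_histograms(q_img), region_histograms(t_img)))
```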
Alternatively, Smith and Chang developed a method for matching images based on the extraction of prominent single regions, as shown in Figure 11.3 [32].
Figure 11.2 Two partitionings of an image into regions for extracting localized color descriptors. A color version of this figure can be downloaded from ftp://wiley.com/public/sci_tech_med/image_databases.

Figure 11.3 The integrated spatial and color feature query approach matches the images by comparing the spatial arrangements of regions.
The VisualSEEk content-based query system allows the images to be matched by comparing the color regions on the basis of color, size, and absolute and relative spatial location [10]. In [7], it was reported that for some queries the integrated spatial and color feature query approach improves retrieval effectiveness substantially over content-based query-by-color using global color histograms.
11.3 COLOR DESCRIPTOR METRICS
A color descriptor metric indicates the similarity or, equivalently, the dissimilarity of the color features of images by measuring the distance between color descriptors in the multidimensional feature space. Color histogram metrics can be evaluated according to their retrieval effectiveness and their computational complexity. Retrieval effectiveness indicates how well the color histogram metric captures the subjective, perceptual image dissimilarity by measuring the effectiveness in retrieving images that are perceptually similar to query images. Table 11.1 summarizes eight different metrics for measuring the dissimilarity of color histogram descriptors.
11.3.1 Minkowski-Form Metrics
The first category of metrics for color histogram descriptors is based on the Minkowski-form metric. Let h_q and h_t be the query and target color histograms, respectively. Then the Minkowski-form distance of order r is

d_{q,t} = ( Σ_{m=0}^{M−1} |h_q[m] − h_t[m]|^r )^{1/r}.   (11.9)
A Minkowski metric compares the proportion of a specific color within image q to the proportion of the same color within image t, but not to the proportions of other similar colors.
Table 11.1 Summary of the Eight Color Histogram Descriptor Metrics (D1–D8)

  D1   Histogram L1 distance             Minkowski-form (r = 1)
  D2   Histogram L2 distance             Minkowski-form (r = 2)
  D3   Binary set Hamming distance       Binary Minkowski-form (r = 1)
  D4   Histogram quadratic distance      Quadratic-form
  D5   Binary set quadratic distance     Binary quadratic-form
  D6   Histogram Mahalanobis distance    Quadratic-form
  D7   Histogram mean distance           First moment
  D8   Histogram moment distance         Higher moments
Figure 11.4 The Minkowski-form metrics compare only the corresponding-color bins between the color histograms. As a result, they are prone to false dismissals when images have colors that are similar but not identical.
Thus, a Minkowski distance between a dark red image and a lighter red image is measured to be the same as the distance between the same dark red image and a perceptually more different blue image.
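A sketch of the Minkowski-form metric of Eq. (11.9) for r = 1 (the histogram L1 distance, D1) and r = 2 (the histogram L2 distance, D2); illustrative only.

```python
import numpy as np

def minkowski_distance(hq: np.ndarray, ht: np.ndarray, r: int = 1) -> float:
    """d(q, t) = (sum_m |h_q[m] - h_t[m]|^r)^(1/r); r = 1 gives D1, r = 2 gives D2."""
    diff = np.abs(hq.astype(float) - ht.astype(float))
    return float(np.sum(diff ** r) ** (1.0 / r))

hq = np.array([0.5, 0.5, 0.0]); ht = np.array([0.5, 0.0, 0.5])
print(minkowski_distance(hq, ht, r=1))   # 1.0
print(minkowski_distance(hq, ht, r=2))   # about 0.707
```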
The color histogram intersection was investigated for color image retrieval by Swain and Ballard in [3]. Their objective was to find known objects within images using color histograms. When the object (q) size is less than the image (t) size and the color histograms are not normalized, |h_q| is less than or equal to |h_t| (where |h| denotes the sum of the histogram-cell values, that is, |h| = Σ_m h[m]). The intersection of the histograms h_q and h_t is then given by

d_{q,t} = 1 − [ Σ_{m=0}^{M−1} min(h_q[m], h_t[m]) ] / |h_q|.   (11.10)
As defined, Eq. (11.10) is not a distance metric because it is not symmetric: d_{q,t} ≠ d_{t,q}. However, Eq. (11.10) can be modified to produce a metric by making it symmetric in h_q and h_t as follows:

d_{q,t} = 1 − [ Σ_{m=0}^{M−1} min(h_q[m], h_t[m]) ] / min(|h_q|, |h_t|).   (11.11)
Alternatively, when the color histograms are normalized so that |h_q| = |h_t|, both Eq. (11.10) and Eq. (11.11) are metrics. It is shown in [33] that, when |h_q| = |h_t|, the color histogram intersection is given by

d_{q,t} = 1/(2|h_q|) Σ_{m=0}^{M−1} |h_q[m] − h_t[m]|,   (11.12)

which is proportional to the histogram L1 distance, also known as the “walk” or “city block” distance.
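A sketch of the histogram intersection, Eq. (11.10), and its symmetric variant, Eq. (11.11), as reconstructed above; the choice of normalizing by |h_q| versus min(|h_q|, |h_t|) follows that reconstruction and should be treated as an assumption.

```python
import numpy as np

def intersection_distance(hq: np.ndarray, ht: np.ndarray) -> float:
    """Eq. (11.10): 1 - sum_m min(h_q[m], h_t[m]) / |h_q| (not symmetric)."""
    return 1.0 - np.minimum(hq, ht).sum() / hq.sum()

def symmetric_intersection_distance(hq: np.ndarray, ht: np.ndarray) -> float:
    """Eq. (11.11): 1 - sum_m min(h_q[m], h_t[m]) / min(|h_q|, |h_t|)."""
    return 1.0 - np.minimum(hq, ht).sum() / min(hq.sum(), ht.sum())

hq = np.array([3, 1, 0]); ht = np.array([2, 2, 4])
print(intersection_distance(hq, ht))            # 1 - 3/4 = 0.25
print(symmetric_intersection_distance(hq, ht))  # 1 - 3/4 = 0.25
```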
The histogram L1 distance (D1) and the histogram L2 distance (D2) between two color histograms h_q and h_t are Minkowski-form metrics, Eq. (11.9), with r = 1 and r = 2, respectively.
The approximation of color histograms using binary sets was investigated by Smith [25]. Binary sets capture the colors whose frequency of occurrence within the image exceeds a predefined threshold T. As a result, binary sets indicate the presence of each color but do not indicate an accurate degree of presence. More formally, a binary set s is an M-dimensional binary vector whose i-th entry is equal to 1 if the i-th entry of the color histogram h exceeds T, and equal to zero otherwise.
The binary set Hamming distance (D3) between s_q and s_t is given by

D3(q, t) = |s_q − s_t| / ( |s_q| |s_t| ),   (11.16)
where, again, |·| denotes the sum of the elements of the vector. As the vectors s_q and s_t are binary, the Hamming distance can be determined from the bit differences between the binary vectors. Therefore, D3 can be efficiently computed using an exclusive OR operator (⊕), which sets a one in each bit position where its operands have different bit values and a zero where they are the same, as follows:

D3(q, t) = |s_q ⊕ s_t| / ( |s_q| |s_t| ).   (11.17)
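A sketch of the binary set construction and of the Hamming distance of Eqs. (11.16) and (11.17), computed with an element-wise XOR; the threshold T and the helper names are illustrative.

```python
import numpy as np

def binary_set(h: np.ndarray, T: float = 0.0) -> np.ndarray:
    """s[i] = 1 if h[i] > T, else 0 (presence of a color, not its degree)."""
    return (h > T).astype(np.uint8)

def hamming_distance(sq: np.ndarray, st: np.ndarray) -> float:
    """Eqs. (11.16)/(11.17): |s_q XOR s_t| / (|s_q| * |s_t|)."""
    return float(np.bitwise_xor(sq, st).sum()) / (sq.sum() * st.sum())

sq = binary_set(np.array([5, 0, 2, 0]))
st = binary_set(np.array([0, 3, 2, 0]))
print(hamming_distance(sq, st))   # 2 differing bits / (2 * 2) = 0.5
```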
11.3.2 Quadratic-Form Metrics

The QBIC system developed a quadratic-form metric for color histogram-based image retrieval [2]. Reference [34] reports that the quadratic-form metric between color histograms provides more desirable results than “like-color”-only comparisons. The quadratic-form distance between color histograms h_q and h_t is given by

D4(q, t) = (h_q − h_t)^T A (h_q − h_t),   (11.18)

where A = [a_ij] is a similarity matrix in which a_ij denotes the similarity of the colors associated with bins i and j.
Figure 11.5 Quadratic-form metrics compare multiple bins between the color histograms using a similarity matrix A = [a_ij], which can take into account color similarity or color covariance.
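A sketch of the quadratic-form distance of Eq. (11.18) as reconstructed above. The similarity matrix A = [a_ij] is application-dependent; the identity-plus-neighbors matrix used in the example is purely illustrative.

```python
import numpy as np

def quadratic_form_distance(hq: np.ndarray, ht: np.ndarray, A: np.ndarray) -> float:
    """D4(q, t) = (h_q - h_t)^T A (h_q - h_t), where A[i, j] encodes the
    similarity between bins i and j (cross-bin comparison)."""
    d = (hq - ht).astype(float)
    return float(d @ A @ d)

M = 4
A = np.eye(M) + 0.5 * (np.eye(M, k=1) + np.eye(M, k=-1))  # neighboring bins partly similar
hq = np.array([1.0, 0.0, 0.0, 0.0])
ht = np.array([0.0, 1.0, 0.0, 0.0])   # mass moved to an adjacent (similar) bin
print(quadratic_form_distance(hq, ht, A))                              # 1.0 (cross-term reduces the distance)
print(quadratic_form_distance(hq, np.array([0.0, 0.0, 0.0, 1.0]), A))  # 2.0 (distant bin)
```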