… of key word or text-based annotations to completely, consistently, and objectively describe the content of images. Although perceptual features such as color distributions and color layout often provide a poor characterization of the actual semantic content of the images, content-based query appears to be effective for indexing and rapidly accessing images based on the similarity of visual features.
11.1.1 Content-Based Query Systems
The seminal work on content-based query of image databases was carried out in the IBM query by image content (QBIC) project [2,8]. The QBIC project explored methods for searching for images based on the similarity of global image features of color, texture, and shape. The QBIC project also developed a novel method of prefiltering of queries that greatly reduces the number of target images searched in similarity queries [9]. The MIT Photobook project extended some of the early methods of content-based query by developing descriptors that provide effective matching as well as the ability to reconstruct the images and their features from the descriptors [5]. Smith and Chang developed a fully automated content-based query system called VisualSEEk, which further extended content-based querying of image databases by extracting regions and allowing searching based on their spatial layout [10]. Other content-based image database systems
such as WebSEEk [11] and ImageRover [12] have focused on indexing and searching of images on the World Wide Web. More recently, the MPEG-7 “Multimedia Content Description Interface” standard provides standardized descriptors for color, texture, shape, motion, and other features of audiovisual data to enable fast and effective content-based searching [13].
11.1.2 Content-Based Query-by-Color
The objective of content-based query-by-color is to return the images whose color features are most similar to the color features of a query image. Swain and Ballard investigated the use of color histogram descriptors for searching for color objects contained within the target images [3]. Stricker and Orengo developed color moment descriptors for fast similarity searching of large image databases [14]. Later, Stricker and Dimai developed a system for indexing of color images based on the color moments of different regions [15]. In the spatial and feature (SaFe) project, Smith and Chang designed a 166-bin color descriptor in HSV color space and developed methods for graphically constructing content-based queries that depict the spatial layout of color regions [7]. Each of these approaches to content-based query-by-color involves the design of color descriptors, including the selection of the color feature space and a distance metric for measuring the similarity of the color features.
11.1.3 Outline
This chapter investigates methods for content-based query of image databases based on the color features of images. In particular, the chapter focuses on the design and extraction of color descriptors and on the methods for matching them. The chapter is organized as follows. Section 11.2 analyzes the three main aspects of color feature extraction, namely, the choice of a color space, the selection of a quantizer, and the computation of color descriptors. Section 11.3 defines and discusses several similarity measures, and Section 11.4 evaluates their usefulness in content-based image-query tasks. Concluding remarks and comments on future directions are given in Section 11.5.
11.2 COLOR DESCRIPTOR EXTRACTION
Color is an important dimension of human visual perception that allows discrimination and recognition of visual information. Correspondingly, color features have been found to be effective for indexing and searching of color images in image databases. Generally, color descriptors are relatively easily extracted and matched and are therefore well suited for content-based query. Typically, the specification of a color descriptor1 requires fixing a color space and determining its partitioning.

1 In this chapter we use the term “feature” to mean a perceptual characteristic of images that signifies something to human observers, whereas “descriptor” means a numeric quantity that describes a feature.
Images can be indexed by mapping their pixels into the quantized color space and computing a color descriptor. Color descriptors such as color histograms can be extracted from images in different ways. For example, in some cases it is important to capture the global color distribution of an image. In other cases, it is important to capture the spatially localized apportionment of the colors to different regions. In either case, because the descriptors are ultimately represented as points in a multidimensional space, it is necessary to carefully define the metrics for determining descriptor similarity.
The design space for color descriptors, which involves specification of the color space, its partitioning, and the similarity metric, is therefore quite large. There are a few evaluation points that can be used to guide the design. The determination of the color space and its partitioning can be done using color experiments that perceptually gauge the intra- and interpartition distribution of colors. The determination of the color descriptors can be made using retrieval-effectiveness experiments in which the content-based query-by-color results are compared to known ground-truth results for benchmark queries. The image database system can be designed to allow the user to select from different descriptors based on the query at hand. Alternatively, the image database system can use relevance feedback to automatically weight the descriptors or select metrics based on user feedback [16].
11.2.1 Color Space

Color is perceived through three independent color receptors that have peak responses at approximately the red (r), green (g), and blue (b) wavelengths: λ_r = 700 nm, λ_g = 546.1 nm, and λ_b = 435.8 nm, respectively. By assigning to each primary color receptor a response function c_k(λ), where k ∈ {r, g, b}, the linear superposition of the c_k(λ)'s represents visible light F(λ) of any color or wavelength λ [17]. By normalizing the c_k(λ)'s to the reference white light W(λ) such that

W(λ) = c_r(λ) + c_g(λ) + c_b(λ),   (11.1)

the colored light F(λ) produces the tristimulus responses (R, G, B) such that

F(λ) = R c_r(λ) + G c_g(λ) + B c_b(λ).   (11.2)
As such, any color can be represented by a linear combination of the three primary colors (R, G, B). The space spanned by the R, G, and B values completely describes visible colors, which are represented as vectors in the 3D RGB color space. As a result, the RGB color space provides a useful starting point for representing color features of images. However, the RGB color space is not perceptually uniform. More specifically, equal distances in different areas and along different dimensions of the 3D RGB color space do not correspond to equal perception of color dissimilarity. The lack of perceptual uniformity results in the need to develop more complex vector quantization to satisfactorily partition the RGB color space to form the color descriptors. Alternative color spaces can be generated by transforming the RGB color space. However, as yet, no consensus has been reached regarding the optimality of different color spaces for content-based query-by-color. The problem originates from the lack of any known single perceptually uniform color space [18]. As a result, a large number of color spaces have been used in practice for content-based query-by-color.
In general, the RGB colors, represented by vectors v_c, can be mapped to different color spaces by means of a color transformation T_c. The notation w_c indicates the transformed colors. The simplest color transformations are linear. For example, linear transformations of the RGB color space produce a number of important color spaces that include YIQ (the NTSC composite color TV standard), YUV (the PAL and SECAM color television standards), YCrCb (the JPEG digital image coding standard and the MPEG digital video coding standard), and the opponent color space OPP [19]. Equation (11.3) gives matrices that transform an RGB vector into these color spaces. The YIQ, YUV, and YCrCb linear color transforms have been adopted in color picture coding systems. These linear transforms, each of which generates one luminance channel and two chrominance channels, were designed specifically to accommodate targeted display devices: YIQ for NTSC color television, YUV for PAL and SECAM color television, and YCrCb for color computer displays. Because none of these color spaces is uniform, color distance does not correspond well to perceptual color dissimilarity.
The opponent color space (OPP) was developed based on evidence that human color vision uses an opponent-color model by which the responses of the R, G, and B cones are combined into two opponent color pathways [20]. One benefit of the OPP color space is that it is obtained easily by a linear transform. The disadvantages are that it is neither uniform nor natural. The color distance in OPP color space does not provide a robust measure of color dissimilarity. One component of OPP, the luminance channel, indicates brightness. The two chrominance channels correspond to blue versus yellow and red versus green.
T_c^{YIQ} = | 0.299   0.587   0.114 |
            | 0.596  −0.274  −0.322 |
            | 0.211  −0.523   0.312 |

T_c^{OPP} = |  0.333   0.333   0.333 |
            | −0.500  −0.500   1.000 |
            |  0.500  −1.000   0.500 |        (11.3)
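To make the notation concrete, a linear color transform is simply a 3 × 3 matrix applied to each RGB pixel. The following minimal Python/NumPy sketch (the function name and the example values are illustrative, not taken from the chapter) applies the YIQ and opponent-color matrices of Eq. (11.3):

```python
import numpy as np

# Transform matrices from Eq. (11.3); each row produces one output channel.
T_YIQ = np.array([[0.299,  0.587,  0.114],
                  [0.596, -0.274, -0.322],
                  [0.211, -0.523,  0.312]])

T_OPP = np.array([[ 0.333,  0.333,  0.333],   # luminance
                  [-0.500, -0.500,  1.000],   # blue vs. yellow
                  [ 0.500, -1.000,  0.500]])  # red vs. green

def transform_colors(rgb_image: np.ndarray, T: np.ndarray) -> np.ndarray:
    """Apply a 3x3 linear color transform to an H x W x 3 RGB image.

    Each output pixel is w_c = T @ v_c, where v_c is the RGB column vector.
    """
    return rgb_image @ T.T  # matrix product along the last (color) axis

# Example: a mid-gray pixel maps to zero chrominance in both spaces.
gray = np.array([[[0.5, 0.5, 0.5]]])
print(transform_colors(gray, T_YIQ))   # [[[0.5, 0.0, 0.0]]] (approximately)
print(transform_colors(gray, T_OPP))   # [[[0.5, 0.0, 0.0]]] (approximately)
```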
Although these linear color transforms are the simplest, they do not generate natural or uniform color spaces. The Munsell color order system was designed to be natural, compact, and complete. The Munsell color order system organizes the colors according to natural attributes [21]. Munsell's Book of Color [22] contains 1,200 samples of color chips, each specified by its hue, value, and chroma. The chips are spatially arranged (in three dimensions) so that the steps between neighboring chips are perceptually equal.
The advantage of the Munsell color order system results from its ordering of a finite set of colors by perceptual similarity over an intuitive three-dimensional space. The disadvantage is that the color order system does not indicate how to transform or partition the RGB color space to produce the set of color chips. Although one transformation from RGB to Munsell HVC, named the mathematical transform to Munsell (MTM), was investigated for image data by Miyahara [23], there does not exist a simple mapping from color points in RGB color space to Munsell color chips. Although the Munsell space was designed to be compact and complete, it does not satisfy the property of uniformity. Furthermore, the color order system does not provide for the assessment of the similarity of color chips that are not neighbors.
Other color spaces, such as HSV, CIE 1976 (L*a*b*), and CIE 1976 (L*u*v*), are generated by nonlinear transformations of the RGB space. With the goal of deriving uniform color spaces, the CIE2 in 1976 defined the CIE 1976 (L*u*v*) and CIE 1976 (L*a*b*) color spaces [24]. These are generated by a linear transformation from the RGB to the XYZ color space, followed by different nonlinear transformations. The CIE color spaces represent, with equal emphasis, the three characteristics that best characterize color perceptually: hue, lightness, and saturation. However, the CIE color spaces are inconvenient because of the necessary nonlinearity of the transformations to and from the RGB color space.
Although the determination of the optimum color space is an open problem, certain color spaces have been found to be well suited for content-based query-by-color. In Ref. [25], Smith investigated one form of the hue, lightness, and saturation transform from RGB to HSV, given in Ref. [26], for content-based query-by-color. The transform to HSV is nonlinear, but it is easily invertible. The HSV color space is natural and approximately perceptually uniform. Therefore, the quantization of HSV can produce a collection of colors that is also compact and complete. Recognizing the effectiveness of the HSV color space for content-based query-by-color, MPEG-7 has adopted HSV as one of the color spaces for defining color descriptors [27].
2 Commission Internationale de l’Eclairage
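The exact HSV transform of Ref. [26] is not reproduced here; as a rough illustration only, the standard hexcone RGB-to-HSV conversion in Python's colorsys module behaves similarly (per pixel, with all components in [0, 1]):

```python
import colorsys

def rgb_to_hsv_pixel(r: float, g: float, b: float):
    """Convert one RGB pixel (each component in [0, 1]) to (h, s, v).

    h is returned as a fraction of a full turn; multiply by 360 for degrees.
    The transform is nonlinear but easily invertible (colorsys.hsv_to_rgb).
    """
    return colorsys.rgb_to_hsv(r, g, b)

h, s, v = rgb_to_hsv_pixel(1.0, 0.0, 0.0)   # pure red
print(h * 360, s, v)                         # 0.0 1.0 1.0
```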
11.2.2 Color Quantization
By far, the most common category of color descriptors is that of color histograms. Color histograms capture the distribution of colors within an image or an image region. When dealing with observations from distributions that are continuous or that can take a large number of possible values, a histogram is constructed by associating each bin with a set of observation values. Each bin of the histogram then contains the number of observations (i.e., the number of image pixels) that belong to the associated set. Color belongs to this category of random variables: for example, the color space of 24-bit images contains 2^24 distinct colors. Therefore, the partitioning of the color space is an important step in constructing color histogram descriptors.
As color spaces are multidimensional, they can be partitioned by multidimensional scalar quantization (i.e., by quantizing each dimension separately) or by vector quantization methods. By definition, a vector quantizer Q_c of dimension k and size M is a mapping from a vector in k-dimensional space into a finite set C that contains M outputs [28]. Thus, a vector quantizer is defined as the mapping Q_c: R^k → C, where C = {y_0, y_1, ..., y_{M−1}} and each y_m is a vector in the k-dimensional Euclidean space R^k. The set C is customarily called a codebook, and its elements are called code words. In the case of vector quantization of the color space, k = 3 and each code word y_m is an actual color point. Therefore, the codebook C represents a gamut or collection of colors.

The quantizer partitions the color space R^k into M disjoint sets R_m, one per code word, that completely cover it:

R_m ∩ R_n = ∅ for m ≠ n,   and   ∪_{m=0}^{M−1} R_m = R^k.   (11.4)
All the transformed color points w_c belonging to the same partition R_m are quantized to (i.e., represented by) the same code word y_m:

Q_c(w_c) = y_m   if w_c ∈ R_m.   (11.5)
In the case of the HSV color space, the partitioning can exploit the cylindrical structure of the space: the long axis represents the value, or brightness; the distance from the axis is the saturation, which indicates the amount of presence of a color; and the angle around the axis is the hue, indicating tint or tone. As the hue represents the most perceptually significant characteristic of color, it requires the finest quantization. As shown in Figure 11.1, the primaries red, green, and blue are separated by 120 degrees in the hue circle. A circular quantization at 20-degree steps separates the hues so that the three primaries and yellow, magenta, and cyan are each represented with three subdivisions. The other color dimensions are quantized more coarsely because the human visual system responds to them with less discrimination; we use three levels each for value and saturation. This quantization, Q_c^166, provides M = 166 distinct colors in the HSV color space, derived from 18 hues (H) × 3 saturations (S) × 3 values (V) + 4 grays [29].

Figure 11.1 The transformation T_c^{HSV} from RGB to HSV and the quantization Q_c^{166} give 166 HSV colors = 18 hues × 3 saturations × 3 values + 4 grays. A color version of this figure can be downloaded from ftp://wiley.com/public/sci_tech_med/image_databases.
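A minimal sketch of such a quantizer is shown below. The chapter does not give the exact thresholds separating the gray bins from the chromatic bins, so the cutoffs and the bin ordering used here are assumptions made for illustration; only the overall structure of 18 × 3 × 3 + 4 = 166 bins follows the text.

```python
import numpy as np

def quantize_hsv_166(h_deg: float, s: float, v: float) -> int:
    """Map an HSV color to one of 166 bins: 18 hues x 3 saturations x 3 values + 4 grays.

    h_deg is the hue in degrees [0, 360); s and v are in [0, 1].
    The gray threshold and the uniform saturation/value partitions are
    assumptions made for this sketch, not values taken from the chapter.
    """
    if s < 0.05 or v < 0.05:                 # (assumed) achromatic region -> 4 gray bins
        return 162 + min(int(v * 4), 3)      # bins 162..165
    hue_bin = int(h_deg / 20.0) % 18         # 18 hues at 20-degree steps
    sat_bin = min(int(s * 3), 2)             # 3 saturation levels
    val_bin = min(int(v * 3), 2)             # 3 value levels
    return hue_bin * 9 + sat_bin * 3 + val_bin   # chromatic bins 0..161

# Example: a saturated pure red maps to one of the chromatic bins.
print(quantize_hsv_166(0.0, 1.0, 1.0))       # 8 (hue bin 0, highest s and v)
```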
11.2.3 Color Descriptors
A color descriptor is a numeric quantity that describes a color feature of an image. As with texture and shape, it is possible to extract color descriptors from the image as a whole, producing a global characterization, or separately from different regions, producing a local characterization. Global descriptors capture the color content of the entire image but carry no information on the spatial layout, whereas local descriptors can be used in conjunction with the position and size of the corresponding regions to describe the spatial structure of the image color.
Most color descriptors are color histograms or quantities derived from them. As previously mentioned, mapping the image to an appropriate color space, quantizing the mapped image, and counting how many times each quantized color occurs produce a color histogram. Formally, if I denotes an image of size W × H, I_q(i, j) is the color of the quantized pixel at position (i, j), and y_m is the mth code word of the vector quantizer, the color histogram h_c has entries defined by

h_c[m] = Σ_{i=0}^{W−1} Σ_{j=0}^{H−1} δ(I_q(i, j), y_m),   (11.6)

where the Kronecker delta function, δ(·, ·), is equal to 1 if its two arguments are equal, and zero otherwise.
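As an illustration of Eq. (11.6), once each pixel has been replaced by its code-word index, the histogram entry h_c[m] is just a count of the pixels falling in bin m. The NumPy helper below is a sketch; the function name is not from the chapter.

```python
import numpy as np

def color_histogram(quantized: np.ndarray, num_bins: int = 166) -> np.ndarray:
    """Compute h_c[m] = number of pixels whose quantized color index equals m.

    `quantized` is a W x H array of integer code-word indices; counting per bin
    realizes the Kronecker-delta sum of Eq. (11.6).
    """
    return np.bincount(quantized.ravel(), minlength=num_bins)

# Example: a tiny 2 x 2 "image" with three pixels in bin 8 and one in bin 165.
q = np.array([[8, 8], [8, 165]])
h = color_histogram(q)
print(h[8], h[165], h.sum())   # 3 1 4
```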
The histogram computed using Eq. (11.6) does not define a distribution because the sum of its entries is not equal to 1 but to the total number of pixels of the image. This definition is not conducive to comparing color histograms of images having different sizes. To allow matching, the following class of normalizations can be used:

h_c^{(r)} = h_c / ||h_c||_r,   where ||h_c||_r = ( Σ_{m=0}^{M−1} h_c[m]^r )^{1/r}.   (11.7)

With r = 1, the normalized histogram defines a distribution whose entries sum to one. Histograms normalized with r = 2 are unit vectors in the M-dimensional Euclidean space, namely, they lie on the surface of the unit sphere. The similarity between two such histograms can be represented, for example, by the angle between the corresponding vectors, captured by their inner product.
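The following sketch illustrates the normalization of Eq. (11.7) and the inner-product (cosine) similarity between two L2-normalized histograms; the helper names are illustrative.

```python
import numpy as np

def normalize(h: np.ndarray, r: int = 1) -> np.ndarray:
    """Normalize a histogram by its L_r norm (r = 1 gives a distribution,
    r = 2 gives a unit vector on the M-dimensional unit sphere)."""
    norm = np.sum(np.abs(h.astype(float)) ** r) ** (1.0 / r)
    return h / norm if norm > 0 else h.astype(float)

def cosine_similarity(hq: np.ndarray, ht: np.ndarray) -> float:
    """Inner product of the L2-normalized histograms: cosine of the angle between them."""
    return float(np.dot(normalize(hq, r=2), normalize(ht, r=2)))

hq = np.array([4, 0, 0, 0]); ht = np.array([2, 2, 0, 0])
print(cosine_similarity(hq, ht))   # about 0.707
```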
A drawback of describing the color content of an image globally is that it does not take into account the spatial distribution of color across different areas of the image. A number of methods have been developed for integrating color and spatial information for content-based query. Stricker and Dimai developed a method for partitioning each image into five nonoverlapping spatial regions [15]. By extracting color descriptors from each of the regions, the matching can optionally emphasize some regions or can accommodate the matching of rotated or flipped images. Similarly, Hsu and coworkers developed a method for extracting color descriptors from local regions by imposing a spatial grid on the images [30]. Jacobs and coworkers developed a method for extracting color descriptors from wavelet-transformed images, which allows fast matching of the images based on the location of color [31]. Figure 11.2 illustrates examples of extracting localized color descriptors in ways similar to those explored in [15] and [30], respectively. The basic approach involves partitioning the image into multiple regions and extracting a color descriptor for each region. Corresponding region-based color descriptors are then compared in order to assess the similarity of two images.
Figure 11.2a shows a partitioning of the image into five regions, r0–r4, in which a single center region, r0, captures the color features of any center object. Figure 11.2b shows a partitioning of the image into sixteen uniformly spaced regions, g0–g15. The dissimilarity of images based on the color spatial descriptors can be measured by computing the weighted sum of individual region dissimilarities as follows:

D(q, t) = Σ_{m} w_m d(h_q^m, h_t^m),   (11.8)

where d(·, ·) is a distance between color descriptors, h_q^m is the color descriptor of region m of the query image, h_t^m is the color descriptor of region m of the target image, and w_m is the weight of the m-th distance and satisfies Σ_m w_m = 1.
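A minimal sketch of the weighted region-based dissimilarity of Eq. (11.8), using a uniform grid, per-region histograms, and an L1 distance as the underlying d(·, ·); the grid size and the uniform weights are illustrative assumptions.

```python
import numpy as np

def region_histograms(quantized: np.ndarray, grid: int = 4, num_bins: int = 166):
    """Split a quantized image into grid x grid regions; return one histogram per region."""
    rows = np.array_split(quantized, grid, axis=0)
    return [np.bincount(block.ravel(), minlength=num_bins)
            for row in rows for block in np.array_split(row, grid, axis=1)]

def region_distance(regions_q, regions_t, weights=None) -> float:
    """D(q, t) = sum_m w_m * d(h_q^m, h_t^m), with an L1 distance per region."""
    m = len(regions_q)
    weights = np.full(m, 1.0 / m) if weights is None else np.asarray(weights)
    dists = [np.abs(hq.astype(float) - ht.astype(float)).sum()
             for hq, ht in zip(regions_q, regions_t)]
    return float(np.dot(weights, dists))

# Example with two random 64 x 64 "quantized images".
q_img = np.random.randint(0, 166, size=(64, 64))
t_img = np.random.randint(0, 166, size=(64, 64))
print(region_distance(region_histograms(q_img), region_histograms(t_img)))
```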
Alternatively, Smith and Chang developed a method for matching images based on the extraction of prominent single regions, as shown in Figure 11.3 [32].
Figure 11.2 Two partitionings of an image into regions for extracting localized color descriptors. A color version of this figure can be downloaded from ftp://wiley.com/public/sci_tech_med/image_databases.

Figure 11.3 The integrated spatial and color feature query approach matches the images by comparing the spatial arrangements of regions.
The VisualSEEk content-based query system allows the images to be matched by comparing the color regions on the basis of color, size, and absolute and relative spatial location [10]. In [7], it was reported that for some queries the integrated spatial and color feature query approach improves retrieval effectiveness substantially over content-based query-by-color using global color histograms.
11.3 COLOR DESCRIPTOR METRICS
A color descriptor metric indicates the similarity or, equivalently, the dissimilarity of the color features of images by measuring the distance between color descriptors in the multidimensional feature space. Color histogram metrics can be evaluated according to their retrieval effectiveness and their computational complexity. Retrieval effectiveness indicates how well the color histogram metric captures the subjective, perceptual image dissimilarity by measuring the effectiveness in retrieving images that are perceptually similar to query images. Table 11.1 summarizes eight different metrics for measuring the dissimilarity of color histogram descriptors.
11.3.1 Minkowski-Form Metrics
The first category of metrics for color histogram descriptors is based on the Minkowski-form metric. Let h_q and h_t be the query and target color histograms, respectively. Then the Minkowski-form distance of order r is

d_{q,t} = ( Σ_{m=0}^{M−1} |h_q[m] − h_t[m]|^r )^{1/r}.   (11.9)
A Minkowski metric compares the proportion of a specific color within image q to the proportion of the same color within image t, but not to the proportions of other similar colors.
Table 11.1 Summary of the Eight Color Histogram Descriptor Metrics (D1–D8)

  D1   Histogram L1 distance             Minkowski-form (r = 1)
  D2   Histogram L2 distance             Minkowski-form (r = 2)
  D3   Binary set Hamming distance       Binary Minkowski-form (r = 1)
  D4   Histogram quadratic distance      Quadratic-form
  D5   Binary set quadratic distance     Binary quadratic-form
  D6   Histogram Mahalanobis distance    Quadratic-form
  D7   Histogram mean distance           First moment
  D8   Histogram moment distance         Higher moments
Figure 11.4 The Minkowski-form metrics compare only the corresponding-color bins between the color histograms. As a result, they are prone to false dismissals when images have colors that are similar but not identical.
Thus, a Minkowski distance between a dark red image and a lighter red image is measured to be the same as the distance between the same dark red image and a perceptually more different blue image.
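A sketch of the Minkowski-form metric of Eq. (11.9) for r = 1 (the histogram L1 distance, D1) and r = 2 (the histogram L2 distance, D2); illustrative only.

```python
import numpy as np

def minkowski_distance(hq: np.ndarray, ht: np.ndarray, r: int = 1) -> float:
    """d(q, t) = (sum_m |h_q[m] - h_t[m]|^r)^(1/r); r = 1 gives D1, r = 2 gives D2."""
    diff = np.abs(hq.astype(float) - ht.astype(float))
    return float(np.sum(diff ** r) ** (1.0 / r))

hq = np.array([0.5, 0.5, 0.0]); ht = np.array([0.5, 0.0, 0.5])
print(minkowski_distance(hq, ht, r=1))   # 1.0
print(minkowski_distance(hq, ht, r=2))   # about 0.707
```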
The color histogram intersection was investigated for color image retrieval by Swain and Ballard in [3]. Their objective was to find known objects within images using color histograms. When the object (q) size is less than the image (t) size and the color histograms are not normalized, |h_q| is less than or equal to |h_t| (where |h| denotes the sum of the histogram-cell values, that is, |h| = Σ_m h[m]). The intersection of the histograms h_q and h_t is then given by

d_{q,t} = 1 − [ Σ_{m=0}^{M−1} min(h_q[m], h_t[m]) ] / |h_q|.   (11.10)
As defined, Eq. (11.10) is not a distance metric because it is not symmetric: d_{q,t} ≠ d_{t,q}. However, Eq. (11.10) can be modified to produce a metric by making it symmetric in h_q and h_t as follows:

d_{q,t} = 1 − [ Σ_{m=0}^{M−1} min(h_q[m], h_t[m]) ] / min(|h_q|, |h_t|).   (11.11)
Alternatively, when the color histograms are normalized so that |h_q| = |h_t|, both Eq. (11.10) and Eq. (11.11) are metrics. It is shown in [33] that, when |h_q| = |h_t|, the color histogram intersection is given by

d_{q,t} = 1/(2|h_q|) Σ_{m=0}^{M−1} |h_q[m] − h_t[m]|,   (11.12)

which is proportional to the histogram L1 distance, also known as the “walk” or “city block” distance.
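A sketch of the histogram intersection, Eq. (11.10), and its symmetric variant, Eq. (11.11), as reconstructed above; the choice of normalizing by |h_q| versus min(|h_q|, |h_t|) follows that reconstruction and should be treated as an assumption.

```python
import numpy as np

def intersection_distance(hq: np.ndarray, ht: np.ndarray) -> float:
    """Eq. (11.10): 1 - sum_m min(h_q[m], h_t[m]) / |h_q| (not symmetric)."""
    return 1.0 - np.minimum(hq, ht).sum() / hq.sum()

def symmetric_intersection_distance(hq: np.ndarray, ht: np.ndarray) -> float:
    """Eq. (11.11): 1 - sum_m min(h_q[m], h_t[m]) / min(|h_q|, |h_t|)."""
    return 1.0 - np.minimum(hq, ht).sum() / min(hq.sum(), ht.sum())

hq = np.array([3, 1, 0]); ht = np.array([2, 2, 4])
print(intersection_distance(hq, ht))            # 1 - 3/4 = 0.25
print(symmetric_intersection_distance(hq, ht))  # 1 - 3/4 = 0.25
```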
The histogram L1 distance (D1) and the histogram L2 distance (D2) between two color histograms h_q and h_t are Minkowski-form metrics, Eq. (11.9), with r = 1 and r = 2, respectively.
The approximation of color histograms using binary sets was investigated by Smith [25]. Binary sets capture the colors whose frequency of occurrence within the image exceeds a predefined threshold T. As a result, binary sets indicate the presence of each color but do not indicate an accurate degree of presence. More formally, a binary set s is an M-dimensional binary vector whose i-th entry is equal to 1 if the i-th entry of the color histogram h exceeds T, and equal to zero otherwise.
The binary set Hamming distance (D3) between s_q and s_t is given by

D3(q, t) = |s_q − s_t| / ( |s_q| |s_t| ),   (11.16)
where, again, |·| denotes the sum of the elements of the vector. As the vectors s_q and s_t are binary, the Hamming distance can be determined from the bit differences between the binary vectors. Therefore, D3 can be efficiently computed using an exclusive OR operator (⊕), which sets a one in each bit position where its operands have different bit values and a zero where they are the same, as follows:

D3(q, t) = |s_q ⊕ s_t| / ( |s_q| |s_t| ).   (11.17)
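A sketch of the binary set construction and of the Hamming distance of Eqs. (11.16) and (11.17), computed with an element-wise XOR; the threshold T and the helper names are illustrative.

```python
import numpy as np

def binary_set(h: np.ndarray, T: float = 0.0) -> np.ndarray:
    """s[i] = 1 if h[i] > T, else 0 (presence of a color, not its degree)."""
    return (h > T).astype(np.uint8)

def hamming_distance(sq: np.ndarray, st: np.ndarray) -> float:
    """Eqs. (11.16)/(11.17): |s_q XOR s_t| / (|s_q| * |s_t|)."""
    return float(np.bitwise_xor(sq, st).sum()) / (sq.sum() * st.sum())

sq = binary_set(np.array([5, 0, 2, 0]))
st = binary_set(np.array([0, 3, 2, 0]))
print(hamming_distance(sq, st))   # 2 differing bits / (2 * 2) = 0.5
```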
11.3.2 Quadratic-Form Metrics

The QBIC system developed a quadratic-form metric for color histogram-based image retrieval [2]. Reference [34] reports that the quadratic-form metric between color histograms provides more desirable results than “like-color”-only comparisons. The quadratic-form distance between color histograms h_q and h_t is given by

D4(q, t) = (h_q − h_t)^T A (h_q − h_t),   (11.18)

where A = [a_ij] is a similarity matrix in which a_ij denotes the similarity of the colors associated with bins i and j.
Figure 11.5 Quadratic-form metrics compare multiple bins between the color histograms using a similarity matrix A = [a_ij], which can take into account color similarity or color covariance.
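A sketch of the quadratic-form distance of Eq. (11.18) as reconstructed above. The similarity matrix A = [a_ij] is application-dependent; the identity-plus-neighbors matrix used in the example is purely illustrative.

```python
import numpy as np

def quadratic_form_distance(hq: np.ndarray, ht: np.ndarray, A: np.ndarray) -> float:
    """D4(q, t) = (h_q - h_t)^T A (h_q - h_t), where A[i, j] encodes the
    similarity between bins i and j (cross-bin comparison)."""
    d = (hq - ht).astype(float)
    return float(d @ A @ d)

M = 4
A = np.eye(M) + 0.5 * (np.eye(M, k=1) + np.eye(M, k=-1))  # neighboring bins partly similar
hq = np.array([1.0, 0.0, 0.0, 0.0])
ht = np.array([0.0, 1.0, 0.0, 0.0])   # mass moved to an adjacent (similar) bin
print(quadratic_form_distance(hq, ht, A))                              # 1.0 (cross-term reduces the distance)
print(quadratic_form_distance(hq, np.array([0.0, 0.0, 0.0, 1.0]), A))  # 2.0 (distant bin)
```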