UNIVERSITY OF ENGINEERING AND TECHNOLOGY
VIETNAM NATIONAL UNIVERSITY, HANOI
NGUYEN TIENDUNG
TRADEMARK IMAGE RETRIEVAL BASED ON
SCALE, ROTATION, TRANSLATION,
INVARIANT FEATURES
MASTER THESIS: INFORMATION TECHNOLOGY
Hanoi - 2014
UNIVERSITY OF ENGINEERING AND TECHNOLOGY
VIETNAM NATIONAL UNIVERSITY, HANOI
NGUYEN TIENDUNG
TRADEMARK IMAGE RETRIEVAL BASED
ON SCALE, ROTATION, TRANSLATION,
INVARIANT FEATURES
Major : Computer Science
Code : 60480101
MASTER THESIS: INFORMATION TECHNOLOGY
Supervised by: Dr. Le Thanh Ha
Hanoi - 2014
Originality Statement

"I hereby declare that this submission is my own work and, to the best of my knowledge, it contains no materials previously published or written by another person, or substantial proportions of material which have been accepted for the award of any other degree or diploma at the University of Engineering and Technology (UET) or any other educational institution, except where due acknowledgement is made in the thesis. I also declare that the intellectual content of this thesis is the product of my own work, except to the extent that assistance from others in the project's design and conception or in style, presentation and linguistic expression is acknowledged."

Signed
Feature vectors extraction
Discrete Fourier Transform (DFT)
Log-polar transform
Measure of similarity
Euclidean distance
Mahalanobis distance
Chord distance
Chapter 4: Proposed method
4.1 Pre-processing
4.2 Visual shape objects extraction
4.3 Scale, rotation, translation invariant features
5.3 Test results for scaling actions
5.4 Test results for rotating actions
5.5 Test results for mirror actions
5.6 Test results for partial copy actions
5.7 Test results for random query trademarks
Visual shape objects extraction
Scale, rotation, translation invariant features extraction
Matching by measure of similarity and retrieval of trademark images
Some trademark image samples
The log-polar transform maps (x, y) into (log(r), θ)
Log-polar transform of rotated and scaled squares: size goes to a shift on the log(r) axis and rotation to a shift on the θ-axis
Contour filter algorithm
Illustration of three stages of the proposed method
Samples of the collected trademark images for testing
Results for exact copy tests
Results for scaling tests
Results for translation and scaling tests
Results for rotation tests
Results for mirror tests
Results for partial copy tests
Results for random tests
ABBREVIATION

DFT: Discrete Fourier Transform
CBIR: Content-Based Image Retrieval
SIFT: Scale-Invariant Feature Transform
Abstract

Trademark registration offices and authorities have been bombarded with requests from enterprises. These authorities face a great deal of difficulty in protecting enterprises' rights, such as copyright, license, or the uniqueness of a logo or trademark, since they rely only on conventional clustering. The urgent and essential need for a sufficient automatic trademark image retrieval system is therefore entirely worth thorough research. In this thesis, a novel trademark image retrieval method is proposed in which the input trademark image is first separated into dominant visual shape images; then a feature vector, which is scale-, rotation-, and translation-invariant, is created for each shape image. Finally, a similarity measure between two trademarks is calculated based on these feature vectors. Given a query trademark image, the retrieval procedure is carried out by taking the five most similar trademark images in a predefined trademark database. Various experiments are conducted to mimic the many types of trademark copying actions, and the experimental results exhibit the robustness of our retrieval method under these trademark copy actions.
Chapter 1: Introduction
From an economic perspective, a trademark is clearly understood as a word, a design, a picture, a complex symbol, or even a combination of such, which is put on a product or stands for a service of a particular company. In [2], four types of popular trademarks are listed in order of visual complexity: word-in-mark (only characters or words in the mark), device-mark (graphical or figurative elements), composite-mark (characters or words and graphical elements) and complex-mark (complex image). Fig. 1 offers some trademark samples.

Fig. 1. Some trademark image samples
Every company or financial organization desires to own a distinctive, meaningful, and descriptive logo which offers both exclusivity and the right to its characteristics. Drawing the attention of consumers to their products or services, and market viability, actually depend not only on designing an intelligent and attractive trademark, but also on preventing consumer confusion.

The world markets have remarkably expanded and grown as the global economic scenario has brought different trade-related practices closer together at the international level. A great number of businesses have been established. As a result, millions of trademarks submitted to trademark offices the world over for registration need to be distinct from existing trademarks, as per the definitions and trade practices of different countries, and this is likely to increase in the years to come. Indeed, the millions of trademarks already registered and the millions of applications filed for trademark registration are aggravating the problem of issuing trademark certificates. The trademark registration authorities have received many trademark protection applications from enterprises, and finding similar trademarks has become a challenge because these authorities still use the traditional, manual way of classification. It is obvious that trademark registration with manual searching is a very arduous task for the officials. It is really hard for them to make sure whether a trademark is duplicated, whether a particular trademark is registered or not, whether a trademark resembles another registered trademark in any way, or whether the copyright or license of a trademark is infringed. Thus, this poses an urgent need for an alternative automatic technology.
In [33], there are different techniques and approaches currently in use for the distinctness check of trademarks. The most popular and appreciated image processing techniques for the trademark distinctness check are content-based image retrieval techniques, which are widely used for that purpose; some other approaches, such as shape- and texture-based similarity finding techniques, are also used. Image processing tools and techniques can be used to solve different problems related to images, text, graphics, color, etc. A trademark can be a combination of text, graphics, image, and colored texture. Based on these, one can divide trademarks into these components for finding the similarity among different trademarks retrieved from the trademark database. Most of the recent techniques used for image retrieval have mainly utilized features like color, texture, and shape. They used existing CBIR (Content-Based Image Retrieval) systems to retrieve images based on visual features like texture, color, and shape. In this technique, extraction of the color feature using the color histogram technique is utilized. The shape feature is also considered because it is an important feature in CBIR applications. Many techniques or approaches have been utilized for image retrieval, some of which are based on improved pattern matching algorithms. Some others take a much broader approach, such as searching just from text files. Some are based on shape and color features, and some have attempted morphological-pattern-based image matching and retrieval using a database. A shape-based technique introduced for logo retrieval reported in a paper is also inadequate to solve the problem amicably.
In this thesis, a novel trademark image retrieval method is proposed in which the input trademark image is first separated into dominant visual shape images; then a feature vector, which is scale-, rotation-, and translation-invariant, is created for each shape image. Finally, a similarity measure between two trademarks is calculated based on these feature vectors. The manuscript entitled "Trademark Image Retrieval Based on Scale, Rotation, Translation, Invariant Features", related to the topic of this thesis, was published in Computing and Communication Technologies, Research, Innovation, and Vision for the Future (RIVF), 2013 IEEE RIVF International Conference on, 10-13 Nov. 2013.
My thesis is organized as follows: Chapter 1 is the introduction. Chapter 2 presents some related works. Chapter 3 illustrates background on some related problems. Chapter 4 presents the proposed method in detail. Chapter 5 describes the installation of Visual Studio 2010 with OpenCV 2.4.2 on Windows 7 for implementing the thesis, and presents the experimental results. Chapter 6 concludes the thesis. Additionally, in the Appendix, the whole of the source code of the thesis is given for reading convenience.
Chapter 2: Related work
In recent years, researchers have proposed a wide range of solutions in a bid to alleviate the workload of the trademark registration offices. Chen, Sun, and Yang [1] suggested two main steps for computing feature vectors. Initially, the object region, extracted from a principal-orientation-rotated image, is equally partitioned into 16 regions. Then an entropy vector is constructed as the feature vector of an image by computing the information entropy of each partitioned region. This automatic shape-based retrieval technique achieves the desired performance: good invariance under rotation, translation, scale, noise, and degree of thickness, as well as human visual perception satisfaction. However, this single-feature retrieval system does not seem to meet multiple aspects of appreciation. To improve this, among others, the single-feature Zernike Moments (ZM) in [4, 10] and invariant moments in [3, 5, 6] are each combined with other features. Experimental results presented in [4] showed that this method has steady performance and good invariance properties under translation, rotation and scale. Moreover, the low noise sensitivity of Zernike moments made this method more insensitive to noise. However, because different users have different understandings of image similarity, the present methods of trademark image retrieval have some shortcomings in aspects such as the retrieval ability for geometrically deformed images, retrieval accuracy, and the consistency between the image and human visual perception. Yet the retrieval using the Zernike moment method in [10] shows it can rapidly retrieve trademarks. A new method is proposed in [3] based on cosine distance and normalized distance measures. The cosine distance metric normalizes all feature vectors to unit length and makes them invariant against relative in-plane scaling transformations of the image content. The normalized distance combines two distance measures, the cosine distance and the Euclidean distance, and shows more accuracy than either method alone. The proposed measures take into account the integration of global features (invariant moments and eccentricity) and local features (entropy histogram and distance histogram). It first indexes the trademark image database (DB) in order to search for trademarks within a narrowed limit and reduce computation time, and then calculates similarities between feature vectors to obtain the total similarity. An alternative solution worth mentioning is that four shape features, global features (invariant moments and eccentricity) and local features (entropy histogram and distance histogram) [16], are exploited by [3].
Recently, [5] combined nine feature vectors of low-order color moments in HSV color space with low-order Hu moments and eccentricity, which are extracted from a gray shape-region image by Rui and Huang's (1998) technique. Gauss normalization is applied to these features, and the weight of every feature can be adjusted flexibly [17]. Good results have been obtained in the experiments, which prove that the multi-feature combination way is better than single-feature ways. [6] employed 10 invariant moments, improved by [20], as shape features of trademark images. These features are input to an ensemble of RBFNNs trained via a minimization of the localized generalization error to recognize the trademark images. In this current system, the trademark images are black-and-white images. The system will be improved to adopt color trademark images in further study.
In [2, 7], the proposed combinations of features are definitely different. It is admitted that each of them performs well. Equidistant partitioning based on concentric circles [14] is used to partition the region, labelled as the first step in [4] and [2]. [4] and [2] differ in the implementation of the second step: [4] calculated its feature vector F composed of the corresponding region ZM, while [2] combined region feature vectors of 200 values with contour features, which are considered the corner-to-centroid triangulations detected by Hong & Jiang's improved SUSAN algorithm [15]. Iwanaga et al. [7] put forward the modified angle-distance pair-wise histogram based on the angle histogram and distance histogram of the trademark object. This system outperforms both moment-based and independent histograms, i.e. angle, distance, and color. Experiments were conducted on registered trademark databases. Impressive results were shown to demonstrate the robustness of the proposed approach. Moreover, it is quite simple to construct the distance-angle pair-wise histogram for a trademark object.
As the state-of-the-art method, [10] integrated ZM with SIFT features. In this approach, the Zernike moments of the retrieved image were first extracted and sorted according to similarity, forming candidate images. Then, the SIFT features were used to match the query image accurately with the candidate images. This method not only keeps the high precision-recall of SIFT features and is superior to the method based on the single Zernike moments feature, but also improves retrieval speed compared to the single SIFT features. This method can be well applied to a trademark image retrieval system; this newly proposed approach enhances the retrieval performance. Tuan N.G. et al. in [27] presented a new method based on the discriminative properties of trademark images for text recognition in trademark images. The experimental results show a significant gain in text recognition accuracy of the proposed method in comparison with traditional text recognition methods. This contribution seems to deal with one part of the recognition of trademark images.
However, those approaches seem to ignore not only partial trademark comparison, but also mirrored trademarks. Furthermore, the approaches have concentrated only on either the original trademark without removing its noise elements, or on a standard database which contains no noise. Additionally, these approaches have taken the trademark images as complete objects to process and are not concerned with the detailed visual shapes in the trademark images; therefore, they cannot detect partial similarity between trademark images. Nonetheless, calculating the distance between two features also plays an extremely important part in measuring the similarity degrees among images. For this reason, each of the mentioned solutions endeavours to propose an appropriate measure to some extent.

To overcome the above-mentioned drawbacks, a novel content-based trademark recognition method is proposed with these four main stages: (i) pre-process, scaling down the trademark images and converting them into binary images; (ii) extract dominant shape objects from the binary images; (iii) apply the RI3RC algorithm to extract rotation-invariant, scale-invariant, and translation-invariant features from the shape objects; and (iv) use the Euclidean distance to measure the similarity of two images and then retrieve the 10 trademark images which are most similar to the query trademark image. The thesis focuses on handling a Vietnamese composite-mark database.
Chapter 3: Background
3.1 Pre-processing
Converting gray scale to binary image

In [31], segmentation involves separating an image into regions (or their contours) corresponding to objects. We usually try to segment regions by identifying common properties; or, similarly, we identify contours by identifying differences between regions. The simplest property that pixels in a region can share is intensity. So, a natural way to segment such regions is through thresholding: the separation of light and dark regions. Thresholding creates binary images from grey-level ones by turning all pixels below some threshold to zero and all pixels above that threshold to one. If g(x, y) is a thresholded version of f(x, y) at some global threshold T,

g(x, y) = 1 if f(x, y) > T, and 0 otherwise.
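As a concrete illustration, the global thresholding rule above takes only a few lines of Python. This is an illustrative sketch (the thesis implementation uses OpenCV in C++; the function name here is hypothetical):

```python
def threshold_image(f, T):
    """Global thresholding: g(x, y) = 1 if f(x, y) > T, else 0.

    `f` is a 2-D list of grey-level values, `T` a global threshold.
    Illustrative sketch only -- real code would use cv2.threshold.
    """
    return [[1 if pixel > T else 0 for pixel in row] for row in f]
```

A pixel brighter than T maps to foreground (1); everything else becomes background (0).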
Problems with thresholding

The major problem with thresholding is that we consider only the intensity, not any relationships between the pixels. There is no guarantee that the pixels identified by the thresholding process are contiguous. We can easily include extraneous pixels that aren't part of the desired region, and we can just as easily miss isolated pixels within the region that do not have the normal intensity of the region. When we use thresholding, we typically have to play with the threshold, sometimes losing too much of the region and sometimes getting too many extraneous background pixels. (Shadows of objects in the image are also a real pain: not just where they fall across another object, but where they mistakenly get included as part of a dark object on a light background.)
We can deal, at least in part, with such uneven illumination by determining thresholds locally. That is, instead of having a single global threshold, we allow the threshold itself to smoothly vary across the image.

Automated methods for finding thresholds

To set a global threshold, or to adapt a local threshold to an area, we usually look at the histogram to see if we can find two or more distinct modes, one for the foreground and one for the background. Recall that a histogram is a probability distribution. If we know the fraction 1/p of the image that the object occupies, we can simply set the threshold T such that c(T) = 1/p, where c is the cumulative histogram. (Or, if we're looking for a dark object on a light background, c(T) = 1 − 1/p.)
Finding peaks and valleys

One extremely simple way to find a suitable threshold is to find each of the modes (local maxima) and then find the valley (minimum) between them. While this method appears simple, there are two main problems with it. First, the histogram may be noisy, causing many spurious local minima and maxima; to get around this, the histogram is usually smoothed before trying to find separate modes. Second, the sum of two separate distributions, each with its own mode, may not produce a distribution with two distinct modes.
Clustering (K-Means Variation)

Another way to look at the problem is that we have two groups of pixels, one with one range of values and one with another. What makes thresholding difficult is that these ranges usually overlap. What we want to do is to minimize the error of classifying a background pixel as a foreground one or vice versa. To do this, we try to minimize the area under the histogram for one region that lies on the other region's side of the threshold. The problem is that we don't have the histograms for each region, only the histogram for the combined regions. Understand that the place of minimum overlap (the place where the misclassified areas of the distributions are equal) is not necessarily where the valley occurs in the combined histogram. This occurs, for example, when one cluster has a wide distribution and the other a narrow one. One way that we can try to do this is to consider the values in the two regions as two clusters. In other words, let μB(T) be the mean of all pixels less than the threshold and μO(T) be the mean of all pixels greater than the threshold. We want to find a threshold such that the following holds:

T = (μB(T) + μO(T)) / 2

This can be done iteratively: starting from an estimate of T, classify each pixel as background or object (according to how close its intensity is to μB(T) and μO(T) respectively); now update the estimates of μB(T) and μO(T) by actually calculating the means of the pixels on each side of the threshold. This process repeats until the algorithm converges. This method works well if the spreads of the distributions are approximately equal, but it does not handle well the case where the distributions have differing variances.
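The iterative scheme just described can be sketched in pure Python. This is an illustrative sketch with hypothetical names, not the thesis code: classify, recompute the two cluster means, and set the new threshold halfway between them until it stops moving.

```python
def kmeans_threshold(pixels, T0, max_iter=100):
    """Iterative (K-means-style) threshold selection: partition pixels
    at the current threshold, recompute the background/object means,
    and move the threshold to their midpoint until convergence."""
    T = T0
    for _ in range(max_iter):
        background = [p for p in pixels if p < T]
        foreground = [p for p in pixels if p >= T]
        if not background or not foreground:
            break  # degenerate split; keep the current threshold
        mu_b = sum(background) / len(background)
        mu_o = sum(foreground) / len(foreground)
        new_T = (mu_b + mu_o) / 2
        if abs(new_T - T) < 1e-9:
            break  # converged
        T = new_T
    return T
```

For a clearly bimodal set of intensities the threshold settles midway between the two cluster means after one or two iterations.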
Clustering (The Otsu Method)

Another way of accomplishing similar results is to set the threshold so as to try to make each cluster as tight as possible, thus minimizing their overlap. Obviously, we can't change the distributions, but we can adjust where we separate them (the threshold). As we adjust the threshold one way, we increase the spread of one and decrease the spread of the other. The goal then is to select the threshold that minimizes the combined spread. We can define the within-class variance as the weighted sum of the variances of each cluster:

σ²_Within(T) = n_B(T) σ²_B(T) + n_O(T) σ²_O(T)    (6)

where
σ²_B(T) is the variance of the pixels in the background (below threshold),
σ²_O(T) is the variance of the pixels in the foreground (above threshold),
and [0, N − 1] is the range of intensity levels.

Computing this within-class variance for each of the two classes for each possible threshold involves a lot of computation, but there's an easier way. If we subtract the within-class variance from the total variance of the combined distribution, we get something called the between-class variance:

σ²_Between(T) = σ² − σ²_Within(T)
             = n_B(T)[μ_B(T) − μ]² + n_O(T)[μ_O(T) − μ]²

where σ² is the combined variance and μ is the combined mean. Notice that the between-class variance is simply the weighted variance of the cluster means themselves around the overall mean. Substituting μ = n_B(T)μ_B(T) + n_O(T)μ_O(T) and simplifying, we get

σ²_Between(T) = n_B(T) n_O(T)[μ_B(T) − μ_O(T)]²

So, for each potential threshold T we:
- separate the pixels into two clusters according to the threshold,
- find the mean of each cluster,
- square the difference between the means,
- multiply by the number of pixels in one cluster times the number in the other.

This depends only on the difference between the means of the two clusters, thus avoiding having to calculate differences between individual intensities and the cluster means. The optimal threshold is the one that maximizes the between-class variance (or, conversely, minimizes the within-class variance).

This still sounds like a lot of work, since we have to do this for each possible threshold, but it turns out that the computations aren't independent as we change from one threshold to another. We can update n_B(T), n_O(T), and the respective cluster means μ_B(T) and μ_O(T) as pixels move from one cluster to the other as T increases. Using simple recurrence relations we can update the between-class variance as we successively test each threshold:

n_B(T + 1) = n_B(T) + n_T,    n_O(T + 1) = n_O(T) − n_T
μ_B(T + 1) = [μ_B(T) n_B(T) + n_T · T] / n_B(T + 1)
μ_O(T + 1) = [μ_O(T) n_O(T) − n_T · T] / n_O(T + 1)

where n_T is the number of pixels with intensity T. This method is called the Otsu method.
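Putting the pieces together, Otsu's criterion can be sketched as a single pass over the histogram that keeps running background counts and sums, maximizing n_B · n_O · (μ_B − μ_O)². An illustrative pure-Python sketch (the thesis uses OpenCV's built-in Otsu thresholding; the function name here is hypothetical):

```python
def otsu_threshold(pixels, levels=256):
    """Otsu's method: pick the threshold T maximizing the between-class
    variance n_B(T) * n_O(T) * (mu_B(T) - mu_O(T))^2, computed
    incrementally from the grey-level histogram."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total_sum = sum(g * hist[g] for g in range(levels))
    total = len(pixels)
    best_T, best_var = 0, -1.0
    n_b, sum_b = 0, 0
    for T in range(levels):
        n_b += hist[T]          # pixels with intensity <= T are background
        sum_b += T * hist[T]
        n_o = total - n_b
        if n_b == 0 or n_o == 0:
            continue            # one class empty: criterion undefined
        mu_b = sum_b / n_b
        mu_o = (total_sum - sum_b) / n_o
        var_between = n_b * n_o * (mu_b - mu_o) ** 2
        if var_between > best_var:
            best_var, best_T = var_between, T
    return best_T
```

On a bimodal sample the returned T separates the two intensity clusters; pixels with intensity above T are classified as foreground.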
Mixture modeling

Whereas the Otsu method separated the two clusters according to the threshold and tried to optimize some statistical measure, mixture modeling assumes that there already exist two distributions and we must find them. Once we know the parameters of the distributions, it's easy to determine the best threshold. Unfortunately, we have six unknown parameters (n_B, μ_B, σ_B, n_O, μ_O, and σ_O), so we need to make some estimates of these quantities. If the two distributions are reasonably well separated (some overlap but not too much), we can choose an arbitrary threshold T and assume that the mean and standard deviation of each group approximate the mean and standard deviation of the two underlying populations. We can then measure how well a mix of the two distributions approximates the overall distribution:

F = Σ_g [h_model(g) − h_image(g)]²

Choosing the optimal threshold thus becomes a matter of finding the one that causes the mixture of the two estimated Gaussian distributions to best approximate the actual histogram (minimizes F). Unfortunately, the solution space is too large to search exhaustively, so most methods use some form of gradient descent. Such gradient descent methods depend heavily on the accuracy of the initial estimate, but the Otsu method or similar clustering methods can usually provide reasonable initial estimates. Mixture modeling also extends to models with more than two underlying distributions (more than two types of regions).
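The fit error F above is straightforward to compute once the six parameters are fixed. A minimal sketch under the assumption of two Gaussian clusters, with hypothetical helper names (a real implementation would wrap this in a gradient-descent loop over the parameters):

```python
import math

def gaussian_hist(n, mu, sigma, levels=256):
    """Model histogram of one cluster: n pixels, Gaussian with mean mu
    and standard deviation sigma (illustrative parametrization)."""
    return [n * math.exp(-((g - mu) ** 2) / (2 * sigma ** 2))
            / (sigma * math.sqrt(2 * math.pi)) for g in range(levels)]

def mixture_fit_error(h_image, params):
    """F = sum_g [h_model(g) - h_image(g)]^2 for a two-Gaussian model
    with parameters (n_B, mu_B, sigma_B, n_O, mu_O, sigma_O)."""
    n_b, mu_b, s_b, n_o, mu_o, s_o = params
    levels = len(h_image)
    model = [b + o for b, o in zip(gaussian_hist(n_b, mu_b, s_b, levels),
                                   gaussian_hist(n_o, mu_o, s_o, levels))]
    return sum((m - h) ** 2 for m, h in zip(model, h_image))
```

The optimizer's job is then simply to drive this error toward zero; the true parameters give a strictly smaller F than perturbed ones.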
Multispectral thresholding

A technique for segmenting images with multiple components (color images, Landsat images, or MRI images with T1, T2, and proton-density bands) works by estimating the optimal threshold in one channel and then segmenting the overall image based on that threshold. Each of these regions is then subdivided independently using properties of the second channel. This is repeated again for the third channel, and so on, running through all channels repeatedly until each region in the image exhibits a distribution indicative of a coherent region (a single mode).
Thresholding along boundaries

If we want our thresholding method to stay fairly true to the boundaries of the object, we can first apply some boundary-finding method (such as edge detection techniques) and then sample the pixels only where the boundary probability is high. Thus, our threshold method based on pixels near boundaries will cause separations of the pixels in ways that tend to preserve the boundaries. Other scattered distributions within the object or the background are of no relevance. However, if the characteristics change along the boundary, we're still in trouble. And, of course, there's still no guarantee that we won't have extraneous pixels or holes.
3.2. Object description

In [33], objects are represented as a collection of pixels in an image. Thus, for purposes of recognition we need to describe the properties of groups of pixels. The description is often just a set of numbers: the object's descriptors. From these, we can compare and recognize objects by simply matching the descriptors of objects in an image against the descriptors of known objects. However, to be useful for recognition, descriptors should have four important properties. First, they should define a complete set; that is, two objects must have the same descriptors if and only if they have the same shape. Secondly, they should be congruent; as such, we should be able to recognize similar objects when they have similar descriptors. Thirdly, it is convenient that they have invariant properties; for example, rotation-invariant descriptors will be useful for recognizing objects whatever their orientation. Other important invariance properties include scale and position, and also invariance to affine and perspective changes; these last two are very important when recognizing objects observed from different viewpoints. In addition to these three properties, the descriptors should be a compact set; namely, a descriptor should represent the essence of an object in an efficient way. That is, it should only contain information about what makes an object unique, or different from the other objects. The quantity of information used to describe this characterization should be less than the information necessary to have a complete description of the object itself. Unfortunately, there is no set of complete and compact descriptors to characterize general objects. Thus, the best recognition performance is obtained by carefully selected properties. As such, the process of recognition is strongly related to each particular application, with a particular type of objects. Here, the characterization of objects is presented by two forms of descriptors. Region and shape descriptors characterize the arrangement of pixels within the area and the arrangement of pixels in the perimeter or boundary, respectively. This region-versus-perimeter kind of representation is common in image analysis. For example, edges can be located by region growing (to label area) or by differentiation (to label perimeter). There are many techniques that can be used to obtain descriptors of an object's boundary.
3.3 Feature vectors extraction
3.3.1 Discrete Fourier Transform (DFT)
In [34], the Fourier Transform decomposes an image into its sine and cosine components. In other words, it transforms an image from its spatial domain to its frequency domain. The idea is that any function may be approximated exactly with a sum of infinitely many sine and cosine functions. The Fourier Transform is a way to do this. Mathematically, the Fourier transform of a two-dimensional image is

F(k, l) = Σ_{i=0}^{N−1} Σ_{j=0}^{N−1} f(i, j) e^{−i2π(ki/N + lj/N)}

Here f is the image value in its spatial domain and F in its frequency domain. The result of the transformation is complex numbers. Displaying this is possible either via a real image and a complex image, or via a magnitude and a phase image. However, throughout the image processing algorithms only the magnitude image is interesting, as this contains all the information we need about the image's geometric structure. Nevertheless, if we intend to make some modifications of the image in these forms and then retransform it, we will need to preserve both.
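The 2-D DFT and its magnitude image can be sketched directly from the summation formula. This is a deliberately naive O(M²N²) illustration in pure Python; practical code (including the thesis implementation) would use an FFT routine such as OpenCV's dft:

```python
import cmath

def dft2(f):
    """Naive 2-D DFT: F(k, l) = sum_{i,j} f(i, j) e^{-i2pi(ki/M + lj/N)}.
    `f` is an M x N 2-D list; returns an M x N grid of complex values."""
    M, N = len(f), len(f[0])
    return [[sum(f[i][j] * cmath.exp(-2j * cmath.pi * (k * i / M + l * j / N))
                 for i in range(M) for j in range(N))
             for l in range(N)] for k in range(M)]

def magnitude(F):
    """Magnitude image |F(k, l)| -- the part that carries the geometric
    structure used by the retrieval features."""
    return [[abs(v) for v in row] for row in F]
```

For a constant image all energy lands in the DC term F(0, 0), as expected.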
3.3.2 Log-polar transform

In [32], for two-dimensional images, the log-polar transform [Schwartz80] is a change from Cartesian to polar coordinates: (x, y) ↔ re^{iθ}, where r = √(x² + y²) and exp(iθ) = exp(i·arctan(y/x)). To separate out the polar coordinates into a (ρ, θ) space that is relative to some center point (x_c, y_c), we take the log so that ρ = log(√((x − x_c)² + (y − y_c)²)) and θ = arctan((y − y_c)/(x − x_c)). For image purposes, when we need to "fit" the interesting stuff into the available image memory, we typically apply a scaling factor m to ρ. Fig. 2 shows a square object on the left and its encoding in log-polar space.
Fig. 2. The log-polar transform maps (x, y) into (log(r), θ)
The log-polar transform can be used to create two-dimensional invariant representations of object views by shifting the transformed image's center of mass to a fixed point in the log-polar plane; see Fig. 3. On the left are three shapes that we want to recognize as "square". The problem is, they look very different: one is much larger than the others, and another is rotated. The log-polar transform appears on the right in Fig. 3. Observe that size differences in the (x, y) plane are converted to shifts along the log(r) axis of the log-polar plane, and that rotation differences are converted to shifts along the θ-axis in the log-polar plane. If we take the transformed center of each transformed square in the log-polar plane and then recenter that point to a certain fixed position, then all the squares will show up identically in the log-polar plane. This yields a type of invariance to two-dimensional rotation and scaling.

Fig. 3. Log-polar transform of rotated and scaled squares: size goes to a shift on the log(r) axis and rotation to a shift on the θ-axis
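The coordinate mapping itself, and the shift behaviour it induces, can be checked with a tiny sketch (illustrative only; OpenCV provides this transform for whole images, and the function name here is hypothetical):

```python
import math

def log_polar(x, y, xc, yc, m=1.0):
    """Map (x, y) to log-polar (rho, theta) about centre (xc, yc):
    rho = m * log(r) with r the distance from the centre, and
    theta the angle. Scaling the input shifts rho by m*log(scale);
    rotating it shifts theta by the rotation angle."""
    dx, dy = x - xc, y - yc
    return m * math.log(math.hypot(dx, dy)), math.atan2(dy, dx)
```

Doubling a point's distance from the centre shifts ρ by exactly log(2) while leaving θ untouched, which is the invariance-by-shift property the retrieval features rely on.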
3.4 Measure of similarity

Given a query image, the distance between its feature vector and the feature vector of every image in the database is computed and compared to report a few of the most similar images. A similarity measurement must be selected to decide how close a vector is to another vector. The problem can be converted to computing the discrepancy between two vectors x, y ∈ R^k. Several distance measurements are presented as follows.

3.4.1 Euclidean distance

δ₁(x, y) = ‖x − y‖₂ = (Σ_{i=1}^{k} (x_i − y_i)²)^{1/2}

Another distance measurement, called the supremum norm, is computed by

δ₂(x, y) = max_{1≤i≤k} |x_i − y_i|
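Both of these distances take one line each in Python; an illustrative sketch with hypothetical names:

```python
def euclidean(x, y):
    """Euclidean distance ||x - y||_2 between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5

def sup_norm(x, y):
    """Supremum-norm distance: max_i |x_i - y_i|."""
    return max(abs(a - b) for a, b in zip(x, y))
```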
3.4.2 Mahalanobis distance

The Mahalanobis distance between two vectors x and y with respect to the training patterns {x_i} is computed by

δ²(x, y) = (x − y)ᵀ S⁻¹ (x − y),

where the mean vector μ and the sample covariance matrix S from the sample {x_i | 1 ≤ i ≤ n} of size n are computed by

μ = (1/n) Σ_{i=1}^{n} x_i,    S = (1/n) Σ_{i=1}^{n} (x_i − μ)(x_i − μ)ᵀ.
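A minimal sketch of these two formulas, assuming the inverse covariance S⁻¹ is precomputed and passed in (names are hypothetical; a real implementation would use a linear-algebra library for the inversion):

```python
def sample_covariance(samples):
    """S = (1/n) * sum_i (x_i - mu)(x_i - mu)^T over training patterns."""
    n, k = len(samples), len(samples[0])
    mu = [sum(s[j] for s in samples) / n for j in range(k)]
    return [[sum((s[i] - mu[i]) * (s[j] - mu[j]) for s in samples) / n
             for j in range(k)] for i in range(k)]

def mahalanobis_sq(x, y, S_inv):
    """Squared Mahalanobis distance (x - y)^T S^{-1} (x - y)."""
    d = [a - b for a, b in zip(x, y)]
    Sd = [sum(S_inv[i][j] * d[j] for j in range(len(d))) for i in range(len(d))]
    return sum(di * si for di, si in zip(d, Sd))
```

With S⁻¹ the identity, the squared Mahalanobis distance reduces to the squared Euclidean distance, which is a useful sanity check.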
3.43 Chord đistance
The chord distance between two vectors x and y measures the distance between the projections of x and y onto the unit sphere, which can be computed by
δ₃(x, y) = ‖x̄ − ȳ‖₂, where x̄ = x/‖x‖₂, ȳ = y/‖y‖₂.
A simple computation leads to δ₃(x, y) = 2 sin(α/2), with α being the angle between the vectors x and y.
A similar measurement based on the angle between vectors x and y is defined as
δ₄(x, y) = 1 − |cos(α)|, cos(α) = ⟨x, y⟩ / (‖x‖₂ ‖y‖₂).
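To make these measurements concrete, here is a brief numpy sketch (the function names are illustrative, not from the thesis); it also checks the stated identity δ₃(x, y) = 2 sin(α/2):

```python
import numpy as np

def d1(x, y):
    """Manhattan distance, eq. (21)."""
    return float(np.abs(x - y).sum())

def d_sup(x, y):
    """Supremum norm, eq. (22)."""
    return float(np.abs(x - y).max())

def chord(x, y):
    """Chord distance: Euclidean distance between unit-sphere projections."""
    return float(np.linalg.norm(x / np.linalg.norm(x) - y / np.linalg.norm(y)))

x = np.array([3.0, 0.0])
y = np.array([3.0, 3.0])          # 45 degrees away from x
alpha = np.arccos(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))
# chord(x, y) coincides with 2*sin(alpha/2) regardless of vector lengths
```

Note that the chord distance depends only on the angle between the vectors, which makes it insensitive to the overall magnitude of the feature vectors.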
Chapter 4: Proposed method
4.1 Pre-processing
In this initial stage, images are scaled down so that their smaller side is 300 pixels and converted to gray scale. The images are then converted into binary trademark images using Otsu's method [11], which minimizes the weighted within-class variance, or equivalently maximizes the between-class variance. Otsu's algorithm is one of the five automated thresholding methods, alongside finding peaks and valleys, clustering (a K-means variation), mixture modeling, and multispectral thresholding.
4.2 Visual shape objects extraction
In this stage, shape objects in the form of connected contours are retrieved from the binary image using Suzuki's algorithm [12]. Each detected contour is stored as a vector of points. All of the vectors are organized into a hierarchical structure, which contains information about the image topology. The number of contours depends on the texture of the image.
There are four options proposed in [12] for contour retrieval: (i) retrieve only the extreme outer contours; (ii) retrieve all of the contours without establishing any hierarchical relationships; (iii) retrieve all of the contours organized into a two-level hierarchy; and (iv) retrieve all of the contours and reconstruct a full hierarchy of nested contours.
In the present research, we adopted the second option for shape object extraction. From a binary trademark image, we extracted a number of shape object images. However, due to noise in the input trademark image, many noise contours were extracted as shape objects. To prevent this problem, we applied a filter to remove the noise contours. Observations showed that the dominant shape contours usually have a much larger area than the noise contours. Furthermore, due to the characteristics of the trademarks in our database, most trademarks consist of one or two dominant shape objects which play a primary role in a company's reputation. For this reason, we propose an algorithm to extract up to two dominant shape objects out of a binary image. The algorithm comprises four main steps and one function named FilterContours (see Fig. 4), responsible for taking out the two dominant shape objects. The FilterContours operation is based on two thresholds, T1 = 3.82 and T2 = 81707, which were determined through experiments on each trademark image in our database.
1. Compute each contour area using the Green formula presented in [29].
2. Sort the extracted contours in descending order of contour area.
3. Remove noise shape contours in the trademark image, keeping only the 2nd and 3rd biggest-area contours.
4. Remove one of the kept contours using the FilterContours function:
4.1. if (the area of the 2nd contour is T1 times bigger than that of the 3rd contour, and the area of the 2nd is less than T2) then the 3rd contour is removed and the 2nd contour is retained;
4.2. else if (the area of the 2nd is greater than T2) then the 2nd contour is deleted and the 3rd contour is kept;
4.3. else both contours are maintained.
return (One or Two Contours)
Fig. 4. Contour filter algorithm
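The decision rule of steps 4.1-4.3 can be transcribed directly. The sketch below operates on contour areas only and uses the threshold values quoted above; the function name is ours, not from the thesis:

```python
# Thresholds quoted in the text, discovered experimentally on the database.
T1, T2 = 3.82, 81707.0

def filter_contours(area2, area3):
    """Apply steps 4.1-4.3 of Fig. 4 to the 2nd and 3rd biggest contour areas.

    Returns the list of areas of the contours that are kept.
    """
    if area2 > T1 * area3 and area2 < T2:  # 4.1: 2nd dominates and is plausible
        return [area2]
    elif area2 > T2:                       # 4.2: 2nd is implausibly large
        return [area3]
    else:                                  # 4.3: keep both dominant contours
        return [area2, area3]
```

In the full pipeline the areas would come from the contours returned by Suzuki's algorithm; here they are passed in as plain numbers to isolate the filtering logic.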
4.3 Scale, rotation, translation invariant features
For each extracted shape object, a corresponding feature vector is created. In order to generalize acts of trademark copying such as duplication, rotation, and resizing, the extracted feature vector of the shape object must be invariant to rotation, scale, and translation. In this paper, we use the RBRC algorithm in [13], which is composed of three steps: a two-dimensional discrete Fourier transform (DFT), the representation of the Fourier magnitude in log-polar coordinates, and a second DFT. The RBRC algorithm makes our method scale-invariant, rotation-invariant, and translation-invariant. This is in line with the Fourier-Mellin approach and the polar Fourier representation [19, 26]; the approaches suggested in [19, 26] combine the phase correlation technique with the polar representation to address the problem of both translated and rotated objects. However, a major difference between the RBRC algorithm and Fourier-Mellin is that a DFT is used in the last step of the former, while phase correlation is applied in the latter. To explain why the RBRC algorithm is invariant to rotation, translation, and scale, the theory of the log-polar transform (LPT) related to the magnitude of the DFT is presented in [19, 26]. We present the log-polar transform (LPT) and the representation of the DFT magnitude in log-polar coordinates below.
The LPT maps a sampling pixel (x, y) in the Cartesian coordinates to a point (ρ, θ) in the log-polar coordinates:
ρ = log r, r = √((x − x_c)² + (y − y_c)²), θ = tan⁻¹((y − y_c)/(x − x_c)),
where (x_c, y_c) is the center pixel of the transformation in the Cartesian coordinates, and ρ and θ denote the log-radius and the angular position in the log-polar coordinates. Given g(x', y'), a scaled and rotated image of f(x, y) with scale factor a and rotation angle θ₀, we have:
ρ' = log(a·r) = log a + ρ, (5)
θ' = θ + θ₀. (7)
Expressions (5) and (7) prove that scaling and rotation in the Cartesian domain correspond to pure translation in the log-polar domain. However, when the original image is translated by (x₀, y₀), the corresponding log-polar coordinates are represented by
θ' = tan⁻¹((y − y₀ − y_c)/(x − x₀ − x_c)), ρ' = log √((x − x₀ − x_c)² + (y − y₀ − y_c)²). (10)
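Expressions (5) and (7) can be checked numerically for a single point about the origin; the helper name `to_log_polar` is ours, introduced only for this sketch:

```python
import numpy as np

def to_log_polar(x, y, xc=0.0, yc=0.0):
    """Map a Cartesian point to (rho, theta) log-polar coordinates."""
    dx, dy = x - xc, y - yc
    return np.log(np.hypot(dx, dy)), np.arctan2(dy, dx)

a, theta0 = 2.0, np.pi / 6          # scale factor and rotation angle
x, y = 3.0, 1.0
rho, theta = to_log_polar(x, y)

# Scale and rotate the point about the origin.
xr = a * (x * np.cos(theta0) - y * np.sin(theta0))
yr = a * (x * np.sin(theta0) + y * np.cos(theta0))
rho2, theta2 = to_log_polar(xr, yr)
# rho2 == rho + log(a) and theta2 == theta + theta0: pure shifts
```

The point is chosen so that the rotated angle stays within (−π, π]; in a full image-domain implementation the angular axis wraps around instead of shifting out of range.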
These equations, as in [22], indicate that even a slight translation produces a modified log-polar image. To overcome this limitation, the algorithm first applies the Fourier transform and then applies the log-polar transform to the magnitude spectrum. The magnitude of the Fourier transform of an image and that of its translated counterpart are the same, i.e., invariant to translation, but retain the effects of rotation and scaling [19, 21, 22, 23, 24, 25, 26]. These observations are explained more clearly by the representation of the DFT magnitude in log-polar coordinates.
According to [19, 21, 23, 24, 25, 26], and based on the rotation and translation properties of the Fourier transform, the power spectrum of a rotated and translated image is rotated by the same orientation while its magnitude remains the same. Let F₁(ξ, η) and F₂(ξ, η) be the Fourier transforms of images f₁(x, y) and f₂(x, y), respectively. We are interested in the three cases below:
The first case: if f₂ differs from f₁ only by a displacement (x₀, y₀), then
f₂(x, y) = f₁(x − x₀, y − y₀), (32)
or, in the frequency domain,
F₂(ξ, η) = e^{−j2π(ξx₀ + ηy₀)} F₁(ξ, η). (33)
The second case: if f₂(x, y) is a translated and rotated replica of f₁(x, y) with translation (x₀, y₀) and rotation θ₀, then
f₂(x, y) = f₁(x cos θ₀ + y sin θ₀ − x₀, −x sin θ₀ + y cos θ₀ − y₀). (34)
The DFTs of f₁ and f₂ are related as shown below:
F₂(ξ, η) = e^{−j2π(ξx₀ + ηy₀)} F₁(ξ cos θ₀ + η sin θ₀, −ξ sin θ₀ + η cos θ₀). (35)
Suppose that M₁ and M₂ are the magnitudes of F₁ and F₂, respectively. They are related as shown:
M₂(ξ, η) = M₁(ξ cos θ₀ + η sin θ₀, −ξ sin θ₀ + η cos θ₀), (36)
M₂(ρ, θ) = M₁(ρ, θ − θ₀), (37)
where ρ and θ are the radius and angle in the polar coordinate system, respectively.
The last case: if f₂ is a translated, rotated, and scaled version of f₁ with scale factor s, the Fourier magnitude spectra are transformed to log-polar representations and related by:
M₂(log ρ, θ) = M₁(log ρ − log s, θ − θ₀). (39)
It is obvious that scale, rotation, and translation of an image are represented as translations in the log-polar representation of the frequency domain. This remaining translation is eliminated by applying the DFT one more time.
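The translation-invariance of the DFT magnitude (the first case above) can be verified numerically with numpy. The sketch uses a circular shift, for which the DFT shift property of equation (33) holds exactly:

```python
import numpy as np

rng = np.random.default_rng(42)
f1 = rng.random((32, 32))                      # an arbitrary "image"
f2 = np.roll(f1, shift=(5, 9), axis=(0, 1))    # circularly translated copy

M1 = np.abs(np.fft.fft2(f1))
M2 = np.abs(np.fft.fft2(f2))
# The magnitude spectra agree: the translation only changes the phase.
```

For a non-circular translation of a real image the equality is only approximate because of boundary effects, which is why windowing is often applied in Fourier-Mellin style registration.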
An act of copying a trademark can be done in several ways:
• One trademark is scaled, rotated, or translated from the other.
• One trademark is combined from a part of the other.
• One trademark is mirrored from the other.
In order to recognize a copied trademark image, we derive a trademark similarity measure based on feature vectors. After the stage of creating feature vectors, a trademark input image is represented by one or two feature vectors. Let I and I' be two trademark images; we suppose that Fᵢ and F'ⱼ, where i = 1, 2 and j = 1, 2, are the feature vectors of I and I', respectively.
We propose that the degree of similarity of two trademarks I and I', denoted S(I, I'), is the smallest distance between two feature vectors, one in the set {Fᵢ} and one in the set {F'ⱼ}, denoted dist(Fᵢ, F'ⱼ). We employ the Euclidean distance to compute the distance between two feature vectors, which can be expressed as follows:
S(I, I') = min dist(Fᵢ, F'ⱼ), where i, j = 1, 2. (6)
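Equation (6) amounts to a minimum over at most four pairwise distances. A minimal sketch, with made-up feature vectors standing in for the RBRC features:

```python
import numpy as np

def similarity(feats_i, feats_j):
    """S(I, I'): smallest Euclidean distance over all feature-vector pairs (eq. 6)."""
    return min(
        float(np.linalg.norm(fa - fb))
        for fa in feats_i
        for fb in feats_j
    )

# A trademark is represented by one or two feature vectors (illustrative values).
feats_I  = [np.array([0.0, 0.0]), np.array([4.0, 0.0])]
feats_Ip = [np.array([4.0, 3.0])]
s = similarity(feats_I, feats_Ip)   # the closer of the two pairs wins
```

Taking the minimum over pairs means a query matches a database trademark as soon as any one of its dominant shape objects is close to any shape object of the other.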
Chapter 5: Experiments and results
The trademark image database in this thesis was taken from NOIP [29]. In [27], it was indicated that about 30% of trademark images contain text only, 70% contain text together with other objects such as pictures or symbols, 80% of consecutive characters in a word have the same color, and 90% of trademark images have fewer than 5 colors. We collected 243 composite-mark trademark images from the trademark image database to construct our testing database. Fig. 6 shows some samples from the collected testing trademark database.
Fig. 6. Samples of the collected trademark images for testing
To evaluate the accuracy of our method, we used S(I, I') to retrieve the top 5 trademark images most similar to an input query image. In addition, we modified the input image to mimic acts of copying a trademark such as translation, rotation, scaling, mirroring, and partial copy. The following are illustrations for each case.
5.1 Implementation
OpenCV [34] is an open-source BSD-licensed library that includes several hundred computer vision algorithms. OpenCV has a modular structure, which means that the package includes several shared or static libraries. The following modules are available:
• core - a compact module defining basic data structures, including the dense multi-dimensional array Mat, and basic functions used by all other modules.
• imgproc - an image processing module that includes linear and non-linear image filtering, geometrical image transformations (resize, affine and perspective warping, generic table-based remapping), color space conversion, histograms, and so on.
• video - a video analysis module that includes motion estimation, background subtraction, and object tracking algorithms.
• calib3d - basic multiple-view geometry algorithms, single and stereo camera calibration, object pose estimation, stereo correspondence algorithms, and elements of 3D reconstruction.
• features2d - salient feature detectors, descriptors, and descriptor matchers.
• objdetect - detection of objects and instances of predefined classes (for example, faces, eyes, mugs, people, cars, and so on).
• highgui - an easy-to-use interface to video capturing, image and video codecs, as well as simple UI capabilities.
• gpu - GPU-accelerated algorithms from different OpenCV modules.
The following illustrates the installation procedure of the OpenCV library and how it can be integrated with Microsoft Visual Studio 2010:
• Download OpenCV 2.4.2 from www.sourceforge.net/projects/opencvlibrary/.
• Install to the folder C:\OpenCV2.4.2.
• This folder (C:\OpenCV2.4.2) now contains the source files for OpenCV. These source files need to be built for a specific development environment, which in this case is Microsoft Visual Studio 2010. The CMake utility can be used to generate build files for Microsoft Visual Studio 2010.
• Download and install the CMake utility from www.cmake.org/files/v2.8/cmake-2.8.2-win32-x86.exe.
• Open CMake and select the source directory for the OpenCV source files (i.e., C:\OpenCV2.4.2). Select the build directory, for instance C:\OpenCV2.4.2\Build.
• Once the source and build directories are selected, press the Configure button, specify the generator Microsoft Visual Studio 10, and hit Finish.