UNIVERSITY OF ENGINEERING AND TECHNOLOGY
VIETNAM NATIONAL UNIVERSITY, HANOI
NGUYEN TIEN DUNG
TRADEMARK IMAGE RETRIEVAL BASED ON
SCALE, ROTATION, TRANSLATION,
INVARIANT FEATURES
MASTER THESIS: INFORMATION TECHNOLOGY
Hanoi - 2014
UNIVERSITY OF ENGINEERING AND TECHNOLOGY
VIETNAM NATIONAL UNIVERSITY, HANOI
NGUYEN TIEN DUNG
TRADEMARK IMAGE RETRIEVAL BASED
ON SCALE, ROTATION, TRANSLATION,
INVARIANT FEATURES
Major : Computer Science
Code : 60480101
MASTER THESIS: INFORMATION TECHNOLOGY
Supervised by: Dr Le Thanh Ha
Hanoi - 2014
Originality Statement
‘I hereby declare that this submission is my own work and, to the best of my knowledge, it contains no materials previously published or written by another person, nor substantial proportions of material which have been accepted for the award of any other degree or diploma at the University of Engineering and Technology (UET) or any other educational institution, except where due acknowledgement is made in the thesis. I also declare that the intellectual content of this thesis is the product of my own work, except to the extent that assistance from others in the project’s design and conception or in style, presentation and linguistic expression is acknowledged.’
Signed
TABLE OF CONTENTS
Originality Statement
ABBREVIATION
Abstract
Chapter 1: Introduction
Chapter 2: Related work
Chapter 3: Background
3.1 Pre-processing
3.2 Object description
3.3 Feature vectors extraction
3.3.1 Discrete Fourier Transform (DFT)
3.3.2 Log-polar transform
3.4 Measure of similarity
3.4.1 Euclidean distance
3.4.2 Mahalanobis distance
3.4.3 Chord distance
Chapter 4: Proposed method
4.1 Pre-processing
4.2 Visual shape objects extraction
4.3 Scale, rotation, translation invariant features
4.4 Measure of similarity
Chapter 5: Experiments and results
5.1 Implementation
5.2 Test results for exact copy actions
5.3 Test results for scaling actions
5.4 Test results for rotating actions
5.5 Test results for mirror actions
5.6 Test results for partial copy actions
5.7 Test results for random query trademark
5.8 Testing summary
Chapter 6: Conclusion
REFERENCES
APPENDIX
Pre-processing
Visual shape objects extraction
Scale, rotation, translation invariant features extraction
Matching by measure of similarity and retrieval of trademark images
List of Figures
Fig 1 Some trademark image samples
Fig 2 The log-polar transform maps (x, y) into (log(r), θ)
Fig 3 Log-polar transform of rotated and scaled squares: size goes to a shift on the log(r) axis and rotation to a shift on the θ-axis
Fig 4 Contour filter algorithm
Fig 5 Illustration of three stages of the proposed method
Fig 6 Samples of the collected trademark images for testing
Fig 7 Results for exact copy tests
Fig 8 Results for scaling tests
Fig 9 Results for translation and scaling tests
Fig 10 Results for rotation tests
Fig 11 Results for mirror tests
Fig 12 Results for partial copy tests
Fig 13 Results for random tests
ABBREVIATION
DFT: Discrete Fourier Transform
CBIR: Content Based Image Retrieval
SIFT: Scale-invariant feature transform
Abstract
Trademark registration offices and authorities have been bombarded with requests from enterprises. These authorities face a great deal of difficulty in protecting enterprises’ rights, such as copyright, license, or the uniqueness of a logo or trademark, since they rely only on conventional clustering. The urgent and essential need for a sufficient automatic trademark image retrieval system is therefore entirely worth thorough research. In this thesis, a novel trademark image retrieval method is proposed in which the input trademark image is first separated into dominant visual shape images; then a feature vector, which is scale-, rotation-, and translation-invariant, is created for each shape image. Finally, a similarity measure between two trademarks is calculated based on these feature vectors. Given a query trademark image, the retrieval procedure is carried out by taking the five most similar trademark images from a predefined trademark database. Various experiments are conducted to mimic the many types of trademark copying actions, and the experimental results exhibit the robustness of our retrieval method under these copying actions.
Chapter 1: Introduction
From an economic perspective, a trademark is understood as a word, a design, a picture, a complex symbol, or even a combination of these, which is put on a product or stands for a service of a particular company. In [2], four types of popular trademarks are listed in order of visual complexity: word-in-mark (only characters or words in the mark), device-mark (graphical or figurative elements), composite-mark (characters or words together with graphical elements), and complex-mark (complex images). Fig 1 offers some trademark samples.
Fig 1 Some trademark image samples
Every company or financial organization desires to own a distinctive, meaningful, and descriptive logo which offers both exclusivity and the rights to its characteristics. Drawing the attention of consumers to their products or services, and their market viability, actually depend not only on designing an intellectual and attractive trademark, but also on preventing consumer confusion.
The world markets have remarkably expanded and grown in a global economic scenario in which different trade-related practices come closer to each other at the international level, and a great number of businesses have been established. This has resulted in millions of trademarks being submitted to trademark offices the world over; registrations need to be distinct from existing trademarks as per the definitions and trade practices of different countries, and this is likely to increase in the years to come. Actually, the millions of trademarks already registered and the millions of applications filed for trademark registration are aggravating the problem of issuing trademark certificates. The trademark registration authorities have received many trademark protection applications from enterprises, and finding similar trademarks has become a challenge because these authorities still use the traditional, manual way of classification. It is obvious that trademark registration with manual searching is a very arduous task for the officials. It is really hard for them to make sure whether a trademark is
duplicated: whether a particular trademark is registered or not; whether a trademark resembles another registered trademark in any way; or whether the copyright or license of a trademark is infringed. Thus, this poses an urgent need for an alternative automatic technology.
In [33], different techniques and approaches currently in use for checking the distinctness of trademarks are surveyed. The most popular and appreciated image processing techniques for this purpose are Content Based Image Retrieval (CBIR) techniques; some other approaches, such as shape- and texture-based similarity finding techniques, are also used. Image processing tools and techniques can be used to solve different problems related to images, text, graphics, color, etc. A trademark can be a combination of text, graphics, image, and colored texture; based on this, one can divide trademarks into these components to find the similarity among different trademarks retrieved from a trademark database. Most of the recent techniques for image retrieval have mainly utilized features like color, texture, and shape. They use existing CBIR systems to retrieve images based on visual features such as texture, color, and shape; in such systems, color features are typically extracted using the color histogram technique, and shape features are also considered because shape is an important feature in CBIR applications. Many techniques or approaches have been utilized for image retrieval, some of which are based on improved pattern matching algorithms. Some others take a much broader approach, such as searching just from text files. Some are based on shape and color features, and some have attempted morphological pattern-based image matching and retrieval using a database. A shape-based technique introduced for logo retrieval in one reported paper is also inadequate to solve the problem amicably.
In this thesis, a novel trademark image retrieval method is proposed in which the input trademark image is first separated into dominant visual shape images; then a feature vector, which is scale-, rotation-, and translation-invariant, is created for each shape image. Finally, a similarity measure between two trademarks is calculated based on these feature vectors. The manuscript entitled “Trademark Image Retrieval Based on Scale, Rotation, Translation, Invariant Features”, related to the topic of this thesis, was published in Computing and Communication Technologies, Research, Innovation, and Vision for the Future (RIVF), 2013 IEEE RIVF International Conference, 10-13 Nov 2013.
My thesis is organized as follows: Chapter 1 is the introduction. Chapter 2 presents some related works. Chapter 3 gives background on the related problems. Chapter 4 presents the proposed method in detail. Chapter 5 describes the installation of Visual Studio 2010 with OpenCV 2.4.2 on Windows 7 used for the implementation, and presents the experimental results. Chapter 6 concludes the thesis. Additionally, the Appendix gives the whole source code of the thesis for the reader's convenience.
Chapter 2: Related work
In recent years, researchers have proposed a wide range of solutions in a bid to alleviate the workload of trademark registration offices. Chen, Sun, and Yang [1] suggested two main steps for computing the feature vector. Initially, the object region extracted from a principal-orientation-rotated image is equally partitioned into 16 regions. Then an entropy vector is constructed as the feature vector of the image by computing the information entropy of each partitioned region. This automatic shape-based retrieval technique achieves the desired performance: good invariance under rotation, translation, scale, noise, and degree of thickness, and satisfaction of human visual perception. However, this single-feature retrieval system does not seem to meet multiple aspects of appreciation. To improve this, among others, the single-feature Zernike Moments (ZM) in [4, 10] and invariant moments in [3, 5, 6] are each combined with other features. Experimental results presented in [4] showed that this method has steady performance and good invariance under translation, rotation, and scale. Moreover, the low noise sensitivity of Zernike moments made this method more insensitive to noise. However, because different users have different understandings of image similarity, the present methods of trademark image retrieval have shortcomings in some aspects, such as the ability to retrieve geometrically deformed images, retrieval accuracy, and the consistency between the image and human visual perception. Yet retrieval using the Zernike moment method in [10] shows that trademarks can be retrieved rapidly. A new method is proposed in [3] based on cosine distance and normalized distance measures. The cosine distance metric normalizes all feature vectors to unit length and makes the measure invariant against relative in-plane scaling of the image content. The normalized distance combines two distance measures, cosine distance and Euclidean distance, and shows more accuracy than either method alone. The proposed measures take into account the integration of global features (invariant moments and eccentricity) and local features (entropy histogram and distance histogram). The method first indexes the trademark image database (DB) in order to search for trademarks within a narrowed scope and reduce computation time, and then calculates the similarities between feature vectors to obtain the total similarity. An alternative solution worth mentioning is that four shape features, global (invariant moments and eccentricity) and local (entropy histogram and distance histogram) [16], are exploited by [3].
Recently, [5] combined nine feature vectors of low-order color moments in HSV color space with low-order Hu moments and eccentricity, which are extracted from the gray shape-region image by Rui and Huang's (1998) technique. Gauss normalization is applied to those features, and the weight of every feature can be adjusted flexibly [17]. Good results have been obtained in the experiments, which prove that the multi-feature combination way is better than single-feature ways. [6] employed 10 invariant moments, improved by [20], as shape features of trademark images. These features are input to an ensemble of RBFNNs trained via a minimization of the localized generalization error to recognize the trademark images. In the current system, the trademark images are black-and-white images; the system will be improved to adopt color trademark images in further study.
In [2, 7], the proposed combinations of features are definitely different, and it is admitted that each of them performs well. Equidistance partitioning based on concentric circles [14] is used to partition the region as the first step in both [4] and [2]. [4] and [2] differ in the implementation of the second step: [4] calculated a feature vector F composed of the corresponding region ZM, while [2] combined region feature vectors of 200 values with contour features, namely the corner-to-centroid triangulations detected by Hong & Jiang's improved SUSAN algorithm [15]. Iwanaga et al. [7] put forward a modified angle-distance pair-wise histogram based on the angle histogram and distance histogram of the trademark object. This system outperforms both moment-based and independent histogram (angle, distance, color) methods. Experiments were conducted on registered trademark databases, and impressive results demonstrated the robustness of the proposed approach. Moreover, it is quite simple to construct the distance-angle pair-wise histogram for a trademark object.
As the state-of-the-art method, [10] integrated ZM with SIFT features. In this approach, the Zernike moments of the retrieved images were first extracted and sorted according to similarity, and candidate images were formed. Then, the SIFT features were used to match the query image accurately against the candidate images. This method not only keeps the high precision-recall of SIFT features and is superior to the method based on single Zernike moments, but also improves retrieval speed compared to single SIFT features. This method can be well applied to a trademark image retrieval system, and the newly proposed approach enhances retrieval performance. Tuan N.G. et al. in [27] presented a new method based on the discriminative properties of trademark images for text recognition in trademark images. The experimental results show a significant gain in the text recognition accuracy of the proposed method in comparison with traditional text recognition methods. This contribution deals with one part of trademark image recognition.
However, those approaches seem to ignore not only partial trademark comparison but also mirrored trademarks. Furthermore, they have concentrated either on original trademarks without removing noise elements, or on standard databases which contain no noise. Additionally, these approaches treat the trademark image as one complete object to process and do not concern themselves with the detailed visual shapes in the trademark image; therefore, they cannot detect partial similarity between trademark images. Nonetheless, calculating the distance between two features also plays an extremely important part in measuring the degree of similarity among images, and for this reason each of the mentioned solutions endeavours to propose an appropriate measure to some extent.
To overcome the above-mentioned drawbacks, a novel content-based trademark recognition method is proposed with four main stages: (i) pre-process or scale down the trademark image and convert it into a binary image; (ii) extract dominant shape objects from the binary image; (iii) apply the RBRC algorithm to extract rotation-, scale-, and translation-invariant features from the shape objects; and (iv) use Euclidean distance to measure the similarity of two images and then retrieve the 10 trademark images most similar to the query trademark image. The thesis focuses on handling a Vietnamese composite-mark database.
Chapter 3: Background
3.1 Pre-processing
Converting gray scale to binary image
In [31], segmentation involves separating an image into regions (or their contours) corresponding to objects. We usually try to segment regions by identifying common properties. Or, similarly, we identify contours by identifying differences between regions. The simplest property that pixels in a region can share is intensity. So, a natural way to segment such regions is through thresholding, the separation of light and dark regions. Thresholding creates binary images from grey-level ones by turning all pixels below some threshold to zero and all pixels above that threshold to one. If g(x, y) is a thresholded version of f(x, y) at some global threshold T,
g(x, y) = 1 if f(x, y) ≥ T; 0 otherwise (1)
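For concreteness, a minimal C++ sketch of equation (1) using OpenCV follows (an illustration, not the implementation in the Appendix; note that OpenCV's THRESH_BINARY uses a strict f(x, y) > T comparison, which differs from (1) by at most one gray level for integer images):

#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>

// Global thresholding: pixels above T become 255 (i.e. "one"), the rest 0.
cv::Mat globalThreshold(const cv::Mat& gray, double T) {
    cv::Mat binary;
    cv::threshold(gray, binary, T, 255, cv::THRESH_BINARY);
    return binary;
}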
Problems with thresholding
The major problem with thresholding is that we consider only the intensity, not any relationships between the pixels. There is no guarantee that the pixels identified by the thresholding process are contiguous. We can easily include extraneous pixels that aren't part of the desired region, and we can just as easily miss isolated pixels within the region (especially near the boundaries of the region). These effects get worse as the noise gets worse, simply because it's more likely that a pixel's intensity doesn't represent the normal intensity in the region. When we use thresholding, we typically have to play with it, sometimes losing too much of the region and sometimes getting too many extraneous background pixels. (Shadows of objects in the image are also a real pain - not just where they fall across another object but where they mistakenly get included as part of a dark object on a light background.)
We can deal, at least in part, with such uneven illumination by determining thresholds locally. That is, instead of having a single global threshold, we allow the threshold itself to smoothly vary across the image.
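OpenCV exposes exactly such a locally varying threshold through adaptiveThreshold; the sketch below is only illustrative, and the 15x15 block size and offset of 5 are assumed values that would need tuning:

#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>

// Each pixel is compared against the mean of its 15x15 neighbourhood minus 5,
// so the effective threshold follows the local illumination.
// gray must be an 8-bit single-channel image.
cv::Mat localThreshold(const cv::Mat& gray) {
    cv::Mat binary;
    cv::adaptiveThreshold(gray, binary, 255, cv::ADAPTIVE_THRESH_MEAN_C,
                          cv::THRESH_BINARY, 15, 5);
    return binary;
}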
Automated methods for finding thresholds
To set a global threshold or to adapt a local threshold to an area, we usually look at the histogram to see if we can find two or more distinct modes - one for the foreground and one for the background. Recall that a histogram can be treated as a probability distribution p(g), with cumulative distribution
c(g) = Σ_{g′=0}^{g} p(g′) (2)
If we know the proportion 1/p of pixels that should fall below the threshold, we simply set the threshold T such that c(T) = 1/p. (Or, if we're looking for a dark object on a light background, c(T) = 1 − 1/p.)
Finding peaks and valleys
One extremely simple way to find a suitable threshold is to find each of the modes (local maxima) and then find the valley (minimum) between them. While this method appears simple, there are two main problems with it: the histogram may be noisy, thus causing many local minima and maxima (to get around this, the histogram is usually smoothed before trying to find separate modes); and the sum of two separate distributions, each with its own mode, may not produce a distribution with two distinct modes.
Clustering (K-Means Variation)
Another way to look at the problem is that we have two groups of pixels, one with one range of values and one with another. What makes thresholding difficult is that these ranges usually overlap. What we want to do is to minimize the error of classifying a background pixel as a foreground one, or vice versa. To do this, we try to minimize the area under the histogram for one region that lies on the other region's side of the threshold.
The problem is that we don't have the histograms for each region, only the histogram for the combined regions. Understand that the place of minimum overlap (the place where the misclassified areas of the distributions are equal) is not necessarily where the valley occurs in the combined histogram. This occurs, for example, when one cluster has a wide distribution and the other a narrow one. One way that we can try to do this is to consider the values in the two regions as two clusters. In other words, let μ_B(T) be the mean of all pixels less than the threshold and μ_O(T) be the mean of all pixels greater than the threshold. We want to find a threshold such that the following holds:
∀g ≥ T : |g − μ_B(T)| > |g − μ_O(T)| (4)
and
∀g < T : |g − μ_B(T)| < |g − μ_O(T)| (5)
The basic idea is to start by estimating μ_B(T) as the average of the four corner pixels (assumed to be background) and μ_O(T) as the average of everything else. Set the threshold to be halfway between μ_B(T) and μ_O(T) (thus separating the pixels according to how close their intensities are to μ_B(T) and μ_O(T) respectively). Now, update the estimates of μ_B(T) and μ_O(T) by actually calculating the means of the pixels on each side of the threshold. This process repeats until the algorithm converges. This method works well if the spreads of the distributions are approximately equal, but it does not handle well the case where the distributions have differing variances.
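A sketch of this iterative scheme in C++ (illustrative only; here the whole-image mean is used as a rough initial estimate of μ_O, standing in for "the average of everything else"):

#include <cmath>
#include <opencv2/core/core.hpp>

// Iterative threshold selection: T converges to the midpoint between the
// background mean mu_B and the object mean mu_O.
// gray must be an 8-bit single-channel image.
double iterativeThreshold(const cv::Mat& gray) {
    // Initialize mu_B from the four corner pixels (assumed background).
    double muB = (gray.at<uchar>(0, 0) + gray.at<uchar>(0, gray.cols - 1) +
                  gray.at<uchar>(gray.rows - 1, 0) +
                  gray.at<uchar>(gray.rows - 1, gray.cols - 1)) / 4.0;
    double muO = cv::mean(gray)[0];                // rough stand-in estimate
    double T = 0.5 * (muB + muO), prevT = -1.0;
    while (std::fabs(T - prevT) > 0.5) {
        prevT = T;
        double sumB = 0, nB = 0, sumO = 0, nO = 0;
        for (int y = 0; y < gray.rows; ++y)
            for (int x = 0; x < gray.cols; ++x) {
                uchar g = gray.at<uchar>(y, x);
                if (g < T) { sumB += g; ++nB; } else { sumO += g; ++nO; }
            }
        if (nB > 0) muB = sumB / nB;               // recompute cluster means
        if (nO > 0) muO = sumO / nO;
        T = 0.5 * (muB + muO);                     // halfway between the means
    }
    return T;
}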
Clustering (The Otsu Method)
Another way of accomplishing similar results is to set the threshold so as to try to make each cluster as tight as possible, thus minimizing their overlap. Obviously, we can't change the distributions, but we can adjust where we separate them (the threshold). As we adjust the threshold one way, we increase the spread of one and decrease the spread of the other. The goal then is to select the threshold that minimizes the combined spread. We can define the within-class variance as the weighted sum of the variances of each cluster:
σ²_within(T) = n_B(T) σ²_B(T) + n_O(T) σ²_O(T) (6)
where
σ²_B(T) = the variance of the pixels in the background (below threshold),
σ²_O(T) = the variance of the pixels in the foreground (above threshold),
n_B(T) = Σ_{g=0}^{T−1} p(g) (7) and n_O(T) = Σ_{g=T}^{N−1} p(g) (8) are the fractions of pixels below and above the threshold, and [0, N − 1] is the range of intensity levels.
Computing this within-class variance for each of the two classes for each possible threshold involves a lot of computation, but there's an easier way. If we subtract the within-class variance from the total variance of the combined distribution, we get something called the between-class variance:
σ²_between(T) = σ² − σ²_within(T) (9)
= n_B(T)[μ_B(T) − μ]² + n_O(T)[μ_O(T) − μ]² (10)
where σ² is the combined variance and μ is the combined mean. Notice that the between-class variance is simply the weighted variance of the cluster means themselves around the overall mean. Substituting μ = n_B(T) μ_B(T) + n_O(T) μ_O(T) and simplifying, we get
σ²_between(T) = n_B(T) n_O(T) [μ_B(T) − μ_O(T)]² (11)
So, for each potential threshold T we:
1. Separate the pixels into two clusters according to the threshold.
2. Find the mean of each cluster.
3. Square the difference between the means.
4. Multiply by the number of pixels in one cluster times the number in the other.
This depends only on the difference between the means of the two clusters, thus avoiding having to calculate differences between individual intensities and the cluster means. The optimal threshold is the one that maximizes the between-class variance (or, conversely, minimizes the within-class variance).
This still sounds like a lot of work, since we have to do this for each possible threshold, but it turns out that the computations aren't independent as we change from one threshold to another. We can update n_B(T), n_O(T), and the respective cluster means μ_B(T) and μ_O(T) as pixels move from one cluster to the other as T increases. Using simple recurrence relations we can update the between-class variance as we successively test each threshold:
n_B(T + 1) = n_B(T) + n_T,  n_O(T + 1) = n_O(T) − n_T,
μ_B(T + 1) = (μ_B(T) n_B(T) + n_T T) / n_B(T + 1),  μ_O(T + 1) = (μ_O(T) n_O(T) − n_T T) / n_O(T + 1),
where n_T denotes the number of pixels with intensity T.
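The following sketch turns equation (11) into code, scanning all candidate thresholds over a precomputed histogram (an illustration; pixel counts are used instead of fractions, which leaves the maximizer unchanged, and in practice OpenCV users can simply pass cv::THRESH_OTSU to cv::threshold):

#include <vector>

// Otsu's method: pick T maximizing n_B(T) * n_O(T) * (mu_B(T) - mu_O(T))^2.
// hist[g] is the pixel count for gray level g; pixels with value < T are background.
int otsuThreshold(const std::vector<double>& hist) {
    const int N = static_cast<int>(hist.size());
    double total = 0, totalSum = 0;
    for (int g = 0; g < N; ++g) { total += hist[g]; totalSum += g * hist[g]; }
    double nB = 0, sumB = 0, bestVar = -1.0;
    int bestT = 0;
    for (int T = 1; T < N; ++T) {
        nB   += hist[T - 1];                 // recurrence: move level T-1 into B
        sumB += (T - 1) * hist[T - 1];
        double nO = total - nB;
        if (nB == 0 || nO == 0) continue;    // skip degenerate splits
        double muB = sumB / nB, muO = (totalSum - sumB) / nO;
        double between = nB * nO * (muB - muO) * (muB - muO);
        if (between > bestVar) { bestVar = between; bestT = T; }
    }
    return bestT;
}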
Mixture modeling
Rather than trying to pick a threshold that optimizes some statistical measure, mixture modeling assumes that there already exist two distributions and we must find them. Once we know the parameters of the distributions, it's easy to determine the best threshold. Unfortunately, we have six unknown parameters (n_B, n_O, μ_B, μ_O, σ_B, and σ_O), so we need to make some estimates of these quantities. If the two distributions are reasonably well separated (some overlap but not too much), we can choose an arbitrary threshold T and assume that the mean and standard deviation of each group approximate the mean and standard deviation of the two underlying populations. We can then measure how well a mix of the two distributions approximates the overall distribution:
F = Σ_{g=0}^{N−1} (model(g) − image(g))²
Choosing the optimal threshold thus becomes a matter of finding the one that causes the mixture of the two estimated Gaussian distributions to best approximate the actual histogram (i.e., minimizes F). Unfortunately, the solution space is too large to search exhaustively, so most methods use some form of gradient descent. Such gradient descent methods depend heavily on the accuracy of the initial estimate, but the Otsu method or similar clustering methods can usually provide reasonable initial estimates. Mixture modeling also extends to models with more than two underlying distributions (more than two types of regions).
Multispectral thresholding
A technique for segmenting images with multiple components (color images, Landsat images, or MRI images with T1, T2, and proton-density bands) works by estimating the optimal threshold in one channel and then segmenting the overall image based on that threshold. Each of these regions is then subdivided independently using properties of the second channel. This is repeated again for the third channel, and so on, running through all channels repeatedly until each region in the image exhibits a distribution indicative of a coherent region (a single mode).
Thresholding along boundaries
If we want our thresholding method to stay fairly true to the boundaries of the object, we can first apply some boundary-finding method (such as edge detection techniques) and then sample the pixels only where the boundary probability is high. Thus, a threshold method based on pixels near boundaries will separate the pixels in ways that tend to preserve the boundaries. Other scattered distributions within the object or the background are of no relevance. However, if the characteristics change along the boundary, we're still in trouble. And, of course, there's still no guarantee that we won't have extraneous pixels or holes.
3.2 Object description
In [33], objects are represented as a collection of pixels in an image. Thus, for the purposes of recognition we need to describe the properties of groups of pixels. The description is often just a set of numbers: the object's descriptors. From these, we can compare and recognize objects by simply matching the descriptors of objects in an image against the descriptors of known objects. However, to be useful for recognition, descriptors should have four important properties. First, they should define a complete set; that is, two objects must have the same descriptors if and only if they have the same shape. Secondly, they should be congruent, so that we can recognize similar objects when they have similar descriptors. Thirdly, it is convenient that they have invariant properties; for example, rotation-invariant descriptors will be useful for recognizing objects whatever their orientation. Other important invariance properties include scale and position, and also invariance to affine and perspective changes; these last two are very important when recognizing objects observed from different viewpoints. In addition to these three properties, the descriptors should be a compact set; namely, a descriptor should represent the essence of an object in an efficient way. That is, it should only contain information about what makes an object unique, or different from the other objects. The quantity of information used to describe this characterization should be less than the information necessary for a complete description of the object itself. Unfortunately, there is no set of complete and compact descriptors that characterizes general objects; thus, the best recognition performance is obtained with carefully selected properties. As such, the process of recognition is strongly related to each particular application, with a particular type of objects. Here, the characterization of objects is presented by two forms of descriptors: region descriptors characterize the arrangement of pixels within the area, and shape descriptors characterize the arrangement of pixels along the perimeter or boundary. This region versus perimeter kind of representation is common in image analysis. For example, edges can be located by region growing (to label area) or by differentiation (to label perimeter). There are many techniques that can be used to obtain descriptors of an object's boundary.
3.3 Feature vectors extraction
3.3.1 Discrete Fourier Transform (DFT)
In [34], the Fourier transform decomposes an image into its sine and cosine components; in other words, it transforms an image from its spatial domain to its frequency domain. The idea is that any function may be approximated exactly by a sum of infinitely many sine and cosine functions, and the Fourier transform is a way to do this. Mathematically, the Fourier transform of a two-dimensional image is:
F(k, l) = Σ_{i=0}^{N−1} Σ_{j=0}^{N−1} f(i, j) e^(−i2π(ki/N + lj/N))
where f(i, j) is the image value in the spatial domain and F(k, l) its value in the frequency domain.
3.3.2 Log-polar transform
In [32], for two-dimensional images, the log-polar transform [Schwartz80] is a change from Cartesian to polar coordinates: (x, y) ↔ r e^(iθ), where r = √(x² + y²) and exp(iθ) = exp(i arctan(y/x)). To separate the polar coordinates out into a (ρ, θ) space that is relative to some center point (x_c, y_c), we take the log so that ρ = log √((x − x_c)² + (y − y_c)²) and θ = arctan((y − y_c)/(x − x_c)). For image purposes - when we need to "fit" the interesting stuff into the available image memory - we typically apply a scaling factor m to ρ. Fig 2 shows a square object on the left and its encoding in log-polar space.
Fig 2 The log-polar transform maps (x, y) into (log(r), θ)
The log-polar transform can be used to create two-dimensional invariant representations of object views by shifting the transformed image's center of mass to a fixed point in the log-polar plane; see Fig 3. On the left are three shapes that we want to recognize as "square". The problem is, they look very different: one is much larger than the others, and another is rotated. The log-polar transform appears on the right in Fig 3. Observe that size differences in the (x, y) plane are converted to shifts along the log(r) axis of the log-polar plane, and that rotation differences are converted to shifts along the θ-axis. If we take the transformed center of each transformed square in the log-polar plane and then recenter that point to a certain fixed position, then all the squares will show up identically in the log-polar plane. This yields a type of invariance to two-dimensional rotation and scaling.
3.4 Measure of similarity
Given a query trademark image, its feature vector and the feature vector of every image in the database are computed and compared to report the few most similar images.
A similarity measure must be selected to decide how close one vector is to another. The problem can be converted to computing the discrepancy between two vectors x, y ∈ R^d. Several distance measures are presented as follows.
τ₂(x, y) = max_(1≤j≤d) |x_j − y_j| (22)
The covariance matrix S of a sample {x_1, ..., x_n} of size n is computed by S = (1/n) Σ_{i=1}^{n} (x_i − u)(x_i − u)^t, with u the sample mean.
Chapter 4: Proposed method
4.1 Pre-processing
In this stage, the input trademark image is scaled down and converted into a binary image using the thresholding techniques reviewed in Section 3.1, such as clustering, mixture modeling, and multispectral thresholding.
4.2 Visual shape objects extraction
In this stage, shape objects in the form of connected contours are retrieved from the binary image using Suzuki's algorithm [12]. Each detected contour is stored as a vector of points. All of the vectors are organized into a hierarchy structure, which contains information about the image topology. The number of contours depends on the texture of the image.
There are four options for contour retrieval in [12]: (i) retrieve only the extreme outer contours; (ii) retrieve all of the contours without establishing any hierarchical relationships; (iii) retrieve all of the contours organized into a two-level hierarchy; and (iv) retrieve all of the contours and reconstruct a full hierarchy of nested contours.
In the present research, we adopted the second option for shape object extraction. From a binary trademark image, we extracted a number of shape object images. However, due to noise present in the input trademark image, many noise contours were extracted as shape objects. To prevent this problem, we applied a filter so that the noise contours were removed. Observations showed that the dominant shape contours usually have a much larger area in comparison with the noise contours. Furthermore, due to the characteristics of the trademarks in our database, most trademarks consist of one or two dominant shape objects which play a primary role in a company's reputation. For this reason, we propose an algorithm to extract up to two dominant shape objects from a binary image. The algorithm comprises four main steps and one function named FilterContours (see Fig 4) responsible for taking out the two dominant shape objects. The FilterContours operation is as follows:
1. Compute each contour's area using the Green formula presented in [29].
2. Sort the extracted contours in descending order of contour area.
3. Remove the noise shape contours in the trademark image; keep only the 2nd and 3rd biggest-area contours.
4. Remove one of the kept contours by the FilterContours function:
4.1 if (the area of the 2nd contour is T1 times bigger than that of the 3rd contour, and the area of the 2nd is less than T2) then the 3rd contour is removed and the 2nd contour is retained;
4.2 else if (the area of the 2nd is greater than T2) then the 2nd contour is deleted and the 3rd contour is kept;
4.3 else both contours are maintained.
return (One or Two Contours)
Fig 4 Contour filter algorithm
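As an illustration of Fig 4 (a sketch, not the thesis source code in the Appendix; T1 and T2 are the thresholds of steps 4.1-4.2, assumed to be tuned on the database), the filter can be written with OpenCV's findContours and contourArea:

#include <algorithm>
#include <vector>
#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>

static bool byAreaDesc(const std::vector<cv::Point>& a,
                       const std::vector<cv::Point>& b) {
    return cv::contourArea(a) > cv::contourArea(b);   // Green's formula inside
}

// Keep at most two dominant shape contours of a binary trademark image.
std::vector<std::vector<cv::Point> > filterContours(const cv::Mat& binary,
                                                    double T1, double T2) {
    std::vector<std::vector<cv::Point> > contours, kept;
    cv::Mat work = binary.clone();                    // findContours modifies its input
    cv::findContours(work, contours, cv::RETR_LIST,   // option (ii): no hierarchy
                     cv::CHAIN_APPROX_SIMPLE);
    std::sort(contours.begin(), contours.end(), byAreaDesc);   // step 2
    if (contours.size() >= 3) {
        // Step 3: keep the 2nd and 3rd largest contours (the largest typically
        // being the outer border); step 4: possibly drop one of them.
        double a2 = cv::contourArea(contours[1]);
        double a3 = cv::contourArea(contours[2]);
        if (a2 > T1 * a3 && a2 < T2)  kept.push_back(contours[1]);          // 4.1
        else if (a2 > T2)             kept.push_back(contours[2]);          // 4.2
        else { kept.push_back(contours[1]); kept.push_back(contours[2]); }  // 4.3
    } else if (!contours.empty()) {
        kept.push_back(contours[0]);  // fallback case not specified in Fig 4
    }
    return kept;
}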
4.3 Scale, rotation, translation invariant features
For each extracted shape object, a corresponding feature vector is created. In order to generalize the actions of trademark copying, such as duplication, rotation, and resizing, the extracted feature vector of the shape object must be invariant in terms of rotation, scale, and translation. In this thesis, we use the RBRC algorithm [13], which is composed of three steps: a two-dimensional Fourier transform (DFT); a representation of the magnitude of the Fourier spectrum in log-polar coordinates; and a second DFT. The RBRC algorithm makes our method scale-, rotation-, and translation-invariant. This is in line with the Fourier-Mellin approach and the polar Fourier representation [19, 26]. The approaches suggested in [19, 26] combine the phase correlation technique with the polar representation to address the
problem of both translated and rotated objects. However, a major difference between the RBRC algorithm and Fourier-Mellin is that a DFT is used in the last step of the former, while phase correlation is applied in the latter. To explain why the RBRC algorithm is invariant in terms of rotation, translation and scale, the theory of the log-polar transform (LPT) as it relates to the magnitude of the DFT is presented in [19, 26]. We present the log-polar transform (LPT) and the representation of the magnitude of the DFT in log-polar coordinates here for the reader's convenience.
The log-polar transform is a nonlinear and nonuniform sampling method used to convert an image from the Cartesian coordinates I(x, y) to the log-polar coordinates I_LP(ρ, θ) [24]. The mathematical expression of the LPT procedure is shown below:
ρ = log √((x − x_c)² + (y − y_c)²),  θ = tan⁻¹((y − y_c)/(x − x_c)) (30)
where (x_c, y_c) is the center pixel of the transformation in the Cartesian coordinates, (x, y) denotes the sampling pixel in the Cartesian coordinates, and ρ and θ denote the log-radius and the angular position in the log-polar coordinates. Given g(x′, y′), a scaled and rotated replica of f(x, y) with scale parameter a and rotation angle α, its log-polar coordinates satisfy
ρ′ = ρ + log a,  θ′ = θ + α
that is, scaling and rotation are reduced to pure translation in the log-polar domain. However, when the original image is translated by (x_0, y_0), the corresponding log-polar coordinates are represented by:
θ′ = tan⁻¹((y − y_0)/(x − x_0)),  ρ′ = log √((x − x_0)² + (y − y_0)²) (31)
Those equations, as well as [22], indicate that even a slight translation produces a modified log-polar image. To overcome this limitation, the algorithm first applies the Fourier transform and then applies the log-polar transform to the magnitude spectrum. The magnitude of the Fourier transform of an image and that of its translated counterpart are the same, i.e. invariant to translation, but they retain the effects of rotation and scaling [19, 21, 22, 23, 24, 25, 26]. These comments are explained more clearly by the representation of the magnitude of the DFT in log-polar coordinates.
According to [19, 21, 23, 24, 25, 26], and based on the rotation and translation properties of the Fourier transform, the power spectrum of a rotated and translated image is rotated by the same orientation while its magnitude remains the same. Let F_1(ξ, η) and F_2(ξ, η) be the Fourier transforms of images f_1(x, y) and f_2(x, y), respectively. We are interested in the three cases below.
The first case: if f_2 differs from f_1 only by a displacement (x_0, y_0), then
f_2(x, y) = f_1(x − x_0, y − y_0) (32)
or, in the frequency domain:
F_2(ξ, η) = e^(−j2π(ξx_0 + ηy_0)) F_1(ξ, η) (33)
The second case: if f_2(x, y) is a translated and rotated replica of f_1(x, y) with translation (x_0, y_0) and rotation θ_0, then
f_2(x, y) = f_1(x cos θ_0 + y sin θ_0 − x_0, −x sin θ_0 + y cos θ_0 − y_0) (34)
The DFTs of f_1 and f_2 are related as shown below:
F_2(ξ, η) = e^(−j2π(ξx_0 + ηy_0)) F_1(ξ cos θ_0 + η sin θ_0, −ξ sin θ_0 + η cos θ_0) (35)
It is supposed that M_1 and M_2 are the magnitudes of F_1 and F_2, respectively. They are related as shown:
M_2(ξ, η) = M_1(ξ cos θ_0 + η sin θ_0, −ξ sin θ_0 + η cos θ_0) (36)
The Fourier magnitude spectra are transformed to polar representation:
M_2(ρ, θ) = M_1(ρ, θ − θ_0) (37)
where ρ and θ are the radius and angle in the polar coordinate system, respectively.
The last case: if f_2 is a translated, rotated and scaled (by a factor s) replica of f_1, the Fourier magnitude spectra, transformed to log-polar representations, are related by:
M_2(ρ, θ) = M_1(ρ/s, θ − θ_0) (38)
M_2(log ρ, θ) = M_1(log ρ − log s, θ − θ_0) (39)
It is obvious that scale, rotation, and translation of an image are represented as pure translations in the log-polar frequency domain. This remaining translation is eliminated by applying the DFT one more time and keeping its magnitude.
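A compact C++ sketch of this three-step pipeline follows (an illustration of RBRC as described above, not the thesis source; the 64x64 log-polar grid and the hand-rolled nearest-neighbour resampling are assumptions, written out so the sketch does not depend on any particular OpenCV log-polar helper):

#include <algorithm>
#include <cmath>
#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>

// Magnitude of the 2-D DFT of a CV_32F image (DC term at the top-left corner).
static cv::Mat magSpectrum(const cv::Mat& img32f) {
    cv::Mat planes[] = {img32f.clone(), cv::Mat::zeros(img32f.size(), CV_32F)};
    cv::Mat c; cv::merge(planes, 2, c);
    cv::dft(c, c);
    cv::split(c, planes);
    cv::Mat mag; cv::magnitude(planes[0], planes[1], mag);
    return mag;
}

// Swap quadrants so the DC term sits at the image centre (assumes even sizes).
static void fftShift(cv::Mat& m) {
    int cx = m.cols / 2, cy = m.rows / 2;
    cv::Mat q0(m, cv::Rect(0, 0, cx, cy)),  q1(m, cv::Rect(cx, 0, cx, cy));
    cv::Mat q2(m, cv::Rect(0, cy, cx, cy)), q3(m, cv::Rect(cx, cy, cx, cy));
    cv::Mat tmp;
    q0.copyTo(tmp); q3.copyTo(q0); tmp.copyTo(q3);
    q1.copyTo(tmp); q2.copyTo(q1); tmp.copyTo(q2);
}

// RBRC-style invariant feature: DFT magnitude -> log-polar -> DFT magnitude.
cv::Mat rbrcFeature(const cv::Mat& shape8u, int rhoBins = 64, int thetaBins = 64) {
    cv::Mat f; shape8u.convertTo(f, CV_32F);
    cv::Mat mag = magSpectrum(f);                    // step 1: removes translation
    mag = mag(cv::Rect(0, 0, mag.cols & ~1, mag.rows & ~1)).clone(); // even sizes
    fftShift(mag);

    // Step 2: log-polar resampling about the spectrum centre, turning
    // scaling/rotation of the input into shifts of this image.
    float cx = mag.cols / 2.0f, cy = mag.rows / 2.0f;
    float maxRho = std::log(std::min(cx, cy));
    cv::Mat lp(rhoBins, thetaBins, CV_32F, cv::Scalar(0));
    for (int r = 0; r < rhoBins; ++r)
        for (int t = 0; t < thetaBins; ++t) {
            float rho   = maxRho * (r + 0.5f) / rhoBins;
            float theta = 2.0f * (float)CV_PI * t / thetaBins;
            int x = cvRound(cx + std::exp(rho) * std::cos(theta));
            int y = cvRound(cy + std::exp(rho) * std::sin(theta));
            if (x >= 0 && x < mag.cols && y >= 0 && y < mag.rows)
                lp.at<float>(r, t) = mag.at<float>(y, x);
        }

    return magSpectrum(lp);                          // step 3: removes those shifts
}

Read row by row, the returned matrix would play the role of the invariant feature vector of a shape object.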
Fig 5 Illustration of three stages of the proposed method
Fig 5 summarizes the three stages of our method. The query image was merged from two images in our database. After being pre-processed, the query image was transformed into a binary image. Then, the two dominant shape objects were extracted via the object extraction stage. Finally, two feature vectors were extracted from the shape objects.
4.4 Measure of similarity
An act of copying a trademark can be done by:
One trademark is scaled, rotated, or translated from the other
One trademark is combined from a part of the other
One trademark is mirrored from the other
In order to recognize copied trademark images, we derive a trademark similarity measure based on the feature vectors. After the feature vector creation stage, an input trademark image is represented by one or two feature vectors. Let I and I′ be two trademark images; we suppose that F_i and F_j, where i = 1, 2 and j = 1, 2, are the feature vectors of I and I′, respectively.
We propose that the degree of similarity of two trademarks I and I′, denoted S(I, I′), is the smallest distance between two feature vectors, one in the set {F_i} and one in the set {F_j}, denoted dist(F_i, F_j). We employed the Euclidean distance to compute the distance between two feature vectors, which can be expressed as follows:
S(I, I′) = min{dist(F_i, F_j)}, where i, j = 1, 2 (16)
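A minimal sketch of this measure in C++ (illustrative only; each trademark is assumed to be represented by one or two feature vectors stored as cv::Mat objects, as produced in the previous stage):

#include <algorithm>
#include <limits>
#include <vector>
#include <opencv2/core/core.hpp>

// S(I, I') = min over all pairs (i, j) of the Euclidean distance dist(F_i, F_j).
double similarity(const std::vector<cv::Mat>& fI, const std::vector<cv::Mat>& fJ) {
    double best = std::numeric_limits<double>::max();
    for (size_t i = 0; i < fI.size(); ++i)
        for (size_t j = 0; j < fJ.size(); ++j)
            best = std::min(best, cv::norm(fI[i], fJ[j], cv::NORM_L2));
    return best;                       // smaller value = more similar trademarks
}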
Chapter 5: Experiments and results
The trademark image database in my thesis was taken from NOIP [29]. In [27], it was indicated that about 30% of trademark images contain text only, 70% contain text and other objects such as pictures or symbols, 80% of consecutive characters in a word have the same color, and 90% of trademark images have fewer than 5 colors. We collected 243 composite-mark trademark images to construct our testing trademark image database. Fig 6 shows some samples of the collected testing trademark database.
Fig 6 Samples of the collected trademark images for testing
To evaluate the accuracy of our method, we used S(I, I′) to retrieve the top 5 trademark images most similar to an input query image. In addition, we modified the input image to mimic the acts of copying a trademark, such as translation, rotation, scaling, mirroring, and partial copying. The following are illustrations of each case.
5.1 Implementation
OpenCV [34] is an open-source BSD-licensed library that includes several hundred computer vision algorithms. OpenCV has a modular structure, which means that the package includes several shared or static libraries. The following modules are available:
core - a compact module defining basic data structures, including the dense
multi-dimensional array Mat and basic functions used by all other modules
imgproc - an image processing module that includes linear and non-linear image
filtering, geometrical image transformations (resize, affine and perspective warping, generic table-based remapping), color space conversion, histograms, and so on
video - a video analysis module that includes motion estimation, background
subtraction, and object tracking algorithms
calib3d - basic multiple-view geometry algorithms, single and stereo camera
calibration, object pose estimation, stereo correspondence algorithms, and elements
of 3D reconstruction
features2d - salient feature detectors, descriptors, and descriptor matchers
objdetect - detection of objects and instances of the predefined classes (for
example, faces, eyes, mugs, people, cars, and so on)
highgui - an easy-to-use interface to video capturing, image and video codecs, as
well as simple UI capabilities
gpu - GPU-accelerated algorithms from different OpenCV modules
Below is an illustration of the installation procedure for the OpenCV library and how it can be integrated with Microsoft Visual Studio 2010:
Download OpenCV2.4.2 at www.sourceforge.net/projects/opencvlibrary/
Install to folder C:\OpenCV2.4.2
Now this folder (C:\OpenCV2.4.2) contains the source files for OpenCV. These source files need to be built for a specific development environment, which in this case is Microsoft Visual Studio 2010. The CMake utility can be used to generate the build files for Microsoft Visual Studio 2010.
Download and install the CMake utility from www.cmake.org/files/v2.8/cmake-2.8.2-win32-x86.exe.
Open CMake and select the source directory for the OpenCV source files, i.e. C:\OpenCV2.4.2. Select the build directory, for instance C:\OpenCV2.4.2\Build.
Once the source and build directories are selected, press the Configure button, specify the generator Microsoft Visual Studio 10, and hit Finish.