UNIVERSITY OF ENGINEERING AND TECHNOLOGY
VIETNAM NATIONAL UNIVERSITY, HANOI
NGUYEN TIENDUNG
TRADEMARK IMAGE RETRIEVAL BASED ON
SCALE, ROTATION, TRANSLATION,
INVARIANT FEATURES
MASTER THESIS: INFORMATION TECHNOLOGY
Hanoi - 2014
UNIVERSITY OF ENGINEERING AND TECHNOLOGY
VIETNAM NATIONAL UNIVERSITY, HANOI
NGUYEN TIENDUNG
TRADEMARK IMAGE RETRIEVAL BASED
ON SCALE, ROTATION, TRANSLATION,
INVARIANT FEATURES
Major : Computer Science
Code : 60480101
MASTER THESIS: INFORMATION TECHNOLOGY
Supervised by: Dr. Le Thanh Ha
Hanoi - 2014
Originality Statement

"I hereby declare that this submission is my own work and, to the best of my knowledge, it contains no materials previously published or written by another person, or substantial proportions of material which have been accepted for the award of any other degree or diploma at the University of Engineering and Technology (UET) or any other educational institution, except where due acknowledgement is made in the thesis. I also declare that the intellectual content of this thesis is the product of my own work, except to the extent that assistance from others in the project's design and conception or in style, presentation and linguistic expression is acknowledged."

Signed
Feature vectors extraction
Discrete Fourier Transform (DFT)
Log-polar transform
Measure of similarity
Euclidean distance
Mahalanobis distance
Chord distance
Chapter 4: Proposed method
4.1 Pre-processing
4.2 Visual shape objects extraction
4.3 Scale, rotation, translation invariant features
5.3 Test results for scaling actions
5.4 Test results for rotating actions
5.5 Test results for mirror actions
5.6 Test results for partial copy actions
5.7 Test results for random query trademarks
Visual shape objects extraction
Scale, rotation, translation invariant features extraction
Matching by measure of similarity and retrieval of trademark images
Some trademark image samples
The log-polar transform maps (x, y) into (log(r), θ)
Log-polar transform of rotated and scaled squares: size goes to a shift on the log(r) axis and rotation to a shift on the θ-axis
Contour filter algorithm
Illustration of three stages of the proposed method
Samples of the collected trademark images for testing
Results for exact copy tests
Results for scaling tests
Results for translation and scaling tests
Results for rotation tests
Results for mirror tests
Results for partial copy tests
Results for random tests
ABBREVIATION

DFT: Discrete Fourier Transform
CBIR: Content-Based Image Retrieval
SIFT: Scale-Invariant Feature Transform
Abstract

Trademark registration offices and authorities have been bombarded with requests from enterprises. These authorities face a great deal of difficulty in protecting enterprises' rights, such as copyright, license, or the uniqueness of a logo or trademark, since they rely only on conventional clustering. The urgent and essential need for a sufficient automatic trademark image retrieval system is therefore entirely worth thorough research. In this thesis, a novel trademark image retrieval method is proposed in which the input trademark image is first separated into dominant visual shape images; then a feature vector, which is scale-, rotation-, and translation-invariant, is created for each shape image. Finally, a similarity measure between two trademarks is calculated based on these feature vectors. Given a query trademark image, the retrieval procedure is carried out by taking the five most similar trademark images in a predefined trademark database. Various experiments are conducted to mimic the many types of trademark copying actions, and the experimental results exhibit the robustness of our retrieval method under these trademark copy actions.
Chapter 1: Introduction
From an economic perspective, a trademark is clearly understood as a word, a design, a picture, a complex symbol, or even a combination of such, which is put on a product or stands for a service of a particular company. In [2], four types of popular trademarks are listed in order of visual complexity: word-in-mark (only characters or words in the mark), device-mark (graphical or figurative elements), composite-mark (characters or words and graphical elements) and complex-mark (complex image). Fig. 1 offers some trademark samples.

Fig. 1. Some trademark image samples
Every company or financial organization desires to own a distinctive, meaningful, and descriptive logo which offers both exclusivity and the right to its characteristics. Drawing the attention of consumers to their products or services, and market viability, actually depend not only on designing an intelligent and attractive trademark, but also on preventing consumer confusion.

The world markets have remarkably expanded and grown as the global economic scenario has brought different trade-related practices closer together at the international level. A great number of businesses have been established. As a result, millions of trademarks submitted to trademark offices the world over for registration need to be distinct from existing trademarks, as per the definitions and trade practices of different countries, and this is likely to increase in the years to come. Indeed, the millions of trademarks already registered and the millions of applications filed for trademark registration are aggravating the problem of issuing trademark certificates. The trademark registration authorities have received many trademark protection applications from enterprises, and finding similar trademarks has become a challenge because these authorities still use the traditional, manual way of classification. It is obvious that trademark registration with manual searching is a very arduous task for the officials. It is really hard for them to make sure whether a trademark is duplicated, whether a particular trademark is registered or not, whether a trademark resembles another registered trademark in any way, or whether the copyright or license of a trademark is infringed. Thus, this poses an urgent need for an alternative automatic technology.
In [33], there are different techniques and approaches currently in use for the distinctness check of trademarks. The most popular and appreciated image processing techniques for the trademark distinctness check are content-based image retrieval techniques, which are widely used for that purpose; some other approaches, such as shape- and texture-based similarity finding techniques, are also used. Image processing tools and techniques can be used to solve different problems related to images, text, graphics, color, etc. A trademark can be a combination of text, graphics, image, and colored texture. Based on these, one can divide trademarks into these components for finding the similarity among different trademarks retrieved from the trademark database. Most of the recent techniques used for image retrieval have mainly utilized features like color, texture, and shape. They used existing CBIR (Content-Based Image Retrieval) systems to retrieve images based on visual features like texture, color, and shape. In this technique, extraction of the color feature using the color histogram technique is utilized. The shape feature is also considered because it is an important feature in CBIR applications. Many techniques or approaches have been utilized for image retrieval, some of which are based on improved pattern matching algorithms. Some others take a much broader approach, such as searching just from text files. Some are based on shape and color features, and some have attempted morphological-pattern-based image matching and retrieval using a database. A shape-based technique introduced for logo retrieval reported in a paper is also inadequate to solve the problem amicably.
In this thesis, a novel trademark image retrieval method is proposed in which the input trademark image is first separated into dominant visual shape images; then a feature vector, which is scale-, rotation-, and translation-invariant, is created for each shape image. Finally, a similarity measure between two trademarks is calculated based on these feature vectors. The manuscript entitled "Trademark Image Retrieval Based on Scale, Rotation, Translation, Invariant Features", related to the topic of this thesis, was published in Computing and Communication Technologies, Research, Innovation, and Vision for the Future (RIVF), 2013 IEEE RIVF International Conference on, 10-13 Nov. 2013.
My thesis is organized as follows: Chapter 1 is the introduction. Chapter 2 presents some related works. Chapter 3 illustrates background on some related problems. Chapter 4 presents the proposed method in detail. Chapter 5 describes the installation of Visual Studio 2010 with OpenCV 2.4.2 on Windows 7 for implementing the thesis, and presents the experimental results. Chapter 6 concludes the thesis. Additionally, in the Appendix, the whole of the source code of the thesis is given for reading convenience.
Chapter 2: Related work
In recent years, researchers have proposed a wide range of solutions in a bid to alleviate the workload of the trademark registration offices. Chen, Sun, and Yang [1] suggested two main steps for computing feature vectors. Initially, the object region, extracted from a principal-orientation-rotated image, is equally partitioned into 16 regions. Then an entropy vector is constructed as the feature vector of an image by computing the information entropy of each partitioned region. This automatic shape-based retrieval technique achieves the desired performance: good invariance under rotation, translation, scale, noise, and degree of thickness, as well as human visual perception satisfaction. However, this single-feature retrieval system does not seem to meet multiple aspects of appreciation. To improve this, among others, the single-feature Zernike Moments (ZM) in [4, 10] and invariant moments in [3, 5, 6] are each combined with other features. Experimental results presented in [4] showed that this method has steady performance and good invariance properties under translation, rotation and scale. Moreover, the low noise sensitivity of Zernike moments made this method more insensitive to noise. However, because different users have different understandings of image similarity, the present methods of trademark image retrieval have some shortcomings in aspects such as the retrieval ability for geometrically deformed images, retrieval accuracy, and the consistency between the image and human visual perception. Yet the retrieval using the Zernike moment method in [10] shows it can rapidly retrieve trademarks. A new method is proposed in [3] based on cosine distance and normalized distance measures. The cosine distance metric normalizes all feature vectors to unit length and makes them invariant against relative in-plane scaling transformations of the image content. The normalized distance combines two distance measures, the cosine distance and the Euclidean distance, and shows more accuracy than either method alone. The proposed measures take into account the integration of global features (invariant moments and eccentricity) and local features (entropy histogram and distance histogram). It first indexes the trademark image database (DB) in order to search for trademarks within a narrowed limit and reduce computation time, and then calculates similarities between feature vectors to obtain the total similarity. An alternative solution worth mentioning is that four shape features, global features (invariant moments and eccentricity) and local features (entropy histogram and distance histogram) [16], are exploited by [3].
Recently, [5] combined nine feature vectors of low-order color moments in HSV color space with low-order Hu moments and eccentricity, which are extracted from a gray shape-region image by Rui and Huang's (1998) technique. Gauss normalization is applied to these features, and the weight of every feature can be adjusted flexibly [17]. Good results have been obtained in the experiments, which prove that the multi-feature combination way is better than single-feature ways. [6] employed 10 invariant moments, improved by [20], as shape features of trademark images. These features are input to an ensemble of RBFNNs trained via a minimization of the localized generalization error to recognize the trademark images. In this current system, the trademark images are black-and-white images. The system will be improved to adopt color trademark images in further study.
In [2, 7], the proposed combinations of features are definitely different. It is admitted that each of them performs well. Equidistant partitioning based on concentric circles [14] is used to partition the region, labelled as the first step in [4] and [2]. [4] and [2] differ in the implementation of the second step: [4] calculated its feature vector F composed of the corresponding region ZM, while [2] combined region feature vectors of 200 values with contour features, which are considered the corner-to-centroid triangulations detected by Hong & Jiang's improved SUSAN algorithm [15]. Iwanaga et al. [7] put forward the modified angle-distance pair-wise histogram based on the angle histogram and distance histogram of the trademark object. This system outperforms both moment-based and independent histograms, i.e. angle, distance, and color. Experiments were conducted on registered trademark databases. Impressive results were shown to demonstrate the robustness of the proposed approach. Moreover, it is quite simple to construct the distance-angle pair-wise histogram for a trademark object.
As the state-of-the-art method, [10] integrated ZM with SIFT features. In this approach, the Zernike moments of the retrieved image were first extracted and sorted according to similarity, forming candidate images. Then, the SIFT features were used to match the query image accurately with the candidate images. This method not only keeps the high precision-recall of SIFT features and is superior to the method based on the single Zernike moments feature, but also improves retrieval speed compared to the single SIFT features. This method can be well applied to a trademark image retrieval system; this newly proposed approach enhances the retrieval performance. Tuan N.G. et al. in [27] presented a new method based on the discriminative properties of trademark images for text recognition in trademark images. The experimental results show a significant gain in text recognition accuracy of the proposed method in comparison with traditional text recognition methods. This contribution seems to deal with one part of the recognition of trademark images.
However, those approaches seem to ignore not only partial trademark comparison, but also mirrored trademarks. Furthermore, the approaches have concentrated only on either the original trademark without removing its noise elements, or on a standard database which contains no noise. Additionally, these approaches have taken the trademark images as complete objects to process and are not concerned with the detailed visual shapes in the trademark images; therefore, they cannot detect partial similarity between trademark images. Nonetheless, calculating the distance between two features also plays an extremely important part in measuring the similarity degrees among images. For this reason, each of the mentioned solutions endeavours to propose an appropriate measure to some extent.

To overcome the above-mentioned drawbacks, a novel content-based trademark recognition method is proposed with these four main stages: (i) pre-process, scaling down the trademark images and converting them into binary images; (ii) extract dominant shape objects from the binary images; (iii) apply the RI3RC algorithm to extract rotation-invariant, scale-invariant, and translation-invariant features from the shape objects; and (iv) use the Euclidean distance to measure the similarity of two images and then retrieve the 10 trademark images which are most similar to the query trademark image. The thesis focuses on handling a Vietnamese composite-mark database.
Chapter 3: Background
3.1 Pre-processing
Converting gray scale to binary image

In [31], segmentation involves separating an image into regions (or their contours) corresponding to objects. We usually try to segment regions by identifying common properties; or, similarly, we identify contours by identifying differences between regions. The simplest property that pixels in a region can share is intensity. So, a natural way to segment such regions is through thresholding: the separation of light and dark regions. Thresholding creates binary images from grey-level ones by turning all pixels below some threshold to zero and all pixels above that threshold to one. If g(x, y) is a thresholded version of f(x, y) at some global threshold T,

g(x, y) = 1 if f(x, y) > T, and 0 otherwise.
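As a concrete illustration, the global thresholding rule above takes only a few lines of Python. This is an illustrative sketch (the thesis implementation uses OpenCV in C++; the function name here is hypothetical):

```python
def threshold_image(f, T):
    """Global thresholding: g(x, y) = 1 if f(x, y) > T, else 0.

    `f` is a 2-D list of grey-level values, `T` a global threshold.
    Illustrative sketch only -- real code would use cv2.threshold.
    """
    return [[1 if pixel > T else 0 for pixel in row] for row in f]
```

A pixel brighter than T maps to foreground (1); everything else becomes background (0).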
Problems with thresholding

The major problem with thresholding is that we consider only the intensity, not any relationships between the pixels. There is no guarantee that the pixels identified by the thresholding process are contiguous. We can easily include extraneous pixels that aren't part of the desired region, and we can just as easily miss isolated pixels within the region that do not have the normal intensity of the region. When we use thresholding, we typically have to play with the threshold, sometimes losing too much of the region and sometimes getting too many extraneous background pixels. (Shadows of objects in the image are also a real pain: not just where they fall across another object, but where they mistakenly get included as part of a dark object on a light background.)
We can deal, at least in part, with such uneven illumination by determining thresholds locally. That is, instead of having a single global threshold, we allow the threshold itself to smoothly vary across the image.

Automated methods for finding thresholds

To set a global threshold, or to adapt a local threshold to an area, we usually look at the histogram to see if we can find two or more distinct modes, one for the foreground and one for the background. Recall that a histogram is a probability distribution. If we know the fraction 1/p of the image that the object occupies, we can simply set the threshold T such that c(T) = 1/p, where c is the cumulative histogram. (Or, if we're looking for a dark object on a light background, c(T) = 1 − 1/p.)
Finding peaks and valleys

One extremely simple way to find a suitable threshold is to find each of the modes (local maxima) and then find the valley (minimum) between them. While this method appears simple, there are two main problems with it. First, the histogram may be noisy, causing many spurious local minima and maxima; to get around this, the histogram is usually smoothed before trying to find separate modes. Second, the sum of two separate distributions, each with its own mode, may not produce a distribution with two distinct modes.
Clustering (K-Means Variation)

Another way to look at the problem is that we have two groups of pixels, one with one range of values and one with another. What makes thresholding difficult is that these ranges usually overlap. What we want to do is to minimize the error of classifying a background pixel as a foreground one or vice versa. To do this, we try to minimize the area under the histogram for one region that lies on the other region's side of the threshold. The problem is that we don't have the histograms for each region, only the histogram for the combined regions. Understand that the place of minimum overlap (the place where the misclassified areas of the distributions are equal) is not necessarily where the valley occurs in the combined histogram. This occurs, for example, when one cluster has a wide distribution and the other a narrow one. One way that we can try to do this is to consider the values in the two regions as two clusters. In other words, let μB(T) be the mean of all pixels less than the threshold and μO(T) be the mean of all pixels greater than the threshold. We want to find a threshold such that the following holds:

T = (μB(T) + μO(T)) / 2

This can be done iteratively: starting from an estimate of T, classify each pixel as background or object (according to how close its intensity is to μB(T) and μO(T) respectively); now update the estimates of μB(T) and μO(T) by actually calculating the means of the pixels on each side of the threshold. This process repeats until the algorithm converges. This method works well if the spreads of the distributions are approximately equal, but it does not handle well the case where the distributions have differing variances.
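The iterative scheme just described can be sketched in pure Python. This is an illustrative sketch with hypothetical names, not the thesis code: classify, recompute the two cluster means, and set the new threshold halfway between them until it stops moving.

```python
def kmeans_threshold(pixels, T0, max_iter=100):
    """Iterative (K-means-style) threshold selection: partition pixels
    at the current threshold, recompute the background/object means,
    and move the threshold to their midpoint until convergence."""
    T = T0
    for _ in range(max_iter):
        background = [p for p in pixels if p < T]
        foreground = [p for p in pixels if p >= T]
        if not background or not foreground:
            break  # degenerate split; keep the current threshold
        mu_b = sum(background) / len(background)
        mu_o = sum(foreground) / len(foreground)
        new_T = (mu_b + mu_o) / 2
        if abs(new_T - T) < 1e-9:
            break  # converged
        T = new_T
    return T
```

For a clearly bimodal set of intensities the threshold settles midway between the two cluster means after one or two iterations.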
Clustering (The Otsu Method)

Another way of accomplishing similar results is to set the threshold so as to try to make each cluster as tight as possible, thus minimizing their overlap. Obviously, we can't change the distributions, but we can adjust where we separate them (the threshold). As we adjust the threshold one way, we increase the spread of one and decrease the spread of the other. The goal then is to select the threshold that minimizes the combined spread. We can define the within-class variance as the weighted sum of the variances of each cluster:

σ²_Within(T) = n_B(T) σ²_B(T) + n_O(T) σ²_O(T)    (6)

where
σ²_B(T) is the variance of the pixels in the background (below threshold),
σ²_O(T) is the variance of the pixels in the foreground (above threshold),
and [0, N − 1] is the range of intensity levels.

Computing this within-class variance for each of the two classes for each possible threshold involves a lot of computation, but there's an easier way. If we subtract the within-class variance from the total variance of the combined distribution, we get something called the between-class variance:

σ²_Between(T) = σ² − σ²_Within(T)
             = n_B(T)[μ_B(T) − μ]² + n_O(T)[μ_O(T) − μ]²

where σ² is the combined variance and μ is the combined mean. Notice that the between-class variance is simply the weighted variance of the cluster means themselves around the overall mean. Substituting μ = n_B(T)μ_B(T) + n_O(T)μ_O(T) and simplifying, we get

σ²_Between(T) = n_B(T) n_O(T)[μ_B(T) − μ_O(T)]²

So, for each potential threshold T we:
- separate the pixels into two clusters according to the threshold,
- find the mean of each cluster,
- square the difference between the means,
- multiply by the number of pixels in one cluster times the number in the other.

This depends only on the difference between the means of the two clusters, thus avoiding having to calculate differences between individual intensities and the cluster means. The optimal threshold is the one that maximizes the between-class variance (or, conversely, minimizes the within-class variance).

This still sounds like a lot of work, since we have to do this for each possible threshold, but it turns out that the computations aren't independent as we change from one threshold to another. We can update n_B(T), n_O(T), and the respective cluster means μ_B(T) and μ_O(T) as pixels move from one cluster to the other as T increases. Using simple recurrence relations we can update the between-class variance as we successively test each threshold:

n_B(T + 1) = n_B(T) + n_T,    n_O(T + 1) = n_O(T) − n_T
μ_B(T + 1) = [μ_B(T) n_B(T) + n_T · T] / n_B(T + 1)
μ_O(T + 1) = [μ_O(T) n_O(T) − n_T · T] / n_O(T + 1)

where n_T is the number of pixels with intensity T. This method is called the Otsu method.
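Putting the pieces together, Otsu's criterion can be sketched as a single pass over the histogram that keeps running background counts and sums, maximizing n_B · n_O · (μ_B − μ_O)². An illustrative pure-Python sketch (the thesis uses OpenCV's built-in Otsu thresholding; the function name here is hypothetical):

```python
def otsu_threshold(pixels, levels=256):
    """Otsu's method: pick the threshold T maximizing the between-class
    variance n_B(T) * n_O(T) * (mu_B(T) - mu_O(T))^2, computed
    incrementally from the grey-level histogram."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total_sum = sum(g * hist[g] for g in range(levels))
    total = len(pixels)
    best_T, best_var = 0, -1.0
    n_b, sum_b = 0, 0
    for T in range(levels):
        n_b += hist[T]          # pixels with intensity <= T are background
        sum_b += T * hist[T]
        n_o = total - n_b
        if n_b == 0 or n_o == 0:
            continue            # one class empty: criterion undefined
        mu_b = sum_b / n_b
        mu_o = (total_sum - sum_b) / n_o
        var_between = n_b * n_o * (mu_b - mu_o) ** 2
        if var_between > best_var:
            best_var, best_T = var_between, T
    return best_T
```

On a bimodal sample the returned T separates the two intensity clusters; pixels with intensity above T are classified as foreground.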
Mixture modeling

Whereas the Otsu method separated the two clusters according to the threshold and tried to optimize some statistical measure, mixture modeling assumes that there already exist two distributions and we must find them. Once we know the parameters of the distributions, it's easy to determine the best threshold. Unfortunately, we have six unknown parameters (n_B, μ_B, σ_B, n_O, μ_O, and σ_O), so we need to make some estimates of these quantities. If the two distributions are reasonably well separated (some overlap but not too much), we can choose an arbitrary threshold T and assume that the mean and standard deviation of each group approximate the mean and standard deviation of the two underlying populations. We can then measure how well a mix of the two distributions approximates the overall distribution:

F = Σ_g [h_model(g) − h_image(g)]²

Choosing the optimal threshold thus becomes a matter of finding the one that causes the mixture of the two estimated Gaussian distributions to best approximate the actual histogram (minimizes F). Unfortunately, the solution space is too large to search exhaustively, so most methods use some form of gradient descent. Such gradient descent methods depend heavily on the accuracy of the initial estimate, but the Otsu method or similar clustering methods can usually provide reasonable initial estimates. Mixture modeling also extends to models with more than two underlying distributions (more than two types of regions).
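The fit error F above is straightforward to compute once the six parameters are fixed. A minimal sketch under the assumption of two Gaussian clusters, with hypothetical helper names (a real implementation would wrap this in a gradient-descent loop over the parameters):

```python
import math

def gaussian_hist(n, mu, sigma, levels=256):
    """Model histogram of one cluster: n pixels, Gaussian with mean mu
    and standard deviation sigma (illustrative parametrization)."""
    return [n * math.exp(-((g - mu) ** 2) / (2 * sigma ** 2))
            / (sigma * math.sqrt(2 * math.pi)) for g in range(levels)]

def mixture_fit_error(h_image, params):
    """F = sum_g [h_model(g) - h_image(g)]^2 for a two-Gaussian model
    with parameters (n_B, mu_B, sigma_B, n_O, mu_O, sigma_O)."""
    n_b, mu_b, s_b, n_o, mu_o, s_o = params
    levels = len(h_image)
    model = [b + o for b, o in zip(gaussian_hist(n_b, mu_b, s_b, levels),
                                   gaussian_hist(n_o, mu_o, s_o, levels))]
    return sum((m - h) ** 2 for m, h in zip(model, h_image))
```

The optimizer's job is then simply to drive this error toward zero; the true parameters give a strictly smaller F than perturbed ones.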
Multispectral thresholding

A technique for segmenting images with multiple components (color images, Landsat images, or MRI images with T1, T2, and proton-density bands) works by estimating the optimal threshold in one channel and then segmenting the overall image based on that threshold. Each of these regions is then subdivided independently using properties of the second channel. This is repeated again for the third channel, and so on, running through all channels repeatedly until each region in the image exhibits a distribution indicative of a coherent region (a single mode).
Thresholding along boundaries

If we want our thresholding method to stay fairly true to the boundaries of the object, we can first apply some boundary-finding method (such as edge detection techniques) and then sample the pixels only where the boundary probability is high. Thus, our threshold method based on pixels near boundaries will cause separations of the pixels in ways that tend to preserve the boundaries. Other scattered distributions within the object or the background are of no relevance. However, if the characteristics change along the boundary, we're still in trouble. And, of course, there's still no guarantee that we won't have extraneous pixels or holes.
3.2. Object description

In [33], objects are represented as a collection of pixels in an image. Thus, for purposes of recognition we need to describe the properties of groups of pixels. The description is often just a set of numbers: the object's descriptors. From these, we can compare and recognize objects by simply matching the descriptors of objects in an image against the descriptors of known objects. However, to be useful for recognition, descriptors should have four important properties. First, they should define a complete set; that is, two objects must have the same descriptors if and only if they have the same shape. Secondly, they should be congruent; as such, we should be able to recognize similar objects when they have similar descriptors. Thirdly, it is convenient that they have invariant properties; for example, rotation-invariant descriptors will be useful for recognizing objects whatever their orientation. Other important invariance properties include scale and position, and also invariance to affine and perspective changes; these last two are very important when recognizing objects observed from different viewpoints. In addition to these three properties, the descriptors should be a compact set; namely, a descriptor should represent the essence of an object in an efficient way. That is, it should only contain information about what makes an object unique, or different from the other objects. The quantity of information used to describe this characterization should be less than the information necessary to have a complete description of the object itself. Unfortunately, there is no set of complete and compact descriptors to characterize general objects. Thus, the best recognition performance is obtained by carefully selected properties. As such, the process of recognition is strongly related to each particular application, with a particular type of objects. Here, the characterization of objects is presented by two forms of descriptors. Region and shape descriptors characterize the arrangement of pixels within the area and the arrangement of pixels in the perimeter or boundary, respectively. This region-versus-perimeter kind of representation is common in image analysis. For example, edges can be located by region growing (to label area) or by differentiation (to label perimeter). There are many techniques that can be used to obtain descriptors of an object's boundary.
3.3 Feature vectors extraction
3.3.1 Discrete Fourier Transform (DFT)
In [34], the Fourier Transform decomposes an image into its sine and cosine components. In other words, it transforms an image from its spatial domain to its frequency domain. The idea is that any function may be approximated exactly with a sum of infinitely many sine and cosine functions. The Fourier Transform is a way to do this. Mathematically, the Fourier transform of a two-dimensional image is

F(k, l) = Σ_{i=0}^{N−1} Σ_{j=0}^{N−1} f(i, j) e^{−i2π(ki/N + lj/N)}

Here f is the image value in its spatial domain and F in its frequency domain. The result of the transformation is complex numbers. Displaying this is possible either via a real image and a complex image, or via a magnitude and a phase image. However, throughout the image processing algorithms only the magnitude image is interesting, as this contains all the information we need about the image's geometric structure. Nevertheless, if we intend to make some modifications of the image in these forms and then retransform it, we will need to preserve both.
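The 2-D DFT and its magnitude image can be sketched directly from the summation formula. This is a deliberately naive O(M²N²) illustration in pure Python; practical code (including the thesis implementation) would use an FFT routine such as OpenCV's dft:

```python
import cmath

def dft2(f):
    """Naive 2-D DFT: F(k, l) = sum_{i,j} f(i, j) e^{-i2pi(ki/M + lj/N)}.
    `f` is an M x N 2-D list; returns an M x N grid of complex values."""
    M, N = len(f), len(f[0])
    return [[sum(f[i][j] * cmath.exp(-2j * cmath.pi * (k * i / M + l * j / N))
                 for i in range(M) for j in range(N))
             for l in range(N)] for k in range(M)]

def magnitude(F):
    """Magnitude image |F(k, l)| -- the part that carries the geometric
    structure used by the retrieval features."""
    return [[abs(v) for v in row] for row in F]
```

For a constant image all energy lands in the DC term F(0, 0), as expected.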
3.3.2 Log-polar transform

In [32], for two-dimensional images, the log-polar transform [Schwartz80] is a change from Cartesian to polar coordinates: (x, y) ↔ re^{iθ}, where r = √(x² + y²) and exp(iθ) = exp(i·arctan(y/x)). To separate out the polar coordinates into a (ρ, θ) space that is relative to some center point (x_c, y_c), we take the log so that ρ = log(√((x − x_c)² + (y − y_c)²)) and θ = arctan((y − y_c)/(x − x_c)). For image purposes, when we need to "fit" the interesting stuff into the available image memory, we typically apply a scaling factor m to ρ. Fig. 2 shows a square object on the left and its encoding in log-polar space.
Fig. 2. The log-polar transform maps (x, y) into (log(r), θ)
The log-polar transform can be used to create two-dimensional invariant representations of object views by shifting the transformed image's center of mass to a fixed point in the log-polar plane; see Fig. 3. On the left are three shapes that we want to recognize as "square". The problem is, they look very different: one is much larger than the others, and another is rotated. The log-polar transform appears on the right in Fig. 3. Observe that size differences in the (x, y) plane are converted to shifts along the log(r) axis of the log-polar plane, and that rotation differences are converted to shifts along the θ-axis in the log-polar plane. If we take the transformed center of each transformed square in the log-polar plane and then recenter that point to a certain fixed position, then all the squares will show up identically in the log-polar plane. This yields a type of invariance to two-dimensional rotation and scaling.

Fig. 3. Log-polar transform of rotated and scaled squares: size goes to a shift on the log(r) axis and rotation to a shift on the θ-axis
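The coordinate mapping itself, and the shift behaviour it induces, can be checked with a tiny sketch (illustrative only; OpenCV provides this transform for whole images, and the function name here is hypothetical):

```python
import math

def log_polar(x, y, xc, yc, m=1.0):
    """Map (x, y) to log-polar (rho, theta) about centre (xc, yc):
    rho = m * log(r) with r the distance from the centre, and
    theta the angle. Scaling the input shifts rho by m*log(scale);
    rotating it shifts theta by the rotation angle."""
    dx, dy = x - xc, y - yc
    return m * math.log(math.hypot(dx, dy)), math.atan2(dy, dx)
```

Doubling a point's distance from the centre shifts ρ by exactly log(2) while leaving θ untouched, which is the invariance-by-shift property the retrieval features rely on.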
3.4 Measure of similarity

Given a query image, the distance between its feature vector and the feature vector of every image in the database is computed and compared to report a few of the most similar images. A similarity measurement must be selected to decide how close a vector is to another vector. The problem can be converted to computing the discrepancy between two vectors x, y ∈ R^k. Several distance measurements are presented as follows.

3.4.1 Euclidean distance

δ₁(x, y) = ‖x − y‖₂ = (Σ_{i=1}^{k} (x_i − y_i)²)^{1/2}

Another distance measurement, called the supremum norm, is computed by

δ₂(x, y) = max_{1≤i≤k} |x_i − y_i|
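Both of these distances take one line each in Python; an illustrative sketch with hypothetical names:

```python
def euclidean(x, y):
    """Euclidean distance ||x - y||_2 between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5

def sup_norm(x, y):
    """Supremum-norm distance: max_i |x_i - y_i|."""
    return max(abs(a - b) for a, b in zip(x, y))
```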
3.4.2 Mahalanobis distance

The Mahalanobis distance between two vectors x and y with respect to the training patterns {x_i} is computed by

δ²(x, y) = (x − y)ᵀ S⁻¹ (x − y),

where the mean vector μ and the sample covariance matrix S from the sample {x_i | 1 ≤ i ≤ n} of size n are computed by

μ = (1/n) Σ_{i=1}^{n} x_i,    S = (1/n) Σ_{i=1}^{n} (x_i − μ)(x_i − μ)ᵀ.
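A minimal sketch of these two formulas, assuming the inverse covariance S⁻¹ is precomputed and passed in (names are hypothetical; a real implementation would use a linear-algebra library for the inversion):

```python
def sample_covariance(samples):
    """S = (1/n) * sum_i (x_i - mu)(x_i - mu)^T over training patterns."""
    n, k = len(samples), len(samples[0])
    mu = [sum(s[j] for s in samples) / n for j in range(k)]
    return [[sum((s[i] - mu[i]) * (s[j] - mu[j]) for s in samples) / n
             for j in range(k)] for i in range(k)]

def mahalanobis_sq(x, y, S_inv):
    """Squared Mahalanobis distance (x - y)^T S^{-1} (x - y)."""
    d = [a - b for a, b in zip(x, y)]
    Sd = [sum(S_inv[i][j] * d[j] for j in range(len(d))) for i in range(len(d))]
    return sum(di * si for di, si in zip(d, Sd))
```

With S⁻¹ the identity, the squared Mahalanobis distance reduces to the squared Euclidean distance, which is a useful sanity check.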
3.43 Chord đistance
The chord distance between two vectors x and y measures the distance between the projections of x and y onto the unit sphere, which can be computed by
δ₃(x, y) = ‖x̄ − ȳ‖₂, where x̄ = x/‖x‖₂, ȳ = y/‖y‖₂.
A simple computation leads to δ₃(x, y) = 2 sin(α/2), with α being the angle between the vectors x and y.
A similar measurement based on the angle between vectors x and y is defined as
δ₄(x, y) = 1 − |cos(α)|, cos(α) = ⟨x, y⟩ / (‖x‖₂ ‖y‖₂).
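To make these measurements concrete, here is a brief numpy sketch (the function names are illustrative, not from the thesis); it also checks the stated identity δ₃(x, y) = 2 sin(α/2):

```python
import numpy as np

def d1(x, y):
    """Manhattan distance, eq. (21)."""
    return float(np.abs(x - y).sum())

def d_sup(x, y):
    """Supremum norm, eq. (22)."""
    return float(np.abs(x - y).max())

def chord(x, y):
    """Chord distance: Euclidean distance between unit-sphere projections."""
    return float(np.linalg.norm(x / np.linalg.norm(x) - y / np.linalg.norm(y)))

x = np.array([3.0, 0.0])
y = np.array([3.0, 3.0])          # 45 degrees away from x
alpha = np.arccos(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))
# chord(x, y) coincides with 2*sin(alpha/2) regardless of vector lengths
```

Note that the chord distance depends only on the angle between the vectors, which makes it insensitive to the overall magnitude of the feature vectors.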
Chapter 4: Proposed method
4.1 Pre-processing
In this initial stage, images are scaled down so that their smaller side is 300 pixels and converted to gray scale. The images are then converted into binary trademark images using Otsu's method [11], which minimizes the weighted within-class variance, or equivalently maximizes the between-class variance. Otsu's algorithm is one of the five automated thresholding methods, alongside finding peaks and valleys, clustering (a K-means variation), mixture modeling, and multispectral thresholding.
4.2 Visual shape objects extraction
In this stage, shape objects in the form of connected contours are retrieved from the binary image using Suzuki's algorithm [12]. Each detected contour is stored as a vector of points. All of the vectors are organized into a hierarchical structure, which contains information about the image topology. The number of contours depends on the texture of the image.
There are four options proposed in [12] for contour retrieval: (i) retrieve only the extreme outer contours; (ii) retrieve all of the contours without establishing any hierarchical relationships; (iii) retrieve all of the contours organized into a two-level hierarchy; and (iv) retrieve all of the contours and reconstruct a full hierarchy of nested contours.
In the present research, we adopted the second option for shape object extraction. From a binary trademark image, we extracted a number of shape object images. However, due to noise in the input trademark image, many noise contours were extracted as shape objects. To prevent this problem, we applied a filter to remove the noise contours. Observations showed that the dominant shape contours usually have a much larger area than the noise contours. Furthermore, due to the characteristics of the trademarks in our database, most trademarks consist of one or two dominant shape objects which play a primary role in a company's reputation. For this reason, we propose an algorithm to extract up to two dominant shape objects out of a binary image. The algorithm comprises four main steps and one function named FilterContours (see Fig. 4), responsible for taking out the two dominant shape objects. The FilterContours operation is based on two thresholds, T1 = 3.82 and T2 = 81707, which were determined through experiments on each trademark image in our database.
1. Compute each contour area using the Green formula presented in [29].
2. Sort the extracted contours in descending order of contour area.
3. Remove noise shape contours in the trademark image, keeping only the 2nd and 3rd biggest-area contours.
4. Remove one of the kept contours using the FilterContours function:
4.1. if (the area of the 2nd contour is T1 times bigger than that of the 3rd contour, and the area of the 2nd is less than T2) then the 3rd contour is removed and the 2nd contour is retained;
4.2. else if (the area of the 2nd is greater than T2) then the 2nd contour is deleted and the 3rd contour is kept;
4.3. else both contours are maintained.
return (One or Two Contours)
Fig. 4. Contour filter algorithm
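The decision rule of steps 4.1-4.3 can be transcribed directly. The sketch below operates on contour areas only and uses the threshold values quoted above; the function name is ours, not from the thesis:

```python
# Thresholds quoted in the text, discovered experimentally on the database.
T1, T2 = 3.82, 81707.0

def filter_contours(area2, area3):
    """Apply steps 4.1-4.3 of Fig. 4 to the 2nd and 3rd biggest contour areas.

    Returns the list of areas of the contours that are kept.
    """
    if area2 > T1 * area3 and area2 < T2:  # 4.1: 2nd dominates and is plausible
        return [area2]
    elif area2 > T2:                       # 4.2: 2nd is implausibly large
        return [area3]
    else:                                  # 4.3: keep both dominant contours
        return [area2, area3]
```

In the full pipeline the areas would come from the contours returned by Suzuki's algorithm; here they are passed in as plain numbers to isolate the filtering logic.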
4.3 Scale, rotation, translation invariant features
For each extracted shape object, a corresponding feature vector is created. In order to generalize acts of trademark copying such as duplication, rotation, and resizing, the extracted feature vector of the shape object must be invariant to rotation, scale, and translation. In this paper, we use the RBRC algorithm in [13], which is composed of three steps: a two-dimensional discrete Fourier transform (DFT), the representation of the Fourier magnitude in log-polar coordinates, and a second DFT. The RBRC algorithm makes our method scale-invariant, rotation-invariant, and translation-invariant. This is in line with the Fourier-Mellin approach and the polar Fourier representation [19, 26]; the approaches suggested in [19, 26] combine the phase correlation technique with the polar representation to address the problem of both translated and rotated objects. However, a major difference between the RBRC algorithm and Fourier-Mellin is that a DFT is used in the last step of the former, while phase correlation is applied in the latter. To explain why the RBRC algorithm is invariant to rotation, translation, and scale, the theory of the log-polar transform (LPT) related to the magnitude of the DFT is presented in [19, 26]. We present the log-polar transform (LPT) and the representation of the DFT magnitude in log-polar coordinates below.
The LPT maps a sampling pixel (x, y) in the Cartesian coordinates to a point (ρ, θ) in the log-polar coordinates:
ρ = log r, r = √((x − x_c)² + (y − y_c)²), θ = tan⁻¹((y − y_c)/(x − x_c)),
where (x_c, y_c) is the center pixel of the transformation in the Cartesian coordinates, and ρ and θ denote the log-radius and the angular position in the log-polar coordinates. Given g(x', y'), a scaled and rotated image of f(x, y) with scale factor a and rotation angle θ₀, we have:
ρ' = log(a·r) = log a + ρ, (5)
θ' = θ + θ₀. (7)
Expressions (5) and (7) prove that scaling and rotation in the Cartesian domain correspond to pure translation in the log-polar domain. However, when the original image is translated by (x₀, y₀), the corresponding log-polar coordinates are represented by
θ' = tan⁻¹((y − y₀ − y_c)/(x − x₀ − x_c)), ρ' = log √((x − x₀ − x_c)² + (y − y₀ − y_c)²). (10)
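Expressions (5) and (7) can be checked numerically for a single point about the origin; the helper name `to_log_polar` is ours, introduced only for this sketch:

```python
import numpy as np

def to_log_polar(x, y, xc=0.0, yc=0.0):
    """Map a Cartesian point to (rho, theta) log-polar coordinates."""
    dx, dy = x - xc, y - yc
    return np.log(np.hypot(dx, dy)), np.arctan2(dy, dx)

a, theta0 = 2.0, np.pi / 6          # scale factor and rotation angle
x, y = 3.0, 1.0
rho, theta = to_log_polar(x, y)

# Scale and rotate the point about the origin.
xr = a * (x * np.cos(theta0) - y * np.sin(theta0))
yr = a * (x * np.sin(theta0) + y * np.cos(theta0))
rho2, theta2 = to_log_polar(xr, yr)
# rho2 == rho + log(a) and theta2 == theta + theta0: pure shifts
```

The point is chosen so that the rotated angle stays within (−π, π]; in a full image-domain implementation the angular axis wraps around instead of shifting out of range.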
These equations, as in [22], indicate that even a slight translation produces a modified log-polar image. To overcome this limitation, the algorithm first applies the Fourier transform and then applies the log-polar transform to the magnitude spectrum. The magnitude of the Fourier transform of an image and that of its translated counterpart are the same, i.e., invariant to translation, but retain the effects of rotation and scaling [19, 21, 22, 23, 24, 25, 26]. These observations are explained more clearly by the representation of the DFT magnitude in log-polar coordinates.
According to [19, 21, 23, 24, 25, 26], and based on the rotation and translation properties of the Fourier transform, the power spectrum of a rotated and translated image is rotated by the same orientation while its magnitude remains the same. Let F₁(ξ, η) and F₂(ξ, η) be the Fourier transforms of images f₁(x, y) and f₂(x, y), respectively. We are interested in the three cases below:
The first case: if f₂ differs from f₁ only by a displacement (x₀, y₀), then
f₂(x, y) = f₁(x − x₀, y − y₀), (32)
or, in the frequency domain,
F₂(ξ, η) = e^{−j2π(ξx₀ + ηy₀)} F₁(ξ, η). (33)
The second case: if f₂(x, y) is a translated and rotated replica of f₁(x, y) with translation (x₀, y₀) and rotation θ₀, then
f₂(x, y) = f₁(x cos θ₀ + y sin θ₀ − x₀, −x sin θ₀ + y cos θ₀ − y₀). (34)
The DFTs of f₁ and f₂ are related as shown below:
F₂(ξ, η) = e^{−j2π(ξx₀ + ηy₀)} F₁(ξ cos θ₀ + η sin θ₀, −ξ sin θ₀ + η cos θ₀). (35)
Suppose that M₁ and M₂ are the magnitudes of F₁ and F₂, respectively. They are related as shown:
M₂(ξ, η) = M₁(ξ cos θ₀ + η sin θ₀, −ξ sin θ₀ + η cos θ₀), (36)
M₂(ρ, θ) = M₁(ρ, θ − θ₀), (37)
where ρ and θ are the radius and angle in the polar coordinate system, respectively.
The last case: if f₂ is a translated, rotated, and scaled version of f₁ with scale factor s, the Fourier magnitude spectra are transformed to log-polar representations and related by:
M₂(log ρ, θ) = M₁(log ρ − log s, θ − θ₀). (39)
It is obvious that scale, rotation, and translation of an image are represented as translations in the log-polar representation of the frequency domain. This remaining translation is eliminated by applying the DFT one more time.
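The translation-invariance of the DFT magnitude (the first case above) can be verified numerically with numpy. The sketch uses a circular shift, for which the DFT shift property of equation (33) holds exactly:

```python
import numpy as np

rng = np.random.default_rng(42)
f1 = rng.random((32, 32))                      # an arbitrary "image"
f2 = np.roll(f1, shift=(5, 9), axis=(0, 1))    # circularly translated copy

M1 = np.abs(np.fft.fft2(f1))
M2 = np.abs(np.fft.fft2(f2))
# The magnitude spectra agree: the translation only changes the phase.
```

For a non-circular translation of a real image the equality is only approximate because of boundary effects, which is why windowing is often applied in Fourier-Mellin style registration.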
An act of copying a trademark can be done in several ways:
• One trademark is scaled, rotated, or translated from the other.
• One trademark is combined from a part of the other.
• One trademark is mirrored from the other.
In order to recognize a copied trademark image, we derive a trademark similarity measure based on feature vectors. After the stage of creating feature vectors, a trademark input image is represented by one or two feature vectors. Let I and I' be two trademark images; we suppose that Fᵢ and F'ⱼ, where i = 1, 2 and j = 1, 2, are the feature vectors of I and I', respectively.
We propose that the degree of similarity of two trademarks I and I', denoted S(I, I'), is the smallest distance between two feature vectors, one in the set {Fᵢ} and one in the set {F'ⱼ}, denoted dist(Fᵢ, F'ⱼ). We employ the Euclidean distance to compute the distance between two feature vectors, which can be expressed as follows:
S(I, I') = min dist(Fᵢ, F'ⱼ), where i, j = 1, 2. (6)
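Equation (6) amounts to a minimum over at most four pairwise distances. A minimal sketch, with made-up feature vectors standing in for the RBRC features:

```python
import numpy as np

def similarity(feats_i, feats_j):
    """S(I, I'): smallest Euclidean distance over all feature-vector pairs (eq. 6)."""
    return min(
        float(np.linalg.norm(fa - fb))
        for fa in feats_i
        for fb in feats_j
    )

# A trademark is represented by one or two feature vectors (illustrative values).
feats_I  = [np.array([0.0, 0.0]), np.array([4.0, 0.0])]
feats_Ip = [np.array([4.0, 3.0])]
s = similarity(feats_I, feats_Ip)   # the closer of the two pairs wins
```

Taking the minimum over pairs means a query matches a database trademark as soon as any one of its dominant shape objects is close to any shape object of the other.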
Chapter 5: Experiments and results
The trademark image database in this thesis was taken from NOIP [29]. In [27], it was indicated that about 30% of trademark images contain text only, 70% contain text together with other objects such as pictures or symbols, 80% of consecutive characters in a word have the same color, and 90% of trademark images have fewer than 5 colors. We collected 243 composite-mark trademark images from the trademark image database to construct our testing database. Fig. 6 shows some samples from the collected testing trademark database.
Fig. 6. Samples of the collected trademark images for testing
To evaluate the accuracy of our method, we used S(I, I') to retrieve the top 5 trademark images most similar to an input query image. In addition, we modified the input image to mimic acts of copying a trademark such as translation, rotation, scaling, mirroring, and partial copy. The following are illustrations for each case.
5.1 Implementation
OpenCV [34] is an open-source BSD-licensed library that includes several hundred computer vision algorithms. OpenCV has a modular structure, which means that the package includes several shared or static libraries. The following modules are available:
• core - a compact module defining basic data structures, including the dense multi-dimensional array Mat, and basic functions used by all other modules.
• imgproc - an image processing module that includes linear and non-linear image filtering, geometrical image transformations (resize, affine and perspective warping, generic table-based remapping), color space conversion, histograms, and so on.
• video - a video analysis module that includes motion estimation, background subtraction, and object tracking algorithms.
• calib3d - basic multiple-view geometry algorithms, single and stereo camera calibration, object pose estimation, stereo correspondence algorithms, and elements of 3D reconstruction.
• features2d - salient feature detectors, descriptors, and descriptor matchers.
• objdetect - detection of objects and instances of predefined classes (for example, faces, eyes, mugs, people, cars, and so on).
• highgui - an easy-to-use interface to video capturing, image and video codecs, as well as simple UI capabilities.
• gpu - GPU-accelerated algorithms from different OpenCV modules.
The following illustrates the installation procedure of the OpenCV library and how it can be integrated with Microsoft Visual Studio 2010:
• Download OpenCV 2.4.2 from www.sourceforge.net/projects/opencvlibrary/.
• Install to the folder C:\OpenCV2.4.2.
• This folder (C:\OpenCV2.4.2) now contains the source files for OpenCV. These source files need to be built for a specific development environment, which in this case is Microsoft Visual Studio 2010. The CMake utility can be used to generate build files for Microsoft Visual Studio 2010.
• Download and install the CMake utility from www.cmake.org/files/v2.8/cmake-2.8.2-win32-x86.exe.
• Open CMake and select the source directory for the OpenCV source files (i.e., C:\OpenCV2.4.2). Select the build directory, for instance C:\OpenCV2.4.2\Build.
• Once the source and build directories are selected, press the Configure button, specify the generator Microsoft Visual Studio 10, and hit Finish.