UNIVERSITY OF ENGINEERING AND TECHNOLOGY
VIETNAM NATIONAL UNIVERSITY, HANOI
NGUYEN TIEN DUNG
TRADEMARK IMAGE RETRIEVAL BASED ON
SCALE, ROTATION, TRANSLATION,
INVARIANT FEATURES
MASTER THESIS: INFORMATION TECHNOLOGY
Hanoi - 2014
UNIVERSITY OF ENGINEERING AND TECHNOLOGY
VIETNAM NATIONAL UNIVERSITY, HANOI
NGUYEN TIEN DUNG
TRADEMARK IMAGE RETRIEVAL BASED
ON SCALE, ROTATION, TRANSLATION,
INVARIANT FEATURES
Major : Computer Science
Code : 60480101
MASTER THESIS: INFORMATION TECHNOLOGY
Supervised by: Dr Le Thanh Ha
Hanoi - 2014
Originality Statement
‘I hereby declare that this submission is my own work and, to the best of my knowledge, it contains no materials previously published or written by another person, nor substantial proportions of material which have been accepted for the award of any other degree or diploma at the University of Engineering and Technology (UET) or any other educational institution, except where due acknowledgement is made in the thesis. I also declare that the intellectual content of this thesis is the product of my own work, except to the extent that assistance from others in the project’s design and conception or in style, presentation and linguistic expression is acknowledged.’
Signed
TABLE OF CONTENTS
Originality Statement
ABBREVIATION
Abstract
Chapter 1: Introduction
Chapter 2: Related work
Chapter 3: Background
3.1 Pre-processing
3.2 Object description
3.3 Feature vectors extraction
3.3.1 Discrete Fourier Transform (DFT)
3.3.2 Log-polar transform
3.4 Measure of similarity
3.4.1 Euclidean distance
3.4.2 Mahalanobis distance
3.4.3 Chord distance
Chapter 4: Proposed method
4.1 Pre-processing
4.2 Visual shape objects extraction
4.3 Scale, rotation, translation invariant features
4.4 Measure of similarity
Chapter 5: Experiments and results
5.1 Implementation
5.2 Test results for exact copy actions
5.3 Test results for scaling actions
5.4 Test results for rotating actions
5.5 Test results for mirror actions
5.6 Test results for partial copy actions
5.7 Test results for random query trademark
5.8 Testing summary
Chapter 6: Conclusion
REFERENCES
APPENDIX
Pre-processing
Visual shape objects extraction
Scale, rotation, translation invariant features extraction
Matching by measure of similarity and retrieval of trademark images
List of Figures
Fig 1 Some trademark image samples
Fig 2 The log-polar transform maps (x, y) into (log(r), θ)
Fig 3 Log-polar transform of rotated and scaled squares: size goes to a shift on the log(r) axis and rotation to a shift on the θ-axis
Fig 4 Contour filter algorithm
Fig 5 Illustration of three stages of the proposed method
Fig 6 Samples of the collected trademark images for testing
Fig 7 Results for exact copy tests
Fig 8 Results for scaling tests
Fig 9 Results for translation and scaling tests
Fig 10 Results for rotation tests
Fig 11 Results for mirror tests
Fig 12 Results for partial copy tests
Fig 13 Results for random tests
ABBREVIATION
DFT: Discrete Fourier Transform
CBIR: Content Based Image Retrieval
SIFT: Scale-invariant feature transform
Abstract
Trademark registration offices and authorities have been bombarded with requests from enterprises. These authorities face a great deal of difficulty in protecting enterprises’ rights, such as copyright, license, or the uniqueness of a logo or trademark, since they rely only on conventional clustering. The urgent and essential need for a sufficient automatic trademark image retrieval system is therefore entirely worth thorough research. In this thesis, a novel trademark image retrieval method is proposed in which the input trademark image is first separated into dominant visual shape images; then a feature vector, which is scale-, rotation-, and translation-invariant, is created for each shape image. Finally, a similarity measure between two trademarks is calculated based on these feature vectors. Given a query trademark image, the retrieval procedure is carried out by taking the five most similar trademark images from a predefined trademark database. Various experiments are conducted to mimic the many types of trademark copying actions, and the experimental results exhibit the robustness of our retrieval method under these copying actions.
Chapter 1: Introduction
From an economic perspective, a trademark is understood as a word, a design, a picture, a complex symbol, or even a combination of these, which is put on a product or stands for a service of a particular company. In [2], four types of popular trademarks are listed in order of visual complexity: word-in-mark (only characters or words in the mark), device-mark (graphical or figurative elements), composite-mark (characters or words together with graphical elements), and complex-mark (complex images). Fig 1 offers some trademark samples.
Fig 1 Some trademark image samples
Every company or financial organization desires to own a distinctive, meaningful, and descriptive logo which offers both exclusivity and the rights to its characteristics. Drawing the attention of consumers to their products or services, and their market viability, actually depend not only on designing an intellectual and attractive trademark, but also on preventing consumer confusion.
The world markets have remarkably expanded and grown in a global economic scenario in which different trade-related practices come closer to each other at the international level, and a great number of businesses have been established. This has resulted in millions of trademarks being submitted to trademark offices the world over; registrations need to be distinct from existing trademarks as per the definitions and trade practices of different countries, and this is likely to increase in the years to come. Actually, the millions of trademarks already registered and the millions of applications filed for trademark registration are aggravating the problem of issuing trademark certificates. The trademark registration authorities have received many trademark protection applications from enterprises, and finding similar trademarks has become a challenge because these authorities still use the traditional, manual way of classification. It is obvious that trademark registration with manual searching is a very arduous task for the officials. It is really hard for them to make sure whether a trademark is
duplicated: whether a particular trademark is registered or not; whether a trademark resembles another registered trademark in any way; or whether the copyright or license of a trademark is infringed. Thus, this poses an urgent need for an alternative automatic technology.
In [33], different techniques and approaches currently in use for checking the distinctness of trademarks are surveyed. The most popular and appreciated image processing techniques for this purpose are Content Based Image Retrieval (CBIR) techniques; some other approaches, such as shape- and texture-based similarity finding techniques, are also used. Image processing tools and techniques can be used to solve different problems related to images, text, graphics, color, etc. A trademark can be a combination of text, graphics, image, and colored texture; based on this, one can divide trademarks into these components to find the similarity among different trademarks retrieved from a trademark database. Most of the recent techniques for image retrieval have mainly utilized features like color, texture, and shape. They use existing CBIR systems to retrieve images based on visual features such as texture, color, and shape; in such systems, color features are typically extracted using the color histogram technique, and shape features are also considered because shape is an important feature in CBIR applications. Many techniques or approaches have been utilized for image retrieval, some of which are based on improved pattern matching algorithms. Some others take a much broader approach, such as searching just from text files. Some are based on shape and color features, and some have attempted morphological pattern-based image matching and retrieval using a database. A shape-based technique introduced for logo retrieval in one reported paper is also inadequate to solve the problem amicably.
In this thesis, a novel trademark image retrieval method is proposed in which the input trademark image is first separated into dominant visual shape images; then a feature vector, which is scale-, rotation-, and translation-invariant, is created for each shape image. Finally, a similarity measure between two trademarks is calculated based on these feature vectors. The manuscript entitled “Trademark Image Retrieval Based on Scale, Rotation, Translation, Invariant Features”, related to the topic of this thesis, was published in Computing and Communication Technologies, Research, Innovation, and Vision for the Future (RIVF), 2013 IEEE RIVF International Conference, 10-13 Nov 2013.
My thesis is organized as follows: Chapter 1 is the introduction. Chapter 2 presents some related works. Chapter 3 gives background on the related problems. Chapter 4 presents the proposed method in detail. Chapter 5 describes the installation of Visual Studio 2010 with OpenCV 2.4.2 on Windows 7 used for the implementation, and presents the experimental results. Chapter 6 concludes the thesis. Additionally, the Appendix gives the whole source code of the thesis for the reader's convenience.
Chapter 2: Related work
In recent years, researchers have proposed a wide range of solutions in a bid to alleviate the workload of trademark registration offices. Chen, Sun, and Yang [1] suggested two main steps for computing the feature vector. Initially, the object region extracted from a principal-orientation-rotated image is equally partitioned into 16 regions. Then an entropy vector is constructed as the feature vector of the image by computing the information entropy of each partitioned region. This automatic shape-based retrieval technique achieves the desired performance: good invariance under rotation, translation, scale, noise, and degree of thickness, and satisfaction of human visual perception. However, this single-feature retrieval system does not seem to meet multiple aspects of appreciation. To improve this, among others, the single-feature Zernike Moments (ZM) in [4, 10] and invariant moments in [3, 5, 6] are each combined with other features. Experimental results presented in [4] showed that this method has steady performance and good invariance under translation, rotation, and scale. Moreover, the low noise sensitivity of Zernike moments made this method more insensitive to noise. However, because different users have different understandings of image similarity, the present methods of trademark image retrieval have shortcomings in some aspects, such as the ability to retrieve geometrically deformed images, retrieval accuracy, and the consistency between the image and human visual perception. Yet retrieval using the Zernike moment method in [10] shows that trademarks can be retrieved rapidly. A new method is proposed in [3] based on cosine distance and normalized distance measures. The cosine distance metric normalizes all feature vectors to unit length and makes the measure invariant against relative in-plane scaling of the image content. The normalized distance combines two distance measures, cosine distance and Euclidean distance, and shows more accuracy than either method alone. The proposed measures take into account the integration of global features (invariant moments and eccentricity) and local features (entropy histogram and distance histogram). The method first indexes the trademark image database (DB) in order to search for trademarks within a narrowed scope and reduce computation time, and then calculates the similarities between feature vectors to obtain the total similarity. An alternative solution worth mentioning is that four shape features, global (invariant moments and eccentricity) and local (entropy histogram and distance histogram) [16], are exploited by [3].
Recently, [5] combined nine feature vectors of low-order color moments in HSV color space with low-order Hu moments and eccentricity, which are extracted from the gray shape-region image by Rui and Huang's (1998) technique. Gauss normalization is applied to those features, and the weight of every feature can be adjusted flexibly [17]. Good results have been obtained in the experiments, which prove that the multi-feature combination way is better than single-feature ways. [6] employed 10 invariant moments, improved by [20], as shape features of trademark images. These features are input to an ensemble of RBFNNs trained via a minimization of the localized generalization error to recognize the trademark images. In the current system, the trademark images are black-and-white images; the system will be improved to adopt color trademark images in further study.
In [2, 7], the proposed combinations of features are definitely different, and it is admitted that each of them performs well. Equidistance partitioning based on concentric circles [14] is used to partition the region as the first step in both [4] and [2]. [4] and [2] differ in the implementation of the second step: [4] calculated a feature vector F composed of the corresponding region ZM, while [2] combined region feature vectors of 200 values with contour features, namely the corner-to-centroid triangulations detected by Hong & Jiang's improved SUSAN algorithm [15]. Iwanaga et al. [7] put forward a modified angle-distance pair-wise histogram based on the angle histogram and distance histogram of the trademark object. This system outperforms both moment-based and independent histogram (angle, distance, color) methods. Experiments were conducted on registered trademark databases, and impressive results demonstrated the robustness of the proposed approach. Moreover, it is quite simple to construct the distance-angle pair-wise histogram for a trademark object.
As the state-of-the-art method, [10] integrated ZM with SIFT features. In this approach, the Zernike moments of the retrieved images were first extracted and sorted according to similarity, and candidate images were formed. Then, the SIFT features were used to match the query image accurately against the candidate images. This method not only keeps the high precision-recall of SIFT features and is superior to the method based on single Zernike moments, but also improves retrieval speed compared to single SIFT features. This method can be well applied to a trademark image retrieval system, and the newly proposed approach enhances retrieval performance. Tuan N.G. et al. in [27] presented a new method based on the discriminative properties of trademark images for text recognition in trademark images. The experimental results show a significant gain in the text recognition accuracy of the proposed method in comparison with traditional text recognition methods. This contribution deals with one part of trademark image recognition.
However, those approaches seem to ignore not only partial trademark comparison but also mirrored trademarks. Furthermore, they have concentrated either on original trademarks without removing noise elements, or on standard databases which contain no noise. Additionally, these approaches treat the trademark image as one complete object to process and do not concern themselves with the detailed visual shapes in the trademark image; therefore, they cannot detect partial similarity between trademark images. Nonetheless, calculating the distance between two features also plays an extremely important part in measuring the degree of similarity among images, and for this reason each of the mentioned solutions endeavours to propose an appropriate measure to some extent.
To overcome the above-mentioned drawbacks, a novel content-based trademark recognition method is proposed with four main stages: (i) pre-process or scale down the trademark image and convert it into a binary image; (ii) extract dominant shape objects from the binary image; (iii) apply the RBRC algorithm to extract rotation-, scale-, and translation-invariant features from the shape objects; and (iv) use Euclidean distance to measure the similarity of two images and then retrieve the 10 trademark images most similar to the query trademark image. The thesis focuses on handling a Vietnamese composite-mark database.
Chapter 3: Background
3.1 Pre-processing
Converting gray scale to binary image
In [31], segmentation involves separating an image into regions (or their contours) corresponding to objects. We usually try to segment regions by identifying common properties. Or, similarly, we identify contours by identifying differences between regions. The simplest property that pixels in a region can share is intensity. So, a natural way to segment such regions is through thresholding, the separation of light and dark regions. Thresholding creates binary images from grey-level ones by turning all pixels below some threshold to zero and all pixels above that threshold to one. If g(x, y) is a thresholded version of f(x, y) at some global threshold T,
g(x, y) = 1 if f(x, y) ≥ T; 0 otherwise (1)
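For concreteness, a minimal C++ sketch of equation (1) using OpenCV follows (an illustration, not the implementation in the Appendix; note that OpenCV's THRESH_BINARY uses a strict f(x, y) > T comparison, which differs from (1) by at most one gray level for integer images):

#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>

// Global thresholding: pixels above T become 255 (i.e. "one"), the rest 0.
cv::Mat globalThreshold(const cv::Mat& gray, double T) {
    cv::Mat binary;
    cv::threshold(gray, binary, T, 255, cv::THRESH_BINARY);
    return binary;
}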
Problems with thresholding
The major problem with thresholding is that we consider only the intensity, not any relationships between the pixels. There is no guarantee that the pixels identified by the thresholding process are contiguous. We can easily include extraneous pixels that aren't part of the desired region, and we can just as easily miss isolated pixels within the region (especially near the boundaries of the region). These effects get worse as the noise gets worse, simply because it's more likely that a pixel's intensity doesn't represent the normal intensity in the region. When we use thresholding, we typically have to play with it, sometimes losing too much of the region and sometimes getting too many extraneous background pixels. (Shadows of objects in the image are also a real pain - not just where they fall across another object but where they mistakenly get included as part of a dark object on a light background.)
We can deal, at least in part, with such uneven illumination by determining thresholds locally. That is, instead of having a single global threshold, we allow the threshold itself to smoothly vary across the image.
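OpenCV exposes exactly such a locally varying threshold through adaptiveThreshold; the sketch below is only illustrative, and the 15x15 block size and offset of 5 are assumed values that would need tuning:

#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>

// Each pixel is compared against the mean of its 15x15 neighbourhood minus 5,
// so the effective threshold follows the local illumination.
// gray must be an 8-bit single-channel image.
cv::Mat localThreshold(const cv::Mat& gray) {
    cv::Mat binary;
    cv::adaptiveThreshold(gray, binary, 255, cv::ADAPTIVE_THRESH_MEAN_C,
                          cv::THRESH_BINARY, 15, 5);
    return binary;
}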
Automated methods for finding thresholds
To set a global threshold or to adapt a local threshold to an area, we usually look at the histogram to see if we can find two or more distinct modes - one for the foreground and one for the background. Recall that a histogram can be treated as a probability distribution p(g), with cumulative distribution
c(g) = Σ_{g′=0}^{g} p(g′) (2)
If we know the proportion 1/p of pixels that should fall below the threshold, we simply set the threshold T such that c(T) = 1/p. (Or, if we're looking for a dark object on a light background, c(T) = 1 − 1/p.)
Finding peaks and valleys
One extremely simple way to find a suitable threshold is to find each of the modes (local maxima) and then find the valley (minimum) between them. While this method appears simple, there are two main problems with it: the histogram may be noisy, thus causing many local minima and maxima (to get around this, the histogram is usually smoothed before trying to find separate modes); and the sum of two separate distributions, each with its own mode, may not produce a distribution with two distinct modes.
Clustering (K-Means Variation)
Another way to look at the problem is that we have two groups of pixels, one with one range of values and one with another. What makes thresholding difficult is that these ranges usually overlap. What we want to do is to minimize the error of classifying a background pixel as a foreground one, or vice versa. To do this, we try to minimize the area under the histogram for one region that lies on the other region's side of the threshold.
The problem is that we don't have the histograms for each region, only the histogram for the combined regions. Understand that the place of minimum overlap (the place where the misclassified areas of the distributions are equal) is not necessarily where the valley occurs in the combined histogram. This occurs, for example, when one cluster has a wide distribution and the other a narrow one. One way that we can try to do this is to consider the values in the two regions as two clusters. In other words, let μ_B(T) be the mean of all pixels less than the threshold and μ_O(T) be the mean of all pixels greater than the threshold. We want to find a threshold such that the following holds:
∀g ≥ T : |g − μ_B(T)| > |g − μ_O(T)| (4)
and
∀g < T : |g − μ_B(T)| < |g − μ_O(T)| (5)
The basic idea is to start by estimating μ_B(T) as the average of the four corner pixels (assumed to be background) and μ_O(T) as the average of everything else. Set the threshold to be halfway between μ_B(T) and μ_O(T) (thus separating the pixels according to how close their intensities are to μ_B(T) and μ_O(T) respectively). Now, update the estimates of μ_B(T) and μ_O(T) by actually calculating the means of the pixels on each side of the threshold. This process repeats until the algorithm converges. This method works well if the spreads of the distributions are approximately equal, but it does not handle well the case where the distributions have differing variances.
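A sketch of this iterative scheme in C++ (illustrative only; here the whole-image mean is used as a rough initial estimate of μ_O, standing in for "the average of everything else"):

#include <cmath>
#include <opencv2/core/core.hpp>

// Iterative threshold selection: T converges to the midpoint between the
// background mean mu_B and the object mean mu_O.
// gray must be an 8-bit single-channel image.
double iterativeThreshold(const cv::Mat& gray) {
    // Initialize mu_B from the four corner pixels (assumed background).
    double muB = (gray.at<uchar>(0, 0) + gray.at<uchar>(0, gray.cols - 1) +
                  gray.at<uchar>(gray.rows - 1, 0) +
                  gray.at<uchar>(gray.rows - 1, gray.cols - 1)) / 4.0;
    double muO = cv::mean(gray)[0];                // rough stand-in estimate
    double T = 0.5 * (muB + muO), prevT = -1.0;
    while (std::fabs(T - prevT) > 0.5) {
        prevT = T;
        double sumB = 0, nB = 0, sumO = 0, nO = 0;
        for (int y = 0; y < gray.rows; ++y)
            for (int x = 0; x < gray.cols; ++x) {
                uchar g = gray.at<uchar>(y, x);
                if (g < T) { sumB += g; ++nB; } else { sumO += g; ++nO; }
            }
        if (nB > 0) muB = sumB / nB;               // recompute cluster means
        if (nO > 0) muO = sumO / nO;
        T = 0.5 * (muB + muO);                     // halfway between the means
    }
    return T;
}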
Clustering (The Otsu Method)
Another way of accomplishing similar results is to set the threshold so as to try to make each cluster as tight as possible, thus minimizing their overlap. Obviously, we can't change the distributions, but we can adjust where we separate them (the threshold). As we adjust the threshold one way, we increase the spread of one and decrease the spread of the other. The goal then is to select the threshold that minimizes the combined spread. We can define the within-class variance as the weighted sum of the variances of each cluster:
σ²_within(T) = n_B(T) σ²_B(T) + n_O(T) σ²_O(T) (6)
where
σ²_B(T) = the variance of the pixels in the background (below threshold),
σ²_O(T) = the variance of the pixels in the foreground (above threshold),
n_B(T) = Σ_{g=0}^{T−1} p(g) (7) and n_O(T) = Σ_{g=T}^{N−1} p(g) (8) are the fractions of pixels below and above the threshold, and [0, N − 1] is the range of intensity levels.
Computing this within-class variance for each of the two classes for each possible threshold involves a lot of computation, but there's an easier way. If we subtract the within-class variance from the total variance of the combined distribution, we get something called the between-class variance:
σ²_between(T) = σ² − σ²_within(T) (9)
= n_B(T)[μ_B(T) − μ]² + n_O(T)[μ_O(T) − μ]² (10)
where σ² is the combined variance and μ is the combined mean. Notice that the between-class variance is simply the weighted variance of the cluster means themselves around the overall mean. Substituting μ = n_B(T) μ_B(T) + n_O(T) μ_O(T) and simplifying, we get
σ²_between(T) = n_B(T) n_O(T) [μ_B(T) − μ_O(T)]² (11)
So, for each potential threshold T we:
1. Separate the pixels into two clusters according to the threshold.
2. Find the mean of each cluster.
3. Square the difference between the means.
4. Multiply by the number of pixels in one cluster times the number in the other.
This depends only on the difference between the means of the two clusters, thus avoiding having to calculate differences between individual intensities and the cluster means. The optimal threshold is the one that maximizes the between-class variance (or, conversely, minimizes the within-class variance).
This still sounds like a lot of work, since we have to do this for each possible threshold, but it turns out that the computations aren't independent as we change from one threshold to another. We can update n_B(T), n_O(T), and the respective cluster means μ_B(T) and μ_O(T) as pixels move from one cluster to the other as T increases. Using simple recurrence relations we can update the between-class variance as we successively test each threshold:
n_B(T + 1) = n_B(T) + n_T,  n_O(T + 1) = n_O(T) − n_T,
μ_B(T + 1) = (μ_B(T) n_B(T) + n_T T) / n_B(T + 1),  μ_O(T + 1) = (μ_O(T) n_O(T) − n_T T) / n_O(T + 1),
where n_T denotes the number of pixels with intensity T.
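The following sketch turns equation (11) into code, scanning all candidate thresholds over a precomputed histogram (an illustration; pixel counts are used instead of fractions, which leaves the maximizer unchanged, and in practice OpenCV users can simply pass cv::THRESH_OTSU to cv::threshold):

#include <vector>

// Otsu's method: pick T maximizing n_B(T) * n_O(T) * (mu_B(T) - mu_O(T))^2.
// hist[g] is the pixel count for gray level g; pixels with value < T are background.
int otsuThreshold(const std::vector<double>& hist) {
    const int N = static_cast<int>(hist.size());
    double total = 0, totalSum = 0;
    for (int g = 0; g < N; ++g) { total += hist[g]; totalSum += g * hist[g]; }
    double nB = 0, sumB = 0, bestVar = -1.0;
    int bestT = 0;
    for (int T = 1; T < N; ++T) {
        nB   += hist[T - 1];                 // recurrence: move level T-1 into B
        sumB += (T - 1) * hist[T - 1];
        double nO = total - nB;
        if (nB == 0 || nO == 0) continue;    // skip degenerate splits
        double muB = sumB / nB, muO = (totalSum - sumB) / nO;
        double between = nB * nO * (muB - muO) * (muB - muO);
        if (between > bestVar) { bestVar = between; bestT = T; }
    }
    return bestT;
}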
Mixture modeling
Rather than trying to pick a threshold that optimizes some statistical measure, mixture modeling assumes that there already exist two distributions and we must find them. Once we know the parameters of the distributions, it's easy to determine the best threshold. Unfortunately, we have six unknown parameters (n_B, n_O, μ_B, μ_O, σ_B, and σ_O), so we need to make some estimates of these quantities. If the two distributions are reasonably well separated (some overlap but not too much), we can choose an arbitrary threshold T and assume that the mean and standard deviation of each group approximate the mean and standard deviation of the two underlying populations. We can then measure how well a mix of the two distributions approximates the overall distribution:
F = Σ_{g=0}^{N−1} (model(g) − image(g))²
Choosing the optimal threshold thus becomes a matter of finding the one that causes the mixture of the two estimated Gaussian distributions to best approximate the actual histogram (i.e., minimizes F). Unfortunately, the solution space is too large to search exhaustively, so most methods use some form of gradient descent. Such gradient descent methods depend heavily on the accuracy of the initial estimate, but the Otsu method or similar clustering methods can usually provide reasonable initial estimates. Mixture modeling also extends to models with more than two underlying distributions (more than two types of regions).
Multispectral thresholding
A technique for segmenting images with multiple components (color images, Landsat images, or MRI images with T1, T2, and proton-density bands) works by estimating the optimal threshold in one channel and then segmenting the overall image based on that threshold. Each of these regions is then subdivided independently using properties of the second channel. This is repeated again for the third channel, and so on, running through all channels repeatedly until each region in the image exhibits a distribution indicative of a coherent region (a single mode).
Thresholding along boundaries
If we want our thresholding method to stay fairly true to the boundaries of the object, we can first apply some boundary-finding method (such as edge detection techniques) and then sample the pixels only where the boundary probability is high. Thus, a threshold method based on pixels near boundaries will separate the pixels in ways that tend to preserve the boundaries. Other scattered distributions within the object or the background are of no relevance. However, if the characteristics change along the boundary, we're still in trouble. And, of course, there's still no guarantee that we won't have extraneous pixels or holes.
3.2 Object description
In [33], objects are represented as a collection of pixels in an image. Thus, for the purposes of recognition we need to describe the properties of groups of pixels. The description is often just a set of numbers: the object's descriptors. From these, we can compare and recognize objects by simply matching the descriptors of objects in an image against the descriptors of known objects. However, to be useful for recognition, descriptors should have four important properties. First, they should define a complete set; that is, two objects must have the same descriptors if and only if they have the same shape. Secondly, they should be congruent, so that we can recognize similar objects when they have similar descriptors. Thirdly, it is convenient that they have invariant properties; for example, rotation-invariant descriptors will be useful for recognizing objects whatever their orientation. Other important invariance properties include scale and position, and also invariance to affine and perspective changes; these last two are very important when recognizing objects observed from different viewpoints. In addition to these three properties, the descriptors should be a compact set; namely, a descriptor should represent the essence of an object in an efficient way. That is, it should only contain information about what makes an object unique, or different from the other objects. The quantity of information used to describe this characterization should be less than the information necessary for a complete description of the object itself. Unfortunately, there is no set of complete and compact descriptors that characterizes general objects; thus, the best recognition performance is obtained with carefully selected properties. As such, the process of recognition is strongly related to each particular application, with a particular type of objects. Here, the characterization of objects is presented by two forms of descriptors: region descriptors characterize the arrangement of pixels within the area, and shape descriptors characterize the arrangement of pixels along the perimeter or boundary. This region versus perimeter kind of representation is common in image analysis. For example, edges can be located by region growing (to label area) or by differentiation (to label perimeter). There are many techniques that can be used to obtain descriptors of an object's boundary.
3.3 Feature vectors extraction
3.3.1 Discrete Fourier Transform (DFT)
In [34], the Fourier transform decomposes an image into its sine and cosine components; in other words, it transforms an image from its spatial domain to its frequency domain. The idea is that any function may be approximated exactly by a sum of infinitely many sine and cosine functions, and the Fourier transform is a way to do this. Mathematically, the Fourier transform of a two-dimensional image is:
F(k, l) = Σ_{i=0}^{N−1} Σ_{j=0}^{N−1} f(i, j) e^(−i2π(ki/N + lj/N))
where f(i, j) is the image value in the spatial domain and F(k, l) its value in the frequency domain.
3.3.2 Log-polar transform
In [32], for two-dimensional images, the log-polar transform [Schwartz80] is a change from Cartesian to polar coordinates: (x, y) ↔ r e^(iθ), where r = √(x² + y²) and exp(iθ) = exp(i arctan(y/x)). To separate the polar coordinates out into a (ρ, θ) space that is relative to some center point (x_c, y_c), we take the log so that ρ = log √((x − x_c)² + (y − y_c)²) and θ = arctan((y − y_c)/(x − x_c)). For image purposes - when we need to "fit" the interesting stuff into the available image memory - we typically apply a scaling factor m to ρ. Fig 2 shows a square object on the left and its encoding in log-polar space.
Fig 2 The log-polar transform maps (x, y) into (log(r), θ)
The log-polar transform can be used to create two-dimensional invariant representations of object views by shifting the transformed image's center of mass to a fixed point in the log-polar plane; see Fig 3. On the left are three shapes that we want to recognize as "square". The problem is, they look very different: one is much larger than the others, and another is rotated. The log-polar transform appears on the right in Fig 3. Observe that size differences in the (x, y) plane are converted to shifts along the log(r) axis of the log-polar plane, and that rotation differences are converted to shifts along the θ-axis. If we take the transformed center of each transformed square in the log-polar plane and then recenter that point to a certain fixed position, then all the squares will show up identically in the log-polar plane. This yields a type of invariance to two-dimensional rotation and scaling.
3.4 Measure of similarity
Given a query trademark image, its feature vector and the feature vector of every image in the database are computed and compared to report the few most similar images.
A similarity measure must be selected to decide how close one vector is to another. The problem can be converted to computing the discrepancy between two vectors x, y ∈ R^d. Several distance measures are presented as follows.
τ₂(x, y) = max_(1≤j≤d) |x_j − y_j| (22)
The covariance matrix S of a sample {x_1, ..., x_n} of size n is computed by S = (1/n) Σ_{i=1}^{n} (x_i − u)(x_i − u)^t, with u the sample mean.
Chapter 4: Proposed method
4.1 Pre-processing
In this stage, the input trademark image is scaled down and converted into a binary image using the thresholding techniques reviewed in Section 3.1, such as clustering, mixture modeling, and multispectral thresholding.
4.2 Visual shape objects extraction
In this stage, shape objects in the form of connected contours are retrieved from the binary image using Suzuki's algorithm [12]. Each detected contour is stored as a vector of points. All of the vectors are organized into a hierarchy structure, which contains information about the image topology. The number of contours depends on the texture of the image.
There are four options for contour retrieval in [12]: (i) retrieve only the extreme outer contours; (ii) retrieve all of the contours without establishing any hierarchical relationships; (iii) retrieve all of the contours organized into a two-level hierarchy; and (iv) retrieve all of the contours and reconstruct a full hierarchy of nested contours.
In the present research, we adopted the second option for shape object extraction. From a binary trademark image, we extracted a number of shape object images. However, due to noise present in the input trademark image, many noise contours were extracted as shape objects. To prevent this problem, we applied a filter so that the noise contours were removed. Observations showed that the dominant shape contours usually have a much larger area in comparison with the noise contours. Furthermore, due to the characteristics of the trademarks in our database, most trademarks consist of one or two dominant shape objects which play a primary role in a company's reputation. For this reason, we propose an algorithm to extract up to two dominant shape objects from a binary image. The algorithm comprises four main steps and one function named FilterContours (see Fig 4) responsible for taking out the two dominant shape objects. The FilterContours operation is as follows:
1. Compute each contour's area using the Green formula presented in [29].
2. Sort the extracted contours in descending order of contour area.
3. Remove the noise shape contours in the trademark image; keep only the 2nd and 3rd biggest-area contours.
4. Remove one of the kept contours by the FilterContours function:
4.1 if (the area of the 2nd contour is T1 times bigger than that of the 3rd contour, and the area of the 2nd is less than T2) then the 3rd contour is removed and the 2nd contour is retained;
4.2 else if (the area of the 2nd is greater than T2) then the 2nd contour is deleted and the 3rd contour is kept;
4.3 else both contours are maintained.
return (One or Two Contours)
Fig 4 Contour filter algorithm
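As an illustration of Fig 4 (a sketch, not the thesis source code in the Appendix; T1 and T2 are the thresholds of steps 4.1-4.2, assumed to be tuned on the database), the filter can be written with OpenCV's findContours and contourArea:

#include <algorithm>
#include <vector>
#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>

static bool byAreaDesc(const std::vector<cv::Point>& a,
                       const std::vector<cv::Point>& b) {
    return cv::contourArea(a) > cv::contourArea(b);   // Green's formula inside
}

// Keep at most two dominant shape contours of a binary trademark image.
std::vector<std::vector<cv::Point> > filterContours(const cv::Mat& binary,
                                                    double T1, double T2) {
    std::vector<std::vector<cv::Point> > contours, kept;
    cv::Mat work = binary.clone();                    // findContours modifies its input
    cv::findContours(work, contours, cv::RETR_LIST,   // option (ii): no hierarchy
                     cv::CHAIN_APPROX_SIMPLE);
    std::sort(contours.begin(), contours.end(), byAreaDesc);   // step 2
    if (contours.size() >= 3) {
        // Step 3: keep the 2nd and 3rd largest contours (the largest typically
        // being the outer border); step 4: possibly drop one of them.
        double a2 = cv::contourArea(contours[1]);
        double a3 = cv::contourArea(contours[2]);
        if (a2 > T1 * a3 && a2 < T2)  kept.push_back(contours[1]);          // 4.1
        else if (a2 > T2)             kept.push_back(contours[2]);          // 4.2
        else { kept.push_back(contours[1]); kept.push_back(contours[2]); }  // 4.3
    } else if (!contours.empty()) {
        kept.push_back(contours[0]);  // fallback case not specified in Fig 4
    }
    return kept;
}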
4.3 Scale, rotation, translation invariant features
For each extracted shape object, a corresponding feature vector is created. In order to generalize the actions of trademark copying, such as duplication, rotation, and resizing, the extracted feature vector of the shape object must be invariant in terms of rotation, scale, and translation. In this thesis, we use the RBRC algorithm [13], which is composed of three steps: a two-dimensional Fourier transform (DFT); a representation of the magnitude of the Fourier spectrum in log-polar coordinates; and a second DFT. The RBRC algorithm makes our method scale-, rotation-, and translation-invariant. This is in line with the Fourier-Mellin approach and the polar Fourier representation [19, 26]. The approaches suggested in [19, 26] combine the phase correlation technique with the polar representation to address the
problem of both translated and rotated objects. However, a major difference between the RBRC algorithm and Fourier-Mellin is that a DFT is used in the last step of the former, while phase correlation is applied in the latter. To explain why the RBRC algorithm is invariant in terms of rotation, translation and scale, the theory of the log-polar transform (LPT) as it relates to the magnitude of the DFT is presented in [19, 26]. We present the log-polar transform (LPT) and the representation of the magnitude of the DFT in log-polar coordinates here for the reader's convenience.
The log-polar transform is a nonlinear and nonuniform sampling method used to convert an image from the Cartesian coordinates I(x, y) to the log-polar coordinates I_LP(ρ, θ) [24]. The mathematical expression of the LPT procedure is shown below:
ρ = log √((x − x_c)² + (y − y_c)²),  θ = tan⁻¹((y − y_c)/(x − x_c)) (30)
where (x_c, y_c) is the center pixel of the transformation in the Cartesian coordinates, (x, y) denotes the sampling pixel in the Cartesian coordinates, and ρ and θ denote the log-radius and the angular position in the log-polar coordinates. Given g(x′, y′), a scaled and rotated replica of f(x, y) with scale parameter a and rotation angle α, its log-polar coordinates satisfy
ρ′ = ρ + log a,  θ′ = θ + α
that is, scaling and rotation are reduced to pure translation in the log-polar domain. However, when the original image is translated by (x_0, y_0), the corresponding log-polar coordinates are represented by:
θ′ = tan⁻¹((y − y_0)/(x − x_0)),  ρ′ = log √((x − x_0)² + (y − y_0)²) (31)
Those equations, as well as [22], indicate that even a slight translation produces a modified log-polar image. To overcome this limitation, the algorithm first applies the Fourier transform and then applies the log-polar transform to the magnitude spectrum. The magnitude of the Fourier transform of an image and that of its translated counterpart are the same, i.e. invariant to translation, but they retain the effects of rotation and scaling [19, 21, 22, 23, 24, 25, 26]. These comments are explained more clearly by the representation of the magnitude of the DFT in log-polar coordinates.
According to [19, 21, 23, 24, 25, 26], and based on the rotation and translation properties of the Fourier transform, the power spectrum of a rotated and translated image is rotated by the same orientation while its magnitude remains the same. Let F_1(ξ, η) and F_2(ξ, η) be the Fourier transforms of images f_1(x, y) and f_2(x, y), respectively. We are interested in the three cases below.
The first case: if f_2 differs from f_1 only by a displacement (x_0, y_0), then
f_2(x, y) = f_1(x − x_0, y − y_0) (32)
or, in the frequency domain:
F_2(ξ, η) = e^(−j2π(ξx_0 + ηy_0)) F_1(ξ, η) (33)
The second case: if f_2(x, y) is a translated and rotated replica of f_1(x, y) with translation (x_0, y_0) and rotation θ_0, then
f_2(x, y) = f_1(x cos θ_0 + y sin θ_0 − x_0, −x sin θ_0 + y cos θ_0 − y_0) (34)
The DFTs of f_1 and f_2 are related as shown below:
F_2(ξ, η) = e^(−j2π(ξx_0 + ηy_0)) F_1(ξ cos θ_0 + η sin θ_0, −ξ sin θ_0 + η cos θ_0) (35)
It is supposed that M_1 and M_2 are the magnitudes of F_1 and F_2, respectively. They are related as shown:
M_2(ξ, η) = M_1(ξ cos θ_0 + η sin θ_0, −ξ sin θ_0 + η cos θ_0) (36)
The Fourier magnitude spectra are transformed to polar representation:
M_2(ρ, θ) = M_1(ρ, θ − θ_0) (37)
where ρ and θ are the radius and angle in the polar coordinate system, respectively.
The last case: if f_2 is a translated, rotated and scaled (by a factor s) replica of f_1, the Fourier magnitude spectra, transformed to log-polar representations, are related by:
M_2(ρ, θ) = M_1(ρ/s, θ − θ_0) (38)
M_2(log ρ, θ) = M_1(log ρ − log s, θ − θ_0) (39)
It is obvious that scale, rotation, and translation of an image are represented as pure translations in the log-polar frequency domain. This remaining translation is eliminated by applying the DFT one more time and keeping its magnitude.
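A compact C++ sketch of this three-step pipeline follows (an illustration of RBRC as described above, not the thesis source; the 64x64 log-polar grid and the hand-rolled nearest-neighbour resampling are assumptions, written out so the sketch does not depend on any particular OpenCV log-polar helper):

#include <algorithm>
#include <cmath>
#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>

// Magnitude of the 2-D DFT of a CV_32F image (DC term at the top-left corner).
static cv::Mat magSpectrum(const cv::Mat& img32f) {
    cv::Mat planes[] = {img32f.clone(), cv::Mat::zeros(img32f.size(), CV_32F)};
    cv::Mat c; cv::merge(planes, 2, c);
    cv::dft(c, c);
    cv::split(c, planes);
    cv::Mat mag; cv::magnitude(planes[0], planes[1], mag);
    return mag;
}

// Swap quadrants so the DC term sits at the image centre (assumes even sizes).
static void fftShift(cv::Mat& m) {
    int cx = m.cols / 2, cy = m.rows / 2;
    cv::Mat q0(m, cv::Rect(0, 0, cx, cy)),  q1(m, cv::Rect(cx, 0, cx, cy));
    cv::Mat q2(m, cv::Rect(0, cy, cx, cy)), q3(m, cv::Rect(cx, cy, cx, cy));
    cv::Mat tmp;
    q0.copyTo(tmp); q3.copyTo(q0); tmp.copyTo(q3);
    q1.copyTo(tmp); q2.copyTo(q1); tmp.copyTo(q2);
}

// RBRC-style invariant feature: DFT magnitude -> log-polar -> DFT magnitude.
cv::Mat rbrcFeature(const cv::Mat& shape8u, int rhoBins = 64, int thetaBins = 64) {
    cv::Mat f; shape8u.convertTo(f, CV_32F);
    cv::Mat mag = magSpectrum(f);                    // step 1: removes translation
    mag = mag(cv::Rect(0, 0, mag.cols & ~1, mag.rows & ~1)).clone(); // even sizes
    fftShift(mag);

    // Step 2: log-polar resampling about the spectrum centre, turning
    // scaling/rotation of the input into shifts of this image.
    float cx = mag.cols / 2.0f, cy = mag.rows / 2.0f;
    float maxRho = std::log(std::min(cx, cy));
    cv::Mat lp(rhoBins, thetaBins, CV_32F, cv::Scalar(0));
    for (int r = 0; r < rhoBins; ++r)
        for (int t = 0; t < thetaBins; ++t) {
            float rho   = maxRho * (r + 0.5f) / rhoBins;
            float theta = 2.0f * (float)CV_PI * t / thetaBins;
            int x = cvRound(cx + std::exp(rho) * std::cos(theta));
            int y = cvRound(cy + std::exp(rho) * std::sin(theta));
            if (x >= 0 && x < mag.cols && y >= 0 && y < mag.rows)
                lp.at<float>(r, t) = mag.at<float>(y, x);
        }

    return magSpectrum(lp);                          // step 3: removes those shifts
}

Read row by row, the returned matrix would play the role of the invariant feature vector of a shape object.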
Fig 5 Illustration of three stages of the proposed method
Fig 5 summarizes the three stages of our method. The query image was merged from two images in our database. After being pre-processed, the query image was transformed into a binary image. Then, the two dominant shape objects were extracted via the object extraction stage. Finally, two feature vectors were extracted from the shape objects.
4.4 Measure of similarity
An act of copying a trademark can be done by:
One trademark is scaled, rotated, or translated from the other
One trademark is combined from a part of the other
One trademark is mirrored from the other
In order to recognize copied trademark images, we derive a trademark similarity measure based on the feature vectors. After the feature vector creation stage, an input trademark image is represented by one or two feature vectors. Let I and I′ be two trademark images; we suppose that F_i and F_j, where i = 1, 2 and j = 1, 2, are the feature vectors of I and I′, respectively.
We propose that the degree of similarity of two trademarks I and I′, denoted S(I, I′), is the smallest distance between two feature vectors, one in the set {F_i} and one in the set {F_j}, denoted dist(F_i, F_j). We employed the Euclidean distance to compute the distance between two feature vectors, which can be expressed as follows:
S(I, I′) = min{dist(F_i, F_j)}, where i, j = 1, 2 (16)
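A minimal sketch of this measure in C++ (illustrative only; each trademark is assumed to be represented by one or two feature vectors stored as cv::Mat objects, as produced in the previous stage):

#include <algorithm>
#include <limits>
#include <vector>
#include <opencv2/core/core.hpp>

// S(I, I') = min over all pairs (i, j) of the Euclidean distance dist(F_i, F_j).
double similarity(const std::vector<cv::Mat>& fI, const std::vector<cv::Mat>& fJ) {
    double best = std::numeric_limits<double>::max();
    for (size_t i = 0; i < fI.size(); ++i)
        for (size_t j = 0; j < fJ.size(); ++j)
            best = std::min(best, cv::norm(fI[i], fJ[j], cv::NORM_L2));
    return best;                       // smaller value = more similar trademarks
}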
Chapter 5: Experiments and results
The trademark image database in my thesis was taken from NOIP [29]. In [27], it was indicated that about 30% of trademark images contain text only, 70% contain text and other objects such as pictures or symbols, 80% of consecutive characters in a word have the same color, and 90% of trademark images have fewer than 5 colors. We collected 243 composite-mark trademark images to construct our testing trademark image database. Fig 6 shows some samples of the collected testing trademark database.
Fig 6 Samples of the collected trademark images for testing
To evaluate the accuracy of our method, we used S(I, I′) to retrieve the top 5 trademark images most similar to an input query image. In addition, we modified the input image to mimic the acts of copying a trademark, such as translation, rotation, scaling, mirroring, and partial copying. The following are illustrations of each case.
5.1 Implementation
OpenCV [34] is an open-source BSD-licensed library that includes several hundred computer vision algorithms. OpenCV has a modular structure, which means that the package includes several shared or static libraries. The following modules are available:
core - a compact module defining basic data structures, including the dense
multi-dimensional array Mat and basic functions used by all other modules
imgproc - an image processing module that includes linear and non-linear image
filtering, geometrical image transformations (resize, affine and perspective warping, generic table-based remapping), color space conversion, histograms, and so on
video - a video analysis module that includes motion estimation, background
subtraction, and object tracking algorithms
calib3d - basic multiple-view geometry algorithms, single and stereo camera
calibration, object pose estimation, stereo correspondence algorithms, and elements
of 3D reconstruction
features2d - salient feature detectors, descriptors, and descriptor matchers
objdetect - detection of objects and instances of the predefined classes (for
example, faces, eyes, mugs, people, cars, and so on)
highgui - an easy-to-use interface to video capturing, image and video codecs, as
well as simple UI capabilities
gpu - GPU-accelerated algorithms from different OpenCV modules
Below is an illustration of the installation procedure for the OpenCV library and how it can be integrated with Microsoft Visual Studio 2010:
Download OpenCV2.4.2 at www.sourceforge.net/projects/opencvlibrary/
Install to folder C:\OpenCV2.4.2
Now this folder (C:\OpenCV2.4.2) contains the source files for OpenCV. These source files need to be built for a specific development environment, which in this case is Microsoft Visual Studio 2010. The CMake utility can be used to generate the build files for Microsoft Visual Studio 2010.
Download and install the CMake utility from www.cmake.org/files/v2.8/cmake-2.8.2-win32-x86.exe.
Open CMake and select the source directory for the OpenCV source files, i.e. C:\OpenCV2.4.2. Select the build directory, for instance C:\OpenCV2.4.2\Build.
Once the source and build directories are selected, press the Configure button, specify the generator Microsoft Visual Studio 10, and hit Finish.