BUILDING A BOOK RECOGNITION PROGRAM ON ANDROID SMARTPHONE
VIETNAM NATIONAL UNIVERSITY, HANOI
UNIVERSITY OF ENGINEERING AND TECHNOLOGY
ENTRY FOR THE "STUDENT SCIENTIFIC RESEARCH" AWARD
2012
Project title:
BUILDING A BOOK RECOGNITION PROGRAM ON ANDROID SMARTPHONE
Students:
Hoàng Thanh Tùng, class K53CA-KHMT
Nguyễn Hữu Cường, class K53CA-KHMT
Đỗ Tất Thắng, class K53CA-KHMT
Faculty: Information Technology
Supervisor: Dr. Nguyễn Phương Thái
HANOI, 2012
Contents
Abstract
Chapter 1: Introduction
1 Objective
2 Related works
Chapter 2: Approaches to image retrieval
1 Image meta search
2 Content based image retrieval
3 Approach for our system
Chapter 3: Image retrieval with OpenCV
1 Overview of OpenCV library
2 Interest points in Image Retrieval
3 Speeded Up Robust Features
3.1 SURF's properties
4 Image search in our system
Chapter 4: Building the system
1 Overview of system architecture
2 Handling client's requests
2.1 Search for book information
2.2 Search for related book
2.3 Search for nearby bookshop
2.4 Rate a book
Chapter 5: Experimental result
Chapter 6: Future works
Chapter 7: Conclusion
References
Links to software and sites
Abstract
Information is an essential human need. Searching for information with search engines such as Google, Bing and Yahoo is familiar to everyone. However, searching with text can be tedious and sometimes does not return the correct information. Recently, Google introduced a new image search engine that can find images similar to an image uploaded by the user. We have developed a system that allows people to use images of book covers as queries for information about the books. Our system provides an easier and more engaging way of searching. The system was built for users with camera-equipped mobile devices. It applies modern Content Based Image Retrieval techniques to provide a fast and reliable search engine. Experiments on our database show that the system has high accuracy and is robust to many kinds of graphical deformation.
Chapter 1: Introduction
1 Objective
Currently, the rapid development of the Internet is leading to an exponential growth in the amount of data. Automatically searching and retrieving data from large databases is one of the most important research fields. Image retrieval (IR) is the problem of finding and retrieving images from a digital image database. Traditional methods use the metadata associated with images, such as captions and keywords, to classify images and perform the retrieval task. The metadata are often created manually, so these methods cannot be applied to large databases. Content based image retrieval (CBIR) is a completely new approach to the IR problem. In CBIR, images are classified and retrieved based on their actual content, such as lines, colors, shapes, textures and any other information that can be derived from the image. CBIR can therefore provide better classification and more reliable search results. A CBIR system also eliminates the need for human effort in annotating images.
Using CBIR on computers has become familiar to many people, while there are only a few CBIR programs for mobile devices. We aim to build a simple but helpful CBIR system for portable device users. Our program, however, is not a trivial CBIR system that only returns the closest matches of the input image. At the time this report is written, a user of the system can take an image of a book cover and send it to the server to receive information about the book such as its title, author, publisher, price and reviews. Users can also search for related books and bookshops and give their opinions about books.
The experimental results show that the accuracy of the system is high for clear, large input images. The results are still acceptable when the image is noisy or small, or when only a part of the cover is captured.
2 Related works
Google Goggles allows users to search for information about scenes, books, and other objects they see simply by taking and uploading photos of them. The program is accurate for high-quality input images, but its accuracy decreases dramatically when the image is noisy, taken from an unusual viewpoint, or captured in poor lighting conditions. Goggles is also much slower than the traditional search engine provided by Google (it could take ten seconds to complete a search on a phone with a 3G connection).
Chapter 2: Approaches to image retrieval
As discussed in Chapter 1, there are two main approaches to the IR problem. The traditional approach uses metadata to perform the search, while CBIR uses information extracted from the image itself. In this chapter we look deeper at each approach to see its advantages and disadvantages.
1 Image meta search
In a meta search system, metadata are usually in text form and are indexed and stored in a database. The data are external to the images and are attached to them to make meta search possible. Image search in these systems is performed in the same way as in other text search engines. The input to the system is a description of the query image (the description may be created by the user or derived from the context of the image). The search engine compares the description with the metadata of the images in the database to find the closest matches and returns the results in descending order of relevance.
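To make this comparison step concrete, here is a minimal, self-contained sketch (not part of the system described in this report) of ranking images by metadata: the relevance score is simply the number of query words appearing in each image's stored description, and all file names and descriptions are invented for illustration.

```cpp
#include <algorithm>
#include <cctype>
#include <iostream>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Split a text description into a set of lowercase words.
static std::set<std::string> tokenize(const std::string& text) {
    std::set<std::string> words;
    std::istringstream stream(text);
    std::string word;
    while (stream >> word) {
        std::transform(word.begin(), word.end(), word.begin(), ::tolower);
        words.insert(word);
    }
    return words;
}

int main() {
    // Toy metadata database: image file -> textual description (invented values).
    std::map<std::string, std::string> metadata = {
        {"img001.jpg", "red book cover computer vision textbook"},
        {"img002.jpg", "landscape mountain lake sunrise"},
        {"img003.jpg", "android phone camera book scanning"}
    };

    // The user-supplied description of the query image.
    std::set<std::string> queryWords = tokenize("book cover android");

    // Score each image by the number of query words found in its metadata.
    std::vector<std::pair<int, std::string> > ranked;
    for (const auto& entry : metadata) {
        std::set<std::string> descWords = tokenize(entry.second);
        int score = 0;
        for (const std::string& w : queryWords)
            if (descWords.count(w)) ++score;
        ranked.push_back(std::make_pair(score, entry.first));
    }

    // Return results in descending order of relevance.
    std::sort(ranked.rbegin(), ranked.rend());
    for (const auto& r : ranked)
        std::cout << r.second << "  score = " << r.first << "\n";
    return 0;
}
```

Real meta search engines use far more sophisticated text indexing and ranking, but the flow is the same: compare the query description with the stored metadata and return the results in descending order of relevance.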
One advantage of meta search systems is that powerful existing text search engines can be reused to perform image retrieval. Because indexing and searching in a text database is much faster than in a multimedia database, this approach has better time performance than the CBIR approach. The most common search engines today, including Google, Bing and Yahoo, use this approach to provide image search.
A big disadvantage of this approach is that the metadata are external to the image and may not precisely describe its actual content. Poor metadata produce a large number of irrelevant images in the search results. Although many methods for creating metadata automatically have been proposed (e.g., LDA for image retrieval, see [2], [3], [4]), the results achieved have not satisfied users with high expectations. The quality of the search results relies largely on the quality of the input descriptions, which are often created by users. Users may not always give good descriptions of their images, and the accuracy of the system decreases accordingly. Furthermore, requiring users to describe the images makes searching more complex and less interesting. Thus, a more accurate and user-friendly search engine is desirable.
2 Content based image retrieval
The CBIR approach makes use of modern Computer Vision (CV) techniques to solve the image retrieval problem. Unlike meta search systems, CBIR systems do not store the metadata of images but information derived from the images themselves, including color, intensity, shapes, textures, lines, interest points and other useful features. Different CBIR systems select different features to store and use different algorithms for classifying and searching images. When users want to search for images, they just need to provide their image and the system will automatically extract the relevant features from it and compare them with those of the database images to find the best matches. The results, thus, are graphically related to the input. This helps CBIR systems remove a large number of garbage results that are normally produced by meta search systems. CBIR systems also allow people to draw an approximation of their image and use that as the input to the search engine. This breaks the limit of traditional IR systems, where images can only be described by words.
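As a simple, concrete illustration of comparing images by content rather than by attached text, the sketch below compares the color histograms of two images with OpenCV. It uses only one of the features listed above (color) and is not the method used in our system, which relies on SURF keypoints as described in Chapter 3; the file names are placeholders and the API shown assumes the OpenCV 2.x C++ interface.

```cpp
#include <iostream>
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>

// Compute a normalized hue-saturation histogram as a simple content descriptor.
static cv::Mat hueSatHistogram(const cv::Mat& bgr) {
    cv::Mat hsv;
    cv::cvtColor(bgr, hsv, CV_BGR2HSV);

    int histSize[] = {30, 32};             // 30 hue bins, 32 saturation bins
    float hueRange[] = {0, 180};
    float satRange[] = {0, 256};
    const float* ranges[] = {hueRange, satRange};
    int channels[] = {0, 1};

    cv::Mat hist;
    cv::calcHist(&hsv, 1, channels, cv::Mat(), hist, 2, histSize, ranges);
    cv::normalize(hist, hist, 0, 1, cv::NORM_MINMAX);
    return hist;
}

int main() {
    // Placeholder file names; any two images will do.
    cv::Mat query = cv::imread("query.jpg");
    cv::Mat candidate = cv::imread("candidate.jpg");
    if (query.empty() || candidate.empty()) return 1;

    // A correlation score close to 1 means the color content is similar.
    double similarity = cv::compareHist(hueSatHistogram(query),
                                        hueSatHistogram(candidate),
                                        CV_COMP_CORREL);
    std::cout << "Color similarity: " << similarity << std::endl;
    return 0;
}
```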
CBIR systems, however, cannot completely replace the old meta search systems. Current algorithms for extracting visual features from images and searching databases of those features are still very expensive in both time and space. As a result, CBIR is not efficient for huge databases or for systems with a large number of queries per time interval. Besides, searching for visually related images does not always give good results. When users want to find different images related to certain events or people, CBIR is not suitable, because images that are graphically similar may not relate to those events or people.
3 Approach for our system
As we have discussed, each approach has its own advantages and drawbacks. We have selected CBIR as the method for developing our system. There are a number of reasons for this decision.
Firstly, our primary goal is to create an image search program for mobile users, so we need an interactive way of searching and sharing information. Searching with text is very common and somewhat boring. With our program, people can take images with their smartphones or digital cameras and use those images to search for the information they need.
Secondly, we want to create a system that can give users information about things they cannot identify or describe. This circumstance occurs when people travel to unfamiliar places and see things they have never encountered before. Our system can provide reliable information by searching for similar images and returning the information associated with those images to the user.
Finally, while there are many meta search programs, there are only a few CBIR programs for mobile devices. Hence, developing a CBIR program for those devices is promising. Android is currently the most popular operating system for mobile devices such as smartphones and tablets, so we have chosen Android as the platform for the client program.
Chapter 3: Image retrieval with OpenCV
1 Overview of OpenCV library
OpenCV [9] is an open source library for real-time computer vision developed by Intel and currently supported by Willow Garage. OpenCV offers many advanced functions for computer vision and image processing and is released under the BSD license. The library is available for Linux, Windows, Mac OS and Android. It was originally written in C, but C#, Java, Python and Ruby wrappers for OpenCV are now available to users.
According to Willow Garage, OpenCV has over 500 functions with more than 2,500 optimized algorithms. OpenCV's functions can be categorized as follows:
Figure 1: Overview of OpenCV's functions
Because of this rich collection of functions, OpenCV is used by more than 40,000 people for both academic and commercial purposes.
We have used OpenCV for detecting and describing interest points in images and for matching the resulting sets of interest points. A set of keypoints carries information about the image, and we can expect two similar images to have two similar sets of keypoints. Therefore, by comparing the two sets, we can measure the difference between the two images. While OpenCV can detect many types of interest points, we have selected Speeded Up Robust Features (SURF). The main properties of SURF are given in the next part of this chapter.
2 Interest points in Image Retrieval
According to Herbert Bay et al. [1], the process of finding similar images in a database consists of three steps. First, we need to detect interest points at distinctive locations in the image; these points could be corners, blobs or T-junctions. The property we value most in an interest point detector is its repeatability: a good detector should be able to reliably find the same physical interest points under different viewing conditions. The next step is describing the neighborhood of each detected interest point by a feature vector. The two most important properties of this feature vector are distinctiveness and robustness. Distinctiveness means that feature vectors of two different images are different. The feature vector computed from a noisy, transformed version of an image should not be too different from the vector of the original image; this property is called robustness. The last step is matching the descriptor vectors of different images. We measure the dissimilarity between two vectors by the distance between them (the distance could be the Mahalanobis or Euclidean distance). Since the distance between the vectors only roughly reflects the distance between images, we need some other mechanism to refine the results and then rank the images in the database. Due to the curse of dimensionality, matching high dimensional vectors is still a time consuming task, and various techniques have been developed for it. OpenCV provides an approximate but fast algorithm for this problem called Best Bin First [5, 7], which we have used in our program.
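The following sketch illustrates the matching and ranking step, assuming SURF descriptors have already been computed for the query image and for every database image. FlannBasedMatcher performs the approximate nearest neighbor search mentioned above; the distance threshold of 0.25 and the idea of ranking database images by their number of "good" matches are illustrative choices for this sketch rather than values prescribed by [1].

```cpp
#include <algorithm>
#include <utility>
#include <vector>
#include <opencv2/core/core.hpp>
#include <opencv2/features2d/features2d.hpp>

// Rank database images by how many of the query's SURF descriptors
// find a close match (small Euclidean distance) in each image.
std::vector<std::pair<int, int> > rankByGoodMatches(
        const cv::Mat& queryDescriptors,
        const std::vector<cv::Mat>& dbDescriptors,
        float maxDistance = 0.25f) {          // illustrative threshold
    cv::FlannBasedMatcher matcher;            // approximate KD-tree search
    std::vector<std::pair<int, int> > ranking; // (goodMatchCount, imageIndex)

    for (size_t i = 0; i < dbDescriptors.size(); ++i) {
        std::vector<cv::DMatch> matches;
        matcher.match(queryDescriptors, dbDescriptors[i], matches);

        // Count matches whose descriptor distance is small enough.
        int good = 0;
        for (size_t m = 0; m < matches.size(); ++m)
            if (matches[m].distance < maxDistance) ++good;

        ranking.push_back(std::make_pair(good, static_cast<int>(i)));
    }
    // Images with the most good matches come first.
    std::sort(ranking.rbegin(), ranking.rend());
    return ranking;
}
```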
3 Speeded Up Robust Features
Because we focus on using SURF for image retrieval rather than on how to detect and describe it, we do not give details about its mathematical foundation and other specialized knowledge. For a complete description of SURF, please consult Herbert Bay et al. [1].
3.1 SURF’s properties
SURF was proposed by Bay et al. in 2008 and has since been used in a wide range of CV applications. The performance of SURF is comparable to that of state-of-the-art detectors and descriptors, while SURF is much faster. SURF builds on the best detectors and descriptors to date (a Hessian matrix based detector and a distribution based descriptor) and simplifies them to achieve high speed while keeping performance essentially unchanged. As stated by the authors, SURF has a high repeatability score, is distinctive, and is robust to image deformations. The following figures, taken from Bay's paper, show the performance of SURF on some typical benchmark databases.
Figure 2: Repeatability score for image rotation of up to 180 degrees
Fast-Hessian is the more accurate detector and is the one used for the SURF detector in OpenCV.
Figure 3: Repeatability score for three different images when the scale and blur of those images are changed
4 Image search in our system
OpenCV provides a strong implementation of SURF with optimized algorithms for detecting, describing and matching SURF keypoints. In OpenCV, keypoints of all types are represented by instances of the KeyPoint class. The classes relating to detecting, describing and matching SURF keypoints are SurfFeatureDetector, SurfDescriptorExtractor, FlannBasedMatcher and DMatch.
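The snippet below is a minimal sketch of how these classes fit together for a single image, assuming OpenCV 2.4, where the SURF classes live in the nonfree module. The 200 by 300 resize matches the preprocessing described in the next paragraph, while the hessian threshold of 400 is only an illustrative starting value; in our system the detector parameters are tuned per image to control the number of keypoints.

```cpp
#include <iostream>
#include <vector>
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/features2d/features2d.hpp>
#include <opencv2/nonfree/features2d.hpp>   // SURF classes in OpenCV 2.4.x

int main() {
    cv::Mat cover = cv::imread("book_cover.jpg", CV_LOAD_IMAGE_GRAYSCALE);
    if (cover.empty()) return 1;
    cv::resize(cover, cover, cv::Size(200, 300));   // normalize cover size

    // Step 1: detect SURF keypoints (hessian threshold is illustrative).
    cv::SurfFeatureDetector detector(400);
    std::vector<cv::KeyPoint> keypoints;
    detector.detect(cover, keypoints);
    std::cout << "Detected " << keypoints.size() << " keypoints" << std::endl;

    // Step 2: compute one descriptor row per keypoint.
    cv::SurfDescriptorExtractor extractor;
    cv::Mat descriptors;
    extractor.compute(cover, keypoints, descriptors);

    // The descriptors of two images can then be compared with
    // FlannBasedMatcher, which returns DMatch objects (see the earlier sketch).
    return 0;
}
```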
We first resize all images to 200 by 300 pixels, then run SurfFeatureDetector on every image to detect keypoints. For each image, we try several settings of the detector parameters to keep the number of keypoints between 150 and 200. Experiments show that 150 to 200 keypoints per image give good matching results while taking less time to compute. The descriptor vector for each image is then computed using SurfDescriptorExtractor. We store the keypoints, descriptor vectors and associated information of the image (author,