BUILDING A BOOK RECOGNITION PROGRAM ON ANDROID SMARTPHONE
VIETNAM NATIONAL UNIVERSITY, HANOI
UNIVERSITY OF ENGINEERING AND TECHNOLOGY
ENTRY FOR THE "STUDENT SCIENTIFIC RESEARCH" AWARD
2012
Project title:
BUILDING A BOOK RECOGNITION PROGRAM ON ANDROID SMARTPHONE
Students:
Hoàng Thanh Tùng, class K53CA-KHMT
Nguyễn Hữu Cường, class K53CA-KHMT
Đỗ Tất Thắng, class K53CA-KHMT
Faculty: Information Technology
Supervisor: Dr. Nguyễn Phương Thái
HANOI, 2012
Contents
Abstract
Chapter 1: Introduction
1 Objective
2 Related works
Chapter 2: Approaches to image retrieval
1 Image meta search
2 Content based image retrieval
3 Approach for our system
Chapter 3: Image retrieval with OpenCV
1 Overview of OpenCV library
2 Interest points in Image Retrieval
3 Speeded Up Robust Features
3.1 SURF's properties
4 Image search in our system
Chapter 4: Building the system
1 Overview of system architecture
2 Handling client's requests
2.1 Search for book information
2.2 Search for related book
2.3 Search for nearby bookshop
2.4 Rate a book
Chapter 5: Experimental result
Chapter 6: Future works
Chapter 7: Conclusion
References
Links to software and sites
Abstract
Information is an essential human need. Searching for information with search engines such as Google, Bing and Yahoo is familiar to everyone. However, searching with text can be tedious and sometimes does not return the correct information. Recently, Google introduced a new image search engine that can find images similar to an image uploaded by the user. We have developed a system that allows people to use images of book covers as queries for information about the books. Our system provides an easier and more engaging way of searching. The system was built for users with camera-equipped mobile devices. It applies modern Content Based Image Retrieval techniques to provide a fast and reliable search engine. Experiments on our database show that the system has high accuracy and is robust to many kinds of graphical deformation.
Chapter 1: Introduction
1 Objective
Currently, the rapid development of the Internet is leading to an exponential growth in the amount of data. Automatically searching and retrieving data from large databases is one of the most important research fields. Image retrieval (IR) is the problem of finding and retrieving images from a digital image database. Traditional methods use the metadata associated with images, such as captions and keywords, to classify images and perform the retrieval task. The metadata are often created manually, so these methods cannot be applied to large databases. Content based image retrieval (CBIR) is a completely new approach to the IR problem. In CBIR, images are classified and retrieved based on their actual content, such as lines, colors, shapes, textures and any other information that can be derived from the image. CBIR can therefore provide better classification and more reliable search results. A CBIR system also eliminates the need for human effort in annotating images.
Using CBIR on computers has become familiar to many people, while there are only a few CBIR programs for mobile devices. We aim to build a simple but helpful CBIR system for portable device users. Our program, however, is not a trivial CBIR system that only returns the closest matches of the input image. At the time this report is written, a user of the system can take an image of a book cover and send it to the server to receive information about the book such as its title, author, publisher, price and reviews. Users can also search for related books and bookshops and give their opinions about books.
The experimental results show that the accuracy of the system is high for clear, large input images. The results are still acceptable when the image is noisy or small, or when only a part of the cover is captured.
2 Related works
Google Goggles allows users to search for information about scenes, books, and other objects they see simply by taking and uploading photos of them. The program is accurate for high-quality input images, but its accuracy decreases dramatically when the image is noisy, taken from an unusual viewpoint, or captured in poor lighting conditions. Goggles is also much slower than the traditional search engine provided by Google (it could take ten seconds to complete a search on a phone with a 3G connection).
Chapter 2: Approaches to image retrieval
As discussed in Chapter 1, there are two main approaches to the IR problem. The traditional approach uses metadata to perform the search, while CBIR uses information extracted from the image itself. In this chapter we look deeper at each approach to see its advantages and disadvantages.
1 Image meta search
In a meta search system, metadata are usually in text form and are indexed and stored in a database. The data are external to the images and are attached to them to make meta search possible. Image search in these systems is performed in the same way as in other text search engines. The input to the system is a description of the query image (the description may be created by the user or derived from the context of the image). The search engine compares the description with the metadata of the images in the database to find the closest matches and returns the results in descending order of relevance.
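To make this comparison step concrete, here is a minimal, self-contained sketch (not part of the system described in this report) of ranking images by metadata: the relevance score is simply the number of query words appearing in each image's stored description, and all file names and descriptions are invented for illustration.

```cpp
#include <algorithm>
#include <cctype>
#include <iostream>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Split a text description into a set of lowercase words.
static std::set<std::string> tokenize(const std::string& text) {
    std::set<std::string> words;
    std::istringstream stream(text);
    std::string word;
    while (stream >> word) {
        std::transform(word.begin(), word.end(), word.begin(), ::tolower);
        words.insert(word);
    }
    return words;
}

int main() {
    // Toy metadata database: image file -> textual description (invented values).
    std::map<std::string, std::string> metadata = {
        {"img001.jpg", "red book cover computer vision textbook"},
        {"img002.jpg", "landscape mountain lake sunrise"},
        {"img003.jpg", "android phone camera book scanning"}
    };

    // The user-supplied description of the query image.
    std::set<std::string> queryWords = tokenize("book cover android");

    // Score each image by the number of query words found in its metadata.
    std::vector<std::pair<int, std::string> > ranked;
    for (const auto& entry : metadata) {
        std::set<std::string> descWords = tokenize(entry.second);
        int score = 0;
        for (const std::string& w : queryWords)
            if (descWords.count(w)) ++score;
        ranked.push_back(std::make_pair(score, entry.first));
    }

    // Return results in descending order of relevance.
    std::sort(ranked.rbegin(), ranked.rend());
    for (const auto& r : ranked)
        std::cout << r.second << "  score = " << r.first << "\n";
    return 0;
}
```

Real meta search engines use far more sophisticated text indexing and ranking, but the flow is the same: compare the query description with the stored metadata and return the results in descending order of relevance.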
One advantage of meta search systems is that powerful existing text search engines can be reused to perform image retrieval. Because indexing and searching in a text database is much faster than in a multimedia database, this approach has better time performance than the CBIR approach. The most common search engines today, including Google, Bing and Yahoo, use this approach to provide image search.
A big disadvantage of this approach is that the metadata are external to the image and may not precisely describe its actual content. Poor metadata produce a large number of irrelevant images in the search results. Although many methods for creating metadata automatically have been proposed (e.g., LDA for image retrieval, see [2], [3], [4]), the results achieved have not satisfied users with high expectations. The quality of the search results relies largely on the quality of the input descriptions, which are often created by users. Users may not always give good descriptions of their images, and the accuracy of the system decreases accordingly. Furthermore, requiring users to describe the images makes searching more complex and less interesting. Thus, a more accurate and user-friendly search engine is desirable.
2 Content based image retrieval
The CBIR approach makes use of modern Computer Vision (CV) techniques to solve the image retrieval problem. Unlike meta search systems, CBIR systems do not store the metadata of images but information derived from the images themselves, including color, intensity, shapes, textures, lines, interest points and other useful features. Different CBIR systems select different features to store and use different algorithms for classifying and searching images. When users want to search for images, they just need to provide their image and the system will automatically extract the relevant features from it and compare them with those of the database images to find the best matches. The results, thus, are graphically related to the input. This helps CBIR systems remove a large number of garbage results that are normally produced by meta search systems. CBIR systems also allow people to draw an approximation of their image and use that as the input to the search engine. This breaks the limit of traditional IR systems, where images can only be described by words.
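As a simple, concrete illustration of comparing images by content rather than by attached text, the sketch below compares the color histograms of two images with OpenCV. It uses only one of the features listed above (color) and is not the method used in our system, which relies on SURF keypoints as described in Chapter 3; the file names are placeholders and the API shown assumes the OpenCV 2.x C++ interface.

```cpp
#include <iostream>
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>

// Compute a normalized hue-saturation histogram as a simple content descriptor.
static cv::Mat hueSatHistogram(const cv::Mat& bgr) {
    cv::Mat hsv;
    cv::cvtColor(bgr, hsv, CV_BGR2HSV);

    int histSize[] = {30, 32};             // 30 hue bins, 32 saturation bins
    float hueRange[] = {0, 180};
    float satRange[] = {0, 256};
    const float* ranges[] = {hueRange, satRange};
    int channels[] = {0, 1};

    cv::Mat hist;
    cv::calcHist(&hsv, 1, channels, cv::Mat(), hist, 2, histSize, ranges);
    cv::normalize(hist, hist, 0, 1, cv::NORM_MINMAX);
    return hist;
}

int main() {
    // Placeholder file names; any two images will do.
    cv::Mat query = cv::imread("query.jpg");
    cv::Mat candidate = cv::imread("candidate.jpg");
    if (query.empty() || candidate.empty()) return 1;

    // A correlation score close to 1 means the color content is similar.
    double similarity = cv::compareHist(hueSatHistogram(query),
                                        hueSatHistogram(candidate),
                                        CV_COMP_CORREL);
    std::cout << "Color similarity: " << similarity << std::endl;
    return 0;
}
```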
CBIR systems, however, cannot completely replace the old meta search systems. Current algorithms for extracting visual features from images and searching databases of those features are still very expensive in both time and space. As a result, CBIR is not efficient for huge databases or for systems with a large number of queries per time interval. Besides, searching for visually related images does not always give good results. When users want to find different images related to certain events or people, CBIR is not suitable, because images that are graphically similar may not relate to those events or people.
3 Approach for our system
As we have discussed, each approach has its own advantages and drawbacks. We have selected CBIR as the method for developing our system. There are a number of reasons for this decision.
Firstly, our primary goal is to create an image search program for mobile users, so we need an interactive way of searching and sharing information. Searching with text is very common and somewhat boring. With our program, people can take images with their smartphones or digital cameras and use those images to search for the information they need.
Secondly, we want to create a system that can give users information about things they cannot identify or describe. This circumstance occurs when people travel to unfamiliar places and see things they have never encountered before. Our system can provide reliable information by searching for similar images and returning the information associated with those images to the user.
Finally, while there are many meta search programs, there are only a few CBIR programs for mobile devices. Hence, developing a CBIR program for those devices is promising. Android is currently the most popular operating system for mobile devices such as smartphones and tablets, so we have chosen Android as the platform for the client program.
Chapter 3: Image retrieval with OpenCV
1 Overview of OpenCV library
OpenCV [9] is an open source library for real-time computer vision developed by Intel and currently supported by Willow Garage. OpenCV offers many advanced functions for computer vision and image processing and is released under the BSD license. The library is available for Linux, Windows, Mac OS and Android. It was originally written in C, but C#, Java, Python and Ruby wrappers for OpenCV are now available to users.
According to Willow Garage, OpenCV has over 500 functions with more than 2,500 optimized algorithms. OpenCV's functions can be categorized as follows:
Figure 1: Overview of OpenCV's functions
Because of this rich collection of functions, OpenCV is used by more than 40,000 people for both academic and commercial purposes.
We have used OpenCV for detecting and describing interest points in images and for matching the resulting sets of interest points. A set of keypoints carries information about the image, and we can expect two similar images to have two similar sets of keypoints. Therefore, by comparing the two sets, we can measure the difference between the two images. While OpenCV can detect many types of interest points, we have selected Speeded Up Robust Features (SURF). The main properties of SURF are given in the next part of this chapter.
2 Interest points in Image Retrieval
According to Herbert Bay et al. [1], the process of finding similar images in a database consists of three steps. First, we need to detect interest points at distinctive locations in the image; these points could be corners, blobs or T-junctions. The property we value most in an interest point detector is its repeatability: a good detector should be able to reliably find the same physical interest points under different viewing conditions. The next step is describing the neighborhood of each detected interest point by a feature vector. The two most important properties of this feature vector are distinctiveness and robustness. Distinctiveness means that feature vectors of two different images are different. The feature vector computed from a noisy, transformed version of an image should not be too different from the vector of the original image; this property is called robustness. The last step is matching the descriptor vectors of different images. We measure the dissimilarity between two vectors by the distance between them (the distance could be the Mahalanobis or Euclidean distance). Since the distance between the vectors only roughly reflects the distance between images, we need some other mechanism to refine the results and then rank the images in the database. Due to the curse of dimensionality, matching high dimensional vectors is still a time consuming task, and various techniques have been developed for it. OpenCV provides an approximate but fast algorithm for this problem called Best Bin First [5, 7], which we have used in our program.
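The following sketch illustrates the matching and ranking step, assuming SURF descriptors have already been computed for the query image and for every database image. FlannBasedMatcher performs the approximate nearest neighbor search mentioned above; the distance threshold of 0.25 and the idea of ranking database images by their number of "good" matches are illustrative choices for this sketch rather than values prescribed by [1].

```cpp
#include <algorithm>
#include <utility>
#include <vector>
#include <opencv2/core/core.hpp>
#include <opencv2/features2d/features2d.hpp>

// Rank database images by how many of the query's SURF descriptors
// find a close match (small Euclidean distance) in each image.
std::vector<std::pair<int, int> > rankByGoodMatches(
        const cv::Mat& queryDescriptors,
        const std::vector<cv::Mat>& dbDescriptors,
        float maxDistance = 0.25f) {          // illustrative threshold
    cv::FlannBasedMatcher matcher;            // approximate KD-tree search
    std::vector<std::pair<int, int> > ranking; // (goodMatchCount, imageIndex)

    for (size_t i = 0; i < dbDescriptors.size(); ++i) {
        std::vector<cv::DMatch> matches;
        matcher.match(queryDescriptors, dbDescriptors[i], matches);

        // Count matches whose descriptor distance is small enough.
        int good = 0;
        for (size_t m = 0; m < matches.size(); ++m)
            if (matches[m].distance < maxDistance) ++good;

        ranking.push_back(std::make_pair(good, static_cast<int>(i)));
    }
    // Images with the most good matches come first.
    std::sort(ranking.rbegin(), ranking.rend());
    return ranking;
}
```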
3 Speeded Up Robust Features
Because we focus on using SURF for image retrieval rather than on how to detect and describe it, we do not give details about its mathematical foundation and other specialized knowledge. For a complete description of SURF, please consult Herbert Bay et al. [1].
3.1 SURF’s properties
SURF was proposed by Bay et al. in 2008 and has since been used in a wide range of CV applications. The performance of SURF is comparable to that of state-of-the-art detectors and descriptors, while SURF is much faster. SURF builds on the best detectors and descriptors to date (a Hessian matrix based detector and a distribution based descriptor) and simplifies them to achieve high speed while keeping performance essentially unchanged. As stated by the authors, SURF has a high repeatability score, is distinctive, and is robust to image deformations. The following figures, taken from Bay's paper, show the performance of SURF on some typical benchmark databases.
Figure 2: Repeatability score for image rotation of up to 180 degrees
Fast-Hessian is the more accurate detector and is the one used for the SURF detector in OpenCV.
Figure 3: Repeatability score for three different images when the scale and blur of those images are changed
4 Image search in our system
OpenCV provides a strong implementation of SURF with optimized algorithms for detecting, describing and matching SURF keypoints. In OpenCV, keypoints of all types are represented by instances of the KeyPoint class. The classes relating to detecting, describing and matching SURF keypoints are SurfFeatureDetector, SurfDescriptorExtractor, FlannBasedMatcher and DMatch.
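The snippet below is a minimal sketch of how these classes fit together for a single image, assuming OpenCV 2.4, where the SURF classes live in the nonfree module. The 200 by 300 resize matches the preprocessing described in the next paragraph, while the hessian threshold of 400 is only an illustrative starting value; in our system the detector parameters are tuned per image to control the number of keypoints.

```cpp
#include <iostream>
#include <vector>
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/features2d/features2d.hpp>
#include <opencv2/nonfree/features2d.hpp>   // SURF classes in OpenCV 2.4.x

int main() {
    cv::Mat cover = cv::imread("book_cover.jpg", CV_LOAD_IMAGE_GRAYSCALE);
    if (cover.empty()) return 1;
    cv::resize(cover, cover, cv::Size(200, 300));   // normalize cover size

    // Step 1: detect SURF keypoints (hessian threshold is illustrative).
    cv::SurfFeatureDetector detector(400);
    std::vector<cv::KeyPoint> keypoints;
    detector.detect(cover, keypoints);
    std::cout << "Detected " << keypoints.size() << " keypoints" << std::endl;

    // Step 2: compute one descriptor row per keypoint.
    cv::SurfDescriptorExtractor extractor;
    cv::Mat descriptors;
    extractor.compute(cover, keypoints, descriptors);

    // The descriptors of two images can then be compared with
    // FlannBasedMatcher, which returns DMatch objects (see the earlier sketch).
    return 0;
}
```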
We first resize all images to 200 by 300 pixels, then run SurfFeatureDetector on every image to detect keypoints. For each image, we try several settings of the detector parameters to keep the number of keypoints between 150 and 200. Experiments show that 150 to 200 keypoints per image give good matching results while taking less time to compute. The descriptor vector for each image is then computed using SurfDescriptorExtractor. We store the keypoints, descriptor vectors and associated information of the image (author,