Adaptive neuro fuzzy network for recommendation


ADAPTIVE NEURO-FUZZY NETWORK FOR RECOMMENDATION

In Partial Fulfillment of the Requirements of the Degree of

MASTER OF INFORMATION TECHNOLOGY MANAGEMENT

In Computer Science and Engineering

By

Mr. Nguyen Duc Anh
ID: MITM05001
International University - Vietnam National University HCMC

Acknowledgments

Throughout my thesis and its development, I could not have completed my tasks without the support and encouragement of others.

First, I would like to thank Dr. Duong Trong Hai. He was always by my side, helping me identify the main ideas of this research; this was the most important support I received.

He taught me data mining, machine learning, and related topics. Moreover, he was always willing to give me helpful advice whenever I had difficulties with my thesis.

I am grateful to my family, who encourage and motivate me to keep moving forward.

I also want to thank my colleagues and schoolmates, who supported and helped me directly and indirectly.

Plagiarism Statements

I declare that, apart from the acknowledged references, this thesis does not use language, ideas, or other original material from anyone, and has not previously been submitted to any other educational or research program or institution. I fully understand that any writing in this thesis that contradicts the above statement will automatically lead to rejection from the MITM program at the International University – Vietnam National University Ho Chi Minh City.

Copyright Statement

This copy of the thesis has been supplied on condition that anyone who consults it is understood to recognize that its copyright rests with its author and that no quotation from the thesis and no information derived from it may be published without the author’s prior consent


Table of Contents

Plagiarism Statements
Copyright Statement
This Thesis Is Based on Publications
Abstract
Chapter 1: Introduction
1.1 Motivation
1.2 Goals of the Dissertation
1.3 Overall Approach
1.4 Related Work
1.5 Thesis Outline
Chapter 2: User Behaviors-based CF Using Neuro-Fuzzy Network
2.1 Profile Modeling
2.2 Content-based Filtering Using Neuro-Fuzzy Network
Chapter 3: Experiments
3.1 Dataset Introduction
3.1.1 Overview
3.1.2 Dataset analysis
3.2 Applied ANFIS to netflix dataset
3.2.1 ANFIS Model
3.2.2 Run the testing dataset
3.3 Evaluation Methods
3.4 Practice and Result
3.4.1 Movie 329
3.4.2 Movie 30
3.4.4 Movie 2848
3.4.5 Movie 2548
3.5 Evaluation results
Chapter 4: Conclusion
References

List of Figures

Fig. 2.1.1.1 Profile generation process
Fig. 3.1.1.1 Netflix Dataset structure
Fig. 3.1.2.1 Rating-scores statistic
Fig. 3.1.2.2 The rating-scores comparison for the top 10 movies with the highest number of ratings
Fig. 3.2.1.1 The ANFIS’s structure
Fig. 3.2.1.2 The main workflow of ANFIS
Fig. 3.2.1.3 Sample of user profile Level 1
Fig. 3.2.1.4 Sample of user profile Level 2
Fig. 3.2.1.5 HyperBox dataset and PureBox dataset, before and after clustering by NCP
Fig. 3.2.1.6 Samples of PureBox clusters
Fig. 3.2.1.7 The Max-Min PureBox
Fig. 3.2.1.8 The final result of the user profile building steps
Fig. 3.2.1.9 Samples of data used for training by Perceptron
Fig. 3.3.1 Distribution of training set and testing set in the dataset
Fig. 3.4.2.1 Comparison between training data set and real dataset of movie 30
Fig. 3.4.2.2 Result of 100 samples used to test movie 30
Fig. 3.5.1 MAE and RMSE of movies 2464, 2548, 30, 2848, 329

List of Tables

Table 3.2.1.1 Samples of W computed by Perceptron for movie 329
Table 3.2.2.1 Predicted rating-scores for 5 user samples, movie 329
Table 3.4.1.1 Comparison between training data set and real dataset of movie 329
Table 3.4.2.1 Comparison between training data set and real dataset of movie 30

This Thesis Is Based on Publications

International Conference Publications (Accepted)

Duc Anh Nguyen and Trong Hai Duong, “Video Recommendation Using Neuro-Fuzzy on Social TV Environment”, International Conference on Computer Science, Applied Mathematics and Applications (ICCSAMA 2015), published in a volume of the series Advances in Intelligent Systems and Computing of Springer Verlag, indexed by ISI Proceedings, DBLP, Ulrich's, EI-Compendex, SCOPUS, Zentralblatt Math, MetaPress, Springerlink Issues in ISI-SCI journals.

International Journal Publications (Submitted)

Trong Hai Duong and Duc Anh Nguyen, “User Behaviors-based Collaborative Filtering for Video Recommendation Using Ontology-based Neuro-Fuzzy on Social TV”, ELSEVIER, 03-2015.

Abstract

Recommendation systems predict and recommend products or items that users might be interested in. Two common approaches have been proposed for building recommendation systems: content-based filtering (CBF) and collaborative filtering (CF). CBF methods predict a target user’s rating from the descriptions of items the user previously preferred. CF methods, on the other hand, predict a target user’s rating from neighbors’ ratings. In this work, we consider recommendation in the context of Social TV (STV). Watchers/users may share, comment on, rate, or tag videos in which they are interested, and each video is assumed to be watched and rated by many users. Under these assumptions, we propose a novel model-based collaborative filtering method that uses a fuzzy neural network to learn users’ social web behaviors and make video recommendations on STV. We use the Netflix dataset to evaluate the proposed method. The results show that the proposed approach is effective.

Keywords: ANFIS, Ontology, Smart TV, Video, Recommendation system, Neural network

Chapter 1: Introduction

1.1 Motivation

Recommendation is a subclass of information filtering that uses data on past user preferences to predict possible future likes and interests. Several approaches are applied in recommendation systems, such as collaborative, demographic-based, content-based, knowledge-based, and hybrid recommendation.

Prior collaborative filtering (CF) methods rely on neighbors’ ratings to predict a target user’s rating. When a target user has no neighbors, the results of traditional CF degrade. To solve this problem, we propose a novel model-based collaborative filtering method that uses a fuzzy neural network to learn users’ social web behaviors for video recommendation on STV.

1.2 Goals of the Dissertation

Our goal in this thesis is to solve the lack-of-neighbors problem in traditional CF. To that end, we predict the unknown rating of a target user for a target video by adjusting user profiles and rating-scale values using ANFIS.

1.3 Overall Approach

The idea of the proposed method is to fit users’ social web behavior to their own ratings of a target video. In particular, a user profile is learned from the user’s social web behavior and is represented by a vector. For each target video, we collect the profiles of all users who rated that video. Each user’s profile is treated as an input vector of the fuzzy neural network, and his/her corresponding rating-score as its output value. The trained neural network is then used to predict a user’s rating of the target video.

We use the Netflix dataset to evaluate the proposed method.

1.4 Related Work

The trend of using online social networks to talk about TV programs and share opinions with others is increasing. This is reflected in the spread of platforms designed for Social TV [1]. NoTube [1] brings the social web and TV closer to consumers. Social TV can provide users’ social context, which is used to personalize TV programs and videos in both content-based and collaborative filtering manners. Content-based filtering (CBF) [4] relies on the descriptions of items a target user previously preferred and recommends items whose content is similar to those, without directly relying on the preferences of other users. Collaborative filtering (CF) [5] relies on the rating information of a large group of users and recommends items to a target user based on the items that similar users have preferred in the past, without relying on any information about the items themselves other than their ratings. According to the algorithm used, CF can be grouped into two types:

(a) Memory-based collaborative filtering methods recommend items that were previously preferred by users who share similar preferences with the target user [6]. These algorithms require all ratings, items, and users to be stored in memory.

(b) Model-based collaborative filtering methods recommend items based on models that are trained on the collection of ratings to identify patterns in the input data [7].

Memory-based collaborative filtering stores the training data in memory and delays processing until a recommendation is requested, whereas model-based collaborative filtering tries to generalize a model from the training data before making recommendations. The advantage of memory-based methods is that they have fewer parameters to tune, while the disadvantage is that they cannot deal with data scarcity in a principled manner [9].
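To make the memory-based case concrete, the following is a minimal sketch of user-based prediction with cosine similarity. The function names, the toy ratings, and the choice of cosine similarity are illustrative assumptions, not taken from the cited works. It also shows the situation that motivates this thesis: when a user has no neighbors, no prediction can be made.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity over the items that two users have both rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def predict_memory_based(ratings, user, item):
    """Similarity-weighted average of neighbors' ratings; None if no neighbor exists."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine(ratings[user], their)
        if sim > 0:
            num += sim * their[item]
            den += sim
    return num / den if den else None

# Toy data (made up): u3 shares no rated movie with anyone else.
ratings = {
    "u1": {"m1": 5, "m2": 3},
    "u2": {"m1": 4, "m2": 3, "m3": 4},
    "u3": {"m4": 2},
}
print(predict_memory_based(ratings, "u1", "m3"))  # u2 is the only neighbor
print(predict_memory_based(ratings, "u3", "m3"))  # no neighbor: None
```

In this toy data, u1 inherits the rating of its single neighbor u2, while u3 receives no prediction at all.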

In Social TV, recommendation systems have been developed to help users access TV programs that match their preferences by learning from viewing history data and mapping social users’ preferences to TV program attributes [15, 16, 9]. The authors of [9] proposed a hybrid approach combining content-based methods with collaborative filtering for TV program recommendation. To reduce the computational overload of collaborative filtering, the singular value decomposition technique [17] is applied to reduce the dimension of the user-item representation; this low-rank representation is then used to generate item-based predictions, which has shown good behavior in the TV domain. The authors of [10] proposed a framework for adaptive news recommendation in social media that utilizes users’ comments. The comments are collected to build a topic profile using a weighted graph. To generate the weighted importance of topics, the standard TF/IDF model [11] and a variant of the PageRank algorithm [12] are applied. Once constructed, the topic profile can be used to select relevant news from a collection of news articles in the database through a retrieval module that combines the strengths of two state-of-the-art news retrieval approaches, the time factor [13] and the language model [14].

There is much research on recommendation systems. One example is “Neural Network Modeling for an Intelligent Recommendation System Supporting SRM for Universities in Thailand” [21], by Kanokwan Kongsakun and Chun Che Fung. This recommendation system predicts and recommends appropriate courses for students, thereby increasing their chance of success. The proposal is based on students' historic records and final results. The authors used neural network techniques to find the structures and relationships between the data and the final GPA of freshmen in subjects of interest. The authors [21] concluded that a recommendation system is a useful service.

In another study, “A Hybrid Latent Variable Neural Network Model for Item Recommendation” [22], the authors proposed a neural network model with latent input variables, named Latent Neural Network (LNN), as a hybrid of the CF and CBF approaches. The strong point of LNN is that it addresses the cold-start problem, but its complexity requires more training time than other methods. In addition, LNN is capable of modeling higher-order dependencies and nonlinearities in the data; however, the data in the MovieLens dataset, the Netflix dataset, and similar datasets are inherently sparse. Thus, nonlinear models such as their proposal are not well suited to that kind of data.

Another method was proposed by Jianfeng Gao, Patrick Pantel, Michael Gamon, Xiaodong He, and Li Deng [23] in “Modeling Interestingness with Deep Neural Networks”. This system recommends target documents that users may be interested in, based on an analysis of the documents they have read. The authors [23] applied their proposal to two interestingness tasks: automatic highlighting and contextual entity search.

Another interesting proposal is “A Hybrid Movie Recommender System Based on Neural Networks” [24], in which the authors proposed a hybrid filtering approach combining CF and CBF. Their model achieved 82% successful recommendations overall, although the authors noted that, strangely, precision falls once a user has evaluated many movies. They concluded that, as a watcher/user keeps evaluating movies, the user may cover a wide range of movies that share common characteristic features (kinds, stars, synopsis) while being totally different and, consequently, differently evaluated [24].

1.5 Thesis Outline

Chapter 1, the introduction, presents the problem I have been researching and the parameters considered in this thesis.

Chapter 2, “User Behaviors-based CF Using Neuro-Fuzzy Network”, analyzes in detail the relevant theories applied in this thesis, such as user modeling, ANFIS, TF/IDF, and the Perceptron.

Chapter 3, “Experiments”, applies the proposed ANFIS to a video recommendation system and introduces the evaluation methods I used to evaluate the results. In this thesis, I used Netflix as the sample dataset.

Finally, Chapter 4 presents the conclusion.

Chapter 2: User Behaviors-based CF Using Neuro-Fuzzy Network

2.1 Profile Modeling

A user profile can contain static and dynamic information. The static profile includes permanent user information such as name, age, sex, and educational background, whereas the dynamic profile covers less permanent characteristics such as the user’s current motions and locations; above all, it includes user interest, which changes often. Here, we consider a profile with only user interest, namely the user’s social web behavior, such as posts, comments, shares, ratings, preferences, and tags. The user profile is represented by a weighted vector defined as follows:

Definition 1 (Profile Feature). Let $p_i$ be the profile of a user $u_i$. The profile feature $p_i$ is defined as $p_i = \{(c_1, w_1), (c_2, w_2), \ldots, (c_n, w_n)\}$, a set of pairs of a concept $c_j$ and its weight $w_j$.

The process used to generate a user’s profile is presented in Fig. 2.1.1.1. The tf/idf weight (term frequency-inverse document frequency) is a weight often used in information retrieval and text mining. It is a statistical measure of how important a word is to a document in a collection or corpus. Here we use the traditional vector space model (tf/idf) to define the features of the documents [18].

Fig. 2.1.1.1 Profile generation process
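As an illustration of how a weighted profile vector could be derived with tf/idf, the sketch below treats the concepts collected from each user's behavior as one document. The function name, the toy data, and the exact weighting variant (relative term frequency times the natural logarithm of inverse document frequency) are assumptions; the thesis does not specify which tf/idf variant it uses.

```python
from collections import Counter
from math import log

def tfidf_profiles(user_concepts):
    """Map each user to {concept: tf-idf weight}; one user's concept list is one document."""
    n_users = len(user_concepts)
    # Document frequency: in how many users' documents each concept appears.
    df = Counter()
    for concepts in user_concepts.values():
        df.update(set(concepts))
    profiles = {}
    for user, concepts in user_concepts.items():
        counts = Counter(concepts)
        total = sum(counts.values())
        profiles[user] = {
            c: (n / total) * log(n_users / df[c]) for c, n in counts.items()
        }
    return profiles

# Toy behavior data (made up): concepts extracted from posts, tags, ratings, etc.
behaviours = {
    "u1": ["drama", "drama", "comedy"],
    "u2": ["comedy", "action"],
    "u3": ["comedy", "drama"],
}
profiles = tfidf_profiles(behaviours)
print(profiles["u1"])
```

A concept that every user shares ("comedy" here) gets weight 0, so the profile emphasizes what distinguishes one user from the others.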

2.2 Content-based Filtering Using Neuro-Fuzzy Network

Let $M_i$ be the set of movies that user $u_i$ is interested in (he/she watched and rated them). The user’s profile can be considered as a feature vector $p_i = \{(g_1, w_1), (g_2, w_2), \ldots, (g_n, w_n)\}$, where $g_t$ is a genre from the movies in $M_i$ and $w_t$ is its weight, generated by using the tf/idf vector space model. We assume that each movie $m_j$, $j = 1 \ldots k$, can also be of interest to $n$ users $U_j = \{u_1, u_2, \ldots, u_n\}$; the rating-score set of a movie $m_j$ with respect to $U_j$ is denoted by $R_j = \{r_{1j}, r_{2j}, \ldots, r_{nj}\}$. For each movie $m_j$, we consider a black-box model expressing a mathematical relationship between the input feature vectors of the users in $U_j$, denoted by $\{p_1, p_2, \ldots, p_n\}$, where $p_i$ is the feature vector of user $u_i$ in the data set, and the rating-score $r_{ij}$ from user $u_i$ to movie $m_j$ as the output. This work can be seen as a system-identification process, in which the model works as a mathematical function $f$ expressed by the following mapping:

$$f: p_i \mapsto r_{ij}, \quad i = 1 \ldots n$$
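The input-output pairs described above can be assembled per target movie roughly as follows. This is a sketch under stated assumptions: the tuple layout of `ratings`, the dictionary form of `profiles`, the fixed genre order, and the sample values are all hypothetical and chosen only for illustration.

```python
def training_set_for_movie(movie_id, ratings, profiles, genres):
    """Return (inputs, targets): profile vectors and rating-scores of users who rated movie_id."""
    inputs, targets = [], []
    for user_id, rated_movie, score in ratings:
        # Keep only ratings of the target movie by users with a known profile.
        if rated_movie != movie_id or user_id not in profiles:
            continue
        profile = profiles[user_id]
        # Fixed genre order turns the {genre: weight} profile into a vector.
        inputs.append([profile.get(g, 0.0) for g in genres])
        targets.append(float(score))
    return inputs, targets

# Toy data (made up): (user_id, movie_id, rating-score) triples and tf/idf profiles.
ratings = [("u1", 329, 4), ("u2", 329, 5), ("u2", 30, 2), ("u3", 329, 3)]
profiles = {"u1": {"drama": 0.27}, "u2": {"action": 0.55, "drama": 0.1}}
genres = ["action", "drama"]

inputs, targets = training_set_for_movie(329, ratings, profiles, genres)
print(inputs, targets)
```

Each target movie gets its own training set, so a separate model is fitted per movie.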

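The FIS can be summarized as follows. It is built on the algorithm for establishing an adaptive neuro-fuzzy system, ENFS [3]. Using a data-driven method, shared features or characteristics of the object are expressed by hyperbox-typed data clusters, which serve as the structure upon which fuzzy sets and membership functions are established to build the FIS. In the FIS, the fuzzy inference rules are built from clauses describing fuzzy relationships of MISO (multi-input, single-output) type, as follows:

$$R^{(k)}: \text{IF } x_i \text{ is } B^{(k)} \text{ THEN } y_i \text{ is } y_i^{(k)}, \quad k = 1 \ldots M \quad (2)$$

where $B^{(k)}$ are the linguistic variables expressing the result of the clustering process; $\mu_{B^{(k)}}(x_i)$, $k = 1 \ldots M$, is the maximum membership value of the sample $x_i$ in the data clusters labeled $k$, which is used to establish the corresponding hyperbox value of this sample; $R^{(k)}$ is the $k$-th rule; and $y_i^{(k)}$ is the rule's output value, which is used to calculate the predicted value of the sample.

Consider the set of patterns covered by the $h$-th min-max hyperbox $HB_h$. $HB_h$ is determined by two vertices, the max vertex $\omega_h = [\omega_{h1}, \omega_{h2}, \ldots, \omega_{hn}]$ and the min vertex $v_h = [v_{h1}, v_{h2}, \ldots, v_{hn}]$, where $\omega_{hj} = \max(x_{ij})$ and $v_{hj} = \min(x_{ij})$ over the patterns $x_i$ covered by $HB_h$. If all patterns covered by $HB_h$ carry the same cluster label $m$, $HB_h$ is considered a pure hyperbox labeling $m$, denoted $pHB_h^{(m)}$. An HB can be considered a crisp frame on which different types of membership functions (MFs) can be adapted. Here, the original Simpson’s MF is adopted, in which the slope outside the HB is set by the value of the fuzziness parameter $\gamma$:

$$\mu_{pHB_t^{(m)}}(x_i) = \frac{1}{n} \sum_{j=1}^{n} \left[ 1 - f(x_{ij} - \omega_{tj}, \gamma) - f(v_{tj} - x_{ij}, \gamma) \right] \quad (3)$$

where $f(z, \gamma) = 1$ if $z\gamma > 1$, $f(z, \gamma) = z\gamma$ if $0 \le z\gamma \le 1$, and $f(z, \gamma) = 0$ if $z\gamma < 0$; $t = 1 \ldots q_m$; and $q_m$ is the number of pure hyperboxes labeling $m$. Several pHBs can be associated with the same cluster label $m$, so the overall input MF, $\mu_{B^{(m)}}(x_i)$, is calculated as follows:

$$\mu_{B^{(m)}}(x_i) = \max_{t = 1 \ldots q_m} \mu_{pHB_t^{(m)}}(x_i) \quad (4)$$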
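The hyperbox membership computation can be sketched as follows, assuming the commonly cited form of Simpson's membership function for min-max hyperboxes and assuming that the overall membership of a label is the maximum over its pure hyperboxes. The function names and sample values are illustrative.

```python
def ramp(z, gamma):
    """Threshold function: 0 when z*gamma < 0, 1 when z*gamma > 1, z*gamma in between."""
    return min(1.0, max(0.0, z * gamma))

def hyperbox_membership(x, v_min, w_max, gamma):
    """Membership of point x in the hyperbox with min vertex v_min and max vertex w_max."""
    n = len(x)
    return sum(
        1.0 - ramp(xj - wj, gamma) - ramp(vj - xj, gamma)
        for xj, vj, wj in zip(x, v_min, w_max)
    ) / n

def cluster_membership(x, boxes, gamma):
    """Overall membership for one label: maximum over its pure hyperboxes (assumed)."""
    return max(hyperbox_membership(x, v, w, gamma) for v, w in boxes)

# One label with two pure hyperboxes in a 2-D input space (made-up values).
boxes = [([0.0, 0.0], [1.0, 1.0]), ([2.0, 2.0], [3.0, 3.0])]
print(hyperbox_membership([0.5, 0.5], *boxes[0], gamma=2.0))  # inside the box
print(hyperbox_membership([1.2, 0.5], *boxes[0], gamma=2.0))  # slightly outside
print(cluster_membership([2.5, 2.5], boxes, gamma=2.0))
```

A point inside a hyperbox has full membership; outside the box, membership falls off linearly at a rate set by `gamma`.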

The process of the ANFIS can be summarized as follows:

Choose the number of neurons $N$ of the hidden layer.

Step 1. Separate the data set $\{(x_i, y_i),\ i = 1 \ldots k\}$ (1) to build $M$ data clusters.

Using the algorithm for partitioning the data space, PDS [2], the given data set (1) is separated into $M$ hyperbox-typed data clusters in the input space and $M$ hyperplanes in the output data space, where $M$ is the optimal number of data clusters established by the clustering process.

Step 2. Build a new data set, named NN-set, for training the NN.

The NN-set has $k$ samples, with the input-output samples given by (1).

Step 3. Train the NN.

The NN-set is used to train the NN with the Levenberg-Marquardt algorithm:

- Calculate the values of the MFs based on equations (3) and (4);

- Calculate the output of the neuro-fuzzy network from these MF values.

Step 4. Check the stopping condition.

Calculate the error $E_N$ between the outputs in the NN-set and the corresponding outputs of the NN:

- If $E_N \le [E]$: the FNS structure based on the NN is chosen;

- If $E_N > [E]$: set $N = N + 1$ and return to Step 3.
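The outer loop of Steps 3 and 4 can be sketched as below. Several parts are assumptions made only for illustration: the network output is taken to be a membership-weighted average of rule outputs, the error is taken to be the mean squared error, and `train` is a hypothetical callable standing in for Levenberg-Marquardt training that returns the network's predictions for a given hidden-layer size N.

```python
def fuzzy_output(memberships, rule_outputs):
    """Membership-weighted average of the rule outputs (assumed defuzzification)."""
    total = sum(memberships)
    if total == 0:
        return sum(rule_outputs) / len(rule_outputs)
    return sum(m * y for m, y in zip(memberships, rule_outputs)) / total

def mean_squared_error(targets, predictions):
    """Assumed error measure E_N between target and predicted outputs."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)

def grow_until_accurate(train, targets, tolerance, n_start=1, n_max=20):
    """Increase the hidden-layer size N until E_N <= tolerance (Steps 3 and 4)."""
    error = float("inf")
    for n in range(n_start, n_max + 1):
        predictions = train(n)                       # Step 3: train with N neurons
        error = mean_squared_error(targets, predictions)
        if error <= tolerance:                       # Step 4: stopping condition
            return n, error
    return n_max, error

# Toy stand-in for training (made up): predictions improve as N grows.
targets = [4.0, 2.0]
toy_train = lambda n: [t + 1.0 / n for t in targets]

print(fuzzy_output([0.8, 0.2], [4.0, 2.0]))
print(grow_until_accurate(toy_train, targets, tolerance=0.1))
```

The cap `n_max` is an added safeguard so the search terminates even if the tolerance is never reached.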


Chapter 3: Experiments

3.1 Dataset Introduction

3.1.1 Overview

The sample dataset used is the Netflix dataset, which contains 14,707,483 ratings given by 459,340 anonymous Netflix customers to 17,770 movies from 1999-11-11 to 2005-12-31. The rating scale has 5 values: 5 is excellent, 4 is very good, 3 is good, 2 is fair, and 1 is poor.

Fig. 3.1.1.1 Netflix Dataset structure

There are 2 primary tables, named “rating” and “movie_info”. The table “movie_info” has 9 columns: movie_id, genre, rating, director, writers, star, image_link, host, content_rating.

The table “rating” has 4 columns: user_id, movie_id, rating, date.
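The two tables can be pictured with the following sketch, which uses Python's built-in sqlite3 module and an in-memory database. The column names come from the text above; the column types, the constraints, the choice of SQLite, and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movie_info (
    movie_id INTEGER PRIMARY KEY,
    genre TEXT, rating REAL, director TEXT, writers TEXT, star TEXT,
    image_link TEXT, host TEXT, content_rating TEXT
);
CREATE TABLE rating (
    user_id INTEGER,
    movie_id INTEGER REFERENCES movie_info(movie_id),
    rating INTEGER CHECK (rating BETWEEN 1 AND 5),
    date TEXT
);
""")

# Made-up sample rows, only to show how the tables relate.
conn.execute("INSERT INTO movie_info (movie_id, genre) VALUES (329, 'Drama')")
conn.executemany(
    "INSERT INTO rating VALUES (?, ?, ?, ?)",
    [(1, 329, 4, "2005-01-15"), (2, 329, 5, "2005-02-01")],
)

# All ratings of one target movie, joined with its genre.
rows = conn.execute(
    "SELECT r.user_id, r.rating, m.genre "
    "FROM rating r JOIN movie_info m ON r.movie_id = m.movie_id "
    "WHERE r.movie_id = ? ORDER BY r.user_id",
    (329,),
).fetchall()
print(rows)
```

Joining on movie_id is what links each user's rating-score to the movie attributes used to build profiles.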


3.1.2 Dataset analysis

Customers’ rating-scores are stored in the table “rating”. It records 14.7 million user ratings; each record represents a single rating of one movie_id by one user_id, together with the rating score and the date. All ratings are anonymous. The least frequent rating-score is 1, with 632 thousand ratings, followed by 2 with 1.4 million, 5 with 3 million, and 3 with 4 million; the most frequent is 4, with 4.7 million ratings, as shown in Fig. 3.1.2.1.

Fig. 3.1.2.1 Rating-scores statistic
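A distribution like the one described above can be computed with a simple count per rating-score. The sketch below uses a small made-up sample rather than the actual dataset, and the function name is illustrative.

```python
from collections import Counter

def rating_histogram(scores):
    """Count how many ratings fall on each value of the 1 to 5 scale."""
    counts = Counter(scores)
    return {score: counts.get(score, 0) for score in range(1, 6)}

# Made-up sample of rating-scores.
sample = [4, 4, 3, 5, 4, 3, 2, 1, 5, 4]
histogram = rating_histogram(sample)
print(histogram)

# Rating-scores ordered from least to most frequent.
print(sorted(histogram, key=histogram.get))
```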

Fig. 3.1.2.2 shows the statistics of the 10 movies with the highest number of ratings.

Fig. 3.1.2.2 The rating-scores comparison for the top 10 movies with the highest number of ratings

3.2 Applied ANFIS to netflix dataset

3.2.1 ANFIS Model

As shown in Fig. 3.2.1.1 (the ANFIS structure) and Fig. 3.2.1.2 (the ANFIS workflow), processing the dataset has 4 main steps:

Step 1: Build user profiles level 1 and 2

Step 2: Cluster the user profile dataset into PureBoxes

Step 3: Build user profiles by computing the distance between users and the 5 groups of PureBoxes

Step 4: Run the Perceptron

Fig. 3.2.1.2 The main workflow of ANFIS
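Steps 3 and 4 can be sketched as follows. The distance measure and the Perceptron configuration are not specified in this part of the thesis, so both are assumptions: the distance is the Euclidean distance from a profile to the nearest box of each group, and a single linear unit trained with the delta rule stands in for the Perceptron. All names and values are illustrative.

```python
from math import sqrt

def distance_to_box(x, v_min, w_max):
    """Euclidean distance from point x to an axis-aligned box (0 if x lies inside)."""
    return sqrt(sum(
        max(v - xj, 0.0, xj - w) ** 2 for xj, v, w in zip(x, v_min, w_max)
    ))

def group_distances(x, groups):
    """Step 3 (assumed): distance from x to the nearest box of each group."""
    return [min(distance_to_box(x, v, w) for v, w in boxes) for boxes in groups]

def train_linear_unit(samples, targets, rate=0.05, epochs=500):
    """Step 4 (assumed): delta-rule training; returns weights, last entry is the bias."""
    weights = [0.0] * (len(samples[0]) + 1)
    for _ in range(epochs):
        for x, t in zip(samples, targets):
            xb = list(x) + [1.0]
            err = t - sum(w * xi for w, xi in zip(weights, xb))
            weights = [w + rate * err * xi for w, xi in zip(weights, xb)]
    return weights

# Two groups of boxes in a 2-D profile space (made up; the thesis uses 5 groups).
groups = [
    [([0.0, 0.0], [1.0, 1.0])],
    [([2.0, 2.0], [3.0, 3.0]), ([4.0, 0.0], [5.0, 1.0])],
]
print(group_distances([2.0, 0.5], groups))

# Toy regression target y = 2x + 1 to check that the unit learns a linear mapping.
weights = train_linear_unit([[0.0], [1.0], [2.0]], [1.0, 3.0, 5.0])
print(weights)
```

In this reading, the distance vector from Step 3 becomes the input to the unit trained in Step 4, with the user's rating-score as the target.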
