

UNIVERSITY OF SCIENCE AND TECHNOLOGY

- -

PHAN PHUONG LAN

RECOMMENDATION SYSTEMS BASED ON STATISTICAL IMPLICATIVE MEASURES

Specialization: Computer Science

Code: 9480101

DOCTORAL THESIS SUMMARY

Danang – 2019


UNIVERSITY OF SCIENCE AND TECHNOLOGY - UNIVERSITY OF DANANG

Academic Instructors:

1. Huynh Xuan Hiep, Assoc. Prof., PhD

2. Huynh Huu Hung, PhD

Opponent 1: ………

Opponent 2: ………

Opponent 3: ………

The dissertation will be defended before the Thesis Review Board at: ………

Time: ……… hour, ……… day ……… month ……… year

The dissertation is available at:

- National Library

- Information and Learning Center, University of Da Nang


PREFACE

1 The urgency of the thesis

The recommendation system (RS) is considered one of the effective solutions to the information explosion problem because it can automatically analyze data to predict a user's ratings for products, services, etc., and thereby recommend to that user the list of items with the highest predicted ratings. The main techniques used to build an RS are content-based, collaborative filtering, knowledge-based, and hybrid methods. Among them, collaborative filtering is the most important and most commonly used technique. Proposing and improving recommendation models to adapt to the diversity of application areas, the differing requirements of users and the development of technology remain the main research directions on RSs. Applying the statistical implicative analysis (SIA) method to other research fields is one of the most interesting topics, yet little research links this method to RSs. The existing research still has some unresolved issues: it focuses only on building models on binary data and pays no attention to non-binary data; it considers only the accuracy of the recommended good items when evaluating RSs; it uses association rules to make recommendations, so the recommendation time may be long and the computer may be overloaded; and it does not exploit the combination of the characteristics of statistical implicative measures to improve recommendation accuracy. Therefore, the PhD thesis "Recommendation systems based on statistical implicative measures" was conducted to contribute to the research fields of RSs and SIA.


2 Objectives, objects and scope of research of the thesis

2.1 Research objectives

The objective of the thesis is to understand and apply statistical implicative measures and the collaborative filtering technique in order to propose recommendation models and to improve the accuracy of these models. Thereby, the thesis contributes to linking the SIA method to research on RSs.

2.2 Research objects

The two main objects of the study are: statistical implicative measures; and recommendation models based on statistical implicative measures and the collaborative filtering technique.

2.3 Research scope

The scope of the study is: to understand statistical implicative measures, the collaborative filtering technique and the existing studies on RSs that use the SIA method; and to propose new recommendation models that can be applied to both binary and non-binary data and that improve recommendation accuracy (both the list of good items and the predicted ratings).

3 Research methodology

Literature review and experimentation are the two main research methods used in this thesis.

4 Contribution of the thesis

- Firstly, two new measures built on statistical implicative measures: (1) the k-nearest-neighbors (users) based implicative rating, KnnUIR; and (2) the k-nearest-neighbors (items) based implicative rating, KnnIIR. These measures are used to predict the ratings a user gives to items.


- Secondly, three new recommendation models: (1) one based on statistical implicative measures and association rules; (2) one based on KnnUIR; and (3) one based on KnnIIR. The proposed models can be applied to both binary and non-binary data.

- Thirdly, the Interestingness software tool, which includes utility functions and the proposed recommendation models. This tool is developed in the R language and is used for the experiments.

- Fourthly, the DKHP binary dataset, which stores course registration (DKHP) data, was collected and used to evaluate the accuracy of recommendation.


CHAPTER 1 AN OVERVIEW

1.1 Statistical implicative measures

1.1.1 Definition

Statistical implicative measures (SIMs) are the measures proposed by the statistical implicative analysis method. SIMs are used to detect trends in a set of binary or non-binary attributes. SIMs are asymmetric, probability-based and non-linear measures.
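To make "asymmetric" and "probability-based" concrete, here is a minimal Python sketch of the classical implicative intensity of SIA under the Poisson model: for a rule $a \rightarrow b$ with $n_{a\bar{b}}$ counter-examples, $\varphi(a, b) = 1 - P(X \le n_{a\bar{b}})$ with $X \sim \text{Poisson}(n_a n_{\bar{b}} / n)$. The function name and the example counts are illustrative assumptions; the thesis may rely on other variants (for instance the binomial model or the entropic version).

```python
import math

def implicative_intensity(n: int, n_a: int, n_b: int, n_a_not_b: int) -> float:
    """Poisson-model implicative intensity phi(a -> b).

    n: number of subjects (e.g. users); n_a, n_b: subjects having a, b;
    n_a_not_b: counter-examples, i.e. subjects having a but not b.
    """
    lam = n_a * (n - n_b) / n  # expected counter-examples if a and b were independent
    if lam == 0:
        return 0.0  # no counter-example is expected, so the rule is not surprising
    log_lam = math.log(lam)
    # P(X <= n_a_not_b) for X ~ Poisson(lam), summed in log space for numerical stability
    cdf = sum(math.exp(-lam + s * log_lam - math.lgamma(s + 1))
              for s in range(n_a_not_b + 1))
    return max(0.0, 1.0 - cdf)

# 100 users: 20 have a, 40 have b, 18 have both.
phi_ab = implicative_intensity(100, 20, 40, 20 - 18)  # a -> b: 2 counter-examples
phi_ba = implicative_intensity(100, 40, 20, 40 - 18)  # b -> a: 22 counter-examples
print(round(phi_ab, 4), round(phi_ba, 4))
```

Swapping antecedent and consequent changes both the expected and the observed number of counter-examples, so $\varphi(a, b) \neq \varphi(b, a)$ in general, which is the asymmetry stated above.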

1.1.2 Statistical implicative measures for binary data

1.1.3 Statistical implicative measures for non-binary data

1.2 Statistical implicative ratings

Statistical implicative rating measures are proposed by the thesis on the basis of some existing SIMs, so they can also be considered SIMs. They are used to predict the rating of a user for an item, thereby contributing to solving recommendation problems.

1.3 Recommendation based on statistical implicative analysis

1.3.1 Recommendation systems and research directions

1.3.2 Collaborative filtering technique

1.3.2.1 Memory based methods

1.3.2.2 Model based methods

1.3.3 Evaluating recommendation systems

1.3.3.1 K-fold cross validation method

1.3.3.2 Classification accuracy metrics

1.3.3.3 Predictive accuracy metrics


1.3.3.4 Rank accuracy metrics

1.3.4 Statistical implicative analysis based recommendation

1.3.4.1 Existing recommendation methods

1.3.4.2 Recommendation based on statistical implicative measures

1.4 Conclusion

Chapter 1 focuses on understanding SIMs, RSs and the accuracy metrics used to evaluate RSs. The thesis summarizes SIMs (such as implicative intensity, the entropic version of implicative intensity, cohesion and contribution) and identifies which measures should be used by RSs to improve the accuracy of the recommendation results. Chapter 1 also covers the collaborative filtering technique and the accuracy metrics used for building and evaluating recommendation models. Moreover, it presents the research directions on RSs and the existing research on RSs based on statistical implicative analysis, then identifies the scope of the study and sketches the proposal.


CHAPTER 2 RECOMMENDATION BASED ON STATISTICAL IMPLICATIVE MEASURES AND ASSOCIATION RULES

Unlike the existing recommendation models based on statistical implicative analysis (SIA) and association rules, the model proposed in this chapter: can be applied to both binary and non-binary data; provides more SIMs (such as implicative intensity, the entropic version of implicative intensity and cohesion) for making recommendations; and makes it possible to combine one of these measures with the contribution measure to improve the accuracy of RSs.

2.1 Statistical implicative rules based model - SIR

The statistical implicative rules based model SIR is developed on SIMs and association rules. The proposed SIR model, shown in Figure 2.1, consists of:

- A finite set of users $U = \{u_1, u_2, \ldots, u_n\}$.

- A finite set of items (e.g., products, movies) $I = \{i_1, i_2, \ldots, i_m\}$.

- A rating matrix $R = (r_{jk})_{n \times m}$, where $j = \overline{1, n}$ and $k = \overline{1, m}$, which stores the feedback (ratings) of users on items. In binary form, $r_{jk} = 1$ if user $u_j$ likes item $i_k$, and $r_{jk} = 0$ (or $NA$) if $u_j$ does not like or does not know $i_k$. In non-binary form, $r_{jk} \in [0, 1]$ if $u_j$ rates $i_k$, and $r_{jk} = NA$ if $u_j$ does not rate or does not know $i_k$.

- A vector $R_{u_a} = \{r_{u_a k}\}$, where $k = \overline{1, m}$, storing the known ratings of the user $u_a$ who needs the recommendation; $r_{u_a k} = NA$ if $u_a$ does not rate $i_k$.
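The inputs listed above map directly onto simple Python structures. In this illustrative sketch, `None` stands for $NA$, the rating matrix is a list of user rows, and the helper names (`is_binary`, `is_unit_scaled`, `known_items`) are hypothetical.

```python
from typing import List, Optional

Rating = Optional[float]     # None plays the role of NA
Matrix = List[List[Rating]]  # rows: users u_1..u_n, columns: items i_1..i_m

def is_binary(R: Matrix) -> bool:
    """Binary form: every known rating is 0 or 1."""
    return all(r in (0, 1) for row in R for r in row if r is not None)

def is_unit_scaled(R: Matrix) -> bool:
    """Non-binary form: every known rating lies in [0, 1]."""
    return all(0.0 <= r <= 1.0 for row in R for r in row if r is not None)

def known_items(R_ua: List[Rating]) -> List[int]:
    """Indices k such that r_{u_a k} is not NA, i.e. the known ratings of u_a."""
    return [k for k, r in enumerate(R_ua) if r is not None]

R_binary: Matrix = [[1, 0, None], [1, 1, 0]]
R_rated: Matrix = [[0.8, None, 0.2], [1.0, 0.4, None]]
R_ua: List[Rating] = [None, 0.6, None]
print(is_binary(R_binary), is_binary(R_rated), is_unit_scaled(R_rated), known_items(R_ua))
# True False True [1]
```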


Figure 2.1: The statistical implicative rules based model

To reduce the recommendation time, the SIR model in Figure 2.1 is improved by performing the following steps simultaneously (directly, in a single pass): generating association rules, representing each rule by the set of four values $\{n, n_a, n_b, n_{a\bar{b}}\}$, and calculating the implicative value of each rule according to a specific SIM. This is achieved by using and modifying the rchic package.
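A minimal sketch of such a single pass, restricted to rules of length 2 ($l = 2$) on binary data: each candidate rule $a \rightarrow b$ is counted, filtered by a support threshold and a confidence threshold, stored as its four values $\{n, n_a, n_b, n_{a\bar{b}}\}$, and scored with the Poisson-model implicative intensity as the chosen SIM. The function `build_ruleset` is hypothetical and does not reproduce the rchic API; longer rules would need itemset antecedents.

```python
import math
from itertools import permutations
from typing import List, Optional, Tuple

FourValues = Tuple[int, int, int, int]  # (n, n_a, n_b, n_a_not_b)

def implicative_intensity(n: int, n_a: int, n_b: int, n_a_not_b: int) -> float:
    """Poisson-model implicative intensity of a -> b (the SIM chosen for this sketch)."""
    lam = n_a * (n - n_b) / n
    if lam == 0:
        return 0.0
    log_lam = math.log(lam)
    cdf = sum(math.exp(-lam + s * log_lam - math.lgamma(s + 1))
              for s in range(n_a_not_b + 1))
    return max(0.0, 1.0 - cdf)

def build_ruleset(R: List[List[Optional[int]]], min_support: float,
                  min_confidence: float) -> List[Tuple[int, int, FourValues, float]]:
    """Generate, filter, encode and score all length-2 rules a -> b in one pass."""
    n, m = len(R), len(R[0])
    # Users who like each item; 0 and None (NA) both count as "not liked".
    liked = [{j for j in range(n) if R[j][k] == 1} for k in range(m)]
    rules = []
    for a, b in permutations(range(m), 2):
        n_a, n_b = len(liked[a]), len(liked[b])
        n_ab = len(liked[a] & liked[b])
        if n_a == 0 or n_ab / n < min_support or n_ab / n_a < min_confidence:
            continue  # rule rejected by the support or confidence threshold
        four = (n, n_a, n_b, n_a - n_ab)  # n_a - n_ab is the counter-example count
        rules.append((a, b, four, implicative_intensity(*four)))
    return rules

R = [[1, 1, 0], [1, 1, None], [1, 0, 1], [0, 1, 1], [1, 1, 0]]
rules = build_ruleset(R, min_support=0.2, min_confidence=0.5)
for a, b, four, phi in rules:
    print(f"i{a + 1} -> i{b + 1}  {four}  phi={phi:.3f}")
```

Because the four values are kept with each rule, any other SIM defined on $\{n, n_a, n_b, n_{a\bar{b}}\}$ could be swapped in without recounting.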

[Figure 2.1 content: inputs $(U, I, R)$ and $(u_a, I, R_{u_a})$; support threshold $s$; confidence threshold $c$; the chosen SIM (implicative intensity, entropic version of implicative intensity, or cohesion measure); maximum rule length $l$, giving the ruleset $\{a \rightarrow b \mid a \in I^k, b \in I, k = \overline{1, l-1}\}$, which is presented by the statistical implicative analysis method]


2.2 Operation of the statistical implicative rules based model

The operation of the SIR model consists of two stages, as shown in Figure 2.2: building the filtered ruleset presented according to the SIA method; and making the recommendation. To reduce the recommendation time, the learning model can be pre-built offline.
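The online stage can be sketched as follows, assuming the offline stage has already produced rules $a \rightarrow b$ with their implicative values. Scoring each candidate item by the maximum implicative value over the rules that fire for $u_a$ is an assumption made here for illustration; the function name and the rule values are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Rule = Tuple[int, int, float]  # (antecedent item a, consequent item b, implicative value)

def recommend_top_n(ruleset: List[Rule], liked: Set[int], rated: Set[int],
                    top_n: int) -> List[int]:
    """Online stage: rank unrated items by the implicative value of rules fired by u_a."""
    scores: Dict[int, float] = {}
    for a, b, value in ruleset:
        # A rule fires when u_a likes its antecedent; items u_a already rated are skipped.
        if a in liked and b not in rated:
            scores[b] = max(scores.get(b, 0.0), value)  # keep the strongest rule per item
    ranked = sorted(scores, key=lambda item: (-scores[item], item))  # ties: lower index first
    return ranked[:top_n]

# Ruleset built offline (values are made-up illustrations).
ruleset = [(0, 2, 0.95), (0, 3, 0.60), (1, 3, 0.90), (1, 4, 0.70), (5, 6, 0.99)]
print(recommend_top_n(ruleset, liked={0, 1}, rated={0, 1}, top_n=2))  # [2, 3]
```

Only a scan over the stored rules happens online, which is why pre-building the ruleset offline shortens the recommendation time.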

Figure 2.2: The operational diagram of the SIR model

[Figure 2.2 content: presenting rules according to SIA; making recommendation (online) by recommending the items with the highest implicative values; output: the list of top N items for $u_a$, e.g. $\{i_1, i_{13}, \ldots, i_{m-2}\}$]

2.3 Experiment

2.3.1 Data and tool

Three datasets are used for the experiments: MSWeb, MovieLens and DKHP (course registration). MSWeb and DKHP are binary datasets, while MovieLens is a non-binary dataset.

We developed the Interestingnesslab tool to run the experimental scenarios. In addition, some recommendation models of the recommenderlab package are used for comparison with the SIR model: the association rule based model (AR); the item-based collaborative filtering model (IBCF) using the Jaccard measure; and the popularity-based model (POPULAR). The experimental scenarios are run on computers with the following configurations: (1) Windows 8, 16 GB RAM and an Intel Pentium G630 2.7 GHz processor; and (2) Windows 10, 8 GB RAM and an Intel Core i5-6200U 2.5 GHz processor.

2.3.2 Evaluating the SIR model on binary data

The accuracy of the SIR model is compared with that of some existing models using 5-fold cross-validation and classification accuracy metrics (the Precision-Recall curve, the ROC curve and the F1 measure, which combines precision and recall). The experimental results show that:

- Performing the learning-stage steps simultaneously (in the improved SIR model) reduces the recommendation time.

- The SIR model is most accurate when the entropic version of implicative intensity and the contribution measure are combined to make the recommendation.

- The SIR model combining the entropic version of implicative intensity with the contribution measure is more accurate than the compared recommendation models (AR, POPULAR, IBCF), especially when the user requiring the recommendation is not a new user (i.e., the number of items that user has rated, that is, the number of known ratings, is not too low).
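The evaluation protocol above has two ingredients that can be sketched briefly: splitting users into 5 folds, and scoring one user's top-N list against the items withheld for that user with precision $P$, recall $R$ and $F_1 = 2PR/(P+R)$. This is a hypothetical simplification with made-up inputs; recommenderlab's own evaluation scheme additionally controls how many known ratings are "given" to the recommender for each test user.

```python
import random
from typing import List, Sequence, Set, Tuple

def k_fold_user_splits(users: Sequence[int], k: int = 5, seed: int = 0) -> List[List[int]]:
    """Shuffle users once and deal them into k disjoint folds (each fold is a test set once)."""
    shuffled = list(users)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[f::k] for f in range(k)]

def precision_recall_f1(recommended: List[int], relevant: Set[int]) -> Tuple[float, float, float]:
    """Classification accuracy of one top-N list against the user's withheld items."""
    hits = len(set(recommended) & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
    return precision, recall, f1

folds = k_fold_user_splits(range(10), k=5)
p, r, f1 = precision_recall_f1([2, 3, 7, 9], relevant={2, 9, 11})
print(len(folds), p, round(r, 3), round(f1, 3))  # 5 0.5 0.667 0.571
```

Sweeping the list length N and averaging over test users gives the points of the Precision-Recall and ROC curves.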


2.3.3 Evaluating the SIR model on non-binary data

- The SIR model is most accurate when the entropic version of implicative intensity and the contribution measure are combined and the user requires many recommended items. In practice, however, a user may be confused by too many recommended items.

- The SIR model is more accurate than POPULAR, a recommendation model based on the most popular items.

2.4 Conclusion

Chapter 2 proposes the statistical implicative rules based model SIR, which can be applied to both binary and non-binary data, and improves it to reduce the recommendation time. The ruleset, represented by sets of four values, can be pre-built offline and used online when a user needs a recommendation. The SIR model provides many SIMs and can be extended with other objective interestingness measures. It is coded and integrated in the Interestingnesslab tool. The accuracy of the SIR model is evaluated: by classification accuracy metrics such as the ROC curve, the Precision-Recall curve and the F1 measure; on two types of data, binary (MSWeb, DKHP) and non-binary (MovieLens); and in two groups of scenarios, internal comparison (the same SIR model with different SIMs) and external comparison (the SIR model against the existing recommendation models AR, POPULAR and IBCF). The experimental results show that the SIR model should (1) combine the entropic version of implicative intensity with the contribution measure to make the recommendation; and (2) be used to build RSs, because it is more accurate than the compared models.


CHAPTER 3 RECOMMENDATION BASED ON USERS IMPLICATIVE RATING MEASURE

The SIR model of Chapter 2 uses association rules and SIMs to recommend the list of good items to users. When the number of rules is too large, the SIR model and the existing models, which are also based on SIA and association rules, face some disadvantages: the recommendation time may be long if the learning stage is performed online, and the computer may be overloaded. Therefore, the thesis focuses on rules of length 2 to overcome those disadvantages. Besides, the rating given by $u_a$ to an item $i$ may be similar to the ratings given to $i$ by the nearest users of $u_a$, and it may also depend on the typicality of $i$ contributing to the relationship between $u_a$ and his/her nearest user $u_j$. The thesis therefore combines these characteristics to improve the accuracy of recommendation.

3.1 KnnUIR Definition

The k-nearest-neighbors (users) based implicative rating measure KnnUIR is proposed to predict the rating given by a user $u_a$ to an item $i \in I$, with the purpose of increasing recommendation accuracy. KnnUIR, defined by (3.1), is based on: (1) the number of nearest users of $u_a$, $knn$ (the nearest neighbors $u_j$ are identified by the implicative intensities between $u_a$ and $u_j$); (2) the ratings $r_{u_j i}$ given to item $i$ by those neighbors; and (3) the typicality $\gamma(i, u_a \rightarrow u_j)$ of $i$ contributing to the relationship between $u_a$ and $u_j$. The value of $KnnUIR(u_a, i)$ has to be transformed to the range $[0, 1]$, the same scale as the elements of the rating matrix.

$$KnnUIR(u_a, i) = \sum_{j=1}^{knn} r_{u_j i} \, \gamma(i, u_a \rightarrow u_j) \qquad (3.1)$$
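A minimal Python sketch of the prediction step, resting on two stated assumptions: that (3.1) sums, over the $knn$ nearest users, each neighbor's rating $r_{u_j i}$ weighted by the typicality $\gamma(i, u_a \rightarrow u_j)$; and that the transformation to $[0, 1]$ divides by the largest raw score. The implicative intensities $\varphi(u_a \rightarrow u_j)$ and the typicality values are supplied as made-up inputs, since their computation is not reproduced in this summary; all function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Matrix = List[List[Optional[float]]]  # rows: users, columns: items; None stands for NA

def nearest_users(intensity_to: Dict[int, float], knn: int) -> List[int]:
    """The knn users u_j with the highest implicative intensity phi(u_a -> u_j)."""
    return sorted(intensity_to, key=lambda j: (-intensity_to[j], j))[:knn]

def predict_knn_uir(ua: int, R: Matrix, intensity_to: Dict[int, float],
                    gamma: Dict[Tuple[int, int], float], knn: int) -> Dict[int, float]:
    """Predicted ratings of u_a for the items u_a has not rated, rescaled to [0, 1].

    gamma[(i, j)] is the typicality gamma(i, u_a -> u_j); it is taken as given here.
    """
    neighbors = nearest_users(intensity_to, knn)
    raw: Dict[int, float] = {}
    for i, r_ai in enumerate(R[ua]):
        if r_ai is not None:
            continue  # only unknown ratings are predicted
        # Sum over the nearest users of r_{u_j i} * gamma(i, u_a -> u_j); NA adds nothing.
        raw[i] = sum(R[j][i] * gamma.get((i, j), 0.0)
                     for j in neighbors if R[j][i] is not None)
    top = max(raw.values(), default=0.0)
    # Hypothetical rescaling to [0, 1]: divide by the largest raw score.
    return {i: (s / top if top > 0 else 0.0) for i, s in raw.items()}

R: Matrix = [
    [1.0, None, None, 0.2],  # u_a (row 0)
    [0.8, 0.9, 0.1, None],
    [0.6, 0.5, None, 0.4],
    [0.2, 0.1, 0.9, 0.3],
]
intensity_to = {1: 0.92, 2: 0.85, 3: 0.40}  # phi(u_a -> u_j), assumed precomputed
gamma = {(1, 1): 0.7, (1, 2): 0.5, (2, 1): 0.6, (2, 2): 0.3}
pred = predict_knn_uir(0, R, intensity_to, gamma, knn=2)
top_items = sorted(pred, key=lambda i: pred[i], reverse=True)[:1]
print(pred, top_items)
```

The last two lines show how sorting the predicted ratings yields a top-N list of items for $u_a$.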

3.2 Users implicative rating based model - UIR

The users implicative rating based model UIR is developed using the proposed KnnUIR measure and the user-based collaborative filtering method. The UIR model, shown in Figure 3.1, has the same components as the SIR model. However, the UIR model not only predicts the rating given by a user to an item but also recommends the list of top items to that user.

Figure 3.1: The users implicative rating based model

3.3 Operation of the users implicative rating based model

The operational diagram of the UIR model is presented in Figure 3.2

[Figure 3.2 inputs: $(u_a, I, R_{u_a})$ and $(U, I, R)$]
