Community learning in location based social networks 4


4.9 Top Five Communities in New York City

In this section, we show the top five communities detected in New York City. We manually observe and derive the community profiles from the prominent venue categories, tip topics and photo concepts of the communities.

[Table columns: Rank, Visualization, Profile]

4.10 Summary

In this chapter, we investigated the problem of community understanding in LBSNs. We proposed a novel and unified framework which models the heterogeneous entities and interactions by constructing a heterogeneous, non-uniform hypergraph. We then formulated community detection as the problem of detecting dense subgraphs over the hypergraph, where constraints were added to ensure the interpretability of the detected communities. We then proposed an efficient procedure to solve the optimization problem. Extensive experiments, both qualitative and quantitative, were performed to verify our proposed approach. Meaningful and interpretable communities were detected in an optimal way, while interesting cultural differences were revealed by analyzing the communities in Singapore and New York City.

There are a few interesting aspects worth further exploration. First, the time-dependent behaviors of users allow interest communities to be detected and understood in a timely manner. For example, it is interesting to mine and profile different interest groups that are active during different time periods. Second, users often participate in several social networks. The aggregation of user behaviors across multiple sources is expected to lead to more accurate and timely communities that enrich community understanding.

In this chapter, we study the problem of community matching in LBSNs across geographical regions in the context of generating personalized recommendations of locally interesting venues to tourists. To do so, we first propose a Bayesian approach to extract the latent social dimensions for users based on their local behaviors. We then match users’ preferences across geographical regions based on latent global interest factors. In the experiments, we validate both the quality of the extracted latent social dimensions and the community matching across geographical regions in the recommendation frameworks.

The rest of the chapter is organized as follows. Section 5.1 reiterates the motivation of community matching across geographical regions by describing a real-world application scenario and the use of LBSNs as a solution, and discusses the challenges of the problem. Section 5.2 reviews related works that were not covered in the literature review in Chapter 2 and yet are related to the problem and techniques described in this chapter. Section 5.3 gives an overview of the proposed framework, while Section 5.4 formally defines the problem. Sections 5.5 and 5.6 detail social dimension extraction and recommendation generation, respectively. Section 5.7 reports the experimental results. Finally, Section 5.8 gives the concluding remarks.

When we travel to new places, in addition to sightseeing, we are often interested in exploring local cultures that match our personal interests, such as sampling local cuisines, understanding local customs, and visiting shops selling local specialty items. However, there exists a large gap between what we want and what is provided by the dominant tourism resources, such as Wikitravel, Lonely Planet and the official tourism boards of certain countries, such as YourSingapore¹ and AustraliaTravel². The gap has two main causes. First, these sites mainly provide information on famous attractions or popular local landmarks instead of locally interesting places. However, many tourists may want to experience local cultures that match their interests in terms of local food, events and shops. These locally interesting places or activities may not be famous enough to be included in these tourism resources. Second, they generate user-independent contents, while people usually have drastically different personal preferences in reality. For example, people who love shopping may want to visit more popular local shops, while food lovers are more interested in sampling different kinds of local foods, such as the local foods in Shilin Night Market in Taipei.

¹ http://www.yoursingapore.com/
² http://www.australiatravel.com/

On the other hand, rich location data at a fine-grained level is now available from the recently emerging LBSNs. They are becoming more and more popular thanks to the recent availability of open mobile platforms, which makes LBSNs much more accessible to mobile users. These LBSNs are able to provide sufficient resources to bridge the aforementioned gap. First, they allow users to voluntarily annotate the real world with check-ins, which indicate the specific times that the users were at particular locations. In addition, LBSNs provide “location-specific data”, in which users may check in at nearly the same geographical coordinates but at very different venues. For example, users can check in at a cinema or a restaurant in the same shopping mall, where both venues share the same geographical coordinates. In contrast, cell phone data provides coarse location accuracy and cannot differentiate users’ presence across different floors in the same building. The active participation of Foursquare users and the fine-grained venue annotations make personalized recommendation of locally interesting venues possible.

Collaborative filtering (CF) based approaches [48, 58] seem to be plausible solutions to this problem, as demonstrated by their great successes in commercial applications, such as Amazon [69], Netflix [11], Tivo [2] and eBay [152], and in research on point-of-interest (POI) recommendations [150, 24, 156, 151]. These approaches automatically generate recommended items for a user using known preferences of other users or known items preferred by the target user. However, CF-based algorithms, whether memory or model based, require sufficient overlaps among users in terms of items rated so that the correspondences among users or items can be readily identified. In LBSNs, however, users usually visit venues that are within a small geographical distance of their homes [25, 27], which makes it hard, if not impossible, to correlate users if they visit very different sets of venues with little or no overlap. Let’s consider the user-venue matrix shown in Table 5.1 where

Table 5.1: User-Venue Matrix (Values indicate number of visits).

be fixed at different times. However, users’ visiting behaviours often evolve over time [97] and exhibit strong temporal patterns, such as daily/weekly patterns and periodicity [25]. For example, people perform more check-ins at restaurants during meal times and visit shops mostly during weekends and weekday evenings. Hence, an effective way to incorporate the temporal information is required.

5.2 Related Work

The problem that we are investigating and the techniques we propose are related to four research areas, namely mobility prediction, location recommendation, travel recommendation and latent factor models.

5.2.1 Location Prediction

Location prediction based on cellular network traces has recently attracted a lot of attention in mobile computing [49, 154, 113, 43, 138]. The various proposed mobility models aim to provide an accurate prediction of an individual’s future location, which is an essential requirement for various mobile applications, such as home heating control [115], urban planning [105, 20], mobile advertising [8, 7] and demographic prediction [16]. The basic concept in this line of research is to compare a current pattern with historical data and to extract similar patterns for predicting the next location. In contrast to location prediction, the location recommendation work reviewed in Section 5.2.2 aims to recommend new locations to users to widen their choices, although the two adopt similar evaluation strategies. In addition, location semantics are usually not readily available in mobile phone data, while venues in user-generated data come with rich annotations in various aspects, such as categories, comments and photos.

5.2.2 Location Recommendation

The recent boom of LBSNs has motivated emerging research on point-of-interest (POI) or, more generally, location recommendations [150, 151, 156]. Location recommendation aims to recommend a list of POIs or locations to a user based on the user’s past visiting history. These lines of work usually focus on general recommendation tasks in a traditional CF framework. For example, Ye et al. compared the influences of user similarity based on historical behavior, geographical distance and friend network in the POI recommendation task [150]. Ying et al. proposed to consider both user preferences and location properties in their recommendation framework [151]. Recently, Zhou et al. studied and compared the performances of different CF recommenders, including user-based, item-based and probabilistic latent semantic analysis, in location recommendation [156], where they reported that the probabilistic approach gives the best performance. There are two main differences between our work and these related works: (1) we study a new problem which aims to provide tourists with recommendations based on their local visits; and (2) none of these works has studied the effects of simultaneously considering time, social relations and venue similarities.

5.2.3 Travel Recommendation

In Web 2.0 communities, people often share their traveling experience in blogs, forums and social networks in the form of travelogues, photos, etc. These geo-referenced media resources contain rich tourism information, which motivates research on generating travel recommendations from these user-generated contents. Hao et al. proposed a location-topic model to model travelogue documents and developed a tour destination recommender [57]. To obtain a recommended destination, a user needs to issue a query, and the system then utilizes the topic model to select the destination with the highest matching score. Cheng et al. leveraged community-contributed photos from Flickr to provide personalized travel recommendations based on people’s attributes, such as gender, race and age, in a probabilistic Bayesian learning framework [23]. More recently, Lucchese et al. proposed an interactive random walk approach for personalized recommendations of touristy places based on the knowledge mined from Flickr and Wikipedia [75]. While these works all aim to provide personalized recommendations of touristy points based on users’ past behaviors, our work focuses on recommending locally interesting venues and aims to solve the problem of cross-region recommendation. In addition, we utilize user-generated location contents in LBSNs, which better connect the physical world with the online virtual world.

5.2.4 Latent Factor Models

Latent factor models have been shown to be promising in recommendation tasks such as the Netflix competition [10], results diversification [118], review helpfulness prediction [85] and web site recommendation [76]. The underlying assumption of latent factor models is that the entities, such as the users and items (venues, reviews, products, etc.), can be modeled by a set of latent representations, which together determine the preferences for unknown items in a probabilistic way. For example, [85] proposed a series of increasingly sophisticated probabilistic graphical models based on tensor factorization and showed their effectiveness in the prediction of review helpfulness. Recent work by Cheng et al. has shown a positive influence of introducing social regularization in POI recommendations performed on Gowalla [24]. Our proposed framework differs from these efforts in two main aspects: (1) the framework considers temporal changes of users’ preferences and heterogeneous intra/inter-entity relations in a unified manner; and (2) we derive a Bayesian treatment to sample latent factors, which avoids both overfitting and tedious parameter tuning.

In this chapter, we aim to investigate community matching across geographical regions in LBSNs, with the aim of providing tourists with personalized destination recommendations by leveraging rich user-generated location contents. Besides, we identify locally interesting venues as those frequently visited by local people but obscure to most foreigners. We make use of these digital footprints [46] to understand collective local user behaviors and then provide venue recommendations to tourists from a global understanding of cross-region community matching. Figure 5.1 shows the overall framework, which consists of four components.

First, to tackle the sparseness problem and handle time-dependent, varied behaviours, we extract users’ latent social dimensions [127] to capture users’ preferences according to their local check-ins at different times, social relations and similarities among the visited venues. Social dimensions reflect the latent drives of users’ social behaviors, and each dimension represents a plausible interest community among users. To accomplish this subtask, we propose a novel framework named Bayesian probabilistic tensor factorization with social and location regularization (BPTFSLR) that puts users’ visiting behaviors, social relations and venue similarities into a unified framework. Second, we mine local interest communities in each geographical region using adaptive affinity propagation. Third, we represent each local community using global properties, such as venue categories and times of visits, according to the aggregated behaviors of community members. Fourth, we correlate communities at different geographical regions to generate personalized recommendations of locally interesting venues to tourists. By conducting experiments on a representative real-world dataset, we demonstrate that our proposed scheme is effective in generating personalized recommendations in the local setting.

[Figure 5.1: Overall framework. Recoverable panel labels: Social Dimension Extraction; Local Community Detection.]

Table 5.3: List of notations of variables used in Chapter 5

Symbol                  Description
$\mathcal{U}^g$         the set of users in geographical region $g$
$\mathcal{V}^g$         the set of venues in geographical region $g$
$\mathcal{T}$           the set of location-independent time periods
$\mathcal{C}^g$         the set of check-ins in geographical region $g$
$u^g_i$                 the $i$th user in geographical region $g$
$v^g_i$                 the $i$th venue in geographical region $g$
$t_i$                   the $i$th location-independent time period
$(u^g_i, v^g_j, t_k)$   the $i$th user visits the $j$th venue in geographical region $g$ during the $k$th time period
$G^g$                   the undirected social network graph in geographical region $g$
$E^g_1$                 the edge set representing the social relations between users in geographical region $g$
$R^g$                   the adjacency matrix representing the social relations between users in geographical region $g$
$H^g$                   the undirected venue relation graph in geographical region $g$
$E^g_2$                 the edge set representing the affinity relations between venues in geographical region $g$
$B^g$                   the adjacency matrix representing the affinity relations between venues in geographical region $g$
BPMF                    Bayesian probabilistic matrix factorization
$\mathbf{Q}$            user × venue matrix, where $Q_{ij}$ is the preference of the $i$th user towards the $j$th venue
$D$                     the latent dimension
$\mathbf{U}$            the collection of latent social dimensions of users
$\mathbf{u}_i$          a $D$-dimensional vector of latent social dimensions for the $i$th user
$\mathbf{V}$            the latent venue feature matrix
$\mathbf{v}_i$          a $D$-dimensional latent feature vector for the $i$th venue
MCMC                    Markov chain Monte Carlo
$\mathcal{Q}$           user × venue × time tensor, where $Q^k_{ij}$ is the preference of the $i$th user towards the $j$th venue during the $k$th time period
$T$                     the number of different location-independent time periods
$\mathbf{T}$            the collection of latent feature vectors of time periods
$\mathbf{t}_i$          a $D$-dimensional latent feature vector for the $i$th time period
BPTF                    Bayesian probabilistic tensor factorization
$\alpha$                the user similarity tradeoff parameter
$F_i$                   the friend set of the $i$th user
$\mathbf{w}_i$          the word-frequency vector of the $i$th venue
$Z$                     the size of the vocabulary in the tips’ corpus
$\mathbf{S}$            the auxiliary user factor feature matrix
$\mathbf{D}$            the auxiliary venue factor feature matrix
BPTFSLR                 Bayesian probabilistic tensor factorization with social and location regularization
AP                      affinity propagation
AAP                     adaptive affinity propagation
$C_i$                   the community representation at geographical region $i$
$A$                     the collection of latent community interest factors
$X$                     the sparse community representations

Problem Statement: Let $\mathcal{U}^g = \{u^g_1, \cdots, u^g_{N_g}\}$ be a set of users and $\mathcal{V}^g = \{v^g_1, \cdots, v^g_{M_g}\}$ be a set of venues in geographical region $g$. Let $\mathcal{T} = \{t_1, \cdots, t_T\}$ be a set of location-independent time periods. We define a set of check-ins $\mathcal{C}^g = \{c^g_1, \cdots, c^g_{q_g}\}$, where each check-in is a tuple $(u^g_i, v^g_j, t_k)$ indicating that user $u^g_i$ visits venue $v^g_j$ at time $t_k$ in region $g$. Let $G^g = (\mathcal{U}^g, E^g_1)$ be the undirected social network graph in region $g$, where $E^g_1$ represents the social relations between users in region $g$. We then define the corresponding adjacency matrix $R^g \in \mathbb{R}^{N \times N}$, where $R^g_{ri}$ is the strength of the social relation between users $r$ and $i$ in region $g$. Let $H^g = (\mathcal{V}^g, E^g_2)$ be the undirected venue relation graph in region $g$. We next define the corresponding adjacency matrix $B^g \in \mathbb{R}^{M \times M}$, where $B^g_{jl}$ represents the venue similarity between venues $j$ and $l$ in region $g$. Given $\mathcal{C}^g$, $G^g$, $H^g$ and $\mathcal{T}$, where $g = 1, 2, \cdots$, our aim is to recommend a list of locally interesting venues $\{v^a_1, \cdots, v^a_{L_a}\}$ […] region $a$, where $a$ is geographically different from $b$, $L_a$ is the number of locally interesting venues in $a$, and $N_b$ is the number of users in $b$.

In LBSNs, users exhibit heterogeneous visiting behaviors, which naturally classify them into different interest groups, such as food lovers and shoppers. In addition, even within similar interest groups, people exhibit different preferences. For example, sports lovers may have different exercising preferences in terms of venues and times: some prefer jogging in the morning in their neighbourhoods, some like to exercise during weekends in nature parks, and others may prefer to exercise in gyms after work. The inherently heterogeneous user preferences make it hard to interpret the connections between people in social networks. To gain insights into users’ underlying interests, Tang et al. formally defined the social dimensions of each user, with each dimension representing a latent affiliation among users, in order to approximate direct differentiating connections [127]. In this section, we present a unified framework for effective extraction of latent social dimensions for each user by simultaneously considering temporal factors and various relations among different entities.

5.5.1 Matrix Factorization Model

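A simple approach to extract the latent social dimensions is to use probabilistic matrix factorization (PMF) [84], where the underlying assumption is that both users and venues can be modeled by a set of latent representations. Let $\mathbf{Q} \in \mathbb{R}^{N \times M}$ be the user × venue matrix, where $Q_{ij}$ is the preference of user $i$ towards venue $j$ and is computed based on the number of times $i$ visits $j$ as follows

of corresponding latent features, i.e. $Q_{ij} \approx \mathbf{u}_i^T \mathbf{v}_j$. The conditional probability of the observed preferences is defined as:

$$p(\mathbf{Q} \mid \mathbf{U}, \mathbf{V}) = \prod_{i=1}^{N} \prod_{j=1}^{M} \mathcal{N}\left(Q_{ij} \mid \mathbf{u}_i^T \mathbf{v}_j, \tau_Q^{-1}\right) \tag{5.2}$$

$$p(\mathbf{U}) = \prod_{i=1}^{N} \mathcal{N}\left(\mathbf{u}_i \mid \mathbf{0}, \sigma_U^2 \mathbf{I}\right), \qquad p(\mathbf{V}) = \prod_{j=1}^{M} \mathcal{N}\left(\mathbf{v}_j \mid \mathbf{0}, \sigma_V^2 \mathbf{I}\right) \tag{5.3}$$

$$\mathbf{U}^*, \mathbf{V}^* = \arg\max_{\mathbf{U},\mathbf{V}} p(\mathbf{U}, \mathbf{V} \mid \mathbf{Q}) = \arg\max_{\mathbf{U},\mathbf{V}} p(\mathbf{U})\, p(\mathbf{V})\, p(\mathbf{Q} \mid \mathbf{U}, \mathbf{V}) \tag{5.4}$$

It turns out that the learning procedure corresponds to the following weighted regularized matrix factorization:

$$\mathbf{U}^*, \mathbf{V}^* = \arg\min_{\mathbf{U},\mathbf{V}} \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{M} \left(Q_{ij} - \mathbf{u}_i^T \mathbf{v}_j\right)^2 + \frac{\lambda_U}{2} \sum_{i=1}^{N} \|\mathbf{u}_i\|_2^2 + \frac{\lambda_V}{2} \sum_{j=1}^{M} \|\mathbf{v}_j\|_2^2 \tag{5.5}$$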
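As an illustration of how a regularized objective of this form can be minimized, the following is a minimal sketch of per-entry stochastic gradient descent in plain Python. The function names `pmf_sgd` and `objective`, the learning rate, the initialization scale and the default regularization weights are illustrative assumptions, not settings taken from this chapter.

```python
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def objective(Q, U, V, lam_u, lam_v):
    """0.5 * squared error + 0.5 * lam_u * ||U||^2 + 0.5 * lam_v * ||V||^2."""
    fit = sum((Q[i][j] - dot(U[i], V[j])) ** 2
              for i in range(len(Q)) for j in range(len(Q[0])))
    reg_u = sum(x * x for row in U for x in row)
    reg_v = sum(x * x for row in V for x in row)
    return 0.5 * fit + 0.5 * lam_u * reg_u + 0.5 * lam_v * reg_v


def pmf_sgd(Q, D=2, lam_u=0.05, lam_v=0.05, lr=0.1, epochs=500, seed=0):
    """Fit Q[i][j] ~ dot(U[i], V[j]) by SGD over all (i, j) entries."""
    rng = random.Random(seed)
    N, M = len(Q), len(Q[0])
    # Small random initialization (assumed scale 0.1).
    U = [[rng.gauss(0, 0.1) for _ in range(D)] for _ in range(N)]
    V = [[rng.gauss(0, 0.1) for _ in range(D)] for _ in range(M)]
    cells = [(i, j) for i in range(N) for j in range(M)]
    for _ in range(epochs):
        rng.shuffle(cells)
        for i, j in cells:
            err = Q[i][j] - dot(U[i], V[j])
            for d in range(D):
                u, v = U[i][d], V[j][d]
                # Each u_i is visited M times per epoch (v_j: N times), so the
                # regularization gradient is split evenly across those visits.
                U[i][d] += lr * (err * v - lam_u * u / M)
                V[j][d] += lr * (err * u - lam_v * v / N)
    return U, V
```

Calling `pmf_sgd(Q, epochs=0)` returns the random initialization, which makes it easy to compare the objective value before and after training on a small matrix.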

5.5.2 Tensor Factorization Model

The previous approach assumes that visiting preferences are fixed at different times. However, time factors are strong drives which inherently direct users’ movements, and users’ visiting behaviors exhibit significantly different temporal patterns in the real world [33, 25]. The visiting preferences are affected by two temporal aspects. First, users visit different venues at different times of the day. For example, people often visit food courts or restaurants during meal times and watch movies in the evening on Fridays and weekends. Second, users exhibit different lifestyles on weekdays and weekends. Noulas et al. reported drastic differences among the types of venues visited on weekdays and weekends [97]. To bring in the time factors, we employ probabilistic tensor factorization (PTF) to model the time-evolving preferences [148]. With the introduction of time factors, the user × venue two-dimensional matrix is converted into the user × venue × time three-dimensional tensor. We consider splitting users’ visiting times into eight periods: {morning (5am–11am), afternoon (12pm–6pm), evening (7pm–11pm), night (12am–4am)} × {weekday, weekend}.

Extended from the relational data in the matrix factorization model, let $\mathcal{Q} \in \mathbb{R}^{N \times M \times T}$ be the user × venue × time tensor, where $Q^k_{ij}$ is the preference of user $i$ towards venue $j$ at time $k$ and can be computed based on the number of times $i$ visits $j$ at $k$ as follows:

$$Q^k_{ij} = \frac{c^k(i, j)}{\sum_{j'=1}^{M} c^k(i, j')}, \qquad k = 1, 2, \cdots, T, \tag{5.6}$$

where $c^k(i, j)$ is the number of times user $i$ visits venue $j$ at time $k$. Extending the idea of PMF, we can approximate $Q^k_{ij}$ with the inner product of three $D$-dimensional vectors:

$$Q^k_{ij} \approx \langle \mathbf{u}_i, \mathbf{v}_j, \mathbf{t}_k \rangle = \sum_{d=1}^{D} u_{id}\, v_{jd}\, t_{kd}, \tag{5.7}$$

where $\mathbf{t}_k$ is the additional latent feature vector for the $k$th time factor. Intuitively, Eq. (5.7) makes the visiting preferences depend not only on how similar a user’s preferences and a venue’s properties are, but also on how well these preferences match the current crowd behaviors, which are reflected by the time factors.
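The construction above can be sketched in a few lines of plain Python. This is only an illustration: the function names, the sparse dictionary layout, and the reading of each period as a whole-hour range (hours 5 to 11, 12 to 18, 19 to 23 and 0 to 4) are assumptions.

```python
from collections import defaultdict
from datetime import datetime


def time_period(ts):
    """Map a datetime to one of 8 slots: 4 parts of the day x {weekday, weekend}."""
    h = ts.hour
    if 5 <= h <= 11:
        part = 0      # morning
    elif 12 <= h <= 18:
        part = 1      # afternoon
    elif 19 <= h <= 23:
        part = 2      # evening
    else:
        part = 3      # night (hours 0 to 4)
    weekend = 1 if ts.weekday() >= 5 else 0   # Saturday = 5, Sunday = 6
    return part + 4 * weekend


def preference_tensor(checkins):
    """Sparse version of Eq. (5.6): {(user, venue, k): normalized visit count}.

    `checkins` is an iterable of (user, venue, datetime) tuples.
    """
    counts = defaultdict(int)   # c^k(i, j)
    totals = defaultdict(int)   # sum over venues j' of c^k(i, j')
    for user, venue, ts in checkins:
        k = time_period(ts)
        counts[(user, venue, k)] += 1
        totals[(user, k)] += 1
    return {(u, v, k): c / totals[(u, k)] for (u, v, k), c in counts.items()}


def triple_inner(u, v, t):
    """Three-way inner product of Eq. (5.7): sum over d of u_d * v_d * t_d."""
    return sum(a * b * c for a, b, c in zip(u, v, t))
```

For example, a user with two weekday-morning visits to one venue and one to another obtains preferences 2/3 and 1/3 for that time slot.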

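We then extend the conditional probability of the observed preferences as:

$$p(\mathcal{Q} \mid \mathbf{U}, \mathbf{V}, \mathbf{T}) = \prod_{i=1}^{N} \prod_{j=1}^{M} \prod_{k=1}^{T} \mathcal{N}\left(Q^k_{ij} \mid \langle \mathbf{u}_i, \mathbf{v}_j, \mathbf{t}_k \rangle, \tau_Q^{-1}\right) \tag{5.8}$$

To avoid overfitting, we similarly impose zero-mean, independent Gaussian priors on user and venue latent vectors as before. Following [148], we assume that the time factors change smoothly over time and that each depends only on its immediate predecessor, i.e. that the Markov property holds. Thus, the conditional prior for $\mathbf{T}$ and the prior for the initial time feature vector $\mathbf{t}_0$ are defined as:

$$P(\mathbf{t}_k) = \mathcal{N}(\mathbf{t}_{k-1}, \sigma_T^2 \mathbf{I}), \qquad P(\mathbf{t}_0) = \mathcal{N}(\boldsymbol{\mu}_T, \sigma_0^2 \mathbf{I}) \tag{5.9}$$

We can maximize the log-posterior over $\mathbf{U}$, $\mathbf{V}$, $\mathbf{T}$ as follows:

$$\mathbf{U}^*, \mathbf{V}^*, \mathbf{T}^* = \arg\max_{\mathbf{U},\mathbf{V},\mathbf{T}} p(\mathbf{U}, \mathbf{V}, \mathbf{T} \mid \mathcal{Q}) = \arg\max_{\mathbf{U},\mathbf{V},\mathbf{T}} p(\mathbf{U}, \mathbf{V}, \mathbf{T})\, p(\mathcal{Q} \mid \mathbf{U}, \mathbf{V}, \mathbf{T}) \tag{5.10}$$

With the independence assumption, after mathematical derivations, the optimization problem becomes:

$$\mathbf{U}^*, \mathbf{V}^*, \mathbf{T}^* = \arg\min_{\mathbf{U},\mathbf{V},\mathbf{T}} \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{M} \sum_{k=1}^{T} \left(Q^k_{ij} - \langle \mathbf{u}_i, \mathbf{v}_j, \mathbf{t}_k \rangle\right)^2 + \frac{\lambda_U}{2} \sum_{i=1}^{N} \|\mathbf{u}_i\|_2^2 + \frac{\lambda_V}{2} \sum_{j=1}^{M} \|\mathbf{v}_j\|_2^2 + \frac{\lambda_T}{2} \sum_{k=1}^{T} \|\mathbf{t}_k - \mathbf{t}_{k-1}\|_2^2 + \frac{\lambda_0}{2} \|\mathbf{t}_0 - \boldsymbol{\mu}_T\|_2^2, \tag{5.11}$$

where $\lambda_U = (\tau_Q \sigma_U^2)^{-1}$, $\lambda_V = (\tau_Q \sigma_V^2)^{-1}$, $\lambda_T = (\tau_Q \sigma_T^2)^{-1}$ and $\lambda_0 = (\tau_Q \sigma_0^2)^{-1}$. We can adopt the same stochastic gradient descent approach to find local minima of this non-convex optimization problem. Similarly, we can also adopt the Bayesian treatment and use MCMC methods to obtain the posterior distribution of users’ latent social dimensions [148]. However, PTF does not take users’ social relations and venue similarities into consideration.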

5.5.3 Regularized Tensor Factorization

The formulation in Section 5.5.2 has considered the temporal variations of users’ visiting behaviours. In this section, we further extend the previous formulation by simultaneously considering the social ties and inter-venue similarities in LBSNs in order to achieve more accurate extraction of users’ social dimensions.

5.5.3.1 Social Relation

Intuitively, “friends” tend to have similar behaviours and preferences. For example, a group of friends may often visit the same restaurants for gatherings or hang out to watch movies together. A user may also visit certain places which are recommended by his/her friends. These observations suggest that it is useful to consider social ties to bring “friends” closer to each other in the latent space. Following [150], we consider two factors when relating users in LBSNs. First, friends who have more common friends may have more trust in each other’s recommendations; thus we consider the overlapping level of their friend sets. Second, friends sharing more check-ins should have more similar tastes; thus we consider the overlapping level of their check-in sets.

We define the user similarity as follows. Given the user set $\mathcal{U} = \{u_1, \cdots, u_N\}$, their friend sets $\{F_1, \cdots, F_N\}$ and their check-in sets $\{V_1, \cdots, V_N\}$, we introduce $\alpha \in [0, 1]$ as a tuning parameter and define the user similarity matrix $\mathbf{R} \in \mathbb{R}^{N \times N}$, where $R_{ri}$ is computed as follows:
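One way such a similarity could be realized is sketched below. It assumes that the "overlapping level" of two sets is their Jaccard overlap and that $\alpha$ blends the friend-set and check-in-set overlaps linearly. Both choices, as well as the names `jaccard` and `user_similarity`, are illustrative assumptions and may differ from the exact definition used in this chapter.

```python
def jaccard(a, b):
    """Jaccard overlap |a & b| / |a | b| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def user_similarity(friends_r, friends_i, venues_r, venues_i, alpha=0.5):
    """Blend friend-set overlap and check-in-set overlap with weight alpha.

    alpha = 1 uses only the friend sets, alpha = 0 only the check-in sets.
    The linear blend is an assumed form, not a formula taken from the text.
    """
    return (alpha * jaccard(friends_r, friends_i)
            + (1 - alpha) * jaccard(venues_r, venues_i))
```

With friend sets {1, 2, 3} and {2, 3, 4} (overlap 0.5), identical check-in sets (overlap 1.0) and `alpha=0.5`, the sketch returns 0.75.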

at venues. For example, a tip left at an art museum may recommend a special exhibition or give positive/negative comments on the museum environment. We argue that tips sometimes provide better evidence than categories to describe venues. For example, during examination reading weeks, venues such as libraries, school canteens, study rooms and Starbucks in universities, though belonging to different categories, tend to have similar social functions: places for preparing for exams. Thus, we seek to model venue similarities using the associated tips.

We aggregate all tips of a venue and perform the following steps to filter the noise and reduce the feature space:

• We tokenize text descriptions and put them into lowercase.

• We remove all the non-alphanumeric characters.

• We remove rare terms (terms with frequency < 5).
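The filtering steps above, followed by a cosine comparison of the resulting term-frequency vectors, can be sketched as follows. The function names are hypothetical. The sketch also assumes that term frequency is counted over the whole tip corpus and that non-alphanumeric characters are replaced by spaces rather than deleted.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase, turn non-alphanumeric runs into spaces, split on whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).split()


def venue_vectors(tips_by_venue, min_freq=5):
    """Return (vocabulary, {venue: term-frequency vector}).

    Terms whose corpus-wide frequency is below `min_freq` are dropped.
    """
    tokens = {v: [w for tip in tips for w in tokenize(tip)]
              for v, tips in tips_by_venue.items()}
    corpus = Counter(w for ws in tokens.values() for w in ws)
    vocab = sorted(w for w, c in corpus.items() if c >= min_freq)
    vectors = {}
    for venue, ws in tokens.items():
        counts = Counter(ws)
        vectors[venue] = [counts[w] for w in vocab]
    return vocab, vectors


def cosine(a, b):
    """Cosine similarity of two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0
```

Venues whose tips share frequent terms, such as a library and a cafe both described with "study", receive a high similarity even when their categories differ.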

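Then, the text descriptions for each venue $v_j$ are represented as a word-frequency vector $\mathbf{w}_j = [w_j(1) \cdots w_j(Z)]$, where $w_j(b)$ denotes the frequency of term $b$ in the text descriptions of venue $v_j$ and $Z$ is the vocabulary size. We then define the venue similarity matrix $\mathbf{B} \in \mathbb{R}^{M \times M}$, where $B_{jl} = \frac{\mathbf{w}_j \cdot \mathbf{w}_l}{|\mathbf{w}_j| \cdot |\mathbf{w}_l|}$.

5.5.3.3 The Complete Formulation

With the introduction of user relations and venue similarities, we now present the complete formulation. Let $\mathbf{S} \in \mathbb{R}^{D \times N}$ be the auxiliary user factor feature matrix and $\mathbf{D} \in \mathbb{R}^{D \times M}$ be the auxiliary venue factor feature matrix. We have the conditional distributions of user and venue similarities as follows:

$$p(\mathbf{R} \mid \mathbf{S}, \mathbf{U}) = \prod_{r=1}^{N} \prod_{i=1}^{N} [\ldots] \qquad [\ldots] \prod_{l=1}^{M} [\ldots]$$

Maximizing the log posterior with respect to $\mathbf{U}$, $\mathbf{V}$, $\mathbf{S}$, $\mathbf{D}$, $\mathbf{T}$ is equivalent to minimizing the following objective function with quadratic regularization terms:

$$\mathcal{L}(\mathbf{U}, \mathbf{V}, \mathbf{S}, \mathbf{D}, \mathbf{T}) = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{M} \sum_{k=1}^{T} \left(Q^k_{ij} - \langle \mathbf{u}_i, \mathbf{v}_j, \mathbf{t}_k \rangle\right)^2 + [\ldots] + \frac{\lambda_U}{2} \sum_{i=1}^{N} \|\mathbf{u}_i\|_2^2 + \frac{\lambda_S}{2} \sum_{r=1}^{N} \|\mathbf{s}_r\|_2^2 + \frac{\lambda_V}{2} \sum_{j=1}^{M} \|\mathbf{v}_j\|_2^2 + \frac{\lambda_D}{2} \sum_{l=1}^{M} \|\mathbf{d}_l\|_2^2 + \frac{\lambda_T}{2} \sum_{k=1}^{T} \|\mathbf{t}_k - \mathbf{t}_{k-1}\|_2^2 + \frac{\lambda_0}{2} \|\mathbf{t}_0 - \boldsymbol{\mu}_T\|_2^2,$$

where […] $\lambda_U = (\tau_Q \sigma_U^2)^{-1}$, $\lambda_S = (\tau_Q \sigma_S^2)^{-1}$ and $\lambda_V = (\tau_Q \sigma_V^2)^{-1}$. The objective function is non-convex, and we may only be able to find a local minimum by iteratively updating the latent feature vectors using methods such as stochastic gradient descent. One issue with this approach is parameter tuning. Since there are eight parameters to tune, the usual approaches to parameter selection, such as cross-validation, are infeasible even for a problem of moderate size. Here, in the spirit of [148], we seek a fully Bayesian treatment to average out the hyperparameters in the model, which both helps to alleviate overfitting and saves us from the painful problem of parameter tuning.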

5.5.3.4 Learning By Markov Chain Monte Carlo

The fully Bayesian treatment integrates out all model parameters and hyperparameters and arrives at a predictive distribution of future observations, given the previously observed data. Since this predictive distribution is obtained by averaging all models

[…] $Q^k_{ij}$ can then be generated according to Eq. (5.13), Eq. (5.14) and Eq. (5.8), respectively. Figure 5.2 shows the graphical model of the entire process.

The key ingredient of the fully Bayesian treatment is to view the hyperparameters $\tau_Q$, $\tau_R$, $\tau_B$, $\Theta_U \equiv \{\boldsymbol{\mu}_U, \boldsymbol{\Lambda}_U\}$, $\Theta_V \equiv \{\boldsymbol{\mu}_V, \boldsymbol{\Lambda}_V\}$, $\Theta_S \equiv \{\boldsymbol{\mu}_S, \boldsymbol{\Lambda}_S\}$, $\Theta_D \equiv \{\boldsymbol{\mu}_D, \boldsymbol{\Lambda}_D\}$ and $\Theta_T \equiv \{\boldsymbol{\mu}_T, \boldsymbol{\Lambda}_T\}$ as random variables, as shown in Figure 5.2. We choose the prior distributions for the hyperparameters as follows:

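where $\mathcal{W}(\cdot \mid \mathbf{V}, n)$ is a Wishart distribution of a $D \times D$ random matrix with $n$ degrees of freedom and a scale matrix $\mathbf{V} \in \mathbb{R}^{D \times D}$. The hyperpriors are $W_0$, $W_1$, $\beta$, $v_0$, $v_1$, $\mu_0$ and $\mu_1$, which reflect the prior knowledge about the specific problem. In the Bayesian paradigm, they have little impact on the final predictions, as reported in [4]. Next, the prior distributions for $\mathbf{U}$, $\mathbf{V}$, $\mathbf{S}$, $\mathbf{D}$ are assumed to be Gaussian as before. However, the mean and the precision matrix may take arbitrary values:

$$\begin{aligned}
p(\mathbf{u}_i) &= \mathcal{N}(\boldsymbol{\mu}_U, \boldsymbol{\Lambda}_U^{-1}), & i &= 1, \cdots, N, \\
p(\mathbf{v}_j) &= \mathcal{N}(\boldsymbol{\mu}_V, \boldsymbol{\Lambda}_V^{-1}), & j &= 1, \cdots, M, \\
p(\mathbf{s}_r) &= \mathcal{N}(\boldsymbol{\mu}_S, \boldsymbol{\Lambda}_S^{-1}), & r &= 1, \cdots, N, \\
p(\mathbf{d}_l) &= \mathcal{N}(\boldsymbol{\mu}_D, \boldsymbol{\Lambda}_D^{-1}), & l &= 1, \cdots, M.
\end{aligned} \tag{5.18}$$

For the time feature vectors, we make the same Markov assumption and consider the priors:

$$p(\mathbf{t}_k) = \mathcal{N}(\mathbf{t}_{k-1}, \boldsymbol{\Lambda}_T^{-1}), \quad k = 2, \cdots, T, \qquad p(\mathbf{t}_1) = \mathcal{N}(\boldsymbol{\mu}_T, \boldsymbol{\Lambda}_T^{-1}) \tag{5.19}$$

There are a few different classes of MCMC methods. Here we adopt Gibbs sampling [44], where the target random variables $\mathbf{U}$, $\mathbf{V}$, $\mathbf{S}$, $\mathbf{D}$, $\mathbf{T}$ are decomposed into several blocks, and at each iteration a block of random variables is sampled while all the others are fixed, until the process converges. The outline of the sampling procedure is shown in Algorithm 3, where the explicit conditional distributions used to update the hyperparameters and model parameters are described in Appendix A.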
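The block-wise idea behind Gibbs sampling can be illustrated on a toy target that is much simpler than BPTFSLR. The sketch below draws from a standard bivariate Gaussian with correlation `rho` by alternately sampling each coordinate from its conditional given the other. The target, the function name and the burn-in length are illustrative assumptions and are unrelated to the conditionals in Appendix A.

```python
import math
import random


def gibbs_bivariate_normal(rho, n_samples, burn_in=500, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Each coordinate forms one block: x | y ~ N(rho * y, 1 - rho^2), and
    symmetrically for y | x. One block is updated while the other is fixed.
    """
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho * rho)   # conditional standard deviation
    x = y = 0.0
    samples = []
    for it in range(burn_in + n_samples):
        x = rng.gauss(rho * y, sd)    # sample block x with y fixed
        y = rng.gauss(rho * x, sd)    # sample block y with x fixed
        if it >= burn_in:             # discard the burn-in draws
            samples.append((x, y))
    return samples
```

After burn-in, the empirical correlation of the retained samples should be close to `rho`, which gives a quick sanity check on the sampler.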

The extracted latent social dimensions of each user are expected to reveal an underlying partition of local interest groups in each region. Local interest communities may exhibit unique local behaviors. For example, some users in Singapore may often visit Crystal Jade - La Mian Xiao Long Bao, while some users in New York City may frequently go to Congee Village. At the global level, however, these very different communities are expected to correlate well with each other: in the previous example, people in the two groups belong to the same global interest group of Chinese food lovers. The correlations between communities then give clues on correlations between people across geographical regions. In this section, we describe how we make use of users’ social dimensions to provide venue recommendations in a city other than users’ home city by using cross-region community matching.

5.6.1 Local Community Profiling

With the extracted users’ underlying social dimensions, we seek to first group themaccording to their latent interests at the regional level There are several approach-

es to detecting communities or dense subgroups, such as clustering or communitydetection However, we do not know the number of communities in a region before-hand Also the number of interest communities may vary across different regions,where there may be more communities in big cities where people exhibit morevaried kinds of behaviors whereas there can be very few communities in small-

er counties where people may have more homogeneous life patterns To alleviate

Trang 28

Algorithm 3 Gibbs sampling for BPTFSLR.

1: Input: Q: the user-venue-time tensor; R: the social relation matrix; B: the venue similarity matrix; n: the maximum number of iterations

2: Output: U: the latent social dimensions

3: Initialize model parameters $\{U^{(1)}, V^{(1)}, S^{(1)}, D^{(1)}, T^{(1)}\}$

4: for a = 1 to n do   /* Sample the hyperparameters */

5:     $\tau_Q^{(a)} \sim p(\tau_Q \mid Q, U^{(a)}, V^{(a)}, T^{(a)})$ (Eq. (A.2)),

6:     $\tau_R^{(a)} \sim p(\tau_R \mid R, U^{(a)}, S^{(a)})$ (Eq. (A.3)),

7:     $\tau_B^{(a)} \sim p(\tau_B \mid B, V^{(a)}, D^{(a)})$ (Eq. (A.5)), $\Theta_U^{(a)} \sim p(\Theta_U \mid U^{(a)})$ (Eq. (A.7)),

8:     $\Theta_V^{(a)} \sim p(\Theta_V \mid V^{(a)})$ (Eq. (A.9)), $\Theta_S^{(a)} \sim p(\Theta_S \mid S^{(a)})$ (Eq. (A.11)),

9:     $\Theta_D^{(a)} \sim p(\Theta_D \mid D^{(a)})$ (Eq. (A.13)), $\Theta_T^{(a)} \sim p(\Theta_T \mid T^{(a)})$ (Eq. (A.14))

10:     for i = 1 to N do   /* Sample the users’ social dimensions */

11:         $u_i^{(a+1)} \sim p(u_i \mid Q, V^{(a)}, T^{(a)}, R, S^{(a)}, \tau_Q^{(a)}, \tau_R^{(a)}, \Theta_U^{(a)})$ (Eq. (A.17))

12:     end for

13:     for j = 1 to M do   /* Sample the venues’ latent representations */

14:         $v_j^{(a+1)} \sim p(v_j \mid Q, U^{(a+1)}, T^{(a)}, B, D^{(a)}, \tau_Q^{(a)}, \tau_B^{(a)}, \Theta_V^{(a)})$ (Eq. (A.18))

15:     end for

16:     for r = 1 to N do   /* Sample the auxiliary user factors */

17:         $s_r^{(a+1)} \sim p(s_r \mid R, U^{(a+1)}, \tau_R^{(a)}, \Theta_S^{(a)})$ (Eq. (A.19))

18:     end for

19:     for l = 1 to M do   /* Sample the auxiliary venue factors */

20:         $d_l^{(a+1)} \sim p(d_l \mid B, V^{(a+1)}, \tau_B^{(a)}, \Theta_D^{(a)})$ (Eq. (A.20))

this problem, we resort to affinity propagation (AP), which operates by first simultaneously considering all users as potential community centres and then exchanging messages among them until a good set of communities emerges [39]. In this work, to avoid parameter tuning, we use adaptive affinity propagation (AAP), which improves AP by automatically adjusting the damping factor and preference during the learning process [139].

Given the set of interest communities detected in each geographical region, we aim to understand and represent each community by means of its group profiles [129] so that the correspondences between communities at different regions can be readily created. According to the concept of homophily, connections occur at a higher rate between similar people than between dissimilar people, which makes it sensible to profile each group using the attributes of its group members. We utilize two global properties related to the check-in behaviors. First, each Foursquare venue is mapped to one or more categories depending on its social function. Second, users visit venues at different times, which reveals another dimension of users’ behaviours. To utilize these two dimensions of information, we represent each community by a weighted vector, where each dimension represents a visit to a particular venue category at a particular time period.

5.6.2 Venue Recommendation via Cross Region Community Matching

While it is possible to directly compare communities located in different regions using traditional vector comparison metrics, in this section we seek a more robust community representation that reduces the noise caused by occasional irregular user behaviors. Let $C^a \in \mathbb{R}^{l \times k_a}$ and $C^b \in \mathbb{R}^{l \times k_b}$ be the community representations at regions a and b, respectively, as described in Section 5.6.1, where l is the dimension of the community representation and $k_a$, $k_b$ are the numbers of local interest communities in regions a and b, respectively. The joint community representation of the communities at these two regions is then $C^{ab} = [C^a \; C^b] \in \mathbb{R}^{l \times (k_a + k_b)}$.
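
As a rough illustration of how such community representations could be assembled, the sketch below builds one profile vector per community over (venue category, time period) pairs and stacks the columns of two regions side by side. The function names, the use of normalized visit counts as weights, and the toy categories and periods are assumptions made for illustration; the text does not specify the weighting scheme.

```python
from collections import Counter


def community_profile(checkins, categories, periods):
    """Build a profile vector over (category, period) pairs.

    checkins: iterable of (category, period) tuples pooled over the
    members of one community. Weights are normalized visit counts,
    which is an assumption; the source does not state the weighting.
    """
    counts = Counter(checkins)
    dims = [(c, p) for c in categories for p in periods]
    total = sum(counts[d] for d in dims)
    if total == 0:
        return [0.0] * len(dims)
    return [counts[d] / total for d in dims]


def joint_representation(profiles_a, profiles_b):
    """Stack profile columns of two regions: l rows, k_a + k_b columns."""
    columns = list(profiles_a) + list(profiles_b)
    if not columns:
        raise ValueError("at least one community profile is required")
    l = len(columns[0])
    if any(len(col) != l for col in columns):
        raise ValueError("all profiles must share the same dimension l")
    # Row r of the joint matrix collects dimension r of every community.
    return [[col[r] for col in columns] for r in range(l)]


# Toy example with hypothetical categories and time periods.
cats = ["Food", "Shop"]
pers = ["morning", "evening"]
c1 = community_profile([("Food", "evening")] * 3 + [("Shop", "morning")], cats, pers)
c2 = community_profile([("Shop", "evening")] * 2, cats, pers)
joint = joint_representation([c1], [c2])
```

Here each column of `joint` is one community, so the matrix has l = 4 rows and k_a + k_b = 2 columns.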

People usually have multiple interests with different strengths. For example, most tourists are interested in local food sampling and shopping, but some of them are more interested in food while others prefer to spend more time on shopping. Thus, a community of people is inherently a mixture of users’ interests with varying weights. Motivated by this, we seek to learn a set of p latent interest factors, $A \in \mathbb{R}^{l \times p}$, and generate more robust community representations on top of these factors.

Sparse representation has been shown to be effective in noise reduction and data compression [144]. We thus adopt the non-negative matrix factorization with sparseness constraints proposed by [59] to decompose the joint community representation into the latent interest factors A and the sparse community representations X by solving the following optimization problem:

$$
\begin{aligned}
A, X = \arg\min_{A \ge 0,\, X \ge 0} \ & \| C^{ab} - A X \|_F^2 && \text{(5.20a)} \\
\text{s.t.} \ & \mathrm{sparseness}(a_i) = s_a, \ \forall i, && \text{(5.20b)} \\
& \mathrm{sparseness}(x_i) = s_x, \ \forall i, && \text{(5.20c)}
\end{aligned}
$$

where $X \in \mathbb{R}^{p \times (k_a + k_b)}$ is the sparse community representation, and $s_a$ and $s_x$ are the desired sparseness of A and X, respectively. Here sparseness(·) is the sparseness measure as defined in [59].
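
For reference, the sparseness measure of [59] relates the L1 and L2 norms of a vector: for a vector x with n entries, sparseness(x) = (√n − ‖x‖₁ / ‖x‖₂) / (√n − 1), which is 1 when a single entry is non-zero and 0 when all entries are equal. The sketch below is a minimal pure-Python illustration of this measure only; it does not implement the constrained factorization itself, and the function name is chosen here for illustration.

```python
import math


def sparseness(x):
    """Sparseness measure based on the ratio of L1 to L2 norm.

    Returns 1.0 for a vector with exactly one non-zero entry and 0.0
    for a vector whose entries all have the same magnitude.
    """
    n = len(x)
    l1 = sum(abs(v) for v in x)
    l2 = math.sqrt(sum(v * v for v in x))
    if n < 2 or l2 == 0.0:
        raise ValueError("needs at least two entries and a non-zero vector")
    return (math.sqrt(n) - l1 / l2) / (math.sqrt(n) - 1.0)


one_hot = sparseness([0.0, 0.0, 1.0, 0.0])   # maximally sparse
uniform = sparseness([1.0, 1.0, 1.0, 1.0])   # not sparse at all
mixed = sparseness([0.0, 0.0, 3.0, 4.0])     # somewhere in between
```

A target such as s_x = 0.85 therefore asks for community representations in which most of the weight sits on a few interest factors.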

Let user i belong to community $C^a_T$ at region a; his/her predicted preference towards venue j at region b can then be computed as follows:

$\hat{Q}_{ij} = \sum \ldots$

Table 5.4: Properties of sampled popular regions: N is the number of active users, $M_L$ is the number of local venues, $C_L$ is the number of local check-ins, $M_F$ is the number of foreign venues and $C_F$ is the number of foreign check-ins.

1. The use of time factors helps to improve venue prediction accuracy.

2. The social and venue regularization leads to further improvement in the recommendation performance.

3. Cross-region community matching is able to generate relevant and accurate recommendation lists for tourists.

Figure 5.3: Comparison between the popular venue distributions in New York City: (a) total check-ins, (b) sampled check-ins (JS Divergence = 0.3226)

Figure 5.4: Comparison between the popular venue distributions in Chicago (JS Divergence = 0.3351)

Figure 5.5: Comparison between the popular venue distributions in London (JS Divergence = 0.4066)

Figure 5.6: Comparison between the popular venue distributions in Singapore (JS Divergence = 0.2841)

5.7.1 Dataset Reliability/Representativeness

We conduct the experiments on the dataset described in Section 3.5, and the statistics of the dataset are reproduced in Table 5.4. In this section, we investigate whether the check-in counts we obtained are reliable. This matters because we sample Foursquare check-ins through the Twitter streaming API, while not all users share their check-ins through Twitter. To verify that the sampled check-ins are reliable, we count the number of check-ins for each venue of interest and compare it with the total number of check-ins reported by Foursquare for that venue. Let M be the number of venues of interest. We note that each check-in results in exactly one of M possible venues with probabilities p = (Σ_i c(i,1) …
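
The comparison between sampled and total check-in counts, summarized by the JS divergence values in the captions of Figures 5.3 to 5.6, can be sketched as follows. This is a minimal illustration rather than the exact procedure used in the experiments: the base-2 logarithm (which bounds the divergence to [0, 1]) is an assumption, since the text does not state the base, and the counts in the example are invented.

```python
import math


def normalize(counts):
    """Turn per-venue check-in counts into a probability distribution."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive value")
    return [c / total for c in counts]


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(counts_a, counts_b):
    """Jensen-Shannon divergence between two count vectors (base 2, assumed)."""
    p = normalize(counts_a)
    q = normalize(counts_b)
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical counts for four venues: totals vs. a Twitter-shared sample.
total_counts = [1000, 500, 300, 200]
sampled_counts = [100, 50, 30, 20]
same_shape = js_divergence(total_counts, sampled_counts)
disjoint = js_divergence([1, 0], [0, 1])
```

A value near 0 indicates that the sampled check-ins follow the same venue distribution as the totals, while a value of 1 (in base 2) indicates distributions with no overlap.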

Figure 5.7: Illustration of user similarity tradeoff parameter tuning (horizontal axis: user similarity tradeoff parameter α)

• User similarity tradeoff parameter: α. We tune α based on the prediction performance using the friend-based model. The predicted preference of user i towards venue j is computed as $\hat{Q}_{ij} = \sum \ldots$ … performance using MAP (footnote 3) for α ∈ [0, 1] with an increment of 0.05. Figure 5.7 shows the accuracy versus the user similarity tradeoff parameter α. The optimal performance is achieved when α = 0.35.

• Parameters for Bayesian learning: $W_0$, $v_0$, $W_1$, $v_1$, β, $\mu_0$ and $\mu_1$. These parameters reflect our prior knowledge of the specific problem. Since Bayesian learning is able to adjust them according to the training data within a reasonably large range, we set them without tuning, similar to [110] and [148]. The settings are: $W_0 = I$, $v_0 = D$, $W_1 = 1$, $v_1 = 1$, β = 1, $\mu_0 = \mathbf{0}$ and $\mu_1 = \mathbf{1}$, where I is a D × D identity matrix, and $\mathbf{0}$ and $\mathbf{1}$ are D × 1 column vectors

3 Please refer to Section 5.7.3.1 for the definition of MAP.

of 0s and 1s, respectively.

• Number of samples used in Bayesian learning. Considering the tradeoff between the prediction accuracy and the computational cost, we empirically choose the number of samples to be 75 in this work.

• Latent dimension: D. We tune D based on the prediction performance using BPTF on LA users’ local venue prediction. We determine D = 60 considering the tradeoff between the prediction accuracy and the computational cost.

• Parameters for AAP. We empirically set the parameters as follows: (1) convergence condition: nconv = 30, (2) initial damping factor: lam = 0.5 and (3) decreasing step of preferences: pstep = 0.01.

• Parameters for learning latent community representation: $s_a$, $s_x$ and p. We empirically set $s_a = 0.5$ and $s_x = 0.85$ to allow moderate sparseness on the interest factors and high levels of sparseness on the community representations, and we empirically set p = 200.

5.7.3 Quality of Latent Social Dimensions

In this section, we investigate the quality of the users’ latent social dimensions extracted by our proposed BPTFSLR by evaluating the venue prediction accuracy in traditional collaborative filtering settings. We consider local venue predictions and aim to predict users’ future visits based on their past visits. The details are as follows. For each city, we use users’ check-ins performed from August to October 2012 as the training set and check-ins performed during November 2012 as the testing set. We remove all check-ins $c_{ij}$ from the testing set for user i if there is a non-zero entry for $c_{ij}$ in the training set. This filtering ensures that the prediction task recovers venues that the users have not visited before. We then exclude users who do not share any check-ins during the November period.
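
The filtering step can be sketched as below. The nested-dictionary layout (user → venue → check-in count) and the function name are assumptions for illustration. The sketch also drops users whose test set becomes empty after filtering, which is one reading of the exclusion rule rather than something the text states explicitly.

```python
def filter_test_set(train, test):
    """Keep only test check-ins at venues the user has not visited in training.

    train, test: dict mapping user -> dict mapping venue -> check-in count.
    Users left with no test check-ins are excluded (assumed interpretation).
    """
    filtered = {}
    for user, venues in test.items():
        # Venues with a non-zero training entry for this user.
        seen = {v for v, c in train.get(user, {}).items() if c > 0}
        unseen = {v: c for v, c in venues.items() if c > 0 and v not in seen}
        if unseen:
            filtered[user] = unseen
    return filtered


# Toy data: u1 revisits venue "a" and discovers "c"; u2 only revisits "a".
train = {"u1": {"a": 2, "b": 1}, "u2": {"a": 1}}
test = {"u1": {"a": 1, "c": 3}, "u2": {"a": 2}, "u3": {"d": 1}}
filtered = filter_test_set(train, test)
```

In this toy example only the new venue "c" survives for u1, u2 is dropped because nothing new remains, and u3 is kept as is.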

In addition, we consider two kinds of settings. First, we generate general venue predictions without considering time factors. In this setting, methods that consider time information aggregate venue preferences over the different time periods and provide a single ranking list, with venues sorted in decreasing order of computed preference. Second, we generate venue predictions for each user at different time periods. In this task, methods that do not consider time information are applied to the training and testing sets of each time period individually, and we report the performance averaged over all time periods. We name the first setting the “time-independent” setting and the second the “time-dependent” setting.
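
The two settings can be illustrated for a single user as follows. Summing the per-period scores is an assumed aggregation rule, since the text only says that preferences are aggregated; the function names and the toy scores are likewise hypothetical.

```python
def rank_time_independent(scores_by_period):
    """Aggregate per-period scores into one ranking (sum is an assumption)."""
    total = {}
    for scores in scores_by_period.values():
        for venue, score in scores.items():
            total[venue] = total.get(venue, 0.0) + score
    # Sort by decreasing aggregated score, breaking ties by venue name.
    return sorted(total, key=lambda v: (-total[v], v))


def rank_time_dependent(scores_by_period):
    """Produce one ranking per time period."""
    return {
        period: sorted(scores, key=lambda v: (-scores[v], v))
        for period, scores in scores_by_period.items()
    }


# Hypothetical predicted preferences of one user in two time periods.
scores = {
    "morning": {"cafe": 0.9, "bar": 0.1, "gym": 0.4},
    "night": {"cafe": 0.1, "bar": 0.8, "gym": 0.3},
}
overall = rank_time_independent(scores)
per_period = rank_time_dependent(scores)
```

The example shows why the two settings can disagree: the bar ranks last in the morning but first at night, and lands in the middle of the aggregated list.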

5.7.3.1 Evaluation Metrics

The aim of venue recommendation is to provide each user with a ranked list of venues to visit. Thus, instead of predicting user-venue preferences, we measure the quality of the ranked list of recommended venues against the ground truth. Similar to traditional information retrieval tasks, we use Precision@K, Recall@K and mean average precision (MAP) to report the performance. The definitions of these performance measures are as follows.

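• Precision@K = $\frac{\text{Number of correct venues in the first } K \text{ results}}{K}$

• Recall@K = $\frac{\text{Number of correct venues in the first } K \text{ results}}{\text{Total number of correct venues}}$

• MAP = mean over users i of $\frac{1}{m_i} \left( \sum_{j=1}^{|V|} \mathrm{rel}(v_j) \, \frac{\sum_{k=1}^{j} \mathrm{rel}(v_k)}{r_j} \right)$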
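
The three measures can be sketched in a few lines of Python. The function names and data layout (a ranked list of venue identifiers per user and a set of ground-truth venues) are assumptions for illustration; average precision is normalized by the number of ground-truth venues of each user.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the first k recommended venues that are correct."""
    hits = sum(1 for v in ranked[:k] if v in relevant)
    return hits / k


def recall_at_k(ranked, relevant, k):
    """Fraction of the correct venues that appear in the first k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for v in ranked[:k] if v in relevant)
    return hits / len(relevant)


def average_precision(ranked, relevant):
    """Average of the precision values at the rank of each correct venue."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, venue in enumerate(ranked, start=1):
        if venue in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(rankings, ground_truth):
    """Mean of the per-user average precision over users with ground truth."""
    users = list(ground_truth)
    return sum(
        average_precision(rankings.get(u, []), ground_truth[u]) for u in users
    ) / len(users)


# Toy example with two users.
rankings = {"u1": ["a", "b", "c", "d"], "u2": ["x", "y"]}
truth = {"u1": {"a", "c"}, "u2": {"y"}}
p2 = precision_at_k(rankings["u1"], truth["u1"], 2)
r2 = recall_at_k(rankings["u1"], truth["u1"], 2)
map_score = mean_average_precision(rankings, truth)
```

For u1 the correct venues sit at ranks 1 and 3, giving an average precision of (1/1 + 2/3) / 2; for u2 the single correct venue sits at rank 2, giving 1/2.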

5.7.3.2 Baselines

We compare the recommendation performances with the following approaches:

• Popularity (POP): This approach provides the same recommendation list of venues to all users according to the venues’ popularity in the training set. Let $p_j$ be the popularity of venue j; then $p_j = \sum_{i \in U} c^{Tr}_{ij}$, where $c^{Tr}_{ij}$ is the number of check-ins performed by user i at venue j in the training set.

• User-based CF (UCF): The basic idea of CF is to recommend to users the venues that a group of similar users like to visit. With user-based CF, users’ implicit preferences may be discovered by aggregating the behaviours of similar users. The predicted rating on venue j by user i is $\hat{Q}_{ij}$ … and $V_{ik}$ contains the venues visited by user i or user k.

• SVD: We first compute the SVD decomposition of Q as $Q = U S V^T$, where $U \in \mathbb{R}^{N \times k}$, $S \in \mathbb{R}^{k \times k}$ and $V \in \mathbb{R}^{M \times k}$. The predicted rating of user i towards venue j is then computed as $\hat{Q}_{ij} = (u_i \sqrt{S}^T) \cdot (\sqrt{S} v_j^T)$. We choose k = 12 based on preliminary experiments.

• Non-negative Matrix Factorization (NMF): We use the algorithm described in [116] and empirically set the latent dimension k = 15.

• Bayesian Probabilistic Matrix Factorization (BPMF): The latent representations are learned using MCMC [110] and the prediction is computed as the inner product of the two latent factors.

• Bayesian Probabilistic Tensor Factorization (BPTF): The latent representations are learned using MCMC [148] and the prediction is computed according to Eq. (5.7).

5.7.3.3 Results of Local Venue Predictions

We report here the performance comparisons in the setting of local venue prediction. Figures 5.8, 5.9, 5.10, 5.11, 5.12, 5.13, 5.14 and 5.15 show the Precision@K and Recall@K of local venue predictions in each city. We observe that methods considering time factors outperform the methods that do not take time into consideration. This verifies our postulation that users visit different kinds of venues at different times, and is in line with intuition. BPTFSLR achieves the best prediction performance in terms of precision and recall under all values of k computed, which shows the strength of considering heterogeneous inter/intra relations among users and venues within the unified model. Similar to previous studies (e.g. [150]), we observe that the user-based model is a strong baseline, which beats SVD, NMF and popularity-based approaches and is comparable to BPMF across different values of k. The use of the time factor improves the performance by an average of 2.18% in recall and 8.8% in precision over the user-based model for Singapore users. The introduction of social and location regularization further improves the performance by an average of 4.67% in recall and 8.9% in precision over BPTF when considering all values of k in venue prediction for Singapore users. The corresponding improvements are 4.97%, 7%, 1.82%, 6.2% for New York City users, 1.02%, 6.37%, 9.57%, 6.49% for Chicago users and 4.55%, 2.32%, 10.58%, 7.3% for London users. In addition, we observe that the performance for Singapore users is generally better than that for New York City users. The reason could be that most Singapore residents perform most of their check-ins in Singapore, whereas New York City residents often visit areas other than New York City, which lowers the density of the training tensor.

Figure 5.16 and Figure 5.17 show the MAP of all approaches for the four cities in two settings: time-independent and time-dependent. We observe that methods that do not consider time factors perform badly in generating time-dependent predictions since they operate on sparser datasets where only check-ins performed during …

Figure 5.8: Recall of Local Venue Prediction in New York City (time-independent)

Figure 5.9: Precision of Local Venue Prediction in New York City (time-independent)
