Chapter 3 Overview of Dataset
Since there are no standard datasets available for research in LBSNs, we have designed a crawling approach to obtain a sampled, large-scale, representative and real-world dataset. Among all dominant LBSNs, Foursquare has been reported to have the highest number of active users and the most frequent daily user activity. Thus, we choose Foursquare as the testbed in this thesis.
In this chapter, we first give an overview of Foursquare in Section 3.1 and introduce our crawling method in Section 3.2. We then present an overview of the data structure of the Foursquare dataset in Section 3.3. We next report the first-order analysis of the sampled dataset in Section 3.4. Finally, we describe the two sub-datasets used for evaluation in Section 3.5 and Section 3.6.
Foursquare describes its service as “an application that helps you and your friends make the most of where you are.” It is a friend-finder, a social city guide and a game that challenges users to experience new things and rewards them for doing so with various badges. As of September 2013, there are more than 40 million Foursquare users worldwide and more than 4.5 billion cumulative check-ins, with millions more every day.¹

Figure 3.1: Additional activities other than checking in in Foursquare: adding accompanying friends, sharing on Twitter and Facebook, uploading photos and writing tips (screen captured on 30th December, 2013).
Similar to other LBSNs, Foursquare lets users check in to a place when they're there, tell friends where they are and track the history of where they have been and who they have been there with. During a check-in, Foursquare examines the user's current location and shows a list of nearby places. Users can also register new places. Location is based on GPS hardware in the mobile device or the network location provided by the application. Each check-in awards the user points and sometimes “badges”. The user who checks in most often at a venue becomes the “mayor”, and users regularly vie for “mayorships”.
Foursquare lets people connect to friends, a concept equivalent to friends on other online social networks. Besides finding friends directly by name search or by importing them from phone contacts or other social networks, Foursquare users can also add friends who are currently at the same venue. The “Here Now” section of a venue page shows a list of users who are currently at that venue. In this way, users are able to find casual friends with similar interests.

Figure 3.2: Steps of check-in activity in Foursquare. Step 1: The user sends his/her physical location. Step 2: Foursquare sends candidate venues. Step 3: The user sends the venue and optionally pushes the check-in to Twitter.

¹ https://foursquare.com/about
As mentioned in Chapter 1, users in Foursquare are prompted to provide additional multimedia information together with their check-ins. As Figure 3.1 shows, users can upload photos and write comments about the current venue. In addition, users can share their check-ins on Twitter or Facebook and tag friends who are currently with them.
We aim to obtain a set of active users and their activities in Foursquare. The activities of interest include checking in at venues, posting tips and uploading photos, where the most popular and dominant activity is checking in. Here, we seek to first obtain users who frequently perform check-ins and then crawl the other activities of these active users.
Foursquare provides rich endpoints for data access. However, to protect users' privacy, Foursquare limits access to users' check-in records, which are only available for the current acting user. Fortunately, the connection between LBSNs and microblogging services provides an alternative way to access such data. Figure 3.2 shows the check-in process. When a user tries to check in at a certain venue, he/she first sends the current exact physical location in terms of latitude and longitude (Step 1). Then Foursquare compares the received location with its huge venue database and suggests a few place names in the order of decreasing geographical distance (Step 2). After that, the user selects one place name for his/her current location and sends it back to Foursquare, and may optionally push the check-in information through Twitter (Step 3). In addition, Twitter provides a glance into its millions of users and billions of tweets through a Streaming API², which provides a sample of all tweets matching keywords selected by the API user.

Figure 3.3: Check-in page in Foursquare.
We monitor Twitter streams with the keyword “4sq.com” during two time periods: January to March 2012 and August to November 2012. Each sampled check-in message contains a short link (Figure 3.2) to the original check-in page (Figure 3.3), where the details of the check-in, such as user ID, venue ID, check-in time, etc., are available.

² https://dev.twitter.com/docs/streaming-api

Table 3.1: Key information retrieved for users.
ID | First Name | Last Name | Gender | Profile Photo | Home City
Once we have obtained the list of users who share their check-in activities through Twitter during the crawling periods, we are able to retrieve their other activities, such as tip posting³, photo uploading⁴ and friendship information⁵, using the Foursquare APIs.
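The first step of the crawl, picking out check-in tweets by their “4sq.com” short links, can be sketched as follows. This is only an illustrative sketch: the regular expression, the function name `extract_checkin_links` and the sample tweets are assumptions made for the example and are not taken from the crawler used in the thesis.

```python
import re

# Hypothetical pattern: a short link on the 4sq.com domain inside tweet text.
# The exact shape of Foursquare short links is an assumption here.
FOURSQUARE_LINK = re.compile(r"https?://4sq\.com/[A-Za-z0-9]+")


def extract_checkin_links(tweets):
    """Return (tweet_index, link) pairs for tweets carrying a 4sq.com link."""
    links = []
    for index, text in enumerate(tweets):
        for link in FOURSQUARE_LINK.findall(text):
            links.append((index, link))
    return links


# Made-up tweets, for illustration only.
sample_tweets = [
    "I'm at VivoCity (Singapore) http://4sq.com/AbC123",
    "Lunch was great today",
    "Checked in! https://4sq.com/xYz789 and http://example.com/other",
]
print(extract_checkin_links(sample_tweets))
```

Each extracted link would then be resolved to the check-in page, from which the user ID, venue ID and check-in time are read.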
Table 3.2: Key information retrieved for venues.
4b053 VivoCity Harbour Walk (1.26,103.8) Singapore Singapore Mall
usually unimportant venues with very few check-ins or meaningless venues
Figure 3.4: Venue category hierarchy in Foursquare (selected)
Table 3.3 lists the key information retrieved for check-ins, including user IDs, venue IDs and the times of the check-ins. All times are recorded in Greenwich Mean Time (GMT), which is also referred to as Coordinated Universal Time (UTC).
Table 3.3: Key information retrieved for check-ins
5004f 5062890 26de4 2012-09-17T05:04:46Z
Table 3.4 and Table 3.5 list the key information retrieved for tips and photos, respectively. We can regard tips and photos as special check-ins that are enriched with multimedia content. As with check-ins, all times are recorded in GMT.
Table 3.4: Key information retrieved for tips.
4e3474 3369312 4b59e1 2012-12-05-26T00:24:37Z Love the sky park
Table 3.5: Key information retrieved for photos
4e7d3 12250919 4a73e8 2012-08-19T03:04:46Z
In data preprocessing, we remove two kinds of suspicious check-ins. First, we remove check-ins from users who have performed more than ten check-ins within a minute. Second, we remove “sudden moves”, where two consecutive check-ins imply that a user is travelling at a speed faster than 1,000 km/hour (faster than the speed of normal commercial jet airplanes). In addition, we notice that certain venues are deleted by Foursquare in its housekeeping process. We remove all check-ins that were performed at these deleted venues.
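The two filters for suspicious check-ins can be sketched as below. This is a minimal sketch under stated assumptions, not the preprocessing code of the thesis: the record layout `(user_id, time, lat, lon)`, the use of a sliding 60-second window for “within a minute”, the haversine great-circle distance, and the choice to drop the later check-in of a sudden move are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from datetime import timedelta

EARTH_RADIUS_KM = 6371.0       # mean Earth radius
MAX_SPEED_KMH = 1000.0         # speed threshold stated in the text
MAX_CHECKINS_PER_MINUTE = 10   # burst threshold stated in the text


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def is_burst_user(times):
    """True if more than ten check-ins fall inside some 60-second window."""
    times = sorted(times)
    start = 0
    for end in range(len(times)):
        while times[end] - times[start] >= timedelta(minutes=1):
            start += 1
        if end - start + 1 > MAX_CHECKINS_PER_MINUTE:
            return True
    return False


def too_fast(previous, current):
    """True if moving between two check-ins implies more than MAX_SPEED_KMH."""
    hours = (current[1] - previous[1]).total_seconds() / 3600.0
    distance = haversine_km(previous[2], previous[3], current[2], current[3])
    if hours == 0:
        return distance > 0
    return distance / hours > MAX_SPEED_KMH


def filter_checkins(checkins):
    """checkins: iterable of (user_id, time, lat, lon); returns retained records."""
    by_user = defaultdict(list)
    for record in checkins:
        by_user[record[0]].append(record)
    kept = []
    for records in by_user.values():
        if is_burst_user([r[1] for r in records]):
            continue  # assumption: drop every check-in of a burst user
        previous = None
        for record in sorted(records, key=lambda r: r[1]):
            if previous is not None and too_fast(previous, record):
                continue  # assumption: drop the later check-in of a sudden move
            kept.append(record)
            previous = record
    return kept
```

Comparing each check-in against the last retained one, rather than the last observed one, keeps a single bad record from also invalidating the check-in that follows it.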
Finally, Table 3.6 shows the statistics of the sampled Foursquare dataset. We regard a user's declared “homecity” in Foursquare as the user's true home city, and we remove a user if more than 50% of his/her check-ins are not in the declared home city.
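The home-city rule can be expressed in a few lines. The sketch below assumes, for illustration only, that each user's check-ins have already been mapped to city names; the function name and the input dictionaries are hypothetical.

```python
def users_with_reliable_home(checkin_cities, home_city):
    """Keep users unless more than 50% of their check-ins are outside the declared home city.

    checkin_cities: {user: [city of each check-in]}  (hypothetical layout)
    home_city:      {user: declared home city}
    """
    kept = []
    for user, cities in checkin_cities.items():
        away = sum(1 for city in cities if city != home_city[user])
        # "More than 50% away" means away / len(cities) > 0.5; integer form avoids division.
        if away * 2 <= len(cities):
            kept.append(user)
    return kept
```

Note that a user with exactly half of the check-ins away from home is retained, since the rule removes users only when the share is strictly greater than 50%.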
Table 3.6: Statistics of the Foursquare dataset.
New York City Singapore New York City Singapore Chicago London
Geographical distribution of the check-ins
First of all, we visualize the global distribution of sampled Foursquare venues visited from January to March 2012 and August to November 2012 in Figure 3.5, where colors represent the popularity of venues with “red”: number of check-ins > 100, “green”: 50 ≤ number of check-ins ≤ 100 and “blue”: 10 ≤ number of check-ins < 50. We see that while check-ins are globally distributed, the density of check-ins is highest in the U.S., especially in New York, where Foursquare was launched. Other hot areas include cities in Western Europe, Southeast Asia, Japan and South Korea. Though most areas in China are currently blank, we see an increasing trend of using Foursquare check-in services on the east coast, especially in big cities such as Shanghai.
While Figure 3.5 conveys the scale and density of the sampled Foursquare dataset, we can further explore the nature of these check-ins by aggregating venue categories across all check-ins. The aggregated view in Figure 3.6 shows that the most popular check-in venues are restaurants, homes and shops/stores/malls.

Figure 3.5: Global distribution of sampled Foursquare venues visited from January to March 2012 and August to November 2012. Colors represent the popularity of venues with “red”: number of check-ins > 100, “green”: 50 ≤ number of check-ins ≤ 100 and “blue”: 10 ≤ number of check-ins < 50.
Time distribution of the check-ins
Considering the temporal distribution of check-ins, we show both the aggregate daily patterns and weekly patterns of users' check-ins. To resolve the time differences across geographical regions, we first obtain the time zones of all venues using EarthTools⁸ and then convert each check-in time to the corresponding local time according to the geographical location.
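The conversion from a recorded UTC timestamp to local time can be illustrated with the Python standard library. This is a simplified sketch: it assumes the venue's UTC offset is already known (here +8 hours, chosen purely for illustration), whereas the thesis looks up time zones per venue; a fixed offset also ignores daylight saving time.

```python
from datetime import datetime, timedelta, timezone


def to_local_time(utc_timestamp, utc_offset_hours):
    """Parse a timestamp such as '2012-09-17T05:04:46Z' and shift it to a fixed UTC offset."""
    utc_time = datetime.strptime(utc_timestamp, "%Y-%m-%dT%H:%M:%SZ")
    utc_time = utc_time.replace(tzinfo=timezone.utc)
    return utc_time.astimezone(timezone(timedelta(hours=utc_offset_hours)))


# The timestamp format follows the sample row of Table 3.3; the +8 offset is an assumption.
local = to_local_time("2012-09-17T05:04:46Z", 8)
print(local.isoformat())  # 2012-09-17T13:04:46+08:00
```

The local hour (`local.hour`) and day of week (`local.weekday()`) are the quantities that would be aggregated into daily and weekly patterns.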
Figure 3.7a shows the aggregated check-in pattern per day. This pattern provides a glimpse into the global daily “heartbeat”, with two major peaks: one around 12pm and one around 6pm, when people are out at restaurants or food courts for lunch or dinner. This correlates with the observation that most check-ins are performed at venues that belong to “Food” categories. Similar patterns were reported in [79].

Figure 3.6: Venue category cloud for check-ins.

Figure 3.7: Check-in patterns. (b) Weekly check-in pattern.

⁸ http://www.earthtools.org/
Figure 3.7b shows the aggregated check-in pattern per week. As expected, weekdays clearly show two peaks during lunch time and dinner time, while over the weekends these two peaks blend, reflecting a fundamentally different weekend schedule for most Foursquare users.
Finally, Figure 3.8 shows the distribution patterns of user behaviors in the sampled dataset. In Figures 3.8a, 3.8b, 3.8c and 3.8d, we report the proportion of users vs. the number of check-ins performed, the number of venues visited, the number of photos uploaded and the number of tips written, respectively. Similar to previously reported observations, the four distributions exhibit a similar trend, where only a few users are extremely active in various activities, while a large number of users have only a few activities.
(d) Number of tips written per user.
Figure 3.8: Distribution patterns of user behaviors in the sampled dataset
We construct the dataset based on the data crawled from January to March 2012. The task of community understanding aims to mine communities that are interpretable and exhibit clear community profiles in terms of multimedia content. Thus, we aim to select users who contribute more check-ins, tips and photos. We select the candidate users as follows. First, we compute the activeness score for user u as:

$\mathrm{activeness}(u) = a \times C(u) + b \times T(u) + c \times P(u)$,    (3.1)

where C(u), T(u) and P(u) are the number of check-ins, tips and photos of u, respectively; a, b and c adjust the importance of each contribution, and we empirically set a = b = c = 1/3. We then select the top 80% of users ranked by their activeness scores at the global scale and at two city scales, respectively. Users at the global scale include, but are not limited to, users in Singapore and users in New York City. They also include active English-speaking Foursquare users from other cities, such as San Francisco, Chicago, London, etc. Table 3.7 summarizes the dataset. We regard users' actual behaviors as the ground truth for evaluations. For example, in evaluating the prediction accuracy of users' visiting preferences, the actual venues visited by the testing users are regarded as the ground truth.

Table 3.7: Dataset for Community Understanding.
              | Users  | Checkins | Tips    | Images
Global        | 13,068 | 86,302   | 335,877 | 69,510
Singapore     | 8,736  | 32,156   | 156,761 | 9,775
New York City | 9,918  | 51,043   | 213,302 | 22,135
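The user selection based on the activeness score of Equation 3.1 can be sketched as follows. The function names and the input layout are assumptions for illustration; the weights default to equal values, and with equal weights the ranking does not depend on their common value.

```python
def activeness(checkins, tips, photos, a=1/3, b=1/3, c=1/3):
    """Weighted sum of a user's check-in, tip and photo counts (Equation 3.1)."""
    return a * checkins + b * tips + c * photos


def select_top_users(counts, fraction=0.8):
    """Return the top `fraction` of users ranked by activeness score.

    counts: {user: (C, T, P)} with the numbers of check-ins, tips and photos
    (a hypothetical input layout).
    """
    ranked = sorted(counts, key=lambda user: activeness(*counts[user]), reverse=True)
    keep = int(len(ranked) * fraction)
    return ranked[:keep]
```

Running the same selection on the global user set and on each city's user set separately corresponds to the three scales described above.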
We construct the dataset based on the data crawled from August to November 2012. Since our focus is to match communities across geographical regions for locally interesting venue recommendation to tourists, we select users from Chicago (CHI), London (LDN), New York City (NYC) and Singapore (SG) and aim to recommend venues to them when they are in a city other than their home city. We locate
Table 3.8: Properties of sampled popular regions: N is the number of active users, ML is the number of local venues, CL is the number of local check-ins, MF is the number of foreign venues and CF is the number of foreign check-ins.
⁹ https://developers.google.com/maps/documentation/geocoding/
Chapter 4 Community Understanding
In LBSNs, users implicitly interact with each other by visiting places, issuing comments and/or uploading photos. These heterogeneous interactions convey the latent information for identifying meaningful and interpretable social communities, which exhibit unique location-oriented characteristics.

In this chapter, we present an approach to simultaneously detect and understand interest communities in LBSNs by representing the heterogeneous interactions with a multi-modal non-uniform hypergraph. Here the vertices of the hypergraph are users, venues, comments or photos, and the hyperedges characterize the k-partite heterogeneous interactions, such as posting certain comments or uploading certain photos while visiting certain places. We then view each detected social community as a dense subgraph within the heterogeneous hypergraph, where the user community is constructed by the vertices and edges in the dense subgraph and the profile¹ of the community is characterized by the vertices related to venues, comments and photos and their inter-relations. We present an efficient algorithm to detect the overlapping dense subgraphs, where the profile of each social community is guaranteed to be available by constraining the minimum number of vertices in each modality. Extensive experiments on the selected subset of our sampled Foursquare dataset validate the effectiveness of the proposed framework in terms of detecting meaningful social communities and uncovering their underlying profiles in LBSNs.

¹ The profile of a community shows the common characteristics of its community members.
The rest of the chapter is organized as follows. Section 4.1 introduces the motivation and challenges of community understanding in LBSNs. Section 4.2 reviews the related work on community understanding. Section 4.3 gives an overview of the proposed framework for community understanding in LBSNs. Section 4.4 details the hypergraph construction. Section 4.5 formulates the community detection and understanding task as a dense subgraph detection problem over a heterogeneous hypergraph and introduces an effective algorithm to solve the problem. We then report the empirical evaluation in Section 4.6, which gives both qualitative and quantitative results. Finally, Section 4.10 gives the concluding remarks.
In the era of Web 2.0, social networking has emerged as a popular way for people to connect, communicate and share information with each other, and this has led to an explosion of multimedia information. Users in social networks interact with each other by contributing and consuming multimedia data (photos, locations, texts, etc.), and there exist various types of objects and heterogeneous relations in the networks. Specifically, with the high penetration of GPS-enabled smart phones in recent years, we have witnessed the boom of LBSNs, where users can check in at venues, write tips and upload photos, and this information can be immediately disseminated via social graphs to their friends and the public. Tips may cover a variety of diverse topics related to venue activities and recommendations. Photos, on the other hand, visually present the interesting aspects of the venues visited. These location-tagged multimedia data generated by Foursquare users provide us with unprecedented opportunities to understand collective user behaviours on a large scale. The voluminous amount and heterogeneity of user-generated content and the fast expansion of network diameter challenge us to perform community detection and community understanding on hundreds of thousands to even millions of entities.
One of the fundamental tasks in social media network analysis is to understand human collective behaviours by identifying people's social positions based on the detected cohesive subgroups, whose members interact with each other more frequently than with those outside the group [47, 93, 37]. While previous research efforts have reported promising results on clustering communities from traditional social networks [143, 47], the heterogeneous user behaviours in LBSNs bring together both “virtual” and “physical” interactions, which makes it very challenging to develop new frameworks to model the network in a natural and unified manner for community detection and understanding. Figure 4.1 shows a snapshot of typical user behaviours in Foursquare, which might correspond to certain overlapping communities we aim to detect and understand.
In LBSNs, the heterogeneous interactions among multiple types of entities such as users, tips, venues and photos naturally form a multi-modal and non-uniform hypergraph, where each modality corresponds to one type of entity and each hyperedge connects a varying number of entities from different modalities. For example, check-ins connect venues to users, while tips/photos connect textual topics/visual concepts to users and venues. These kinds of interactions are naturally represented by hyperedges with non-uniform affinity relations. In addition, there also exist latent relations among venues. For example, grocery stores are more similar to supermarkets than to parks. These interactions and relations naturally converge into a non-uniform and heterogeneous hypernetwork (Figure 4.2d). However, the state-of-the-art community mining approaches usually handle standard network types, such as those shown in Figures 4.2a, 4.2b and 4.2c, while we aim to tackle the community mining and understanding problem on heterogeneous and non-uniform hypergraphs (Figure 4.2d). If we were to use traditional community detection approaches such as modularity optimization [93] or other heuristics-driven methods [47, 31], we would need to reduce the complex hypergraph to a simpler bipartite or one-modal graph through processes such as “flattening” or “projection”, with possible information loss [90, 157].

Figure 4.1: Overview of the community understanding problem. (Left) Heterogeneous users' behaviours in Foursquare, where users can check in at venues (black arrows), write tips (blue arrows) and upload photos (red arrows) at various venues. (Right) The detected overlapping communities, where each ellipse represents a profilable community whose characteristics are shown through the tripartite inter-entity relation graph. Each user may belong to one or more communities. (Best viewed in color.)
In addition, as also mentioned in Chapter 1, most of the state-of-the-art techniques extract community structures by minimizing certain objective functions while ignoring the equally important task of “understanding” the characteristics of the groups [67]. However, as pointed out by Fortunato [37], there is no guarantee that these approaches can provide good-quality detection. Guimera et al. also revealed that a maximum modularity may not imply that the true community structure is discovered, since random networks may also contain high-modularity partitions [52]. Though Tang and Liu have attempted to profile the mined communities by extracting descriptive features using some heuristics [129], it remains unclear what the underlying reasons are that bind the members together and how to interpret the community profiles in terms of the extracted features.

Figure 4.2: A summary of the different network types. (a) A one-modal single-edge graph. (b) A one-modal multi-edge graph. (c) A tripartite three-uniform hypergraph. (d) A heterogeneous non-uniform hypergraph. In (d), there exist the following hyperedges: (1,A), (2,C,III), (3,C), (4,B,I), (5,D,II), (5,D,II), (E,IV), (I,II), (II,III,IV) and (IV,V). (Best viewed in color.)
Some works attempt to understand group formation based on statistical structural analysis. Backstrom et al. studied prominent online groups in the digital domain, aiming to answer some basic questions about the evolution of groups, such as which structural features influence whether individuals will join communities [5]. They found that the number of friends in a group is the most important factor in determining whether a new user will join the group. Their findings provide a global level of structural analysis to help understand how communities attract new members. However, more effort is required to understand the formation of a particular community.
Some works try to extract annotations for relational data from text. For example, Roy et al. constructed a hierarchical structure as well as corresponding annotations based on a complicated generative process [109]. The model complexity and scalability hinder its application to group profiling in large-scale networks. Chang et al. proposed NUBBI to infer descriptions of entities and of the relationships between these entities from a text corpus [22]. The probabilistic topic model assumes that words are generated based on the topics associated with an entity or the topics of the pairwise relationship between entities.
Some other works extract communities based on relation and text information together, where each topic represents a distribution of words and can be considered as the words associated with a group. Link-LDA treats citations the same way as normal words, i.e., each citation is generated based on a multinomial distribution over the documents [34]. Pairwise Linked-LDA [88] essentially combines the topic model [12] and the mixed membership stochastic block model [1] by forcing the latent mixture of communities to be the same for both word topics and relation topics. Mei et al. enforced connected documents to share similar topics and used the network information as regularization while extracting the topics of texts. All these methods extract the topics associated with text instead of capturing the corresponding text associated with a given community [82].

According to the concept of homophily [81], a connection occurs at a higher rate between similar people than between dissimilar people. Homophily is one of the first characteristics studied by early social science researchers; it holds for a wide variety of relationships, and the phenomenon is also observed in social media [36, 133]. In order to understand the formation of a community, some works aim to investigate the inverse problem: given a group of users, can we figure out why they are connected, and what are their shared similarities? Tang et al. proposed a group profiling approach that extracts shared attributes of community members [132]. One example of a shared attribute could be a set of topics commonly contributed by the community members. Since a group consists of people with shared interests, one intuitive way of group profiling is to clip a community with “some topics” shared by most members in the community. For example, in the blogosphere², bloggers upload blog posts; in content sharing sites such as Digg and Del.icio.us, users post news or bookmarks and provide tags on the shared content. This content information and these tags essentially represent the latent interests of individuals and can be regarded as topics. For example, Figure 4.3 shows an example profile of people who enjoy night life through both textual and visual “topics” in Foursquare. However, the aggregation-based approaches separate community detection and understanding into two steps, while our proposed framework does both simultaneously.

Figure 4.3: Group profile of a community of people who enjoy night life.
To tackle these challenges, we propose a novel and unified framework which performs both community detection and community understanding in LBSNs. To detail the process, we first construct a heterogeneous, multi-modal and non-uniform hypergraph which naturally captures various kinds of interactions, such as check-in actions or tip-posting actions in LBSNs. We then propose an efficient algorithm to discover multiple overlapping communities by constraining the minimum number of entities in each modality.

² This term was first coined by Brad L. Graham on 10 September, 1999 and implies that blogs exist together as a connected community or as a social network in which everyday authors can publish their opinions.
The advantages of our proposed method are multifold: (1) the method is general and can be used in any community detection and understanding task as long as the network structure is represented as a graph or hypergraph; (2) it allows new types of interactions and modalities to be easily added as technology advances and new services emerge; (3) the approach is able to automatically determine the number of interest communities with overlapping entities; and (4) community understanding is straightforward, since each final computed community contains both users and the “reasons” why they are put in that particular community. In the context of LBSNs, the “reasons” are the combination of venues they visit, tips they post and photos they upload, as well as the strengths of the inter-relations among entities from different modalities.
The heterogeneity of activities in LBSNs naturally brings multiple types of entities and interactions into the same network, which we call a multi-modal hypernetwork, where each modality corresponds to a type of entity. In the Foursquare network, there are four modalities: user, venue, tip and photo, and three types of interactions: a user checks in at a venue, posts a tip and uploads a photo at a venue. In addition, entities of the same modality may also be related. Figure 4.4 illustrates the typical interactions and inter-venue connections in Foursquare. These heterogeneous interactions naturally lead to the construction of a non-uniform and multi-modal hypergraph. In this section, we first introduce the different types of vertices (Section 4.4.1) and hyperedges (Section 4.4.2) involved in the hypergraph before giving details on how we construct each type of hyperedge (Sections 4.4.3, 4.4.4, 4.4.5 and 4.4.6). For clarity and convenience, we list the variables used in this section in Table 4.1.
Table 4.1: List of notations of variables used in Chapter 3
$V$ | the set of all vertices
$V_i$ | the $i$th subset of vertices
$E$ | the set of all hyperedges
$E_i$ | the set of hyperedges with $n_i$-ary affinity relation
$l_i$ | the $i$th venue category
$d_i$ | the $i$th photo concept
$(u_i, l_j)$ | (hyperedge) the $i$th user visits the $j$th venue category
$(u_i, d_k, l_j)$ | (hyperedge) the $i$th user contributes the $k$th image concept at the $j$th venue category
$(u_i, t_k, l_j)$ | (hyperedge) the $i$th user contributes the $k$th tip topic at the $j$th venue category
$x$ | the probability vector, with each $x_i$ representing the probability of choosing the $i$th vertex of $V$ in a community
$w_i$ | the weight of hyperedge $i$ involving vertices $(x_{i_1}, \cdots, x_{i_{n_i}})$
$C$ | a community of vertices, with $C \subseteq V$
$\varepsilon$ | the variable to control community size
$c_i$ | the lower bound of the existence probability of modality $i$ within the final detected communities
$\lambda_i$, $\alpha_i$, $\beta_i$ and $\pi_i$ | the Lagrangian multipliers
$U$ | the set of pairs $(x_i, x_j)$ which can increase the
There are four types of vertices involved in interactions in LBSNs: user, venue, tip and photo, as shown in Figure 4.4. Formally, let $V$ be the vertex set, which can be divided into $g$ subsets, i.e., $V = \bigcup_{a=1}^{g} V_a$. In the Foursquare network, $g = 4$ and each subset $V_a$ corresponds to the set of vertices of one modality: user, venue, tip and photo, respectively. Let $V_1 = U = \{u_1, \cdots, u_{n_u}\}$, $V_2 = L = \{l_1, \cdots, l_{n_l}\}$, $V_3 = T = \{t_1, \cdots, t_{n_t}\}$ and $V_4 = D = \{d_1, \cdots, d_{n_d}\}$ be the sets of users, venues, tips and photos, respectively, where $n_a$ is the number of vertices in $V_a$, $a \in \{1, 2, 3, 4\}$.
There are three types of interactions involved in the Foursquare network: a user checks in at a venue, posts a tip and uploads a photo at a venue. Thus, the first three types of hyperedges correspond to these interactions, respectively. Formally, let $E$ be the hyperedge set, which can be divided into $s$ subsets, i.e., $E = \bigcup_{b=1}^{s} E_b$, with each hyperedge in $E_b$ representing an $n_b$-ary affinity relation. In the Foursquare network, $s = 4$. We build three sets of hyperedges corresponding to the three interactions: $E_1 = \{(u_i, l_j)\}$, representing a check-in performed by user $u_i$ at venue $l_j$; $E_2 = \{(u_i, t_j, l_k)\}$, representing tip $t_j$ posted by user $u_i$ at venue $l_k$; and $E_3 = \{(u_i, d_j, l_k)\}$, representing photo $d_j$ uploaded by user $u_i$ at venue $l_k$. In addition, we also want to model inter-venue similarities and define the hyperedge set $E_4 = \{(l_1, \cdots, l_h)\}$. We can then denote the hypergraph as $G = \{V, E, w\}$, where $w : E \to \mathbb{R}$ is a weighting function which associates a real value with each hyperedge, with larger weights representing stronger affinity relations. Figure 4.5 illustrates the process of hypergraph construction for the Foursquare hypernetwork.
Given the enormous number of entities in each modality, the hypergraph is extremely sparse, which unavoidably weakens the structural information. Thus, to better characterize the different types of interactions, we seek to first group semantically similar entities together in each modality to construct a denser hypergraph. The following subsections describe the construction of each type of hyperedge in detail.
Figure 4.5: Illustration of hypergraph construction. Vertices drawn as circles, triangles, squares and pentagons represent entities of type user, tip, photo and venue, respectively. Hyperedges drawn as red, green, blue and tan ellipses represent check-ins at venues ($E_1$: users visiting venues), tip posting ($E_2$: users posting tips at venues), photo uploading ($E_3$: users uploading photos at venues) and similar venues ($E_4$: venues with similar function), respectively.
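A multi-modal, non-uniform hypergraph of this kind can be held in a very small data structure. The sketch below is one possible representation, not the implementation used in the thesis: the class name `Hypergraph`, its methods, and the choice to accumulate weights when the same hyperedge is added repeatedly are all assumptions made for illustration.

```python
from collections import defaultdict


class Hypergraph:
    """Vertices tagged with a modality; weighted hyperedges of varying arity."""

    def __init__(self):
        self.modality = {}                  # vertex -> "user" | "venue" | "tip" | "photo"
        self.weights = defaultdict(float)   # hyperedge (tuple of vertices) -> weight

    def add_vertex(self, vertex, modality):
        self.modality[vertex] = modality

    def add_hyperedge(self, vertices, weight=1.0):
        """Add a hyperedge over known vertices; repeated edges accumulate weight."""
        for vertex in vertices:
            if vertex not in self.modality:
                raise KeyError(f"unknown vertex: {vertex}")
        self.weights[tuple(vertices)] += weight

    def edges_of_arity(self, n):
        """Return all hyperedges connecting exactly n vertices."""
        return [edge for edge in self.weights if len(edge) == n]


graph = Hypergraph()
for vertex, kind in [("u1", "user"), ("l1", "venue"), ("l2", "venue"),
                     ("t1", "tip"), ("d1", "photo")]:
    graph.add_vertex(vertex, kind)

graph.add_hyperedge(("u1", "l1"))          # E1: check-in
graph.add_hyperedge(("u1", "t1", "l1"))    # E2: tip posted at a venue
graph.add_hyperedge(("u1", "d1", "l1"))    # E3: photo uploaded at a venue
graph.add_hyperedge(("l1", "l2"), 0.5)     # E4: similar venues (weight is illustrative)
```

Storing hyperedges as tuples keyed by weight keeps edges of different arity in one structure, which is the sense in which the hypergraph is non-uniform.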
To group similar venues, we refer to their corresponding venue categories. We consider two venues to be the same if they belong to the same leaf venue category. For simplicity, we use $l$ to represent a venue category in the rest of the chapter. Thus, each edge $(u_i, l_j) \in E_1$ indicates a check-in performed by user $u_i$ at venue category $l_j$, and $w((u_i, l_j)) = c(u_i, l_j)$, where $c(u_i, l_j)$ is the number of check-ins logged by $u_i$ at $l_j$.
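The weights of the check-in hyperedges can be computed by counting check-ins per (user, leaf category) pair. In the sketch below, the function name, the `(user, venue)` record layout and the venue-to-category mapping are hypothetical inputs chosen for illustration.

```python
from collections import Counter


def checkin_edge_weights(checkins, venue_category):
    """Return {(user, category): number of check-ins}, i.e. w((u_i, l_j)) = c(u_i, l_j).

    checkins:       iterable of (user, venue) pairs
    venue_category: {venue: leaf venue category}
    """
    weights = Counter()
    for user, venue in checkins:
        weights[(user, venue_category[venue])] += 1
    return weights
```

Because venues in the same leaf category collapse into one vertex, two check-ins by the same user at different malls contribute to a single hyperedge of weight 2, which is what makes the resulting hypergraph denser.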
The voluminous number of tips makes it difficult to model strong correlations among users without modelling the similarity of tips explicitly. We seek to first extract a middle-level representation of tips and then directly relate users to the extracted representation. In this way, we are able to significantly reduce the number of noisy hyperedges and obtain better interpretations of the heterogeneous comments posted by users at various venues. To do so, we first project each tip to a latent topic space