Building Web Reputation Systems - P13


Constraining Scope

When you’re considering all the objects that your system will interact with, and all the interactions between those objects and your users, it’s critical to take into account an idea that we have been reinforcing throughout this book: all reputation exists within a limited context, which is always specific to your audience and application. Try to determine the correct scope, or restrictive context, for the reputations in your system. Resist the temptation to lump all reputation-generating interactions into one score; the score will be diluted to the point of meaninglessness. The following example from Yahoo! makes our point perfectly.

Context Is King

This story tells how Yahoo! Sports unsuccessfully tried to integrate social media into its top-tier website. Even seasoned product managers and designers can fall into the trap of making the scope of an application’s objects and interactions much broader than it should be.

Yahoo!’s Sports product managers believed that they should integrate user-generated content quickly across their entire site. They did an audit of their offering and started to identify candidate objects, reputable entities, and some potential inputs.

Figure 6-13. The video responses on YouTube certainly indicate users’ desire to be associated with popular videos. However, they may not actually indicate any logical thread of association.

The site had sports news articles, and the product team knew that it could tell a lot about what was in each article: the recognized team names, sport names, player names, cities, countries, and other important game-specific terms. In other words, the objects.

It knew that users liked to respond to the articles by leaving text comments: the inputs.

It proposed an obvious intersection of the objects and the inputs: every comment on a news article would be a blog post, tagged with the keywords from the article and, optionally, with user-generated tags too. Whenever a tag appeared on another page, such as a different article mentioning the same city, the user’s comment on the original article could be displayed.

At the same time, those comments would be displayed on the team- and player-detail pages for each tag attached to the comment. The product managers even had aspirations to surface comments on the sports portal, not just for the specific sport, but for all sports.

Seems very social, clever, and efficient, right?

No. It’s a horrible design mistake. Consider the following detailed example from British football.

An article reports that a prominent player, Mike Brolly, who plays for the Chelsea team, has been injured and may not be able to play in an upcoming championship football match with Manchester United. Users comment on the article, and their comments are tagged with Manchester United, Chelsea, and Brolly.

Those comments would be surfaced, news feed–style, on the article page itself, the sports home page, the football home page, the team pages, and the player page. One post, six destination pages, each with a different context of use, different social norms, and different communities that they’ve attracted.

Nearly all these contexts are wrong, and the correct contexts aren’t even considered:

• There is no all-of-Yahoo! Sports community context. At least, there’s not one with any great cohesion: American tennis fans, for example, don’t care about British football. When an American tennis fan is greeted on the Yahoo! Sports home page with comments about British football, they regard that about as highly as spam.

• The team pages are the wrong context for the comments because the fans of different teams don’t mix. At a European football game, the fans for each team are kept on opposite sides of the field, divided by a chain link fence, with police wielding billy clubs alongside. The police are there to keep the fan communities apart. Online, the cross-posting of the comments on the team pages encourages conflict between fans of the opposing teams. Fans of opposing teams have completely opposite reactions to the injury of a star player, and intermixing those conversations would yield anti-social (if sometimes hilarious) results.

• The comments may or may not be relevant on the player page. It depends on whether the user actually responded to the article in the player-centric context, an input that this design didn’t account for.


• Even the context of the article itself is poor, at least on Yahoo! Its deal with the news feed companies, AP and Reuters, limits the amount of time an article may appear on the site to less than 10 days. Attaching comments (and reputation) to such transient objects tells users that their contributions don’t matter in the long run. (See “The entity should persist for some length of time” on page 130.)

Comments, like reputation statements, are created in a context. In the case of comments, the context is a specific target audience for the message. Here are some possible correct contexts for cross-posting comments:

• Cross-post when the user has chosen a fan or team page and designated it to be a secondary destination for the comment. Your users will know, better than your system, what some legitimate related contexts are. (Though, of course, this can be abused; some decry the ascension of cross-posting to be a significant event in the devolution of the Usenet community.)

• Cross-post back to the commenter’s user profile (with her permission, of course). Or allow her to post it to her personal blog, or send it to a friend. All of these approaches put an emphasis on the user as the context. If someone interests you enough for you to visit her user profile or blog, it’s likely that you might be interested in what she has to say over on Yahoo! Sports.

• Cross-post automatically only into well-understood and obviously related contexts. For example, Yahoo! Sports has a completely different context that is still deeply relevant: a Fantasy Football league, where 12 to 16 people build their own virtual teams out of player-entities based on real-player stats. In this context, where the performance and day-to-day circumstances of real-life players affect the outcome of users’ virtual teams, it might be very useful information to have cross-posted right onto a league’s page.

Don’t assume that because it’s safe and beneficial to cross-post in one direction, it’s automatically safe to do so in the opposite direction. What if Yahoo! auto-posted comments made in a Fantasy Sports league over to the more staid Sports community site? That would be a huge mistake.

The terms of service for Fantasy Football are much more lax than the terms of service for public-facing posts. These players swear and taunt and harass each other. A post such as “Ha, Chris: you and the Bay City Bombers are gonna suck my team’s dust tomorrow while Brolly is home sobbing to his mommy!” clearly should not be automatically cross-posted to the main portal page.
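One way to keep automatic cross-posting directional is an explicit allow-list of (source, destination) pairs, so that approving one direction never implies the reverse. The following is only a minimal sketch; the context names and the `may_auto_cross_post` helper are hypothetical and do not describe any actual Yahoo! system.

```python
# Hypothetical context names, chosen only for illustration.
# Each entry approves exactly one direction of automatic cross-posting.
ALLOWED_AUTO_CROSS_POSTS = {
    ("sports_article", "fantasy_league"),
}


def may_auto_cross_post(source, destination):
    """Return True only if this exact direction was explicitly approved."""
    return (source, destination) in ALLOWED_AUTO_CROSS_POSTS


if __name__ == "__main__":
    print(may_auto_cross_post("sports_article", "fantasy_league"))
    print(may_auto_cross_post("fantasy_league", "sports_article"))
```

Because the pairs are ordered, a comment made inside a league cannot leak onto the public portal unless someone deliberately adds that direction to the list.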

Limit Scope: The Rule of Email

When thinking about your objects and user-generated inputs and how to combine them, remember the rule of email: you need a “subject” line and a “to” line (an addressee or a small number of addressees).


Tags for user-generated content act as subject identifiers, but not as addressees. Making your addressees as explicit as possible will encourage people to participate in many different ways.

Sharing content too widely discourages contributions and dilutes content quality and value.
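The rule of email can be expressed as a data structure in which tags and addressees are separate fields and only addressees decide where a post appears. This is a sketch under assumptions: the `Post` type, the `destinations` helper, and the cap of three addressees are all hypothetical stand-ins for “a small number of addressees.”

```python
from dataclasses import dataclass

MAX_ADDRESSEES = 3  # hypothetical cap standing in for "a small number"


@dataclass(frozen=True)
class Post:
    body: str
    subject_tags: frozenset  # the "subject" line: what the post is about
    addressees: frozenset    # the "to" line: contexts the author chose

    def __post_init__(self):
        if not self.subject_tags:
            raise ValueError("a post needs at least one subject tag")
        if not 1 <= len(self.addressees) <= MAX_ADDRESSEES:
            raise ValueError(
                "a post needs between 1 and %d addressees" % MAX_ADDRESSEES
            )


def destinations(post):
    """Deliver only to explicit addressees; tags never widen the audience."""
    return sorted(post.addressees)
```

With this shape, a comment tagged Chelsea, Manchester United, and Brolly but addressed only to the Chelsea board is delivered to one place, not six.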

Applying Scope to Yahoo! EuroSport Message Board Reputation

When Yahoo! EuroSport, based in the UK, wanted to revise its message board system to provide feedback on which discussions were the highest quality, and to give users incentives to contribute better content, it turned for help to reputation systems.

It seemed clear that the scope of reputation was different for each post and for all the posts in a thread. The team also assumed at first, as the American Yahoo! Sports team had, that each user should have one posting karma: other users would flag the quality of a post and that would roll up to their all-sports-message-boards user reputation.

It did not take long for the product team to realize, however, that having Chelsea fans rate the posts of Manchester fans was folly: users would employ ratings to disagree with any comment by a fan of another team, not to honestly evaluate the quality of the posting.

The right answer, in this case, ended up being a tighter definition of scope for the context: rather than rewarding “all message boards” participation, or “everything within a particular sport,” the team made an effort to identify the most granular, cohesive units of community possible on the boards, and to reward participation only within those narrow scopes.

Yahoo! EuroSport implemented a system of karma medallions (bronze, silver, and gold) rewarding both the quantity and quality of a user’s participation on a per-board basis. This carried different repercussions for different sports on the boards.

Each UK football team has its own dedicated message board, so theoretically an active contributor could earn medallions in any number of football contexts: a gold for participating on the Chelsea boards, a bronze for Manchester, etc.
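Per-board scoping amounts to keying karma by the pair (user, board) rather than by user alone. The sketch below assumes that positive community ratings add points and that fixed thresholds map points to medallions; the threshold values, the `BoardKarma` class, and its method names are hypothetical, since the text does not say how medallions were actually computed.

```python
from collections import defaultdict

# Hypothetical thresholds, highest first: (minimum points, medallion name).
MEDALLION_THRESHOLDS = [(100, "gold"), (40, "silver"), (10, "bronze")]


class BoardKarma:
    """Tracks posting karma separately for every (user, board) pair."""

    def __init__(self):
        self._points = defaultdict(int)  # (user, board) -> points

    def record_positive_rating(self, author, board, weight=1):
        # A rating earned on one board never affects another board.
        self._points[(author, board)] += weight

    def medallion(self, user, board):
        score = self._points.get((user, board), 0)
        for minimum, name in MEDALLION_THRESHOLDS:
            if score >= minimum:
                return name
        return None

    def trophy_case(self, user):
        """Return {board: medallion} for every board where the user has one."""
        case = {}
        for (owner, board) in self._points:
            if owner == user:
                name = self.medallion(owner, board)
                if name is not None:
                    case[board] = name
        return case
```

A contributor with 100 points on the Chelsea board and 12 on the Manchester board would show a gold and a bronze, and nothing at all on the tennis board.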

Bear in mind, however, that it’s the community response to a contributor’s posts that determines reputation accrual on the boards. We did not anticipate that many contributors would acquire reputation in many different team contexts; it’s a rare personality that can freely intermix, and make friends, among both sides of a rivalry. No, this system was intended to reward and identify good fans and encourage them to keep among themselves.

Tennis and Formula 1 Racing are different stories. Those sports have only one message board each, so contributors to those communities would be rewarded for participating in a sport-wide context, rather than for their team loyalty. Again, this is natural and healthy: different sports, different fans, different contexts.

Many users have only a single medallion, participating mostly on a single board, but some are disciplined and friendly enough to have bronze badges or better in each of multiple boards, and each badge is displayed in a little trophy case when you mouse over the user’s avatar or examine the user’s profile (see Figure 6-14).

Figure 6-14. Each Yahoo! EuroSport message board has its own karma medallion display to keep reputation in a tightly bound context.

Generating Reputation: Selecting the Right Mechanisms

Now you’ve established your goals, listed your objects, categorized your inputs, and taken care to group the objects and inputs in appropriate contexts with appropriate scope. You’re ready to create the reputation mechanisms that will help you reach your goals for the system.

Though it might be tempting to jump straight to designing the display of reputation to your users, we’re going to delay that portion of the discussion until Chapter 7, where we dig into the reasons not to explicitly display some of your most valuable reputation information. Instead of focusing on presentation first, we’re going to take a goal-centered approach.


The Heart of the Machine: Reputation Does Not Stand Alone

Probably the most important thing to remember when you’re thinking about how to generate reputations is the context in which they will be used: your application. You might track bad-user behavior to save money in your customer care flow by prioritizing the worst cases of apparent abuse for quick review. You might also deemphasize cases involving users who are otherwise strong contributors to your bottom line. Likewise, if users evaluate your products and services with ratings and reviews, you will build significant machinery to gather users’ claims and transform your application’s output on the basis of their aggregated opinions.
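As a rough illustration of how a reputation score feeds an application decision, the customer care example could be sketched as a sort key for a review queue. Everything here is an assumption made for illustration: both scores are taken to be normalized to the range 0 to 1, and the `discount` factor and function names are hypothetical.

```python
def review_priority(abuse_score, contributor_value, discount=0.5):
    """Higher values are reviewed sooner.

    abuse_score: apparent severity of abuse, assumed normalized to 0..1.
    contributor_value: how strong a contributor the user is, assumed 0..1.
    discount: hypothetical factor controlling how much strong contributors
              are deemphasized.
    """
    if not (0.0 <= abuse_score <= 1.0 and 0.0 <= contributor_value <= 1.0):
        raise ValueError("scores must be normalized to the range 0..1")
    return abuse_score * (1.0 - discount * contributor_value)


def triage(cases):
    """cases: iterable of (case_id, abuse_score, contributor_value) tuples."""
    ordered = sorted(
        cases, key=lambda c: review_priority(c[1], c[2]), reverse=True
    )
    return [case_id for case_id, _, _ in ordered]
```

The point of the sketch is not the formula but where the reputation ends up: inside the queueing logic of a support tool, far from where the input was gathered.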

For every reputation score you generate and display or use, expect at least 10 times as much development effort to adapt your product to accommodate it, including the user interface and coding to gather the events and transform them into reputation inputs, and all the locations that will be influenced by the aggregated results.

Common Reputation Generation Mechanisms and Patterns

Though all reputation is generated from custom-built models, we’ve identified certain common patterns in the course of designing reputation systems and observing systems that others have created. These few patterns are not at all comprehensive, and never could be. We provide them as a starting point for anyone whose application is similar to well-established patterns. We expand on each reputation generation pattern in the rest of this chapter.

What Comes in Is Not What Goes Out

Don’t confuse the input types with the reputation generation patterns: what comes in is not always what goes out. In our example in the section “User Reviews with Karma” on page 75, the inputs were reviews and helpful votes, but one of the generated reputation outputs was a user quality karma score, which had no display symmetry with the inputs, since no user was asked to evaluate another user directly.

Roll-ups are often of a completely different claim type from their component parts, and sometimes, as with karma calculations, the target object of the reputation changes drastically from the evaluator’s original target; for example, the author (a user-object) of a movie review gets some reputation from a helpful score given to the review that the author wrote about the movie-object.
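The shift of target in a karma roll-up can be sketched as follows: a helpful vote is cast on a review, but it is tallied both for the review and for the review’s author. This is a simplified sketch, not the model from the “User Reviews with Karma” section; the `ReviewKarma` class and its plain helpful-vote ratio are assumptions made for illustration.

```python
from collections import defaultdict


class ReviewKarma:
    """Helpful votes target reviews; the roll-up targets the review's author."""

    def __init__(self):
        self.review_author = {}  # review_id -> author
        # Each tally is [helpful votes, total votes].
        self.review_votes = defaultdict(lambda: [0, 0])
        self.author_votes = defaultdict(lambda: [0, 0])

    def add_review(self, review_id, author):
        self.review_author[review_id] = author

    def vote(self, review_id, helpful):
        # The voter evaluates the review, never the author directly.
        author = self.review_author[review_id]
        for tally in (self.review_votes[review_id], self.author_votes[author]):
            tally[0] += 1 if helpful else 0
            tally[1] += 1

    @staticmethod
    def _ratio(tally):
        helpful, total = tally
        return helpful / total if total else None

    def review_helpfulness(self, review_id):
        return self._ratio(self.review_votes[review_id])

    def author_karma(self, author):
        return self._ratio(self.author_votes[author])
```

The input claim type is a yes/no vote on a review, while the output is a ratio attached to a user, which is why the two have no display symmetry.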

This section focuses on calculating reputation, so the patterns don’t describe the methods used to display any user’s inputs back to the user. Typically, the decision to store users’ actions and display them is a function of the application design. For example, users don’t usually get access to a log of all of their clicks through a site, even if some of them are used in a reputation system. On the other hand, heavyweight operations, such as user-created reviews with multiple ratings and text fields, are normally at least readable by the creator, and often editable and/or deletable.


Generating personalization reputation

The desire to optimize their personal experience (see the section “Fulfillment incentives” on page 119) is often the initial driver for many users to go through the effort required to provide input to a reputation system. For example, if you tell an application what your favorite music is, it can customize your Internet radio station, making it worth the effort to teach the application your preferences. The effort required to do this also provides a wonderful side effect: it generates voluminous and accurate input into aggregated community ratings.

Personalization roll-ups are stored on a per-user basis and generally consist of preference information that is not shared publicly. Often these reputations are attached to very fine-grained contexts derived from metadata attached to the input targets and therefore can be surfaced, in aggregate, to the public (see Figure 6-15). For example, a song by the Foo Fighters may be listed in the “alternative” and “rock” music categories. When a user marks the song as a favorite, the system would increase the personalization reputation for this user for three entities: “Foo Fighters,” “alternative,” and “rock.” Personalization reputation can require a lot of storage, so plan accordingly, but the benefits to the user experience, and your product offering, may make it well worth the investment. See Table 6-1.

Table 6-1. Personalization reputation mechanisms

Reputation models: Vote to promote, favorites, flagging, simple ratings, and so on.

Processes: Counters, accumulators.

Common uses:
• Site personalization and display.
• Input to predictive modeling.
• Personalized search ranking component.

Pros:
• A single click is as low-effort as user-generated content gets.
• Computation is trivial and speedy.
• Intended for personalization, these inputs can also be used to generate aggregated community ratings to facilitate nonpersonalized discovery of content.

Cons:
• It takes quite a few user inputs before personalization starts working properly, and until then the user experience can be unsatisfactory. (One method of bootstrapping is to create templates of typical user profiles and ask the user to select one to autopopulate a short list of targeted popular objects to rate quickly.)
• Data storage can be problematic. Potentially keeping a score for every target and category per user is very powerful but also very data intensive.
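The Foo Fighters example maps naturally onto per-user counters keyed by the entities found in the target’s metadata, with a second, anonymous counter for the public aggregate. The sketch below is a minimal illustration; the `SONG_METADATA` catalog, the song identifier, and the `PersonalizationStore` class are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical catalog metadata, for illustration only.
SONG_METADATA = {
    "song-1": {"artist": "Foo Fighters", "categories": ["alternative", "rock"]},
}


class PersonalizationStore:
    def __init__(self):
        self._by_user = defaultdict(Counter)  # private, per-user preferences
        self._community = Counter()           # aggregate, safe to surface

    def mark_favorite(self, user, song_id):
        meta = SONG_METADATA[song_id]
        # One click updates every entity derived from the song's metadata.
        for entity in [meta["artist"], *meta["categories"]]:
            self._by_user[user][entity] += 1
            self._community[entity] += 1

    def preferences(self, user):
        return dict(self._by_user[user])

    def community_favorites(self, entity):
        return self._community[entity]
```

Note how the storage concern in Table 6-1 shows up immediately: every user carries a counter for every artist and category they have ever touched.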


Generating aggregated community ratings

Generating aggregated community ratings is the process of collecting normalized numerical ratings from multiple sources and merging them into a single score, often an average or a percentage of the total, as in Figure 6-16. See Table 6-2.

Table 6-2. Aggregated community ratings mechanisms

Reputation models: Vote to promote, favorites, flagging, simple ratings, and so on.

Inputs: Quantitative (normalized, scalar).

Processes: Counters, averages, and ratios.

Common uses:
• Aggregated rating display.
• Search ranking component.
• Quality ranking for moderation.

Pros:
• A single click is as low-effort as user-generated content gets.
• Computation is trivial and speedy.

Cons:
• Too many targets can cause low liquidity.
• Low liquidity limits accuracy and value of the aggregate score. See “Liquidity: You Won’t Get Enough Input” on page 58.
• Danger exists of using the wrong scalar model. See “Bias, Freshness, and Decay” on page 60.

Figure 6-15. Netflix uses your movie preferences to generate recommendations for other movies that you might want to watch. It also averages your ratings against other movies you’ve rated in that category, or by that director, or….

Figure 6-16. Recommendations work best when they’re personalized, but how do you help someone who hasn’t yet stated any preferences? You average the opinions of those who have.
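Collecting normalized ratings from multiple sources and merging them into one score can be sketched in two steps: map each raw rating onto a common 0 to 1 scale, then average. The function names and the particular scales used in the usage example (1 to 5 stars, 0 or 1 thumbs) are assumptions made for illustration.

```python
def normalize(value, low, high):
    """Map a raw rating on the scale [low, high] onto the range 0..1."""
    if high <= low:
        raise ValueError("high must be greater than low")
    if not low <= value <= high:
        raise ValueError("rating is outside its scale")
    return (value - low) / (high - low)


def average_rating(normalized_scores):
    """Simple mean of normalized scores; None when there is no input yet."""
    scores = list(normalized_scores)
    if not scores:
        return None
    return sum(scores) / len(scores)


if __name__ == "__main__":
    scores = [
        normalize(5, 1, 5),  # five stars
        normalize(3, 1, 5),  # three stars
        normalize(1, 0, 1),  # thumbs up
        normalize(0, 0, 1),  # thumbs down
    ]
    print(average_rating(scores))
```

Returning None for an unrated target, rather than 0, keeps “no input yet” distinct from “rated poorly,” which matters when liquidity is low.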

Ranking large target sets (preference orders)

One specific form of aggregate community ratings requires special mechanisms to get useful results: when an application needs to rank a large data set of objects completely and only a small number of evaluations can be expected from users. For example, a special mechanism would be required to rank the current year’s players in each sports league of an annual fantasy sports draft. Hundreds of players would be involved, and there would be no reasonable way that each individual user could evaluate each pair against the others. Even rating one pair per second would take many times longer than the available time before the draft. The same is true for community-judged contests in which thousands of users submit content. Letting users rate randomly selected objects on a percentage or star scale doesn’t help at all. (See “Bias, Freshness, and Decay” on page 60.)

This kind of ranking is called preference ordering. When this kind of ranking takes place online, users evaluate successively generated pairs of objects and choose the more appropriate one in each pair. Each participant goes through the process a small number of times, typically fewer than 10.

The secret sauce is in selecting the pairings. At first, the ranking engine looks for pairs that it knows nothing about, but over time it begins to select pairings that help users sort similarly ranked objects. It also generates pairs to determine whether the user’s evaluations are consistent or not. Consistency is good for the system, because it indicates reliability; if a user’s evaluations fluctuate wildly or don’t have a consistent pattern, this can indicate abuse or manipulation of the ranking.

The algorithms for this approach are beyond the scope of this book, but if you are interested, you can find out more in Appendix B. This mechanism is complex and requires expertise in statistics to build, so if a reputation model requires this functionality, we recommend using an existing platform as a model.
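To make the shape of the problem concrete, here is a deliberately naive sketch of pairwise preference collection. It is not the statistical approach the text refers to: it simply prefers the pair it has seen least often (“pairs that it knows nothing about”) and ranks items by raw win rate. The `PairwiseRanker` class and all of its methods are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


class PairwiseRanker:
    def __init__(self, items):
        self.items = list(items)
        self.wins = defaultdict(int)         # item -> comparisons won
        self.appearances = defaultdict(int)  # item -> comparisons seen
        self.pair_counts = defaultdict(int)  # frozenset pair -> times shown

    def total_pairs(self):
        """Number of distinct pairs: n * (n - 1) / 2."""
        n = len(self.items)
        return n * (n - 1) // 2

    def next_pair(self):
        # Naive selection: show the pair that has been compared least often.
        return min(
            combinations(self.items, 2),
            key=lambda pair: self.pair_counts[frozenset(pair)],
        )

    def record_choice(self, winner, loser):
        self.wins[winner] += 1
        self.appearances[winner] += 1
        self.appearances[loser] += 1
        self.pair_counts[frozenset((winner, loser))] += 1

    def ranking(self):
        def win_rate(item):
            seen = self.appearances[item]
            return self.wins[item] / seen if seen else 0.0

        return sorted(self.items, key=win_rate, reverse=True)
```

The pair count shows why no single user can do this alone: 300 items, for example, produce 44,850 distinct pairs, which is roughly 12.5 hours at one pair per second. A production system would also need the consistency checks and smarter pair selection described above, which this sketch omits.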

Generating participation points

Participation points are typically a kind of karma in which users accumulate varying amounts of publicly displayable points for taking various actions in an application. Many people see these points as a strong incentive to drive participation and the creation of content. But remember, using points as the only motivation for user actions can push out desirable contributions in favor of lower-quality content that users can submit quickly and easily (see “First-mover effects” on page 63). Also see “Leaderboards Considered Harmful” on page 194 for a discussion of the challenges associated with competitive displays of participation points.

Participation points karma is a good example of a pattern in which the inputs (various, often trivial, user actions) don’t match the process of reputation generation (accumulating weighted point values) or the output (named levels or raw score); see Tables 6-3 and 6-4.

Table 6-3. ShareTV.org is one of many web applications that use participation points karma as an incentive for users to add content

Activity                              Point award    Maximum/time
Add show or character to profile      +1             +25
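Accumulating weighted point values with a cap per time period, then mapping the total to a named level, might be sketched as below. Only the first rule (+1 per action, at most +25) comes from the row of Table 6-3 shown above; the length of the period, the level names, and the level thresholds are hypothetical.

```python
from collections import defaultdict

# action -> (points per action, maximum points per period)
POINT_RULES = {
    "add_show_or_character_to_profile": (1, 25),  # the row shown in Table 6-3
}

# Hypothetical named levels, highest first: (minimum total points, level name).
LEVELS = [(100, "Expert"), (25, "Regular"), (0, "Newcomer")]


class ParticipationPoints:
    def __init__(self):
        self.total = defaultdict(int)               # user -> lifetime points
        self.earned_this_period = defaultdict(int)  # (user, action) -> points

    def record(self, user, action):
        """Award points for an action, never exceeding the period cap."""
        award, cap = POINT_RULES[action]
        already = self.earned_this_period[(user, action)]
        granted = max(0, min(award, cap - already))
        self.earned_this_period[(user, action)] += granted
        self.total[user] += granted
        return granted

    def start_new_period(self):
        # The period length is not given in the source; the caller decides.
        self.earned_this_period.clear()

    def level(self, user):
        score = self.total[user]
        for minimum, name in LEVELS:
            if score >= minimum:
                return name
```

The cap is what blunts the incentive to flood the site with trivial actions: the 26th profile addition in a period earns nothing.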
