Building Web Reputation Systems


message to the Yahoo! Profiles karma model without knowing the address for the dispatcher; it can just send a message to the one for its own framework and know that the message will get relayed to the appropriate servers. Note that a registration service, such as the one described for the dispatch consumer, is required to support this functionality.

There can be many message dispatchers deployed, and this layer is a natural location to provide any context-based security that may be required. Since changes to the reputation database come only by sending messages to the reputation framework, limiting application access to the dispatcher that knows the names and addresses of the context-specific models makes sense. As a concrete example, only Yahoo! Travel and Local had the keys needed to contact, and therefore make changes to, the reputation framework that ran their shared model, but any other company property could read their ratings and reviews using the separate reputation query layer (see “Reputation query interface” on page 298).
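
The dispatcher's two jobs described above, relaying a message to whichever framework runs a model and refusing writes from applications without a key for that context, can be sketched in a few lines. This is a minimal Python sketch, not the Yahoo! implementation (which was proprietary and written in PHP); the Dispatcher class, its register and send methods, and the context, framework, and key names are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Dispatcher:
    """Hypothetical dispatcher: relays each message to the framework that runs the
    target context's model, but only for applications holding a write key."""

    registry: dict = field(default_factory=dict)    # context -> framework name
    write_keys: dict = field(default_factory=dict)  # context -> set of app keys
    outbox: dict = field(default_factory=lambda: defaultdict(list))  # framework -> messages

    def register(self, context: str, framework: str, app_keys: set) -> None:
        """Registration service: record where a context's model runs and who may write."""
        self.registry[context] = framework
        self.write_keys[context] = set(app_keys)

    def send(self, app_key: str, context: str, message: dict) -> bool:
        if app_key not in self.write_keys.get(context, set()):
            return False  # no key: this application may not change this context's reputation
        framework = self.registry[context]  # the caller never needs the framework's address
        self.outbox[framework].append((context, message))
        return True


dispatcher = Dispatcher()
dispatcher.register("travel_local", "framework-east", {"travel-key", "local-key"})
dispatcher.register("profiles", "framework-west", {"travel-key", "movies-key"})

ok = dispatcher.send("travel-key", "profiles", {"karma_input": 1})
denied = dispatcher.send("movies-key", "travel_local", {"rating": 1})
print(ok, denied)  # True False
```

Reads are deliberately absent from this sketch: in the text, any property could read ratings through the separate query layer, so the key check guards writes only.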

The Yahoo! Reputation Platform’s dispatcher implementation was optimistic: all application API calls returned immediately without waiting for model execution. The messages were stored with the dispatcher until they could be forwarded to a model execution engine.

The transport services used to move messages to the dispatcher varied by application, but most were proprietary high-performance services. A few models, such as Yahoo! Mail’s Spam IP reputation, accepted inputs on a best-effort basis, using the fastest available transport service.
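
Optimistic dispatch means the caller hands off a message and moves on, while the dispatcher holds the message until an engine takes it. A minimal Python sketch of that behavior, assuming an in-process queue and a single forwarding thread in place of the real transport; OptimisticDispatcher and its methods are hypothetical names.

```python
import queue
import threading


class OptimisticDispatcher:
    """Hypothetical optimistic dispatcher: callers never wait for model execution."""

    def __init__(self) -> None:
        self._pending = queue.Queue()  # messages held here until an engine takes them

    def submit(self, model: str, message: dict) -> None:
        """Queue a message and return at once; no result goes back to the caller."""
        self._pending.put((model, message))

    def forward_to(self, engine, stop: threading.Event) -> None:
        """Hand queued messages to an engine callable until stopped and drained."""
        while not (stop.is_set() and self._pending.empty()):
            try:
                model, message = self._pending.get(timeout=0.05)
            except queue.Empty:
                continue
            engine(model, message)
            self._pending.task_done()

    def wait_until_forwarded(self) -> None:
        """Demo helper: block until every queued message has been handed off."""
        self._pending.join()


processed = []
dispatcher = OptimisticDispatcher()
stop = threading.Event()
worker = threading.Thread(
    target=dispatcher.forward_to,
    args=(lambda model, msg: processed.append((model, msg)), stop),
)
worker.start()

dispatcher.submit("spam_ip", {"ip": "192.0.2.1", "report": "spam"})
dispatcher.submit("spam_ip", {"ip": "192.0.2.1", "report": "spam"})
dispatcher.wait_until_forwarded()
stop.set()
worker.join()
print(len(processed))  # 2
```

A best-effort model, like the Spam IP example, might instead use a bounded `queue.Queue(maxsize=...)` and drop messages when `put_nowait` raises `queue.Full`; that trade-off is left out of this sketch.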

The Yahoo! Reputation Platform high-level architectural layer cake shown in Figure A-1 contains all the required elements of a typical reputation framework. New framework designers would do well to start with that design and then design or select implementations for each component to meet their requirements.

Model execution engine

Figure A-3 shows the heart of the reputation framework, the model execution engine, which manages the reputation model processes and their state.

Messages from the dispatcher layer are passed into the appropriate model code for immediate execution. The model execution engine reads and writes its state, usually in the form of reputation statements, via the reputation database layer (see “Reputation repository” on page 298). Model processes run to completion and, if cross-model execution or optimism is desired, may send messages to the dispatcher for future processing.

The diagram also shows that models may use the external event signaling system to notify applications of changes in state. See the section “External signaling interface” on page 297.

296 | Appendix A: The Reputation Framework

This platform gets much of its performance from parallel processing, and the Yahoo! Reputation Platform uses this approach by implementing an Engine Proxy that routes all incoming message traffic to the engine that is currently running the appropriate model in a concurrent process. This proxy is also in charge of loading and initializing any model that is not currently loaded or executing.

The Yahoo! Reputation Platform implemented models in PHP, with many of the messaging lines within the model diagram implemented as function calls instead of higher-overhead messages. See “Your Mileage May Vary” on page 300 for a discussion of the rationale. The team chose PHP mostly due to its members’ personal expertise and tastes (there was no particular technical requirement that drove this choice).
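
The Engine Proxy's routing rule (send each message to the engine already running its model, and load the model somewhere if it is not running) can be sketched as follows. This is a hypothetical Python illustration, not the platform's PHP code: engines are simulated by name, model code runs inline rather than in a concurrent process, and the least-loaded placement policy is an assumption.

```python
from typing import Callable, Dict, List

ModelFn = Callable[[dict, dict], None]  # (model state, message) -> None; updates state in place


class EngineProxy:
    """Hypothetical engine proxy: routes each message to the engine already running its
    model, and loads/initializes the model on first use. Engines are simulated by name;
    model code runs inline here instead of in a concurrent process."""

    def __init__(self, engines: List[str], models: Dict[str, ModelFn]) -> None:
        self.engines = engines
        self.models = models                 # model name -> model code
        self.placement: Dict[str, str] = {}  # model name -> engine currently running it
        self.state: Dict[str, dict] = {}     # per-model state (stand-in for the repository)

    def _load(self, model: str) -> str:
        # Placement policy is an assumption: pick the engine running the fewest models.
        load = {engine: 0 for engine in self.engines}
        for engine in self.placement.values():
            load[engine] += 1
        chosen = min(self.engines, key=lambda engine: load[engine])
        self.placement[model] = chosen
        self.state[model] = {}               # initialize the model's state
        return chosen

    def route(self, model: str, message: dict) -> str:
        engine = self.placement.get(model) or self._load(model)
        self.models[model](self.state[model], message)  # run the model to completion
        return engine


def count_votes(state: dict, message: dict) -> None:
    state["votes"] = state.get("votes", 0) + message["vote"]


proxy = EngineProxy(["engine-a", "engine-b"], {"helpful": count_votes, "karma": count_votes})
first = proxy.route("helpful", {"vote": 1})
second = proxy.route("karma", {"vote": 1})
third = proxy.route("helpful", {"vote": 1})
print(first, second, third)  # engine-a engine-b engine-a
print(proxy.state["helpful"])  # {'votes': 2}
```

Keeping a model pinned to one engine means its in-flight state lives in one place, which is one plausible reason to route through a proxy rather than to any free engine.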

Figure A-3. Yahoo! Reputation Platform model engine.

External signaling interface

In optimistic systems, such as the Yahoo! Reputation Platform, output happens passively: the application has no idea when a change happened or what the results of any input event were. Some unknown time after the input, a query to the database may or may not reflect a change. In high-volume applications, this is a very good thing, because it is simply impractical to wait for every side effect of every input to propagate across dozens of servers. But when something important (read: valuable) happens, such as an IP address switching from good actor to spammer, the application needs to be informed ASAP.

This is accomplished by using an external signaling interface. For smaller systems, this can simply be hardcoded calls in the reputation model implementation. But larger environments normally have signaling services in place that typically log signal details and have mechanisms for executing processes that take actions, such as changing user access or contacting supervisory personnel.
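
For a small system, the signal can be a hardcoded call inside the model itself. A minimal Python sketch, assuming a model that accumulates spam reports per IP address: ordinary updates stay silent, and the handlers fire only on the transition into "spammer". SpamIPModel, on_report, the signal name, and the threshold value are hypothetical; in a larger environment the handler list would be replaced by a call to an existing signaling service.

```python
from typing import Callable, Dict, List

SignalHandler = Callable[[str, dict], None]


class SpamIPModel:
    """Hypothetical reputation model that signals only when an IP crosses into 'spammer'."""

    def __init__(self, threshold: float, handlers: List[SignalHandler]) -> None:
        self.threshold = threshold
        self.handlers = handlers          # e.g., a logger, an account-lockout job, a pager
        self.scores: Dict[str, float] = {}

    def on_report(self, ip: str, weight: float) -> None:
        before = self.scores.get(ip, 0.0)
        after = before + weight
        self.scores[ip] = after
        # Ordinary updates stay silent (optimistic); only the state transition is signaled.
        if before < self.threshold <= after:
            for handler in self.handlers:
                handler("ip_became_spammer", {"ip": ip, "score": after})


signals = []
model = SpamIPModel(threshold=3.0, handlers=[lambda name, data: signals.append((name, data))])
for _ in range(5):
    model.on_report("198.51.100.9", 1.0)
print(signals)  # [('ip_became_spammer', {'ip': '198.51.100.9', 'score': 3.0})]
```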

Another kind of signaling interface can be used to provide a layer of request-reply semantics to an optimistic system: when the model is about to complete, a signal gets sent to a waiting thread that was created when the input was sent. The thread identifier is sent along as a parameter throughout the model as it executes.
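
The request-reply layer described above can be sketched with one event per request. In this hypothetical Python sketch, a request identifier stands in for the thread identifier: it is created when the input is sent, travels inside the message, and the model uses it to signal the waiter just before completing. SignalBoard, spam_ip_model, and the message fields are illustrative names, not the platform's API.

```python
import itertools
import threading
from typing import Dict, Optional, Tuple


class SignalBoard:
    """Hypothetical signaling interface that adds request-reply on top of optimistic dispatch."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._lock = threading.Lock()
        self._waiting: Dict[int, Tuple[threading.Event, dict]] = {}

    def open_request(self) -> int:
        """Create a waiter when the input is sent; its id travels with the message."""
        with self._lock:
            request_id = next(self._ids)
            self._waiting[request_id] = (threading.Event(), {})
        return request_id

    def signal(self, request_id: int, result: dict) -> None:
        """Called by the model just before it completes."""
        with self._lock:
            entry = self._waiting.get(request_id)
        if entry is None:
            return  # the caller already gave up waiting
        event, slot = entry
        slot.update(result)
        event.set()

    def wait(self, request_id: int, timeout: float) -> Optional[dict]:
        with self._lock:
            event, slot = self._waiting[request_id]
        done = event.wait(timeout)
        with self._lock:
            self._waiting.pop(request_id, None)
        return dict(slot) if done else None


def spam_ip_model(message: dict, board: SignalBoard) -> None:
    score = 0.9 if message["reports"] > 10 else 0.1  # stand-in for real model logic
    board.signal(message["request_id"], {"ip": message["ip"], "score": score})


board = SignalBoard()
request_id = board.open_request()
message = {"request_id": request_id, "ip": "192.0.2.7", "reports": 12}
worker = threading.Thread(target=spam_ip_model, args=(message, board))  # model runs asynchronously
worker.start()
reply = board.wait(request_id, timeout=1.0)
worker.join()
print(reply)  # {'ip': '192.0.2.7', 'score': 0.9}
```

The timeout keeps the caller from blocking forever if the model never signals, which preserves the optimistic system's behavior for every caller that does not opt in.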

Reputation repository

On the surface, the reputation repository layer looks like any other high-performance, partitioned, and redundant database. The specific features for the repository in the Yahoo! Reputation Platform are:

• Like the other layers, the repositories may themselves be managed by a proxy manager for performance.

• The reputation claim values may be normalized by the repository layer so that those reading the values via the query interface don’t have to know the input scale.

To improve performance, many read-modify-write operations, such as increment and addToSum, are implemented as stored procedures at the database level instead of as code-level mathematical operations at the model execution layer. This significantly reduces interprocess message time as well as the duration of any lock contention on highly modified reputation statements.

The Yahoo! Reputation Platform also contains features to dynamically scale up by adding new repository partitions (nodes) and to cope gracefully with data migrations. Though those solutions are proprietary, we mention them here for completeness and so that anyone contemplating such a framework can consider them.

Reputation query interface

The main purpose of all of this infrastructure is to provide speedy access to the best possible reputation statements for diverse display and other corporate use patterns. The reputation query interface provides this service. It is separated from the repository service because it provides read-only access, and the data access model is presumed to be less restrictive. For example, every Yahoo! application could read user karma scores, even though each could modify them only via its own context-restricted reputation model. Large-scale database query service architectures are well understood and well documented on the Web and in many books. Framework designers are reminded that the number of reputation queries in most applications is one or two orders of magnitude larger than the number of changes. Our short treatment of the subject here does not reflect the relative scale of the service.

Yahoo! used context-specific entity identifiers (often in the form of database foreign keys) as source and target IDs. So, even though Yahoo! Movies might have permission to ask the reputation query service for a user’s restaurant reviews, it might do them no good without a separate service from Yahoo! Local to map the reviews’ Local-specific target ID back to a data record describing the eatery. The format used is context.foreignKeyValue; the reason for the context is to allow for context-specific wildcard search (described later). There is always at least one special context: user., which holds karma.

In practice, there is also a source-only context, roll-up., used for claims that aggregate the input of many sources.

Claim type identifiers are of a specific format: context.application.claim. An example is YMovies.MovieReviews.OverallRating, which holds the claim value for a user’s overall rating for a movie.
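
The two identifier formats above are easy to validate mechanically. A minimal Python sketch, assuming entity IDs split on their first dot (so the foreign key itself may contain dots) and claim types have exactly three parts; the helper names and the YLocal context are hypothetical.

```python
from typing import NamedTuple

SOURCE_ONLY_CONTEXTS = {"roll-up"}  # per the text, roll-up. ids appear only as sources


class EntityId(NamedTuple):
    context: str  # e.g., "user" (karma) or an application context such as "YLocal"
    key: str      # foreign key into the owning application's own records


class ClaimType(NamedTuple):
    context: str
    application: str
    claim: str


def parse_entity_id(raw: str) -> EntityId:
    """Parse "context.foreignKeyValue", splitting on the first dot only."""
    context, sep, key = raw.partition(".")
    if not (sep and context and key):
        raise ValueError(f"not a context-qualified id: {raw!r}")
    return EntityId(context, key)


def parse_claim_type(raw: str) -> ClaimType:
    """Parse "context.application.claim"."""
    parts = raw.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected context.application.claim, got {raw!r}")
    return ClaimType(*parts)


def check_target(target: EntityId) -> None:
    """Reject statements that use a source-only context as their target."""
    if target.context in SOURCE_ONLY_CONTEXTS:
        raise ValueError(f"{target.context}. ids may only be used as sources")


print(parse_entity_id("user.randy"))
# EntityId(context='user', key='randy')
print(parse_claim_type("YMovies.MovieReviews.OverallRating"))
# ClaimType(context='YMovies', application='MovieReviews', claim='OverallRating')
```

Note that parsing only checks shape: mapping the key back to a restaurant or a movie still requires the owning application's own service, as the text points out.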

Queries are of the form: Source: [SourceIDs], Claim: [ClaimIDs], Target: [TargetIDs]. Besides the obvious use of retrieving a specific reputation statement, the identifier design used in this platform supports wildcard queries (*) that return multiple results:

Source: *, Claim: [ClaimID], Target: [TargetID]

Returns all of a specific type of claim for a particular target, e.g., all of the reviews for the movie Aliens.

Source: [SourceID], Claim: context.application.*, Target: *

Returns all of the application-specific reputation statements for any targets by a source, e.g., all of Randy’s ratings, reviews, and helpful votes on other user reviews.

Source: *, Claim: [ClaimID], Target: [TargetID, TargetID, ...]

Returns all reputation statements with a certain claim type for multiple targets. The application is the source of the list of targets, such as a list of otherwise qualified search results, e.g., What have users given as overall ratings for the movies that are currently in theaters near my house?
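
The three query patterns above can be expressed with one matching rule per field: "*" matches anything, a "prefix.*" pattern matches by prefix, and anything else must match exactly. This is a hypothetical Python sketch over an in-memory list; it does a linear scan purely for illustration, whereas a real query service would need an index for each wildcard pattern it supports.

```python
from typing import List, NamedTuple, Sequence


class Statement(NamedTuple):
    source: str
    claim: str
    target: str
    value: float


def matches(value: str, patterns: Sequence[str]) -> bool:
    """True if any pattern is "*", equals value, or is a "prefix.*" wildcard covering it."""
    for pattern in patterns:
        if pattern == "*" or pattern == value:
            return True
        if pattern.endswith(".*") and value.startswith(pattern[:-1]):
            return True
    return False


def query(store: Sequence[Statement], sources: Sequence[str],
          claims: Sequence[str], targets: Sequence[str]) -> List[Statement]:
    # Linear scan for illustration only; a real service needs an index per supported pattern.
    return [s for s in store
            if matches(s.source, sources) and matches(s.claim, claims)
            and matches(s.target, targets)]


RATING = "YMovies.MovieReviews.OverallRating"
store = [
    Statement("user.randy", RATING, "YMovies.aliens", 5.0),
    Statement("user.bryce", RATING, "YMovies.aliens", 4.0),
    Statement("user.randy", "YMovies.MovieReviews.Helpful", "YMovies.review-17", 1.0),
    Statement("user.randy", RATING, "YMovies.alien3", 2.0),
]

aliens_ratings = query(store, ["*"], [RATING], ["YMovies.aliens"])
randys_statements = query(store, ["user.randy"], ["YMovies.MovieReviews.*"], ["*"])
now_showing = query(store, ["*"], [RATING], ["YMovies.aliens", "YMovies.alien3"])
print(len(aliens_ratings), len(randys_statements), len(now_showing))  # 2 3 3
```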

There are many more query patterns possible, and framework designers will need to predetermine exactly which wildcard searches will be supported, as appropriate indexes may need to be created and/or other optimizations might be required.

Yahoo! supports both RESTful interfaces and JSON protocol requests, but any reliable protocol would do. It also supports returning a paged window of results, reducing interprocess messaging to just the number of statements required.
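
A paged window keeps each reply limited to the statements actually requested. A minimal Python sketch, assuming offset/limit paging and a JSON reply body; page_window, page_response, and the "statements"/"next" field names are hypothetical, not the platform's wire format.

```python
import json
from typing import List, Optional, Sequence, Tuple


def page_window(results: Sequence[str], offset: int, limit: int) -> Tuple[List[str], Optional[int]]:
    """Return one window of results and the offset of the next window (None when done)."""
    if offset < 0 or limit <= 0:
        raise ValueError("offset must be >= 0 and limit must be > 0")
    window = list(results[offset:offset + limit])
    next_offset = offset + limit if offset + limit < len(results) else None
    return window, next_offset


def page_response(results: Sequence[str], offset: int, limit: int) -> str:
    """Hypothetical JSON body: only the requested statements cross the wire."""
    window, next_offset = page_window(results, offset, limit)
    return json.dumps({"statements": window, "next": next_offset})


review_ids = [f"review-{i}" for i in range(7)]
print(page_response(review_ids, 0, 3))
# {"statements": ["review-0", "review-1", "review-2"], "next": 3}
print(page_response(review_ids, 6, 3))
# {"statements": ["review-6"], "next": null}
```

Here the slice is taken from an in-memory list for simplicity; in a production query service the offset and limit would more likely be pushed down into the database query so the full result set is never materialized.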

Yahoo! lessons learned

During the development of the Yahoo! Reputation Platform, the team wandered down many dark alleys and false paths. Presented next are some of the warning signs and insights gained. They aren’t intended as hard-and-fast rules, just friendly advice:

• It is not practical to use prebuilt code blocks to build reputation models, because every context is different, so every model is also significantly different. Don’t try to create a reputation scripting language. Certainly there are common abstractions, as represented in the graphical grammar, but those should not be confused with actual executing code. To get the desired customization, scale, and performance, the reputation processes should be expressed directly in native code. The Yahoo! Reputation Platform expressed the reputation models directly in PHP. After the first few models were complete, common patterns were packaged and released as code libraries, which decreased the implementation time for each model.

• Focus on building only the core reputation framework itself, and use existing toolkits for messaging, resource management, and databases. There is no need to reinvent the wheel.

• Go for performance over slavishly copying the diagrams’ implied modularity. For example, even the Simple Accumulator process is probably best implemented primarily in the database process as a stored procedure. Many of the patterns work out to be read-modify-write, so the main alternatives are stored procedures or deferring the database modifications as long as possible, given your reliability requirements.

• Creating a common platform is necessary, but not sufficient, to get applications to share data. In practice, it turned out that reconciling the entity identifiers between sites was a time-intensive task that was often deprioritized. Merging two previously existing entity databases was often not 100% automatic and required manual supervision. Even when the data was merged, it typically required each sharing application to modify existing user-facing application code, another expense. This latter problem can be somewhat mitigated in the short term by writing backward-compatible interfaces for legacy application code.
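
The advice to push read-modify-write operations such as addToSum into the database can be illustrated with SQLite, which has no stored procedures. In this hypothetical Python sketch, a single atomic UPSERT statement (SQLite 3.24 or later) stands in for the stored procedure, so the model code never reads the old value and holds no lock across a round trip. The table layout, add_to_sum, and the identifiers are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE reputation_statement (
           source TEXT NOT NULL,
           claim  TEXT NOT NULL,
           target TEXT NOT NULL,
           total  REAL NOT NULL,     -- running sum kept by a Simple Accumulator
           inputs INTEGER NOT NULL,  -- number of inputs folded into the sum
           PRIMARY KEY (source, claim, target))"""
)


def add_to_sum(conn: sqlite3.Connection, source: str, claim: str, target: str, amount: float) -> None:
    """Fold one input into an accumulator with a single in-database read-modify-write.

    The model never fetches the old value, so no lock is held across a round trip.
    """
    with conn:  # commit on success, roll back on error
        conn.execute(
            """INSERT INTO reputation_statement (source, claim, target, total, inputs)
               VALUES (?, ?, ?, ?, 1)
               ON CONFLICT (source, claim, target)
               DO UPDATE SET total = total + excluded.total, inputs = inputs + 1""",
            (source, claim, target, amount),
        )


for vote in (1, 1, -1, 1):
    add_to_sum(conn, "roll-up.helpful", "YMovies.MovieReviews.HelpfulScore", "YMovies.review-17", vote)

print(conn.execute("SELECT total, inputs FROM reputation_statement").fetchone())  # (2.0, 4)
```

On a database that does support stored procedures, the same single-statement update would typically live inside the procedure, and the model would send only the procedure name and arguments.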

Your Mileage May Vary

Given the number of variations on reputation framework requirements and your application’s technical environment, the two examples just presented represent extremes that don’t exactly apply to your situation. Our advice is to design in favor of adaptability, a constraint we intentionally left off the choice list.

It took three separate tries to implement the Yahoo! Reputation Platform.

Yahoo! first tried to do it on the cheap, with a database vendor creating a request-reply, all-database-procedure-based implementation. That attempt surfaced an unacceptable performance/reliability trade-off and was abandoned.

The second attempt taught us about premature reputation model compilation and optimization, and that we could loosen the strongly typed, compiled language requirement in order to make reputation model implementation more flexible and accessible to more programmers.

The third platform finally made it to deployment, and the lessons are reflected in the previous section. It is worth noting that though the platform delivers on the original requirements, the sharing requirement, listed as a primary driver for the project, is not yet in extensive use. Despite the repeated assertions by senior product management, the application designers end up requiring orientation in the benefits of sharing their data as well as leveraging the shared reputations of other applications. Presently, only customer care looks at cross-property karma scores to help determine whether an account that might otherwise be automatically suspended should get additional, high-touch support instead.

Recommendations for All Reputation Frameworks

Reputation is a database. Reputation statements should be stored and indexed separately so that applications can continue to evolve new uses for the claims.

Though it is tempting to mix the reputation process code in with your application, don’t do it! You will be changing the model over time to fix bugs, to achieve the results you were originally looking for, or to mitigate abuse, and this will be all but impossible unless reputation remains a distinct module.

Sources and targets are foreign keys, and generally the reputation framework has little to no specific knowledge of the data objects indexed by those keys. Everything the reputation model needs to compute the claims should be passed in messages or remain directly accessible to each reputation process.

Discipline! The reputation framework manages nothing less than the code that sets the valuation of all user-generated and user-evaluated content in your application. As such, it deserves the effort of regular critical design and code reviews and full testing suites. Log and audit every input that is interesting, especially any claim overrides made during operations. There have been many examples of employees manipulating reputation scores in return for status or favors.
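
Auditing claim overrides mostly comes down to never changing a claim without first recording who changed it, from what, to what, and why. A minimal Python sketch of that discipline; AuditedRepository, its fields, and the example identifiers are hypothetical, and a real deployment would write the log to durable, access-controlled storage rather than an in-memory list.

```python
import json
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str, str]  # (source, claim type, target)


@dataclass
class AuditedRepository:
    """Hypothetical wrapper: every claim override is written to an append-only audit log."""

    claims: Dict[Key, float] = field(default_factory=dict)
    audit_log: List[str] = field(default_factory=list)  # ship to durable storage in practice

    def override(self, operator: str, key: Key, new_value: float, reason: str) -> None:
        old_value: Optional[float] = self.claims.get(key)
        entry = {"ts": time.time(), "operator": operator, "key": list(key),
                 "old": old_value, "new": new_value, "reason": reason}
        self.audit_log.append(json.dumps(entry))  # record before the change takes effect
        self.claims[key] = new_value


repo = AuditedRepository()
key = ("roll-up.karma", "YAnswers.Karma.Quality", "user.randy")
repo.claims[key] = 42.0
repo.override("ops-alice", key, 10.0, "inflated by sock-puppet votes")
print(json.loads(repo.audit_log[0])["old"])  # 42.0
```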


APPENDIX B

Related Resources

There are many readings on the broad topic of reputation systems. We list a few here and encourage readers who have additional resources to contribute, or who want to read the most up-to-date list, to visit this book’s website at http://buildingreputation.com.

Further Reading

The Web contains thousands of white papers and blog postings related to specific reputation issues, such as ratings bias and abusing karma. The list here is a representative sample. We maintain an updated, comprehensive list in our Delicious bookmarks: http://delicious.com/frandallfarmer/reputation and http://delicious.com/soldierant/reputation.

A Framework for Building Reputation Systems, by Phillip J. Windley, Ph.D., Kevin Tew, and Devlin Daley, Department of Computer Science, Brigham Young University. One of the few papers that proposes a platform approach to reputation systems.

Designing Social Interfaces, by Christian Crumlish and Erin Malone, from O’Reilly and Yahoo! Press. It covers not only the reputation patterns, but social patterns of all types: a definite companion for our book.

“Designing Your Reputation System,” a slideshow presentation by Bryce Glass, initially presented before we started on this book.

“Reputation As Property in Virtual Economies,” by Joseph Blocher, discusses the idea that online reputation may become real-world property.

The Reputation Pattern Library at the Yahoo! Developer Network, where some of our thoughts were first refined into clear patterns.

The Reputation Research Network, a clearinghouse for some older reputation systems research papers.

“Who Is Grady Harp? Amazon’s Top Reviewers and the fate of the literary amateur,” by Garth Risk Hallberg. One of many articles talking about the side effects of having karma associated with commercial gain. See our Delicious bookmarks for similar articles about YouTube, Yelp, SlashDot, and more.

Recommender Systems

Though only briefly mentioned in this book, recommender systems are an important form of web reputation, especially for entities. There are extensive libraries of research papers available on the Web. In particular, you should check out the following resources:

The site at http://presnick.people.si.umich.edu/ is maintained by Paul Resnick, professor at the University of Michigan School of Information. He is one of the lead researchers in reputation and recommender systems and is a prolific author of relevant works.

GroupLens is a research lab at the University of Minnesota with a focus on recommender systems.

Robert E. Kraut is another important researcher who focuses on recommender and collaboration systems. Visit his site at http://www.cs.cmu.edu/~kraut/RKraut.site.files/research/research.html

The ACM Recommender Systems conference site contains some great links to support materials, including slide decks.

Social Incentives

The “Broken Windows” effect is cited in several chapters of this book. There is some popular debate about its effect on human behavior, highlighted in two popular books:

Gladwell, Malcolm. The Tipping Point: How Little Things Can Make a Big Difference. MA: Back Bay Books, 2002.

Levitt, Steven D., and Stephen J. Dubner. Freakonomics: A Rogue Economist Explores the Hidden Side of Everything. NY: Harper Perennial, 2009.

They focus on the question of the effects (or lack thereof) of the New York Police Department’s strict enforcement on crime. Though we don’t take a position on that specific example, we want to point out a few additional references that support the broken windows effect in other contexts:

Johnson, Carolyn Y. “Breakthrough on Broken Windows.” The Boston Globe, February 8, 2009.

“The Broken Windows Theory of Crime is Correct.” The Economist, November 20, 2008.


The emerging field of behavioral economics is deeply relevant to using reputation as a user incentive. Papers and books are starting to emerge, but we recommend this primer for all readers:

Ariely, Dan. Predictably Irrational. NY: Harper Perennial, 2010.

Howe, Jeff. Crowdsourcing: Why the Power of the Crowd Is Driving the Future of Business. NY: Three Rivers Press, 2009. This book provides some useful insight into group motivation.

Patents

Several patent applications were cited in this book, and we’ve gathered their references here for convenience. Contributors to this section are encouraged to include other relevant intellectual property for consideration by their peers.

U.S. Patent Application 11/774,460: Detecting Spam Messages Using Rapid Sender Reputation Feedback Analysis, Miles Libbey, F. Randall Farmer, Mohammad Mohsenzadeh, Chip Morningstar, Neal Sample

U.S. Patent Application 11/945,911: Real-Time Asynchronous Event Aggregation Systems, F. Randall Farmer, Mohammad Mohsenzadeh, Chip Morningstar, Neal J. Sample

U.S. Patent Application 11/350,981: Interestingness ranking of media objects, Daniel S. Butterfield, Caterina Fake, Callum James Henderson-Begg, Serguei Mourachov

U.S. Patent Application 11/941,009: Trust Based Moderation, Ori Zaltzman and Quy Dinh Le

