data is not readily available, there is a team of people whose job it is to make it available, even when that means redesigning an application form, reprogramming an automated switch, or simply loading the data correctly in the first place.
The Skills to Turn Data into Actionable Information
The ideal data mining environment is staffed by people whose superior skills in data processing and data mining are surpassed only by their intimate understanding of how the business operates and its goals for the future. The data mining group includes database experts, programmers, statisticians, data miners, and business analysts, all working together to ensure that business decisions are based on accurate information. This team of people has the communication skills to spread whatever they may learn to the appropriate parts of the organization, whether that is marketing, operations, management, or strategy.
All the Necessary Tools
The ideal data mining environment includes sufficient computing power and database resources to support the analysis of the most detailed level of customer transactions. It includes software for manipulating all that data and creating model sets from it. And, of course, it includes a rich collection of data mining software so that all the techniques from Chapters 5–13 can be applied.
Back to Reality
Readers will not be shocked to learn that we have never seen the ideal data mining environment just described. We have, however, worked with many companies that are moving in the right direction. These companies are taking steps to transform themselves into customer-centric organizations. They are building data mining groups. They are gathering customer data from operational systems and creating a single customer view. Many of them are already reaping substantial benefits.
Building a Customer-Centric Organization
The first component of the utopian vision that opened the chapter was a truly customer-centric organization. In terms of data, one of the hardest parts of building a customer-centric organization is establishing a single view of the customer that is shared across the entire enterprise and informs every customer interaction. The flip side of this challenge is establishing a single image of the company and its brand across all channels of communication with the customer, including retail stores, independent dealers, the Web site, the call centers, advertising, and direct marketing. The goal is not only to make more informed decisions; the goal is to improve the customer experience in a measurable way. In other words, the customer strategy has both analytic and operational components. This book is more concerned with the analytic component, but both are critical to success.
TIP Building a customer-centric organization requires a strategy with both analytic and operational components.
Building a customer-centric organization requires centralizing customer information from a variety of sources in a single data warehouse, along with a set of common definitions and well-understood business processes describing the source of the data. This combination makes it possible to define a set of customer metrics and business rules used by all groups to monitor the business and to measure the impact of changing market conditions and new initiatives.
The centralized store of customer information is, of course, the data warehouse described in the previous chapter. As shown in Figure 16.1, there is two-way traffic between the operational systems and the data warehouse. Operational systems supply the raw data that goes into the data warehouse, and the warehouse in turn supplies customer scores, decision rules, customer segment definitions, and action triggers to the operational systems. As an example, the operational systems of a retail Web site capture all customer orders. These orders are then summarized in a data warehouse. Using data from the data warehouse, association rules are created and used to generate cross-sell recommendations that are sent back to the operational systems. The end result: a customer comes to the site to order a skirt and ends up with several pairs of tights as well.
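As a rough illustration of the round trip just described, the sketch below derives simple pairwise association rules from a handful of orders. It is a minimal example under stated assumptions, not the method of any particular retailer: the order data, the function name pair_rules, and the support and confidence thresholds are all invented for illustration.

```python
from collections import Counter
from itertools import permutations

def pair_rules(orders, min_support=0.3, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) tuples for item pairs."""
    n = len(orders)
    item_counts = Counter()
    pair_counts = Counter()
    for order in orders:
        items = set(order)                      # ignore repeated items within one order
        item_counts.update(items)
        # Ordered pairs (a, b) with a != b, so a -> b and b -> a are counted separately.
        pair_counts.update(permutations(sorted(items), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n                     # share of orders containing both items
        confidence = count / item_counts[a]     # share of orders with a that also have b
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    # Highest confidence first; ties broken alphabetically for a stable order.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))

# Hypothetical order summaries, as they might be pulled from the warehouse.
orders = [
    ["skirt", "tights"],
    ["skirt", "tights", "scarf"],
    ["skirt", "blouse"],
    ["tights"],
    ["scarf", "blouse"],
]
rules = pair_rules(orders)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: support={support:.2f}, confidence={confidence:.2f}")
```

In a production setting the resulting rules, rather than the raw counts, are what would be published back to the operational system.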
Creating a Single Customer View
Every part of the organization should have access to a single shared view of the customer and present the customer with a single image of the company. In practical terms that means sharing a single customer profitability model, a single payment default risk model, a single customer loyalty model, and shared definitions of such terms as customer start, new customer, loyal customer, and valuable customer.
[Figure 16.1 A customer-centric organization requires centralized customer data. Diagram labels: Operational Systems; Operational Data (billing, usage, etc.); Segments, Actions, Common Definitions; Business Users; Common Metadata; Common Repository of Customer Information.]
It is natural for different groups to have different definitions of these terms. At one publication, the circulation department and the advertising sales department have different views on who are the most valuable customers because the people who pay the highest subscription prices are not necessarily the people of most interest to the advertisers. The solution is to have an advertising value and a subscription value for each customer, using ideas such as advertising fitness introduced in Chapter 4.
At another company, the financial risk management group considers a customer “new” for the first 4 months of tenure, and during this initial probationary period any late payments are pursued aggressively. Meanwhile, the customer loyalty group considers the customer “new” for the first 3 months, and during this welcome period the customer is treated with extra care. So which is it: a honeymoon or a trial engagement? Without agreement within the company, the customer receives mixed messages.
For companies with several different lines of business, the problem is even trickier. The same company may provide Internet service and telephone service, and, of course, maintain different billing, customer service, and operational systems for the two services. Furthermore, if the ISP was recently acquired by the telephone company, the company may have no idea what the overlap is between its existing telephone customers and its newly acquired Internet customers.
Defining Customer-Centric Metrics
On September 24, 1929, Lieutenant James H. Doolittle of the U.S. Army Air Corps made history by flying “blind” to demonstrate that with the aid of newly invented instruments such as the artificial horizon, the directional gyroscope, and the barometric altimeter, it was possible to fly a precise course even with the cockpit shrouded by a canvas hood. Before the invention of the artificial horizon, pilots flying into a cloud or fog bank would often end up flying upside down. Now, thanks to all those gauges in the cockpit, we calmly munch pretzels, sip coffee, and revise spreadsheets in weather that would have grounded even Lieutenant Doolittle. Good business metrics are just as crucial to keeping a large business flying on the proper course.
Business metrics are the signals that tell management which levers to move and in what direction. Selecting the right metrics is crucial because a business tends to become what it is measured by. A business that measures itself by the number of customers it has will tend to sign up new customers without regard to their expected tenure or prospects for future profitability. A business that measures itself by market share will tend to increase market share at the expense of other goals such as profitability. The challenge for companies that want to be customer-centric is to come up with realistic customer-centric measures. It sounds great to say that the company’s goal is to increase customer loyalty; it is harder to come up with a good way to measure that quality in customers. Is merely having lasted a long time a sign of loyalty? Or should loyalty be defined as being resistant to offers from competitors? If the latter, how can it be measured?
Even seemingly simple metrics such as churn or profitability can be surprisingly hard to pin down. When does churn actually occur:
■■ On the day phone service is actually deactivated?
■■ On the day the customer first expressed an intention to deactivate?
■■ At the end of the first billing cycle after deactivation?
■■ On the date when the telephone number is released for new customers?
Each of these definitions plays a role in different parts of a telephone business. For wireless subscribers on a contract, these events may be far apart. And which churn events should be considered voluntary? Consider a subscriber who refuses to pay in order to protest bad service and is eventually cut off; is that voluntary or involuntary churn? What about a subscriber who stops voluntarily and then doesn’t pay the final amount owed? These questions do not have a right answer; they do suggest the subtleties of defining the customer relationship.
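One way to keep competing churn definitions from colliding is to record every candidate date and make each report name the definition it uses. The sketch below is only an illustration of that idea; the Deactivation record, its field names, and the sample dates are assumptions, not part of any real billing system.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Deactivation:
    intent_date: date          # customer first expresses an intention to deactivate
    deactivation_date: date    # service is actually switched off
    billing_cycle_end: date    # end of the first billing cycle after deactivation
    number_release_date: date  # telephone number is released for new customers

# Each named definition picks out one of the recorded dates.
CHURN_DEFINITIONS = {
    "intent": lambda d: d.intent_date,
    "deactivation": lambda d: d.deactivation_date,
    "billing": lambda d: d.billing_cycle_end,
    "number_release": lambda d: d.number_release_date,
}

def churn_date(event, definition):
    """Return the churn date for one event under the named definition."""
    return CHURN_DEFINITIONS[definition](event)

def churn_month_by_definition(event):
    """Show which reporting month the same event falls into under each definition."""
    return {name: churn_date(event, name).strftime("%Y-%m") for name in CHURN_DEFINITIONS}

# A hypothetical contract subscriber whose events are spread over several months.
event = Deactivation(
    intent_date=date(2003, 1, 20),
    deactivation_date=date(2003, 2, 28),
    billing_cycle_end=date(2003, 3, 15),
    number_release_date=date(2003, 6, 1),
)
print(churn_month_by_definition(event))
```

The same subscriber lands in four different reporting months, which is why a monthly churn rate is only meaningful once the definition has been agreed on.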
As for profitability, which customers are considered profitable depends a great deal on how costs are allocated.
Collecting the Right Data
Once metrics such as loyalty, profitability, and churn have been properly defined, the next step is to determine the data needed to calculate them correctly. This is different from simply approximating the definition using whatever data happens to be available. Remember, in the ideal data mining environment, the data mining group has the power to determine what data is made available!
Information required for managing the business should drive the addition of new tables and fields to the data warehouse. For example, a customer-centric company ought to be able to tell which of its customers are profitable. In many companies this is not possible because there is not enough information available to sensibly allocate costs at the customer level. One of our clients, a wireless phone company, approached this problem by compiling a list of questions that would have to be answered in order to decide what it costs to provide service to a particular customer. They then determined what data would be required to answer those questions and set up a project to collect it.
The list of questions was long, and included the following:
■■ How many times per year does the customer call customer care?
■■ Does the customer pay bills online, by check, or by credit card?
■■ What proportion of the customer’s airtime is spent roaming?
■■ On which outside networks does the customer roam?
■■ What is the contractual cost for these networks?
■■ Are the customer’s calls to customer care handled by the IVR or by human operators?
Answering these cost-related questions required data from the call-center system, the billing system, and a financial system. Similar exercises around other important metrics revealed a need for call detail data, demographic data, credit data, and Web usage data.
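To make the idea concrete, the sketch below shows how answers to questions like those listed above might be combined into a per-customer cost of service. It is a simplified illustration: every unit cost, field name, and roaming rate here is hypothetical, and a real allocation would draw these figures from the call-center, billing, and financial systems.

```python
from dataclasses import dataclass

# Hypothetical unit costs; real figures would come from the financial system.
COST_PER_HUMAN_CARE_CALL = 6.00
COST_PER_IVR_CARE_CALL = 0.50
PAYMENT_COST = {"online": 0.10, "check": 0.75, "credit_card": 0.40}

@dataclass
class CustomerYear:
    human_care_calls: int     # customer care calls handled by human operators
    ivr_care_calls: int       # customer care calls handled by the IVR
    payment_method: str       # "online", "check", or "credit_card"
    bills_paid: int           # number of bills paid during the year
    roaming_minutes: dict     # outside network name -> minutes roamed on it

def annual_service_cost(c, roaming_rate_per_minute):
    """Allocate care, payment-processing, and roaming costs to one customer."""
    care = (c.human_care_calls * COST_PER_HUMAN_CARE_CALL
            + c.ivr_care_calls * COST_PER_IVR_CARE_CALL)
    payments = c.bills_paid * PAYMENT_COST[c.payment_method]
    roaming = sum(minutes * roaming_rate_per_minute[network]
                  for network, minutes in c.roaming_minutes.items())
    return round(care + payments + roaming, 2)

customer = CustomerYear(
    human_care_calls=3,
    ivr_care_calls=10,
    payment_method="check",
    bills_paid=12,
    roaming_minutes={"NetA": 100, "NetB": 20},
)
contract_rates = {"NetA": 0.25, "NetB": 0.50}   # hypothetical contractual cost per minute
print(annual_service_cost(customer, contract_rates))
```

Subtracting a figure like this from the customer's revenue gives a profitability estimate that reflects how the customer actually uses the service.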
From Customer Interactions to Learning Opportunities
A customer-centric organization maintains a learning relationship with its customers. Every interaction with a customer is an opportunity for learning, an opportunity that can be seized when there is good communication between data miners and the various customer-facing groups within the company. Almost any action the company takes that affects customers (a price change, a new product introduction, a marketing campaign) can be designed so that it is also an experiment to learn more about customers. The results of these experiments should find their way into the data warehouse, where they will be available for analysis. Often the actions themselves are suggested by data mining.
As an example, data mining at one wireless company showed that having had service suspended for late payment was a predictor of both voluntary and involuntary churn. That late payment is a predictor of later nonpayment is hardly a surprise, but the fact that late payment (or the company’s treatment of late payers) was a predictor of voluntary churn seemed to warrant further investigation.
The observation led to the hypothesis that having had their service suspended lowers customers’ loyalty to the company and makes it more likely that they will take their business elsewhere when presented with an opportunity to do so. It was also clear from credit bureau data that some of the late payers were financially able to pay their phone bills. This suggested an experiment: Treat low-risk customers differently from high-risk customers by being more patient with their delinquency and employing gentler methods of persuading them to pay before suspending them. A controlled experiment tested whether this approach would improve customer loyalty without unacceptably driving up bad debt. Two similar cohorts of low-risk, high-value customers received different treatments. One was subjected to the “business as usual” treatment, while the other got the kinder, gentler treatment. At the end of the trial period, the two groups were compared on the basis of retention and bad debt in order to determine the financial impact of switching to the new treatment. Sure enough, the kinder, gentler treatment turned out to be worthwhile for the lower-risk customers, increasing payment rates and slightly increasing long-term tenure.
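A comparison of two cohorts like the one just described can be checked with a standard two-proportion z-test. The sketch below is illustrative only: the cohort sizes and retention counts are invented, and the text does not say which statistical test the company actually used.

```python
from math import erf, sqrt

def two_proportion_z(success_a, n_a, success_b, n_b):
    """Compare two retention rates; return (rate_a, rate_b, z, two-sided p-value)."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)          # rate under "no difference"
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))          # standard normal CDF at |z|
    p_value = 2 * (1 - normal_cdf)
    return p_a, p_b, z, p_value

# Hypothetical outcome: customers still active at the end of the trial period.
control_retained, control_size = 880, 1000       # "business as usual" cohort
gentle_retained, gentle_size = 910, 1000         # kinder, gentler cohort

rate_a, rate_b, z, p_value = two_proportion_z(
    control_retained, control_size, gentle_retained, gentle_size
)
print(f"control={rate_a:.3f} gentle={rate_b:.3f} z={z:.2f} p={p_value:.3f}")
```

Bad debt per customer would be compared between the same two cohorts, since a retention gain is only worthwhile if it is not offset by higher write-offs.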
Mining Customer Data
When every customer interaction is generating data, there are endless opportunities for data mining. Purchasing patterns and usage patterns can be mined to create customer segments. Response data can be mined to improve the targeting of future campaigns. Multiple response models can be combined into best next offer models. Survival analysis can be employed to forecast future customer attrition. Churn models can spot customers at risk for attrition. Customer value models can identify the customers worth keeping.
Of course, all this requires a data mining group and the infrastructure to support it.
The Data Mining Group
The data mining group is specifically responsible for building models and using data to learn about customers—as opposed to leading marketing efforts,
Outsourcing Data Mining
Companies have varying reasons for considering outsourcing data mining. For some, data mining is only an occasional need and so not worth investing in an internal group. For others, data mining is an ongoing requirement, but the skills required seem so different from the ones currently available in the company that building this expertise from scratch would be very challenging. Still others have their customer data hosted by an outside vendor and feel that the analysis should take place close to the data.
Outsourcing Occasional Modeling
Some companies think they have little need for building models and using data to understand customers. These companies generally fall into one of two types. The first are the companies with few customers, either because the company is small or because each customer is very large. As an example, the private banking group at a typical bank may serve a few thousand customers, and the account representatives personally know their clients. In such an environment, data mining may be superfluous, because people are so intimately involved in the relationship.
However, data mining can play a role even in this environment. In particular, data mining can make it possible to understand best practices and to spread them. For instance, some employees in the private bank may do a better job in some way (retaining customers, encouraging customers to recommend friends, family members, colleagues, and so on). These employees may have best practices that should be spread through the organization.
TIP Data mining may be unnecessary for companies where dedicated staff maintain deep and personal long-term relationships with their customers.
Data mining may also seem unimportant to rapidly growing companies in a new market. In this situation, customer acquisition drives the business, and advertising, rather than direct marketing, is the principal way of attracting new customers. Applications for data mining in advertising are limited, and, at this stage in their development, companies are not yet focused on customer relationship management and customer retention. For the limited direct marketing they do, outsourced modeling is often sufficient.
Wireless communications, cable television, and Internet service providers all went through periods of exponential growth that have only recently come to an end as these markets matured (and before them, wired telephones, life insurance, catalogs, and credit cards went through similar cycles). During the initial growth phases, understanding customers may not be a worthwhile investment; an additional cell tower, switch, or whatever may provide a better return. Eventually, though, the business and the customer base grow to a point where understanding the customers takes on increased importance. In our experience, it is better for companies to start early along the path of customer insight, rather than waiting until the need becomes critical.
Outsourcing Ongoing Data Mining
Even when a company has recognized the need for data mining, there is still the possibility of outsourcing. This is particularly true when the company is built around customer acquisition. In the United States, credit bureaus and household data suppliers are happy to provide modeling as a value-added service with the data they sell. There are also direct marketing companies that handle everything from mailing lists to fulfillment (the actual delivery of products to customers). These companies often offer outsourced data mining.
Outsourcing arrangements have financial advantages for companies. The problem is that customer insight is being outsourced as well. A company that relies on outsourcing customer analytics runs the risk that customer understanding will be lost between the company and the vendor.
For instance, one company used direct mail for a significant proportion of its customer acquisition and outsourced the direct mail response modeling work to the mailing list vendors. Over the course of about 2 years, there were several direct mail managers in the company and the emphasis on this channel decreased. What no one had realized was that direct mail was driving acquisition that was being credited to other channels. Direct mail pieces could be filled in and returned by mail, in which case the new acquisition was credited to direct mail. However, the pieces also contained the company’s URL and a free phone number. Many prospects who received the direct mail found it more convenient to respond by phone or on the Web, often forgetting to provide the special code identifying them as direct mail prospects. Over time, the response attributed to direct mail decreased, and consequently the budget for direct mail decreased as well. Only later, when decreased direct mail led to decreased responses in other channels, did the company realize that ignoring this echo effect had caused them to make a less-than-optimal business decision.
Insourcing Data Mining
The modeling process creates more than models and scores; it also produces insights. These insights often come during the process of data exploration and data preparation that is an important part of the data mining process. For that reason, we feel that any company with ongoing data mining needs should develop an in-house data mining group to keep the learning in the company.
Building an Interdisciplinary Data Mining Group
Once the decision has been made to bring customer understanding in-house, the question is where. In some companies, the data mining group has no permanent home. It consists of a group of people seconded from their usual jobs to come together to perform data mining. By its nature, such an arrangement seems temporary, and often it is the result of some urgent requirement such as the need to understand a sudden upsurge in customer defaults. While it lasts, such a group can be very effective, but it is unlikely to last very long because the members will be recalled to their regular duties as soon as a new task requires their attention.
Building a Data Mining Group in IT
A possible home is in the systems group, since this group is often responsible for housing customer data and for running customer-facing operational systems. Because the data mining group is technical and needs access to data and powerful software and servers, the IT group seems like a natural location. In fact, analysis can be seen as an extension of providing databases and access tools and maintaining such systems.
Being part of IT has the advantage that the data mining group has access to hardware and data as needed, since the IT group controls these technical resources. In addition, the IT group is a service organization with clients in many business units. In fact, the business units that are the “customers” for data mining are probably already used to relying on IT for data and reporting.
On the other hand, IT is sometimes a bit removed from the business problems that motivate customer analytics. Since even slight misunderstandings of the business problems can lead to useless results, it is important that people from the business units be closely involved with any IT-based data mining projects.
Building a Data Mining Group in the Business Units
The alternative to putting the data mining group where the data and computers are is to put it close to the problems being addressed. That generally means the marketing group, the customer relationship management group (where such a thing exists), or the finance group. Sometimes there are several small data mining groups, one in each of several business units: a group in finance building credit risk models and collections models, one in marketing building response models, and one in CRM building cross-sell models and voluntary churn models.
The advantages and disadvantages of this approach are the inverse of those for putting data mining in IT. The business units have a great understanding of their own business problems, but may still have to rely on IT for data and computing resources. Although either approach can be successful, on balance we prefer to see data mining centered in the business units.
What to Look for in Data Mining Staff
The best data mining groups are often eclectic mixes of people. Because data mining has not existed very long as a separately named activity, there are few people who can claim to be trained data miners. There are data miners who used to be physicists, data miners who used to be geologists, data miners who used to be computer scientists, data miners who used to be marketing managers, data miners who used to be linguists, and data miners who are still statisticians.
This makes lunchtime conversation in a data mining group fairly interesting, but it doesn’t offer much guidance for hiring managers. The things that make good data miners better than mediocre ones are hard to teach and impossible to automate: good intuition, a feel for how to coax information out of data, and a natural curiosity.
No one individual is likely to have all the skills required for completing a data mining project. Among them, the team members should cover the following:
■■ Database skills (SQL, if the data is stored in relational databases)
■■ Data transformation and programming skills (SAS, SPSS, S-Plus, PERL,
A new data mining group should include someone who has done commercial data mining before, preferably in the same industry. If necessary, this expertise can be provided by outside consultants.
Data Mining Infrastructure
In companies where data mining is merely an exploratory activity, useful data mining can be accomplished with little infrastructure. A desktop workstation with some data mining software and access to the corporate databases is likely to be sufficient. However, when data mining is central to the business, the data mining infrastructure must be considerably more robust. In these companies, updating customer profiles with new model scores, either on a regular schedule such as once a month or, in some cases, with each new transaction, is part of the regular production process of the data warehouse. The data mining infrastructure must provide a bridge between the exploratory world where models are developed and the production world where models are scored and marketing campaigns run.
A production-ready data mining environment must be able to support the following:
■■ The ability to access data from many sources and bring the data together as customer signatures in a data mining model set
■■ The ability to score customers using already created models from the model library on demand
■■ The ability to manage hundreds of model scores over time
■■ The ability to manage scores or hundreds of models developed over time
■■ The ability to reconstruct a customer signature for any point in a customer’s tenure, such as immediately before a purchase or other interesting event
■■ The ability to track changes in model scores over time
■■ The ability to publish scores, rules, and other data mining results back to the data warehouse and to other applications that need them
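The requirement to reconstruct a customer signature for any point in a customer's tenure amounts to summarizing only the history that was known before that point. The sketch below illustrates the idea on an in-memory list of transactions; the record layout, the field names, and the function signature_as_of are assumptions made for this example.

```python
from datetime import date

def signature_as_of(transactions, customer_id, as_of):
    """Summarize one customer's history using only transactions dated before as_of."""
    history = [t for t in transactions
               if t["customer_id"] == customer_id and t["date"] < as_of]
    last = max((t["date"] for t in history), default=None)
    return {
        "customer_id": customer_id,
        "as_of": as_of,
        "n_purchases": len(history),
        "total_spend": sum(t["amount"] for t in history),
        # Recency is undefined for a customer with no prior purchases.
        "days_since_last": (as_of - last).days if last is not None else None,
    }

# Hypothetical transaction history.
transactions = [
    {"customer_id": 1, "date": date(2003, 1, 5), "amount": 40.0},
    {"customer_id": 1, "date": date(2003, 2, 10), "amount": 25.0},
    {"customer_id": 1, "date": date(2003, 3, 1), "amount": 60.0},
    {"customer_id": 2, "date": date(2003, 1, 20), "amount": 15.0},
]

# Signature immediately before the purchase made on March 1.
print(signature_as_of(transactions, 1, date(2003, 3, 1)))
```

Because the March 1 purchase is excluded from its own signature, a model trained on such rows never sees information from the event it is trying to predict.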
The data mining infrastructure is logically (and often physically) split into two pieces supporting two quite different activities: mining and scoring. Each task presents a different set of requirements.
The Mining Platform
The mining platform supports software for data manipulation along with data mining software embodying the data mining techniques described in this book, visualization and presentation software, and software to enable models to be published to the scoring environment.
Although we have already touched on a few integration issues, others to consider include:
■■ Where in the client/server hierarchy is the software to be installed?
■■ Will the data mining software require its own hardware platform? If so, will this introduce a new operating system into the mix?
■■ What software will have to be installed on users’ desktops in order to communicate with the package?
■■ What additional networking, SQL gateways, and middleware will be required?
■■ Does the data mining software provide good interfaces to reporting and graphics packages?
The purpose of the mining platform is to support exploration of the data, mining, and modeling. The system should be devised with these activities in mind, including the fact that such work requires much processing and computing power. The data mining software vendor should be able to provide specifications for a data mining platform adequate for the anticipated dataset sizes and expected usage patterns.
The Scoring Platform
The scoring platform is where models developed on the mining platform are applied to customer records to create scores used to determine future treatments. Often, the scoring platform is the customer database itself, which is likely to be a relational database running on a parallel hardware platform.
In order to score a record, the record must contain, or the scoring platform must be able to calculate, the same features that went into the model. These features used by the model are rarely in the raw form in which they occur in the data. Often, new features have been created by combining existing variables in various ways, such as taking the ratio of one to another and performing transformations such as binning, summing, and averaging. Whatever was done to calculate the features used when the model was created must now be done for every record to be scored. Since there may be hundreds of millions of transactional records, it matters how this is done. When the volume of data is large, so is the data processing challenge.
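One common way to guarantee that scoring repeats exactly what was done at modeling time is to route both through a single feature-derivation function. The sketch below illustrates this with a ratio, a bin, and an average; the field names, the logistic form of the model, and its coefficients are all hypothetical stand-ins for whatever was actually built on the mining platform.

```python
import math

def derive_features(raw):
    """Turn one raw customer record into the derived features the model expects."""
    calls = raw["calls"]
    bills = raw["bills"]
    return {
        # Ratio of one variable to another, guarded against division by zero.
        "minutes_per_call": raw["minutes"] / calls if calls else 0.0,
        # Binning: tenure in whole years, capped so that 3 means "3 or more".
        "tenure_bin": min(raw["tenure_months"] // 12, 3),
        # Averaging over the available billing history.
        "avg_monthly_bill": sum(bills) / len(bills) if bills else 0.0,
    }

# Hypothetical coefficients standing in for a published model.
MODEL = {
    "intercept": -1.0,
    "minutes_per_call": 0.2,
    "tenure_bin": -0.5,
    "avg_monthly_bill": 0.01,
}

def score(raw):
    """Apply the same transformations used in training, then the model itself."""
    features = derive_features(raw)
    z = MODEL["intercept"] + sum(MODEL[name] * value for name, value in features.items())
    return 1 / (1 + math.exp(-z))        # logistic link (an assumption of this sketch)

record = {"calls": 50, "minutes": 200, "tenure_months": 30, "bills": [40.0, 60.0, 50.0]}
print(derive_features(record))
print(round(score(record), 4))
```

At production volumes the same logic would more likely be expressed inside the database than row by row in application code, but the principle of one shared definition per feature is the same.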
Scoring is not complete until the scores reside on a customer database somewhere accessible to the software that will be used to select customers for inclusion in marketing campaigns. If Web log, call detail, or point-of-sale scanner data needed as a model input resides in flat files on one system, and the customer marketing database resides on another system, but the two are accurate as of different dates, this too can be a data processing challenge.
One Example of a Production Data Mining Architecture
Web retailing is an industry that has gone farther than most in routinely incorporating data mining and scoring into the operational environment. Many Web retailers update a customer’s profile with every transaction and use model scores to determine what to display and what to recommend. The architecture described here is from Blue Martini, a company that supplies software for mining-ready retail Web sites. The example it provides of how data mining can be made an integral part of a company’s operations is not restricted to Web retailing. Many companies could benefit from a similar architecture.
Architectural Overview
The Blue Martini architecture is designed to support the differing needs of marketers, merchandisers, and, not least, data miners. As shown in Figure 16.2, it has three modules for three different types of users. For merchandisers, this architecture supports multiple product hierarchies and tools for controlling collections and promotions. For marketers, there are tools for making controlled experiments to track the effectiveness of various messages and marketing rules. For data miners, there is integrated modeling software and relief from having to create customer signatures by hand from dozens of different Web server and application logs. The architecture is what Ralph Kimball and Richard Merz would call a data Webhouse, made up of several special-purpose data marts with different schemas, all using common field definitions and shared metadata.
Customers at a Web store interact with pages generated as needed from a database that includes product information and the page templates. The contents of the page are driven by rules. Some of these rules are business rules entered by managers. Others are generated automatically and then edited by professional merchandisers.
[Figure 16.2 Blue Martini provides a good example of an IT architecture for data mining–driven Web retailing. Recoverable diagram labels: Business Data Definition Module; Product Hierarchies; Promotions; Customer Interaction; Web Server with logs; Application Server; OLTP Database for Customer Interaction; Signatures; Model Scores; OLAP; Database for Reporting.]
Generating pages from a database has many advantages. First, it makes it possible to enforce a consistent look and feel across the Web site. Such standard interfaces help customers navigate through the site. Using a database also makes it possible to make global changes quickly, such as updating prices for a sale. Another feature is the ability to store templates in different languages and currencies, so the site can be customized for users in different countries. From the data mining perspective, a major advantage is that all customer interactions are logged in the database.
User interactions are managed through a collection of data marts. Reporting and mining are centered on a customer behavior data mart that includes information derived from the user interaction, product, and business-rule data marts. The complicated extract and transformation logic required to create customer signatures from transaction data is part of the system, a great simplification for anyone who has ever tried massaging Web logs to get information about customers.
Customer Interaction Module
This architecture includes the databases and software needed to support merchandising, customer interaction, reporting, and mining as well as customer-centric marketing in the form of personalization. The Blue Martini system has three major modules, each with its own data mart. These repositories keep track of the following:
■■ Business rules
■■ Customer and visitor transactions
■■ Customer behavior
The customer behavior data mart, shown in Figure 16.2 as part of the analysis module, is fed by data from the customer interaction module, and it, in turn, supplies rules to both the business data definition module and the customer interaction module.
Merchandising information such as product hierarchies, assortments (families of products that are grouped together for merchandising purposes), and price lists is maintained in the business rules data mart, as is content information such as Web page templates, images, sounds, and video clips. Business rules include personalization rules for greeting named customers, promotion rules, cross-sell rules, and so on. Much of the data mining effort for a retail site goes into generating these rules.
The customer interaction module is the part of the system that touches customers directly by processing all the customer transactions. It is responsible for maintaining users’ sessions and context. This module implements the actual Web store and collects any data that may be wanted for later analysis. The customer transaction data mart logs business events such as the following:
■■ Customer adds an item to the basket
■■ Customer initiates check-out process
■■ Customer completes check-out process
■■ Cross-sell rule is triggered, and recommendation is made
■■ Recommended link is followed
The customer interaction module supports marketing experiments by implementing control groups and keeping track of multiple rules. It has detailed knowledge of the content it serves and can track many things that are not tracked in the Web server logs. The customer interaction module collects data that allows both products and customers to be tracked over time.
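The event logging and control groups just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Blue Martini implementation: the `EventLog` class, the event names, the `in_control_group` hashing scheme, and the experiment label are all hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class EventLog:
    """Hypothetical stand-in for the customer transaction data mart."""
    events: list = field(default_factory=list)

    def record(self, customer_id, event_type, **details):
        # Each business event becomes one row with a timestamp.
        self.events.append({
            "customer_id": customer_id,
            "event_type": event_type,
            "at": datetime.now(timezone.utc),
            **details,
        })


def in_control_group(customer_id, experiment, control_share=0.1):
    """Deterministically place a share of customers in the control group.

    Hashing the experiment name with the customer id keeps the assignment
    stable across sessions without storing it anywhere (an assumed design).
    """
    digest = hashlib.sha256(f"{experiment}:{customer_id}".encode()).hexdigest()
    return int(digest[:8], 16) / 0x100000000 < control_share


def maybe_recommend(log, customer_id, basket_item, cross_sell_rules,
                    experiment="cross-sell-v1", control_share=0.1):
    """Log the basket event, then apply a cross-sell rule unless withheld."""
    log.record(customer_id, "add_to_basket", item=basket_item)
    recommended = cross_sell_rules.get(basket_item)
    if recommended is None:
        return None
    if in_control_group(customer_id, experiment, control_share):
        # Control customers see no recommendation, but the withheld rule is
        # logged so the two groups can be compared later.
        log.record(customer_id, "cross_sell_withheld",
                   rule=basket_item, experiment=experiment)
        return None
    log.record(customer_id, "cross_sell_triggered", rule=basket_item,
               recommended=recommended, experiment=experiment)
    return recommended
```

Logging the withheld recommendation as well as the triggered one is what makes the experiment measurable: purchase rates for the recommended item can be compared between customers who saw the rule fire and customers who would have, but did not.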
Analysis Module
The database that supports the customer interaction module, like most online transaction processing systems, is a relational database designed to support quick transaction processing. Data destined for the analysis module must be extracted and transformed into structures suitable for mining and reporting. Data mining requires flat signature tables with one row per customer or item to be studied. This means transformations that flatten product hierarchies so that, for example, the same transaction might generate one flag indicating that the customer bought French wine, another that he or she bought a wine from the Burgundy region, and a third indicating that the wine was from the Beaujolais district in Burgundy. Other data must be rolled up from order files, billing files, and session logs that contain multiple transactions per customer. Typical values derived this way include total spending by category, average order amount, difference between this customer’s average order and the mean average order, and the number of days since the customer last made a purchase.
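As a rough illustration of these transformations, the following Python sketch flattens a small product hierarchy into flags and rolls order lines up into one row per customer. The product names, order data, and column names are invented for the example, and a production system would do this work in the extract and transformation layer rather than in memory.

```python
from collections import defaultdict
from datetime import date

# Hypothetical product hierarchy: each product maps to its path from the
# broadest category down to the most specific level.
PRODUCT_PATHS = {
    "beaujolais-villages": ["French wine", "Burgundy", "Beaujolais"],
    "chablis": ["French wine", "Burgundy"],
    "rioja-reserva": ["Spanish wine", "Rioja"],
}

# Hypothetical order lines: (customer, order, product, amount, order date).
ORDER_LINES = [
    ("c1", "o1", "beaujolais-villages", 30.0, date(2004, 1, 5)),
    ("c1", "o2", "chablis", 50.0, date(2004, 2, 1)),
    ("c2", "o3", "rioja-reserva", 20.0, date(2004, 1, 20)),
]


def build_signatures(order_lines, product_paths, as_of):
    """Return a dict mapping each customer id to one flat signature row."""
    order_totals = defaultdict(lambda: defaultdict(float))  # customer -> order -> amount
    spend = defaultdict(lambda: defaultdict(float))         # customer -> category -> amount
    levels_bought = defaultdict(set)
    last_purchase = {}
    for cust, order, product, amount, day in order_lines:
        path = product_paths[product]
        order_totals[cust][order] += amount
        spend[cust][path[0]] += amount
        levels_bought[cust].update(path)  # one flag per hierarchy level
        last_purchase[cust] = max(day, last_purchase.get(cust, day))

    avg_order = {c: sum(o.values()) / len(o) for c, o in order_totals.items()}
    mean_avg_order = sum(avg_order.values()) / len(avg_order)

    # Every row gets the same columns, so the result is a flat table.
    all_levels = sorted({lvl for p in product_paths.values() for lvl in p})
    top_categories = sorted({p[0] for p in product_paths.values()})

    signatures = {}
    for cust in order_totals:
        row = {
            "avg_order": avg_order[cust],
            "avg_order_vs_mean": avg_order[cust] - mean_avg_order,
            "days_since_last_purchase": (as_of - last_purchase[cust]).days,
        }
        for level in all_levels:
            row[f"bought_{level}"] = int(level in levels_bought[cust])
        for category in top_categories:
            row[f"spend_{category}"] = spend[cust].get(category, 0.0)
        signatures[cust] = row
    return signatures


signatures = build_signatures(ORDER_LINES, PRODUCT_PATHS, as_of=date(2004, 3, 1))
```

Note that a single purchase of the Beaujolais product sets three flags for customer `c1`, one for each level of the hierarchy, which is the flattening described above.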
Reporting is done from a multidimensional database that allows retrospective queries at various levels. Data mining and OLAP are both part of the analysis module, although they answer different kinds of questions. OLAP queries are used to answer questions such as these:
Data mining is used to answer more complicated questions such as these:
■■ What are the characteristics of heavy spenders? Does this user fit the profile?
■■ What promotion should be offered to this customer?
■■ What is the likelihood that this customer will return within 1 month?
■■ What customers should we worry about because they haven’t visited the site recently?
■■ Which products are associated with customers who spend the most money?
■■ Which products are driving sales of which other products?
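To give a flavor of how the last question might be approached, here is a minimal sketch that counts how often pairs of products appear in the same basket and turns the counts into simple association rules. The baskets and the `min_support` threshold are invented for illustration, and co-occurrence by itself does not establish that one product drives sales of another.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=2):
    """Return (antecedent, consequent, confidence) rules, best first.

    Confidence is the share of baskets containing the antecedent that
    also contain the consequent.
    """
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore repeats within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), together in pair_counts.items():
        if together < min_support:
            continue  # too rare to be worth reporting
        rules.append((a, b, together / item_counts[a]))
        rules.append((b, a, together / item_counts[b]))
    return sorted(rules, key=lambda r: (-r[2], r[0], r[1]))


# Hypothetical baskets for illustration only.
BASKETS = [
    ["wine", "cheese"],
    ["wine", "cheese", "bread"],
    ["bread", "butter"],
    ["wine", "bread"],
]
rules = pair_rules(BASKETS)
```

In this toy data, every basket with cheese also contains wine, so the rule from cheese to wine has the highest confidence, while the reverse rule is weaker because wine also sells without cheese.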
In Figure 16.2, the arrow labeled “build data warehouse” connects the customer interaction module to the analysis module and represents all the transformations that must occur before either data mining or reporting can be done properly. Two more arrows, labeled “deploy results,” show the output of the analysis module being shipped back to the business data definition and customer interaction modules. Yet another arrow, labeled “stage data,” shows how the business rules embedded in the business data definition module feed into the customer interaction module.
What is appealing about this architecture is the way that it facilitates the virtuous cycle of data mining by allowing new knowledge discovered through data mining to be fed directly to the systems that interact with customers.
Data Mining Software
One of the ways that the data mining world has changed most since the first edition of this book came out is the maturity of data mining software products. Robustness, usability, and scalability have all improved significantly. The one thing that may have decreased is the number of data mining software vendors, as tiny boutique software firms have been pushed aside by larger, more established companies. As stated in the first edition, it is not reasonable to compare the merits of particular products in a book intended to remain useful beyond the shelf-life of the current versions of these products. Although the products are changing, and hopefully improving, over time, the criteria for evaluating them have not changed: price, availability, scalability, support, vendor relationships, compatibility, and ease of integration all factor into the selection process.
Range of Techniques
As must be clear by now, there is no single data mining technique that is applicable in all situations. Neural networks, decision trees, market basket analysis, statistics, survival analysis, genetic algorithms, memory-based reasoning, link analysis, and automatic cluster detection all have a place. As shown in the case studies, it is not uncommon for two or more of these techniques to be applied in combination to achieve results beyond the reach of any single method.
Be sure that the software selected is powerful enough to support the organization’s data and goals. It is a good idea to have software a bit more advanced than the analysts’ abilities, so people can try out new things that they might not otherwise think of trying. Having multiple techniques available in a single set of tools is useful, because it makes it easier to combine and compare different techniques. At the same time, having several different products makes sense for a larger group, since different products have different strengths, even when they support the same underlying functionality. Some are better at presenting results; some are better at developing scores; some are more intuitive for novice users.
Assess the range of data mining tasks to be addressed and decide which data mining techniques will be most valuable. If you have a single application in mind, or a family of closely related applications, then it is likely that you