B
back propagation, feed-forward
future customer behaviors,
BILL_MASTER file, customer
Business Modeling and Data Mining
businesses
  challenges of, identifying, 23–24
  customer relationship
  virtuous cycle, 27–28
  wireless communication industries,
C
139–141 results, comparing, using confidence bounds, 141–143
CART (Classification and Regression Trees) algorithm, decision trees, 185, 188–189
Central Limit Theorem, statistics,
champion-challenger approach, marketing campaigns, 139
change processes, feedback, 34
charts
CHIDIST function, 152
child nodes, classification, 167
children, number of, household level data, 96
chi-square tests
  case study, 155–158
  CHAID (Chi-square Automatic Interaction Detector), 182–183
  CHIDIST function, 152
  degrees of freedom values, 152–153
  difference of proportions versus, 153–154
  discussed, 149
  expected values, calculating, 150–151
  splits, decision trees, 180–183
churn
  as binary outcome, 119
  customer longevity, predicting,
class labels, probability, 85
classification
  accuracy, 79
  binary decision trees, 168
Classification and Regression Trees (CART) algorithm, decision trees, 185, 188–189
classification codes
  discussed, 266
  precision measurements, 273–274
  recall measurements, 273–274
clustering
  automatic cluster detection
    agglomerative clustering, 368–370
    case study, 374–378
    categorical variables, 359
    centroid distance, 369
    complete linkage, 369
    data preparation, 363–365
    dimension, 352
    directed clustering, 372
    discussed, 12, 91, 351
    distance and similarity, 359–363
    divisive clustering, 371–372
    evaluation, 372–373
    Gaussian mixture model, 366–367
    geometric distance, 360–361
    hard clustering, 367
    Hertzsprung-Russell diagram, 352–354
    luminosity, 351
    scaling, 363–364
    single linkage, 369
    soft clustering, 367
    SOM (self-organizing map), 372
    vectors, angles between, 361–362
    weighting, 363–365
    zone boundaries, adjusting, 380
comparing models, using lift ratio,
continuous variables
  data preparation, 235–237
  neural networks, 235–237
  statistics, 137–138
75–76
coverage of values, neural networks, 232–233
Cox proportional hazards, 410–411
D
data marts, 485, 491–492
Data Preparation for Data Mining
deep intimacy, customer relationships,
data selection, 62–63
E
F
F tests (Ronald A Fisher), 183–184
fax machines, link analysis, 337–341
Federal Express, transaction processing systems, 3–4
EBCF (existing base churn
former customers, customer
G
genetic algorithms
  case study, 440–443
  crossover, 430
  data representation, 432–433
  genome, 424
  implicit parallelism, 438
  maximum values, of simple functions, 424
  mutation, 431–432
  neural networks and, 439–440
  optimization, 422
  overview, 421–422
  resource optimization, 433–435
  response modeling, 440–443
  schemata, 434, 436–438
  selection step, 429
  statistical regression techniques, 423
Genetic Algorithms in Search, Optimization, and Machine Learning (Goldberg), 445
geographic attributes, market basket analysis, 293
geographic information system (GIS), 536
geographical resources, 555–556
geometric distance, automatic cluster detection, 360–361
gigabytes, 5
Gini, Corrado (Gini splitting criterion, decision trees), 178
GIS (geographic information system), 536
goals, formulating, 605–606
Goldberg (Genetic Algorithms in Search, Optimization, and Machine Learning), 445
good customers, holding on to, 17–18
good prospects, identifying, 88–89
Goodman, Marc (projective visualization), 206–208
graphical user interface (GUI), 535
graphs
  data as, 337
  directed, 330
  edges, 322
  graph-coloring algorithm, 340–341
  Hamiltonian path, 328
  linkage, 77
  nodes, 322
  planar, 323
  traveling salesman problem, 327–329
  vertices, 322
grouping. See clustering
GUI (graphical user interface), 535
H
Hamiltonian path, graph theory, 328
hard clustering, automatic cluster detection, 367
hazards
  bathtub, 397–398
  censoring, 399–403
  constant, 397, 416–417
  probabilities, 394–396
  proportional
Hertzsprung-Russell diagram, automatic cluster detection, 352–354
hidden distance fields, distance function, 278
hidden layer, feed-forward neural networks, 221, 227
hierarchical categories, products, 305
histograms
historical data
  customer behaviors, 5
  documentation as, 61
I
MBR (memory-based reasoning),
125–126
IBM, relational database management
identified versus anonymous
297–298
chains as, 15–16
information gain, entropy, 178–180
information technology, data
16–17
Inmon, Bill (Building the Data
K
L
M
marketing campaigns See also
139–141 results, comparing, using confidence bounds, 141–143
235–237 coverage of values, 232–233 data preparation
categorical values, 239–240
O
Occam’s Razor, 124–125
ODBC (Open Database
Open Database Connectivity
433–435
P
parallel coordinates, neural
projective visualization (Marc Goodman), 206–208
proportion
  converting counts to, 75–76
  difference of proportion
    chi-square tests versus, 153–154
139–141
proportional scoring, census data, 94–95
gathering, 109–110
people most influenced by, 106–107
messages, selecting appropriate,
Business Modeling and Data Mining
Data Preparation for Data Mining
recall measurements, classification codes, 273–274
recency, frequency, and monetary (RFM) value, 575
recommendation-based businesses, 16–17
relational database management
495–496
resources
  geographical, 555–556
  optimization, genetic algorithms, 433–435
S
SAC (Simplifying Assumptions
tool, 167–168
scalability, data mining, 533–534
scaling, automatic cluster detection,
436–438
527–528
SQL data, time series analysis, 572–573
stability-based pruning, decision trees, 191–192
staffing, data mining, 525–526
standard deviation
standard error of proportion,
case study, 155–158
degrees of freedom values, chi-square tests, 152–153
difference of proportions versus,
statistical regression techniques,
for, 315–316
strings, fixed-length characters, 552–554
subgroups
subscription-based relationships, cus
survival analysis
  attrition, handling different types of,
symmetric multiprocessor (SMP), 489–490
T
tables, lookup, auxiliary information,
good prospects, identifying, 88–89
acuity of, statistical analysis, 147–148
150–151
125–126
time attributes, market basket
transaction data, OLAP, 476–477
transaction processing systems, customer relationship management, 3–4
truncated mean lifetime value,
variables data selection, 63–64 variable selection problems, neural