John wiley sons data mining techniques for marketing sales_20 pptx

B

back propagation, feed-forward

future customer behaviors,

BILL_MASTER file, customer

Business Modeling and Data Mining

businesses
  challenges of, identifying, 23–24
  customer relationship

virtuous cycle, 27–28
wireless communication industries,

C

139–141
results, comparing, using confidence bounds, 141–143

CART (Classification and Regression Trees) algorithm, decision trees, 185, 188–189

Central Limit Theorem, statistics,

champion-challenger approach, marketing campaigns, 139
change processes, feedback, 34
charts

CHIDIST function, 152
child nodes, classification, 167
children, number of, household level data, 96

chi-square tests
  case study, 155–158
  CHAID (Chi-square Automatic Interaction Detector), 182–183
  CHIDIST function, 152
  degrees of freedom values, 152–153
  difference of proportions versus, 153–154
  discussed, 149
  expected values, calculating, 150–151
  splits, decision trees, 180–183

churn
  as binary outcome, 119
  customer longevity, predicting,

class labels, probability, 85
classification
  accuracy, 79
  binary decision trees, 168

Classification and Regression Trees (CART) algorithm, decision trees, 185, 188–189
classification codes
  discussed, 266
  precision measurements, 273–274
  recall measurements, 273–274
clustering

  automatic cluster detection
    agglomerative clustering, 368–370
    case study, 374–378
    categorical variables, 359
    centroid distance, 369
    complete linkage, 369
    data preparation, 363–365
    dimension, 352
    directed clustering, 372
    discussed, 12, 91, 351
    distance and similarity, 359–363
    divisive clustering, 371–372
    evaluation, 372–373
    Gaussian mixture model, 366–367
    geometric distance, 360–361
    hard clustering, 367
    Hertzsprung-Russell diagram, 352–354
    luminosity, 351
    scaling, 363–364
    single linkage, 369
    soft clustering, 367
    SOM (self-organizing map), 372
    vectors, angles between, 361–362
    weighting, 363–365
    zone boundaries, adjusting, 380

comparing models, using lift ratio,

continuous variables
  data preparation, 235–237
  neural networks, 235–237
  statistics, 137–138

75–76
coverage of values, neural networks, 232–233

Cox proportional hazards, 410–411

D

data marts, 485, 491–492

Data Preparation for Data Mining

deep intimacy, customer relationships,

data selection, 62–63

E

F

F tests (Ronald A. Fisher), 183–184
fax machines, link analysis, 337–341
Federal Express, transaction processing systems, 3–4

EBCF (existing base churn

former customers, customer

G

genetic algorithms
  case study, 440–443
  crossover, 430
  data representation, 432–433
  genome, 424
  implicit parallelism, 438
  maximum values, of simple functions, 424
  mutation, 431–432
  neural networks and, 439–440
  optimization, 422
  overview, 421–422
  resource optimization, 433–435
  response modeling, 440–443
  schemata, 434, 436–438
  selection step, 429
  statistical regression techniques, 423

Genetic Algorithms in Search, Optimization, and Machine Learning (Goldberg), 445
geographic attributes, market basket analysis, 293

geographic information system (GIS), 536

geographical resources, 555–556
geometric distance, automatic cluster detection, 360–361
gigabytes, 5
Gini, Corrado (Gini splitting criterion, decision trees), 178

GIS (geographic information system), 536

goals, formulating, 605–606

Goldberg (Genetic Algorithms in Search, Optimization, and Machine Learning), 445

good customers, holding on to, 17–18
good prospects, identifying, 88–89
Goodman, Marc (projective visualization), 206–208
graphical user interface (GUI), 535
graphs
  data as, 337
  directed, 330
  edges, 322
  graph-coloring algorithm, 340–341
  Hamiltonian path, 328
  linkage, 77
  nodes, 322
  planar, 323
  traveling salesman problem, 327–329
  vertices, 322

grouping. See clustering

GUI (graphical user interface), 535

H

Hamiltonian path, graph theory, 328
hard clustering, automatic cluster detection, 367
hazards
  bathtub, 397–398
  censoring, 399–403
  constant, 397, 416–417
  probabilities, 394–396
  proportional

Hertzsprung-Russell diagram, automatic cluster detection, 352–354

hidden distance fields, distance function, 278

hidden layer, feed-forward neural networks, 221, 227

hierarchical categories, products, 305
histograms

historical data
  customer behaviors, 5
  documentation as, 61

I

MBR (memory-based reasoning),

125–126

IBM, relational database management

identified versus anonymous

297–298

chains as, 15–16
information gain, entropy, 178–180
information technology, data

16–17

Inmon, Bill (Building the Data

K

L

M

marketing campaigns See also

139–141
results, comparing, using confidence bounds, 141–143

235–237
coverage of values, 232–233
data preparation
  categorical values, 239–240

O

Occam’s Razor, 124–125
ODBC (Open Database

Open Database Connectivity

433–435

P

parallel coordinates, neural

projective visualization (Marc Goodman), 206–208

proportion
  converting counts to, 75–76
  difference of proportion
    chi-square tests versus, 153–154

139–141

proportional scoring, census data, 94–95

gathering, 109–110
people most influenced by, 106–107

messages, selecting appropriate,

Business Modeling and Data Mining
Data Preparation for Data Mining

recall measurements, classification codes, 273–274

recency, frequency, and monetary (RFM) value, 575

recommendation-based businesses, 16–17

relational database management

495–496
resources
  geographical, 555–556
  optimization, genetic algorithms, 433–435

S

SAC (Simplifying Assumptions

tool, 167–168
scalability, data mining, 533–534
scaling, automatic cluster detection,

436–438

527–528

SQL data, time series analysis, 572–573

stability-based pruning, decision trees, 191–192

staffing, data mining, 525–526
standard deviation

standard error of proportion,

case study, 155–158
degrees of freedom values, chi-square tests, 152–153

difference of proportions versus,

statistical regression techniques,

for, 315–316

strings, fixed-length characters, 552–554

subgroups

subscription-based relationships, cus

survival analysis
  attrition, handling different types of,

symmetric multiprocessor (SMP), 489–490

T

tables, lookup, auxiliary information,

good prospects, identifying, 88–89

acuity of, statistical analysis, 147–148

150–151

125–126

time attributes, market basket

transaction data, OLAP, 476–477
transaction processing systems, customer relationship management, 3–4

truncated mean lifetime value,

variables data selection, 63–64 variable selection problems, neural
