Extremes
Age of File
Univariate Procedure
Variable=AGE_FIL
Quantiles (Def=5)
Number of 30-Day Delinquencies
Univariate Procedure
Variable=NO30DAY
Quantiles (Def=5)
Replacing Missing Values for Income
The first frequency shows the distribution of HOME EQUITY. This is used to create the matrix that displays the mean INCOME by HOME EQUITY range and AGE Group.
Mean INCOME by HOME EQUITY and AGE Group
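One way a matrix of mean INCOME by HOME EQUITY range and AGE group can be used to replace missing values is to give each record with missing income the mean of its own cell. The sketch below illustrates that idea in Python rather than the book's SAS. The field names (equity_range, age_group, income), the range labels, and the sample records are all hypothetical, and this is not the author's code.

```python
from collections import defaultdict


def impute_income_by_cell(records):
    """Fill missing income with the mean income of the record's
    (equity_range, age_group) cell. Field names are hypothetical."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    # First pass: accumulate observed incomes per cell.
    for r in records:
        if r["income"] is not None:
            key = (r["equity_range"], r["age_group"])
            sums[key] += r["income"]
            counts[key] += 1
    cell_means = {key: sums[key] / counts[key] for key in sums}

    # Second pass: substitute the cell mean where income is missing.
    filled = []
    for r in records:
        income = r["income"]
        if income is None:
            # Stays None if the cell has no observed income at all.
            income = cell_means.get((r["equity_range"], r["age_group"]))
        filled.append({**r, "income": income})
    return cell_means, filled


# Hypothetical sample data, for illustration only.
records = [
    {"equity_range": "0-50K", "age_group": "25-34", "income": 40.0},
    {"equity_range": "0-50K", "age_group": "25-34", "income": 50.0},
    {"equity_range": "0-50K", "age_group": "25-34", "income": None},
    {"equity_range": "50-100K", "age_group": "35-44", "income": 70.0},
    {"equity_range": "50-100K", "age_group": "35-44", "income": None},
]
cell_means, filled = impute_income_by_cell(records)
for row in filled:
    print(row)
```

A cell with no observed income leaves the value missing, so a fallback such as the overall mean would be needed in practice.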
All variables left in the model are significant at the 0.1000 level.
OBS  _MODEL_  _TYPE_  _DEPVAR_  _RMSE_   INTERCEP  INFD_AG2
  1  INC_REG  PARMS   INC_EST2  46.4383  36.8761   0.11446

OBS  HOM_EQU2      CREDLIN2   TOT_BAL2      INC_EST2
  1  -.0000034348  .00011957  -.0000067044  -1
The following print output displays the values for INCOME after regression substitution.
72540 42.5443
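Regression substitution of this kind amounts to scoring each record with the fitted equation and using the prediction where INCOME is missing. The sketch below, in Python rather than the book's SAS, applies the parameter estimates printed above (INTERCEP, INFD_AG2, HOM_EQU2, CREDLIN2, TOT_BAL2). The helper names and the sample record are hypothetical, and the units of each variable are not stated in this output, so the input values are purely illustrative.

```python
# Parameter estimates copied from the printed regression output above.
INTERCEPT = 36.8761
COEFFICIENTS = {
    "INFD_AG2": 0.11446,
    "HOM_EQU2": -0.0000034348,
    "CREDLIN2": 0.00011957,
    "TOT_BAL2": -0.0000067044,
}


def predict_income(row):
    """Score one record with the fitted linear equation."""
    return INTERCEPT + sum(
        coef * row[name] for name, coef in COEFFICIENTS.items()
    )


def fill_income(row):
    """Return a copy of the record, substituting the regression
    prediction only when INC_EST2 is missing (None)."""
    if row.get("INC_EST2") is None:
        return {**row, "INC_EST2": predict_income(row)}
    return dict(row)


# Hypothetical record with missing income; values are illustrative only.
sample = {
    "INFD_AG2": 40,
    "HOM_EQU2": 0,
    "CREDLIN2": 0,
    "TOT_BAL2": 0,
    "INC_EST2": None,
}
filled_sample = fill_income(sample)
print(filled_sample["INC_EST2"])
```

Records that already carry an income value pass through unchanged, so only the missing cases receive a predicted value.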
APPENDIX B: UNIVARIATE ANALYSIS OF CATEGORICAL VARIABLES
In this appendix you will find simple frequencies of the categorical variables discussed in chapter 3.
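Each simple frequency reports, for every level of a variable, the frequency, the percent, the cumulative frequency, and the cumulative percent. As a rough illustration of how those four columns relate, the following Python sketch builds them from a list of values. It is not the book's SAS code; the function name and the sample values are hypothetical.

```python
from collections import Counter


def frequency_table(values):
    """Return rows of (level, frequency, percent,
    cumulative frequency, cumulative percent), sorted by level."""
    counts = Counter(values)
    total = sum(counts.values())
    rows = []
    cumulative = 0
    for level in sorted(counts):
        freq = counts[level]
        cumulative += freq
        rows.append(
            (level, freq, 100 * freq / total,
             cumulative, 100 * cumulative / total)
        )
    return rows


# Hypothetical categorical values, for illustration only.
sample_values = ["A", "B", "A", "C", "A", "B", "A", "A", "C", "B"]
table = frequency_table(sample_values)
for level, freq, pct, cum_freq, cum_pct in table:
    print(f"{level} {freq:>6} {pct:>6.1f} {cum_freq:>6} {cum_pct:>6.1f}")
```

The last row of any such table always shows the total record count and 100.0 percent in the cumulative columns.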
Categorical Variables
FINL_ID  Frequency  Percent  Cumulative Frequency  Cumulative Percent
I 124038 17.0 417369 57.2
V 8311 1.1 729228 100.0
RECOMMENDED READING
Berry, Michael J. A., and Gordon Linoff. 1997. Data Mining Techniques. New York: John Wiley & Sons.
Berry, Michael J. A., and Gordon Linoff. 1997. Mastering Data Mining. New York: John Wiley & Sons.
Hosmer, David W., and Stanley Lemeshow. 1989. Applied Logistic Regression. New York: John Wiley & Sons.
Hughes, Arthur M. 1994. Strategic Database Marketing. Chicago: Probus Publishing.
Journal of Targeting, Measurement and Analysis for Marketing. London: Henry Stewart Publications.
Tukey, John W. 1977. Exploratory Data Analysis. Reading, MA: Addison-Wesley.
WHAT'S ON THE CD-ROM?
The CD-ROM contains step-by-step instructions for developing the data models described in Data Mining Cookbook. The contents are written in SAS code, and you can use them as a template to create your own models. The content on the CD-ROM is equivalent to taking a three-day course in data modeling.
Within chapters 3 through 12 of this book are blocks of SAS code used to develop, validate, and implement the data models. By adapting this code and using some common sense, it is possible to build a model from the data preparation phase through model development and validation. However, this could take a considerable amount of time and introduce the possibility of coding errors. To simplify this task and make the code easily accessible for a variety of model types, a companion CD-ROM is available for purchase separately.
The CD-ROM includes full examples of all the code necessary to develop a variety of models, including response, approval, attrition or churn, risk, and lifetime or net present value. Detailed code for developing the objective function includes examples from the credit card, insurance, telecommunications, and catalog industries. The code is well documented and explains the goals and methodology for each step. The only software needed is BASE SAS and SAS/STAT. The spreadsheets used for creating gains tables and lift charts are also included. These can be used by plugging in the preliminary results from the analyses created in SAS. While the steps before and after the model processing can be used in conjunction with any data modeling software package, the code can also serve as a stand-alone modeling template. The model processing steps focus on variable preparation for use in logistic regression. Additional efficiencies in the form of SAS macros for variable processing and validation are included.
Hardware Requirements
To use this CD-ROM, your system must meet the following requirements:
Platform/processor/operating system: Windows® 95, NT 4.0 or higher; 200 MHz Pentium
RAM: 64 MB minimum; 128 MB recommended
Hard drive space: Nothing is installed on the hard drive, but making a local copy of all the files on the CD-ROM requires 5 MB of free space.
Peripherals: CD-ROM drive
You will also need the following applications to make full use of the CD-ROM: a running copy of SAS 6.12 or higher to process the SAS code provided; a browser such as Internet Explorer or Netscape Navigator to navigate the CD-ROM; and Microsoft Excel 97/2000 or Microsoft Excel 5.0/95.
Installing the Software
Insert the CD-ROM and launch the readme.htm file in a web browser, or use Windows Explorer to browse the contents of the CD. The model programs and output are in text format and can be opened in any editing software (including SAS) that reads ASCII files. Spreadsheets are in Microsoft Excel 97/2000 and Microsoft Excel 5.0/95 formats. Launch the application (SAS 6.12 or higher) and open the file directly from the CD-ROM. If you wish to make changes, you can rename the files and save them to your local hard drive.
Using the Software
The CD is organized into folders that correspond to each chapter in Data Mining Cookbook. Within each folder are sub-folders containing SAS programs, SAS output, and Excel spreadsheets. The programs can be used as templates; you just need to change the data set and variable names. More specific instructions on how to use the programs are included. The output is included to provide a more complete understanding of the recipes in Data Mining Cookbook. The spreadsheets contain all formulas used to create the tables and charts in Data Mining Cookbook.