
Data Mining and Predictive Analytics, Larose and Larose (2015-03-16)



11.5 Decision Rules

11.6 Comparison of the C5.0 and CART Algorithms Applied to Real Data

The R Zone


R References

Exercises

Chapter 22: Measuring Cluster Goodness


32.2 Models that use Misclassification Costs


Part 4: Summarization And Visualization Of Bivariate Relationships

Index

End User License Agreement


Second Edition

DANIEL T. LAROSE

CHANTAL D. LAROSE


Published by John Wiley & Sons, Inc., Hoboken, New Jersey

Published simultaneously in Canada

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002. Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.


To those who have gone before us, And to those who come after us,

In the Family Tree of Life…


Preface


According to the research firm MarketsandMarkets, the global big data market is expected to grow by 26% per year from 2013 to 2018, from $14.87 billion in 2013 to $46.34 billion in 2018.¹ Corporations and institutions worldwide are learning to apply data mining and predictive analytics, in order to increase profits. Companies that do not apply these methods will be left behind in the global competition of the twenty-first-century economy.

Humans are inundated with data in most fields. Unfortunately, most of this valuable data, which cost firms millions to collect and collate, are languishing in warehouses and …

… managers and analysts who know how to operate companies by using insights from big data… We project that demand for deep analytical positions in a big data world could exceed the supply being produced on current trends by 140,000 to 190,000 positions … In addition, we project a need for 1.5 million additional managers and analysts in the United States who can ask the right questions and consume the results of the analysis of big data effectively.

This book is an attempt to help alleviate this critical shortage of data analysts.

Data mining is becoming more widespread every day, because it empowers companies to uncover profitable patterns and trends from their existing databases. Companies and institutions have spent millions of dollars to collect gigabytes and terabytes of data, but are not taking advantage of the valuable and actionable information hidden deep within their data repositories. However, as the practice of data mining becomes more widespread, companies that do not apply these techniques are in danger of falling behind, and losing market share, because their competitors are applying data mining, and thereby gaining the competitive edge.


In Data Mining and Predictive Analytics, the step-by-step hands-on solutions of real-world business problems using widely available data mining techniques applied to real-world data sets will appeal to managers, CIOs, CEOs, CFOs, data analysts, database analysts, and others who need to keep abreast of the latest methods for enhancing return on investment.

Using Data Mining and Predictive Analytics, you will learn what types of analysis will uncover the most profitable nuggets of knowledge from the data, while avoiding the potential pitfalls that may cost your company millions of dollars. You will learn data mining and predictive analytics by doing data mining and predictive analytics.


The growth of new off-the-shelf software platforms for performing data mining has kindled a new kind of danger. The ease with which these applications can manipulate data, combined with the power of the formidable data mining algorithms embedded in the black-box software, makes their misuse proportionally more hazardous.

In short, data mining is easy to do badly. A little knowledge is especially dangerous when it comes to applying powerful models based on huge data sets. For example, analyses carried out on unpreprocessed data can lead to erroneous conclusions, or inappropriate analysis may be applied to data sets that call for a completely different approach, or models may be derived that are built on wholly unwarranted specious assumptions. If deployed, these errors in analysis can lead to very expensive failures. Data Mining and Predictive Analytics will help make you a savvy analyst, who will avoid these costly pitfalls.


Understanding the Underlying Algorithmic and Model Structures

The best way to avoid costly errors stemming from a blind black-box approach to data mining and predictive analytics is to instead apply a “white-box” methodology, which emphasizes an understanding of the algorithmic and statistical model structures underlying the software.

Data Mining and Predictive Analytics applies this white-box approach by

clearly explaining why a particular method or algorithm is needed;

getting the reader acquainted with how a method or algorithm works, using a toy example (tiny data set), so that the reader may follow the logic step by step, and thus gain a white-box insight into the inner workings of the method or algorithm;

providing an application of the method to a large, real-world data set;

using exercises to test the reader's level of understanding of the concepts and algorithms;

providing an opportunity for the reader to experience doing some real data mining on large data sets.


Data Mining Methods and Models walks the reader through the operations and nuances of the various algorithms, using small data sets, so that the reader gets a true appreciation of what is really going on inside the algorithm. For example, in Chapter 21, we follow step by step as the balanced iterative reducing and clustering using hierarchies (BIRCH) algorithm works through a tiny data set, showing precisely how BIRCH chooses the optimal clustering solution for this data, from start to finish. As far as we know, such a demonstration is unique to this book for the BIRCH algorithm. Also, in Chapter 27, we proceed step by step to find the optimal solution using the selection, crossover, and mutation operators, using a tiny data set, …

Chapter Exercises: Checking to Make Sure You Understand It

Data Mining and Predictive Analytics includes over 750 chapter exercises, which allow readers to assess their depth of understanding of the material, as well as have a little fun playing with numbers and data. These include Clarifying the Concept exercises, which help to clarify some of the more challenging concepts in data mining, and Working with the Data exercises, which challenge the reader to apply the particular data mining algorithm to a small data set, and, step by step, to arrive at a computationally sound solution. For example, in …


Some readers may be a bit rusty on some statistical and graphical concepts, usually encountered in an introductory statistics course. Data Mining and Predictive Analytics contains an appendix that provides a review of the most common concepts and terminology helpful for readers to hit the ground running in their understanding of the material in this book.


Data Mining and Predictive Analytics culminates in a detailed Case Study. Here the reader has the opportunity to see how everything he or she has learned is brought all together to create actionable and profitable solutions. This detailed Case Study ranges over four chapters, and is as follows:

Chapter 29: Case Study, Part 1: Business Understanding, Data Preparation, and EDA

Chapter 30: Case Study, Part 2: Clustering and Principal Components Analysis

Chapter 31: Case Study, Part 3: Modeling and Evaluation for Performance and Interpretability

Chapter 32: Case Study, Part 4: Modeling and Evaluation for High Performance Only

The Case Study includes dozens of pages of graphical exploratory data analysis (EDA), predictive modeling, and customer profiling, and offers different solutions, depending on the requisites of the client. The models are evaluated using a custom-built data-driven cost-benefit table, reflecting the true costs of classification errors, rather than the usual methods such as overall error rate. Thus, the analyst can compare models using the estimated profit per customer contacted, and can predict how much money the models will earn, based on the number of customers contacted.
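To make the idea of evaluating models by profit rather than by error rate concrete, here is a minimal Python sketch. It is not taken from the book: the `Confusion` class, the function names, the dollar values in `PROFIT`, and both sets of confusion-matrix counts are hypothetical, invented purely for illustration. The sketch assumes that only customers predicted positive are contacted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts from a binary classifier's confusion matrix (hypothetical)."""
    tn: int  # true negatives: correctly not contacted
    fp: int  # false positives: contacted, but did not respond
    fn: int  # false negatives: would have responded, but not contacted
    tp: int  # true positives: contacted and responded


# Hypothetical cost-benefit table: profit in dollars per customer in each cell.
# These values are assumptions for illustration only.
PROFIT = {"tn": 0.0, "fp": -2.0, "fn": 0.0, "tp": 25.0}


def error_rate(cm: Confusion) -> float:
    """Overall error rate: misclassified records over all records."""
    total = cm.tn + cm.fp + cm.fn + cm.tp
    return (cm.fp + cm.fn) / total


def total_profit(cm: Confusion, profit=PROFIT) -> float:
    """Sum of (count x profit) over the four confusion-matrix cells."""
    return (cm.tn * profit["tn"] + cm.fp * profit["fp"]
            + cm.fn * profit["fn"] + cm.tp * profit["tp"])


def profit_per_contact(cm: Confusion, profit=PROFIT) -> float:
    """Estimated profit per customer contacted (predicted positive)."""
    contacted = cm.tp + cm.fp
    return total_profit(cm, profit) / contacted if contacted else 0.0


def projected_earnings(cm: Confusion, n_contacts: int, profit=PROFIT) -> float:
    """Scale the per-contact profit to a planned number of contacts."""
    return profit_per_contact(cm, profit) * n_contacts


# Two hypothetical models scored on the same 1,000 customers.
model_a = Confusion(tn=800, fp=100, fn=50, tp=50)
model_b = Confusion(tn=600, fp=300, fn=10, tp=90)

for name, cm in (("A", model_a), ("B", model_b)):
    print(f"model {name}: error={error_rate(cm):.2f} "
          f"total=${total_profit(cm):.2f} "
          f"per contact=${profit_per_contact(cm):.2f}")
```

Under these invented numbers, model B has the higher error rate (0.31 versus 0.15) yet the higher total profit, while model A earns more per customer contacted. Which model is preferable depends on the cost-benefit table and on how many customers can be contacted, which is why a ranking by error rate alone can mislead.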
