Data Science Algorithms in a Week
Data analysis, machine learning, and more
Dávid Natingga
Data Science Algorithms in a Week
Copyright © 2017 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: August 2017
Chandan Kumar

Content Development Editor
Mamata Walkar

Technical Editor
Naveenkumar Jain

Indexer
Pratik Shirodkar

Production Coordinator
Shantanu Zagade
About the Author
Dávid Natingga graduated in 2014 from Imperial College London with an MEng in Computing, specializing in Artificial Intelligence. In 2011, he worked at Infosys Labs in Bangalore, India, researching the optimization of machine learning algorithms. In 2012 and 2013, at Palantir Technologies in Palo Alto, USA, he developed algorithms for big data. In 2014, as a data scientist at Pact Coffee, London, UK, he created an algorithm suggesting products based on the taste preferences of customers and the structure of coffees. In 2017, he worked at TomTom in Amsterdam, Netherlands, processing map data for navigation platforms.

As a part of his journey to use pure mathematics to advance the field of AI, he is a PhD candidate in Computability Theory at the University of Leeds, UK. In 2016, he spent 8 months at the Japan Advanced Institute of Science and Technology, Japan, as a research visitor.

Dávid Natingga is married to his wife Rheslyn, and their first child will soon behold the outer world.
I would like to thank Packt Publishing for providing me with this opportunity to share my knowledge and experience in data science through this book. My gratitude belongs to my wife Rheslyn, who has been patient, loving, and supportive throughout the whole process of writing this book.
About the Reviewer
Surendra Pepakayala is a seasoned technology professional and entrepreneur with over 19 years of experience in the US and India. He has broad experience in building enterprise/web software products as a developer, architect, software engineering manager, and product manager at both start-ups and multinational companies in India and the US. He is a hands-on technologist/hacker with deep interest and expertise in Enterprise/Web Applications Development, Cloud Computing, Big Data, Data Science, Deep Learning, and Artificial Intelligence.

A technologist turned entrepreneur after 11 years in corporate US, Surendra founded an enterprise BI/DSS product company for school districts in the US. He subsequently sold the company and started a Cloud Computing, Big Data, and Data Science consulting practice to help start-ups and IT organizations streamline their development efforts and reduce the time to market of their products/solutions. Also, Surendra takes pride in using his considerable IT experience for reviving/turning around distressed products and projects.

He serves as an advisor to eTeki, an on-demand interviewing platform, where he leads the effort to recruit and retain world-class IT professionals for eTeki's interviewer panel. He has reviewed drafts, recommended changes, and formulated questions for various IT certifications such as CGEIT, CRISC, MSP, and TOGAF. His current focus is on applying Deep Learning to various stages of the recruiting process to help HR (staffing and corporate recruiters) find the best talent and reduce the friction involved in the hiring process.
For support files and downloads related to your book, please visit www.PacktPub.com. Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and, as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
https://www.packtpub.com/mapt
Get the most in-demand software skills with Mapt. Mapt gives you full access to all Packt books and video courses, as well as industry-leading tools to help you plan your personal development and advance your career.
Why subscribe?
Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser
Customer Feedback
Thanks for purchasing this Packt book. At Packt, quality is at the heart of our editorial process. To help us improve, please leave us an honest review on this book's Amazon page.
If you'd like to join our team of regular reviewers, you can e-mail us at customerreviews@packtpub.com. We award our regular reviewers with free eBooks and videos in exchange for their valuable feedback. Help us be relentless in improving our products!
Preface

Data science is a discipline at the intersection of machine learning, statistics, and data mining, with the objective of gaining new knowledge from existing data by means of algorithmic and statistical analysis. In this book, you will learn the seven most important ways in data science to analyze data. Each chapter first explains its algorithm or analysis as a simple concept, supported by a trivial example. Further examples and exercises are used to build and expand the knowledge of a particular analysis.
What this book covers
Chapter 1, Classification Using K Nearest Neighbors, classifies a data item based on the k most similar items.

Chapter 2, Naive Bayes, uses Bayes' theorem to compute the probability of a data item belonging to a certain class.

Chapter 3, Decision Trees, organizes your decision criteria into the branches of a tree and uses a decision tree to classify a data item into one of the classes at the leaf nodes.

Chapter 4, Random Forest, classifies a data item with an ensemble of decision trees to improve the accuracy of the algorithm by reducing the negative impact of the bias.

Chapter 5, Clustering into K Clusters, divides your data into k clusters to discover the patterns and similarities between the data items, and exploits these patterns to classify new data.

Chapter 6, Regression, models a phenomenon in your data with a function that can predict the values for the unknown data in a simple way.

Chapter 7, Time Series Analysis, unveils the trend and repeating patterns in time-dependent data to predict the future of the stock market, Bitcoin prices, and other time events.

Appendix A, Statistics, provides a summary of the statistical methods and tools useful to a data scientist.

Appendix B, R Reference, is a reference to the basic R language constructs, commands, and functions used throughout the book.

Appendix C, Python Reference, is a reference to the basic Python language constructs, commands, and functions used throughout the book.

Appendix D, Glossary of Algorithms and Methods in Data Science, provides a glossary of some of the most important and powerful algorithms and methods from the fields of data science and machine learning.
What you need for this book
Most importantly, you need an active attitude to think about the problems; a lot of new content is presented in the exercises. You also need to be able to run Python and R programs under the operating system of your choice. The author ran the programs under the Linux operating system using the command line.
Who this book is for
This book is for aspiring data science professionals who are familiar with Python and R and have some statistics background. Developers who are currently implementing one or two data science algorithms and now want to learn more to expand their skills will find this book quite useful.
Conventions
In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "For the visualization depicted earlier in this chapter, the matplotlib library was used."
A block of code is set as follows:
import sys
sys.path.append('..')
sys.path.append('../../common')
import knn # noqa
import common # noqa
Any command-line input or output is written as follows:
$ python knn_to_data.py mary_and_temperature_preferences.data
mary_and_temperature_preferences_completed.data 1 5 30 0 10
New terms and important words are shown in bold. Words that you see on the screen, for
example, in menus or dialog boxes, appear in the text like this: "In order to download new
modules, we will go to Files | Settings | Project Name | Project Interpreter."
Trang 15+* 0 %10%*#c0+cc++' Mc/!!c+ 1 c10$+ c#1% !c0cwww.packtpub.com/authorsL
+3 c0$0c5+1c.!c0$!c, +1 c+3 * ! c+"cc'0c++' Mc3!c$2!cc*1) ! c+"c0$%*#/c0+c$!( , c5+10+c#!0c0$!c) +/0c" +) c5+1 c, 1 $/!L
+1c* c +3 * (+ c0$!c!4 ) , (!c+ !c"%(!/ c"+ c0$%/c++' c" +) c5+1 c+1* 0c0chttp:// www.packtpub.comLc"c5+1c, 1 $/! c0$%/c++' c!(/!3 $! !Mc5+1c* c2%/%0chttp://www packtpub.com/supportc* c.!# %/0! c0+c$2!c0$!c"%(!/ c!b) %(! c %.!0(5c0+c5+1Lc+1c* +3 * (+ c0$!c+ !c"%(!/ c5c"+((+3 %*#c0$!/ !c/0!, /N
+#c%* c+ c.!# %/0! c0+c+1 c3 ! /%0!c1/%*#c5+1 c!b) %(c !/ /c* c, //3 + L9L
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR / 7-Zip for Windows
Zipeg / iZip / UnRarX for Mac
7-Zip / PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Data-Science-Algorithms-in-a-Week. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
Downloading the color images of this book
We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output.
Errata

If you find a mistake in one of our books, we would be grateful if you could report it to us by selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title. To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.
If you have a problem with any aspect of this book, you can contact us and we will do our best to address it.
Classification Using K Nearest Neighbors
The nearest neighbor algorithm classifies a data instance based on its neighbors. The class of a data instance determined by the k-nearest neighbor algorithm is the class with the highest representation among the k closest neighbors.
In this chapter, we will cover the basics of the k-NN algorithm - understanding it and its implementation with a simple example: Mary and her temperature preferences. On the example map of Italy, you will learn how to choose the correct value of k so that the algorithm can perform correctly and with the highest accuracy. You will learn how to rescale the values and prepare them for the k-NN algorithm with the example of house preferences. In the example of text classification, you will learn how to choose a good metric to measure the distances between the data points, and also how to eliminate the irrelevant dimensions in higher-dimensional space to ensure that the algorithm performs accurately.
Mary and her temperature preferences
As an example, if we know that our friend Mary feels cold when it is 10 degrees Celsius, but warm when it is 25 degrees Celsius, then in a room where it is 22 degrees Celsius, the nearest neighbor algorithm would guess that our friend would feel warm, because 22 is closer to 25 than to 10.
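To make the idea concrete, the following is a minimal sketch of 1-NN with a single temperature feature. It is an illustration only (the known data points and the classify_1nn helper are made up for this example), not the implementation used later in this chapter:

# A minimal 1-NN sketch with one feature: pick the class of the closest
# known temperature.
known = [(10, 'cold'), (25, 'warm')]  # (temperature in degrees Celsius, class)

def classify_1nn(temperature):
    # Choose the known example with the smallest absolute difference.
    closest = min(known, key=lambda item: abs(item[0] - temperature))
    return closest[1]

print(classify_1nn(22))  # prints 'warm', since 22 is closer to 25 than to 10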
Suppose we would like to know when Mary feels warm and when she feels cold, as in the previous example, but in addition, wind speed data is also available from the times when Mary was asked if she felt warm or cold:
Temperature in degrees Celsius Wind speed in km/h Mary's perception
Now, suppose we would like to find out how Mary feels at a temperature of 16 degrees Celsius with a wind speed of 3 km/h using the 1-NN algorithm:
For simplicity, we will use the Manhattan metric to measure the distance between the neighbors on the grid. The Manhattan distance d_Man of a neighbor N1 = (x1, y1) from the neighbor N2 = (x2, y2) is defined as d_Man = |x1 - x2| + |y1 - y2|.
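As a quick illustration, the Manhattan distance can be computed with a small helper function; this is a sketch for this chapter's 2D grid, not the library code used later in the chapter:

# Manhattan distance between two 2D points n1 = (x1, y1) and n2 = (x2, y2).
def manhattan_distance(n1, n2):
    return abs(n1[0] - n2[0]) + abs(n1[1] - n2[1])

print(manhattan_distance((16, 3), (15, 0)))  # |16 - 15| + |3 - 0| = 4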
By applying this procedure to every data point, we can complete the graph as follows:
Note that sometimes a data point can be at the same distance from two known classes: for example, 20 degrees Celsius and 6 km/h. In such situations, we could prefer one class over the other or ignore these boundary cases. The actual result depends on the specific implementation of the algorithm.
Implementation of the k-nearest neighbors algorithm
We implement the k-NN algorithm in Python to find Mary's temperature preference. At the end of this section, we also implement the visualization of the data produced in the example Mary and her temperature preferences by the k-NN algorithm. The full compilable code, with the input files, can be found in the source code provided with this book. The most important parts are extracted here:
# source_code/1/mary_and_temperature_preferences/knn_to_data.py
# Applies the knn algorithm to the input data.
# The input text file is assumed to be of the format with one line per
# every data entry consisting of the temperature in degrees Celsius,
# wind speed and then the classification cold/warm.
import sys
sys.path.append('../../common')
import common  # noqa

# ***Library with common routines and functions***
# (an excerpt from source_code/common/common.py, used below as common.dic_inc)
def dic_inc(dic, key):
    # Increment the counter stored under key; unknown classes (None) are ignored.
    if key is None:
        return
    if dic.get(key, None) is None:
        dic[key] = 1
    else:
        dic[key] = dic[key] + 1

# Reset the counters before classifying the next point.
def info_reset(info):
    info['class_count'] = {}
    info['nbhd_count'] = 0

# Find the class of a neighbor with the coordinates x,y.
# If the class is known, count that neighbor.
def info_add(info, data, x, y):
    group = data.get((x, y), None)
    common.dic_inc(info['class_count'], group)
    info['nbhd_count'] += int(group is not None)

# Apply the knn algorithm to the 2d data using the k-nearest neighbors with
# the Manhattan distance.
# The dictionary data comes in the form with keys being 2d coordinates
# and the values being the class.
# x,y are integer coordinates for the 2d data with the range
# [x_from,x_to] x [y_from,y_to].
def knn_to_2d_data(data, x_from, x_to, y_from, y_to, k):
    new_data = {}
    info = {}
    # Go through every point in an integer coordinate system.
    for y in range(y_from, y_to + 1):
        for x in range(x_from, x_to + 1):
            info_reset(info)
            # Count the number of neighbors for each class group for
            # every distance dist starting at 0 until at least k
            # neighbors with known classes are found.
            for dist in range(0, x_to - x_from + y_to - y_from):
                # Count all neighbors that are distanced dist from
                # the point [x,y].
                if dist == 0:
                    info_add(info, data, x, y)
                else:
                    for i in range(0, dist + 1):
                        info_add(info, data, x - i, y + dist - i)
                        info_add(info, data, x + dist - i, y - i)
                    for i in range(1, dist):
                        info_add(info, data, x + i, y + dist - i)
                        info_add(info, data, x - dist + i, y - i)
                # There could be more than k-closest neighbors if the
                # distance of more of them is the same from the point
                # [x,y]. But immediately when we have at least k of
                # them, we break from the loop.
                if info['nbhd_count'] >= k:
                    break
            # Choose the class with the highest count among the
            # neighbors found.
            class_max_count = None
            for group, count in info['class_count'].items():
                if group is not None and (class_max_count is None or
                   count > info['class_count'][class_max_count]):
                    class_max_count = group
            new_data[x, y] = class_max_count
    return new_data
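As a quick illustration of how knn_to_2d_data expects its input, here is a hypothetical call; the tiny data dictionary below is made up for the example and is not the file-parsing code of the full script:

# Keys are (temperature, wind speed) coordinates; values are the known classes.
data = {(10, 0): 'cold', (25, 0): 'warm', (15, 5): 'cold', (22, 3): 'warm'}
completed = knn_to_2d_data(data, 5, 30, 0, 10, 1)
print(completed[(16, 3)])  # the class chosen for the point (16, 3)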
We run the implementation above on the input file mary_and_temperature_preferences.data using the k-NN algorithm for k=1 neighbors. The algorithm classifies all the points with integer coordinates in the rectangle with a size of (30-5=25) by (10-0=10), that is, with (25+1) * (10+1) = 286 points (including the boundary points). Using the wc command, we find out that the output file contains exactly 286 lines - one data item per point. Using the head command, we display the first 10 lines from the output file. We visualize all the data from the output file in the next section:
$ python knn_to_data.py mary_and_temperature_preferences.data
mary_and_temperature_preferences_completed.data 1 5 30 0 10
For the visualization depicted earlier in this chapter, the matplotlib library was used. A data file is loaded and then displayed in a scatter diagram:
# source_code/common/common.py
# Returns a dictionary of 3 lists: x coordinates, y coordinates, and colors.
def get_x_y_colors(data):
    dic = {'x': [], 'y': [], 'colors': []}
    # Convert the classes to the colors to be displayed in a diagram.
    class_to_color = {'cold': 'blue', 'warm': 'red'}
    for i in range(0, len(data)):
        dic['x'].append(data[i][0])
        dic['y'].append(data[i][1])
        dic['colors'].append(class_to_color.get(data[i][2], data[i][2]))
    return dic
import sys
sys.path.append('../../common')
import common  # noqa
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
# data holds (temperature, wind, class) triples; loading code omitted here.
temp_from, temp_to, wind_from, wind_to = 5, 30, 0, 10
# Convert the array into the format ready for drawing functions.
data_processed = common.get_x_y_colors(data)
# Draw the graph.
plt.title('Mary and temperature preferences')
plt.xlabel('temperature in C')
plt.ylabel('wind speed in kmph')
plt.axis([temp_from, temp_to, wind_from, wind_to])
# Add legends to the graph.
blue_patch = mpatches.Patch(color='blue', label='cold')
red_patch = mpatches.Patch(color='red', label='warm')
plt.legend(handles=[blue_patch, red_patch])
plt.scatter(data_processed['x'], data_processed['y'], c=data_processed['colors'])
plt.show()
Map of Italy example - choosing the value of k

In our data, we are given some points (about 1%) from the map of Italy and its surroundings. The blue points represent water and the green points represent land; the white points are unknown. From the partial information given, we would like to predict whether there is water or land in the white areas.

Drawing only 1% of the map data in the picture would make it almost invisible. If, instead, we were given about 33 times more data from the map of Italy and its surroundings and drew it in the picture, it would look like the following:
For this problem, we will use the k-NN algorithm - k here means that we will look at the k closest neighbors. Given a white point, it will be classified as a water area if the majority of its k closest neighbors are in the water area, and classified as land if the majority of its k closest neighbors are in the land area. We will use the Euclidean metric for the distance: given two points X = [x0, x1] and Y = [y0, y1], their Euclidean distance is defined as d_Euclidean = sqrt((x0 - y0)^2 + (x1 - y1)^2).
The Euclidean distance is the most common metric. Given two points on a piece of paper, their Euclidean distance is simply the length of the straight line between the two points, as measured by a ruler, as shown in the diagram:
To apply the k-NN algorithm to an incomplete map, we have to choose the value of k. Since the resulting class of a point is the class of the majority of that point's k closest neighbors, k should be odd. Let us apply the algorithm for the values of k=1,3,5,7,9.
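The following is an illustrative sketch of this majority vote with the Euclidean metric; the handful of known points, the Counter-based voting, and the tested values of k are assumptions made for the example, not the exact code used to produce the completed maps:

import math
from collections import Counter

def euclidean_distance(x, y):
    return math.sqrt((x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2)

def knn_classify(known_points, point, k):
    # known_points: list of ((x, y), class) pairs, class being 'water' or 'land'.
    by_distance = sorted(known_points,
                         key=lambda item: euclidean_distance(item[0], point))
    votes = Counter(cls for _, cls in by_distance[:k])
    return votes.most_common(1)[0][0]

known_points = [((0, 0), 'water'), ((1, 2), 'water'), ((4, 1), 'land'),
                ((5, 3), 'land'), ((3, 4), 'land')]
for k in [1, 3, 5]:
    print(k, knn_classify(known_points, (2, 2), k))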
Applying this algorithm to every white point of the incomplete map will result in the following completed maps:
As you will notice, a higher value of k results in a completed map with smoother boundaries. The actual complete map of Italy is shown here:
We can use this real, complete map to calculate the percentage of incorrectly classified points for the various values of k, and thus determine the accuracy of the k-NN algorithm for different values of k:
k % of incorrectly classified points
Thus, for this particular type of classification problem, the k-NN algorithm achieves the
highest accuracy (least error rate) for k=1.
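With the real map available, the error rate for each k can be measured by a direct comparison. A minimal sketch, assuming both maps are dictionaries keyed by pixel coordinates, could look like this:

# Percentage of incorrectly classified points, where predicted and actual
# are dicts mapping (x, y) -> class.
def error_rate(predicted, actual):
    wrong = sum(1 for point in actual if predicted.get(point) != actual[point])
    return 100.0 * wrong / len(actual)

# Hypothetical usage with the knn_to_2d_data function from the previous section:
# for k in [1, 3, 5, 7, 9]:
#     completed = knn_to_2d_data(partial_map, x_from, x_to, y_from, y_to, k)
#     print(k, error_rate(completed, real_map))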
However, in real-life problems, we usually do not have complete data or a solution available. In such scenarios, we need to choose a value of k appropriate to the partially available data. For this, consult problem 1.4.
House ownership - data rescaling
For each person, we are given their age, their yearly income, and whether or not they own a house:
Age Annual income in USD House ownership status
according to the formula:
After scaling, we get the following data:
Age Scaled age Annual income in USD Scaled annual income House ownership status
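A minimal sketch of this kind of rescaling, under the assumption that each column is scaled linearly into the interval [0, 1] using its observed minimum and maximum, could be:

def rescale(values):
    # Scale a list of numbers linearly into the interval [0, 1].
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]

ages = [23, 37, 48, 28]                  # illustrative values only
incomes = [50000, 34000, 40000, 95000]   # illustrative values only
print(rescale(ages))
print(rescale(incomes))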
Text classification - using non-Euclidean distances
We are given the word counts of the keywords algorithm and computer for documents of the classes informatics and mathematics:
Algorithm words per 1,000 Computer words per 1,000 Subject classification
The documents with a high rate of the words algorithm and computer are in the class of informatics. The class of mathematics, however, can also contain a high rate of the word algorithm in some cases; for example, a document concerned with the Euclidean algorithm from the field of number theory. But, since mathematics tends to be less applied than informatics in the area of algorithms, the word computer is contained in such documents with a lower frequency.
We would like to classify a document that has 41 instances of the word algorithm per 1,000 words and 42 instances of the word computer per 1,000 words:
Using, for example, the 1-NN algorithm and the Manhattan or Euclidean distance would result in the document in question being classified in the class of mathematics. Intuitively, we should instead use a different metric to measure the distance, as the document in question has a much higher count of the word computer than the other known documents in the class of mathematics.

Another candidate metric for this problem is one that would measure the proportion between the counts of the words, or the angle between the instances of the documents. Instead of the angle, one could take the cosine of the angle, cos(θ), and then use the well-known dot product formula to calculate cos(θ). For two documents a = (a1, a2) and b = (b1, b2):

cos(θ) = (a1*b1 + a2*b2) / (sqrt(a1^2 + a2^2) * sqrt(b1^2 + b2^2))

One derives:

Using the cosine distance metric, one would classify the document in question in the class of informatics.
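A small sketch of 1-NN classification with the cosine distance follows; the document vectors below are illustrative values, not the exact counts from the table in this section:

import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

# Each document is a vector: (algorithm words per 1,000, computer words per 1,000).
known_docs = [((153, 150), 'informatics'), ((105, 97), 'informatics'),
              ((109, 0), 'mathematics'), ((110, 3), 'mathematics')]
unknown = (41, 42)

closest = min(known_docs, key=lambda item: cosine_distance(item[0], unknown))
print(closest[1])  # 1-NN with the cosine distance chooses 'informatics'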
Text classification - k-NN in higher dimensions

Suppose we are given documents and we would like to classify other documents based on their word frequency counts. For example, we are given the 120 most frequent words for documents from the Project Gutenberg corpus.
Trang 36The task is to design a metric which, given the word frequencies for each document, wouldaccurately determine how semantically close those documents are Consequently, such ametric could be used by the k-NN algorithm to classify the unknown instances of the newdocuments based on the existing documents.
Analysis:
Suppose that we consider, for example, the N most frequent words in our corpus of documents. Then, we count the word frequencies for each of the N words in a given document and put them in an N-dimensional vector that will represent that document. Then, we define the distance between two documents to be the distance (for example, Euclidean) between the two word frequency vectors of those documents.
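A sketch of this idea with a hypothetical word list and two tiny documents is shown below; the choice of N, the words, and the whitespace tokenization are simplifications made for the example:

from collections import Counter

def frequency_vector(text, vocabulary):
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

vocabulary = ['algorithm', 'computer', 'theorem', 'proof']  # N = 4 chosen words
doc_a = 'the algorithm runs on a computer and the computer is fast'
doc_b = 'the theorem has a proof and the proof uses an algorithm'

vec_a = frequency_vector(doc_a, vocabulary)
vec_b = frequency_vector(doc_b, vocabulary)
euclidean = sum((x - y) ** 2 for x, y in zip(vec_a, vec_b)) ** 0.5
print(vec_a, vec_b, euclidean)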
The problem with this solution is that only certain words represent the actual content of the book, while others need to be present in the text because of grammar rules or their general basic meaning. For example, out of the 120 most frequent words in the Bible, each word is of different importance, and the author has highlighted in bold the words that have an especially high frequency in the Bible and bear an important meaning:
However, if we just look at the six most frequent words in the Bible, they happen to be less useful in detecting the meaning of the text:

• the 8.07%
• and 6.51%
• of 4.37%
• to 1.72%
• that 1.63%
• in 1.60%
Texts concerned with mathematics, literature, or other subjects will have similar frequencies for these words. The differences may result mostly from the writing style.
Therefore, to determine a similarity distance between two documents, we only need to look at the frequency counts of the important words. Some words are less important - these dimensions are better reduced, as their inclusion can lead to a misinterpretation of the results in the end. Thus, what we are left to do is to choose the words (dimensions) that are important for classifying the documents in our corpus. For this, consult exercise 1.6.
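One simple way to reduce such dimensions is sketched below, under the assumption that a fixed list of common stop words is dropped before the frequency vectors are built; other selection criteria are equally possible, and the word counts shown are illustrative only:

STOP_WORDS = {'the', 'and', 'of', 'to', 'that', 'in', 'a', 'is'}  # illustrative list

def important_words(word_counts, how_many):
    # Keep the most frequent words that are not stop words; these become
    # the dimensions used for the similarity distance.
    filtered = {w: c for w, c in word_counts.items() if w not in STOP_WORDS}
    return sorted(filtered, key=filtered.get, reverse=True)[:how_many]

counts = {'the': 64023, 'and': 51696, 'of': 34670,
          'lord': 7964, 'god': 4472}  # illustrative counts
print(important_words(counts, 2))  # ['lord', 'god']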
Summary
The k-nearest neighbor algorithm is a classification algorithm that assigns to a given data point the majority class among its k nearest neighbors. The distance between two points is measured by a metric. Examples of distances include: the Euclidean distance, Manhattan distance, Minkowski distance, Hamming distance, Mahalanobis distance, Tanimoto distance, Jaccard distance, tangential distance, and cosine distance. Experiments with various parameters and cross-validation can help to establish which parameter k and which metric should be used.
The dimensionality and position of a data point in the space are determined by its qualities. A large number of dimensions can result in low accuracy of the k-NN algorithm. Reducing the dimensions corresponding to qualities of smaller importance can increase accuracy. Similarly, to increase accuracy further, the distances for each dimension should be scaled according to the importance of the quality of that dimension.
Problems

2. Mary and temperature preferences: Do you think that the use of the 1-NN algorithm would yield better results than the use of the k-NN algorithm for k>1?

3. Mary and temperature preferences: We collected more data and found out that Mary feels warm at 17 degrees Celsius, but cold at 18 degrees Celsius. By our common sense, Mary should feel warmer at a higher temperature. Can you explain a possible cause of the discrepancy in the data? How could we improve the analysis of our data? Should we also collect some non-temperature data? Suppose that we have only temperature data available; do you think that the 1-NN algorithm would still yield better results with data like this? How should we choose k for the k-NN algorithm to perform well?

4. Map of Italy - choosing the value of k: We are given a partial map of Italy as for the problem Map of Italy, but suppose that the complete data is not available. Thus, we cannot calculate the error rate on all the predicted points for the different values of k. How should one choose the value of k for the k-NN algorithm to complete the map of Italy in order to maximize its accuracy?

5. House ownership: Using the data from the section concerned with the problem of house ownership, find the closest neighbor to Peter using the Euclidean metric: a) without rescaling the data, b) using the scaled data. Is the closest neighbor in a) the same as the neighbor in b)? Which of the neighbors owns the house?

6. Text classification: Suppose you would like to find books or documents in Gutenberg's corpus (www.gutenberg.org) that are similar to a selected book from the corpus (for example, the Bible) using a certain metric and the 1-NN algorithm. How would you design a metric measuring the similarity distance between two documents?
Analysis:

The algorithm further says that at 22 degrees Celsius, Mary should feel warm, and there is no doubt about that, as 22 degrees Celsius is higher than 20 degrees Celsius and a human being feels warmer at a higher temperature; again, a trivial use of our knowledge. For 15 degrees Celsius, the algorithm would deem Mary to feel warm, but by our common sense, we may not be that certain of this statement.
To be able to use our algorithm to yield better results, we should collect more data. For example, if we find out that Mary feels cold at 14 degrees Celsius, then we have a data instance that is very close to 15 degrees and, thus, we can guess with higher certainty that Mary would feel cold at a temperature of 15 degrees Celsius.
3. The discrepancies in the data can be caused by inaccuracy in the tests carried out. This could be mitigated by performing more experiments.

Apart from inaccuracy, there could be other factors that influence how Mary feels: for example, the wind speed, the humidity, the sunshine, how warmly Mary is dressed (whether she has a coat with jeans, or just shorts with a sleeveless top, or even a swimming suit), and whether she was wet or dry. We could add these additional dimensions (wind speed and how she is dressed) into the vectors of our data points. This would provide more, and better quality, data for the algorithm and, consequently, better results could be expected.

If we have only temperature data, but more of it (for example, 10 instances of classification for every degree Celsius), then we could increase k and look at more neighbors to determine the temperature more accurately. But this purely relies on the availability of the data. We could adapt the algorithm to yield the classification based on all the neighbors within a certain distance d, rather than classifying based on the k closest neighbors. This would make the algorithm work well in both cases: when we have a lot of data within a close distance, and also when we have just one data instance close to the instance that we want to classify.
4. For this purpose, one can use cross-validation (consult the Cross-validation section in Appendix A, Statistics) to determine the value of k with the highest accuracy. One could separate the available data from the partial map of Italy into learning data and test data. For example, 80% of the classified pixels on the map would be given to the k-NN algorithm to complete the map. Then the remaining 20% of the classified pixels from the partial map would be used to calculate the percentage of the pixels with the correct classification by the k-NN algorithm. A sketch of this procedure is given at the end of this section.

...frequency count across all the documents. Thus, instead, we could produce a list
with the relative word frequency counts for a document. For example, we could use the following definition:

Then the document could be represented by an N-dimensional vector consisting of the word frequencies for the N words with the highest relative frequency count. Such a vector will tend to consist of more important words than a vector of the N words with the highest absolute frequency count.
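A minimal sketch of the cross-validation procedure described in the solution to problem 4 follows; the 80/20 split, the helper names build_map and error_rate, and the random shuffling are assumptions made for this illustration:

import random

def choose_k(classified_pixels, candidate_ks, build_map, error_rate):
    # classified_pixels: dict (x, y) -> class from the partial map.
    points = list(classified_pixels)
    random.shuffle(points)
    cut = int(0.8 * len(points))
    train = {p: classified_pixels[p] for p in points[:cut]}
    test = {p: classified_pixels[p] for p in points[cut:]}
    best_k, best_error = None, None
    for k in candidate_ks:
        completed = build_map(train, k)   # e.g., a wrapper around knn_to_2d_data
        err = error_rate(completed, test)
        if best_error is None or err < best_error:
            best_k, best_error = k, err
    return best_k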