
Mastering .NET Machine Learning


9. AdventureWorks Production – Neural Networks


All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.


Jamie Dixon has been writing code for as long as he can remember and has been getting paid to do it since 1995. He was using C# and JavaScript almost exclusively until discovering F#, and now combines all three languages for the problem at hand. He has a passion for discovering overlooked gems in datasets and merging software engineering techniques to scientific computing. When he codes for fun, he spends his time using

Jamie lives in Cary, North Carolina with his wonderful wife Jill and their three awesome children: Sonoma, Sawyer, and Sloan. He blogs weekly at jamessdixon.wordpress.com and can be found on Twitter at @jamie_dixon.


I had never considered writing a book until Meeta from Packt Publishing sent me an e-mail, asking me if I was interested in writing the book that you are holding. My first reaction was excitement immediately followed by fear. I have heard that writing a book is an arduous and painful undertaking with scant reward. Was I really ready to dive into that? Fortunately, writing this book was nothing of the sort, all due to the many wonderful people that helped me along the way.

First and foremost are the technical reviewers Reed Copsey, Jr. and César Roberto de Souza. Their attention to detail, their spot-on suggestions, and occasional words of encouragement made all of the difference. Next, the team at Packt of Meeta Rajani, Pankaj Kadam, and Laxmi Subramanian took my words, code samples, and screenshots and turned them into something, well, beautiful. Mathias Brandiveder, Evalina Gasborova, Melinda Thielbar, James McCaffrey, Phil Trelford, Seth Jurez, and Chris Kalle all helped me at different points with questions about what and how to present the machine learning models and ideas. Dmitry Morozov and Ross McKinlay were indispensable for explaining the finer points of type providers. Isaac Abraham helped me with the section on MBrace and Tomas Petricek helped me with the section on Deedle. Chris Matthews and Mark Hutchinson reviewed the initial outline and gave me great feedback. Ian Hoppes saved me hours (days?) by sharing his expertise on the finer points of Razor and JavaScript. Finally, Rob Seder, Mike Esposito, and Kevin Allen encouraged and supported me throughout the entire process.

To everyone I mentioned and the people I may have missed, please accept my sincerest thanks.

Finally, my deepest love for the initial proofreader, soul mate, and best wife any person could have: Jill Dixon. I am truly the luckiest man in the world to be with you.


César Roberto de Souza is the author of the Accord.NET Framework and an experienced software developer. During his early university years in Brazil, he decided to create the Accord.NET Framework, a framework for machine learning, image processing, and scientific computing for .NET. Targeted at both professionals and hobbyists, the project has been used by large and small companies, big corporations, start-ups, universities, and in an extensive number of scientific publications. After finishing his MSc in the Federal University of São Carlos, the success of the project eventually granted him an opportunity to work and live in Europe, from where he continues its development and interacts with the growing community of users that now helps advance the project even further.

He is a technology enthusiast, with keen interest in machine learning, computer vision, and image processing, and regularly writes articles on those topics for the CodeProject, where he has won its article writing competition multiple times.


www.PacktPub.com


Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at customercare@packtpub.com for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

https://www2.packtpub.com/books/subscription/packtlib

Do you need instant solutions to your IT questions? PacktLib is Packt’s online digital book library. Here, you can search, access, and read Packt’s entire library of books.

Fully searchable across every book published by Packt

Copy and paste, print, and bookmark content

On demand and accessible via a web browser

To Sonoma, Sawyer, and Sloan Dixon


The .NET Framework is one of the most successful application frameworks in history. Literally billions of lines of code have been written on the .NET Framework, with billions more to come. For all of its success, it can be argued that the .NET Framework is still underrepresented for data science endeavors. This book attempts to help address this issue by showing how machine learning can be rapidly injected into the common .NET line of business applications. It also shows how typical data science scenarios can be addressed using the .NET Framework. This book quickly builds upon an introduction to machine learning models and techniques in order to build real-world applications using machine learning. While by no means a comprehensive study of predictive analytics, it does address some of the more common issues that data scientists encounter when building their models.

Many books about machine learning are written with every chapter centering around a dataset and how to implement a model on that dataset. While this is a good way to build a mental blueprint (as well as some code boilerplate), this book is going to take a slightly different approach. This book centers around introducing the same application for the line of business development and one common open data dataset for the scientific programmer. We will then introduce different machine learning techniques, depending on the business scenario. This means you will be putting on different hats for each chapter. If you are a line of business software engineer, Chapters 2, 3, 6, and 9 will seem like old hat. If you are a research analyst, Chapters 4, 7, and 10 will be very familiar to you. I encourage you to try all chapters, regardless of your background, as you will perhaps gain a new perspective that will make you more effective as a data scientist.

As a final note, one word you will not find in this book is “simply”. It drives me nuts when I read a tutorial-based book and the author says “it is simply this” or “simply do that”. If it was simple, I wouldn’t need the book. I hope you find each of the chapters accessible and the code samples interesting, and that these two factors can help you immediately in your career.


and a logistic regression to solve different business problems at AdventureWorks. It will look at different factors that affect bike sales and then categorize potential customers into potential sales or potential lost leads. It will then implement the models to help our website convert potential lost leads into potential sales.

Chapter 4, Traffic Stops – Barking Up the Wrong Tree?, takes a break from AdventureWorks. You will put on your data scientist hat, use an open dataset of traffic stops, and see if we can understand why some people get a verbal warning and why others get a ticket at a traffic stop. We will use basic summary statistics and decision trees to help

Chapter 7, Traffic Stops and Crash Locations – When Two Datasets Are Better Than One, returns to the traffic stop data and adds in two other open datasets that can be used to improve the predictions and gain new insights. The chapter will introduce two common unsupervised machine learning techniques: k-means and PCA.


build machine learning models on top of data that is characterized by massive volume, variability, and velocity. We will then look at how IoT devices can generate this big data and how to deploy machine learning models onto these devices so that they become self-learning.


You will need Visual Studio 2013 (any version) or beyond installed on your computer. You can also use VS Code or Mono Develop. The examples in this book use Visual Studio 2015 Update 1.


Also, the nature of the .NET software developer’s job is changing. Earlier, when the cliché “ours is a changing industry” was being thrown around, it was about languages (need to know JavaScript, C#, and TSql) and frameworks (Angular, MVC, WPF, and EF). Now, the cliché means that the software developer needs to know how to make sure their code is correct (test-driven development), how to get their code off of their machine onto the customer’s machine (DevOps), and how to make their applications smarter (machine learning).

Also, the same forces that are pushing the business developer to retool are pushing the research analyst into unfamiliar territory. Earlier, analysts focused on data collection, exploration, and visualization in the context of an application (Excel, PowerBI, and SAS) for point-in-time analysis. The analyst would start with a question, grab some data, build some models, and then present the findings. Any kind of continuous analysis was done via report writing or just re-running the models. Today, analysts are being asked to sift through massive amounts of data (IoT telemetry, user exhaust, and NoSQL data lakes), where the questions may not be known beforehand. Also, once models are created, they are pushed into production applications where they are continually being re-trained in real time. No longer just a decision aid for humans, research is being done by computers to impact users immediately.

The newly-minted data scientist title is at the confluence of these forces. Typically, no one person can be an expert on both sides of the divide, so the data scientist is a bit of a “jack of all trades, master of none” who knows machine learning a little bit better than all of the other software engineers on the team and knows software engineering a little bit better than any researcher on the team. The goal of this book is to help you move from either software engineer or business analyst to data scientist.


Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.


Feedback from our readers is always welcome. Let us know what you think about this book: what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.

To send us general feedback, simply e-mail feedback@packtpub.com, and mention the book’s title in the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.


Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you. You can download the code files by following these steps:

WinRAR / 7-Zip for Windows

Zipeg / iZip / UnRarX for Mac

7-Zip / PeaZip for Linux


Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books, maybe a mistake in the text or the code, we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.


Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at copyright@packtpub.com with a link to the suspected pirated material.

We appreciate your help in protecting our authors and our ability to bring you valuable content.


If you have a problem with any aspect of this book, you can contact us at questions@packtpub.com, and we will do our best to address the problem.


Chapter 1. Welcome to Machine Learning Using the .NET Framework

This is a book on creating and then using Machine Learning (ML) programs using the .NET Framework. Machine learning, a hot topic these days, is part of an overall trend in the software industry toward analytics, which attempts to make machines smarter. Analytics, though not really a new trend, has perhaps a higher visibility than in the past. This chapter will focus on some of the larger questions you might have about machine learning using the .NET Framework, namely: What is machine learning? Why should we consider it in the .NET Framework? How can I get started with coding?


If you look it up on Wikipedia, you will find a fairly abstract definition of machine learning:

“Machine learning explores the study and construction of algorithms that can learn from and make predictions on data. Such algorithms operate by building a model from example inputs in order to make data-driven predictions or decisions, rather than following strictly static program instructions.”

I like to think of machine learning as computer programs that produce different results as they are exposed to more information, without anyone changing their source code (and consequently needing to redeploy them). For example, consider a game that I play with the computer.

I show the computer this picture and tell it “Blue Circle”. I then show it this picture and tell it “Red Circle”. Next I show it this picture and say “Green Triangle”.

Finally, I show it this picture and ask it “What is this?” Ideally the computer would respond, “Green Circle”.

This is one example of machine learning. Although I did not change my code or recompile and redeploy, the computer program can respond accurately to data it has never seen before. Also, the code does not have to explicitly spell out each possible data permutation. Instead, we create models that the computer applies to new data. Sometimes the computer is right, sometimes it is wrong. We then feed the new data to the computer to retrain the model so the computer gets more and more accurate over time. Or, at least, that is the goal.
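The game above can be sketched in a few lines. This is only an illustration under stated assumptions, not the book's implementation: each picture is reduced to two hypothetical features (a hue in degrees and a corner count), the class name `ShapeGame` and its methods are invented for this sketch, and Python is used purely for brevity. The "model" is a nearest-neighbour lookup that learns the colour word and the shape word separately.

```python
from dataclasses import dataclass, field


@dataclass
class ShapeGame:
    """Toy learner: maps (hue, corners) to a two-word label such as 'Blue Circle'.

    Both features are assumptions made for this sketch; a real system would
    have to extract them from the image first.
    """
    color_examples: list = field(default_factory=list)  # (hue, colour word)
    shape_examples: list = field(default_factory=list)  # (corners, shape word)

    def learn(self, hue, corners, label):
        # Split the label so colour and shape are learned independently.
        color_word, shape_word = label.split()
        self.color_examples.append((hue, color_word))
        self.shape_examples.append((corners, shape_word))

    def predict(self, hue, corners):
        # Nearest neighbour on each feature. Hue is treated as a plain number
        # here, which ignores that hue wraps around at 360 degrees.
        color = min(self.color_examples, key=lambda e: abs(e[0] - hue))[1]
        shape = min(self.shape_examples, key=lambda e: abs(e[0] - corners))[1]
        return f"{color} {shape}"


game = ShapeGame()
game.learn(240, 0, "Blue Circle")
game.learn(0, 0, "Red Circle")
game.learn(120, 3, "Green Triangle")

# A green circle was never shown, yet the two learned parts combine.
print(game.predict(120, 0))  # → Green Circle

# Feeding in more data changes the answers without changing the code.
game.learn(60, 4, "Yellow Square")
print(game.predict(60, 4))  # → Yellow Square
```

Learning the two words separately is what lets the sketch answer for a combination it has never seen; a lookup keyed on the whole picture could only repeat the three labels it was given.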

Once you decide to implement some machine learning into your code base, another decision has to be made fairly early in the process: how often do you want the computer to learn? For example, if you create a model by hand, how often do you update it? With every new data row? Every month? Every year? Depending on what you are trying to accomplish, you might create a real-time ML model, a near-time model, or a periodic model. We will discuss the implications and implementations of each of these in several chapters in the book, as different models lend themselves to different retraining strategies.
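One way to picture the three retraining strategies is as a policy that decides when a model should be refreshed. The sketch below is a hypothetical illustration: the text does not define the strategies precisely at this point, so the interpretations used here (real-time means every new row, near-time means once a small batch has accumulated, periodic means a fixed calendar interval) are assumptions, as are the names `Strategy` and `RetrainPolicy` and the default thresholds.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class Strategy(Enum):
    REAL_TIME = "real-time"   # assumed: retrain on every new row
    NEAR_TIME = "near-time"   # assumed: retrain once a batch has accumulated
    PERIODIC = "periodic"     # assumed: retrain on a fixed schedule


@dataclass
class RetrainPolicy:
    strategy: Strategy
    batch_size: int = 100                     # used by NEAR_TIME only
    interval: timedelta = timedelta(days=30)  # used by PERIODIC only

    def should_retrain(self, new_rows: int, last_trained: datetime,
                       now: datetime) -> bool:
        """Return True when the model should be retrained under this policy."""
        if self.strategy is Strategy.REAL_TIME:
            return new_rows >= 1
        if self.strategy is Strategy.NEAR_TIME:
            return new_rows >= self.batch_size
        # PERIODIC: the amount of new data is ignored, only elapsed time counts.
        return now - last_trained >= self.interval


last_trained = datetime(2024, 1, 1)
now = datetime(2024, 1, 31)

for policy in (RetrainPolicy(Strategy.REAL_TIME),
               RetrainPolicy(Strategy.NEAR_TIME, batch_size=100),
               RetrainPolicy(Strategy.PERIODIC, interval=timedelta(days=30))):
    print(policy.strategy.value, policy.should_retrain(5, last_trained, now))
```

Keeping the decision in one small object makes the trade-off explicit: the more often the policy fires, the fresher the model and the higher the compute cost.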

Date posted: 12/04/2019, 00:41