IPython Interactive Computing and Visualization Cookbook, Second Edition

About the author

Packt is Searching for Authors Like You

Preface

Who this book is for

What this book covers

Part 1 – Interactive Computing with Jupyter

Part 2 – Standard Methods in Data Science and Applied Mathematics

To get the most out of this book

Installing Python

GitHub repositories

Download the example code files

Download the color images

What's new in the SciPy ecosystem?

How to install Python

The IPython terminal

IPython and text editor

The Jupyter Notebook

Integrated Development Environments

Writing high-quality Python code

Workflows with unit testing

Unit testing and continuous integration

Debugging code with IPython

The Notebook ecosystem

Architecture of the Jupyter Notebook

Connecting multiple clients to one kernel

Why are NumPy arrays efficient?

What is the difference between in-place and implicit-copy operations?

Why can't some arrays be reshaped without a copy?

What are NumPy broadcasting rules?

Compiler-related installation instructions

Using Python to write faster code

Exploration, inference, decision, prediction

Univariate and multivariate methods

Frequentist and Bayesian methods

Parametric and nonparametric inference methods

Exploring a dataset with pandas and Matplotlib

How to do it

There's more

Getting started with statistical hypothesis testing — a simple z-test

Computation of the posterior distribution

Maximum a posteriori estimation

Pearson's correlation coefficient

Contingency table and chi-squared test

Feature selection and feature extraction

Overfitting, underfitting, and the bias-variance tradeoff

Model selection

Machine learning references

Getting started with scikit-learn

Getting ready

How to do it

How it works

scikit-learn API

Ordinary Least Squares regression

Polynomial interpolation with linear regression

The objective function

Local and global minima

Constrained and unconstrained optimization

Deterministic and stochastic algorithms

Analog and digital signals

The Nyquist–Shannon sampling theorem

The discrete Fourier transform

Inverse Fourier transform

What are linear filters?

Linear filters and convolutions

The FIR and IIR filters

Filters in the frequency domain

The low-, high-, and band-pass filters

There's more

Geographical information systems in Python

Index

IPython Interactive Computing and Visualization Cookbook

Second Edition

a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Commissioning Editor: Veena Pagare

Acquisition Editor: Dominic Shakeshaft

Project Editor: Suzanne Coutinho

Technical Editors: Bhagyashree Rai, Nidhisha Shetty

Proofreader: Safis Editing

Indexer: Aishwarya Gangawane

Graphics: Tom Scaria

Production Coordinator: Shantanu Zagade

First published: September 2014

Second Edition: January 2018

mapt.io

Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.

Why subscribe?

Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals

Learn better with Skill Plans built especially for you

Get a free eBook or video every month

Mapt is fully searchable

Copy and paste, print, and bookmark content

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and, as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at <service@packtpub.com> for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

About the author

Cyrille Rossant, PhD, is a neuroscience researcher and software engineer at University College London. He is a graduate of École Normale Supérieure, Paris, where he studied mathematics and computer science. He has also worked at Princeton University and Collège de France. While working on data science and software engineering projects, he has gained experience in numerical computing, parallel computing, and high-performance data visualization.

He is the author of Learning IPython for Interactive Computing and Data Visualization, Second Edition, Packt Publishing, the prequel of this cookbook.

I'm grateful to everyone who gave their feedback on this book, including Matthias Bussonnier, Thomas Caswell, Guillaume Gay, Brian Granger, Matthew Rocklin, Steven Silvester, and Jake VanderPlas. I'd also like to thank my family for their support.

Packt is Searching for Authors Like You

If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.

Preface

We are becoming awash in the flood of digital data from scientific research, engineering, economics, politics, journalism, business, and many other domains. As a result, analyzing, visualizing, and harnessing data is the occupation of an increasingly large and diverse set of people. Quantitative skills such as programming, numerical computing, mathematics, statistics, and data mining, which form the core of data science, are more and more appreciated in a seemingly endless plethora of fields.

Python, a widely-known programming language, is also one of the leading open platforms for data science. IPython is a mature Python project that provides scientist-friendly interactive access to Python. It is part of the broader Project Jupyter, which aims to provide high-quality environments for interactive computing, data analysis, visualization, and the authoring of interactive scientific documents. Jupyter is estimated to have several million users today.

The prequel of this book, Learning IPython for Interactive Computing and Data Visualization, Second Edition, Packt Publishing, was published in 2015, two years after the first edition. It is a beginner-level introduction to data science and numerical computing with Python, IPython, and Jupyter.

This book, the first edition of which was published in 2014, continues that journey by presenting more than 100 recipes for interactive scientific computing and data science. These recipes not only cover programming topics such as numerical computing, high-performance computing, parallel computing, and interactive visualization, but also data analysis topics such as statistics, data mining, machine learning, signal processing, graph theory, numerical optimization, and many others.

This second edition is fully compatible with the latest versions of the platform and its libraries. It includes new recipes to better leverage the latest features of Python 3, and it introduces promising new projects such as JupyterLab, Altair, and Dask.

Note

By design, this book privileges breadth over depth. A particularly wide range of libraries and techniques are covered in this book, but not comprehensively. We give many references that let you deepen your knowledge of individual methods. The goal of this book is not to make you an expert in the subjects covered, but to give you a glimpse of the extremely diverse set of applications that you can tackle with the platform.

All the recipes in this book, each of which covers a specific technique, are available online as Jupyter notebooks. These interactive documents let you read, execute, and modify the code interactively, which makes the learning process more engaging and dynamic.

Almost all of this book's content is available online on the GitHub platform (http://ipython-books.github.io/). Updates and corrections will be regularly published there, so you should make sure you check out the latest version of the book online.

Who this book is for

This book targets researchers, engineers, data scientists, teachers, students, analysts, journalists, economists, and hobbyists interested in data analysis and numerical computing.

Readers familiar with the scientific Python ecosystem will find many resources to sharpen their skills in high-performance interactive computing with IPython and Jupyter.

Readers who need to implement algorithms for domain-specific applications will appreciate the introductions to a wide variety of topics in data analysis and applied mathematics.

Readers who are new to numerical computing with Python should start with the prequel of this book, Learning IPython for Interactive Computing and Data Visualization, Second Edition, Packt Publishing, published in 2015.

What this book covers

This book is split into two parts:

Part 1 (chapters 1 to 6) covers relatively advanced methods in interactive numerical computing, high-performance computing, and data visualization.

Part 2 (chapters 7 to 15) introduces standard methods in data science and mathematical modeling. Many of these methods are applied to real-world data.

Part 1 – Interactive Computing with Jupyter

Chapter 1, A Tour of Interactive Computing with Jupyter and IPython, contains a brief introduction to data analysis and numerical computing with IPython and Jupyter. It not only covers common packages such as Python, NumPy, pandas, and Matplotlib, but also advanced IPython/Jupyter topics such as interactive widgets in the Notebook, custom magic commands, configurable IPython extensions, and custom Jupyter kernels.

Chapter 2, Best Practices in Interactive Computing, details best practices to write reproducible, high-quality code: task automation, version control with Git, workflows with IPython and Jupyter, unit testing, continuous integration, debugging, and other related topics. The importance of these subjects in computational research and data analysis cannot be overstated.

Chapter 3, Mastering the Jupyter Notebook, covers topics related to the Jupyter Notebook, notably the Notebook format, notebook conversions, and interactive widgets.

Chapter 4, Profiling and Optimization, covers methods to make your code faster and more efficient: CPU and memory profiling in Python, advanced optimization techniques with NumPy (including large array manipulations), and memory mapping of huge arrays. These techniques are essential for big data analysis.

Chapter 5, High-Performance Computing, covers techniques to make your code much faster: code acceleration with Numba and Cython, wrapping C libraries in Python with ctypes, parallel computing with IPython and Dask, OpenMP, and General-Purpose Computing on Graphics Processing Units (GPGPU) with CUDA. The chapter ends with an introduction to the Julia language, a high-performance numerical computing programming language that can be used in the Jupyter Notebook.

Chapter 6, Data Visualization, introduces several visualization or interactive visualization libraries, such as matplotlib, seaborn, bokeh, D3, Altair, and others.

Part 2 – Standard Methods in Data Science and Applied Mathematics

Chapter 7, Statistical Data Analysis, covers methods for getting insights into data. It introduces classic frequentist and Bayesian methods for hypothesis testing, parametric and nonparametric estimation, and model inference. The chapter leverages Python libraries such as pandas, SciPy, statsmodels, and PyMC. The last recipe introduces the statistical language R, which can be easily used in the Jupyter Notebook.

Chapter 8, Machine Learning, covers methods to learn and make predictions from data. Using the scikit-learn Python package, this chapter illustrates fundamental data mining and machine learning concepts such as supervised and unsupervised learning, classification, regression, feature selection, feature extraction, overfitting, regularization, cross-validation, and grid search. Algorithms addressed in this chapter include logistic regression, Naive Bayes, K-nearest neighbors, support vector machines, random forests, and others. These methods are applied to various types of datasets: numerical data, images, and text.

Chapter 9, Numerical Optimization, covers minimizing and maximizing mathematical functions. This topic is pervasive in data science, notably in statistics, machine learning, and signal processing. This chapter illustrates a few root-finding, minimization, and curve-fitting routines with SciPy.

Chapter 10, Signal Processing, covers extracting relevant information from complex and noisy data. These steps are sometimes required prior to running statistical and data mining algorithms. This chapter introduces basic signal processing methods such as Fourier transforms and digital filters.

Chapter 11, Image and Audio Processing, covers signal processing methods for images and sounds. It introduces image filtering, segmentation, computer vision, and face detection with scikit-image and OpenCV. It also presents methods for audio processing and synthesis.

Chapter 12, Deterministic Dynamical Systems, describes the dynamical processes underlying particular types of data. It illustrates simulation techniques for discrete-time dynamical systems, as well as for ordinary differential equations and partial differential equations.

Chapter 13, Stochastic Dynamical Systems, describes the dynamical random processes underlying particular types of data. It illustrates simulation techniques for discrete-time Markov chains, point processes, and stochastic differential equations.

Chapter 14, Graphs, Geometry, and Geographic Information Systems, covers analysis and visualization methods for graphs, flight networks, road networks, maps, and geographic data.

Chapter 15, Symbolic and Numerical Mathematics, introduces SymPy, a computer algebra system that brings symbolic computing to Python. The chapter ends with an introduction to Sage, another Python-based system for computational mathematics.

To get the most out of this book

This book is accessible to beginners. However, it may be easier for you if you are familiar with the contents of Learning IPython for Interactive Computing and Data Visualization, Second Edition, Packt Publishing (also called the "IPython minibook"), the prequel of this book. The minibook introduces Python programming, the IPython console, the Jupyter Notebook, numerical computing with NumPy, basic data analysis with pandas, and plotting with Matplotlib. This book tackles scientific programming topics that rely on all of these tools.

Part 2 is a bit more theoretical. It is easier to read if you know the basics of calculus, linear algebra, and probability theory (real-valued functions, integrals and derivatives, differential equations, matrices, vector spaces, probabilities, random variables, and so on). These chapters introduce different topics in data science and applied mathematics, and how to apply them with Python: statistics, machine learning, numerical optimization, signal processing, dynamical systems, graph theory, and others.

Installing Python

This book uses the free Anaconda distribution (https://www.anaconda.com/download/). It includes Python 3, IPython, Jupyter, and almost all of the packages that we will be using in this book. Anaconda also includes a powerful packaging system named Conda. The introduction of this book's first chapter gives you more details.

The code of this book has been written for Python 3 and is incompatible with the older Python 2 (although minimal to no changes would be required to make it compatible).
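As a quick sanity check after installation, a minimal sketch along the following lines can confirm that the interpreter is Python 3 and that a few of the packages mentioned above can be located. The package list and the `check_environment` helper are illustrative assumptions, not something provided by the book or by Anaconda; the script uses only the standard library.

```python
import importlib.util
import sys

# Illustrative package list (an assumption, not taken from the book).
PACKAGES = ("IPython", "numpy", "pandas", "matplotlib")


def check_environment(packages=PACKAGES):
    """Return (python_ok, missing) for the running interpreter."""
    # The book's code targets Python 3.
    python_ok = sys.version_info[0] >= 3
    # find_spec returns None when a top-level module cannot be located.
    missing = [name for name in packages
               if importlib.util.find_spec(name) is None]
    return python_ok, missing


if __name__ == "__main__":
    python_ok, missing = check_environment()
    print("Python 3 interpreter:", python_ok)
    print("Missing packages:", ", ".join(missing) or "none")
```

Probing with `find_spec` rather than importing each package keeps the check fast and avoids side effects at import time.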

GitHub repositories

This book has a website: http://ipython-books.github.io. The text, the code, and the data from the book are available on several GitHub repositories at https://github.com/ipython-books/. You can also run the code interactively in your web browser without installing anything on your computer, thanks to the Binder project.

Be sure to check out http://ipython-books.github.io and the repositories to get the latest updates and corrections. You can also propose your own corrections and suggestions on GitHub by opening issues or pull requests.

You can also follow the author online (http://cyrille.rossant.net) and on Twitter (@cyrillerossant).

Download the example code files

You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

1. Log in or register at http://www.packtpub.com.

2. Select the SUPPORT tab.

3. Click on Code Downloads & Errata.

4. Enter the name of the book in the Search box and follow the on-screen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR / 7-Zip for Windows

Zipeg / iZip / UnRarX for Mac

7-Zip / PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://www.packtpub.com/sites/default/files/downloads/

Conventions used

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example:

A block of code is set as follows:

Bold: Indicates a new term, an important word, or words that you see on the screen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "Select System info from the Administration panel."

In this book, you will find several headings that appear frequently (Getting ready, How to do it..., How it works..., There's more..., and See also).

To give clear instructions on how to complete a recipe, use these sections as follows:

Getting ready

This section tells you what to expect in the recipe and describes how to set up any software or any preliminary settings required for the recipe.

How to do it…

This section contains the steps required to follow the recipe.

How it works…

This section usually consists of a detailed explanation of what happened in the previous section.

Get in touch

Feedback from our readers is always welcome.

General feedback: Email <feedback@packtpub.com> and mention the book's title in the subject of your message. If you have questions about any aspect of this book, please email us at <questions@packtpub.com>.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit http://www.packtpub.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at <copyright@packtpub.com> with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit http://authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packtpub.com.

Chapter 1. A Tour of Interactive Computing with Jupyter and IPython

In this chapter, we will cover the following topics:

Introducing IPython and the Jupyter Notebook

Getting started with exploratory data analysis in the Jupyter

(the name was inspired by the British comedy Monty Python's Flying Circus). This easy-to-use language is commonly used by system administrators as a glue language, linking various system components together. It is also a robust language for large-scale software development. In addition, Python comes with an extremely rich standard library (the batteries included philosophy), which covers string processing, internet protocols, operating system interfaces, and many other domains.
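To make the "batteries included" idea concrete, here is a small sketch that touches the three areas just named using only standard-library modules. The sample string, URL, and file path are arbitrary illustrations chosen for this example.

```python
import os.path
from urllib.parse import urlparse

# String processing: capitalize each word of a phrase.
title = "batteries included".title()

# Internet-related tooling: split a URL into its components.
parts = urlparse("http://ipython-books.github.io/")
host = parts.netloc

# Operating system interfaces: build a path, then extract the file name.
filename = os.path.basename(os.path.join("data", "example.csv"))

print(title)
print(parts.scheme, host)
print(filename)
```

No third-party package is installed or imported here, which is the point of the philosophy.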

In the last twenty years, Python has been increasingly used for scientific computing and data analysis as well. Other competing platforms include commercial software such as MATLAB, Maple, Mathematica, Excel, SPSS, SAS, and others. Competing open-source platforms include Julia, R, Octave, and Scilab. These tools are dedicated to scientific computing, whereas Python is a general-purpose programming language that was not initially designed for scientific computing.

However, a wide ecosystem of tools has been developed to bring Python to the level of these other scientific computing systems. Today, the main advantage of Python, and one of the main reasons why it is so popular, is that it brings scientific computing features to a general-purpose language that is used in many research areas and industries. This makes the transition from research to production much easier.

What is IPython?

IPython is a Python library that was originally meant to improve the default interactive console provided by Python, and to make it scientist-friendly. In 2011, ten years after the first release of IPython, the IPython Notebook was introduced. This web-based interface to IPython combines code, text, mathematical expressions, inline plots, interactive figures, widgets, graphical interfaces, and other rich media within a standalone sharable web document. This platform provides an ideal gateway to interactive scientific computing and data analysis. IPython has become essential to researchers, engineers, data scientists, and teachers and their students.

What is Jupyter?

Within a few years, IPython gained incredible popularity among the scientific and engineering communities. The Notebook started to support more and more programming languages beyond Python. In 2014, the IPython developers announced the Jupyter project, an initiative created to improve the implementation of the Notebook and make it language-agnostic by design. The name of the project reflects the importance of three of the main scientific computing languages supported by the Notebook: Julia, Python, and R.

Today, Jupyter is an ecosystem by itself that comprises several alternative Notebook interfaces (JupyterLab, nteract, Hydrogen, and others), interactive visualization libraries, and authoring tools compatible with notebooks. Jupyter has its own conference named JupyterCon. The project received funding from several companies as well as the Alfred P. Sloan Foundation and the Gordon and Betty Moore Foundation.

What is the SciPy ecosystem?

SciPy is the name of a Python package for scientific computing, but it also refers, more generally, to the collection of all Python tools that have been developed to bring scientific computing features to Python.

In the late 1990s, Travis Oliphant and others started to build efficient tools to deal with numerical data in Python: Numeric, Numarray, and finally, NumPy. SciPy, which implements many numerical computing algorithms, was also created on top of NumPy. In the early 2000s, John Hunter created Matplotlib to bring scientific graphics to Python. At the same time, Fernando Perez created IPython to improve interactivity and productivity in Python. In the late 2000s, Wes McKinney created pandas for the manipulation and analysis of numerical tables and time series. Since then, hundreds of engineers and researchers have worked collaboratively on this platform to make SciPy one of the leading open source platforms for scientific computing and data science.

Note

Many of the SciPy tools are supported by NumFOCUS, a nonprofit that was created as a legal structure to promote the sustainable development of the ecosystem. NumFOCUS is supported by several large companies including Microsoft, IBM, and Intel.

SciPy has its own conferences, too: SciPy (in the US) and EuroSciPy (in Europe) (see https://conference.sci)

What's new in the SciPy ecosystem?

What are some of the main changes in the SciPy ecosystem since the first edition of this book, published in 2014? We give here a very brief selection.

Tip

Feel free to skip this section if you are new to the platform.

The last version of IPython at the time of writing is IPython 6.0, released in April 2017. It is the first version of IPython that is no longer compatible with Python 2. This decision allowed the developers to make the internal code simpler and to make better use of the new features of the language.

IPython now has a web-based Terminal interface that can be used
