About the author
Packt is Searching for Authors Like You
Preface
Who this book is for
What this book covers
Part 1 – Interactive Computing with Jupyter
Part 2 – Standard Methods in Data Science and Applied Mathematics
To get the most out of this book
Installing Python
GitHub repositories
Download the example code files
Download the color images
What's new in the SciPy ecosystem?
How to install Python
The IPython terminal
IPython and text editor
The Jupyter Notebook
Integrated Development Environments
Writing high-quality Python code
Workflows with unit testing
Unit testing and continuous integration
Debugging code with IPython
The Notebook ecosystem
Architecture of the Jupyter Notebook
Connecting multiple clients to one kernel
Why are NumPy arrays efficient?
What is the difference between in-place and implicit-copy operations?
Why can't some arrays be reshaped without a copy?
What are NumPy broadcasting rules?
Compiler-related installation instructions
Using Python to write faster code
Exploration, inference, decision, prediction
Univariate and multivariate methods
Frequentist and Bayesian methods
Parametric and nonparametric inference methods
Exploring a dataset with pandas and Matplotlib
How to do it
There's more
Getting started with statistical hypothesis testing — a simple z-test
Computation of the posterior distribution
Maximum a posteriori estimation
Pearson's correlation coefficient
Contingency table and chi-squared test
Feature selection and feature extraction
Overfitting, underfitting, and the bias-variance tradeoff
Model selection
Machine learning references
Getting started with scikit-learn
Getting ready
How to do it
How it works
scikit-learn API
Ordinary Least Squares regression
Polynomial interpolation with linear regression
The objective function
Local and global minima
Constrained and unconstrained optimization
Deterministic and stochastic algorithms
Analog and digital signals
The Nyquist–Shannon sampling theorem
The discrete Fourier transform
Inverse Fourier transform
What are linear filters?
Linear filters and convolutions
The FIR and IIR filters
Filters in the frequency domain
The low-, high-, and band-pass filters
There's more
Geographical information systems in Python
Index
IPython Interactive Computing and Visualization Cookbook
Second Edition
Copyright © 2018 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Commissioning Editor: Veena Pagare
Acquisition Editor: Dominic Shakeshaft
Project Editor: Suzanne Coutinho
Technical Editors: Bhagyashree Rai, Nidhisha Shetty
Proofreader: Safis Editing
Indexer: Aishwarya Gangawane
Graphics: Tom Scaria
Production Coordinator: Shantanu Zagade
First published: September 2014
Second Edition: January 2018
mapt.io
Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry leading tools to help you plan your personal development and advance your career. For more information, please visit our website.
Why subscribe?
Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals
Learn better with Skill Plans built especially for you
Get a free eBook or video every month
Mapt is fully searchable
Copy and paste, print, and bookmark content
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at <service@packtpub.com> for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
About the author
Cyrille Rossant, PhD, is a neuroscience researcher and software engineer at University College London. He is a graduate of École Normale Supérieure, Paris, where he studied mathematics and computer science. He has also worked at Princeton University and Collège de France. While working on data science and software engineering projects, he has gained experience in numerical computing, parallel computing, and high-performance data visualization.
He is the author of Learning IPython for Interactive Computing and Data Visualization, Second Edition, Packt Publishing, the prequel of this cookbook.
I'm grateful to everyone who gave their feedback on this book, including Matthias Bussonnier, Thomas Caswell, Guillaume Gay, Brian Granger, Matthew Rocklin, Steven Silvester, and Jake VanderPlas. I'd also like to thank my family for their support.
Packt is Searching for Authors Like You
If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
Preface
We are becoming awash in the flood of digital data from scientific research, engineering, economics, politics, journalism, business, and many other domains. As a result, analyzing, visualizing, and harnessing data is the occupation of an increasingly large and diverse set of people. Quantitative skills such as programming, numerical computing, mathematics, statistics, and data mining, which form the core of data science, are more and more appreciated in a seemingly endless plethora of fields.
Python, a widely known programming language, is also one of the leading open platforms for data science. IPython is a mature Python project that provides scientist-friendly interactive access to Python. It is part of the broader Project Jupyter, which aims to provide high-quality environments for interactive computing, data analysis, visualization, and the authoring of interactive scientific documents. Jupyter is estimated to have several million users today.
The prequel of this book, Learning IPython for Interactive Computing and Data Visualization, Second Edition, Packt Publishing, was published in 2015, two years after the first edition. It is a beginner-level introduction to data science and numerical computing with Python, IPython, and Jupyter.
This book, the first edition of which was published in 2014, continues that journey by presenting more than 100 recipes for interactive scientific computing and data science. These recipes not only cover programming topics such as numerical computing, high-performance computing, parallel computing, and interactive visualization, but also data analysis topics such as statistics, data mining, machine learning, signal processing, graph theory, numerical optimization, and many others.
This second edition is fully compatible with the latest versions of the platform and its libraries. It includes new recipes to better leverage the latest features of Python 3, and it introduces promising new projects such as JupyterLab, Altair, and Dask.
Note
By design, this book favors breadth over depth. A particularly wide range of libraries and techniques is covered in this book, but not comprehensively. We give many references that let you deepen your knowledge of individual methods. The goal of this book is not to make you an expert in the subjects covered, but to give you a glimpse of the extremely diverse set of applications that you can tackle with the platform.
Every recipe in this book, each covering a specific technique, is available online as a Jupyter notebook. This interactive document lets you read, execute, and modify the code interactively, which makes the learning process more engaging and dynamic.
Almost all of this book's content is available online on the GitHub platform (http://ipython-books.github.io/). Updates and corrections will be regularly published there, so you should make sure you check out the latest version of the book online.
Who this book is for
This book targets researchers, engineers, data scientists, teachers, students, analysts, journalists, economists, and hobbyists interested in data analysis and numerical computing.
Readers familiar with the scientific Python ecosystem will find many resources to sharpen their skills in high-performance interactive computing with IPython and Jupyter.
Readers who need to implement algorithms for domain-specific applications will appreciate the introductions to a wide variety of topics in data analysis and applied mathematics.
Readers who are new to numerical computing with Python should start with the prequel of this book, Learning IPython for Interactive Computing and Data Visualization, Second Edition, Packt Publishing, published in 2015.
What this book covers
This book is split into two parts:
Part 1 (chapters 1 to 6) covers relatively advanced methods in interactive numerical computing, high-performance computing, and data visualization.
Part 2 (chapters 7 to 15) introduces standard methods in data science and mathematical modeling. Many of these methods are applied to real-world data.
Part 1 – Interactive Computing with Jupyter
Chapter 1, A Tour of Interactive Computing with Jupyter and IPython, contains a brief introduction to data analysis and numerical computing with IPython and Jupyter. It not only covers common packages such as Python, NumPy, pandas, and Matplotlib, but also advanced IPython/Jupyter topics such as interactive widgets in the Notebook, custom magic commands, configurable IPython extensions, and custom Jupyter kernels.
Chapter 2, Best Practices in Interactive Computing, details best practices to write reproducible, high-quality code: task automation, version control with Git, workflows with IPython and Jupyter, unit testing, continuous integration, debugging, and other related topics. The importance of these subjects in computational research and data analysis cannot be overstated.
Chapter 3, Mastering the Jupyter Notebook, covers topics related to the Jupyter Notebook, notably the Notebook format, notebook conversions, and interactive widgets.
Chapter 4, Profiling and Optimization, covers methods to make your code faster and more efficient: CPU and memory profiling in Python, advanced optimization techniques with NumPy (including large array manipulations), and memory mapping of huge arrays. These techniques are essential for big data analysis.
Chapter 5, High-Performance Computing, covers techniques to make your code much faster: code acceleration with Numba and Cython, wrapping C libraries in Python with ctypes, parallel computing with IPython and Dask, OpenMP, and General-Purpose Computing on Graphics Processing Units (GPGPU) with CUDA. The chapter ends with an introduction to the Julia language, a high-performance numerical computing programming language that can be used in the Jupyter Notebook.
Chapter 6, Data Visualization, introduces several visualization or interactive visualization libraries, such as matplotlib, seaborn, bokeh, D3, Altair, and others.
Part 2 – Standard Methods in Data Science and Applied Mathematics
Chapter 7, Statistical Data Analysis, covers methods for getting insights into data. It introduces classic frequentist and Bayesian methods for hypothesis testing, parametric and nonparametric estimation, and model inference. The chapter leverages Python libraries such as pandas, SciPy, statsmodels, and PyMC. The last recipe introduces the statistical language R, which can be easily used in the Jupyter Notebook.
Chapter 8, Machine Learning, covers methods to learn and make predictions from data. Using the scikit-learn Python package, this chapter illustrates fundamental data mining and machine learning concepts such as supervised and unsupervised learning, classification, regression, feature selection, feature extraction, overfitting, regularization, cross-validation, and grid search. Algorithms addressed in this chapter include logistic regression, Naive Bayes, K-nearest neighbors, support vector machines, random forests, and others. These methods are applied to various types of datasets: numerical data, images, and text.
Chapter 9, Numerical Optimization, covers minimizing and maximizing mathematical functions. This topic is pervasive in data science, notably in statistics, machine learning, and signal processing. This chapter illustrates a few root-finding, minimization, and curve-fitting routines with SciPy.
Chapter 10, Signal Processing, covers extracting relevant information from complex and noisy data. These steps are sometimes required prior to running statistical and data mining algorithms. This chapter introduces basic signal processing methods such as Fourier transforms and digital filters.
Chapter 11, Image and Audio Processing, covers signal processing methods for images and sounds. It introduces image filtering, segmentation, computer vision, and face detection with scikit-image and OpenCV. It also presents methods for audio processing and synthesis.
Chapter 12, Deterministic Dynamical Systems, describes the dynamical processes underlying particular types of data. It illustrates simulation techniques for discrete-time dynamical systems, as well as for ordinary differential equations and partial differential equations.
Chapter 13, Stochastic Dynamical Systems, describes the dynamical random processes underlying particular types of data. It illustrates simulation techniques for discrete-time Markov chains, point processes, and stochastic differential equations.
Chapter 14, Graphs, Geometry, and Geographic Information Systems, covers analysis and visualization methods for graphs, flight networks, road networks, maps, and geographic data.
Chapter 15, Symbolic and Numerical Mathematics, introduces SymPy, a computer algebra system that brings symbolic computing to Python. The chapter ends with an introduction to Sage, another Python-based system for computational mathematics.
To get the most out of this book
This book is accessible to beginners. However, it may be easier for you if you are familiar with the contents of Learning IPython for Interactive Computing and Data Visualization, Second Edition, Packt Publishing (also called the "IPython minibook"), the prequel of this book. The minibook introduces Python programming, the IPython console, the Jupyter Notebook, numerical computing with NumPy, basic data analysis with pandas, and plotting with Matplotlib. This book tackles scientific programming topics that rely on all of these tools.
Part 2 is a bit more theoretical. It is easier to read if you know the basics of calculus, linear algebra, and probability theory (real-valued functions, integrals and derivatives, differential equations, matrices, vector spaces, probabilities, random variables, and so on). These chapters introduce different topics in data science and applied mathematics, and how to apply them with Python: statistics, machine learning, numerical optimization, signal processing, dynamical systems, graph theory, and others.
Installing Python
This book uses the free Anaconda distribution (https://www.anaconda.com/download/). It includes Python 3, IPython, Jupyter, and almost all of the packages that we will be using in this book. Anaconda also includes a powerful packaging system named Conda. The introduction of this book's first chapter gives you more details.
The code of this book has been written for Python 3 and is not compatible with the older version of Python, Python 2 (although minimal to no changes would be required to make it compatible).
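A quick way to confirm that an interpreter meets the Python 3 requirement is to inspect sys.version_info from the standard library. The following is only a minimal sketch; the helper name is_python3 is hypothetical and does not come from the book.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True if the given version tuple belongs to Python 3 or later."""
    # The first element of the tuple is the major version number (2, 3, ...).
    return version_info[0] >= 3


# Report on the interpreter that is running this script.
if is_python3():
    print("Python", sys.version.split()[0], "can run the code of this book.")
else:
    print("Python 3 is required.")
```

Passing an explicit tuple, such as is_python3((2, 7, 18)), makes the check easy to try without switching interpreters.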
GitHub repositories
This book has a website: http://ipython-books.github.io. The text, the code, and the data from the book are available on several GitHub repositories at https://github.com/ipython-books/. You can also run the code interactively in your web browser without installing anything on your computer, thanks to the Binder project.
Be sure to check out http://ipython-books.github.io and the repositories to get the latest updates and corrections. You can also propose your own corrections and suggestions on GitHub by opening issues or pull requests.
You can also follow the author online (http://cyrille.rossant.net) and on Twitter (@cyrillerossant).
Download the example code files
You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files emailed directly to you.
You can download the code files by following these steps:
1. Log in or register at http://www.packtpub.com.
2. Select the SUPPORT tab.
3. Click on Code Downloads & Errata.
4. Enter the name of the book in the Search box and follow the on-screen instructions.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR / 7-Zip for Windows
Zipeg / iZip / UnRarX for Mac
7-Zip / PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
Download the color images
We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://www.packtpub.com/sites/default/files/downloads/
Conventions used
There are a number of text conventions used throughout this book.
CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example:
A block of code is set as follows:
Bold: Indicates a new term, an important word, or words that you see on the screen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "Select System info from the Administration panel."
In this book, you will find several headings that appear frequently (Getting ready, How to do it…, How it works…, There's more…, and See also).
To give clear instructions on how to complete a recipe, use these sections as follows:
Getting ready
This section tells you what to expect in the recipe and describes how to set up any software or any preliminary settings required for the recipe.
How to do it…
This section contains the steps required to follow the recipe.
How it works…
This section usually consists of a detailed explanation of what happened in the previous section.
Get in touch
Feedback from our readers is always welcome.
General feedback: Email <feedback@packtpub.com> and mention the book's title in the subject of your message. If you have questions about any aspect of this book, please email us at <questions@packtpub.com>.
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit http://www.packtpub.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.
Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at <copyright@packtpub.com> with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit http://authors.packtpub.com.
Reviews
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit packtpub.com.
Chapter 1. A Tour of Interactive Computing with Jupyter and IPython
In this chapter, we will cover the following topics:
Introducing IPython and the Jupyter Notebook
Getting started with exploratory data analysis in the Jupyter
(the name was inspired by the British comedy Monty Python's Flying Circus). This easy-to-use language is commonly used by system administrators as a glue language, linking various system components together. It is also a robust language for large-scale software development. In addition, Python comes with an extremely rich standard library (the batteries included philosophy), which covers string processing, internet protocols, operating system interfaces, and many other domains.
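As a small illustration of the batteries included idea, the sketch below touches the three domains just mentioned using only standard library modules. The choice of modules (textwrap, urllib.parse, os.path) and the sample values are ours, not the book's.

```python
import os
import textwrap
from urllib.parse import urlparse

# String processing: wrap a sentence into lines of at most 30 characters.
text = "Python comes with an extremely rich standard library."
wrapped = textwrap.wrap(text, width=30)

# Internet protocols: split a URL into its components.
url = urlparse("http://ipython-books.github.io/")
host = url.netloc

# Operating system interfaces: build and inspect a path portably.
# "data" and "example.csv" are made-up names used only for illustration.
path = os.path.join("data", "example.csv")
filename = os.path.basename(path)

print(wrapped)
print(url.scheme, host)
print(filename)
```

None of these calls requires installing anything beyond Python itself.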
In the last twenty years, Python has been increasingly used for scientific computing and data analysis as well. Other competing platforms include commercial software such as MATLAB, Maple, Mathematica, Excel, SPSS, SAS, and others. Competing open-source platforms include Julia, R, Octave, and Scilab. These tools are dedicated to scientific computing, whereas Python is a general-purpose programming language that was not initially designed for scientific computing.
However, a wide ecosystem of tools has been developed to bring Python to the level of these other scientific computing systems. Today, the main advantage of Python, and one of the main reasons why it is so popular, is that it brings scientific computing features to a general-purpose language that is used in many research areas and industries. This makes the transition from research to production much easier.
What is IPython?
IPython is a Python library that was originally meant to improve the default interactive console provided by Python, and to make it scientist-friendly. In 2011, ten years after the first release of IPython, the IPython Notebook was introduced. This web-based interface to IPython combines code, text, mathematical expressions, inline plots, interactive figures, widgets, graphical interfaces, and other rich media within a standalone sharable web document. This platform provides an ideal gateway to interactive scientific computing and data analysis. IPython has become essential to researchers, engineers, data scientists, and teachers and their students.
What is Jupyter?
Within a few years, IPython gained incredible popularity among the scientific and engineering communities. The Notebook started to support more and more programming languages beyond Python. In 2014, the IPython developers announced the Jupyter project, an initiative created to improve the implementation of the Notebook and make it language-agnostic by design. The name of the project reflects the importance of three of the main scientific computing languages supported by the Notebook: Julia, Python, and R.
Today, Jupyter is an ecosystem by itself that comprises several alternative Notebook interfaces (JupyterLab, nteract, Hydrogen, and others), interactive visualization libraries, and authoring tools compatible with notebooks. Jupyter has its own conference named JupyterCon. The project received funding from several companies as well as the Alfred P. Sloan Foundation and the Gordon and Betty Moore Foundation.
What is the SciPy ecosystem?
SciPy is the name of a Python package for scientific computing, but it also refers, more generally, to the collection of all Python tools that have been developed to bring scientific computing features to Python.
In the late 1990s, Travis Oliphant and others started to build efficient tools to deal with numerical data in Python: Numeric, Numarray, and finally, NumPy. SciPy, which implements many numerical computing algorithms, was also created on top of NumPy. In the early 2000s, John Hunter created Matplotlib to bring scientific graphics to Python. At the same time, Fernando Perez created IPython to improve interactivity and productivity in Python. In the late 2000s, Wes McKinney created pandas for the manipulation and analysis of numerical tables and time series. Since then, hundreds of engineers and researchers have worked collaboratively on this platform to make SciPy one of the leading open source platforms for scientific computing and data science.
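To see which of the core packages named above are available in a given environment, one option is to query the import system without actually importing them. This is a minimal sketch using importlib.util.find_spec from the standard library; the helper name installed_packages is hypothetical.

```python
from importlib.util import find_spec

# Import names of the core packages mentioned above.
PACKAGES = ["numpy", "scipy", "matplotlib", "IPython", "pandas"]


def installed_packages(names):
    """Map each top-level module name to True if it can be found, else False."""
    # find_spec returns None when a top-level module cannot be located.
    return {name: find_spec(name) is not None for name in names}


for name, found in installed_packages(PACKAGES).items():
    print(f"{name:12s} {'installed' if found else 'missing'}")
```

Because find_spec only locates modules, the check stays fast even for large packages.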
Note
Many of the SciPy tools are supported by NumFOCUS, a nonprofit that was created as a legal structure to promote the sustainable development of the ecosystem. NumFOCUS is supported by several large companies including Microsoft, IBM, and Intel.
SciPy has its own conferences, too: SciPy (in the US) and EuroSciPy (in Europe) (see https://conference.sci)
What's new in the SciPy ecosystem?
What are some of the main changes in the SciPy ecosystem since the first edition of this book, published in 2014? We give here a very brief selection.
Tip
Feel free to skip this section if you are new to the platform.
The latest version of IPython at the time of writing is IPython 6.0, released in April 2017. It is the first version of IPython that is no longer compatible with Python 2. This decision allowed the developers to make the internal code simpler and to make better use of the new features of the language.
IPython now has a web-based Terminal interface that can be used