Introduction to Machine Learning
Tanujit Chakraborty
Indian Statistical Institute, Kolkata
Email: tanujitisi@gmail.com
July 10, 2019
Statistics
• “Statistics is the universal tool of inductive inference, research in natural and social sciences, and technological applications. Statistics, therefore, must always have purpose, either in the pursuit of knowledge or in the promotion of human welfare.”
  - P. C. Mahalanobis, Father of Statistics in India
• Role of Statistics:
  1. making inference from samples
  2. development of new methods for complex data sets
  3. quantification of uncertainty and variability
• Remember: “Figures won't lie, but liars figure”
Machine Learning
• “Machine learning is the field of study that gives computers the ability to learn without being explicitly programmed”
  - Arthur L. Samuel, AI pioneer
• Role of Machine Learning: efficient algorithms to
  1. solve an optimization problem
  2. represent and evaluate the model for inference
  3. create programs that can automatically learn rules from data
• Remember: “Prediction is very difficult, especially if it’s about the future” - Niels Bohr, Father of Quantum
Introduction to Machine Learning
• Designing algorithms that ingest data and learn a model of the data
• The learned model can be used to
  1. Detect patterns/structures/themes/trends etc. in the data
  2. Make predictions about future data and make decisions
[Figure: whole data (blue) and training set (green)]
• Modern ML algorithms are heavily “data-driven”
• Optimize a performance criterion using example data or past experience
Talk by Tanujit Chakraborty Workshop on Data analytics
Taxonomy for Machine Learning
Machine learning provides systems the ability to automatically learn
Supervised Learning: learning using labeled data
Unsupervised Learning: learning using unlabeled data (usually considered harder)
Many other specialized flavors of ML also exist,
some of which include
RL doesn't use “labeled” or “unlabeled” data in the traditional sense! In RL, an agent learns via its interactions with an environment (feedback-driven “policy” learning)
A Typical Supervised Learning Workflow (for Classification)
Supervised Learning: Predicting patterns in the data
A Typical Unsupervised Learning Workflow (for Clustering)
Unsupervised Learning: Discovering patterns in the data
Note: Unsupervised Learning too can have (and often has) a “test” phase
E.g., in this case, given a new cat/dog image, predict which of the two clusters it belongs to
Can do it by assigning the image to the cluster with the closer centroid
[Figure: cat/dog images grouped into Cluster 1 and Cluster 2]
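The "test" phase described above can be sketched in a few lines. This is a minimal illustration, assuming each image has already been turned into a numeric feature vector; the function name `assign_to_cluster`, the 2-D features and the centroid values are hypothetical and not taken from the talk.

```python
import numpy as np

def assign_to_cluster(x, centroids):
    """Return the index of the centroid closest to x (Euclidean distance)."""
    distances = np.linalg.norm(centroids - x, axis=1)
    return int(np.argmin(distances))

# Hypothetical 2-D feature vectors standing in for image features.
centroids = np.array([[0.0, 0.0],    # centroid of cluster 0
                      [5.0, 5.0]])   # centroid of cluster 1
new_point = np.array([4.0, 4.5])     # features of a new, unseen image

cluster_id = assign_to_cluster(new_point, centroids)
print("assigned cluster:", cluster_id)
```

The centroids themselves would come from a clustering step run beforehand on the unlabeled training images.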
A Typical Reinforcement Learning Workflow
Reinforcement Learning: Learning a “policy” by performing actions and getting rewards (e.g., robot controls, beating games)
The agent:
- Senses/observes the environment
- Takes an action based on its current policy
- Receives a reward for that action
- Updates its policy
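The four steps above can be made concrete with a small sketch. It uses tabular Q-learning with an epsilon-greedy policy, which is one common choice and not necessarily the method the talk had in mind. The five-state corridor environment, the `step` function and all hyperparameter values are hypothetical.

```python
import random

N_STATES = 5        # states 0..4; reaching state 4 ends the episode
ACTIONS = (-1, +1)  # action index 0 = step left, 1 = step right

def step(state, action_index):
    """Hypothetical environment: returns (next_state, reward, done)."""
    next_state = min(max(state + ACTIONS[action_index], 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def choose_action(q_row, epsilon, rng):
    """Epsilon-greedy policy with random tie-breaking."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    return rng.choice([i for i, v in enumerate(q_row) if v == best])

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # action-value table
    for _ in range(episodes):
        state, done = 0, False                 # sense the initial state
        while not done:
            action = choose_action(q[state], epsilon, rng)   # act on current policy
            next_state, reward, done = step(state, action)   # receive a reward
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])  # update policy
            state = next_state
    return q

q_table = train()
# Greedy action (0 = left, 1 = right) in each non-terminal state.
greedy_policy = [max(range(2), key=lambda a: q_table[s][a])
                 for s in range(N_STATES - 1)]
print(greedy_policy)
```

The policy here is implicit in the table `q_table`: acting greedily with respect to it is the learned behaviour, and each update to the table is an update to the policy.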
Classification
Example: Credit scoring
Differentiating between low-risk and high-risk customers from their income and savings
Discriminant: IF Income > θ₁ AND Savings > θ₂ THEN low-risk ELSE high-risk
Classification: Learn a linear/nonlinear separator (the “model”) using training data consisting of input-output pairs (each output is a discrete-valued “label” of the corresponding input)
Use it to predict the labels for new “test” inputs
Other Applications: Image Recognition, Spam Detection, Medical Diagnosis
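The credit-scoring discriminant can be written directly as code. This is a minimal sketch of the rule only; the threshold values `THETA1` and `THETA2` are hypothetical placeholders, and in practice they would be learned from labeled training data.

```python
def classify_customer(income, savings, theta1, theta2):
    """Apply the rule: low-risk only if both thresholds are exceeded."""
    if income > theta1 and savings > theta2:
        return "low-risk"
    return "high-risk"

# Hypothetical thresholds (theta1 for income, theta2 for savings).
THETA1, THETA2 = 50_000, 10_000

print(classify_customer(60_000, 20_000, THETA1, THETA2))
print(classify_customer(60_000, 5_000, THETA1, THETA2))
```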
Regression
• Example: Price of a used car
• X: car attributes; Y: price, and Y = f(X, θ)
• f( ) is the model and θ denotes the model parameters
• Regression: Learn a line/curve (the “model”) using training data consisting of input-output pairs (each output is a real-valued number)
• Other Applications: Process Improvement, Weather Forecasting
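As an illustration of Y = f(X, θ), the sketch below fits a straight line by least squares. The choice of a linear f, the single attribute (car age) and the data values are all assumptions made for the example; the data are synthetic and noise-free so the fitted parameters are easy to check.

```python
import numpy as np

# Hypothetical training data: car age in years and price.
age = np.array([1.0, 2.0, 3.0, 5.0, 8.0])
price = 20_000.0 - 1_500.0 * age          # synthetic, exactly linear

# Design matrix with a column of ones for the intercept.
X = np.column_stack([np.ones_like(age), age])

# Least-squares estimate of theta = (intercept, slope).
theta, *_ = np.linalg.lstsq(X, price, rcond=None)

def predict(age_years, theta):
    """Evaluate the fitted line f(x, theta) = theta[0] + theta[1] * x."""
    return theta[0] + theta[1] * age_years

print("theta:", theta)
print("predicted price at age 4:", predict(4.0, theta))
```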
Clustering: Learn the grouping structure for a given set of unlabeled inputs
Homogeneous groups as latent structure: Clustering
Other Applications: Topic Modelling, Image Segmentation, Social Networking
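One way to learn a grouping structure is k-means (Lloyd's algorithm). The slide does not name a method, so this is a swapped-in example. The data are hypothetical, and the centroids are initialised from the first k points purely to keep the sketch reproducible; real implementations usually use a randomised initialisation.

```python
import numpy as np

def kmeans(X, k, n_iters=100):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    centroids = X[:k].astype(float).copy()   # simplistic, deterministic init
    for _ in range(n_iters):
        # Squared distance from every point to every centroid: shape (n, k).
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it if empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical unlabeled inputs forming two well-separated groups.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels, centroids = kmeans(X, k=2)
print(labels)
print(centroids)
```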
[Figure: original unclustered data and clustered data]
Dimensionality Reduction
• Low-dimensional latent structure: Dimensionality Reduction
• Goal: Learn a low-dimensional representation for a given set of high-dimensional inputs
• Note: DR also comes in supervised flavors (supervised
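A low-dimensional representation can be illustrated with PCA computed via the SVD. PCA is an assumed choice here, since the slide does not name a specific method. The data are hypothetical 3-D points that lie on a line, so one component is enough to reconstruct them.

```python
import numpy as np

def pca_project(X, n_components):
    """Project X onto its top principal directions; returns (Z, components, mean)."""
    mean = X.mean(axis=0)
    X_centered = X - mean
    # Rows of Vt are the principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    Z = X_centered @ components.T
    return Z, components, mean

# Hypothetical 3-D inputs lying along a single direction.
t = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
direction = np.array([1.0, 2.0, 2.0]) / 3.0
X = t[:, None] * direction + np.array([5.0, 0.0, -1.0])

Z, components, mean = pca_project(X, n_components=1)
X_reconstructed = Z @ components + mean
print(Z.shape)
```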
A Simple Example: Fitting a Polynomial
• The green curve is the true function (which is not a polynomial)
• We will use a loss function that measures the squared error in the prediction of y(x) from x. The loss for the red polynomial is the sum of the squared vertical errors
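The loss described above can be sketched as follows. The true function (a sine curve), the noise level and the polynomial degree are assumptions chosen for illustration; the slide only states that the true function is not a polynomial.

```python
import numpy as np

def squared_error_loss(coeffs, x, y):
    """Sum of squared vertical errors between the polynomial and the data."""
    predictions = np.polyval(coeffs, x)   # coeffs ordered highest degree first
    return float(np.sum((predictions - y) ** 2))

# Hypothetical noisy samples from an assumed true function.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=x.shape)

# np.polyfit returns the coefficients that minimise this loss for a given degree.
cubic = np.polyfit(x, y, deg=3)
linear = np.polyfit(x, y, deg=1)
loss_cubic = squared_error_loss(cubic, x, y)
loss_linear = squared_error_loss(linear, x, y)
print(loss_cubic, loss_linear)
```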
Some fits to the data: which is best?
The right model complexity?
Desired: hypotheses that are not too simple, not too complex (so as to not overfit on
the training data)
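The question of model complexity can be explored by fitting polynomials of several degrees and comparing the error on the training points with the error on held-out points. This is a sketch under assumed settings: the sine-shaped true function, the noise level and the degrees 1, 3 and 9 are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_function(x):
    # Assumed ground truth for illustration; not specified in the slides.
    return np.sin(2 * np.pi * x)

x_train = np.linspace(0.0, 1.0, 10)
y_train = true_function(x_train) + rng.normal(scale=0.2, size=x_train.shape)
x_test = np.linspace(0.05, 0.95, 50)
y_test = true_function(x_test) + rng.normal(scale=0.2, size=x_test.shape)

def mean_squared_error(coeffs, x, y):
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

train_errors, test_errors = {}, {}
for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    train_errors[degree] = mean_squared_error(coeffs, x_train, y_train)
    test_errors[degree] = mean_squared_error(coeffs, x_test, y_test)
    print(degree, train_errors[degree], test_errors[degree])
```

Training error can only go down as the degree grows, because each higher-degree family contains the lower-degree ones. The held-out error is the quantity to watch when choosing the complexity.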
Overfitting and Generalization
• Doing well on the training data is not enough for an ML algorithm
• Trying to do too well (or perfectly) on training data may lead to bad “generalization”
• Generalization: Ability of an ML algorithm to do well on future “test” data
• Simple models/functions tend to prevent overfitting and generalize well: a key principle in designing ML algorithms (called “regularization”)
• No Free Lunch Theorem
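One common form of regularization is an L2 penalty on the model weights (ridge regression), shown here for a degree-9 polynomial. Ridge is an assumed example; the slide names the principle, not a specific technique. The data, the degree and the penalty strengths are hypothetical, and for brevity the intercept is penalised along with the other weights.

```python
import numpy as np

def fit_ridge_polynomial(x, y, degree, lam):
    """Minimise sum of squared errors + lam * ||w||^2 in closed form."""
    Phi = np.vander(x, degree + 1, increasing=True)   # columns 1, x, x^2, ...
    A = Phi.T @ Phi + lam * np.eye(degree + 1)
    return np.linalg.solve(A, Phi.T @ y)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.shape)

w_weak = fit_ridge_polynomial(x, y, degree=9, lam=1e-3)    # light penalty
w_strong = fit_ridge_polynomial(x, y, degree=9, lam=1.0)   # heavy penalty

print(np.linalg.norm(w_weak), np.linalg.norm(w_strong))
```

A larger penalty shrinks the weight vector, which pushes the fitted curve toward a simpler, smoother function.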
Probabilistic Machine Learning
• Supervised Learning (“predict y given x”) can be thought of as estimating p(Y|X)
[Figure: images labeled “dog” and “cat”; a two-step approach, “generative modeling”: p(image, class) → p(class | image); unlabeled training data]
• Harder for Unsupervised Learning because there is no supervision y
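The two-step "generative" route from a joint distribution to p(Y|X) can be shown on a tiny discrete example. The joint table below is hypothetical, with three possible values of x and two classes.

```python
import numpy as np

# Hypothetical joint distribution p(x, y): rows index x, columns index the class.
classes = ["cat", "dog"]
joint = np.array([[0.30, 0.10],
                  [0.05, 0.25],
                  [0.15, 0.15]])

def posterior(joint, x_index):
    """p(y | x) = p(x, y) / p(x), where p(x) sums the joint over classes."""
    p_x = joint[x_index].sum()
    return joint[x_index] / p_x

for i in range(joint.shape[0]):
    print(i, dict(zip(classes, posterior(joint, i))))
```

A discriminative model would estimate p(y | x) directly instead of going through the joint.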
Function Approximation in Machine Learning
[Figure: a function maps an image to a latent representation of the image (e.g., cluster id)]
Machine Learning: A Brief Timeline and Some Milestones
- Minsky & Edmonds’ neural net machine (SNARC)
- Arthur Samuel's Checkers Player based on Machine Learning
- Rosenblatt's Perceptron algo
- Origins of Bayes Theorem:
Thomas Bayes and Pierre-Simon
Laplace (~1800)
- Least Squares method: Legendre
(1805)
- Widrow-Hoff's ADALINE algo
- K-means clustering algo (Lloyd)
- Early origins of Reinforcement Learning
- Neural nets slumber due to lack of compute power
- Support Vector Machines (SVM), Kernel methods, Bayesian methods
- Random Forests, Boosting
- Continued work on neural nets for images, sequences (CNN, LSTM, etc)
- Automatic Differentiation (later’
- Software frameworks (e.g., TensorFlow, PyTorch) that ease implementing ML algos
- Drones, self-driving cars, etc
- A lot of industry/media focus, excitement and hype
- Focus on Fairness, Accountability, and Transparency in ML algorithms
- Early works on PCA (Pearson), Factor Analysis (Spearman), CCA (by Hotelling), for exploratory data analysis
- Early works on Discriminant Analysis
methods (Fisher) for classification
- McCulloch-Pitts model of the neuron
- Nearest Neighbors algorithm
- Early works on Genetic Algorithms inspired by natural evolution (John Holland)
- Multi-layer Perceptrons (can learn nonlinear functions)
Sunny days of AI are back!
- The Backpropagation algorithm
to train deep neural nets
- Decision Trees, ID3 (Quinlan)
- NetTalk: Neural nets that can learn to pronounce English words
- Modern Reinforcement Learning
[Timeline period label: 2010-2020]
- Continued focus on classical
statistical and probabilistic models,
connections b/w learning & cognition
as opposed to rule-based methods
- ML based algorithm wins the Netflix Challenge
- Neural nets re-emerged and rebranded as Deep Learning (Hinton, Bengio, LeCun, Ng, and others), thanks to improved training, GPUs
Machine Learning in the real-world
Broadly applicable in many domains (e.g., internet, robotics, healthcare and biology, computer vision, NLP, databases, computer systems, finance, etc.)
Machine Learning helps Natural Language Processing
ML algorithm can learn to translate text
[Figure: English-to-Hindi translation example]
Machine Learning meets Speech Processing
Machine Learning helps Computer Vision
• Automatic generation of text captions for images: A convolutional neural network is trained to interpret images, and its output is then used by a recurrent neural network trained to generate a text caption
• The sequence at the bottom shows the word-by-word focus of the network on different parts of input image while it generates the caption word-by-word
[Figure: input image → convolutional feature extraction → RNN with attention over image → word-by-word generation]
Machine Learning helps Recommendation systems
• A recommendation system is a machine-learning system that is based on data that indicate links between a set of users (e.g., people) and a set of items (e.g., products)
• A link between a user and a product means that the user has indicated an interest in the product in some fashion (perhaps by purchasing that item in the past)
• The machine-learning problem is to suggest other items to a given user that he or she may also be interested in, based on the data across all users
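A very simple way to use the user-item links is item co-occurrence: items that often share users with the items a person already likes are suggested first. This is only one of many possible approaches and is not claimed to be the one intended in the talk. The link matrix below is hypothetical.

```python
import numpy as np

def recommend(links, user, top_n=1):
    """Rank unseen items by how often they co-occur with the user's items."""
    co = links.T @ links                  # item-by-item co-occurrence counts
    np.fill_diagonal(co, 0)               # ignore an item's link with itself
    scores = (links[user] @ co).astype(float)
    scores[links[user] > 0] = -np.inf     # never re-suggest known items
    ranked = np.argsort(-scores)
    return [int(i) for i in ranked[:top_n] if np.isfinite(scores[i])]

# Hypothetical links: rows are users, columns are items, 1 = user showed interest.
links = np.array([[1, 1, 0, 0],
                  [1, 1, 1, 0],
                  [0, 0, 1, 1],
                  [1, 0, 0, 0]])

print(recommend(links, user=3))
print(recommend(links, user=3, top_n=2))
```

User 3 is linked only to item 0, which shares two users with item 1 and one user with item 2, so item 1 is ranked ahead of item 2.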
Machine Learning helps Chemistry
[Figure: generative-model schematics. RL: Reinforcement learning; RNN: Recurrent neural network; hybrid approaches. Policy gradient with Monte Carlo tree search (MCTS): incomplete SMILES (state), next action (char), MC search, reward upon completion, metrics]
Source: “Inverse molecular design using machine learning: Generative models for matter engineering” (Science, 2018)
Machine Learning helps Image Recognition
Machine Learning helps Many Other Areas
Biology
[Figure: images of neurons → convolutional layers → fully connected layers]
Finance