Draft of Chapter 1
With Regards, Ritesh Bhagwat
Note: The following is a draft version.
Introduction to Data Driven computing & AI
The journey into the world of artificial intelligence is extraordinary. It is extraordinary because it shows us that just by changing our perspective on something we already know, we can learn something new and amazing. The foundation of all artificial intelligence is built on things that most of us have already studied in school and college. If you have studied math up to high school level, chances are that none of the building blocks in this book will be new to you. But I can assure you that we will learn something new from all the things we already know. This is the very beauty of artificial intelligence. So, let us get started on this journey of a lifetime.
When I was growing up as a teenager in the 1990s, the way to stand out in a conversation was to talk about intelligent, scientific things. If we knew what a “light year” is or what “supersonic” means, we appeared intelligent. If we could do a complex math calculation quickly, we were hailed as geniuses. All these traits were essentially about remembering data and manipulating it. For a very long time, human intelligence has been judged on our memory and on how we process the data stored in our brains. With the arrival of smartphones and similar technology, the definition of human intelligence is going to change. With artificial intelligence, data processing can be done on a gadget as small as our phone, so the need to remember or memorize data will not be so meaningful. But we as human beings will have to evolve to a higher level. Maybe knowledge and information will be for machines and wisdom will be for us humans, which is a good thing, as we will evolve in consciousness.
If you are reading this book, I’m sure you must have heard that AI is taking over and that AI is going to change the world. Have you ever wondered what AI is? How is AI, or modern computing, different from traditional computing? Let us do a fun activity to understand why we need AI and what type of questions AI tries to answer.
The following table lists 10 famous personalities of the world along with their domain of work and gender.
S. No.  Name                   Domain of work         Gender
1       Roger Federer          Sports                 M
2       Sachin Tendulkar       Sports                 M
3       Mahatma Gandhi         Leadership             M
5       Nelson Mandela         Leadership             M
6       Robert Downey Junior   Art & Movies           M
7       Tom Cruise             Art & Movies           M
8       Steve Jobs             Tech                   M
9       Scarlett Johansson     Art & Movies           F
10      Bill Gates             Tech & Philanthropy    M
Now try to answer the following questions:
1. How many of the above personalities are female?
2. Name the personalities whose domain of work is Tech.
3. Are there any female personalities in the above list whose domain of work is Art & Movies?
4. Who are the two most popular personalities in the list?
5. Whose domain of work is the best among the above personalities?
The answers to the first 3 questions are very simple:
1. 2
2. Steve Jobs & Bill Gates
3. Yes (Scarlett Johansson)
How about the fourth and fifth questions? Do we have a universal answer to them? No, we don’t. These questions are subjective, not well defined, or vague in nature. For every individual, the definition of a popular personality or of the best domain of work is different.
If we have 1000 respondents to answer these 5 questions:
• All 1000 respondents, if they answer correctly, will give the same answers to the first three questions
• We will most probably get different answers to the 4th and 5th questions
Let us look at the example in a different way. We can get the answers to the first three questions by setting rules. We can write a simple program that scans the domain of work and gender of our personalities and gives us the answers. So, we have data, we set rules, and we get answers.
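The “data plus rules gives answers” idea can be sketched in a few lines of Python. This is only a minimal illustration: the list below holds the rows that appear in the table above, and the variable names are my own choices. It applies the rules for questions 2 and 3.

```python
# Each record is (name, domain of work, gender), copied from the table above.
personalities = [
    ("Roger Federer", "Sports", "M"),
    ("Sachin Tendulkar", "Sports", "M"),
    ("Mahatma Gandhi", "Leadership", "M"),
    ("Nelson Mandela", "Leadership", "M"),
    ("Robert Downey Junior", "Art & Movies", "M"),
    ("Tom Cruise", "Art & Movies", "M"),
    ("Steve Jobs", "Tech", "M"),
    ("Scarlett Johansson", "Art & Movies", "F"),
    ("Bill Gates", "Tech & Philanthropy", "M"),
]

# Rule for question 2: the domain of work mentions "Tech".
tech = [name for name, domain, gender in personalities if "Tech" in domain]

# Rule for question 3: gender is "F" and the domain is "Art & Movies".
female_art = [
    name
    for name, domain, gender in personalities
    if gender == "F" and domain == "Art & Movies"
]

print(tech)        # ['Steve Jobs', 'Bill Gates']
print(female_art)  # ['Scarlett Johansson']
```

Every respondent who applies these rules to the same data gets the same answers, which is exactly what makes the first three questions rule-driven.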
Now, how can we answer the fourth and fifth questions? The best we can do is ask all our 1000 respondents to vote with their answers. The most common answer becomes the rule. If, for the fifth question, the most common answer is that the best line of work is “Art & Movies”, then that becomes our answer. Do remember that if we change our respondents, the answer can also change. So, what we are doing here is: we have data, we give answers to the data, and we get a rule.
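The voting idea can be sketched as follows. The votes below are hypothetical and invented purely for illustration (they are not real survey data), and the function name is my own.

```python
from collections import Counter

# Hypothetical votes for question 5 from a handful of respondents.
votes = ["Art & Movies", "Sports", "Art & Movies", "Tech", "Art & Movies", "Sports"]


def most_common_answer(answers):
    """Return the answer given by the largest number of respondents."""
    return Counter(answers).most_common(1)[0][0]


rule = most_common_answer(votes)
print(rule)  # Art & Movies

# A different (also hypothetical) group of respondents can produce a different rule.
other_votes = ["Sports", "Tech", "Sports"]
print(most_common_answer(other_votes))  # Sports
```

Here the answers go in and the rule comes out, and the rule depends on who answered.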
We can see the difference between the two approaches. One is rule-driven, and the other is answer-driven.
A problem for which we cannot set rules to get answers qualifies as a problem that should be solved by Artificial Intelligence.
In the context of AI, problems that we can solve with rules come under the category of traditional or classical computing, and the ones for which we can’t set rules are the area of Artificial Intelligence, which I loosely term modern computing.
Do note that rule-based questions can also be solved by artificial intelligence, but it is not worth solving them with AI, as it is computationally expensive. It is like having a pizza, a knife and a sword. You should always use the knife to cut the pizza and not the sword. You can cut the pizza with a sword, but it is not worth it.
Now that we have a bit of understanding about AI, let us use a simple example to see what we mean when we say that everything in AI is built on things that we already know.
Suppose you run an OTT platform like Netflix or Amazon Prime and you have three loyal customers: Steve, Natasha and Tony. These customers watch movies and give them ratings. A fourth customer, Scott, logs into the platform, watches two movies, Iron Man and Jerry Maguire, and rates them. We have to recommend more movies to Scott to keep him hooked to our platform.
How can we do this using the data that we have? If we can figure out which of Steve, Tony and Natasha has movie preferences like Scott’s, then we can recommend to Scott the other movies watched by that customer. Surprisingly, we can use high school math to do this! Let us see how it works.
Let’s assume the ratings given by the customers are represented in the following table.
Customer             Iron Man   Jerry Maguire
Steve                2          1
Natasha              3          3
Tony                 5          4
Scott (logs in)      4          2
The easiest way to see which of the three customers’ preferences are closest to Scott’s is to subtract the scores given by Scott from the scores given by the other customers. In other words, we can find the “distance” between Scott’s scores and the scores of Steve, Natasha and Tony by taking the difference between the corresponding movie ratings.
Let us say the rating of Iron Man is represented by x and the rating of Jerry Maguire by y. Then the distance can be calculated by the formula:
• |x1-x2| + |y1-y2|
Where:
• x1: Rating of Iron Man by an old customer (Steve, Natasha or Tony)
• x2: Rating of Iron Man by Scott
• y1: Rating of Jerry Maguire by an old customer (Steve, Natasha or Tony)
• y2: Rating of Jerry Maguire by Scott
|K| represents the absolute value of K, which is never negative. For example:
• |3| = 3
• |-3| = 3
This way of computing distance using absolute values is known as the Manhattan distance.
Customer             Iron Man   Jerry Maguire
Steve                2          1
Natasha              3          3
Tony                 5          4
Scott (logs in)      4          2
Referring to the above table, the Manhattan distance between:
• Scott and Steve = |4-2| + |2-1| = 3
• Scott and Natasha = |4-3| + |2-3| = 2
• Scott and Tony = |4-5| + |2-4| = 3
So, we can see that the Manhattan distance between Natasha and Scott is the lowest of the three, hence Natasha’s movie preferences should be closest to Scott’s. We can go ahead and recommend to Scott all the other movies that Natasha has watched and rated highly. Chances are that he will also like those movies.
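The same calculation can be sketched in Python. The ratings are the ones used in the worked example; the dictionary layout and the function name are my own choices for illustration.

```python
# Ratings as (Iron Man, Jerry Maguire), taken from the worked example.
ratings = {"Steve": (2, 1), "Natasha": (3, 3), "Tony": (5, 4)}
scott = (4, 2)


def manhattan(a, b):
    """Sum of the absolute differences between corresponding ratings."""
    return sum(abs(p - q) for p, q in zip(a, b))


# Distance from Scott to each existing customer.
distances = {name: manhattan(r, scott) for name, r in ratings.items()}
print(distances)

# The customer with the smallest distance has the most similar taste.
closest = min(distances, key=distances.get)
print(closest)  # Natasha
```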
This was the Manhattan distance. Most of us are more familiar with the Euclidean distance, which can also be used to solve the same problem. The Euclidean distance is calculated by the formula:
• Sqrt((x1-x2)^2 + (y1-y2)^2)
Where:
• x1: Rating of Iron Man by an old customer (Steve, Natasha or Tony)
• x2: Rating of Iron Man by Scott
• y1: Rating of Jerry Maguire by an old customer (Steve, Natasha or Tony)
• y2: Rating of Jerry Maguire by Scott
Euclidean distance between
• Scott and Steve: Sqrt (5) = 2.24
• Scott and Natasha: Sqrt (2) = 1.41
• Scott and Tony: Sqrt (5) = 2.24
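As a quick check, the Euclidean distances can be computed with a short Python sketch. The ratings come from the worked example; the function name and the rounding to two decimals are my own choices.

```python
import math

# Ratings as (Iron Man, Jerry Maguire), taken from the worked example.
ratings = {"Steve": (2, 1), "Natasha": (3, 3), "Tony": (5, 4)}
scott = (4, 2)


def euclidean(a, b):
    """Square root of the sum of squared differences between ratings."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


distances = {name: round(euclidean(r, scott), 2) for name, r in ratings.items()}
print(distances)  # {'Steve': 2.24, 'Natasha': 1.41, 'Tony': 2.24}

closest = min(distances, key=distances.get)
print(closest)  # Natasha
```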
We can see that the Euclidean distance between Natasha and Scott is the lowest. In the case of Euclidean distance, we have another way of representing the dataset: we can plot it in an X-Y coordinate system.
Here the x axis represents Iron Man and the y axis represents Jerry Maguire. Steve has given a rating of 2 to Iron Man and 1 to Jerry Maguire, so his coordinate representation is (2, 1). The same concept applies to everyone’s ratings. Just by looking at the plot, we can see that Natasha and Scott are closest to each other and hence have the most similar movie preferences.
We have two movies, so we have a two-dimensional space. If we had three movies, we would move up to a three-dimensional space, and if we had n movies, we would move to an n-dimensional space.
By the way, the technique that we just used is known as collaborative filtering. Essentially, we just built a recommendation engine using collaborative filtering. And all that by just using school math! How cool is that!
Fun Fact
“We studied two distances, namely the Manhattan distance and the Euclidean distance. These distances come from a family of distances known as the Minkowski distance. A general formula for the Minkowski distance between two points a = (a1, ..., an) and b = (b1, ..., bn) is:
D(a, b) = (|a1-b1|^p + |a2-b2|^p + ... + |an-bn|^p)^(1/p)
In this formula, if:
• p = 1, it is known as the Manhattan distance
• p = 2, it is known as the Euclidean distance
There are other distances related to the Minkowski family, such as the Hamming distance, which is sometimes loosely described as the p = 0 case (strictly speaking, the formula above is not defined for p = 0). You can google the other distances, as they are beyond the scope of our book. The most commonly used distance is the Euclidean distance.”
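To see how one formula covers both distances, and how it extends to more than two movies, here is a minimal Python sketch. The function name is my own, and the three-movie ratings at the end are hypothetical values invented for illustration.

```python
def minkowski(a, b, p):
    """Minkowski distance of order p (p >= 1) between two equal-length points."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


natasha = (3, 3)
scott = (4, 2)

print(minkowski(natasha, scott, 1))  # p = 1: Manhattan distance
print(minkowski(natasha, scott, 2))  # p = 2: Euclidean distance

# Hypothetical ratings for three movies: the same function works in 3 dimensions.
viewer_a = (3, 3, 5)
viewer_b = (4, 2, 1)
print(minkowski(viewer_a, viewer_b, 1))
```

With p = 1 the result for Natasha and Scott matches the Manhattan distance of 2, and with p = 2 it matches the Euclidean distance of about 1.41.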
Machine Learning: What does it mean?
At its core, machine learning is the ability of a system to learn on its own without being explicitly programmed. What sets machine learning apart from traditional computing is this “human-like” ability to learn on its own.
As kids, we have all made the mistake of touching something very hot. That burning sensation is unforgettable. What we learn from that experience is never to touch something that is hot. In a similar way, when a machine is exposed to some data, it remembers that data and makes its decisions based on the memory it gathered from that data.
What do we mean when we say a human-like ability to make decisions? Let’s say it is raining heavily outside and a friend comes to our home and asks us to go for a picnic. How will we decide whether it is worth going for a picnic in heavy rain? We decide based on our past experience, right? Put as a process, there are roughly three steps involved:
• Recall: Recall what happened in a similar scenario
• Process: Think about the current scenario
• Decide: Take a decision
Applying this 3-step technique to our picnic decision:
• Recall: Whenever it rains very heavily, traffic is hit badly. It happened last time.
• Process: It is raining very heavily now, so traffic should be hit badly.
• Decide: Let us stay at home and have a hot cup of green tea!
Humans make decisions based on experience. The experience of machines is data. Machines make decisions based on data.
But how does a machine get experience? Let’s try to understand this with the following example. Suppose we have a dataset of 5 patients with their blood pressure and whether or not each patient has heart disease.
Patient No.   Blood Pressure   Heart Disease
1             High             Yes
2             High             Yes
3             High             Yes
4             High             Yes
5             Normal           No
6             High             ???
Based on the 5 data points, we want to predict the heart condition of a 6th patient who has high blood pressure. We pass the data to the machine and ask for an answer. The machine will scan this data and find a “pattern”: whoever has high blood pressure also has heart disease. So it is highly likely that the machine will tell us that the 6th patient has a heart problem. The answer is Yes.
Notice that we could also have come to this conclusion just by using the statistical concept of correlation between blood pressure and heart disease. Statistics and statistical modeling play a very important role in the field of machine learning.
In the context of Artificial Intelligence, which, as we saw earlier, is answer-driven and not rule-driven, it is also important to note that to identify whether the sixth patient has heart disease or not:
• We gave the machine answers in the form of the records of 5 patients
• The machine, in response, gave us a rule: as per the data, whoever has high blood pressure also has heart disease
It is important to note that this rule may not be medically correct, but it is correct with respect to the data to which the machine was exposed. If the data were different, the outcome would be different. It is generally perceived that the more data a machine has, the better the outcomes.
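One very simple way a machine could arrive at such a rule is to pick, for each blood pressure level, the outcome seen most often in the data. The sketch below does this for the five patient records from the table; it is only an illustration of the idea (real machine learning algorithms are more sophisticated), and the function name is my own.

```python
from collections import Counter, defaultdict

# (blood pressure, heart disease) for patients 1 to 5, from the table above.
training_data = [
    ("High", "Yes"),
    ("High", "Yes"),
    ("High", "Yes"),
    ("High", "Yes"),
    ("Normal", "No"),
]


def learn_rule(records):
    """For each blood pressure level, keep the most frequent outcome."""
    outcomes = defaultdict(Counter)
    for blood_pressure, heart_disease in records:
        outcomes[blood_pressure][heart_disease] += 1
    return {bp: counts.most_common(1)[0][0] for bp, counts in outcomes.items()}


rule = learn_rule(training_data)
print(rule)  # {'High': 'Yes', 'Normal': 'No'}

# Patient 6 has high blood pressure.
prediction = rule["High"]
print(prediction)  # Yes
```

If the records passed to `learn_rule` were different, the rule it returns could be different too, which is the point made above.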
To sum up the above activity, what we did was:
• We “trained” the machine with a data set. This data set is called the training data.
• We asked the machine for answers on new data on which it was not trained.
This is essentially how machine learning works. You train your algorithm (machine) on huge datasets, and the algorithm learns obvious and not-so-obvious (hidden) patterns in the dataset. When you expose the algorithm to a new dataset that it has not seen earlier, the algorithm tries to answer your question on the new data set based on the learning it has acquired from the training data.
In practical scenarios, there is one more step before exposing your algorithm to a new data set. This is called the testing stage. We break up our original dataset into 2 parts (in a ratio of 80:20, 75:25, etc.):
• Training Data
• Testing Data
You train the algorithm on the training data and validate/test it on the testing data. In the testing data we already have the answers, so first we hide all the answers as if they were not present. We expose our testing data to the model that was built on the training data and predict the outcomes for the testing data. Now we compare these predicted outcomes with the actual outcomes that we kept hidden. By comparing the predicted outcomes with the actual outcomes, we can evaluate:
• How accurate is our model?
• How big is the error in our model?
Once the algorithm gives good results on the testing data, it is good to be used on real-life problems.
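The testing stage can be sketched in plain Python as follows. The 30 patient records are hypothetical and invented for illustration, the 80:20 ratio is one of the options mentioned above, and the helper names and the simple “most frequent outcome” model are my own assumptions rather than a standard library API.

```python
import random
from collections import Counter, defaultdict

# Hypothetical (blood pressure, heart disease) records, invented for illustration.
records = (
    [("High", "Yes")] * 12 + [("High", "No")] * 3
    + [("Normal", "No")] * 13 + [("Normal", "Yes")] * 2
)


def train_test_split(data, test_ratio=0.2, seed=0):
    """Shuffle a copy of the data and split it into training and testing parts."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def learn_rule(training):
    """For each blood pressure level, keep the most frequent outcome."""
    outcomes = defaultdict(Counter)
    for blood_pressure, heart_disease in training:
        outcomes[blood_pressure][heart_disease] += 1
    return {bp: counts.most_common(1)[0][0] for bp, counts in outcomes.items()}


train, test = train_test_split(records)
rule = learn_rule(train)

# Hide the answers: predict from blood pressure only, then compare with the truth.
predictions = [rule[bp] for bp, _ in test]
actual = [answer for _, answer in test]
correct = sum(p == a for p, a in zip(predictions, actual))

accuracy = correct / len(test)
error = 1 - accuracy
print(f"accuracy = {accuracy:.2f}, error = {error:.2f}")
```

The exact accuracy depends on which records happen to land in the testing part, so it changes with the `seed`.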
Types of Machine Learning
As beginners, we need to know that there are two types of machine learning:
• Supervised Machine Learning
• Unsupervised Machine Learning
There is another type of machine learning known as reinforcement learning. Let us leave that for now, as it is outside the scope of a beginner level.
To understand the difference between supervised and unsupervised machine learning, we have to understand what labelled and unlabelled data are.
Labelled Data and Unlabelled Data
Labelled data means data that has a tag attached to it. The tag can be anything, like a name, a number, a class or a type. Unlabelled data does not have a tag attached to it.
In the above picture, the unlabelled data is just a bunch of fruits (objects). Imagine we did not know what fruits look like; for us those would be just a bunch of objects, as there is no description of those objects available. For a machine (computer), the unlabelled data set is just a bunch of objects.
On the other hand, labelled data has a clear classification: those objects are Apples and Pears. Even if someone doesn't know what an Apple or a Pear looks like, she can just read the label and understand that one thing is called an Apple and another is called a Pear. For a machine, these are not just any objects but 2 distinct types of objects: one is an Apple and one is a Pear.
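In code, the difference often comes down to whether each record carries a tag. The sketch below uses hypothetical fruit measurements (weight in grams and colour) that are invented for illustration; only the labels Apple and Pear come from the text.

```python
# Unlabelled data: only the measurements, with no tag saying what each object is.
unlabelled = [(150, "red"), (170, "green"), (160, "red")]

# Labelled data: the same measurements, each paired with a tag.
labelled = [
    ((150, "red"), "Apple"),
    ((170, "green"), "Pear"),
    ((160, "red"), "Apple"),
]

# The distinct tags tell the machine which types of objects exist.
labels = sorted({label for _, label in labelled})
print(labels)  # ['Apple', 'Pear']

# Dropping the tags turns labelled data back into unlabelled data.
features_only = [features for features, _ in labelled]
print(features_only == unlabelled)  # True
```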