Draft of Chapter 1

With regards,
Ritesh Bhagwat

Note: The following is a draft version.

Introduction to Data-Driven Computing & AI

The journey into the world of artificial intelligence is extraordinary. It is extraordinary because it shows us that just by changing our perspective on something we already know, we can learn something new and amazing. The foundation of all artificial intelligence is built on things that most of us have already studied in school and college. If you have studied math up to high school level, chances are that nothing in this book will be entirely new to you. But I can assure you that we will learn something new from all the things that we already know. Everything will be built on things that we know. This is the very beauty of artificial intelligence. So, let us get started on this journey of a lifetime.

When I was growing up as a teenager in the 1990s, the way to stand out in a conversation was to talk about intelligent, scientific things. If we knew what a “light year” was or what “supersonic” meant, we appeared intelligent. If we could do a complex math calculation quickly, we were hailed as geniuses. All these traits were essentially about remembering data and manipulating it. For a very long time, human intelligence has been judged on our memory and on how we process the data stored in our brains. With the arrival of smartphones and similar technology, the definition of human intelligence is going to change. With artificial intelligence, data processing can be done on a gadget as small as a phone, so the need to remember or memorize data will become less meaningful. But we as human beings will have to evolve to a higher level. Maybe knowledge and information will be for machines and wisdom will be for us humans, which is a good thing, as we will evolve in consciousness.

If you are reading this book, I’m sure you have heard that AI is taking over and that AI is going to change the world. Have you ever wondered what AI is? How is AI, or modern computing, different from traditional computing? Let us do a fun activity to understand why we need AI and what type of questions AI tries to solve.

The following table lists 10 famous personalities of the world along with their domain of work and gender.

S. No.  Name                  Domain of Work       Gender
1       Roger Federer         Sports               M
2       Sachin Tendulkar      Sports               M
3       Mahatma Gandhi        Leadership           M
5       Nelson Mandela        Leadership           M
6       Robert Downey Junior  Art & Movies         M
7       Tom Cruise            Art & Movies         M
8       Steve Jobs            Tech                 M
9       Scarlett Johansson    Art & Movies         F
10      Bill Gates            Tech & Philanthropy  M

Now try to answer the following questions:

1. How many of the above personalities are female?

2. Name the personalities whose domain of work is Tech.

3. Are there any female personalities in the above list whose domain of work is Art & Movies?

4. Who are the top two most popular personalities from the list?

5. Whose domain of work is the best among the above personalities?

The answers to the first three questions are very simple:

1. 2

2. Steve Jobs & Bill Gates

3. Yes (Scarlett Johansson)

How about the fourth and fifth questions? Do we have a universal answer to them? No, we don’t. These questions are subjective, not well defined, or vague in nature. For every individual, the definition of the most popular personality or the best domain of work is different.

If we have 1000 respondents answering these 5 questions:

• When answered correctly, all 1000 respondents will give the same answers to the first three questions.

• We will most probably get different answers to the fourth and fifth questions.

Let us look at the example in a different way. We can get the answers to the first three questions by setting rules. We can write a simple program that scans the domain of work and gender of our personalities and gives us the answers. So, we have data, we set rules, and we get answers.
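The rule-driven approach can be sketched in a few lines of Python. This is only an illustration: it uses a handful of rows from the table to stay short, so the answers it prints apply to this subset and not to the full list. The variable names are my own.

```python
# A handful of rows from the table: (name, domain of work, gender)
personalities = [
    ("Roger Federer", "Sports", "M"),
    ("Steve Jobs", "Tech", "M"),
    ("Scarlett Johansson", "Art & Movies", "F"),
    ("Bill Gates", "Tech & Philanthropy", "M"),
]

# Rule 1: count the rows whose gender is "F"
female_count = sum(1 for _, _, gender in personalities if gender == "F")

# Rule 2: list everyone whose domain of work mentions "Tech"
tech_names = [name for name, domain, _ in personalities if "Tech" in domain]

# Rule 3: is there a female personality whose domain is "Art & Movies"?
has_female_artist = any(
    gender == "F" and domain == "Art & Movies"
    for _, domain, gender in personalities
)

print(female_count, tech_names, has_female_artist)
```

Every rule here is fixed in advance by us. The program only applies the rules to the data.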


Now how can we answer the fourth and fifth questions? The best we can do is ask all our 1000 respondents to vote with their answers. The most common answer becomes the rule. If, for the fifth question, the most common answer is that the best line of work is “Art & Movies”, then that becomes our answer. Do remember that if we change the number of respondents, the answer can also change. So, what we are doing here is this: we have data, we give answers to the data, and we get a rule.
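The voting idea can be sketched as follows. The votes below are invented purely for illustration; with a different set of respondents the winning answer could change.

```python
from collections import Counter

# Hypothetical votes for question 5 (made up for illustration)
votes = [
    "Art & Movies", "Sports", "Art & Movies",
    "Tech", "Art & Movies", "Sports",
]

# Count the votes; the most common answer becomes the "rule"
tally = Counter(votes)
best_domain, vote_count = tally.most_common(1)[0]

print(best_domain, vote_count)
```

Here we did not write the rule ourselves. It came out of the answers that were given.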

We can see the difference between the two approaches. One is rule-driven, and the other is answer-driven.

A problem for which we cannot set rules to get answers qualifies as a problem that should be solved by artificial intelligence.

In the context of AI, problems that we can solve with rules fall under the category of traditional or classical computing, and problems for which we can’t set rules are the domain of artificial intelligence, which I loosely term modern computing.

Do note that rule-based questions can also be solved by artificial intelligence, but it is not worth solving them with AI, as it is computationally expensive. It is like having a pizza, a knife and a sword. You should always use the knife to cut the pizza and not the sword. You can cut the pizza with a sword, but it is not worth it.

Now that we have a bit of understanding about AI, let us use a simple example to understand what we mean when we say that everything in AI is built on things that we already know.

Suppose you run an OTT platform like Netflix or Amazon Prime and you have three loyal customers: Steve, Natasha and Tony. These customers watch movies and give them ratings. A fourth customer, Scott, logs into the platform, watches two movies, Iron Man and Jerry Maguire, and rates them. We have to recommend more movies to Scott to keep him hooked to our platform.


How can we do this using the data that we have? If we can figure out which of Steve, Tony and Natasha has movie preferences like Scott’s, then we can recommend to Scott other movies watched by that customer. Surprisingly, we can use high school math to do this! Let us see how it works.

Let’s assume the ratings given by the customers are represented in the following table:

Customer         Iron Man  Jerry Maguire
Steve            2         1
Natasha          3         3
Tony             5         4
Scott (logs in)  4         2


The easiest way to see which of the three customers’ preferences are closest to Scott’s is to subtract the scores given by Scott from the scores given by the other customers. In other words, we can find the “distance” between Scott’s scores and the scores of Steve, Natasha and Tony. For each movie, we subtract Scott’s rating from the other customer’s rating, ignore the sign, and add up the differences.

Let us say the rating of Iron Man is represented by x and the rating of Jerry Maguire by y. Then the distance can be calculated by the formula:

• |x1-x2| + |y1-y2|

Where:

x1: Rating of Iron Man by the old customer (Steve, Natasha or Tony)

x2: Rating of Iron Man by Scott

y1: Rating of Jerry Maguire by the old customer (Steve, Natasha or Tony)

y2: Rating of Jerry Maguire by Scott

|K| represents the absolute value of K, which is never negative. For example:

• |3| = 3

• |-3| = 3

This way of computing distance using absolute values is known as the Manhattan distance.

Customer         Iron Man  Jerry Maguire
Steve            2         1
Natasha          3         3
Tony             5         4
Scott (logs in)  4         2

Referring to the above table, the Manhattan distance between:

• Scott and Steve = |2-4| + |1-2| = 3

• Scott and Natasha = |3-4| + |3-2| = 2

• Scott and Tony = |5-4| + |4-2| = 3

So, we can see that the Manhattan distance between Natasha and Scott is the lowest of the three, hence Natasha’s movie preferences should be the closest to Scott’s. We can go ahead and recommend to Scott the other movies that Natasha has watched and rated highly. Chances are that he will also like those movies.
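The Manhattan calculation above can be reproduced with a short Python sketch. The ratings are the ones used in the worked example, written as (Iron Man, Jerry Maguire); the function and variable names are my own.

```python
# Ratings as (Iron Man, Jerry Maguire), taken from the worked example
ratings = {
    "Steve": (2, 1),
    "Natasha": (3, 3),
    "Tony": (5, 4),
}
scott = (4, 2)


def manhattan(a, b):
    """Sum of the absolute differences between corresponding ratings."""
    return sum(abs(p - q) for p, q in zip(a, b))


# Distance from each old customer to Scott
distances = {name: manhattan(r, scott) for name, r in ratings.items()}

# The customer with the smallest distance has the closest preferences
closest = min(distances, key=distances.get)

print(distances)
print(closest)
```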


This was the Manhattan distance. Most of us are more familiar with the Euclidean distance, which can also be used to solve the same problem. The Euclidean distance is calculated by the formula:

• Sqrt((x1-x2)^2 + (y1-y2)^2)

Where:

• x1: Rating of Iron Man by the old customer (Steve, Natasha or Tony)

• x2: Rating of Iron Man by Scott

• y1: Rating of Jerry Maguire by the old customer (Steve, Natasha or Tony)

• y2: Rating of Jerry Maguire by Scott

Euclidean distance between:

• Scott and Steve: Sqrt((2-4)^2 + (1-2)^2) = Sqrt(5) = 2.24

• Scott and Natasha: Sqrt((3-4)^2 + (3-2)^2) = Sqrt(2) = 1.41

• Scott and Tony: Sqrt((5-4)^2 + (4-2)^2) = Sqrt(5) = 2.24
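The same three Euclidean distances can be checked with Python's standard library. This sketch assumes Python 3.8 or later, where `math.dist` is available; the ratings are again written as (Iron Man, Jerry Maguire).

```python
import math

# Ratings as (Iron Man, Jerry Maguire), taken from the worked example
ratings = {
    "Steve": (2, 1),
    "Natasha": (3, 3),
    "Tony": (5, 4),
}
scott = (4, 2)

# math.dist returns the straight-line (Euclidean) distance between two points
euclidean = {
    name: round(math.dist(r, scott), 2) for name, r in ratings.items()
}
closest = min(euclidean, key=euclidean.get)

print(euclidean)
print(closest)
```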

We can see that the Euclidean distance between Natasha and Scott is the lowest. In the case of Euclidean distance, we have another way of representing the dataset: we can plot it in an X-Y coordinate system.

Here the x axis represents Iron Man and the y axis represents Jerry Maguire. Steve has given a rating of 2 to Iron Man and 1 to Jerry Maguire, so his coordinates are (2, 1). The same idea applies to everyone’s ratings. Just looking at the plot, we can see that Natasha and Scott are closest to each other, and hence they have similar movie preferences.

We have two movies, so we have a two-dimensional space. If we had three movies, we would move up to a three-dimensional space, and if we have n movies, we can move to an n-dimensional space.

By the way, the problem that we just solved is known as collaborative filtering. Essentially, we just built a recommendation engine using collaborative filtering. And all that by just using school math! How cool is that!
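To make the recommendation step concrete, here is a minimal sketch of the whole idea: find the nearest existing customer, then suggest the movies that customer rated highly. The extra movie titles, their scores and the `min_rating` threshold are all invented for illustration, and a real recommendation engine would be far more elaborate.

```python
import math

# Ratings for the two shared movies (Iron Man, Jerry Maguire)
shared = {"Steve": (2, 1), "Natasha": (3, 3), "Tony": (5, 4)}
scott = (4, 2)

# Hypothetical ratings for other movies (titles and scores are invented)
other_ratings = {
    "Steve": {"Movie A": 5, "Movie B": 2},
    "Natasha": {"Movie A": 4, "Movie C": 5, "Movie D": 1},
    "Tony": {"Movie B": 4, "Movie D": 3},
}


def recommend(new_user, shared, other_ratings, min_rating=4):
    """Recommend movies rated highly by the nearest existing customer."""
    # Nearest customer by Euclidean distance on the shared movies
    nearest = min(shared, key=lambda name: math.dist(shared[name], new_user))
    # Keep only the movies that customer rated at or above the threshold
    picks = [
        movie
        for movie, rating in other_ratings[nearest].items()
        if rating >= min_rating
    ]
    return nearest, picks


nearest, picks = recommend(scott, shared, other_ratings)
print(nearest, picks)
```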

Fun Fact

“We studied two distances, namely the Manhattan distance and the Euclidean distance. These distances come from a family of distances known as the Minkowski distance. For two points A = (a1, ..., an) and B = (b1, ..., bn), the general formula for the Minkowski distance is:

• (|a1-b1|^p + |a2-b2|^p + ... + |an-bn|^p)^(1/p)

In this formula:

• p = 1 gives the Manhattan distance

• p = 2 gives the Euclidean distance

Another member of the family is the Chebyshev distance, which is the limit as p approaches infinity. The Hamming distance is sometimes loosely described as the p = 0 case, but it is not strictly a Minkowski distance. You can look up the other distances online, as they are beyond the scope of this book. The most commonly used distance is the Euclidean distance.”
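As a small sketch of the Minkowski family, the function below takes the order p as a parameter, so the same code gives the Manhattan distance for p = 1 and the Euclidean distance for p = 2. It works for points with any number of coordinates, which matches the idea of moving to n movies. The function name and the three-movie ratings are my own choices.

```python
def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length points."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


steve, scott = (2, 1), (4, 2)

manhattan = minkowski(steve, scott, 1)   # p = 1
euclidean = minkowski(steve, scott, 2)   # p = 2

# The same function works in three dimensions (three movies);
# these three-movie ratings are invented for illustration
three_movies = minkowski((1, 2, 3), (4, 6, 3), 1)

print(manhattan, round(euclidean, 2), three_movies)
```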

Machine Learning: What does it mean?

At its core, machine learning is the ability of a system to learn on its own without being explicitly programmed. What sets machine learning apart from traditional computing is its “human-like” ability to learn on its own.

As kids, we have all made the mistake of touching something very hot. That burning sensation is unforgettable. What we learn from that experience is never to touch something hot. In a similar way, when a machine is exposed to some data, it remembers that data and makes its decisions based on the memory it gathered from that data.


What do we mean when we say a human-like ability to make decisions? Let’s say it is raining heavily outside and a friend comes to our home and asks us to go for a picnic. How will we decide whether it’s worth going for the picnic in heavy rain? We decide based on our past experience, right? Put as a process, there are roughly three steps involved:

• Recall: Recall what happened in a similar scenario

• Process: Think through the current scenario

• Decide: Take a decision

Applying this three-step technique to our picnic decision:

• Recall: Whenever it rains very heavily, traffic is hit badly. It happened last time.

• Process: It is raining very heavily now, so traffic should be hit badly.

• Decide: Let us stay at home and have a hot cup of green tea!

Humans make decisions based on experience. The experience of machines is data. Machines make decisions based on data.

But how does a machine get experience? Let’s try to understand this from the following example. Suppose we have a dataset of 5 patients with their blood pressure and whether or not each patient has heart disease.

Patient No.  Blood Pressure  Heart Disease
1            High            Yes
2            High            Yes
3            High            Yes
4            High            Yes
5            Normal          No
6            High            ???


Based on the 5 data points, we want to predict the heart condition of a 6th patient who has high blood pressure. We pass the data to the machine and ask for an answer. The machine will scan this data and find a “pattern”: whoever has high blood pressure also has heart disease. So it is highly likely that the machine will tell us that the 6th patient has a heart problem. So, the answer is Yes.
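One very simple way a machine could find this pattern is to remember, for each blood pressure value, the outcome it has seen most often. The sketch below does exactly that with the five records from the table. It is a toy stand-in for a learning algorithm, not how real models are built, and the function name is my own.

```python
from collections import Counter, defaultdict

# Training records from the table: (blood pressure, heart disease)
records = [
    ("High", "Yes"),
    ("High", "Yes"),
    ("High", "Yes"),
    ("High", "Yes"),
    ("Normal", "No"),
]


def learn_rule(records):
    """For each blood pressure value, remember the most common outcome."""
    outcomes = defaultdict(Counter)
    for blood_pressure, disease in records:
        outcomes[blood_pressure][disease] += 1
    return {bp: counts.most_common(1)[0][0] for bp, counts in outcomes.items()}


rule = learn_rule(records)

# Patient 6 has high blood pressure
prediction = rule["High"]
print(rule, prediction)
```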

Notice that we could also have come to this conclusion just by using the statistical concept of correlation between blood pressure and heart disease. Statistics and statistical modeling play a very important role in the field of machine learning.
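To see the correlation in numbers, we can encode the five records as 0s and 1s and compute the Pearson correlation coefficient by hand. The encoding (High = 1, Normal = 0, Yes = 1, No = 0) is an assumption made for this sketch.

```python
import math

# Encoded records: High = 1, Normal = 0; Yes = 1, No = 0
blood_pressure = [1, 1, 1, 1, 0]
heart_disease = [1, 1, 1, 1, 0]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


r = pearson(blood_pressure, heart_disease)
print(round(r, 2))
```

In this tiny dataset the two columns move together perfectly, so the coefficient comes out at its maximum value of 1. Real data is rarely this clean.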

In the context of artificial intelligence, which as we saw earlier is answer-driven and not rule-driven, it is also important to note that to identify whether the sixth patient has heart disease or not:

• We gave the machine answers in the form of the records of 5 patients.

• The machine, in response, gave us a rule: as per the data, whoever has high blood pressure also has heart disease.

It is important to note that this rule may not be medically correct, but it is correct with respect to the data to which the machine was exposed. If the data were different, the outcome would be different. It is generally perceived that the more data a machine has, the better the outcomes.

To sum up the above activity, what we did was:

• We “trained” the machine with a dataset. This dataset is called the training data.

• We asked the machine for answers on new data on which it was not trained.

This is exactly how all machine learning works. You train your algorithm (machine) on huge datasets, and the algorithm learns obvious and not so obvious (hidden) patterns in the dataset. When you expose the algorithm to a new dataset that it has not seen earlier, the algorithm tries to answer your question on the new dataset based on what it has learned from the training data.

In practical scenarios, there is one more step before exposing your algorithm to a new dataset. This is called the testing stage. We break up our original dataset into 2 parts (in a ratio of 80:20, 75:25, etc.):

• Training Data

• Testing Data

You train the algorithm on the training data and validate/test it on the testing data. In the testing data we already have the answers. So first we hide all the answers, as if they were not present. We expose our testing data to the model that was built on the training data, and we predict the outcomes on the testing data. Now we compare the predicted outcomes with the actual outcomes that we kept hidden. By comparing the predicted outcomes with the actual outcomes, we can evaluate:


• How accurate is our model?

• How big is the error in our model?

Once the algorithm gives good results on the testing data, it is ready to be used on real-life problems.
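The training and testing stages can be sketched end to end as below. The 20 patient records are invented for illustration, the split is 80:20, and the "model" is the same toy rule as before: for each blood pressure value, predict the outcome seen most often in the training data. The random seed is fixed only so that the shuffle is repeatable.

```python
import random
from collections import Counter, defaultdict

# Hypothetical dataset: (blood pressure, heart disease); values are invented
data = (
    [("High", "Yes")] * 8
    + [("Normal", "No")] * 8
    + [("High", "No")] * 2
    + [("Normal", "Yes")] * 2
)

# Shuffle, then split 80:20 into training and testing data
random.seed(0)
random.shuffle(data)
split = int(len(data) * 0.8)
train, test = data[:split], data[split:]


def train_model(rows):
    """For each blood pressure value, remember the most common outcome."""
    counts = defaultdict(Counter)
    for blood_pressure, disease in rows:
        counts[blood_pressure][disease] += 1
    return {bp: c.most_common(1)[0][0] for bp, c in counts.items()}


model = train_model(train)

# Hide the answers: the model only sees the blood pressure of each test row
predictions = [model.get(bp, "No") for bp, _ in test]
actual = [disease for _, disease in test]

# Compare the predicted outcomes with the hidden actual outcomes
correct = sum(p == a for p, a in zip(predictions, actual))
accuracy = correct / len(test)
error_rate = 1 - accuracy

print(f"accuracy={accuracy:.2f}, error={error_rate:.2f}")
```

The accuracy you see depends on which records land in the testing data, which is one reason the split ratio and the amount of data matter in practice.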

Types of Machine Learning

As beginners, we need to know that there are two types of machine learning:

• Supervised Machine Learning

• Unsupervised Machine Learning

There is another type of machine learning known as reinforcement learning. Let us leave that for now, as it is outside the scope at the beginner level.

To understand the difference between supervised machine learning and unsupervised machine learning, we first have to understand what labelled and unlabelled data are.

Labelled Data and Unlabelled Data

Labelled data is data that has a tag attached to it. The tag can be anything, like a name, a number, a class or a type. Unlabelled data does not have a tag attached to it.

In the above picture, the unlabelled data is just a bunch of fruits (objects). Imagine if we did not know what fruits look like; then for us those would be just a bunch of objects, as there is no description of those objects available. For a machine (computer), the unlabelled dataset is just a bunch of objects.

On the other hand, the labelled data has a clear classification: those objects are apples and pears. If someone doesn't even know what an apple or a pear looks like, she can just read the label and understand that one is something called an apple and the other is something called a pear. For a machine, these are not just any objects but 2 distinct types of objects: one is an apple and one is a pear.
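In code, the difference between the two kinds of data is simply whether each object carries a tag. The sketch below describes each fruit by a weight and a colour; these measurements are invented for illustration.

```python
# Unlabelled data: just objects described by (weight in grams, colour);
# the measurements are invented for illustration
unlabelled = [(150, "red"), (170, "green"), (160, "red")]

# Labelled data: the same objects, each with a tag attached
labelled = [
    ((150, "red"), "Apple"),
    ((170, "green"), "Pear"),
    ((160, "red"), "Apple"),
]

# With labels, the machine knows there are distinct types of objects
labels = sorted({label for _, label in labelled})

# Stripping the labels leaves the unlabelled version of the data
stripped = [features for features, _ in labelled]

print(labels)
print(stripped == unlabelled)
```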
