
PYTHON PROGRAMMING & DATA SCIENCE

COST FUNCTIONS
Cost Functions
Cost Function
A cost function is a function that measures the performance of a machine learning model for given data.
It quantifies the error between predicted values and expected values and presents it as a single real number.
(Or)
The cost function returns the error between the predicted outcomes and the actual outcomes.
 
Cost Functions
Cost Function
Depending on the problem, a cost function can be formed in many different ways.
The purpose of a cost function is to be either:
Minimized or Maximized
Minimized -
the returned value is usually called cost, loss, or error. The goal is to find the values of the model parameters for which the cost function returns as small a number as possible.
Maximized -
the value it yields is called a reward. The goal is to find the values of the model parameters for which the returned number is as large as possible.
 
Cost Functions
Cost Function
The aim of supervised machine learning is to minimize the overall cost, thus optimizing the fit of the model to the system it is attempting to represent.
Types of the cost function
 There are many cost functions in machine learning and each has its use
cases depending on whether it is a regression problem or classification
problem.
Some of the important cost functions are:
1. Regression cost Function
2. Binary Classification cost Functions
3. Multi-class Classification cost Functions
Cost Functions
1. Regression cost Function:
Regression models deal with predicting a continuous value.
Examples: salary of an employee, price of a car, loan prediction, etc.
A cost function used in the regression problem is called “Regression Cost
Function”.
They are calculated on the distance-based error as follows:
Error = y − y′
y – actual value
y′ – predicted value
The most used Regression cost functions are:
1.1 Mean Error (ME)
1.2 Mean Squared Error (MSE)
1.3 Mean Absolute Error (MAE)
Cost Functions
1.1 Mean Error (ME)
In this cost function, the error for each training example is calculated, and then the mean of all these errors is computed.
Calculating the mean of the errors is the simplest and most intuitive approach.
However, the errors can be both negative and positive, so they can cancel each other out during summation, giving a zero mean error for the model.
Thus this is not a recommended cost function but it does lay the foundation
for other cost functions of regression models.
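As a quick illustration (the arrays below are made-up values), a minimal NumPy sketch shows how opposite-signed errors cancel:

```python
import numpy as np

y_actual = np.array([3.0, 5.0, 7.0])
y_pred = np.array([4.0, 5.0, 6.0])   # errors: -1, 0, +1

# mean error is 0.0 even though two of the three predictions are wrong
print(np.mean(y_actual - y_pred))
```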
Cost Functions
1.2 Mean Squared Error (MSE)
This addresses the drawback we encountered with Mean Error: the square of the difference between the actual and predicted value is calculated, which removes any possibility of negative error.
It is measured as the average of the sum of squared differences between
predictions and actual observations.
MSE = (sum of squared errors)/n
It is also known as L2 loss.
In MSE, since each error is squared, it helps to penalize even small deviations
in prediction when compared to MAE.
But if our dataset has outliers that contribute large prediction errors, squaring those errors magnifies them many times over and leads to a much higher MSE.
Hence we can say that MSE is less robust to outliers.
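A minimal NumPy sketch of MSE on the same illustrative values as the Mean Error sketch:

```python
import numpy as np

y_actual = np.array([3.0, 5.0, 7.0])
y_pred = np.array([4.0, 5.0, 6.0])

# squaring keeps every error positive, so nothing cancels
mse = np.mean((y_actual - y_pred) ** 2)
print(mse)   # 0.666...
```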
Cost Functions
1.3 Mean Absolute Error (MAE)
This cost function addresses the shortcoming of Mean Error in a different way.
Here an absolute difference between the actual and predicted value is
calculated to avoid any possibility of negative error.
So in this cost function, MAE is measured as the average of the sum of
absolute differences between predictions and actual observations.

          MAE = (sum of absolute errors)/n


It is also known as L1 Loss.
It is robust to outliers, and thus gives better results even when our dataset has noise or outliers.
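A small sketch (with made-up values, including one deliberate outlier) illustrates why MAE is more robust than MSE:

```python
import numpy as np

y_actual = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([4.0, 5.0, 6.0, 29.0])   # the last prediction is an outlier

print(np.mean(np.abs(y_actual - y_pred)))   # MAE = 5.5
print(np.mean((y_actual - y_pred) ** 2))    # MSE = 100.5 -- the outlier dominates
```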
Cost Functions
Cost Function For Linear Regression
A Linear Regression model uses a straight line to fit the data. This is done using the equation for a straight line:

y′ = m·x + b

where m is the slope and b is the intercept.

For the linear regression model, the cost function is the Root Mean Squared Error of the model, computed from the differences between the predicted and actual values. Training searches for the values of m and b that minimize this error.
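As a minimal sketch (the function name linear_cost and the sample points are illustrative), the cost of a candidate line can be computed as follows:

```python
import numpy as np

def linear_cost(m, b, x, y):
    """Mean squared error of the line y' = m*x + b."""
    return np.mean((y - (m * x + b)) ** 2)

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])
print(linear_cost(2.0, 0.0, x, y))  # 0.0 -- this line fits the points exactly
print(linear_cost(1.0, 0.0, x, y))  # ~4.67 -- a worse line costs more
```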
Cost Functions
2. Cost functions for Classification problems
Cost functions used in classification problems are different from those used in regression problems.
 A commonly used loss function for classification is the cross-entropy loss.
Example:
Consider a classification problem with 3 classes:
Class ∈ {Orange, Apple, Tomato}
The machine learning model will give a probability distribution of these 3
classes as output for a given input data.
The class with the highest probability is considered the winning class for the prediction.
Output = [P(Orange),P(Apple),P(Tomato)]
Cost Functions
2. Cost functions for Classification problems
The actual probability distribution for each class is shown below.
Orange = [1,0,0]
Apple = [0,1,0]
Tomato = [0,0,1]
If during the training phase, the input class is Tomato, the predicted
probability distribution should tend towards the actual probability distribution
of Tomato.
If the predicted probability distribution is not close to the actual one, the model has to adjust its weights. This is where cross-entropy becomes a tool for calculating how far the predicted probability distribution is from the actual one.
Cost Functions
2. Cost functions for Classification problems
In other words, Cross-entropy can be considered as a way to measure the
distance between two probability distributions.
[Image: intuition behind cross-entropy]
Cost Functions
2.1 Multi-class Classification cost Functions
This cost function is used in classification problems where there are multiple classes and each input belongs to exactly one class.
Let us now understand how cross-entropy is calculated. Assume that, for a particular input data D, the model gives the following probability distribution over n classes:

p = [p1, p2, …, pn]

And the actual or target probability distribution of the data D is:

y = [y1, y2, …, yn]

Cost Functions
2.1 Multi-class Classification cost Functions
Then the cross-entropy for that particular data D is calculated as:
Cross-entropy loss(y, p) = −yᵀ·log(p)
= −(y1·log(p1) + y2·log(p2) + … + yn·log(pn))

Example:
Let us now compute the cross-entropy for the Tomato prediction below.
Cost Functions
2.1 Multi-class Classification cost Functions
p(Tomato) = [0.1, 0.3, 0.6]
y(Tomato) = [0, 0, 1]
Cross-Entropy(y, p) = −(0·log(0.1) + 0·log(0.3) + 1·log(0.6)) = −log(0.6) ≈ 0.51 (using the natural logarithm)
 The above formula just measures the cross-entropy for a single observation
or input data.
 The error in classification for the complete model is given by categorical
cross-entropy which is nothing but the mean of cross-entropy for all N training
data.
Categorical Cross-Entropy = (Sum of Cross-Entropy for N data)/N
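A short NumPy check of this calculation for the Tomato example (array names are illustrative):

```python
import numpy as np

p = np.array([0.1, 0.3, 0.6])   # predicted probabilities [Orange, Apple, Tomato]
y = np.array([0.0, 0.0, 1.0])   # one-hot target: Tomato

# only the target class contributes, since the other entries of y are 0
print(round(-np.sum(y * np.log(p)), 2))   # 0.51
```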
Cost Functions
2.2 Binary Cross Entropy Cost Function
Binary cross-entropy is a special case of categorical cross-entropy where there is only one output, which takes a binary value of 0 or 1 to denote the negative and positive class respectively.
For example: classification between cat & dog.
Let us assume that the actual output is denoted by a single variable y; then the cross-entropy for a particular data D can be simplified as follows:
Cross-entropy(D) = −(y·log(p) + (1−y)·log(1−p))
which reduces to −log(p) when y = 1 and to −log(1−p) when y = 0.
      The error in binary classification for the complete model is given by binary
cross-entropy which is nothing but the mean of cross-entropy for all N training
data.
Binary Cross-Entropy = (Sum of Cross-Entropy for N data)/N
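A minimal NumPy sketch of binary cross-entropy (the helper name and the clipping constant eps are illustrative choices to avoid log(0)):

```python
import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    # clip predictions away from 0 and 1 so the logarithms stay finite
    p = np.clip(p_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y_true = np.array([1, 0, 1, 1])
p_pred = np.array([0.9, 0.2, 0.8, 0.6])
print(binary_cross_entropy(y_true, p_pred))
```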
Cost Functions
The objective of an ML model, therefore, is to find the parameters, weights, or structure that minimizes the cost function.
Minimizing the cost function: Gradient descent
Gradient descent is an efficient optimization algorithm that attempts to find a local or global minimum of a function.
Gradient descent gives a model the direction (gradient) it should move in to reduce errors (the differences between actual y and predicted y).
 
Cost Functions
Gradient Descent For Linear Regression
By the definition of gradient descent, we have to find the direction in which the error decreases steadily.
This is done by differentiating the cost function with respect to the parameters and subtracting a small step along that gradient from the current parameter values, moving down the slope:

θ = θ − α · ∂J/∂θ

where α is the learning rate.

After substituting the cost function J = (1/n) Σ (yᵢ − (m·xᵢ + b))² into this update rule, we get the simplified gradient descent updates for linear regression:

m = m − α · (−2/n) Σ xᵢ·(yᵢ − (m·xᵢ + b))
b = b − α · (−2/n) Σ (yᵢ − (m·xᵢ + b))
Cost Functions
Implementing Cost Functions in Python
Example:
We take a NumPy array of random numbers as our data, after importing the required modules.
The NumPy array is a 2-D array of random points; each element of the array corresponds to an x and y coordinate.
Here, x is the input and y is the required output.
Let's separate these points and plot them.
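A minimal sketch of this setup, assuming NumPy and Matplotlib (the array size of 50 points is an illustrative choice):

```python
import numpy as np
import matplotlib.pyplot as plt

# 2-D array of random points; each row holds an (x, y) coordinate pair
data = np.random.rand(50, 2)

x = data[:, 0]   # inputs
y = data[:, 1]   # required outputs

plt.scatter(x, y)
plt.xlabel("x")
plt.ylabel("y")
plt.show()
```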
Cost Functions
Implementing Cost Functions in Python

Now, let's set our initial theta value and store the y values in a separate array, so we can compare them against the predictions made from the x values.
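A sketch of this step (starting theta at zeros is an illustrative choice):

```python
# initial guess for theta = (m, b): slope and intercept of the line
theta = np.array([0.0, 0.0])

# keep the actual outputs in their own array for comparison with predictions
y_actual = y.copy()
```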
Cost Functions
Implementing Cost Functions in Python

Let's initialize the 'm' and 'b' values along with the learning rate.

Then, using the MSE formula, compute the cost function value for our inputs.
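A sketch of this step, reusing the theta, x, and y_actual arrays from above (the learning rate value is illustrative):

```python
m, b = theta             # unpack slope and intercept from theta
learning_rate = 0.01     # step size for gradient descent

def cost(m, b, x, y):
    """Mean squared error of the line y' = m*x + b."""
    y_pred = m * x + b
    return np.mean((y - y_pred) ** 2)

print(cost(m, b, x, y_actual))
```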
Cost Functions
Implementing Cost Functions in Python
Using the gradient of the cost function, you can update the theta values.

Now, run gradient descent and print the updated value of theta at every iteration.
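A sketch of the gradient descent loop, applying the simplified update rules derived earlier (the iteration count of 1000 is an illustrative choice):

```python
n = len(x)
losses = []
for i in range(1000):
    y_pred = m * x + b
    # gradients of the MSE cost with respect to m and b
    dm = (-2.0 / n) * np.sum(x * (y_actual - y_pred))
    db = (-2.0 / n) * np.sum(y_actual - y_pred)
    # step against the gradient to move down the slope
    m -= learning_rate * dm
    b -= learning_rate * db
    losses.append(cost(m, b, x, y_actual))
    # theta = (m, b) after this iteration
    print(f"iteration {i}: m={m:.4f}, b={b:.4f}, loss={losses[-1]:.6f}")
```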
Cost Functions
Implementing Cost Functions in Python
On plotting the loss at each iteration of gradient descent, you can see it decrease steadily.
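A sketch of the loss plot, assuming the losses list recorded in the loop above:

```python
plt.plot(losses)
plt.xlabel("iteration")
plt.ylabel("loss (MSE)")
plt.show()
```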
