Over the past two decades, Machine Learning has become one of the mainstays of information technology. The really basic idea is to take a dataset D, learn from it while taking certain factors into account, and then predict an output for a given input. There are many fields within Machine Learning, but here we just do a really basic linear regression to get into it:
From: Deep Learning by Ian Goodfellow, Yoshua Bengio, Aaron Courville
Example: Predicting house prices (non-linear regression)
You are thinking about buying an old villa and want to know when, which one, and where to buy. You could now do different things: either predict the price on a timeline (e.g., how much prices in winter differ from those in summer) or predict the price of houses in different areas (e.g., cities). We stick with the first option because it is much easier for now.
Example:
Just a simple function: y_predict(x) = 100 + sin(x) * 5
In this (non-linear) regression example, the y-axis shows the price of a house of, e.g., 150 m², and the x-axis shows the months of a year. We can now see that the price of a house is different for each month. If you buy a house in the middle of January you probably pay around €100,000, and at the end of April you tend to pay roughly €10,000 less. So one would likely buy a house around April-May.
But why is this called regression? Well, as said earlier, we get a continuous output for every value we put into our predictor y_predict(x).
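The toy predictor above can be sketched in a few lines of Python (treating x as the month number and the output as a price in thousands of euros is my assumption for illustration):

```python
import numpy as np

# toy seasonal price model: y_predict(x) = 100 + sin(x) * 5
# x is the month number; output is in thousands of euros (assumed units)
def y_predict(x):
    return 100 + np.sin(x) * 5

months = np.arange(1, 13)
prices = y_predict(months)
print(prices.round(2))
```

Since the sine term swings between -1 and 1, all predicted prices stay within 100 ± 5, which is exactly the seasonal pattern described above.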
Linear regression:
What is a linear function?
In general we have: y = a * x + b
Where b is our bias, a our slope, and y our output value.
However, in Machine Learning one uses different letters: y_hat(x) = beta0_hat + beta1_hat * x
We would call the coefficients beta-hat. beta0_hat is still the same bias and beta1_hat is still the same slope, just with different names. But we change y to y_hat because we say it is a predictor.
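As a minimal sketch (the bias and slope values here are made up for illustration), the linear predictor looks like this in Python:

```python
# linear predictor: y_hat(x) = beta0_hat + beta1_hat * x
# the bias (beta0) and slope (beta1) values are made up for illustration
def y_hat(x, beta0=2.0, beta1=0.5):
    return beta0 + beta1 * x

print(y_hat(4.0))  # bias 2.0 plus slope 0.5 times 4.0 gives 4.0
```

Training means finding good values for beta0 and beta1 instead of picking them by hand, which is what the next sections do.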
But how would one train a model like this and how can we optimize it?
We need:
- a dataset with two variables
- a training algorithm with a formula to minimize analytically (usually we minimize a cost function/error measure)
- an error measure for our wrong predictions, to see how well our model predicts. In general we look at how much difference there is between the real and the predicted values.
Our observed dataset contains x values and y values. We later want to predict a y value for any input x.
Now that we have our dataset, we can move on to our training algorithm.
We now have to calculate our betas, i.e., the bias and the slope. We do that by minimizing an error measure, the Residual Sum of Squares (RSS), and solving for the betas:
RSS(beta0, beta1) = sum_{i=1..N} (y_i - (beta0 + beta1 * x_i))^2
Keep in mind: N is the size of the dataset, cf. the indices of our x, y columns above.
Minimizing this with respect to beta0 and beta1 turns it into a least squares problem with the solution:
beta1_hat = sum_{i=1..N} (x_i - xbar) * (y_i - ybar) / sum_{i=1..N} (x_i - xbar)^2
beta0_hat = ybar - beta1_hat * xbar
Furthermore, I need to mention that xbar is the mean of all x values, i.e., the sum over all x_i divided by N (and ybar is the mean of all y values).
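As a sanity check on these formulas, here is a short sketch (the data points are made up) comparing the closed-form solution with NumPy's least-squares fit, which minimizes the same RSS:

```python
import numpy as np

# made-up example data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

xbar, ybar = x.mean(), y.mean()
# closed-form least squares solution
beta1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
beta0 = ybar - beta1 * xbar

# np.polyfit with degree 1 solves the same least squares problem
slope, intercept = np.polyfit(x, y, 1)
print(np.isclose(beta1, slope), np.isclose(beta0, intercept))
```

Both routes give the same line, because the closed-form expressions are exactly the minimizer of the RSS.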
And lastly we use the Mean Squared Error to see how good our model is: we subtract the predicted y_hat(x_i) from the real y_i value in our dataset, square the difference, sum over all N points, and divide by N:
MSE = (1/N) * sum_{i=1..N} (y_i - y_hat(x_i))^2
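A tiny numeric example of the MSE formula (the values are made up):

```python
import numpy as np

# made-up true values and predictions
y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.5, 5.5, 6.0])

# MSE = (1/N) * sum((y_i - y_hat_i)^2)
mse = np.mean((y_true - y_pred) ** 2)
print(mse)  # (0.25 + 0.25 + 1.0) / 3 = 0.5
```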
But since this would be a lot of work to do by hand, we instead use Python to compute all of these things.
Source Code
- """
- @author: Chris
- """
- #used for calculating the solution of the predictor
- def predictor(beta0, beta1, x):
- return beta0 + beta1 * x
- import pandas as pd
- import numpy as np
- import matplotlib.pyplot as plt
- #read dataset
- dataset = pd.read_csv("tutorial3.dat", sep = " ",
- names= ["x", "y"])
- #compute ß1
- xmean = np.mean(dataset["x"])
- ymean = np.mean(dataset["y"])
- numerator = 0
- denominator = 0
- #calcilate numerator and denominator of the formula
- for i, row in dataset.iterrows():
- numerator += ((row["x"] - xmean)) * (row["y"] - ymean)
- denominator += (row["x"] - xmean)**2
- #fraction
- beta1 = numerator / denominator
- #compute ß0 using the formula given
- beta0 = ymean - beta1 * xmean
- print("Predictor:")
- print("yhat = " + str(beta0) + " + " + str(beta1) + "*x")
- #plot our data
- plt.scatter(dataset["x"],dataset["y"])
- #plot predictor function
- x = np.linspace(1,6, 100)
- y = beta0 + beta1 * x
- plt.plot(x,y, color = "red")
- plt.show()
- #Calculate Mean Squared error using the formula
- MSE = 0
- for i, row in dataset.iterrows():
- MSE += (row["y"] - predictor(beta0, beta1, row["x"])) **2
- MSE = MSE / len(dataset)
- print("Mean Squared Error:")
- print(np.around(MSE,4))
In the end we get the following result:
For this specific dataset we see that linear regression performs quite well.
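As an optional cross-check (assuming scikit-learn is available; the small dataset below is made up and only stands in for tutorial3.dat), the same coefficients fall out of sklearn's LinearRegression:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# made-up data standing in for tutorial3.dat
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([1.9, 4.1, 6.0, 8.1])

model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_[0])
```

Under the hood it solves the same least squares problem as our closed-form betas, so the intercept and coefficient match beta0_hat and beta1_hat.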