Create Your Own Google Stock Prediction Program Using Python And Machine Learning
In this article I will show you how to create your own stock prediction Python program using a machine learning algorithm called Support Vector Regression (SVR). The program will read in Google (GOOG) stock data and predict the price for a given day.
It is extremely hard to predict the direction of the stock market and of stock prices, but in this article I will give it a try. Even people with a good understanding of statistics and probability have a hard time doing this.
Disclaimer: The material in this article is purely educational and should not be taken as professional investment advice. Invest at your own discretion.
Support Vector Regression (SVR) is a type of Support Vector Machine (SVM), a supervised learning algorithm, applied here to regression analysis. In 1996, this version of SVM for regression was proposed by Christopher J. C. Burges, Vladimir N. Vapnik, Harris Drucker, Alexander J. Smola and Linda Kaufman. The model produced by SVR depends only on a subset of the training data, because the cost function for building the model ignores any training data that lies within a small margin (epsilon) of the model prediction.
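To make the last point concrete, here is a minimal NumPy sketch of the epsilon-insensitive loss that SVR is built on: a residual smaller than `epsilon` costs nothing, so that training point does not shape the model. The function name `epsilon_insensitive_loss` and the sample numbers are illustrative assumptions, not part of the original program. In scikit-learn's `SVR`, the width of this tube is the `epsilon` parameter (default 0.1).

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Per-sample SVR loss: zero inside the epsilon tube, linear outside it."""
    residual = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    return np.maximum(0.0, residual - epsilon)

# Hypothetical targets and predictions (not stock data)
y_true = [1.0, 2.0, 3.0]
y_pred = [1.05, 2.5, 3.0]

loss = epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1)
# The first and third residuals are within 0.1 of the target, so they cost nothing
print(loss)
```

Only the second point, whose residual of 0.5 exceeds the tube, contributes to the loss.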
Support Vector Machine Pros:
- It is effective in high dimensional spaces.
- It works well when there is a clear margin of separation.
- It is effective in cases where the number of dimensions is greater than the number of samples.
Support Vector Machine Cons:
- It does not scale well to large data sets, because training time grows faster than quadratically with the number of samples.
- It performs poorly when the data set is noisy (contains a large amount of additional, meaningless information).
Types Of Kernel:
- linear
- polynomial (poly)
- radial basis function (rbf)
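The program later in this article trains SVR models with linear, polynomial and rbf kernels. As a minimal sketch, here are those three kernel functions written out in NumPy, following the formulas scikit-learn documents for them. The function names are hypothetical, and the poly kernel's `gamma=1.0` and `coef0=0.0` defaults are assumptions chosen for readability (scikit-learn's `SVR` itself defaults to `gamma='scale'`); `degree=2` and the rbf `gamma=0.15` mirror the models trained later.

```python
import numpy as np

def linear_kernel(x, z):
    # K(x, z) = x . z
    return float(np.dot(x, z))

def poly_kernel(x, z, gamma=1.0, coef0=0.0, degree=2):
    # K(x, z) = (gamma * x . z + coef0) ** degree
    return float((gamma * np.dot(x, z) + coef0) ** degree)

def rbf_kernel(x, z, gamma=0.15):
    # K(x, z) = exp(-gamma * ||x - z||^2); equals 1 when x == z and decays with distance
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))

x, z = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(x, z), poly_kernel(x, z), rbf_kernel(x, z))
```

The rbf kernel only cares about distance between inputs, which is why an rbf model tends to follow local bumps in the data more closely than the linear or low-degree polynomial ones.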
If you prefer not to read this article and would like a video presentation of it, you can check out the YouTube video below. It goes through everything in this article in a little more detail and will help you start programming your own Machine Learning model, even if you don’t have Python installed on your computer. Or you can use both as supplementary materials for learning about Machine Learning!
The first thing I like to do before writing a single line of code is to describe, in comments, what the code does. This way I can look back on my code and know exactly what it does.
# Description: This program predicts the price of GOOG stock for a specific day
# using the Machine Learning algorithm called Support
# Vector Regression (SVR)
Now import the packages/libraries that make the program easier to write.
#Import the libraries
from sklearn.svm import SVR
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
Next, I will load the Google (GOOG) stock data that I downloaded from finance.yahoo.com into a variable called ‘df’, short for data frame.
NOTE: This is data from Yahoo for the past 30 days, 5/1/2019 to 5/30/2019.
Yahoo only records a price on days when the market is open, so the data set has fewer records than there are calendar days. For example, no stock prices are recorded on weekends.
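To see why the data set has fewer rows than calendar days, here is a sketch that counts the weekdays in the date range with pandas' `bdate_range`, then counts again with a custom business-day frequency that also skips Memorial Day (May 27, 2019), when US stock markets were closed. The holiday list is supplied by hand and is an assumption of this sketch; pandas does not know the exchange calendar, and this code does not read GOOG_30_days.csv.

```python
import pandas as pd

# Weekdays between the first and last dates of the range (both ends included)
weekdays = pd.bdate_range('2019-05-01', '2019-05-30')

# Custom business days ('C'): weekdays minus a hand-supplied holiday list
trading_days = pd.bdate_range('2019-05-01', '2019-05-30', freq='C',
                              holidays=['2019-05-27'])  # Memorial Day 2019

print(len(weekdays), len(trading_days))
```

For this range the sketch gives 22 weekdays and 21 trading days, out of 30 calendar days.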
#Load the data
#from google.colab import files # Use to load data on Google Colab
#uploaded = files.upload() # Use to load data on Google Colab
df = pd.read_csv('GOOG_30_days.csv')
Get and print the last row of data. Its Adj Close Price is 1117.949951.
#Get the last row of data (used later to test the models)
actual_price = df.tail(1)
print(actual_price)
Prepare the data for training. Keep every row except the last one, which I will use later to test the models, and store the result back into ‘df’.
df = df.head(len(df)-1)
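`df.head(len(df)-1)` and `df.tail(1)` can also be written with positional slicing. The sketch below wraps that in a hypothetical helper, `split_last_row`, and runs it on a made-up three-row frame that reuses the article's column names; the prices are invented for illustration.

```python
import pandas as pd

def split_last_row(frame):
    """Return (all rows except the last, the last row) without modifying the input."""
    return frame.iloc[:-1], frame.iloc[-1:]

# Hypothetical data using the same column names as GOOG_30_days.csv
sample = pd.DataFrame({
    'Date': ['5/28/2019', '5/29/2019', '5/30/2019'],
    'Adj Close Price': [100.0, 101.0, 102.0],
})

train, test = split_last_row(sample)
print(len(train), test['Adj Close Price'].iloc[0])
```

Returning two new frames, instead of overwriting `df`, keeps the held-out row available under its own name for the final comparison.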
Create the variables that will hold the independent and dependent data sets, starting each one as an empty list.
#Create the lists / X and y data set
days = list()
adj_close_prices = list()
Get all of the rows from the Date column and store them in a variable called ‘df_days’, then get all of the rows from the Adj Close Price column and store them in a variable called ‘df_adj_close’.
df_days = df.loc[:, 'Date']
df_adj_close = df.loc[:, 'Adj Close Price']
Create the independent data set ‘X’ and store the data in the variable ‘days’.
Create the dependent data set ‘y’ and store the data in the variable ‘adj_close_prices’. Both can be done by appending the data to each of the lists.
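NOTE: For the independent data set we want only the day from the date, so I use the split function to get just the day and cast it to an integer while appending it to the days list. Each day is wrapped in its own list, because scikit-learn expects the independent data to be 2-D, with one row per sample and one column per feature.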
#Create the independent data set
#Dates are in month/day/year format, so index 1 of the split is the day
for day in df_days:
    days.append( [int(day.split('/')[1])] )

#Create the dependent data set
for adj_close_price in df_adj_close:
    adj_close_prices.append( float(adj_close_price) )
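The day-extraction step is sketched here in two ways, assuming the Date column holds month/day/year strings such as 5/1/2019 (the `split('/')` approach depends on that layout). The first mirrors the loop above; the second lets pandas parse the dates with `pd.to_datetime` and reads the day of the month from `.dt.day`, which raises an error instead of silently picking the wrong field if the format ever changes. Both produce a nested list of shape (samples, 1), the 2-D layout scikit-learn expects. The sample dates are hypothetical.

```python
import pandas as pd

# Hypothetical dates in the month/day/year format the split('/') approach assumes
dates = pd.Series(['5/1/2019', '5/2/2019', '5/30/2019'])

# Approach 1: split the string and take the middle field (the day)
days_split = [[int(d.split('/')[1])] for d in dates]

# Approach 2: parse the dates, take the day of the month, reshape to one column
days_parsed = pd.to_datetime(dates, format='%m/%d/%Y').dt.day.to_numpy().reshape(-1, 1)

print(days_split)
print(days_parsed.tolist())
```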
Look at which days were recorded in the data set.
print(days)
Next, I will create and train three Support Vector Regression (SVR) models, each with a different kernel, to see which one performs best.
#Create and train an SVR model using a linear kernel
lin_svr = SVR(kernel='linear', C=1000.0)
lin_svr.fit(days, adj_close_prices)

#Create and train an SVR model using a polynomial kernel
poly_svr = SVR(kernel='poly', C=1000.0, degree=2)
poly_svr.fit(days, adj_close_prices)

#Create and train an SVR model using an RBF kernel
rbf_svr = SVR(kernel='rbf', C=1000.0, gamma=0.15)
rbf_svr.fit(days, adj_close_prices)
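Because the three models differ only in their kernel settings, one option is to keep them in a dictionary and fit them in a loop, which makes it easy to add or remove a kernel and to reuse the same loop for plotting and prediction. This is a minimal sketch that assumes made-up training data (`sample_days`, `sample_prices`) in place of the GOOG prices; the hyperparameters match the ones used above.

```python
import numpy as np
from sklearn.svm import SVR

# Hypothetical training data: day numbers (2-D, one feature) and invented prices
sample_days = np.array([[1], [2], [3], [6], [7], [8]])
sample_prices = np.array([100.0, 101.5, 101.0, 103.0, 102.5, 104.0])

# Same three configurations as above, kept in one dictionary
models = {
    'linear': SVR(kernel='linear', C=1000.0),
    'poly': SVR(kernel='poly', C=1000.0, degree=2),
    'rbf': SVR(kernel='rbf', C=1000.0, gamma=0.15),
}

for name, model in models.items():
    model.fit(sample_days, sample_prices)  # fits in place; the dict keeps the fitted model
    print(name, model.predict([[9]]))      # predict() also needs a 2-D input
```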
Next, I will plot the models on a graph to see which one fits the data best.
#Plot the models on a graph to see which has the best fit
plt.scatter(days, adj_close_prices, color = 'black', label='Original Data')
plt.plot(days, rbf_svr.predict(days), color = 'green', label='RBF Model')
plt.plot(days, poly_svr.predict(days), color = 'orange', label='Polynomial Model')
plt.plot(days, lin_svr.predict(days), color = 'purple', label='Linear Model')
plt.xlabel('Days')
plt.ylabel('Adj Close Price')
plt.title('Support Vector Regression')
plt.legend()
plt.show()
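From the graph below, the best model seems to be the RBF model, the Support Vector Regression model that uses the radial basis function kernel. However, this graph can be misleading: it only shows how closely each model fits the days it was trained on, not how well it predicts a day it has never seen.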
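One way to check the models on days they have not seen, instead of judging the in-sample fit by eye, is to hold out the last few rows (assuming they are sorted by date), train on the rest, and compare the mean absolute error on the held-out days. This is a sketch under stated assumptions: the helper `holdout_mae`, the choice of `n_test=3`, and the synthetic upward-trending prices are all hypothetical, and with only about 20 trading days any such score rests on very few points.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.svm import SVR

def holdout_mae(model, X, y, n_test=3):
    """Train on all but the last n_test rows, then score on those last rows.

    Rows are assumed to be in date order, so the test rows are the most recent days.
    """
    X_train, X_test = X[:-n_test], X[-n_test:]
    y_train, y_test = y[:-n_test], y[-n_test:]
    model.fit(X_train, y_train)
    return mean_absolute_error(y_test, model.predict(X_test))

# Hypothetical, noisy upward-trending prices for 20 trading days
rng = np.random.default_rng(0)
X = np.arange(1, 21).reshape(-1, 1)
y = 100.0 + 0.5 * X.ravel() + rng.normal(0.0, 1.0, size=20)

for name, model in {
    'linear': SVR(kernel='linear', C=1000.0),
    'rbf': SVR(kernel='rbf', C=1000.0, gamma=0.15),
}.items():
    print(name, round(holdout_mae(model, X, y), 2))
```

Holding out the most recent rows, rather than a random sample, matches how the model is used here: predicting a day that comes after all of the training data.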
Now I can start making my stock price prediction. The last row of data, which was left out of the training set, has the date 5/30/2019, so the day is 30. This will be the input to each model, and the actual price the models should predict is $1117.949951.
So now I will predict the price by giving each model the day 30 as input.
#The input must be 2-D: one sample with one feature (the day)
day = [[30]]
print('The RBF SVR predicted:', rbf_svr.predict(day))
print('The Linear SVR predicted:', lin_svr.predict(day))
print('The Polynomial SVR predicted:', poly_svr.predict(day))
From this small test, the RBF SVR model seems to have performed the best. It predicted a value of $1112.94098222 when the actual price was $1117.949951, so it was off by only about $5!
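The "about $5" figure can be checked with a couple of lines of arithmetic, using the two prices quoted above. Expressing the miss as a percentage of the actual price puts it in context.

```python
actual = 1117.949951       # Adj Close Price on 5/30/2019, as quoted in the article
predicted = 1112.94098222  # RBF SVR prediction, as quoted in the article

abs_error = abs(actual - predicted)
pct_error = 100.0 * abs_error / actual

print(f'Absolute error: ${abs_error:.2f}')
print(f'Percentage error: {pct_error:.2f}%')
```

This gives an absolute error of about $5.01, roughly 0.45% of the actual price.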
That is it, you are done creating your SVR program to predict stock prices! Again, if you want, you can watch and listen to me explain all of the code in my YouTube video.
If you are interested in reading more on machine learning and want to get started right away with problems and examples, I strongly recommend Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. It is a great book for helping beginners learn how to write machine learning programs and understand machine learning concepts.
Thanks for reading this article! I hope it’s helpful to you all. If you enjoyed this article and found it helpful, please leave some claps to show your appreciation. Keep up the learning, and if you like machine learning, mathematics, computer science, programming or algorithm analysis, please visit and subscribe to my YouTube channels (randerson112358 & compsci112358).