Web Scraping Stock Tickers Using Python

Scrape the web using Python

As with most interesting projects, this one started with a simple question: where can I get all of the stock symbols and company names for my portfolio? The obvious answer was to gather the data myself from the web!

This brings us to this article, where I will show you how to gather stock symbols (tickers) and their associated company names from the web.

Understanding The Thought Process Before Programming

First, I need to find a website that contains the data I want. One site that has some of the data I want to scrape is https://www.advfn.com/nyse/newyorkstockexchange.asp?companies=A

This site seems to have a good number of stock symbols/tickers and their associated company names. Looking at the page, I can see that it is missing some data from the New York Stock Exchange (NYSE), but that’s okay: this is just a small project, and it doesn’t need to be perfect or contain every single stock on the exchange.

Also, the page at https://www.advfn.com/nyse/newyorkstockexchange.asp?companies=A contains only the NYSE company names and symbols where the company name begins with the letter ‘A’. I want ALL of the company names, which includes companies that start with ‘A’, ‘B’, ‘C’, ‘D’, and so on, all the way to ‘Z’.

Luckily, the link itself offers a nice way to get all of that data. The URL ends with the query parameter ‘companies=<letter>’, so changing that letter opens a page listing the companies whose names start with it. For example, https://www.advfn.com/nyse/newyorkstockexchange.asp?companies=A opens the page for companies whose names start with ‘A’, and https://www.advfn.com/nyse/newyorkstockexchange.asp?companies=B opens the page for companies whose names start with ‘B’, and so on for every letter of the alphabet.

All of the pages for companies A-Z seem to have a structure similar to the first page, the one listing the companies whose names start with ‘A’.

This tells me I will need to loop through the alphabet to get all of the pages.
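
As a quick sketch of that idea, the snippet below builds one URL per letter without downloading anything. The names BASE_URL and letter_url are hypothetical helpers introduced only for this illustration, and the sketch assumes the ‘companies’ parameter keeps behaving the way the two example links suggest.

```python
import string
from urllib.parse import urlencode

# Base address taken from the example links above.
BASE_URL = "https://www.advfn.com/nyse/newyorkstockexchange.asp"


def letter_url(letter):
    """Return the listing URL for companies whose names start with `letter`."""
    # urlencode builds the "companies=<letter>" query string for us.
    return BASE_URL + "?" + urlencode({"companies": letter.upper()})


# One URL for each capital letter, A through Z.
urls = [letter_url(ch) for ch in string.ascii_uppercase]

print(len(urls))
print(urls[0])
print(urls[-1])
```

Building the URLs first makes it easy to check the pattern by eye before any page is actually requested.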

So, let’s take a look at how this first page (the page that contains companies that start with the letter ‘A’) is structured, so that we can scrape the data from it and from the other, similar pages!

To look at the structure of the data, we just need to use the browser’s Inspect Element tool on the ‘A K Steel’ link, since that is one of the company names that we want to scrape.

Immediately after inspecting the element, I can see the tags and classes that I need in order to scrape this data. The company name and symbol both sit inside a ‘<tr>’ tag, and that tag has one of two classes: ‘ts0’ or ‘ts1’.

So, when I scrape the data, I will search for those tags and attributes. Within each of those ‘<tr>’ tags are ‘<td>’ tags that contain the text for the company name and the stock symbol. There are three ‘<td>’ tags in each ‘<tr>’ row with class ‘ts0’ or ‘ts1’; the first holds the company name and the second holds the symbol, and that is the text I will scrape.
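
To make that structure concrete, here is a minimal sketch that parses a small, made-up HTML fragment shaped like the rows described above. The company names and tickers in SAMPLE_HTML are hypothetical placeholders, not data from the site, and the real page may contain more markup than this.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the row structure described above.
SAMPLE_HTML = """
<table>
  <tr class="ts0"><td>Example Corp A</td><td>EXA</td><td></td></tr>
  <tr class="ts1"><td>Example Corp B</td><td>EXB</td><td></td></tr>
  <tr class="other"><td>Not a data row</td><td>-</td><td></td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

pairs = []
# Passing a list to class_ matches rows that carry either class,
# and the rows come back in the order they appear in the document.
for tr in soup.find_all("tr", class_=["ts0", "ts1"]):
    cells = tr.find_all("td")
    if len(cells) < 2:
        continue  # skip rows that do not have both a name and a ticker cell
    name = cells[0].get_text(strip=True)
    ticker = cells[1].get_text(strip=True)
    pairs.append((name, ticker))

print(pairs)
```

Searching for both classes in one call is just one option; it keeps the rows in page order, whereas two separate searches group all ‘ts0’ rows before all ‘ts1’ rows.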

If you prefer video to text, you can check out the YouTube video below. It goes through everything in this article in a little more detail and will help make it easy for you to start programming. Or you can use both as supplementary materials for learning!

Programming

The first thing I do when writing a program is add a description of the program.

#Description: This program scrapes stock tickers and their company name from a website

Next, let’s import the dependencies that will be used throughout the program.

#Import the dependencies
import requests
import pandas as pd
from bs4 import BeautifulSoup

Now, I will create two empty lists: one that will contain the company names and another that will contain each company’s ticker, or stock symbol.

#Create two empty lists for the company name and company ticker symbol
company_name = []
company_ticker = []

Create a function to scrape the data. The function takes a letter and scrapes the company names and company tickers from the page that lists the companies whose names start with that letter.

It appends what it finds to the two lists created above and returns both lists.

#Create a function to scrape the data
def scrape_stock_symbols(Letter):
    Letter = Letter.upper()
    URL = 'https://www.advfn.com/nyse/newyorkstockexchange.asp?companies=' + Letter
    page = requests.get(URL)
    soup = BeautifulSoup(page.text, "html.parser")
    #The data rows carry either the class 'ts0' or the class 'ts1'
    odd_rows = soup.find_all('tr', attrs={'class': 'ts0'})
    even_rows = soup.find_all('tr', attrs={'class': 'ts1'})
    #The first cell holds the company name and the second holds the ticker
    for i in odd_rows:
        row = i.find_all('td')
        company_name.append(row[0].text.strip())
        company_ticker.append(row[1].text.strip())
    for i in even_rows:
        row = i.find_all('td')
        company_name.append(row[0].text.strip())
        company_ticker.append(row[1].text.strip())
    return (company_name, company_ticker)
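
The function above calls requests.get with no timeout and no check of the HTTP status. If you want the download step to fail loudly instead of hanging or quietly parsing an error page, one possible sketch is shown below. The name fetch_html and its get parameter are hypothetical, introduced only so the download step can be swapped out or tested without touching the network; this is not part of the original program.

```python
import requests


def fetch_html(url, get=requests.get, timeout=10.0):
    """Download `url` and return its HTML text, raising on HTTP errors."""
    # The timeout (in seconds) stops the call from waiting forever.
    response = get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx and 5xx responses.
    response.raise_for_status()
    return response.text


# Example call (performs a real network request, so it is left commented out):
# html = fetch_html("https://www.advfn.com/nyse/newyorkstockexchange.asp?companies=A")
```

The returned text could then be handed to BeautifulSoup exactly as page.text is in the function above.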

Get a string containing all of the capital letters in the alphabet.

#Get and show every capital letter in the alphabet
import string
string.ascii_uppercase

Loop through every letter so that you can input that letter into the function to get all of the company names and stock symbols.

#Loop through every letter in the alphabet to get all of the tickers from the website
for char in string.ascii_uppercase:
    (temp_name, temp_ticker) = scrape_stock_symbols(char)

Put it all together in one data set by creating a DataFrame from the two lists that the function has been filling.

#Create a new DataFrame that contains the company name and company ticker
data = pd.DataFrame(columns = ['company_name', 'company_ticker'])
data['company_name'] = company_name
data['company_ticker'] = company_ticker

From earlier, you may remember me mentioning that not all of the NYSE stock data was on the site. That missing data left us with some empty rows, so let’s remove those rows and clean the data set.

#Data Cleaning
data = data[data['company_name'] != '']
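
To see what this kind of filter does, here is a small sketch on made-up rows. The sample DataFrame is hypothetical, and the extra whitespace stripping and sorting are optional additions rather than part of the original program.

```python
import pandas as pd

# Hypothetical sample rows; the real scraped lists will be much longer.
sample = pd.DataFrame({
    "company_name": ["Example Corp B", "", "   ", "Example Corp A"],
    "company_ticker": ["EXB", "", "", "EXA"],
})

# Strip whitespace first so that names made only of spaces count as empty.
names = sample["company_name"].str.strip()

# Keep the non-empty rows, sort them by name, and renumber the index.
cleaned = (
    sample[names != ""]
    .sort_values("company_name")
    .reset_index(drop=True)
)

print(cleaned)
```

Sorting is worth considering because the scraper collects all ‘ts0’ rows before all ‘ts1’ rows for each letter, so the names do not arrive in page order.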

That’s it! All that is left is to show the data.

#Show the data
data

We are done creating this program!

If you are interested in reading more about Python, one of the fastest-growing programming languages and one that many companies and computer science departments use, then I recommend the book Learning Python by Mark Lutz.

Thanks for reading this article; I hope it’s helpful to you all! If you enjoyed this article and found it helpful, please leave some claps to show your appreciation. Keep up the learning, and if you like machine learning, mathematics, computer science, programming, or algorithm analysis, please visit and subscribe to my YouTube channels (randerson112358 & compsci112358).
