On this web page I demonstrate some techniques to retrieve data from the internet and put it into excel, which you can call web scraping. I discuss alternative methods that are pretty simple and then move to more complex methods. It is popular these days to discuss data scraping with the python programming language, and to be sure, you can do more things with getting data from internet websites using python than you can with some other methods. But I take the perspective that if you can get data with a simple one line macro — workbooks.open — it is often much better than writing a program where you have to go and find python libraries (like …) and then figure out how to install the library and dechipher code in the website. I also have the philosophy that when cleaning up data it can be simpler to clean things up with excel functions like XLOOKUP and even MATCH rather than writing a lot of code in a program outside of excel (I do understand if you disagree with this). In order to see what I mean, I have separated the discussion into different ways to get financial data. This data includes an incredible set of free resources from the Federal Reserve Economic Data; stock prices (that can be efficiently adjusted by yahoo finance); financial statement data (that used to be available and is not painful to retrieve), and many other data sets including commodity prices from the world bank, electricity prices from the NORDPOOL website, solar panel prices from PVINSIGHT and many other sources.
.
Part 1: Finding a URL and Testing Whether It Will Work
The methods other than using python in one way or another use
When working with databases I find the INDIRECT, MATCH and INDEX helpful. But using the MATCH and INDEX can be difficult if there are many tests. For example you may want to find the second occurrence rather than the first occurance. In addition when you read some data data from websites like marketwatch, you may want to convert 12B to 12,000,000,000 etc. The files below include functions that resolve these and other issues. The files that delete rows, columns and read in data are in the generic macro files. If you want access to the database files, send me an email at edwardbodmer@gmail.com and I will send you the resource library.
.
Step 1: Define the WebRequestURL from a cell in your file
Step 2: Call the WebRequest as shown below
Step 3: Use the .Open to get the WebRequestURL
WebRequestURL = Range(“yahoo_url”) ‘ Need to define the webrequest
Set WebRequest = New WinHttp.WinHttpRequest ‘ This is the core of the method
With WebRequest 'FETCH THE DATA:
.Open "GET", WebRequestURL, False ' Need to define the URL
.send
.waitForResponse (response_wait) '
End With
OutRow = Split(WebRequest.ResponseText, vbLf): RowMax = UBound(OutRow) – 1 ‘ Split a single row when hit vbLF
Part 2: The Simplest Method – Workbooks.Open
Once you find a URL that works, you can plop the URL into a spreadsheet and use this method. The method can be used either to put data from the internet page directly into an excel spreadsheet (such as pvinsights) or the data can be read from a spreadsheet that you can download from the internet.
This page includes a number of techniques and examples for working with large data files.
I am going to post these with explanations of the functions that you can make in excel to improve the database capabilities. If you would like the files in the meantime, send me an e-mail to edwardbodmer@gmail.com and request the resources.
Nigeria Analysis.xlsm
Daily Broadcast 2014 Fixed.xlsm
Nigeria Daily Statistics from 2012.xlsm
Daily Broadcast 2015.xlsm
Daily Broadcast 2014.xlsm
Nigeria Daily Statistics From 2010.xlsm
Daily Broadcast 2015 Fixed.xlsm