Acquiring Data for Corporate Model

One of the real world challenges in creating a financial model is to acquire historic accounting data in an efficient and flexible manner. I believe it is is instructive to put together as many historic years as possible to evaluate trends, volatility of returns and develop downside cases in corporate models. There are a few ways to get historic data these days. One way is to go the an internet site like and put together the data. If your company must register with the U.S. securities and exchange commission, you can get data back through 1994.

This lesson set walks you through acquiring data from PDF files and putting together many years of data in a consistent format. In addition to acquiring data from PDF files, the lesson set demonstrates how to acquire data from companies that make filings to the Securities and Exchange Commission in the U.S. You can receive credit for this lesson set simply by sending in a completed set of historical financial statements along with the original source of the data. If you do this and send me the files I will put your name on the page of the website that lists the star financial analysts who have completed a lesson. Here are the steps for getting a nice set of financial statements put together:

1. Find the financial statements and get them with Chrome (it works better than Firefox)
2. Copy the data into excel and arrange statements together (e.g. all income statements in first set of sheets, followed by balance sheet etc.)
3. Use the Read PDF excel file to format the financial statements.
4. Put all of the statements together — put all income statements in one file; balance sheets in second file etc.
5. Get the UNION function and the INDMAT function into your file from the Read PDF file.
6. Arrange sections of each statement with a few lines and have a lot of blanks. eg. operating section of income statement, current assets of balance sheet.
7. Use the UNION function to put together a common set of titles.
8. Use the INDMAT function to put together data for different years with common titles.
9. Copy and paste as values in new file for analysis.


Read PDF to Excel File that Allows you to Format Data After Copying from PDF File (Press Shift, Cntl, A)

Asian Version of Excel File the Reads PDF (or Internet) to Excel with Macros that are Implemented with SHIFT CNTL Afor


Videos on Reading Financial Statements

Videos with this lesson set are boring and have some sound quality problems and I admit this, sorry. I have left them on the website anyway. Perhaps the idea of reading in financial statements from PDF files and from the SEC is not really interesting. However, efficient use of: (1) the READ PDF file to get data from the PDF’s on the internet and get the data as quickly as possible into excel is a very practical issue; (2) putting financial statements together from different years when the titles change is a very real world pain and is solved with the important UNION function; (3) quickly arranging the data itself after putting together the titles with the UNION function is very helpful and is accomplished with the INDMAT function. These three things are all made possible with functions and macros that are included in the READ PDF file.

Even with all of these tools, putting together financial statements is a pain. But it is a very real problem and anything to help this problem is I think useful. If you go through the videos and then try an example I think you can really improve your efficiency.

The first two videos below explain how to use the read_pdf excel file.  It covers the subject of downloading data from financial reports using alternative sources.


YouTube Embed: No video/playlist ID has been supplied


Carlsberg Raw Data.xlsx

The third video discusses how to move the data around and put different financial statements together.  This is done with the ALT, E, M short-cut.


If you are going to put financial statements together using the read_pdf file, there are inevitably changes in the format of the income statement, balance sheet and cash flow from year to year.  It is a real pain to arrange the data in an easy manner without the horror of moving things around manually in excel.  To solve this problem of different titles, I have created a union function that is intended to create  long list of accounts.  After you have created the accounts, you can use the match and index with the long list and the individual accounts to create a common format.  This is the technique presented in the video.

The union function is a UDF.  you can find the UNION function in the file named read_pdf


The final two videos in this set address cleaning up and graphing the data after you have put it together with the UNION udf.  The first video demonstrates how to put the financials together in a structured manner and the second video shows how to use the INDEX function to make flexible graphs with financial data.



I am sorry about being critical in creating this page.  I am trying to combine practical tools with theory in explaining all of the modelling steps.  This section reviews how can use my database program and how to use the read pdf program to acquire data for corporate modelling. I explain why I created these files and suggestions on how to think about various items in the financial data. Before I developed the read pdf file or the financial database file, I used to try to create a relevant a case example for companies in my classes.  I would get pdf files with financial reports and then struggle with pdf files trying to retrieve the data.  This really sucked. It would take hours and hours.  That is why I started to develop the read pdf file.  I then modified my file for reading in data and after a participant in one of my courses from Switzerland told me he uses Market Watch and wanted an automatic way to download and update the data. I finally understood that you could get historical financial statements from the internet. So, in describing how to set-up a financial model I begin with description of how to retrieve data in a from a practical standpoint. I am using companies in India to illustrate the process.

To operate the database that comes from the automatic database need to find a website that allows you to access their data in excel.  In our case for Indian companies, any company with an ADR stock has detailed financials published in MarketWatch.  You need to go and find the ticker symbols so that you read in the financial data.  What I do is simply to go to google and put in the company name along with MarketWatch.  Note in the screenshot below the symbol is rlniy in the url which is the same name as on the page.  It takes a little while, but I suggest making sure the MarketWatch has financial statements as for some companies there is a Market Watch page, but no financials.



Once you have done the difficult job of getting the tickers, you stick the tickers into the database file and then start retrieving the data.  As both annual data and quarterly data is downloaded for all of the companies, there can be a massive amount of data and cause little problems with excel.  You can also make some mistakes like I do (for example not continuing to count the numbers in the yellow column).  In this case you can just continue the reading of the financial data which takes much longer than anything else. The screenshot below illustrates the list of ticker symbols for the India case study after the painful process of finding the ticker symbols.  To make this yourself you need to get the URL’s as shown below.


Why want real data rather than screens; you want to compare to economic series.

I used to hate classes where I tried to get the financial reports and then come up with different ways to get the data into models.



Carlsberg Raw Data Formatted.xlsm

Carlsberg Raw Data Arranged.xlsm

Carlsberg Raw Data Union and INDMAT.xlsm

Format SEC.xlsm