Do you want to build your financial data science skills? Whether you’re doing this to become a Financial Data Scientist, or to leverage the power of Financial Data Science for your own investments – take your pick from the finance data science projects we’ve created below and build your knowledge and skills as you work on them.
Projects get progressively more challenging as you go further down. Do you have what it takes to complete all the projects?
All of these finance data science projects can be worked on using any/all of the following tools:
- Microsoft Excel® / Google Sheets
- Python
- R
- MATLAB
For the Advanced Level Projects, in particular, we strongly recommend using a programming language instead of Excel® / Google Sheets. It’s possible to work on them using Excel® / Google Sheets, but it’s far from efficient to do so.
Beginner Level Finance Data Science Projects
Project B-1: Calculate Stock Returns
Discover powerful concepts like the Random Walk with this simple project on calculating stock returns.
💼 Project Brief
Calculate historical stock returns for a stock of your choice.
Plot out a line chart of the stock’s historic returns. This will also help you later on, when you’re working on the Intermediate Level Financial Data Science Projects (particularly, Project I-4).
☑️ What You’ll Need
To successfully complete this project, you’ll need access to historical price data for at least 1 stock of your choice.
Historical data for stock prices are available from Yahoo! Finance or from Google Sheets using the GOOGLEFINANCE() function.
Since this is a Beginner Level Project, we’re not suggesting the use of APIs to extract data programmatically. This is covered in an Intermediate Level Project, I-1. If you’re already comfortable using an API to extract data programmatically, feel free to do so for this project, too.
If you’re not sure how to calculate stock returns, check out the article linked.
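If you're working in Python, the calculation can be sketched roughly as follows. This is a minimal illustration that assumes you already have a list of closing prices in chronological order; the function names and the prices are made up for the example.

```python
import math

def simple_returns(prices):
    """Period-over-period simple returns: r_t = P_t / P_{t-1} - 1."""
    return [p1 / p0 - 1 for p0, p1 in zip(prices, prices[1:])]

def log_returns(prices):
    """Continuously compounded returns: r_t = ln(P_t / P_{t-1})."""
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

# Made-up closing prices, for illustration only.
prices = [100.0, 102.0, 99.96, 104.958]
rets = simple_returns(prices)
log_rets = log_returns(prices)
```

Simple returns are easier to interpret, while log returns add up neatly across periods. Either series can then be plotted as a line chart.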
⏳ Expected Time To Complete
Approximately 30-60 minutes, depending on your experience of working with stock price data.
Project B-2: Calculate The Total Risk of a Stock
💼 Project Brief
Calculate the total risk of your chosen stock using historical data for stock prices.
☑️ What You’ll Need
To successfully complete this project, you’ll need access to historical stock price data for at least 1 stock of your choice. Historical stock price data is available from Yahoo! Finance. Ideally, you should use the same stock you used in Project B-1. This will allow you to compare the stock’s returns (or expected returns) with its total risk.
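One common way to measure total risk is the standard deviation of returns. The sketch below assumes daily returns and roughly 252 trading days per year; both are assumptions for illustration, and the square-root-of-time scaling is only an approximation that treats returns as uncorrelated over time.

```python
import math
import statistics

def total_risk(returns, periods_per_year=252):
    """Return (per-period, annualized) sample standard deviation of returns."""
    sigma = statistics.stdev(returns)  # sample std dev (n - 1 denominator)
    return sigma, sigma * math.sqrt(periods_per_year)

# Made-up daily returns, for illustration only.
daily_returns = [0.01, -0.02, 0.015, 0.003, -0.007]
sigma_daily, sigma_annual = total_risk(daily_returns)
```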
⏳ Expected Time To Complete
Approximately 30-60 minutes, depending on your experience of working with stock price data.
Project B-3: Calculate The Market Risk of a Stock
Market Risk – aka Systematic Risk – refers to the stock’s exposure to the overall market portfolio.
💼 Project Brief
Calculate systematic risk for your stock.
☑️ What You’ll Need
To successfully complete this project, you’ll need access to historical price data for at least 1 stock of your choice. Historical stock price data is available from Yahoo! Finance. Ideally, you should use the same stock you used in Project B-1 (and/or Project B-2). This will allow you to compare the stock’s returns (or expected returns) and its total risk with its exposure to the stock market portfolio.
If you don’t know how to calculate the market risk (systematic risk) of a stock, take a look at the article linked above.
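Systematic risk is usually summarized by beta, the covariance of the stock's returns with the market's returns divided by the variance of the market's returns. Here's a minimal sketch under that assumption; the return series are made up and constructed so that the answer is easy to verify.

```python
import statistics

def beta(stock_returns, market_returns):
    """Beta = Cov(stock, market) / Var(market), using sample moments."""
    if len(stock_returns) != len(market_returns):
        raise ValueError("return series must have the same length")
    n = len(market_returns)
    mean_s = statistics.mean(stock_returns)
    mean_m = statistics.mean(market_returns)
    cov = sum((s - mean_s) * (m - mean_m)
              for s, m in zip(stock_returns, market_returns)) / (n - 1)
    return cov / statistics.variance(market_returns)

# Made-up market returns; the stock moves exactly twice as much as the market.
market = [0.010, -0.005, 0.007, -0.012, 0.004]
stock = [2 * m for m in market]
stock_beta = beta(stock, market)
```

A beta above 1 suggests the stock tends to amplify market moves, while a beta below 1 suggests it tends to dampen them.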
⏳ Expected Time To Complete
Approximately 30-60 minutes, depending on your experience of working with stock price data.
Related Course: Data-Driven Investing (with Python)
Get ahead of the game and learn the secrets to successful data-driven investing. You’ll gain insights into investment strategies and techniques used by quant hedge funds and the like. Data-Driven Investing Course.
Don’t let your ambition go to waste – enroll now and start building your data-driven investing system today!
Intermediate Level Finance Data Science Projects
These intermediate-level finance data science projects require at least a basic command of financial securities. They build on the projects above, but dive deeper into the data analytics side of financial data science.
You’ll get the most out of these projects if you were able to complete the beginner-level finance data science projects comfortably.
Project I-1: Extract Financial Data Programmatically
💼 Project Brief
Connect to a financial data API of your choice and ‘pull’ financial data programmatically.
☑️ What You’ll Need
To successfully complete this project, you’ll need access to a Financial Data API / provider.
Take your pick from:
Or search “finance data API” to explore current paid and free data providers.
If you’re using Microsoft Excel®, you can switch to Google Sheets and extract the data there using the “GOOGLEFINANCE()” function.
Aim to extract data for a minimum of 50 stocks, preferably 100+ stocks.
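The general shape of a programmatic 'pull' might look something like the sketch below. Everything provider-specific here is hypothetical: the `url_template`, the CSV response format, and the `Date` / `Close` column names are assumptions for illustration. Real providers differ in authentication, rate limits, and response formats (many return JSON), so check your provider's documentation.

```python
import csv
import io
import urllib.request

def fetch_text(url, timeout=10):
    """Download a text resource over HTTP(S) and return it as a string."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")

def parse_prices(csv_text, date_col="Date", price_col="Close"):
    """Turn CSV text into a list of (date, price) tuples, skipping blank prices."""
    prices = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = (row.get(price_col) or "").strip()
        if value:
            prices.append((row[date_col], float(value)))
    return prices

def fetch_many(tickers, url_template):
    """Pull price history for each ticker.

    url_template is a hypothetical pattern containing a {ticker} placeholder.
    """
    return {t: parse_prices(fetch_text(url_template.format(ticker=t)))
            for t in tickers}

# Offline check of the parsing step using made-up CSV text.
sample = "Date,Close\n2024-01-02,100.0\n2024-01-03,\n2024-01-04,101.5\n"
parsed = parse_prices(sample)
```

Keeping the download and parsing steps separate makes it easier to swap providers later, since only the parsing function needs to change.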
⏳ Expected Time To Complete
Approximately 1 – 2 hours, depending on your experience of working with large stock price data and programming knowledge.
Project I-2: Evaluate the Historic Performance Of Your Investment Portfolio
💼 Project Brief
Calculate portfolio return and portfolio risk of your investment portfolio.
☑️ What You’ll Need
You’ll need some sort of investment portfolio to successfully complete this project. This can be an actual investment portfolio you own, or one you intend to build from scratch.
Put differently, you’ll need a selection of stocks (and/or other securities from different asset classes if you’re feeling adventurous) and the historical price data for those stocks.
It’s best if you’ve worked on Project I-1 prior to starting this one. If you’ve been unable to work on Project I-1, but still want to work on this project, explore Kaggle stock market datasets here.
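Once you have returns for each holding, portfolio return is the weighted average of the individual returns, and portfolio risk is the square root of the weighted sum of all variances and covariances. The sketch below assumes you've already estimated mean returns and a covariance matrix from historical data; the numbers shown are made up.

```python
import math

def portfolio_return(weights, mean_returns):
    """Weighted average of the individual asset returns."""
    return sum(w * r for w, r in zip(weights, mean_returns))

def portfolio_risk(weights, cov):
    """Portfolio standard deviation: sqrt of sum_i sum_j w_i * w_j * cov[i][j]."""
    n = len(weights)
    variance = sum(weights[i] * weights[j] * cov[i][j]
                   for i in range(n) for j in range(n))
    return math.sqrt(variance)

# Made-up inputs for a two-stock portfolio.
weights = [0.6, 0.4]
mean_returns = [0.08, 0.12]
cov = [[0.04, 0.006],
       [0.006, 0.09]]

p_ret = portfolio_return(weights, mean_returns)
p_risk = portfolio_risk(weights, cov)
```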
⏳ Expected Time To Complete
Approximately 1 – 2 hours, depending on your experience of working with large stock price data and financial data analysis.
Project I-3: Formally Test The Validity of the Capital Asset Pricing Model (CAPM)
If you’re looking at working on intermediate-level finance data science projects, it’s almost guaranteed you’ve at least heard of the Capital Asset Pricing Model.
It’s probably one of the most controversial asset pricing models out there.
Its supporters passionately put the model on a pedestal, and its critics condemn it as something that’s utterly useless.
Who’s right? Find out for yourself!
💼 Project Brief
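Statistically test the CAPM by running an OLS regression (a supervised machine learning model) of your portfolio’s excess returns on the market’s excess returns. A statistically significant coefficient (beta) tells you that returns load on the market factor, but the sharper test is the intercept (alpha): the CAPM implies that it should not be statistically different from zero. If the model holds up in your data, its supporters have a point. If not, join the critics and pave the way for a better model.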
☑️ What You’ll Need
One way to test this would be to set:
- The historical returns of your portfolio as your dependent variable, and
- The historical returns on the market portfolio as your independent variable
Use an appropriate risk-free rate to calculate excess returns.
Data sources for the set of historical returns have already been highlighted earlier in Project I-1 and I-2. You can obtain information on the appropriate risk-free rate on Bloomberg or the FT, for example.
Finally, run your OLS regression on your statistical tool/package of choice.
Be sure to include an intercept term if the tool/package doesn’t already do this by default.
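For a single independent variable, the regression can be written out by hand. The sketch below is one possible implementation, assuming both series are already excess returns; the data is made up. It returns the intercept and slope together with their t-statistics, so you can inspect both.

```python
import math

def capm_regression(asset_excess, market_excess):
    """OLS of asset excess returns on market excess returns, with intercept."""
    n = len(asset_excess)
    x_bar = sum(market_excess) / n
    y_bar = sum(asset_excess) / n
    sxx = sum((x - x_bar) ** 2 for x in market_excess)
    sxy = sum((x - x_bar) * (y - y_bar)
              for x, y in zip(market_excess, asset_excess))
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    residuals = [y - (alpha + beta * x)
                 for x, y in zip(market_excess, asset_excess)]
    s2 = sum(e ** 2 for e in residuals) / (n - 2)  # residual variance
    se_beta = math.sqrt(s2 / sxx)
    se_alpha = math.sqrt(s2 * (1 / n + x_bar ** 2 / sxx))
    return {"alpha": alpha, "beta": beta,
            "t_alpha": alpha / se_alpha, "t_beta": beta / se_beta}

# Made-up excess returns: asset = 1.2 * market + small noise.
market_excess = [0.02, -0.01, 0.03, -0.02, 0.01, 0.00]
noise = [0.001, -0.001, -0.001, 0.001, 0.001, -0.001]
asset_excess = [1.2 * m + e for m, e in zip(market_excess, noise)]

result = capm_regression(asset_excess, market_excess)
```

With real data you'd want far more observations than this, and you'd compare each t-statistic against the critical value for your chosen significance level.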
⏳ Expected Time To Complete
Approximately 1.5 – 3 hours, depending on your experience of working with large stock price data, asset pricing models, and financial data analysis.
Project I-4: Test and Validate The Weak Form Of The Efficient Market Hypothesis
You’ve probably heard of the Efficient Market Hypothesis and how it comprises three different forms:
- Weak Form Efficiency
- Semi-Strong Form Efficiency, and
- Strong Form Efficiency
Now, there are broadly 2 groups of people in Finance:
- those that believe in the Efficient Market Hypothesis (EMH), and
- those that don’t believe in it.
Put beliefs and opinions aside, and test the weak form of the EMH on your own.
💼 Project Brief
Formally test the validity of the weak form of the Efficient Market Hypothesis (EMH).
☑️ What You’ll Need
You’ll need historical returns of stocks and the market portfolio to formally test the weak form of the EMH.
Remember, the weak form of the EMH is true as long as abnormal returns cannot be earned consistently by using historic price information.
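There are many ways to test this. One simple starting point, sketched below, is to check whether past returns help predict future returns by measuring the lag-1 autocorrelation of returns. The choice of test is our assumption for illustration (variance ratio tests, runs tests, and trading rule backtests are alternatives), and the z-statistic uses the large-sample approximation that the autocorrelation has a standard error of about 1/sqrt(n) when returns are serially uncorrelated.

```python
import math

def autocorrelation(returns, lag=1):
    """Sample autocorrelation of a return series at the given lag."""
    n = len(returns)
    mean = sum(returns) / n
    denom = sum((r - mean) ** 2 for r in returns)
    numer = sum((returns[t] - mean) * (returns[t - lag] - mean)
                for t in range(lag, n))
    return numer / denom

def autocorr_z_stat(returns, lag=1):
    """Approximate z-statistic: rho * sqrt(n) under no serial correlation."""
    return autocorrelation(returns, lag) * math.sqrt(len(returns))

# Made-up, deliberately predictable series: returns flip sign every period.
returns = [0.01, -0.01] * 10
rho = autocorrelation(returns)
z = autocorr_z_stat(returns)
```

An absolute z-statistic well above about 2 suggests returns are predictable from their own past, which is evidence against weak form efficiency for that series. Whether that predictability can be exploited after costs is a separate question.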
⏳ Expected Time To Complete
Approximately 1.5 – 3 hours, depending on your experience of working with large stock price data, asset pricing models, and financial data analysis.
Advanced Level Finance Data Science Projects (❗️Not For The Faint Hearted)
Project A-1: Optimize Portfolios
💼 Project Brief
Optimize your financial investment portfolios to:
- achieve a target expected return,
- minimize risk
- maximize risk-adjusted returns
☑️ What You’ll Need
To optimize your investment portfolio, you’ll need data on the historical returns of the individual securities that make up your portfolio.
Data sources for the set of historical returns have already been highlighted earlier in Project I-1 and I-2.
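To get a feel for the problem before reaching for a numerical optimizer, you could brute-force it. The sketch below is a simplified illustration that assumes long-only weights on a coarse grid and uses made-up expected returns and covariances; a proper solver would be the more usual choice for realistic portfolios.

```python
import math

def port_stats(weights, mean_returns, cov, risk_free=0.0):
    """Return expected return, volatility, and Sharpe ratio for the weights."""
    n = len(weights)
    ret = sum(w * r for w, r in zip(weights, mean_returns))
    var = sum(weights[i] * weights[j] * cov[i][j]
              for i in range(n) for j in range(n))
    vol = math.sqrt(var)
    return {"weights": weights, "ret": ret, "vol": vol,
            "sharpe": (ret - risk_free) / vol}

def weight_grid(n_assets, steps=20):
    """Yield long-only weight vectors on a grid that sum to 1."""
    def split(remaining, k):
        if k == 1:
            yield [remaining]
            return
        for i in range(remaining + 1):
            for rest in split(remaining - i, k - 1):
                yield [i] + rest
    for combo in split(steps, n_assets):
        yield [c / steps for c in combo]

def optimize(mean_returns, cov, risk_free=0.0, steps=20):
    """Grid search for the minimum-risk and maximum-Sharpe portfolios."""
    candidates = [port_stats(w, mean_returns, cov, risk_free)
                  for w in weight_grid(len(mean_returns), steps)]
    min_risk = min(candidates, key=lambda c: c["vol"])
    max_sharpe = max(candidates, key=lambda c: c["sharpe"])
    return min_risk, max_sharpe

# Made-up annual expected returns and covariance matrix for three assets.
mean_returns = [0.06, 0.10, 0.14]
cov = [[0.010, 0.002, 0.001],
       [0.002, 0.040, 0.010],
       [0.001, 0.010, 0.090]]

min_risk, max_sharpe = optimize(mean_returns, cov, risk_free=0.02)
```

A target expected return can be handled in the same framework by filtering the candidates to those that meet the target and then picking the one with the lowest volatility.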
Feeling lost? Don’t worry – we actually teach these investment analysis / financial data science techniques in our course on Investment Analysis & Portfolio Management (with Excel®) as well as in our Investment Analysis & Portfolio Management (with Python) course.
You can enroll in either course to learn how to optimize investment portfolios this way. Both courses are identical in all aspects other than the tools used to conduct investment analysis (i.e., Excel® vs Python).
⏳ Expected Time To Complete
Approximately 3 – 5 hours, depending on your experience of working with large stock price data, asset pricing models, financial data analytics, and programming knowledge.
Project A-2: Formally Test The Validity of the Fama French 3 Factor Model
It paved the way for factor models and factor investing and continues to be applied in academia and in industry. But does the Fama French 3 Factor Model actually work?
Find out for yourself!
💼 Project Brief
Test the validity of the Fama French 3 Factor Model using an appropriate multivariate OLS regression (a supervised machine learning model).
☑️ What You’ll Need
Data on individual factors is available directly from the Kenneth French Data Library. Alternatively, for bonus points, you can replicate the factors from scratch.
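The regression itself has three explanatory variables (the market, size, and value factors) plus an intercept. The sketch below solves the OLS normal equations by hand so that it runs without third-party packages; in practice you'd more likely use a statistics library such as statsmodels, which also reports standard errors and t-statistics. The factor values are made up, and the variable names are our own.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ols(y, factor_rows):
    """Return [alpha, b_mkt, b_smb, b_hml] from the normal equations X'X b = X'y."""
    X = [[1.0] + list(row) for row in factor_rows]  # prepend intercept column
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Made-up factor returns per period: [market excess, SMB, HML].
factors = [[0.020, 0.010, -0.010], [-0.010, 0.005, 0.020],
           [0.030, -0.020, 0.000], [-0.020, 0.010, 0.010],
           [0.010, 0.000, -0.020], [0.000, -0.010, 0.015],
           [0.015, 0.020, 0.005]]
# Excess returns built from known loadings so the output can be checked.
excess = [0.001 + 1.1 * f[0] + 0.4 * f[1] - 0.3 * f[2] for f in factors]

alpha, b_mkt, b_smb, b_hml = ols(excess, factors)
```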
⏳ Expected Time To Complete
Approximately 1 – 5 hours, depending on your experience of asset pricing models, multiple linear regression, financial data analytics, and programming knowledge. It’s also influenced by whether you work with the ‘ready-made’ data from the Kenneth French Data Library, or if you opt to compute the factor returns from scratch.
Project A-3: Identify Themes Within Annual Reports
Take a break from structured data and leverage the power of unstructured data, specifically text data.
💼 Project Brief
Identify topics/themes within annual reports by using an appropriate artificial intelligence / unsupervised machine learning model.
☑️ What You’ll Need
To successfully complete this project, you’ll need access to a reasonably large dataset of firm-level annual reports.
You can use 10-K reports sourced from the SEC’s Edgar Database if you’re working with US firms (or those listed on US stock exchanges).
For firms listed in other countries, you’ll likely need to collect the data yourself from the companies’ websites. Databases do exist, but they tend to be quite expensive.
You’ll also need reasonable experience with an artificial intelligence tool like LDA to identify themes in an unsupervised machine learning setting.
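Before a topic model can do anything useful, the raw text needs to be turned into numbers. The sketch below covers only that preprocessing step, not the topic model itself: it builds a simple document-term count matrix of the kind that LDA implementations (for example, those in scikit-learn or gensim) typically take as input. The stopword list and the sample sentences are made up for illustration.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; real projects use a much fuller one.
STOPWORDS = {"the", "and", "of", "to", "in", "a", "our", "we", "is", "for"}

def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords and very short words."""
    return [t for t in re.findall(r"[a-z]+", text.lower())
            if t not in STOPWORDS and len(t) > 2]

def document_term_matrix(documents):
    """Return (vocabulary, matrix) where matrix[d][v] counts term v in doc d."""
    counts = [Counter(tokenize(d)) for d in documents]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix

# Made-up snippets standing in for annual report text.
docs = ["Our revenue growth was driven by cloud services.",
        "Climate risk and regulation may affect revenue."]
vocab, matrix = document_term_matrix(docs)
```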
⏳ Expected Time To Complete
Approximately 1 – 5 days, depending on data availability, your programming knowledge, and desired level of rigor (e.g., whether you’re working with a small dataset, or whether you’re working with Big Data that requires time to collect and process).
Project A-4: Conduct an Event Study To Evaluate The Impact of A ‘Major Event’ On Financial Markets
💼 Project Brief
If you read the financial news, you’ll notice a plethora of ‘gurus’ describing how “this ONE major event caused markets to panic more than ever before”.
Take your pick of an event that might have had an impact on financial markets. Be that a presidential election in the US, a generational event like Brexit in the UK, a regulatory paradigm shift in India – whatever you fancy.
Next, run a formal event study to statistically test and validate the precise impact that the major event may or may not have had on a financial market of your choice.
☑️ What You’ll Need
For this project, you’ll need:
- 1 or more major events that plausibly had an impact on a financial market of your choice
- Historical price data of a large cross-section of securities for a period of time that includes, precedes, and succeeds the major event(s)
- Historical price data for an appropriate market portfolio for a period of time that includes, precedes, and succeeds the major event(s) (so that you can calculate abnormal returns)
- An appropriate risk-free rate of return (so that you can calculate excess returns)
You’ll also want to think about a variety of control variables if you’re serious about establishing causality vs mere correlation as part of this financial data analysis.
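The core mechanics of an event study can be sketched as follows. This illustration assumes a market model estimated on a pre-event window, works with raw rather than excess returns for brevity, and uses made-up data for a single security; a full study would repeat this across many securities and test the results for statistical significance.

```python
def market_model(stock, market):
    """Estimate alpha and beta from returns in the estimation window."""
    n = len(stock)
    x_bar = sum(market) / n
    y_bar = sum(stock) / n
    sxx = sum((x - x_bar) ** 2 for x in market)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(market, stock))
    beta = sxy / sxx
    return y_bar - beta * x_bar, beta

def abnormal_returns(stock_event, market_event, alpha, beta):
    """Actual return minus the return the market model would have predicted."""
    return [r - (alpha + beta * m) for r, m in zip(stock_event, market_event)]

# Made-up estimation window (before the event).
est_market = [0.010, -0.005, 0.007, -0.012, 0.004, 0.002]
est_stock = [0.001 + 1.5 * m for m in est_market]
alpha, beta = market_model(est_stock, est_market)

# Made-up event window, with a negative shock on days 2 and 3.
evt_market = [0.003, -0.020, 0.001]
shock = [0.0, -0.03, -0.01]
evt_stock = [0.001 + 1.5 * m + s for m, s in zip(evt_market, shock)]

ar = abnormal_returns(evt_stock, evt_market, alpha, beta)
car = sum(ar)  # cumulative abnormal return over the event window
```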
⏳ Expected Time To Complete
Approximately 1 – 2 weeks, depending on data availability, your programming knowledge, desired level of rigor, and whether you’re looking to establish causality vs just correlation.
Project A-5: Test and Validate An Investment Hypothesis / Thesis
💼 Project Brief
Start by coming up with an investment idea. Next, transform your investment idea into a testable hypothesis. Lastly, statistically test and validate the hypothesis to see if your investment idea generates alpha.
If it doesn’t generate alpha, repeat the steps above with a new investment idea.
Run out of investment ideas? Stick to investing in the stock market as a whole (via a low-cost index fund).
☑️ What You’ll Need
To successfully complete this project, you’ll need:
- 1 or more investment ideas that you can test in the data
- Historical price data of a large cross-section of securities for a reasonably long period of time
- Historical price data for an appropriate market portfolio (so that you can quantify alpha)
- An appropriate risk-free rate of return (so that you can calculate excess returns)
Not sure how to go about this? Take a look at our Data Driven Investing with Python | Financial Data Science Course. An Excel® version is also available in our course on Data Driven Investing with Excel | Financial Data Science. Both courses will teach you how to formally/statistically test and validate an investment hypothesis / thesis from scratch.
BONUS: Found an alpha-generating investment idea? Go ahead and create an algorithmic trading / investment strategy and backtest your idea, including rebalancing portfolio weights when and where appropriate.
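A bare-bones backtest with periodic rebalancing might look like the sketch below. It's a simplified illustration that assumes fixed target weights and ignores transaction costs, taxes, and slippage, all of which matter a great deal in practice. The returns are made up.

```python
def backtest(asset_returns, target_weights, rebalance_every=1):
    """Track portfolio value through time, starting from 1.0.

    asset_returns: one row per period, one return per asset.
    rebalance_every: reset to target weights after this many periods.
    """
    holdings = list(target_weights)  # value held in each asset
    values = [1.0]
    for t, row in enumerate(asset_returns, start=1):
        holdings = [h * (1 + r) for h, r in zip(holdings, row)]
        total = sum(holdings)
        values.append(total)
        if t % rebalance_every == 0:
            holdings = [total * w for w in target_weights]
    return values

# Made-up returns for two assets over two periods.
asset_returns = [[0.10, 0.00],
                 [0.00, 0.10]]

rebalanced = backtest(asset_returns, [0.5, 0.5], rebalance_every=1)
buy_and_hold = backtest(asset_returns, [0.5, 0.5], rebalance_every=10)
```

Comparing the rebalanced path with the buy-and-hold path shows how much of a strategy's performance comes from the rebalancing rule itself.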
⏳ Expected Time To Complete
Approximately 1 – 8 weeks, depending on data availability, your programming knowledge, desired level of rigor, and whether you’re able to find a strategy that generates alpha.
Summary and Next Steps
Did you complete all of the finance data science projects above? Hats off to you, well done! We’d love to hear about your findings, honestly. Feel free to reach out to us if you want to share what you found and learned and whether you went big with Big Data in your analysis.
If you’re looking to go even further, it’s worth exploring recent academic and practitioner articles and replicating their results on your own. Not only will this build your skills and knowledge further, it’ll also:
- Give you an immediate and authentic insight into the current research in Finance (both in academia and in the finance industry)
- Allow you to verify the findings reported in the articles you’ve read
- Likely give you food for thought for more areas for you to research and explore in applying financial data science
If you completed some (but not all) of the finance data science projects above, take the time to block out your calendar and work on the ones you haven’t worked on yet. They’re challenging for sure. But they’re also incredibly rewarding once you’ve conquered them.
Our projects may not be as easy or seamless as the projects available elsewhere on “the internet”, but that’s because we’ve focused on projects that are not only applicable in the real world, but that also genuinely build your skills in using financial data science technologies.
By working on these projects and exploring our related investment courses, you’ll genuinely gain a solid command of crucial concepts in financial data science that’ll stand you in good stead for the rest of your life.
Alright, that’s a wrap from us for now though.
Keep learning and loving Finance!