In this article, you’re going to get an introductory overview of Natural Language Processing (NLP) for Finance.
So let’s get into it.
What is Natural Language Processing (NLP)?
Firstly, what is Natural Language Processing / NLP?
Ultimately, it’s just a set of techniques which help us gain meaningful insights from text data.
Or for that matter, any other type of human language data; for instance, voice.
The idea is to use this set of techniques to gain insights, preferably actionable insights, from language data.
Or indeed, from unstructured documents / data in general.
And for the most part in Finance, at least today, when we think about human language data, we typically work with text data.
But it wasn’t always like this in finance.
NLP for Finance – A Brief History
Historically, academics and practitioners in finance have largely relied on numerical data for investment analysis.
And this ranges from something as simple as ratios to more advanced portfolio optimisation techniques.
But regardless of which area of finance you look at, be it investment analysis, financial modelling, financial analysis, or capital budgeting, people have for the most part worked with structured numerical data.
Related: Investment Analysis with Natural Language Processing (NLP) Course
This article features a concept that is covered extensively in our course on Investment Analysis with Natural Language Processing (NLP).
If you’re interested in leveraging the power of text data for investment analysis, you should definitely check out the course.
Text Data in Finance
Now this wasn’t because we didn’t have a lot of text data / unstructured data in finance. Far from it.
In fact, finance has so much text data that few fields can actually compete with that sort of volume.
Predominantly relying on numerical data instead of text data was largely because analysing these large volumes of text data was extremely time consuming and cumbersome.
Large sizes of unstructured content
To give you just a minuscule idea of the sheer scale of text data that’s available in finance…
Back in 2015, the Wall Street Journal reported that the average annual report (10-K) had about 42,000 words in 2013.
That was up from roughly 30,000 words in 2000.
To put this in perspective, consider the Sarbanes-Oxley Act of 2002, the really massive piece of legislation that came about as a result of Enron, WorldCom, and all the other corporate scandals of the dot-com era.
Well, that massive piece of legislation had approximately 32,000 words!
Annual reports, which firms have to publish every single year, had about 42,000 words on average back in 2013.
And they aren’t really getting any shorter today.
Importantly, of course, if you’re thinking 42,000 words is not all that much, remember that this is just an average.
So you’ll find plenty of annual reports that have hundreds of thousands of words.
And of course you will find some annual reports that have tens of thousands of words.
But the point is that this is for a single annual report.
And firms listed on the financial market / stock market need to publish these annual reports every single year!
So just take a single firm and say you’re looking at 10 years’ worth of data, with an average of 42,000 words per report.
Well, you now have 420,000 words to analyse.
So good luck if you’re doing that manually!
I wouldn’t be keen, and quite frankly, very few people would be.
And this is why until fairly recently, these really massive volumes of text data in finance, which have potentially so much value in them, were just left untouched.
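To make the back-of-the-envelope arithmetic above concrete, here is a minimal Python sketch. The word counts come from the figures quoted in this article; the reading speed of 250 words per minute is purely an assumption for illustration, and the function names are made up for this example.

```python
# Back-of-the-envelope estimate of the manual reading workload.
AVG_WORDS_PER_REPORT = 42_000  # average 10-K length quoted above (2013)
YEARS_OF_DATA = 10             # one annual report per year, single firm
WORDS_PER_MINUTE = 250         # ASSUMPTION: illustrative reading speed


def total_words(avg_words: int, n_reports: int) -> int:
    """Total words across all reports, assuming each is of average length."""
    return avg_words * n_reports


def reading_hours(words: int, words_per_minute: int) -> float:
    """Hours needed to read `words` at a constant reading speed."""
    return words / words_per_minute / 60


words = total_words(AVG_WORDS_PER_REPORT, YEARS_OF_DATA)
hours = reading_hours(words, WORDS_PER_MINUTE)
print(f"{words:,} words, roughly {hours:.0f} hours of reading")
# prints: 420,000 words, roughly 28 hours of reading
```

And that is just reading time for one firm, before any actual analysis.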
Technical Jargon
Of course, the size isn’t the only factor that meant people weren’t analysing these reports.
For instance, the CFO of GE, Jeffrey Bornstein, was taken aback by the sheer size of their own annual report!
Their annual report was about 110,000 words long. And he himself suggested that not a single retail investor on earth could get through it, let alone understand it.
As for that latter part, understanding these annual reports: that’s hard because annual reports tend to have a lot of technical jargon that not a lot of people actually understand.
And this is not limited to just retail investors.
Although mutual fund managers and hedge fund managers and pension fund managers may not openly admit it…
Not all of them necessarily understand what all these annual reports are on about.
Because sometimes they just have terms that one might not have come across.
Why use NLP for Finance?
The point is, academics and practitioners didn’t really work with text data in finance, despite there being so much of it. This was partly because of the technical jargon involved, but largely because of the sheer size of this alternative data.
That meant analysing all of this text data manually was simply not feasible.
Fortunately, though, thanks to major advancements in NLP technology, and in computational linguistics in particular, it’s now significantly easier to analyse insanely large volumes of text data (the so-called “Big Data”).
But it’s about more than just analysing this text data. It’s ultimately about gaining actionable insights or value from that text data.
Current Applications of NLP for Finance
And if we think about the current applications of NLP for Finance… they’re fairly extensive.
They’re certainly increasing.
And I think, with time, they’re only going to get bigger and better.
Specifically though, while the applications of NLP for Finance are fairly wide in their scope, we think we can broadly categorise them into three different types.
NLP Applications in Context
The first of which is Context.
This is about using NLP techniques to try and gain context from text data in finance.
For example, it’s a case of using Topic Modelling algorithms to try and establish the context of financial news articles or firm announcements, business descriptions, annual reports, and a whole host of other “Big Data” or “Big Text Data” in Finance.
It’s a case of using these machine learning / artificial intelligence algorithms in unsupervised settings to try and establish the themes or topics that are being discussed or talked about in these various different kinds of text data.
So that’s context.
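To give a flavour of what unsupervised topic modelling looks like, here is a minimal pure-Python sketch of one such algorithm, Latent Dirichlet Allocation (LDA), fitted with collapsed Gibbs sampling. Everything specific here is an assumption for illustration: the six "headlines" are made up, the hyperparameters (`alpha`, `beta`, two topics, 200 iterations) are arbitrary, and the function names are hypothetical. In practice you would more likely reach for a library such as scikit-learn or gensim and a far larger corpus.

```python
import random
from collections import Counter

# ASSUMPTION: made-up headlines, already lower-cased with stop words removed.
DOCS = [
    "bank raises interest rates loans",
    "central bank interest rates inflation",
    "oil prices rise supply cut",
    "oil supply demand prices energy",
    "bank loans credit interest",
    "energy firms oil output prices",
]


def fit_lda(docs, n_topics=2, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    Returns (topic_word counts, doc_topic counts, vocabulary).
    """
    rng = random.Random(seed)
    tokens = [d.split() for d in docs]
    vocab = sorted({w for doc in tokens for w in doc})
    n_vocab = len(vocab)

    # Count tables that are kept in sync with the topic assignments.
    doc_topic = [[0] * n_topics for _ in tokens]
    topic_word = [Counter() for _ in range(n_topics)]
    topic_total = [0] * n_topics

    # Start with a random topic for every word occurrence.
    assign = []
    for d, doc in enumerate(tokens):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(tokens):
            for i, w in enumerate(doc):
                # Remove the word's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + n_vocab * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return topic_word, doc_topic, vocab


def top_words(topic_word, n=4):
    """Most frequent words per topic, ignoring words with zero count."""
    return [
        [w for w, c in counts.most_common(n) if c > 0]
        for counts in topic_word
    ]


def doc_topic_shares(doc_topic, alpha=0.1):
    """Smoothed share of each topic within each document (rows sum to 1)."""
    shares = []
    for counts in doc_topic:
        total = sum(counts) + alpha * len(counts)
        shares.append([(c + alpha) / total for c in counts])
    return shares


topic_word, doc_topic, vocab = fit_lda(DOCS)
for k, words in enumerate(top_words(topic_word)):
    print(f"Topic {k}: {', '.join(words)}")
```

Note that nobody tells the algorithm what the topics are. It only sees which words tend to appear together, and it’s then up to you to look at the top words and decide what theme, if any, each topic represents.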
NLP Applications in Compliance
Then there’s Regulatory Compliance, which focuses on things like detecting insider trading or detecting and preventing fraud within the financial services / financial industry in particular.
And it’s doing so using unique sets of data; for instance, emails or indeed chat transcripts inside firms.
Generally speaking, NLP applications in regulatory compliance require internal unstructured content rather than external content such as earnings call transcripts.
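As a very rough illustration of the idea, here is a minimal Python sketch that flags internal messages containing phrases from a watch-list. The watch-list, the messages, and the function name are all hypothetical, and real surveillance systems are considerably more sophisticated than simple phrase matching; this is only meant to show the shape of the problem.

```python
import re

# ASSUMPTION: a hypothetical watch-list, made up for this example.
WATCH_PHRASES = [
    "before the announcement",
    "keep this between us",
    "delete this",
]

# ASSUMPTION: made-up internal messages.
MESSAGES = [
    "Lunch at noon tomorrow?",
    "Buy the shares before the announcement and keep this between us.",
    "Please DELETE THIS after reading.",
]


def flag_message(text, phrases=WATCH_PHRASES):
    """Return the watch-list phrases found in `text` (case-insensitive)."""
    hits = []
    for phrase in phrases:
        # \b stops a phrase from matching inside a longer word.
        pattern = r"\b" + re.escape(phrase) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            hits.append(phrase)
    return hits


# Map message index -> matched phrases, keeping only flagged messages.
flagged = {}
for i, message in enumerate(MESSAGES):
    hits = flag_message(message)
    if hits:
        flagged[i] = hits

print(flagged)
```

Here the second and third messages get flagged for a human to review, while the first one passes through.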
NLP Applications in Quantitative Analysis
And lastly, there’s the case of NLP application in Quantitative Analysis.
For instance, one major NLP application involves creating trading strategies using “sentiment analysis”.
This involves firstly estimating the sentiment that firms may display, using unstructured data like annual reports, earnings calls transcripts, social media posts, etc.
And then using that sentiment to create trading strategies (often dubbed sentiment investing strategies).
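The two steps above can be sketched in a few lines of Python. This is a toy, dictionary-based version under loud assumptions: the positive and negative word lists are made up (in practice you would use a finance-specific dictionary or a trained model), the tickers and text snippets are hypothetical, and the "strategy" is simply going long the highest-scoring firm and short the lowest-scoring one.

```python
# ASSUMPTION: toy word lists, made up for this example.
POSITIVE = {"growth", "strong", "improved", "profit", "record"}
NEGATIVE = {"loss", "decline", "weak", "impairment", "litigation"}

# ASSUMPTION: hypothetical tickers and made-up report snippets.
FILINGS = {
    "AAA": "record profit and strong growth this year",
    "BBB": "revenue decline and impairment loss amid weak demand",
    "CCC": "improved margins despite litigation",
}


def sentiment_score(text):
    """(positive words - negative words) / total words; 0.0 if empty."""
    words = text.lower().split()
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)


def long_short_signals(scores):
    """+1 (long) for the top-scoring firm, -1 (short) for the bottom, else 0."""
    ranked = sorted(scores, key=scores.get)  # lowest score first
    signals = {firm: 0 for firm in scores}
    if len(ranked) >= 2:
        signals[ranked[-1]] = 1
        signals[ranked[0]] = -1
    return signals


scores = {firm: sentiment_score(text) for firm, text in FILINGS.items()}
signals = long_short_signals(scores)
print(signals)
# prints: {'AAA': 1, 'BBB': -1, 'CCC': 0}
```

Whether signals like these actually make money is an empirical question, of course; the sketch only shows how text gets turned into a number that a strategy can act on.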
Your biggest takeaway from this article should be that Natural Language Processing (NLP) allows us to really leverage the power of text data and work on interesting problems in Finance.
Do check out our sister article on NLP applications in Finance for a more in-depth view of applications in context, compliance, and quantitative analysis.
Related Course: Investment Analysis with Natural Language Processing (NLP)
Do you want to build a rigorous investment analysis system that leverages the power of text data with Python?