#18. EQUITY QUANT: From Consumer Complaints to Investment Alpha
Read Time: 8 min
This is EQUITY QUANT, a new experimental column where we add explicit and tangible value to equity investors. Its geographic remit is wider than that of our staple columns on China (i.e. LONG VIEW, THE BRIEF), and may cover both emerging (incl. ex-China) and developed markets.
Plotting Portfolio Performance
Can publicly available data on consumer complaints be reworked to generate discernible alpha in investment returns? It turns out that it can. Here are the portfolio results.
Specifically, we use the Consumer Financial Protection Bureau’s (CFPB) publicly available Complaints Database. We tie the data to a universe of 25 publicly traded, consumer-facing US financial institutions. We then extract features from the data and derive variables using textual analysis of the consumer complaint narratives.
The derived variable is a year-on-year similarity / change score for the consumer complaint narratives. The main idea is that if the complaints a company receives change from one year to the next, that change carries information. In this case, it turns out that the companies whose complaints change the most (i.e. changers) outperform non-changers. Inspecting the differences visually, Portfolio Q1 performed worst and Q5 second best. The 130/30 long-short portfolio performed best.
*Not investment advice. Do your own research.
**change_score = 1 - similarity_score
1. Data Sourcing
We use the Consumer Financial Protection Bureau’s (CFPB) Complaints Database. It records complaints made about particular companies’ financial products (e.g. credit scoring, credit cards, loans). The database updates daily.
2. Feature Selection
We’re mostly interested in the “Consumer complaint narrative” column, which contains unstructured text. While the database goes back to 2011, the consumer narratives are only available from 2015, giving us 7 years of data (Jan 2015 - Dec 2021) with a total of 841,219 non-empty entries (consumers elect whether to disclose their narrative).
Since not all companies that receive complaints are publicly traded, for the purpose of this exercise we’ll just take the top 25 and tie them to tickers later in the workflow.
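As a rough sketch of the filtering described above, the snippet below keeps only non-empty narratives inside the Jan 2015 - Dec 2021 window and returns the companies with the most of them. Ranking by number of narratives is our assumption about what “top 25” means here. The column names and ISO date format are assumptions about the CFPB CSV export, and the rows are toy placeholders rather than real complaints.

```python
from collections import Counter

# Column names as we understand the CFPB CSV export; treat them as assumptions.
NARRATIVE = "Consumer complaint narrative"
COMPANY = "Company"
DATE = "Date received"  # assumed ISO format, e.g. "2021-03-15"


def top_companies(rows, n=25, start="2015-01-01", end="2021-12-31"):
    """Return the n companies with the most non-empty narratives in the window."""
    counts = Counter()
    for row in rows:
        text = (row.get(NARRATIVE) or "").strip()
        date = row.get(DATE) or ""
        # ISO dates compare correctly as plain strings.
        if text and start <= date <= end:
            counts[row[COMPANY]] += 1
    return [company for company, _ in counts.most_common(n)]


# Toy rows standing in for csv.DictReader output; not real complaints.
rows = [
    {COMPANY: "Bank A", DATE: "2020-05-01", NARRATIVE: "Unauthorised charge on my card"},
    {COMPANY: "Bank A", DATE: "2021-02-11", NARRATIVE: "Account closed without notice"},
    {COMPANY: "Bank B", DATE: "2021-07-19", NARRATIVE: "Loan payment misapplied"},
    {COMPANY: "Bank B", DATE: "2014-07-19", NARRATIVE: "Filed before the narrative window"},
    {COMPANY: "Bank C", DATE: "2021-07-19", NARRATIVE: ""},
]
print(top_companies(rows, n=2))
```

With the real file, `rows` would come from `csv.DictReader` over the downloaded export, and the resulting company names would still need to be matched to tickers by hand.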
3. Textual Analysis
Now we run the textual analysis:
Convert dataframe to corpus format
Make tokens out of the words
Remove stop words (e.g. “a”, “the”)
Stem the words (e.g. so we count “fraudulent” and “fraud” as the same, since the stem of both would be “fraud”)
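The steps above can be sketched in a few lines of plain Python. This is a toy illustration only: the stop-word list is deliberately tiny, and the suffix stripper is a crude stand-in for a proper stemmer, with a suffix list we chose so that “fraudulent” reduces to “fraud” as in the example. All function names here are our own.

```python
import re

# Tiny illustrative stop-word list; a real run would use a fuller one.
STOP_WORDS = {"a", "an", "the", "and", "of", "to", "is", "was", "my", "on", "in"}

# Crude suffix list, checked in order. A toy stand-in for a proper stemmer.
SUFFIXES = ("ulently", "ulent", "ing", "ed", "s")


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Tokenize, drop stop words, then stem what remains."""
    return [stem(tok) for tok in tokenize(text) if tok not in STOP_WORDS]


print(preprocess("The fraudulent charges and the fraud on my account"))
```

In practice a library stemmer and a full stop-word list would replace both toy pieces; the order of operations is the part that matters.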
Just for demonstration purposes, after all this data wrangling we can run a word cloud visualisation. In this case we’re comparing across companies for 2021.
And here’s a word cloud across time for the same company, Wells Fargo.
For building intuition, we can also run a lexical dispersion plot for particular keywords that may be relevant in the financial context. In this case we plot “fraud”, “identity”, “violat*” for Wells Fargo by year.
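The numbers behind a lexical dispersion plot are simply the positions at which each keyword occurs in the token stream. Here is a minimal sketch, assuming lower-cased tokens and shell-style wildcards such as “violat*”. The function name and the toy sentence are ours, not taken from the original workflow.

```python
from fnmatch import fnmatchcase


def dispersion(tokens, patterns):
    """Map each pattern to the relative positions (0 to 1) of matching tokens."""
    last = max(len(tokens) - 1, 1)  # avoid dividing by zero on very short inputs
    return {
        pattern: [i / last for i, tok in enumerate(tokens) if fnmatchcase(tok, pattern)]
        for pattern in patterns
    }


# Toy token stream for one hypothetical Company-Year; not real complaint text.
tokens = "they violated my rights and identity fraud followed the violation".split()
positions = dispersion(tokens, ["fraud", "identity", "violat*"])
print(positions)
```

Each list of positions can then be drawn as tick marks along a horizontal axis, one row per keyword and one panel per year.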
4. Deriving Variables
Here’s the key part. We want to calculate how similar each Company-Year’s worth of complaints is to the previous year’s for the same company. For example, we compare Wells Fargo complaints from 2021 with Wells Fargo complaints from 2020, which gives one similarity score per year.
Here’s a visualisation of the similarity score (i.e. derived variable). A score of 1 means that the nature of this year’s complaints is identical to the previous year’s. The closer the score is to zero, the more one Company-Year differs from the other. By definition then: 1 - similarity_score = change_score.
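The post does not say which similarity measure sits behind the score, so the sketch below assumes cosine similarity between bag-of-words counts, which has the same 0 to 1 range described above. Treat the measure, the function names and the toy tokens as our assumptions rather than the method actually used.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two bags of words (1 = same mix, 0 = no overlap)."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document has nothing in common with anything
    return dot / (norm_a * norm_b)


def change_score(tokens_prev, tokens_curr):
    """change_score = 1 - similarity_score, as defined in the post."""
    return 1.0 - cosine_similarity(tokens_prev, tokens_curr)


# Toy stemmed tokens for two consecutive years of one hypothetical company.
year_2020 = ["fraud", "card", "charge", "fraud"]
year_2021 = ["fraud", "loan", "charge", "fee"]
print(round(cosine_similarity(year_2020, year_2021), 3))
print(round(change_score(year_2020, year_2021), 3))
```

Running this once per company for each pair of consecutive years would give one score per Company-Year, which is the shape the ranking step needs.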
Going further, we can plot company migration through time and quantiles like so:
5. Quantile Segmentation
To prepare the similarity score for testing, we need to:
Rank companies by their similarity score by year
Assign these to quintiles
Create portfolio weights
Run calculations and plot the portfolios
Here’s a visualisation of the weights. Since we have 25 stocks in our universe, a quintile will be 5 stocks. Within each quintile, each stock will take a 20% weighting.
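To make the ranking and weighting concrete, here is a minimal sketch for a single year. It assumes names are sorted by score in ascending order, so that quintile 1 holds the lowest scores; the post does not spell out the sort direction. The tickers and scores are placeholders.

```python
def assign_quintiles(scores, n_groups=5):
    """Rank names by score (ascending) and split them into equal-sized groups.

    Returns {name: quintile}, with quintile 1 holding the lowest scores.
    """
    ranked = sorted(scores, key=scores.get)
    size = max(len(ranked) // n_groups, 1)  # 25 names -> 5 per quintile
    return {name: min(i // size, n_groups - 1) + 1 for i, name in enumerate(ranked)}


def equal_weights(quintiles, target):
    """Equal weights summing to 1.0 for the names in the target quintile."""
    members = [name for name, q in quintiles.items() if q == target]
    return {name: 1.0 / len(members) for name in members}


# Hypothetical scores for 25 placeholder tickers in one year.
scores = {f"T{i:02d}": i / 100 for i in range(25)}
quintiles = assign_quintiles(scores)
print(equal_weights(quintiles, target=5))
```

With 25 names, each quintile holds five stocks at a weight of 0.20 each, matching the 20% weighting described above.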
And here’s a visualisation of the Q5-Q1 portfolio weights. This is a long-short portfolio: long the stocks from the favoured quintile and short the stocks from the disfavoured quintile. The favoured quintile is weighted at 130% and the disfavoured at 30% to give a 130/30 allocation. Notice that the long weights sum to more than 1.00 (1.30), while the short weights are negative and sum to -0.30.
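The arithmetic of a 130/30 book can be sketched as follows, assuming the conventional reading in which the long side sums to +1.30 and the short side to -0.30, each spread equally across its five names. The exact scaling behind the chart may differ, and the tickers are placeholders.

```python
def long_short_weights(long_names, short_names, long_gross=1.30, short_gross=0.30):
    """Equal-weight a long book summing to +long_gross and a short book summing to -short_gross."""
    weights = {name: long_gross / len(long_names) for name in long_names}
    weights.update({name: -short_gross / len(short_names) for name in short_names})
    return weights


# Placeholder tickers: five favoured-quintile names and five disfavoured-quintile names.
longs = ["L1", "L2", "L3", "L4", "L5"]
shorts = ["S1", "S2", "S3", "S4", "S5"]
w = long_short_weights(longs, shorts)
print(round(w["L1"], 2), round(w["S1"], 2))
print(round(sum(w.values()), 2))
```

Under these assumptions each long name carries 1.30 / 5 = 0.26 and each short name -0.30 / 5 = -0.06, so the net exposure is 1.00.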
6. Plotting Portfolio Performance
Here are the resulting portfolios for each quintile. Inspecting the differences visually, it’s clear that Portfolio Q1 performed worst and Q5 second best. The 130/30 long-short portfolio is best.
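For readers who want to reproduce curves like these, the calculation behind each line is a weighted sum of stock returns per period, compounded through time. The sketch below uses made-up weights and returns purely to show the mechanics; none of the numbers come from the portfolios in this post, and the function names are ours.

```python
def portfolio_return(weights, returns):
    """Weighted sum of single-period asset returns."""
    return sum(w * returns[name] for name, w in weights.items())


def cumulative_growth(period_returns, start=1.0):
    """Compound a series of period returns into a growth-of-1 path."""
    path, value = [], start
    for r in period_returns:
        value *= 1.0 + r
        path.append(value)
    return path


# Made-up weights and returns for illustration only; not results from the post.
weights = {"AAA": 0.5, "BBB": 0.5}
monthly = [{"AAA": 0.02, "BBB": 0.00}, {"AAA": -0.01, "BBB": 0.03}]
series = [portfolio_return(weights, r) for r in monthly]
print(cumulative_growth(series))
```

The same two functions work for the long-short book, since negative weights simply flip the sign of a stock’s contribution.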
For our Brazilian subscribers, here’s a company complaints database for Brazil. The analysis done in this post can be replicated regardless of language. Instead of English stop-words, use Portuguese ones. Same goes for other markets, Chinese and Japanese included.
For a single company case study using the similarity / change score, have a look at our previous post on AAPL here.