Leveraging Transactional Data for Micro and Small Enterprise Lending

Highlights

  • This note highlights two case studies that provide evidence of the value of transactional data for credit scoring for different types of micro and small enterprises.
  • The financial service providers are Indian fintechs Fundfina, which offers credit to small shops, and KarmaLife, which provides credit for platform workers.
  • The evidence from the credit scoring models developed and evaluated for this research supports the following messages: (1) Transactional data can have similar predictive power in credit scoring to credit history. (2) Combining transactional data with credit history can result in better predictions than either of these data sets by themselves. (3) These results hold under different circumstances, including for both different types of MSEs and different types of credit histories.

Introduction

The use of transactional data for credit underwriting can play a part in closing the estimated US$4.9 trillion global financing gap for micro and small enterprises (MSEs). It can help expand access to MSEs without formal lending history, and it can improve product fit by more accurately estimating repayment capacity. This paper contributes to building the evidence base for this by measuring the predictive power of transactional data in credit scoring models.

Transactional data is the record generated by a person’s or firm’s operations. In this paper, we focus on the transactions of MSEs, covering both micro and small businesses and micro-entrepreneurs such as platform workers. Common types of transactional data include financial records of sales, expenses, orders and invoices. Transactional data can also include a variety of other information such as activity records, inventory records, travel records or customer ratings, depending on the activity of the MSE. Much like the bill and loan repayment activity recorded by credit bureaus, transactional data trails provide an objective record of a potential borrower’s financial behavior that enables reliable estimates of the ability to repay a loan.

This paper examines the data and experience of two fintechs in India that use different sets of transactional data for credit scoring. Fundfina uses transactional data from enterprise partners to offer credit to small businesses, primarily fast-moving consumer goods (FMCG) shops and financial services agents. KarmaLife uses transactional data from partner platforms to offer loans to drivers and food delivery platform workers. While it is not novel to recognize that transactional data can be of value in credit assessments, the details of how credit providers use it are generally kept confidential. Direct comparisons between how transactional and credit history data contribute to the predictive power of credit scoring models for a given borrower population are rarely shared.

The data analysis and results presented in this paper are not based on the fintechs’ proprietary machine learning models. Instead, we use their data sets and consistently apply the scorecard development methodology most widely used in the credit scoring industry (logistic regression models) to develop “traditional” credit scorecards. With this methodology, the relationships between each model characteristic and loan repayment are easy to understand, compare and present as “scorecard points”. Such transparency facilitates scorecard and credit risk management, and it helps to compare the contribution of different data sets to a scorecard’s predictive power.
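To illustrate how a logistic regression translates into “scorecard points”, here is a minimal sketch. It is not the scorecard built for this paper: the characteristics, bins, coefficients and intercept are all hypothetical, and the scaling convention (600 points at 30:1 odds, 20 points to double the odds) is an assumed industry-style choice that the paper does not specify.

```python
import math

# Assumed scaling convention (hypothetical, not from the paper):
# 600 points at 30:1 good:bad odds; every 20 extra points doubles the odds.
BASE_SCORE, BASE_ODDS, PDO = 600, 30.0, 20
FACTOR = PDO / math.log(2)
OFFSET = BASE_SCORE - FACTOR * math.log(BASE_ODDS)

# Hypothetical logistic regression output, expressed as log-odds of repayment.
# Each characteristic is binned; one bin per characteristic is the reference (0.0).
INTERCEPT = 2.0
COEFFICIENTS = {
    "active_days_last_30": {"0-9": -0.8, "10-19": 0.0, "20+": 0.6},
    "prior_loans_repaid": {"none": -0.3, "1-2": 0.2, "3+": 0.7},
}


def scorecard_points():
    """Translate the intercept and each bin coefficient into integer points."""
    points = {"base": round(OFFSET + FACTOR * INTERCEPT)}
    for characteristic, bins in COEFFICIENTS.items():
        for bin_label, coef in bins.items():
            points[(characteristic, bin_label)] = round(FACTOR * coef)
    return points


def score(applicant, points):
    """Sum the base points and the points for each of the applicant's bins."""
    return points["base"] + sum(points[(c, b)] for c, b in applicant.items())


def repayment_probability(applicant):
    """Probability implied by the underlying (unrounded) logistic model."""
    log_odds = INTERCEPT + sum(COEFFICIENTS[c][b] for c, b in applicant.items())
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    pts = scorecard_points()
    applicant = {"active_days_last_30": "20+", "prior_loans_repaid": "3+"}
    print(score(applicant, pts), round(repayment_probability(applicant), 3))
```

Because every bin carries its own points, a reader can see directly how much each characteristic adds to or subtracts from a borrower’s score, which is the transparency the methodology is valued for.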

Fundfina

Fundfina is a fintech in India that partners with fast-moving consumer goods (FMCG) suppliers and agent networks to access their transactional data and provide loans to the MSE retailers they work with. For the analysis, we used a representative sample of over 5,000 loans issued by Fundfina to MSEs under different partnerships. 

Fundfina estimates that about 80% of their customers do not have a formal credit history. In CGAP’s customer research, conducted with 852 of their customers, 62% of respondents said they did not previously have access to a loan or credit as offered by Fundfina. Given that the majority of customers were new to formal credit, Fundfina did not collect credit bureau data for these clients during its underwriting process. Our analysis therefore uses customer loan repayment history with Fundfina as a measure of “credit history”. Sixty percent of the loans in our sample were issued to customers with Fundfina credit history, or “repeat customers”. 

Table 1 shows the comparative predictive power of three logistic regression credit scoring models built from transactional and/or credit history characteristics. Credit scoring models are best judged by the accuracy of their out-of-sample predictions. For ease of presentation, we compare the models we build using the Area Under the Curve (AUC), a popular measure of a model’s overall predictive power. AUC ranges from 0.50 (random) to 1 (perfect prediction), where a higher number indicates better prediction. Although the AUC statistic is not directly comparable for models built for different borrower populations, credit scorecards with an AUC of around 0.70 or higher are likely to be considered useful in credit decisioning. A higher AUC will usually mean that, for a given strategy, a lender can approve more total borrowers for a particular risk appetite or acceptable delinquency rate.
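As a sketch of what the AUC statistic measures, the function below computes it as the share of (repaid, defaulted) loan pairs in which the repaid loan received the higher score, counting ties as one half. The eight held-out loans and their scores are hypothetical and serve only to show the calculation.

```python
def auc(scores, labels):
    """Probability that a randomly chosen repaid loan (label 1) scores higher
    than a randomly chosen defaulted loan (label 0); ties count as one half."""
    goods = [s for s, y in zip(scores, labels) if y == 1]
    bads = [s for s, y in zip(scores, labels) if y == 0]
    if not goods or not bads:
        raise ValueError("need at least one repaid and one defaulted loan")
    wins = 0.0
    for g in goods:
        for b in bads:
            if g > b:
                wins += 1.0
            elif g == b:
                wins += 0.5
    return wins / (len(goods) * len(bads))


# Hypothetical held-out loans: a higher score means lower predicted risk.
holdout_scores = [720, 680, 650, 640, 610, 590, 560, 540]
holdout_repaid = [1, 1, 1, 0, 1, 0, 1, 0]
print(round(auc(holdout_scores, holdout_repaid), 3))  # 12 of 15 pairs ranked correctly -> 0.8
```

A score that is the same for every loan ranks no pair correctly or incorrectly, so it yields 0.5, which is why 0.50 corresponds to random prediction.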

table showing comparative prediction by data types used in models

As Table 1 shows, a model built using transactional data alone has predictive power comparable to a model built solely on credit history data, which is generally considered the best single type of data for predicting future loan repayment. Combining both data sources yields better predictions than either source on its own. Table 1 presents model results for two samples: the full sample, where 60% of clients have credit history; and a smaller sample that includes only the clients with credit history. In the repeat customer sample, where both data sources are available for every borrower, the predictive power of credit history and of transactional data is the same when each is used on its own, but a model that combines both types of data is stronger. In the full sample, transactional data performs better than credit history data because some of the borrowers had no credit history. This shows that transactional data can be an effective tool for financial inclusion, allowing providers to underwrite those who are new to credit.
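The mechanics of this kind of comparison can be sketched as follows: fit one logistic regression per feature set on a training sample, then compare out-of-sample AUC on a held-out sample. Everything here is an assumption made for illustration. The borrowers are simulated, with one transactional signal and one credit-history signal that are equally informative by construction, and the model is fitted with plain gradient descent. The output therefore does not reproduce Table 1 or use any Fundfina data.

```python
import math
import random


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, x):
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))


def fit_logistic(rows, labels, epochs=200, lr=0.5):
    """Batch gradient descent on log-loss; returns [intercept, w1, ...]."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = predict(weights, x) - y
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


def auc(scores, labels):
    """Share of (repaid, defaulted) pairs ranked correctly; ties count half."""
    goods = [s for s, y in zip(scores, labels) if y == 1]
    bads = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if g > b else 0.5 if g == b else 0.0 for g in goods for b in bads)
    return wins / (len(goods) * len(bads))


def simulate(n, rng):
    """Simulated borrowers: [transactional signal, credit-history signal]."""
    rows, labels = [], []
    for _ in range(n):
        transactional, history = rng.gauss(0, 1), rng.gauss(0, 1)
        p_repay = sigmoid(1.0 + 0.8 * transactional + 0.8 * history)
        rows.append([transactional, history])
        labels.append(1 if rng.random() < p_repay else 0)
    return rows, labels


def compare_feature_sets(seed=0):
    rng = random.Random(seed)
    train_x, train_y = simulate(600, rng)
    test_x, test_y = simulate(400, rng)  # held out: never used for fitting
    feature_sets = {"transactional": [0], "credit_history": [1], "combined": [0, 1]}
    results = {}
    for name, cols in feature_sets.items():
        train_subset = [[r[c] for c in cols] for r in train_x]
        test_subset = [[r[c] for c in cols] for r in test_x]
        weights = fit_logistic(train_subset, train_y)
        scores = [predict(weights, x) for x in test_subset]
        results[name] = auc(scores, test_y)
    return results


if __name__ == "__main__":
    for name, value in compare_feature_sets().items():
        print(f"{name}: out-of-sample AUC = {value:.3f}")
```

Holding the methodology fixed and varying only the input columns is what makes the AUC figures attributable to the data sets rather than to modelling choices.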

Fundfina also collects other customer data that we grouped into categories: 

  • Demographic: borrower personal characteristics such as marital status, degree type, job type and home ownership.
  • Enterprise: specific enterprise characteristics such as the main line of business and income stability.
  • Enterprise partner: characteristics of the enterprise partner with which the borrower works, such as quality of customer service.

table showing predictive power of transactional scorecard across enterprise partners

To compare the relative predictive power of these additional types of data, we used the full data set of new and repeat borrowers. Table 2 shows analogous results for models built with demographic, enterprise and enterprise partner characteristics.

As shown in Table 2, transactional data outperforms all of these other tested data types. Combining any or all of them with the transactional data only minimally improves the predictive power when compared to transactional data alone. 

Finally, the predictive power and relationships of transactional characteristics to loan repayment were stable across Fundfina’s four main enterprise partners (Table 3). The fact that transactional patterns are predictive of loan repayment irrespective of the type of business that generates the transaction history suggests that the value of transactional data will be similar across different types of businesses.

Customer Impact

In order to learn more about what transactional-based lending can mean for customers, CGAP conducted research with a sample of 852 Fundfina customers. The research showed some promising results for the impact on MSEs.

The majority of customers were new to credit. Sixty-two percent of respondents did not have access to a similar form of credit before receiving loans from Fundfina. Most customers felt they did not have easy access to other good options for borrowing. When asked if they could easily find a good alternative to Fundfina’s loans, 73% of customers responded “No”, 9% “Maybe” and only 18% “Yes.”

Customers reported that loans had a positive impact on their businesses:

  • Ninety percent of customers described an increase in money earned because of the loan, with 21% citing significant improvements.
  • Eighty-nine percent of customers also reported that their business had grown as a result of the loan.
  • Access to the loan also improved their ability to manage business finances (80%) and pay bills on time (77%).

Having access to credit decreased the amount of time people reported spending worrying about their finances. Seventy-four percent of customers reported a decrease in financial stress (15% significantly; 59% slightly), while only 4% reported an increase and the other 22% saw no change. Finally, while 12% of customers were somewhat burdened by the repayment and 3% felt it was a heavy burden, the majority of customers (85%) did not feel repayments were a problem. This may be due in part to the fintech’s policy of collecting repayment in small daily amounts rather than as one large payment at the end of the loan, which could be more challenging for businesses that are new to credit and have variable revenue streams.

KarmaLife

KarmaLife is an India-based fintech start-up that serves gig platform workers and broader pools of blue-collar workers with different liquidity and savings solutions. Mobility segment workers most commonly cite the need for higher-ticket, instalment-linked loans, often to finance a vehicle or repairs, which enable them to earn more from ride-hailing work. 

KarmaLife has developed its scoring models based on partner platform data. Those models have proven predictive for their target segment. KarmaLife’s experience shows that early-wage access can lead to greater driver engagement, productivity and retention, creating incentives for platforms to extend credit, directly or indirectly, to their drivers. Initial results suggest that longer-tenure loans improve driver engagement in the week immediately after the loan is taken. In a cohort of 8,000 platform workers, 93% of workers who took a loan were available to work the next week, compared with 85% of workers who did not take out a loan despite being eligible. The analogous figures six weeks after loan eligibility were 95% for borrowers and 89% for non-borrowers. This suggests that tailored financial services may offer gig worker platforms a way to serve, engage and retain their best workers.

table showing comparative prediction by data types

More transactional activity, particularly more frequent and stable activity, is associated with better loan repayment. KarmaLife’s experience shows that greater earnings, more working hours, and higher driver ratings are all associated with lower repayment risk. In our analysis based on 15,000 loans to drivers for Porter, a truck and bike delivery platform, we found that platform data worked as well as credit bureau data in predicting a driver’s creditworthiness. Platform data also materially improved the predictive accuracy of credit models using bureau data. 

Unlike the case of Fundfina, the vast majority (>95%) of platform workers in the KarmaLife dataset had a file in the credit bureau.

The relationships between KarmaLife’s platform data and loan repayment make ‘business sense’. In general, more activity on the platform is associated with lower repayment risk. Different measures of platform activity are also correlated, such that models can reach maximum prediction for a given data set without including all possible characteristics or engineered “features” in the scorecard. Table 5 provides examples of some of the most predictive platform data characteristics in the Porter data set and their predictive power as measured by the AUC statistic.

table showing how platform activity is related to loan repayment
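The observation that platform activity measures are correlated can be checked with a simple pairwise screen before building a scorecard. The sketch below is illustrative only: the eight drivers and their values are hypothetical, and the 0.8 cut-off for flagging a pair as largely redundant is an assumed rule of thumb, not a threshold used in this paper.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical weekly platform data for eight drivers.
PLATFORM_DATA = {
    "weekly_earnings": [2100, 3400, 4200, 5100, 5300, 6200, 2900, 4600],
    "hours_worked": [20, 35, 40, 50, 55, 60, 30, 45],
    "driver_rating": [4.5, 4.1, 4.8, 4.3, 4.6, 4.2, 4.7, 4.4],
}


def redundant_pairs(data, threshold=0.8):
    """Return pairs of characteristics whose absolute correlation meets the threshold."""
    flagged = []
    for a, b in combinations(data, 2):
        if abs(pearson(data[a], data[b])) >= threshold:
            flagged.append((a, b))
    return flagged


print(redundant_pairs(PLATFORM_DATA))
```

In this made-up sample, earnings and hours worked move almost in lockstep, so a scorecard would gain little from including both, while the driver rating carries largely separate information.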

Together the Fundfina and KarmaLife cases support the hypotheses that: (i) transactional data can be used as a reliable predictor of loan repayment for clients with no formal credit history; and (ii) that such data are likely to improve the prediction of bureau scores or models based only on credit history data. 

Conclusion

The two case studies present evidence that transactional data have the potential to predict credit risk as well as the “gold standard” of credit history. Transactional data are also likely to enhance the predictive power of credit scorecards when combined with other types of data. By leveraging transactional data, financial service providers may be able to expand access to credit for new borrowers, including those new to formal credit, without taking on significant additional risk. The results in this brief hold for different types of retail businesses and gig workers, indicating that the approach may be applicable across different sectors. Customer research suggests that loans underwritten with transactional data reached customers who were previously excluded from formal credit, and it confirms the positive impact that access to credit can have on small businesses.

This note highlights opportunities to use transactional data to further financial inclusion. Traditionally, accessing transactional data has required financial service providers to negotiate partnerships with businesses with access to such data for large groups of customers. While the models presented in this case study rely on fintechs which have developed partnerships with companies that share data on their MSE network, similar data can now be accessed through open finance schemes in some markets. Providers can proactively seek sources of this data to grow their portfolios towards underserved segments. Providers can seek partnerships with non-financial organizations to provide embedded finance products using transactional data. Where available, providers can join open finance regimes and develop credit products geared towards those with no or limited credit history, as well as improve the accuracy of their assessments for all customers. As these data-sharing schemes that empower customers to share their own data grow, the opportunities to use transactional data to better predict risk and expand access to more customers will grow exponentially.

Acknowledgments

The authors would like to thank the following colleagues: Will Cook and Tatiana Alonso Gispert for peer review; Xavier Faz and Arisha Salman for input and guidance; and Feven Getachew Asfaw for editorial support. They would also like to thank Badal Malick, Sachin Tripathi, and Siddharth Singh of KarmaLife and Nishant Bhaskar and Abhijit Naik of Fundfina for contributing samples of their anonymized data, as well as their valuable time and insights to the study.