CRAFT Research Projects

2024 - 2025 Research

AI Compliance Officer

Within the United States, the number of regulations that financial institutions are required to comply with has risen inexorably over time. The rate of new pages of regulation accelerated after the 2008 financial crisis with the Dodd-Frank Act. To ensure that they satisfy these regulations, financial institutions rely on compliance officers who possess the requisite legal understanding. In March 2023, ChatGPT passed the Uniform Bar Exam, demonstrating the ability of modern Large Language Models (LLMs) to comprehend and interpret legal texts. In this project, we plan to train an LLM on legal texts, as undertaken by previous studies; with this LLM, financial regulations can be interpreted automatically to provide actionable insights. Such a tool can be used in conjunction with traditional compliance officers to reduce costs.

PI - Zachary Feinstein

Co-PIs - Ionut Florescu, Hamed Amini

Analyzing Financial Information with XBRL-enhanced Foundation LLM

Using large language models (LLMs) in XBRL analysis may revolutionize how financial data is accessed and understood accurately and reliably. XBRL markups capture better semantics, which will help LLMs perform better at financial tasks. An XBRL-enhanced foundation LLM will simplify data aggregation and support informed decision-making. This project demonstrates CRAFT's dedication to making complex financial analytics more accurate, efficient, and accessible. The project enhances the current foundation LLMs to learn XBRL-encoded financial knowledge, making advanced financial insights widely accessible to the public.

PI - Xiaoyang Liu

Co-PIs - Mohammed J. Zaki, Steve Yang

Large Transaction Models (LTMs) for FinTech

This project builds foundation Large Transaction Models (LTMs) using public financial transaction datasets for a wide range of analytics tasks. We are forming a new multi-modal, multi-domain transaction data testbed that will be used to train the LTM and develop its downstream applications on financial tasks. It combines our own decentralized finance transaction datasets from lending and exchanges with public benchmark transaction datasets in other domains. As advised by CRAFT IAB members, we target downstream use cases in finance, including fraud/anomaly detection, forecasting, and predictive analytics for loan evaluation. We propose using reinforcement learning (RL) with market data feedback for fine-tuning/adapting the model to downstream tasks that specifically leverage financial data and models for tuning, in contrast to the more costly RL with human feedback. LTMs digest massive transaction datasets to capture nuanced patterns and trends in the datasets and their context, thereby providing a more dynamic, efficient, and informed analysis. The RL market data feedback leverages the representations and models used in existing financial analytics for transactions to inform the LTM. Our framework enables future users to digest their own datasets and solve their own use cases with novel actionable insights for well-founded individual and strategic decision-making.

PI - Kristin Bennett

Co-PI - Oshani Seneviratne

Team Members - Xiaoyang Liu, John Erickson, Aaron Green, and an estimated nine undergraduate students

Semantically Enhanced Graph Neural Networks for Event-Driven Financial Impact Analysis

Large language models (LLMs) based on the Transformer architecture have become increasingly popular and have been adopted in various domains in recent years. However, they construct relations arbitrarily between all pairs of words within a (small) context without considering the underlying semantics, and they rely on an enormous amount of textual data for training. We propose a novel graph learning approach that uses abstract meaning representation (AMR) to create a textual graph for an entire document and uses graph neural networks (GNNs) to generate document-level embeddings that better capture the semantics. Since AMR graphs extract meaning and semantic relations between entities from text, they can be used to produce better textual embeddings. Our approach can thus be applied in a variety of prediction tasks based on textual signals, and preliminary experiments show that it yields better results than state-of-the-art transformer-based approaches while requiring less time and fewer computational resources. We propose to apply our new AMR-based deep graph learning approach to evaluate the impact of key financial events on target features, focusing on two use cases: (1) trend prediction based on textual signals, and (2) correlation prediction between entities for both equity risk and credit risk. Given the wide applicability of our approach, we hope to work with industry partners on alternative use cases of interest, and we also plan to explore commercialization opportunities in keeping with CRAFT’s mission.

PI - Mohammed Zaki

Co-PI - Aparna Gupta

Team Members - Bolun "Namir" Xia

Smart Encoding and Automation of Over-The-Counter Derivatives Contracts

The absence of a central counterparty leaves the Over-The-Counter (OTC) derivatives market prone to systemic risks. Current manual processes for encoding derivatives contracts into tradable formats are labor-intensive, error-prone, and not uniformly regulated, which can lead to significant operational risks and compliance issues. Our proposal, developed with insights from several CRAFT IAB members, addresses these challenges by introducing an end-to-end system for the smart encoding, verification, automation, and auditing of OTC derivatives contracts. By leveraging innovative neuro-symbolic AI techniques to translate contract terms into standardized Financial products Markup Language (FpML) documents and Ethereum-based Solidity smart contracts, we aim to enhance market efficiency, reduce operational risks, and elevate compliance standards. The dual-output system ensures compatibility with existing trading platforms while embracing the automation and security benefits of blockchain technology. The adoption of this system promises substantial value for financial institutions, regulators, and, ultimately, the broader market by streamlining processes, increasing transaction speed, improving auditability, and fostering trust in OTC markets.

PI - Oshani Seneviratne

Co-PI - Aparna Gupta

Team Members - Maruf Ahmed Mridul (Ph.D. student), Kaiyang Chang (undergraduate student)

2023 - 2024 Research

Blockchain Interoperability for Business Organizations

Business operations that involve multiple parties and organizations face challenges such as a lack of transparency, inefficiency, and disputes. These challenges can significantly impact the smooth functioning of business operations and increase the risk of financial losses. Blockchain technology has the potential to address these challenges by enabling secure and transparent transactions through smart contracts. However, business organizations have many legacy components and must interface with other organizations that use disparate systems, including various blockchain implementations. Moreover, current blockchain interoperability solutions are far from ideal and have recently been subjected to numerous attacks.

Our research will investigate risk-aware blockchain interoperability with a business operations lens. The findings will inform the development of new solutions that are streamlined for secure multi-party transactions within and between organizations. Informed by our previous CRAFT project findings, we will focus on several business-specific use cases that would enable multiple business entities to engage in multi-party transactions, encode their service-level agreements and business logic into smart contracts, and provide a mechanism to resolve any issues if disagreements occur later. We will demonstrate the efficacy of our proposed system using several use cases informed by CRAFT IAB members.

PI - Oshani Seneviratne 

CoPI - Aparna Gupta       

Team Member - Inwon Kang

Comprehensive Financial Disclosure Lexicon

The vast amount of narrative disclosure – whether mandated by accounting regulations or voluntarily disclosed by firms’ management – creates a demand for natural language processing (NLP) in accounting research. In general, researchers have applied either lexicon-based approaches or machine learning approaches.[1] While machine learning approaches are better tools for prediction and are more appropriate in certain contexts (especially where researchers lack ex ante business knowledge), lexicon-based approaches have the advantages of greater transparency and replicability. Some research combines the approaches, for example, by using machine-learning-based approaches to create a prediction and then “reverse-engineering” the outcome to create or expand dictionaries relevant to the specific domain of disclosure.

This project will study the feasibility and potential benefits of a website with a comprehensive repository of the wordlists (aka dictionaries) that have been developed for specific NLP analyses of text-based information relevant to financial markets. The website could serve as a resource for researchers involved in NLP applications for text-based information.

[1] We define lexicon-based approaches as those that treat a textual disclosure as a bag of words and apply dictionaries to capture content. Bochkay et al. (2022) group NLP models into four categories: (i) simple transformations, (ii) text comparisons, (iii) traditional machine learning, and (iv) deep learning.

PI - Elaine Henry                              

CoPI - Jing Chen, Joon Ho Kong  

Team Member - Arion Cheong

Efficient, Private, and Explainable Federated Learning for Financial Crime Detection

We propose to develop a resource-efficient federated learning solution for financial crime detection that preserves data privacy, both during training and inference. We will validate the performance and privacy guarantees of our method through formal analysis and experimental evaluation on datasets from the Privacy-Enhancing Technologies Prize Challenge on Transforming Financial Crime Prevention, co-sponsored by the National Institute of Standards and Technology and the National Science Foundation. Finally, we will construct and evaluate novel privacy-preserving explainability mechanisms for the federated setting.

PI - Stacy Patterson         

CoPI - Oshani Seneviratne and Aparna Gupta

Extending, Simulating and Scaling Decentralized Exchanges Made by Automated Market Makers

Automated market makers (AMMs) are a decentralized approach to creating a financial market. This project has two goals: (i) constructing new decentralized market structures and (ii) simulating the dynamics of these decentralized exchanges. This project will create a mathematical and computational platform to test trading strategies, market construction, and regulatory measures before they are introduced in practice.
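As a rough illustration of the kind of market mechanism such a simulation platform might exercise, the sketch below models a single constant-product pool. The constant-product rule, the class name ConstantProductPool, and the 0.3% fee are assumptions chosen for illustration; the project description does not say which AMM designs it studies.

```python
from dataclasses import dataclass


@dataclass
class ConstantProductPool:
    """Toy two-asset pool; a trade never lowers reserve_x * reserve_y."""
    reserve_x: float
    reserve_y: float
    fee: float = 0.003  # hypothetical 0.3% swap fee (assumption)

    def spot_price(self) -> float:
        """Marginal price of asset X quoted in units of asset Y."""
        return self.reserve_y / self.reserve_x

    def swap_x_for_y(self, dx: float) -> float:
        """Deposit dx of X into the pool and return the amount of Y paid out."""
        if dx <= 0:
            raise ValueError("dx must be positive")
        dx_after_fee = dx * (1.0 - self.fee)
        # Solve (x + dx_after_fee) * (y - dy) = x * y for dy.
        dy = self.reserve_y * dx_after_fee / (self.reserve_x + dx_after_fee)
        self.reserve_x += dx  # the fee stays in the pool
        self.reserve_y -= dy
        return dy


# A trader sells 100 X into a pool holding 1,000 X and 1,000 Y.
pool = ConstantProductPool(reserve_x=1_000.0, reserve_y=1_000.0)
price_before = pool.spot_price()
received = pool.swap_x_for_y(100.0)
price_after = pool.spot_price()
```

The trader receives fewer than 100 Y because the price moves against the trade, and the spot price of X falls afterwards. A trading strategy or a regulatory measure could be tested by replaying many such swaps against the pool.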

PI - Zachary Feinstein                     

CoPI - Ionut Florescu and Ivan Bakrac

Federated Learning for Fairness-aware and Privacy-Preserving Financial Risk Assessment

The objective of this project is to study and design new federated learning techniques to preserve privacy and improve fairness in machine learning applications in financial domains. Financial data are generally distributed, which means that different financial entities save their own data locally and do not share data for learning purposes. However, machine learning algorithms benefit from learning from large-scale datasets that cover diverse distributions. Thus, federated learning has become a popular architecture for distributed learning where data are saved locally. The goal of federated learning is to harness data without a third party ever directly interacting with the data, thus ensuring users’ data remain secure. We will study federated learning algorithms for heterogeneous data with fairness constraints. We will design new dynamic and adaptive strategies for parameter sharing to handle heterogeneous financial data distributions. To guarantee that different clients receive fair predictions, we will develop random matrix-based aggregation methods to prevent privacy leakage in fairness-constrained federated learning.
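To make the parameter-sharing step concrete, here is a minimal sketch of plain federated averaging, in which a server combines locally trained parameters without ever seeing the underlying data. This is the generic baseline technique, not the fairness-constrained or random matrix-based aggregation the project proposes; the function name federated_average and the toy numbers are hypothetical.

```python
from typing import List, Sequence


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Average client parameter vectors, weighted by local dataset size."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one dataset size per client")
    total = sum(client_sizes)
    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must share the same model shape")
        for i, w in enumerate(weights):
            averaged[i] += w * size / total
    return averaged


# Three hypothetical institutions share only model parameters, never records.
local_models = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]
local_sizes = [100, 100, 200]
global_weights = federated_average(local_models, local_sizes)
```

With these inputs the third institution holds half of the data, so its parameters carry half of the weight in the global model.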

PI - Yue Ning                                      

CoPI - Nan Cui

Systemic Risk Implications of Central Bank Digital Currencies

Within the United States, wholesale clearing takes place through Fedwire. This clearing system is centralized and requires significant oversight. Central bank digital currencies (CBDCs) introduce a novel approach for wholesale clearing that could reduce overhead costs. This study aims to investigate the systemic risk implications of introducing CBDCs into the financial system. We develop a clearing model of the current system (i.e., Fedwire) as a baseline against which to compare a novel CBDC clearing system model. In doing so, we investigate the risk of both individual agent failure and systemic failures. These models can be used for stress testing purposes as well as to provide a framework for studying the optimal design, from a systemic risk perspective, of the agent network associated with this new asset class.
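For readers unfamiliar with clearing models, the sketch below computes clearing payments in a small interbank network by fixed-point iteration, in the style of the standard Eisenberg-Noe model. Using that model here is an assumption made for illustration; the project description does not specify its Fedwire or CBDC model, and the function name clearing_payments and the three-bank example are hypothetical.

```python
from typing import List


def clearing_payments(liabilities: List[List[float]],
                      external_assets: List[float],
                      max_iter: int = 1000,
                      tol: float = 1e-12) -> List[float]:
    """Total payment each bank makes when liabilities[i][j] is what i owes j.

    Assumes non-negative external assets and pro rata repayment on default.
    """
    n = len(liabilities)
    total_owed = [sum(row) for row in liabilities]
    payments = total_owed[:]  # start by assuming every bank pays in full
    for _ in range(max_iter):
        updated = []
        for i in range(n):
            # Debtor j pays creditor i in proportion to j's obligation to i.
            inflow = sum(
                payments[j] * liabilities[j][i] / total_owed[j]
                for j in range(n) if total_owed[j] > 0
            )
            # A bank pays what it owes or everything it has, whichever is less.
            updated.append(min(total_owed[i], external_assets[i] + inflow))
        if max(abs(a - b) for a, b in zip(updated, payments)) < tol:
            return updated
        payments = updated
    return payments


# Chain of obligations: bank 0 owes bank 1 ten units, bank 1 owes bank 2 ten.
liabilities = [
    [0.0, 10.0, 0.0],
    [0.0, 0.0, 10.0],
    [0.0, 0.0, 0.0],
]
external_assets = [4.0, 2.0, 0.0]
payments = clearing_payments(liabilities, external_assets)
defaulted = [i for i, p in enumerate(payments) if p < sum(liabilities[i])]
```

In this example the shortfall at bank 0 propagates to bank 1, which is the kind of contagion a stress test over the agent network would look for.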

PI - Zachary Feinstein                     

CoPI - Rui Fan and Stephen Taylor

2022 - 2023 Research

Fast Quantum Methods for Financial Risk Management

We investigate how quantum simulation and optimization can deliver a fast, scalable, and secure computing platform for (1) assessing systemic market and credit enterprise risk, (2) pricing complex financial/insurance instruments for risk analysis/regulatory reporting, and (3) executing portfolio analysis/robo-advising.

Causal Inference for Fairness and Explainability in Financial Decision

Properly explained and fairness-adjusted algorithms are increasingly in demand for business decisions, from seemingly minor issues like same-day delivery eligibility to access to life-changing opportunities such as education, employment, housing, and credit. Interpreting decisions helps businesses build trust with users, ultimately improving their public image and strengthening their market presence and influence.

Risky Business? Deep Dives into DeFi

Decentralized Finance (DeFi) is an emerging financial ecosystem built on the back of blockchain technology. With over $2 trillion locked in cryptocurrencies and the rapid adoption of new DeFi products by retail users and institutions, we need to understand how the volatile DeFi ecosystem may potentially disrupt the traditional financial sector. We seek to investigate current patterns of usage in DeFi lending protocols and quantify risk and user behaviors across various protocols.

Explainable ML for Credit Risk Analytics

This research project aims to assess the validity of explainable AI/ML (XAI) tools and models in the context of credit analytics by comparing the time-series and cross-sectional stability of XAI algorithms (LIME, Shapley values, etc.) with regard to several common ML algorithms (including logistic regression, tree-based models, and deep neural networks) on real consumer credit data.

High-dimensional Portfolio Design and Optimization using an Explainable Ensemble Learning Framework

Our research will deliver a framework and methodology to personalize portfolios. We propose a novel two-step approach to design an optimal portfolio by building an ensemble learning framework. Investment portfolio design requires combining various risky assets with appropriate weights to provide an acceptable trade-off between the portfolio return and risk to the investor, while satisfying policy and diversification requirements.
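The return and risk trade-off mentioned above can be illustrated with the standard mean-variance quantities: expected return is the weighted sum of asset returns, and risk is the square root of the weighted covariance. The sketch below is only that textbook calculation, not the project's ensemble learning framework; the function names, the 60% weight cap, and all numbers are hypothetical.

```python
import math
from typing import Sequence


def portfolio_return(weights: Sequence[float],
                     expected_returns: Sequence[float]) -> float:
    """Expected portfolio return: sum of w_i * r_i."""
    return sum(w * r for w, r in zip(weights, expected_returns))


def portfolio_volatility(weights: Sequence[float],
                         covariance: Sequence[Sequence[float]]) -> float:
    """Portfolio risk: sqrt(w^T * Sigma * w)."""
    n = len(weights)
    variance = sum(weights[i] * covariance[i][j] * weights[j]
                   for i in range(n) for j in range(n))
    return math.sqrt(variance)


def satisfies_constraints(weights: Sequence[float], max_weight: float) -> bool:
    """Simple policy check: fully invested, long-only, capped position size."""
    fully_invested = math.isclose(sum(weights), 1.0, abs_tol=1e-9)
    return fully_invested and all(0.0 <= w <= max_weight for w in weights)


# Two hypothetical assets with uncorrelated returns.
weights = [0.5, 0.5]
expected_returns = [0.08, 0.04]
covariance = [[0.04, 0.0],
              [0.0, 0.01]]

expected = portfolio_return(weights, expected_returns)
volatility = portfolio_volatility(weights, covariance)
feasible = satisfies_constraints(weights, max_weight=0.6)
```

An optimizer would search over weights to improve the trade-off between expected and volatility while keeping feasible true.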

Risk Mitigation in Cross-Platform Decentralized Finance

The current leading solutions for blockchain interoperability in DeFi, such as Cosmos, Polkadot, and Chainlink, provide inter-blockchain communication to bridge various incompatible technologies, messaging, identity, and data formats. However, these solutions use varying degrees of security, trust, and identity mechanisms, which has resulted in several cross-blockchain exploits in the recent past and hesitation by the financial industry in adopting blockchain solutions. The primary goal of this project is to see how cross-chain protocols can be strengthened to minimize such risks in DeFi.

Predictive Learning from Long Financial Documents

The aim of this project is to leverage the vast troves of textual data spanning disclosures, reports, news articles and reviews to extract insights and features for predictive learning. The focus is on effective textual models for prediction of quantitative performance indicators from long documents, which is a particularly challenging domain. Our goal is to develop novel language modeling techniques for the representation of long financial documents, advancing the state of pre-trained language models as well as graph neural networks.