The work is its own reward. — Sir Arthur Conan Doyle, British writer and physician

ongoing efforts

Cobweb is a symbolic algorithm that aggregates instances into “concepts” and organizes them into a hierarchy. It is unsupervised, piecemeal, and incremental, which makes it a comprehensive model of human learning. I am currently working with Dr. Pat Langley to implement the Cobweb algorithm within a proposed theoretical framework that models psychological “chunking” for efficient parsing, storage, and retrieval of concepts based on a set of defined relations.
If successful, this project could have major implications for the world of embeddings, since it would be able to generate a stable yet continuously evolving latent space over a stream of data.
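For readers unfamiliar with how Cobweb decides which concepts to form: the classic formulation (Fisher, 1987) scores a candidate grouping with category utility, which rewards groupings that make attribute values easier to guess. The snippet below is a minimal sketch for nominal attributes only. The function names and toy data are hypothetical, and it is not the implementation used in this project.

```python
from collections import Counter

def expected_correct_guesses(instances):
    """Sum over (attribute, value) pairs of P(attribute = value) ** 2."""
    n = len(instances)
    counts = Counter()
    for inst in instances:
        for attr, val in inst.items():
            counts[(attr, val)] += 1
    return sum((c / n) ** 2 for c in counts.values())

def category_utility(partition):
    """Category utility of a partition, given as a list of clusters.

    Each cluster is a non-empty list of instances; each instance is a
    dict mapping attribute names to nominal values.
    """
    everything = [inst for cluster in partition for inst in cluster]
    n = len(everything)
    baseline = expected_correct_guesses(everything)
    gain = sum(
        (len(cluster) / n) * (expected_correct_guesses(cluster) - baseline)
        for cluster in partition
    )
    return gain / len(partition)

# Toy data: grouping by color is informative, mixing colors is not.
red, blue = {"color": "red"}, {"color": "blue"}
cu_clean = category_utility([[red, red], [blue, blue]])
cu_mixed = category_utility([[red, blue], [red, blue]])
```

Here the clean grouping scores higher than the mixed one, which is the signal Cobweb uses when choosing where to place each new instance in its hierarchy.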

(see notes on Cobweb in my work with ISLE) I am currently leading an independent study under Dr. Chris Maclellan to create a robust, industry-sustainable neurosymbolic solution to retrieval-augmented generation, using Cobweb as the primary algorithm. We submitted a paper (link coming soon!) on a preliminary study that not only validated the efficacy of Cobweb as a search algorithm but also led to a new metric that outperforms inner-product similarity for the semantic comparison of embeddings.
My goal is to grow this project into a vehicle for demonstrating the practical applicability of neurosymbolic systems. I also aim to analyze differentiable forms of symbolic systems like Cobweb, with the goal of removing some of symbolic systems’ most fundamental time sinks.
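For context, the inner-product baseline mentioned above ranks documents by the dot product between a query embedding and each document embedding. The snippet below is a minimal sketch of that baseline only, with made-up three-dimensional vectors and hypothetical function names. It does not show the Cobweb-based retrieval or the new metric from the paper.

```python
def inner_product(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def retrieve(query, corpus, k=2):
    """Return the ids of the k documents with the largest inner product."""
    ranked = sorted(
        corpus,
        key=lambda doc_id: inner_product(query, corpus[doc_id]),
        reverse=True,
    )
    return ranked[:k]

# Toy embeddings, invented purely for illustration.
corpus = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.8, 0.1],
    "doc_c": [0.7, 0.3, 0.0],
}
top = retrieve([1.0, 0.0, 0.0], corpus, k=2)
```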

Recently, I joined Professor Anya Ivanova and began contributing to the development of encoding models: models that convert latent representations of stimuli into voxel-level representations of brain states, as measured by fMRI. This project places strong emphasis on model interpretability, with the long-term goal of running large studies on brain simulations inspired by encoding models, should they prove successful.
Currently, I am investigating alternatives to the traditional encoding model architecture and applying definitions of interpretability beyond pure sparsity in an effort to produce better fits.

past projects

I did volunteer work for Dr. Larissa Albantakis on Integrated Information Theory (IIT), an information-theoretic framework that attempts to calculate the distributed “consciousness” of a system. I used IIT as a saliency test for neural networks, identifying the most relevant parts of the network under the “substrate of consciousness” defined by the framework.
While IIT makes very strong assumptions as a theory of consciousness, it has a lot of promise as a metric of “usefulness” that analyzes how information is distributed across a system, and I look forward to seeing information theory take greater prominence in shaping the field of ML interpretability.

At my high school, student participation was disproportionately low for the size of our student body, and through my work with student leadership, I traced part of the gap to a decentralized bank of resources: information conflicted, and there was no single place to easily access everything. So, I created DuluthGPT, an LLM agent that gathered data from athletics websites, extracurricular Instagram accounts, and the students themselves, and answered questions quickly using a basic RAG pipeline.
In addition to answering over 3,000 queries and increasing student engagement by 10%, DuluthGPT let us analyze usage by measuring how often each topic was asked about, highlighting the two-way value AI offers in consumer interaction.

I attended the Georgia Governor’s Honors Program for Computer Science in my junior year, where I had the pleasure of launching a mock-startup and conducting research under my advisor. Over the term, I wrote a piece on the application of finetuning diffusion models for the advertising industry based on user demographic and , with the broader social commentary of introducing profitable AI research into the research space before industry so that it could be appropriately regulated.
My paper was accepted to the Applied Human Factors and Ergonomics Conference, but I was unfortunately unable to attend due to another commitment. Paper can be viewed here!

My first full-production machine learning pipeline was applied to the stock market. I used an XGBoost pipeline, tuned with my Biaswrappers, and finetuned predictions using sentiment analysis on Reddit, X, Google News, and Yahoo Finance investor analysis. My goal was to use the momentum of prior stock movements and current news sentiment to predict future trends in OHLC (open, high, low, close) prices. Although my framework generally performed no better than day-to-day predictions, it had some highly successful calls, with returns of up to 30-40% on less mainstream stocks.

My first machine learning project, and my first paper independently published to a preprint archive! A brief foray into classical forms of regularization on linear regression, including two of my own attempts at unsupervised regularization with respect to the data distribution. The Python package currently stands at ~32k downloads!
The paper can be viewed here, and the Python package can be viewed here!
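As a quick illustration of what classical regularization does to a linear fit, the snippet below is a minimal sketch of L2 (ridge) shrinkage for a single-feature model with no intercept. The function name and data are hypothetical. It does not reproduce the unsupervised regularizers from the paper or the package's API.

```python
def fit_slope(xs, ys, l2=0.0):
    """Slope w minimizing sum((y - w * x) ** 2) + l2 * w ** 2."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + l2)

# Toy data lying exactly on y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w_ols = fit_slope(xs, ys)             # no penalty: recovers the true slope
w_ridge = fit_slope(xs, ys, l2=14.0)  # penalty shrinks the slope toward zero
```

With this data the unpenalized slope is 2.0, and a penalty equal to the sum of squared inputs halves it to 1.0, trading bias for lower variance.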