I am currently a MATS 9.0 Scholar with Neel Nanda.

I am interested in research that contributes to making the impact of AI positive for humanity. This includes, but is not limited to:

  • What are the fundamental mechanisms of LLM thinking and learning? What can this tell us about understanding model capability?
  • How can we use fundamental understanding of the ways that LLMs think and learn to better control training?
  • How do models reason and how can we make reasoning both faithful and effective, particularly for math and coding?

Previously, I worked at a crypto hedge fund leading quant research and engineering through the launch and growth of the fund. I learned about the process of doing research and thoroughly testing ideas, including designing repeatable research processes and systems.

I studied math at Princeton where I was very interested in paradoxes, category theory and why mathematics works.

I am a dual US / Canadian citizen who grew up in Boston. I’ve previously lived in 🇺🇸 🇸🇬 🇦🇪 and spent significant time in 🇭🇰 🇬🇧.


Projects

Steering RL: Benchmarking Interventions Against Reward Hacking

Advised by Neel Nanda and Josh Engels
December 2025

Description: We present an environment where Qwen3-4B reward hacks without explicit training or prompting, then investigate RL training interventions to mitigate reward hacking without compromising performance. We benchmark several approaches: adding a penalty reward term, screening rollouts during training, and inoculation prompting. We test both a ground-truth monitor and more realistic monitors, such as a probe and an LLM judge. Our results show that reward hacking can be mitigated without performance loss; however, results vary between training runs and across intervention approaches and monitors.
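As a rough illustration of the penalty and screening interventions described above, here is a minimal sketch. It is not the project's code: the `Rollout` type, the `flagged` field, the function names and the penalty value are all hypothetical, and the monitor is assumed to have already produced a yes/no verdict for each rollout.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rollout:
    # Hypothetical container for one sampled trajectory.
    task_reward: float  # reward returned by the environment
    flagged: bool       # assumed monitor verdict: True if reward hacking is suspected


def penalized_rewards(rollouts: List[Rollout], penalty: float = 1.0) -> List[float]:
    """Penalty intervention: subtract a fixed penalty from each flagged rollout."""
    return [r.task_reward - penalty if r.flagged else r.task_reward for r in rollouts]


def screen_rollouts(rollouts: List[Rollout]) -> List[Rollout]:
    """Screening intervention: drop flagged rollouts before the policy update."""
    return [r for r in rollouts if not r.flagged]


# Toy batch: the second rollout earns full reward but is flagged by the monitor.
batch = [Rollout(1.0, False), Rollout(1.0, True), Rollout(0.0, False)]
print(penalized_rewards(batch, penalty=1.0))
print(len(screen_rollouts(batch)))
```

The two functions differ in what the learner sees: the penalty keeps the flagged rollout in the batch with a lower reward, while screening removes it from the update entirely.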

Codebase: GitHub

Read the Blogpost

Subliminal Learning as a Byproduct of Superposition

Blogpost
August 2025

Non-Technical Description: LLMs are often trained on data generated by another model, called the teacher. Subliminal learning is the phenomenon of unintended traits being passed through that data. In this post, I use a variety of methods from mechanistic interpretability to explore the idea that subliminal learning occurs because of how LLMs represent different concepts internally, which causes coincidental relationships between certain concepts.

Description: Subliminal learning is the phenomenon of a student model learning unintended attributes of a teacher model through distillation when the two share an initialization. In this post, I explore the hypothesis that subliminal learning is a byproduct of superposition, the dense juxtaposition of learned features in activation space. Using toy models, SAE decomposition of features, a trained linear probe and a decomposed steering vector, I present evidence that subliminal learning is a consequence of superposition.

Codebase: GitHub

Read the Blogpost


The Banach-Tarski Paradox and Weakenings of the Axiom of Choice

Senior Thesis, Department of Mathematics, Princeton University
Advised by Hans Halvorson

June 2020

Non-Technical Description: The Banach-Tarski Paradox is a mathematical paradox showing that a ball can be divided into finitely many pieces, rearranged and reassembled into two copies of the original ball. This paper shows that the paradox is implied by a few fundamental theorems of logic and set theory, raising philosophical questions about the foundations of mathematics.

Technical Description: This paper proves all possible relationships between the Boolean Prime Ideal Theorem, Weak Ultrafilter Theorem, Hahn-Banach Theorem and Banach-Tarski Paradox using techniques of set theory and forcing. This includes a novel proof showing that the Weak Ultrafilter Theorem implies the Banach-Tarski Paradox (Theorem 2.7), a relationship that had previously been unknown in the literature.

Full Paper Link


Adjoint Equivalence of Heyting, Boolean and Closure Algebras

Junior Paper, Department of Mathematics, Princeton University
Advised by Hans Halvorson

May 2019

Description: This paper explores three logical systems:

  • Classical logic, defined as a Boolean algebra: Generally accepted classical logic with modus ponens, modus tollens, double negation and deduction theorems
  • Intuitionistic logic, defined as a Heyting algebra: Classical logic without double negation; this rules out proof by contradiction, which makes intuitionistic logic the foundation of constructivist mathematics
  • Modal logic, defined as a closure algebra: Adds a “possibly true” operator to logic

Each logic is defined as an algebra, then as a category. Using the categorical definitions, we show “adjoint equivalence” - a structure-preserving isomorphism between categories - between all three categories. Prior work had shown some of these relationships; this paper extends that work by revising the category structure so that the full result can be proved. This result raises philosophical questions about the relationship between the three forms of logic.
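For reference, the standard textbook definition of an adjoint equivalence is sketched below. The symbols $\mathcal{C}$, $\mathcal{D}$, $F$, $G$, $\eta$ and $\varepsilon$ are generic placeholders and are not necessarily the notation used in the paper.

```latex
% An adjoint equivalence between categories C and D consists of functors
% F and G together with natural isomorphisms eta (unit) and epsilon (counit):
\[
  F \colon \mathcal{C} \to \mathcal{D}, \qquad
  G \colon \mathcal{D} \to \mathcal{C}, \qquad
  \eta \colon 1_{\mathcal{C}} \xrightarrow{\;\cong\;} G F, \qquad
  \varepsilon \colon F G \xrightarrow{\;\cong\;} 1_{\mathcal{D}},
\]
% subject to the triangle identities:
\[
  \varepsilon F \circ F \eta = 1_{F}, \qquad
  G \varepsilon \circ \eta G = 1_{G}.
\]
```

This is weaker than a strict isomorphism of categories: composing the two functors only has to return each object up to natural isomorphism, not exactly.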

Full Paper Link


Reducing the RNA binding protein TIA1 protects against tau-mediated neurodegeneration in vivo

Boston University School of Medicine, Wolozin Lab
Summer 2016

Non-Technical Description: Previous studies have shown that “stress granules” in neurons in the brain are associated with Alzheimer’s disease in humans and in mice. In this paper, the author showed that in a living mouse model for neurodegeneration, reducing the RNA binding protein TIA1 protected against neurodegeneration by preventing stress granules from forming. The author also identified the underlying mechanism that prevents stress granule formation.

Abstract: Emerging studies suggest a role for tau in regulating the biology of RNA binding proteins (RBPs). We now show that reducing the RBP T-cell intracellular antigen 1 (TIA1) in vivo protects against neurodegeneration and prolongs survival in transgenic P301S Tau mice. Biochemical fractionation shows co-enrichment and co-localization of tau oligomers and RBPs in transgenic P301S Tau mice. Reducing TIA1 decreased the number and size of granules co-localizing with stress granule markers. Decreasing TIA1 also inhibited the accumulation of tau oligomers at the expense of increasing neurofibrillary tangles. Despite the increase in neurofibrillary tangles, TIA1 reduction increased neuronal survival and rescued behavioral deficits and lifespan. These data provide in vivo evidence that TIA1 plays a key role in mediating toxicity and further suggest that RBPs direct the pathway of tau aggregation and the resulting neurodegeneration. We propose a model in which dysfunction of the translational stress response leads to tau-mediated pathology.

My Contribution: I spent a summer doing wet lab work and writing R for a lab at the BU School of Medicine. My visualizations in R helped the lab understand some genomic sequencing data that it had not previously been able to use. This was my first programming project.

Published in Nature Neuroscience in October 2017