Cohort 2025

Fellows

Meet the fellows accepted to the Cooperative AI Research Fellowship and explore their bios, research interests, and project proposals.

Portrait of Akash Kundu

Mentors

Vincent Conitzer & Emanuel Tewolde

Akash Kundu

Bio

I am Akash Kundu, a final-year Computer Science undergraduate with several years of experience in technical AI Safety, particularly in evaluating and stress-testing large language models. My work focuses on uncovering behavioural failures in LLMs: dark patterns, sycophancy, harmful reasoning, multilingual vulnerabilities, and other inconsistencies. I’ve co-authored research presented at venues such as ICLR, AAAI, and NeurIPS, and have worked with Apart Research, FAR AI, and Humane Intelligence on evaluation pipelines, adversarial prompting, and cross-cultural red-teaming.

Research Interests

Recently, I’ve been moving toward questions about how model behaviour shifts in multi-agent settings. I’m especially interested in cooperation, signalling, sanctioning, conflict, and emergent group-level failures as models become more agentic and interactive. I plan to pursue a PhD in AI Safety, and this fellowship is a way for me to deepen my understanding of cooperative dynamics, multi-agent safety, and mechanisms that can support aligned behaviour in groups of AI systems.

Project Proposal

Similarity-Based Cooperation and Communication - This project investigates how perceived similarity between LLM agents affects their communication strategies and the emergence of cooperation. It combines similarity-based cooperative equilibrium concepts with cheap-talk communication in a feasible three-month scope.

Portrait of Bhavyesh Sajja

Mentors

Max Kleiman-Weiner

Bhavyesh Sajja

Bio

Hey, I am Bhavyesh, a first-year CS PhD student at the National University of Singapore, advised by Dr. Roger Zimmermann and Dr. Tan Zhi-Xuan.

Research Interests

I am broadly interested in AI alignment, with a special focus on automated negotiation and Bayesian norm learning.

Project Proposal

LLM Negotiation Benchmarking - Develops automated negotiation frameworks combining LLMs with theory-of-mind, game theory, and computational argumentation to enable rational deliberation and value-aligned bargaining. Aims to reduce transaction costs in human negotiation while addressing LLMs' current failures in selective cooperation and principled reasoning.

Portrait of Joseph Low

Mentors

Michiel Bakker

Joseph Low

Bio

Joseph is a community governance researcher exploring the intersection of technology, collective action, and democratic participation. He is a contributor to Metagov, where he has worked on deliberative tooling, Public AI, and decentralized autonomous organizations. His research is driven by a core tension: technology intended to empower communities often reproduces the very hierarchies and exclusions it aims to solve, a dynamic he terms the "commodification of community". This has led him to focus on pluralistic alignment: ensuring AI systems can serve multiple communities with different, often conflicting norms, rather than imposing universal standards that erase cultural and contextual differences.

Research Interests

Building technology that empowers communities: who gets to build it? Who decides what empowerment means? Who counts as "community"?

Project Proposal

Latent Opportunities: Leveraging User-Owned Data to Facilitate Human Cooperation - Your conversational data contains signals about potential collaborations with others that you didn't know existed. What if we could leverage that to match you with a co-founder, or to find research collaborators for your ideas? This project demonstrates a decentralized mechanism for surfacing cooperation opportunities from chat histories while preserving privacy and user control. By analyzing conversational patterns locally and sharing only anonymized representations, users can discover potential collaborators whose intents and needs complement their own. The system functions as a decentralized marketplace for collaboration, where matching happens based on what people are actually trying to do rather than abstract interest categories. The goal is to explore whether user-owned data can facilitate meaningful cooperation without requiring users to surrender control to centralized platforms.

Portrait of Mariana Meireles

Mentors

Zhijing Jin & David Guzman Piedrahita

Mariana Meireles

Bio

I strive to live poetically, compassionately and fearlessly. In my career, the ideal expression of this is to broadly improve the lives of all beings. As a researcher, I work on improving control over LLMs in order to make them more secure, and on discovering the fundamental factors that align intelligence.

Research Interests

Theoretical neuroscience and evolutionary biology, symmetry and geometric deep learning, morality, phenomenology, category theory, emergence. Michael Levin and Fernando Rosas have won my favorite-researchers-of-2025 award.

Project Proposal

A: Mechanism-Level Cooperation Control via CAA - Uses Contrastive Activation Addition to inject mechanism-specific cooperation vectors (reciprocity, reputation, clustering) into LLM residual streams, avoiding single-metric Goodhart effects. Evaluates multi-objective trade-offs in GovSim and public-goods games to test whether CAA outperforms RLHF for cooperation without degrading general capabilities.

B: WebCommons Platform - Builds WebCommons, a deterministic web-based platform where heterogeneous AI agents with distinct goals interact over shared digital resources. Tests various sanctioning mechanisms (reputation, audits, throttling, bans) to identify which deter exploitation without suppressing beneficial behavior across different adversary prevalences and network structures.

Portrait of Omer Kamal Ebead

Mentors

Joel Leibo

Omer Kamal Ebead

Bio

My name is Omer Ebead, and I am from Sudan. I hold a background in Electrical Engineering and a Master's degree in AI for Science from AIMS South Africa, funded by DeepMind.

Research Interests

My research interest lies in multi-agent systems. I am currently completing an internship at InstaDeep, focusing on multi-agent reinforcement learning. Aware of the challenges multi-agent systems will pose in the future, I am now directing my efforts toward multi-agent safety research.

Project Proposal

AI Agent Modeling of Human vs. AI Co-Players - Investigates whether AI agents' Theory of Mind (ToM) modeling is more accurate and cooperation is more successful when interacting with AI co-players versus human co-players in shared resource dilemmas. Uses Concordia or GovSim platforms to compare miscoordination, conflict (exploitation), and collusion frequencies across AI-AI and AI-Human teams, measuring joint payoffs, fairness metrics, ToM inference accuracy, and qualitative dialogue analysis to identify which partner type enables better cooperation and where distinctive failure modes emerge.

Portrait of Oscar Duys

Mentors

Lewis Hammond & Michiel Bakker

Oscar Duys

Bio

Oscar recently completed an Honours degree in Applied Mathematics at the University of Cape Town and will begin a Master’s in Applied Mathematics under Professor Jonathan Shock in 2026. He works at the intersection of large language models, multi-agent systems, reinforcement learning, and evolutionary optimisation, with a focus on how learning agents can cooperate and coordinate in complex environments.

Research Interests

He is particularly interested in a future where AGI emerges from orchestrated multi-agent systems rather than a single monolithic model. Recently, he has been exploring self-play with small language models and using language models as components within evolutionary systems.

Project Proposal

Mediation Through Self-Play - This project trains LLM mediators through three-agent environments where two disputants with conflicting goals interact with a mediator that learns intervention strategies. Success is measured through resolution rates and bridging scores, with potential reinforcement learning adaptation of mediator behavior.

Portrait of Pramod Kaushik

Mentors

Sahar Abdelnabi

Pramod Kaushik

Bio

Pramod Kaushik is an incoming PhD student at FBK and the University of Trento.

Research Interests

His research focuses on decision-making agents and social behavior in AI systems. He previously worked at TRDDC Pune, Inria Bordeaux, and Columbia University, and his recent work on LLM sampling theory received a Best Paper Award at ACL 2025. He has also worked on human decision-making and neurocomputational models of the brain.

Project Proposal

Rational Negotiation Agents - This work aims to build LLM agents capable of principled negotiation and deliberation by combining language models with theory-of-mind, game theory, and computational argumentation. The goal is to create agents that can replicate human rational dialogue while reducing transaction costs in bargaining scenarios.

Portrait of Qi Guo

Mentors

Lewis Hammond

Qi Guo

Bio

My name is Qi, and I was born and raised in China :D My background is in computer science, and I have worked as a software engineer for 2+ years. Since working closely with AI in my daily workflow, I've been drawn to understanding how advanced models behave and coordinate in messy, real-world settings.

Research Interests

I love building evaluation tools, exploring agentic behavior, and working across disciplines. My core motivation is reducing AI risks through careful, empirical testing and building systems that improve our understanding of model behavior and make it more trustworthy.

Project Proposal

Collusion Evaluation: This work extends an inter-agent influence evaluation to study how advanced AI models engage in collusive behavior across multi-agent settings. It combines research on scenario design with infrastructure work extending the Inspect AI framework to create a reproducible benchmark for assessing multi-agent coordination risks.

Portrait of Van Quynh Thi Truong

Mentors

Zhijing Jin & David Guzman Piedrahita

Van Quynh Thi Truong

Bio

Van Quynh Thi Truong is a computational scientist and artist obsessed with how AI is reshaping global power dynamics, multi-agent cooperation, and human creativity. Her crowdsourced sculptures and scientific illustrations have helped advance interdisciplinary conversation and public engagement between the humanities and STEM. She received her BA in Anthropology (Univ. Florida), MS in Biotechnology (Johns Hopkins), MA in Statistics & Data Science (Wharton), and PhD in Computational Biology (UPenn). She has been awarded the Microsoft Research PhD Fellowship, ACM SIGHPC Computational & Data Science Fellowship, and NIH AIM AHEAD Advanced Data Analytics Traineeship.

Research Interests

Multi-agent safety, how autonomous multi-agent systems could destabilize critical digital infrastructure, technology & society, global power dynamics re: tech capacity & multinational partnerships, entrepreneurship ecosystems

Project Proposal

Sanctioning in Multi-Agent Societies - Extend the SanctSim framework to test whether mediators, tool-calling for transparency, and cross-model benchmarking can prevent reasoning LLMs from collapsing into free-riding. The project studies whether AI agents can keep each other accountable in different social contexts. The goal is to find common threads between SanctSim and GovSim, seeking empirical insights for fine-tuning that correct free-riding without undermining existing cooperative behaviors.

Portrait of Yves Bicker

Mentors

Zhijing Jin & David Guzman Piedrahita

Yves Bicker

Bio

I hold a Master’s degree in Physics with a specialization in AI for Science from the University of Zurich and ETH Zurich. My research experience spans deep learning for science, reinforcement learning, and optimization. More recently, my focus has shifted toward multi-agent learning and RL-based fine-tuning of large language models, particularly in settings where interaction, coordination, and strategic behavior are central.

Research Interests

My current research focuses on multi-agent systems, with an emphasis on RL-based fine-tuning, incentive design, and robustness under non-stationarity and distribution shift. I study how cooperative dynamics can be aligned through structured fine-tuning in multi-agent LLM systems, particularly in safety-critical settings. More broadly, I am interested in principled methodology development and the theoretical foundations of learning algorithms. This perspective extends to AI for scientific discovery at the intersection of generative modeling, optimization, and reinforcement learning, particularly in the structured exploration of out-of-distribution spaces (e.g., molecular design) and in how multi-agent frameworks can accelerate this process.

Project Proposal

Multi-Agent Cooperative Post-Training: This project investigates cooperative fine-tuning in multi-agent systems. We develop multi-agent post-training algorithms and compare them against independently trained baselines in simulation environments such as GovSim and SanctSim. The goal is to evaluate (1) how joint training strategies improve cooperation and stability, and (2) which alignment mechanisms best sustain coordination under strategic interaction. The broader objective is to design principled training frameworks that promote pro-social behavior in multi-agent systems and to benchmark alignment mechanisms against state-of-the-art multi-agent approaches.