Reinforcement Learning for Pedagogy-Aligned AI
This research project explores how reinforcement learning can be used to align large language models (LLMs) with effective teaching pedagogies. Inspired by recent work on transitioning from problem-solving AI to teaching-oriented AI systems, we investigate methods for training models that prioritize student learning over simply supplying answers.
Research Focus
Building on foundational work by Dinucu-Jianu et al. (2025) on “From Problem-Solving to Teaching Problem-Solving,” this project investigates:
- RLHF for Pedagogy: Using human feedback from educators to train teaching-aligned behaviors
- Scaffolding Strategies: Training models to provide appropriate hints and guidance levels
- Socratic Reward Modeling: Developing reward functions that encourage questioning over answering
- Educational Alignment: Ensuring AI behavior aligns with established learning theories
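To make the Socratic reward idea concrete, here is a minimal sketch of a rule-based reward for a single tutor turn. It is a toy heuristic for illustration only, not the reward used by Dinucu-Jianu et al. (2025). The function name `socratic_reward`, its parameters, and the weights are all hypothetical. It assumes the reference answer is available as a string and treats a question mark as a crude proxy for a guiding question.

```python
import re


def socratic_reward(tutor_turn: str, final_answer: str,
                    question_bonus: float = 1.0,
                    leak_penalty: float = 2.0) -> float:
    """Toy reward: favor turns that ask a question, penalize answer leaks.

    All names and weights are illustrative assumptions, not values
    taken from the cited paper.
    """
    # Crude proxy for Socratic behavior: the turn contains a question.
    asks_question = "?" in tutor_turn

    # Detect whether the reference answer appears verbatim in the turn.
    # The lookarounds require token boundaries, so the answer "4" does
    # not match inside "14".
    answer = final_answer.strip()
    if answer:
        pattern = r"(?<!\w)" + re.escape(answer) + r"(?!\w)"
        leaks_answer = re.search(pattern, tutor_turn, re.IGNORECASE) is not None
    else:
        leaks_answer = False

    reward = 0.0
    if asks_question:
        reward += question_bonus
    if leaks_answer:
        reward -= leak_penalty
    return reward
```

With the default weights, a guiding question such as "What happens if you subtract 3 from both sides?" scores 1.0, while "The answer is x = 4." scores -2.0 when the reference answer is "x = 4". A reward learned from educator feedback would replace these hand-written rules in practice. The sketch only shows the shape of the signal: questioning is rewarded and answer-giving is penalized.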
Key Innovation
While most AI tutors are optimized for accuracy and helpfulness, this project optimizes for pedagogical effectiveness: training models that improve student learning outcomes rather than merely providing correct answers.
Related Work
Dinucu-Jianu, D., Macina, J., Daheim, N., Hakimi, I., Gurevych, I., & Sachan, M. (2025). From Problem-Solving to Teaching Problem-Solving: Aligning LLMs with Pedagogy using Reinforcement Learning. arXiv:2505.15607

I am a tenure-track Assistant Professor in the Computer Science department at Thompson Rivers University. My research centers on the impact of generative AI on learning behavior and outcomes in computer science education. Before joining TRU, I was a Postdoctoral Fellow at the UBC Master of Data Science program, where I developed and taught a variety of data science courses, including those on statistical inference, machine learning, and technical communication. In addition to teaching, I coordinated the capstone program, facilitating student collaborations with industry partners on real-world data science projects.
Prior to UBC, I worked as a Postdoctoral Fellow in Learning Analytics at the School of Information, University of Michigan. My research focuses on analyzing students' social interactions and peer effects using large-scale spatio-temporal data. My work has been recognized with competitive grants and with multiple best paper awards at prominent conferences, including LAK18 and HCI International 17.
I hold a PhD in Learning Analytics from The Open University, UK, and a BSc and MSc in Economics from Maastricht University, Netherlands.