Reinforcement Learning for Pedagogy-Aligned AI

Feb 1, 2025

This research project explores how reinforcement learning can be used to align large language models (LLMs) with effective pedagogy. Inspired by recent work on moving from problem-solving AI to teaching-oriented AI systems, we investigate methods for training models that prioritize student learning over simply supplying answers.

Research Focus

Building on foundational work by Dinucu-Jianu et al. (2025) on “From Problem-Solving to Teaching Problem-Solving,” this project investigates:

RLHF for Pedagogy: Using human feedback from educators to train teaching-aligned behaviors
Scaffolding Strategies: Training models to provide appropriate hints and guidance levels
Socratic Reward Modeling: Developing reward functions that encourage questioning over answering
Educational Alignment: Ensuring AI behavior aligns with established learning theories
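
To make the Socratic reward modeling idea above concrete, here is a minimal sketch of a rule-based reward that favors questioning over answering. It is a toy heuristic for illustration only, not the reward used in this project or in Dinucu-Jianu et al. (2025). The function name `socratic_reward` and the parameters `question_bonus` and `leak_penalty` are hypothetical. A practical system would more likely rely on a learned reward model or educator feedback.

```python
import re


def socratic_reward(response: str, final_answer: str,
                    question_bonus: float = 1.0,
                    leak_penalty: float = 2.0) -> float:
    """Toy reward (assumed, illustrative): favor questions, penalize answer leaks."""
    reward = 0.0
    # Reward the tutor turn if it contains at least one question.
    if "?" in response:
        reward += question_bonus
    # Penalize the turn if the final answer appears as a standalone token,
    # so that "14" does not count as leaking the answer "4".
    pattern = r"(?<!\w)" + re.escape(final_answer) + r"(?!\w)"
    if re.search(pattern, response):
        reward -= leak_penalty
    return reward


# A guiding question earns the bonus; stating the answer incurs the penalty.
guiding = socratic_reward("What do you get if you subtract 3 from both sides?", "4")
telling = socratic_reward("The answer is x = 4.", "4")
```

Setting the penalty larger than the bonus means a turn that both asks a question and reveals the answer still scores below zero, which reflects the stated priority of student learning over answer provision.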

Key Innovation

While most AI tutors are optimized for accuracy and helpfulness, this project optimizes for pedagogical effectiveness: it trains models to improve student learning outcomes rather than merely provide correct answers.

Related Work

Dinucu-Jianu, D., Macina, J., Daheim, N., Hakimi, I., Gurevych, I., & Sachan, M. (2025). From Problem-Solving to Teaching Problem-Solving: Aligning LLMs with Pedagogy using Reinforcement Learning. arXiv:2505.15607

Last updated on Feb 16, 2026

Reinforcement Learning, LLM Alignment, Pedagogy, AI Tutoring, Machine Learning

Authors

Quan Nguyen

Assistant Professor of Computing Science

I am a tenure-track Assistant Professor in the Computer Science department at Thompson Rivers University. My research centers on the impact of generative AI on learning behavior and outcomes in computer science education. Before joining TRU, I was a Postdoctoral Fellow at the UBC Master of Data Science, where I developed and taught a variety of data science courses, including those on statistical inference, machine learning, and technical communication. In addition to teaching, I coordinated the capstone program, facilitating student collaborations with industry partners on real-world data science projects.

Prior to UBC, I worked as a Postdoctoral Fellow in Learning Analytics at the School of Information, University of Michigan. There, my research focused on analyzing students' social interactions and peer effects using large-scale spatio-temporal data. My work has been recognized with competitive grants and multiple best paper awards at prominent conferences, including LAK18 and HCI International 17.

I hold a PhD in Learning Analytics from The Open University, UK, and a BSc and MSc in Economics from Maastricht University, Netherlands.
