§P-005 · Research

Offline RL for Adaptive Tutoring

May–Jun 2026

Stack: PyTorch · Offline RL · LLM Systems

Status: In progress

Role: Research

Follow-on research to the MCTS curriculum work and the AI tutor. The question: once a tutor is in a live conversation with a learner, which item should it introduce next? The tutor wants to (a) cover content the learner is on the verge of mastering, (b) keep the conversation feeling natural, and (c) maximize long-term retention.
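One way to state this question formally is shown below. This is a hedged sketch, not the project's exact formulation. Each tutor turn $t$ is a decision step. The state $s_t$ is assumed to hold the learner's memory state plus the dialogue context. The action $a_t$ is drawn from a candidate item set $\mathcal{I}$, with discount $\gamma$ and horizon $T$. The weights $w_{\text{ret}}$ and $w_{\text{nat}}$ are hypothetical. The term $r^{\text{nat}}_t$ is taken to be negative for abrupt topic shifts. Objective (a) has no explicit term here; it could enter through the state or the retention signal.

```latex
\pi^{\star} = \arg\max_{\pi}\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{T} \gamma^{t} r_t\right],
\qquad a_t = \pi(s_t) \in \mathcal{I},
\qquad r_t = w_{\text{ret}}\, r^{\text{ret}}_t + w_{\text{nat}}\, r^{\text{nat}}_t
```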

This project frames the in-conversation scheduling problem as a sequential decision problem and trains an offline RL policy on logged tutor-learner interactions. The learner-side model is a stochastic simulator with FSRS-inspired forgetting dynamics, layered on a fluency model that penalizes abrupt topic shifts.
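The sketch below shows what an FSRS-inspired stochastic learner could look like. From FSRS it borrows only the FSRS-4.5 power forgetting curve, R(t, S) = (1 + 19/81 · t/S)^(-0.5). Everything else is a hypothetical simplification rather than the project's simulator: the `LearnerSim` class, its `init_stability`, `growth`, and `lapse` parameters, and the stability update. The fluency model is omitted.

```python
import random
from dataclasses import dataclass, field
from typing import Optional

# FSRS-4.5-style power forgetting curve. DECAY and FACTOR follow the published
# FSRS-4.5 curve; everything else in this sketch is a hypothetical simplification.
DECAY = -0.5
FACTOR = 19 / 81  # makes retrievability exactly 0.9 when elapsed time == stability


def retrievability(elapsed_days: float, stability: float) -> float:
    """Probability of recall after `elapsed_days` for a trace with the given stability."""
    return (1.0 + FACTOR * elapsed_days / stability) ** DECAY


@dataclass
class ItemMemory:
    stability: float    # days until recall probability decays to 0.9
    last_review: float  # time of the most recent exposure, in days


@dataclass
class LearnerSim:
    """Hypothetical stochastic learner with one memory trace per vocabulary item."""
    init_stability: float = 1.0  # assumed stability right after an item is introduced
    growth: float = 2.0          # assumed gain scale on a successful recall
    lapse: float = 0.5           # assumed stability shrink factor on a failed recall
    rng: random.Random = field(default_factory=random.Random)
    items: dict[str, ItemMemory] = field(default_factory=dict)

    def recall_prob(self, item: str, now: float) -> float:
        mem = self.items.get(item)
        if mem is None:
            return 0.0  # item has never been introduced
        return retrievability(now - mem.last_review, mem.stability)

    def expose(self, item: str, now: float) -> Optional[bool]:
        """Show `item` at time `now`; None on introduction, else whether it was recalled."""
        mem = self.items.get(item)
        if mem is None:
            self.items[item] = ItemMemory(self.init_stability, now)
            return None
        p = retrievability(now - mem.last_review, mem.stability)
        recalled = self.rng.random() < p
        if recalled:
            # Spacing effect (assumed form): recalls at lower p grow stability more.
            mem.stability *= 1.0 + self.growth * (1.0 - p)
        else:
            mem.stability = max(0.1, mem.stability * self.lapse)
        mem.last_review = now
        return recalled
```

Seeding `rng` (e.g. `LearnerSim(rng=random.Random(0))`) makes rollouts reproducible. `recall_prob` at a fixed delay after the conversation is one plausible retention signal to reward a scheduler on.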

Highlights

01 Offline RL formulation for in-conversation vocabulary scheduling
02 FSRS-inspired stochastic learner simulator
03 Multi-objective reward: long-term retention + dialogue naturalness
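Highlights 01 and 03 can be illustrated together with tabular fitted Q-iteration over a fixed log of transitions. The reward is scalarized: it trades retention gain against a topic-shift penalty. This is a minimal sketch with several assumptions:

- The `Transition` fields, the weights `w_ret`/`w_nat`, and the discrete states are hypothetical.
- A PyTorch version would presumably replace the table with a neural Q-function.
- The backup bootstraps only from actions that actually appear in the log at the next state. This is a simple guard against extrapolation error from unseen actions, a central failure mode of offline RL.

```python
from collections import defaultdict
from collections.abc import Hashable
from typing import NamedTuple, Optional


class Transition(NamedTuple):
    state: Hashable        # hypothetical discretized learner + dialogue state
    action: str            # vocabulary item the tutor introduced
    retention_gain: float  # e.g. change in simulated recall probability
    topic_shift: float     # hypothetical fluency score: 0 = on topic, 1 = abrupt shift
    next_state: Hashable
    done: bool


def reward(t: Transition, w_ret: float = 1.0, w_nat: float = 0.5) -> float:
    """Scalarized multi-objective reward; the weights are assumed, not tuned."""
    return w_ret * t.retention_gain - w_nat * t.topic_shift


def fitted_q_iteration(log: list[Transition], gamma: float = 0.9, iters: int = 100) -> dict:
    """Tabular batch Q-iteration over a fixed log: no new interaction with a learner."""
    seen = defaultdict(set)  # state -> actions the logging policy actually took there
    for t in log:
        seen[t.state].add(t.action)
    q: dict = {}
    for _ in range(iters):
        sums, counts = defaultdict(float), defaultdict(int)
        for t in log:
            target = reward(t)
            next_actions = seen.get(t.next_state)
            if not t.done and next_actions:
                # Batch-constrained backup: bootstrap only from in-distribution actions.
                target += gamma * max(q.get((t.next_state, a), 0.0) for a in next_actions)
            sums[(t.state, t.action)] += target
            counts[(t.state, t.action)] += 1
        q = {k: sums[k] / counts[k] for k in sums}
    return q


def greedy_policy(q: dict, state: Hashable) -> Optional[str]:
    """Pick the logged action with the highest Q-value in `state`, or None if unseen."""
    candidates = {a: v for (s, a), v in q.items() if s == state}
    return max(candidates, key=candidates.get) if candidates else None
```

As a worked case, consider a two-step log with two items that lead to the same follow-up:

- an on-topic item: retention gain 0.3, no topic shift;
- an off-topic item: retention gain 0.4, full topic shift.

The greedy policy prefers the on-topic item. With the default weights, the 0.5 naturalness penalty outweighs the off-topic item's 0.1 retention edge.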
