§P-005 · Research
Offline RL for Adaptive Tutoring
May–Jun 2026
Stack: PyTorch · Offline RL · LLM Systems
Status: In progress
Role: Research
This is follow-on research to the MCTS curriculum work and the AI tutor. The central question: once a tutor is in live conversation with a learner, which item should it introduce next, given that it wants to (a) cover content the learner is on the edge of mastering, (b) keep the conversation feeling natural, and (c) maximize long-term retention?
This project casts in-conversation scheduling as a sequential decision problem and trains an offline RL policy on logged tutor-learner interactions. On the learner side, a stochastic simulator with FSRS-inspired forgetting dynamics is layered on a fluency model that penalizes abrupt topic shifts.
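The learner-side simulator can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation. It uses the FSRS v4 power forgetting curve R(t, S) = (1 + t/(9S))^-1, where stability S is the number of days until recall probability falls to 0.9. The stability update rules, the constants `INITIAL_STABILITY` and `MIN_STABILITY`, and the tag-based `topic_shift_penalty` are all hypothetical stand-ins for the project's actual dynamics and fluency model.

```python
import random
from dataclasses import dataclass, field

# All constants below are illustrative assumptions, not values from the project.
INITIAL_STABILITY = 1.0  # days; stability assigned when an item is first introduced
MIN_STABILITY = 0.5      # floor applied after a lapse


def retrievability(elapsed_days: float, stability: float) -> float:
    """FSRS v4 power forgetting curve R = (1 + t / (9 S))^-1, so R = 0.9 when t = S."""
    return 1.0 / (1.0 + elapsed_days / (9.0 * stability))


def topic_shift_penalty(prev_tags: frozenset, next_tags: frozenset) -> float:
    """Hypothetical fluency penalty: Jaccard distance between topic-tag sets.

    0.0 means the next item stays on the same topic; 1.0 means an unrelated jump.
    """
    union = prev_tags | next_tags
    if not union:
        return 0.0
    return 1.0 - len(prev_tags & next_tags) / len(union)


@dataclass
class ItemMemory:
    stability: float  # days until recall probability decays to 0.9
    last_seen: float  # time of the most recent exposure, in days


@dataclass
class SimulatedLearner:
    rng: random.Random
    memory: dict[str, ItemMemory] = field(default_factory=dict)

    def recall_prob(self, item: str, now: float) -> float:
        mem = self.memory.get(item)
        if mem is None:
            return 0.0  # item never introduced
        return retrievability(now - mem.last_seen, mem.stability)

    def expose(self, item: str, now: float) -> bool:
        """Present an item; return True if the simulated learner recalled it."""
        mem = self.memory.get(item)
        if mem is None:
            # First introduction: no recall attempt, just start a memory trace.
            self.memory[item] = ItemMemory(INITIAL_STABILITY, now)
            return False
        p = retrievability(now - mem.last_seen, mem.stability)
        recalled = self.rng.random() < p  # stochastic recall outcome
        if recalled:
            # Hypothetical update: successful recalls at low p grow stability more.
            mem.stability *= 1.0 + 2.0 * (1.0 - p)
        else:
            # Hypothetical lapse: halve stability, with a floor.
            mem.stability = max(MIN_STABILITY, mem.stability * 0.5)
        mem.last_seen = now
        return recalled
```

For example, after `expose("hola", now=0.0)` introduces an item with one day of stability, `recall_prob("hola", now=1.0)` returns 0.9 (up to floating point). Replaying candidate schedules through `expose` then yields stochastic recall outcomes against which a scheduling policy can be evaluated.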
Highlights
- 01 Offline RL formulation for in-conversation vocabulary scheduling
- 02 FSRS-inspired stochastic learner simulator
- 03 Multi-objective reward: long-term retention + dialogue naturalness
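The multi-objective reward and the offline training step can be illustrated with a small tabular sketch. Everything here is an assumption made for illustration. The weights `W_RETENTION` and `W_NATURALNESS` are hypothetical, and `retention_gain` stands for some measured or simulated improvement in future recall. The algorithm is a batch-constrained form of fitted Q-iteration (in the spirit of BCQ), in which the Bellman max ranges only over actions that appear in the log for the next state. The source does not say which offline RL algorithm the project uses, and a PyTorch version would replace the table with a neural Q-function.

```python
from collections import defaultdict

# Hypothetical reward weights; the project's actual weighting is not stated.
W_RETENTION = 1.0
W_NATURALNESS = 0.3


def reward(retention_gain: float, shift_penalty: float) -> float:
    """Scalarized multi-objective reward: retention benefit minus a naturalness cost."""
    return W_RETENTION * retention_gain - W_NATURALNESS * shift_penalty


def batch_constrained_q(transitions, gamma=0.9, sweeps=50):
    """Tabular fitted Q-iteration on logged (state, action, reward, next_state, done) tuples.

    The Bellman backup maximizes only over actions logged in next_state, so values are
    never bootstrapped from (state, action) pairs that have no data support.
    """
    supported = defaultdict(set)
    for s, a, _, _, _ in transitions:
        supported[s].add(a)
    q = defaultdict(float)
    for _ in range(sweeps):
        targets = defaultdict(list)
        for s, a, r, s_next, done in transitions:
            future = 0.0
            if not done and supported[s_next]:
                future = max(q[(s_next, b)] for b in supported[s_next])
            targets[(s, a)].append(r + gamma * future)
        # Each Q-value becomes the mean backup target over its logged samples.
        q = defaultdict(float, {k: sum(v) / len(v) for k, v in targets.items()})
    return q, supported


def greedy_action(q, supported, state):
    """Pick the highest-valued logged action in a state."""
    return max(supported[state], key=lambda a: q[(state, a)])


# Toy log: (conversation state, next item, reward, next state, episode done).
EXAMPLE_LOG = [
    ("greeting", "hola", reward(0.5, 0.0), "food", False),  # stays on topic
    ("greeting", "tren", reward(0.6, 1.0), "food", False),  # abrupt topic shift
    ("food", "comer", reward(0.4, 0.0), "end", True),
]
```

With `EXAMPLE_LOG`, `greedy_action(q, supported, "greeting")` returns `"hola"`. The on-topic item scores about 0.86, against about 0.66 for the topic jump, because the naturalness cost outweighs the jump's slightly higher retention gain. Restricting the max to logged actions is the key difference from ordinary Q-learning in this sketch. When the agent learns only from logged data, bootstrapping from unseen actions can propagate arbitrarily optimistic value estimates.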