I was looking for a math tutor. Unfortunately, auto manufacturers are no longer willing to provide this service:
In lieu of buying a car, I built a math tutor MCP server.
This post describes MathTutor, but the main takeaway should be the underexplored design area it lives in: symbiosis between the model and the software. MathTutor is called only from inside an LLM-based agent; the LLM generates most application state on demand while MathTutor keeps it on task over a long time horizon.
Curriculum as a Tool
Before building MathTutor, I tried the easier alternative of asking an LLM to teach me. Results were similar across frontier Claude, GPT, and Gemini models. They can teach an individual lesson well, but without guidance the model fails to pull many lessons together into a cohesive curriculum. I needed a way to manage a full course.
Thus, MathTutor (MT from now on). It splits responsibilities: the LLM teaches, the software schedules the curriculum. The curriculum is written ahead of time as a human + LLM collaboration, so each tutoring session operates over an existing curriculum rather than making one up as it goes. While tutoring, the model attends to one lesson at a time.
A tutoring session begins when I start a new thread and type "Tutor Me." The LLM calls MT to fetch the next scheduled lesson or quiz topic. Then the LLM generates content for the topic, saves it to MT, and presents it. We have a back-and-forth where it teaches and I answer questions. Finally, the LLM calls MT again to persist my quiz answers before the next fetch.
MT effectively works as an iterator over the curriculum, waiting for me to complete an item before it moves on. If I reach the end of the curriculum, MT returns done and class is over.
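The iterator behavior can be sketched roughly as follows. This is a minimal in-memory illustration, not MT's actual code: the names `Item`, `Curriculum`, `fetch_next`, `save_content`, and `complete_current` are hypothetical, and a real server would presumably expose these operations as MCP tools rather than plain methods.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    topic: str
    kind: str                             # "lesson" or "quiz"
    content: Optional[str] = None         # LLM-generated text, saved back by the agent
    answers: Optional[list] = None        # learner's quiz answers, if any


@dataclass
class Curriculum:
    """Hypothetical iterator over a pre-written list of curriculum items."""
    items: list
    position: int = 0

    def fetch_next(self) -> dict:
        # Fetching never advances; the same item comes back until it is completed.
        if self.position >= len(self.items):
            return {"status": "done"}
        item = self.items[self.position]
        return {"status": "ok", "topic": item.topic, "kind": item.kind}

    def save_content(self, content: str) -> None:
        # Store what the LLM generated for the current item.
        self.items[self.position].content = content

    def complete_current(self, answers: Optional[list] = None) -> None:
        # Persist any quiz answers, then move on to the next item.
        if self.position >= len(self.items):
            raise IndexError("curriculum already finished")
        self.items[self.position].answers = answers
        self.position += 1
```

In this sketch the agent would call `fetch_next`, generate and save content, teach, and then call `complete_current` before fetching again. Keeping the advance step separate from the fetch is what lets the software, rather than the model, decide when an item is finished.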
Design
MT's design leans into LLM strengths (mathematical knowledge and conversation) while sidestepping its shortcomings in scheduling and long-term memory. The teaching bits improve as models improve, and the deterministic bits never degrade. I can start a tutoring session in a new thread and pick up at the exact point where I left off on a separate device, with a different LLM.
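Resuming from any thread, device, or model works because progress lives on the server rather than in the conversation. A rough sketch of that idea, assuming a single JSON state file; the file layout and the function names `load_position` and `save_position` are illustrative assumptions, not MT's real storage.

```python
import json
from pathlib import Path


def load_position(path: Path) -> int:
    """Read the saved curriculum position; start at 0 if no state exists yet."""
    if not path.exists():
        return 0
    return json.loads(path.read_text())["position"]


def save_position(path: Path, position: int) -> None:
    """Write to a temporary file, then rename it over the real one,
    so a crash mid-write is unlikely to leave a truncated state file."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps({"position": position}))
    tmp.replace(path)
```

Because nothing here depends on chat history, a fresh thread with a different LLM reads the same position and continues from it.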
Every MT feature is model-discoverable and model-controlled, so there is no mismatch between app and agent capability. This solves a huge problem for agents. A chat UI provides no info about MCP feature coverage, and if an MCP server exposes the wrong subset of features, requests fail in a most infuriating fashion. The only real solution is feature parity. If the whole app embeds in the agent, parity is solved.
Hot Take: 90% of the "write CLIs, not MCPs" discourse follows from CLIs being structurally discoverable by the agent and MCPs being slapped-on APIs with incomplete coverage.
A tool also avoids introducing new user friction or conflicts of interest. LLM and agent choice is up to the user; they have no new GUI to learn or app to install. MT has no token opex and thus no incentive to degrade model quality or cap conversation length.
Fundamentally, the core design mechanic works. The LLM + tool combo is a great teacher. I can argue with my textbook. I can ask it for alternate explanations, harder quizzes, or more examples. Spaced repetition means I master the material instead of reading and forgetting. I can see this design pattern spreading for education apps, scripted games, and creative tools.
The Future is in Beta
The current user experience suffers from agent quality issues. Claude for iOS cannot add custom MCP servers. The Claude for Mac MCP client fails to follow tool-call schemas correctly. Antigravity fails to connect to custom MCP servers; when I try to troubleshoot, Google's LLM links me to third-party blog posts instead of Google developer docs, which to be fair are also incorrect. Uptime hovers between one and two nines. It's all a bit sloppy and immature.
Before adoption can happen in force, agents will need to fix their warts. But in the long term, I foresee a boom in MT-style agent tools and libraries.