Stevens Launches Laboratory for Artificial Intelligence in Mathematics Education to Drive Breakthroughs in AI and Math Education

The lab has already created the largest university-level mathematics benchmark of its kind

Artificial intelligence (AI) is transforming learning, including in higher education, but university-level mathematics reveals the limits of large language models (LLMs). Stevens Institute of Technology aims to advance AI in mathematics through its newly established Laboratory for Artificial Intelligence in Mathematics Education, setting new standards for higher education.

Alexei Miasnikov, director of the Laboratory for Artificial Intelligence in Mathematics Education, professor in the Department of Mathematical Sciences, and co-founder and scientific advisor at Gradarius, explains that LLMs must advance beyond computation to support logical deduction, theoretical understanding and advanced problem-solving, skills that are essential for decision-making and scientific progress.

"While LLMs have shown impressive reasoning abilities, their capacity for mathematical reasoning remains a critical area of focus," he said.

Located on Stevens’ Hoboken, New Jersey, campus, the lab aims to enhance LLMs’ mathematical reasoning and their applications in education, bridging the gap between human expertise and AI’s potential in math education.

"The lab’s work will help shape the future of AI applications in higher education and beyond by refining tools, exploring AI integration in classrooms, and investigating how AI can solve mathematical problems,” said Jan Cannizzo, associate chair for Undergraduate Studies.

"The work of the lab seeks to fundamentally transform mathematics education by developing AI-driven tools that adapt to individual learning styles, provide personalized instruction and offer new insights into mathematical reasoning,” said Michael Zabarankin, professor and chair of the Department of Mathematical Sciences.

As the lab continues its work, it aims to secure large-scale research grants and leverage industry partnerships for breakthroughs in thoughtful, responsible AI and math education.

Elevating the standards of AI in mathematics education

One recent achievement of the lab is the development of U-MATH, the largest university-level mathematics benchmark. Results were published in U-MATH: A University-Level Benchmark for Evaluating Mathematical Skills in LLMs. Miasnikov adds: "This benchmark underscores nuances in academic mathematics education. Its multimodal approach and inclusion of a meta-benchmark highlight that even LLMs evaluating LLMs remain an unresolved challenge. We’ve all lived through the initial wave of AI excitement and are now beginning to understand its true nature — not as a magic pill, but as a powerful tool with specific strengths and limitations."

Alexei Miasnikov will serve as director of the new Laboratory for Artificial Intelligence in Mathematics Education.

U-MATH is a crucial step in using quantifiable metrics to assess how LLMs can be effectively applied in education. It includes more than 1,000 problems validated by Department of Mathematical Sciences faculty, including Cannizzo; Andrey Nikolaev, teaching associate professor; and Paul Schwartz, lecturer. Chloe Weiers, an algebraic cryptography Ph.D. candidate, also contributed.

The benchmark reveals that even advanced models such as GPT-4o struggle with more than 50% of university-level problems, underscoring the complexity of mathematics education. Other models, such as Gemini 1.5 Pro and Qwen2.5, have shown promise in areas like visual reasoning.

Addressing the limitations of LLMs goes beyond data, explains Vlad Stepanov, CEO of Gradarius. "We can no longer live with a feeling that LLMs can't do this and that, but give them several more petabytes of data and everything will be solved. We're approaching the limits of what's achievable with LLMs and to use them correctly, we need to understand the limitations pretty well."

"We are confident this comprehensive benchmark will elevate the standards of LLM performance evaluation," said Olga Megorskaya, CEO at Toloka AI, a company that provides data for LLMs. "This benchmarking approach can be applied to any niche topic, ensuring high-quality performance and accountability among LLM developers and fostering trust and integrity in AI technologies."

Launching the lab

The lab, which will leverage advanced computing resources in collaboration with industry partners Gradarius and Nebius AI, will be officially launched on Feb. 19, 2025, at an event that will include a public lecture and demonstrations.

"Nebius AI and Gradarius are thrilled to launch this ambitious endeavor with our long-time partner, Stevens Institute of Technology," said Stepanov. "Partnering with Stevens ensures educational integrity and remains a priority."
