Monthly updated dataset of research-level mathematics MCQs derived from recent arXiv publications.
Note: a single question may belong to multiple categories at the same time.
Browse MCQs from the benchmark. Each question is derived from a real arXiv paper theorem.
Understanding the design and methodology behind LiveMathematicianBench.
LiveMathematicianBench is a live, continuously updated benchmark that evaluates LLMs on their ability to understand and reason about cutting-edge mathematical theorems from newly published arXiv preprints.
New papers appear on arXiv every month. We extract theorems from these papers and generate multiple-choice questions that test deep mathematical understanding, ensuring that models cannot rely on memorized training data.
Questions and choices are constructed from theorem statements and proof sketches extracted from arXiv papers. Each question has five carefully crafted choices: one correct, one weaker but true, and three false. The model receives only the question and choices as input; the original theorem and proof sketch are not provided.
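As a minimal sketch of how a benchmark item of this shape could be represented and scored, assuming a hypothetical schema (the field names below are illustrative, not the benchmark's actual format):

```python
from dataclasses import dataclass

@dataclass
class MCQItem:
    # Hypothetical schema; LiveMathematicianBench's actual fields may differ.
    question: str
    choices: list[str]   # exactly five: 1 correct, 1 weaker-but-true, 3 false
    correct_index: int   # index of the fully correct choice

    def prompt(self) -> str:
        # Only the question and choices are shown to the model;
        # the source theorem and proof sketch are withheld.
        labels = "ABCDE"
        lines = [self.question] + [
            f"{labels[i]}. {c}" for i, c in enumerate(self.choices)
        ]
        return "\n".join(lines)

def accuracy(items: list[MCQItem], predictions: list[int]) -> float:
    """Fraction of items where the predicted index is the correct choice."""
    hits = sum(1 for item, p in zip(items, predictions) if p == item.correct_index)
    return hits / len(items)

item = MCQItem(
    question="Which statement matches the theorem?",
    choices=["correct", "weaker but true", "false 1", "false 2", "false 3"],
    correct_index=0,
)
print(item.prompt())
print(accuracy([item], [0]))  # → 1.0
```

Note that under this scheme the weaker-but-true distractor is scored as incorrect: only the strongest true statement counts, which is what makes the choice design test depth of understanding rather than mere plausibility.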