ACM SIGMETRICS 2026
Ann Arbor, Michigan, USA
June 8-12, 2026
Vishal Misra
Columbia University
Small Models of Large Models: What Transformers Compute, and Why Modeling Found It First
Abstract: Three questions are loose in our community right now. Are the methods we built our careers on still useful in the age of scale? Is there anything left to say about a system whose behavior is studied by benchmark? Does modeling, in the SIGMETRICS sense, still have purchase on the most important computational artifact of our time? This talk argues that the answer to all three is the same.
Transformers are not approximating Bayesian inference. They are implementing it, and the implementation has a geometry we can characterize, predict, and stress-test. I will take you through three results that together establish this. In small wind tunnels where the Bayes posterior is analytically computable, transformers recover it to machine precision while capacity-matched MLPs fail by orders of magnitude. The mechanism is not exotic. Cross-entropy gradients decompose in a way that forces the structure, through routing and value specialization. And the same geometric signatures survive the jump to production models, modulated in interpretable ways by architecture, data, and depth.
What this picture says is that the boundary between what scale can and cannot do is not mysterious. It is analyzable. Transformers compile position-local inference circuits when statistics are stationary, and they fail to construct reusable programs beyond the training horizon. The community that built queueing theory, identifiability analysis, and controlled experimentation is the community whose tools this emerging science is missing. I will close by making the case for why.
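To make the "wind tunnel" idea concrete, here is a minimal sketch (illustrative only, not the speaker's actual setup or code): a Beta-Bernoulli sequence task in which the Bayes posterior predictive is available in closed form, so any sequence model's next-token probabilities can be scored against it directly, for example by average KL divergence over held-out prefixes. The prior parameters, function names, and the smoothed-frequency baseline below are assumptions made for the example.

    # Sketch of one analytically solvable "wind tunnel" (assumed setup, Python).
    # Tokens are coin flips whose bias is drawn from a Beta(a, b) prior, so the
    # Bayes-optimal next-token probability after h heads in n flips is
    # (a + h) / (a + b + n). A model's predictions can be scored against this.
    import numpy as np

    def bayes_predictive(prefix: np.ndarray, a: float = 1.0, b: float = 1.0) -> float:
        """Posterior predictive P(next flip = 1 | prefix) under a Beta(a, b) prior."""
        h, n = prefix.sum(), len(prefix)
        return (a + h) / (a + b + n)

    def mean_kl_to_bayes(model_prob, prefixes, a: float = 1.0, b: float = 1.0) -> float:
        """Average KL(Bayes || model) over prefixes; model_prob maps a prefix to
        the model's predicted P(next flip = 1). Lower means closer to Bayes."""
        kls = []
        for prefix in prefixes:
            p = bayes_predictive(prefix, a, b)                  # analytic target
            q = float(np.clip(model_prob(prefix), 1e-9, 1 - 1e-9))  # model prediction
            kls.append(p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q)))
        return float(np.mean(kls))

    # Example: score a smoothed empirical-frequency baseline against the exact rule.
    rng = np.random.default_rng(0)
    prefixes = [rng.integers(0, 2, size=n) for n in (2, 4, 8, 16)]
    baseline = lambda s: (s.sum() + 0.5) / (len(s) + 1.0)
    print(mean_kl_to_bayes(baseline, prefixes))

In this kind of setting the abstract's claim is a measurable statement: a transformer's predictive distribution drives such a divergence to (near) machine precision, while capacity-matched MLP baselines do not.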
Bio: Vishal Misra is the RKS Family Professor of Computer Science and the Vice Dean for Computing and AI in the School of Engineering at Columbia University. He is an ACM and IEEE Fellow, and his research emphasizes mathematical modeling of systems, bridging the gap between practice and analysis. As a graduate student, he co-founded CricInfo, which was acquired by ESPN in 2007. In 2021 he developed one of the world's first commercial applications built on top of GPT-3 for ESPNCricinfo, and has since been modeling the behavior of LLMs. He also played an active part in the Net Neutrality regulation process in India, where his definition of Net Neutrality was adopted both by the citizens' movement and by the regulators. He has been awarded a Distinguished Alumnus Award by IIT Bombay (2019) and a Distinguished Young Alumnus Award by the UMass Amherst College of Engineering (2014).
Steve Teig
Amazon
Reasoning about reasoning: towards principled AI
Abstract: Mainstream “AI” systems are astonishing. They are also unreliable and untrustworthy because the technologies that power them are based largely on empiricism and anecdote, with only limited theory. Attempts to develop more principled methods benefit from explicitly questioning the vast folklore: the many unacknowledged assumptions that pervade today's model architectures and training. In that spirit, we will carefully examine several mainstream techniques to identify missing or erroneous reasoning within them. By suggesting scientifically motivated improvements, we aim to take steps towards removing the quotes around “AI”.
Bio: Steve Teig is a Vice President and Distinguished Engineer at Amazon, where he serves as the chief technologist for Edge AI: both the software and the silicon for on-device intelligence. His current research spans efficient inference, knowledge distillation, mixture-of-experts, and linear attention, work that sits at the seam between model architecture and the hardware that runs it. He has founded six companies across software, biotech, and silicon, and was the first to apply machine learning to drug discovery, opening a field now central to modern pharmaceutical research. He holds nearly 500 patents, roughly 150 of them in machine learning, and is a recipient of both an Edison Award and a World Technology Award. He has been thinking about artificial intelligence almost every day since 1978.
Adam Tauman Kalai
OpenAI
Evaluating large language models for accuracy incentivizes hallucinations
Abstract: Large language models sometimes produce confident, plausible falsehoods (“hallucinations”), limiting their reliability. Prior work has offered numerous explanations and effective mitigations, such as retrieval and tool use, consistency-based self-verification, and reinforcement learning from human feedback. Nonetheless, the problem persists even in state-of-the-art language models. Here we show how next-word prediction and accuracy-based evaluations inadvertently reward unwarranted guessing. Initially, next-word pretraining creates statistical pressure toward hallucination even with idealized error-free data: using learning theory, we show that facts lacking repeated support in training data, such as one-off details, yield unavoidable errors, while recurring regularities, such as grammar, do not. Subsequent training stages aim to correct such errors. However, dominant headline metrics like accuracy systematically reward guessing over admitting uncertainty. To align incentives, we suggest two additions to the classic approach of attaching error penalties to evaluations to control abstention. First, we propose “open-rubric” evaluations that explicitly state how errors are penalized, if at all, which test whether a model modulates its abstentions according to the stated stakes while optimizing accuracy. Second, since hallucination-specific benchmarks rarely make leaderboards, we suggest using open-rubric variants of existing evaluations to reverse their guessing incentives. Reframing hallucination as an incentive problem opens a practical path toward more reliable language models.
Joint work with: Santosh Vempala, Ofir Nachum, and Edwin Zhang.
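The incentive argument can be illustrated with a few lines of arithmetic (an illustrative sketch only; the penalty scheme and numbers below are assumptions for exposition, not the paper's evaluation design). Under accuracy-only grading, a guess with any confidence p > 0 earns p in expectation versus 0 for abstaining, so guessing always wins; a rubric that states an explicit penalty for wrong answers makes guessing worthwhile only above a confidence threshold.

    # Sketch (assumed scoring scheme, Python): why accuracy-only grading rewards
    # guessing. An answer scores +1 if correct, -penalty if wrong; abstaining
    # ("I don't know") scores 0. With penalty = 0 (plain accuracy), guessing is
    # always favored; with a stated penalty, guessing pays only when the model's
    # confidence p exceeds penalty / (1 + penalty).
    def expected_score(p_correct: float, penalty: float) -> float:
        """Expected score of answering under the stated rubric."""
        return p_correct * 1.0 - (1.0 - p_correct) * penalty

    for p in (0.3, 0.6, 0.9):
        for penalty in (0.0, 1.0, 3.0):  # penalty 0.0 reproduces plain accuracy
            guess = expected_score(p, penalty)
            choice = "guess" if guess > 0 else "abstain"
            print(f"p={p:.1f} penalty={penalty:.1f}: guess={guess:+.2f}, abstain=+0.00 -> {choice}")

An evaluation that states its rubric up front, as the abstract proposes, lets a well-calibrated model choose between answering and abstaining according to the stakes rather than defaulting to a guess.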
Bio: Adam Tauman Kalai is a Research Scientist at OpenAI whose work spans AI safety and ethics, algorithms, fairness, AI theory, game theory, and crowdsourcing. He earned his BA from Harvard University and his PhD from Carnegie Mellon University, and has held research positions across academia and industry, including at MIT, TTIC, Georgia Tech, and Microsoft Research New England. His work has received numerous honors, including the Majulook Prize.
Additional keynote speakers may be announced as the program is finalized.