MoReBench: evaluating procedural and pluralistic moral reasoning in language models, more than outcomes.
Proceedings of the International Conference on Learning Representations 2026. [authors: Yu Ying Chiu, Michael S. Lee, Rachel Calcott, Brandon Handoko, Paul de Font-Reaulx, Paula Rodriguez, Chen Bo Calvin Zhang, Ziwen Han, Udari Madhushani Sehwag, Yash Maurya, Christina Q. Knight, Harry R. Lloyd, Florence Bacus, Mantas Mazeika, Bing Liu, Yejin Choi, Mitchell L. Gordon, Sydney Levine | arXiv | website | ScaleAI blog post | ScaleAI YouTube interview]
2025:
Disagreement, AI alignment, and bargaining.
Philosophical Studies, 182.7, 1757-87. [pre-publication pdf | doi]
Synopsis: When stakeholders disagree about how an AI should behave, resolving this disagreement through simulated bargaining is better than resolving it through voting or expected value maximisation.
2023:
Large language models and biorisk. American Journal of Bioethics, 23.10, 115-8. [coauthors: William D'Alessandro and Nathaniel Sharadin | pre-publication pdf | doi]