Review Arcade: On the Human Alignment and Gameability of LLM Reviews

Original Source

ArXiv AI (cs.AI)

by Hans Ole Hatzel, Sebastian Steindl, Jan Strich


arXiv:2605.28897v1 (Announce Type: new)

Abstract: LLM-generated reviews for scientific papers are gaining considerable traction and are even being officially piloted by major conferences. We have to assume not only that reviewers use LLM assistance, but also that authors use LLMs to revise their papers before submission. In this work, we perform empirical experiments on papers from the 2025 ACL Rolling Review (ARR) to evaluate LLM reviews from both the author and the reviewer perspective. First, we find that LLM reviews align with human reviews only to a limited degree: in the best-case scenario the alignment is reasonable, but it varies substantially across prompts and models. Finally, we investigate the scenario in which the author uses an iterative draft-revise workflow to improve the submission according to the LLM review. We find that this "gaming" of LLM reviews can be effective in specific scenarios, leading to a statistically significant increase in overall scores for up to 35% of papers. We publish our code: https://github.com/uhh-hcds/reviewarcade.
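The abstract does not spell out the gaming loop or the significance test, so what follows is only a minimal Python sketch under stated assumptions. The callables passed as `review` and `revise` are hypothetical stand-ins for LLM calls, the number of revision rounds is arbitrary, and the one-sided permutation test is an illustrative choice, not necessarily the test the authors used. It shows the general shape of an iterative draft-revise workflow and one way to count the share of papers whose sampled review scores rise significantly after revision. The authors' actual implementation is in the linked repository.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-ins for LLM calls (not the authors' API):
# a reviewer maps paper text to review text, and a reviser rewrites
# the paper in response to that review.
Reviewer = Callable[[str], str]
Reviser = Callable[[str, str], str]


def draft_revise(paper: str, review: Reviewer, revise: Reviser, rounds: int = 3) -> str:
    """Iteratively revise a paper according to (hypothetical) LLM review feedback."""
    draft = paper
    for _ in range(rounds):
        feedback = review(draft)         # ask the reviewer model for a review
        draft = revise(draft, feedback)  # rewrite the draft to address that review
    return draft


def permutation_pvalue(before: Sequence[float], after: Sequence[float],
                       n_perm: int = 10_000, seed: int = 0) -> float:
    """One-sided permutation test of H1: mean(after) > mean(before)."""
    rng = random.Random(seed)
    observed = mean(after) - mean(before)
    pooled = list(before) + list(after)
    n_after = len(after)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign scores to the two groups
        diff = mean(pooled[:n_after]) - mean(pooled[n_after:])
        if diff >= observed:
            hits += 1
    # Add-one smoothing keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


def fraction_improved(score_pairs: List[Tuple[Sequence[float], Sequence[float]]],
                      alpha: float = 0.05) -> float:
    """Share of papers whose sampled scores rise significantly after revision."""
    significant = sum(
        1 for before, after in score_pairs
        if permutation_pvalue(before, after) < alpha
    )
    return significant / len(score_pairs)


if __name__ == "__main__":
    # Toy callables: the "reviewer" always asks for more, the "reviser" appends.
    final = draft_revise("draft", lambda d: "expand", lambda d, r: d + "+", rounds=3)
    print(final)  # draft+++
    pairs = [([2, 2, 2, 2, 2], [4, 4, 4, 4, 4]), ([3, 3, 3], [3, 3, 3])]
    print(fraction_improved(pairs))  # 0.5
```

Sampling several reviews per paper before and after revision would yield the two score lists that `permutation_pvalue` compares. A permutation test is used here because it does not assume normally distributed scores, which suits small, discrete review scales.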

Tags: LLM, AI

Original Content Credit

This summary is sourced from ArXiv AI (cs.AI). For the complete article with full details, research data, and author insights, please visit the original source.