The Unjournal

The Unjournal Evaluator Prize 2024–25

Celebrating outstanding public peer review: rigorous, constructive, and decision-relevant evaluation of research that matters.

About The Unjournal

The Unjournal is a nonprofit that commissions expert evaluation of quantitative research informing global priorities, in areas including global health, development economics, animal welfare, institutional reform, climate policy, the impact of technology on society, and catastrophic risk. Our evaluators provide detailed feedback, quantitative ratings across multiple dimensions, and considered assessments of the strengths and limitations of each research object. All evaluations, ratings, author responses, and manager syntheses are publicly shared at unjournal.pubpub.org, and in scholarly databases.

For context, see our mission and approach in a nutshell, our guidelines for evaluators, our current evaluation form, and our draft evaluation interface (in development).


Why an Evaluator Prize?

High-quality peer review is a public good, but it is chronically undervalued. Thoughtful, critical, and constructive research evaluations take real expertise and time, yet they are usually unpaid, invisible, and disconnected from real-world decision-making.

The Unjournal Evaluator Prize aims to change that, and to reinforce The Unjournal’s high standards. We recognize and reward evaluators whose work exemplifies these goals.

The strongest evaluations engage deeply with the research’s methods and assumptions, and with their relevance and applicability to real-world contexts and decision-relevant questions. They are reasoning-transparent and fair-minded. They offer specific and actionable suggestions. They communicate the research and its strengths, limitations, and implications clearly, providing value to future researchers, funders, and policymakers.

These prizes are part of The Unjournal’s broader effort to make rigorous, public evaluation valued, visible, and rewarded, and to connect research evaluation to real-world decision-making.

This year, we are allocating $6,500 in evaluator prizes across four prize tiers, selected through a process of evaluation manager ratings, team voting, and advisory board input.

  • $6,500 total prize pool
  • 88+ published evaluations considered
  • 8 evaluators recognized
Our commitment: For the 2024–25 cycle, The Unjournal set aside $75 per evaluation toward prize incentives, building a pool of $6,500 in evaluator prizes. Combined with base compensation ($100–$300+ per evaluation), promptness bonuses, and intermediate incentive awards, our average evaluator compensation target is approximately $450 per evaluation.

Prize Winners

First Prize
$2,000 + discussion honorarium
Masset and Sharma Waddington
Evaluation of “Water Treatment and Child Mortality: A Meta-analysis and Cost-effectiveness Analysis”
Global Health | Development Policy

This was an exemplary, rigorous evaluation focused on real-world decision-relevance. It was a rare example of evaluators bringing systematic review and policy expertise directly to bear on a prominent and policy-relevant meta-analysis. This yielded unusually concrete, implementable recommendations.

Judicious and specific recognition of research value: They noted the importance of estimating mortality impacts (not just morbidity), the ambition to inform decision-makers, and the use of prediction intervals, sensitivity analyses, and cost-effectiveness work.

Scrutiny of the search strategy and data extraction: They identified deviations from the registered protocol, discrepancies in data extraction between drafts, and specific eligible studies that were missing.

Signs of relevant methodological fragility, linked to headline results and policy-relevant decisions:

“We observed that the odds ratio estimates and 95 percent confidence intervals differed… between the two working paper drafts [representing] around one-third or more of the pooled effect magnitude”

Designs, outcome measures, and interpretation: They distinguished between “standalone” water treatment and “multicomponent” packages (e.g., those including hygiene or cookstoves).

“The conclusions may need to be qualified by observing that the results should be interpreted as approximations of the effects of ‘water treatment and protection’, if… [they] were implemented as multicomponent packages”

Flagging potential overstatements: They considered the potential for Hawthorne effects, limiting implications for scaling. They also questioned whether “18 studies [14 from middle income countries] could represent the [relevant] contextual variability.”

Limitations of cost-effectiveness analysis and policy implications:

“Since two of the cost-effectiveness estimates directly concerned chlorination, it would seem more appropriate to use the pooled meta-analytic effects for chlorination alone…”

The authors’ response was substantive and constructive. They called the evaluations “thorough and thoughtful” and acknowledged receiving “extensive write-up and precise recommendations.” They revised the paper’s framing in response, repositioning it from proving water treatment’s necessity to determining which type of intervention is appropriate for which context. They accepted points on study representativeness, CONSORT reporting standards, and the need for deeper cost modelling, noting that several sections are “still being worked on”, including expanded sensitivity analyses and meta-regression accounting for combined interventions.

Real-world impact — GiveWell engagement: This evaluation package attracted attention from GiveWell, which relies heavily on the Kremer et al. meta-analysis for its chlorination grantmaking. Teryn Mattox of GiveWell’s water team described the evaluation as “very influential,” noting that GiveWell had been independently considering commissioning a replication. A GiveWell research director confirmed finding the evaluation useful, particularly because it “flagged issues that should be addressed (e.g., improper exclusion of certain studies)” without finding “major smoking guns”—an outcome that helped calibrate confidence in the underlying evidence for water treatment interventions that inform hundreds of millions of dollars in charitable funding.

Read the Full Evaluation | Evaluation Summary | Authors’ Response

Joint Second Prize — $1,000 each

Three evaluations recognized for outstanding depth, insight, and engagement with the research.

Joint Second Prize
$1,000
David Reiley

David Reiley brought decades of experience in both the economics of charitable giving and digital advertising practice to this evaluation. This was among the most detailed and insightful reports in our portfolio.

His evaluation combined:

  • Technical credibility checks (researcher degrees of freedom, data treatment robustness, assumptions)
  • Practical sector insights (what the results mean for fundraisers, platform dynamics, and measurement choices)
  • Clear decision-facing guidance (how to interpret profit claims, spillovers, and uncertainty)

We hope that this evaluation package, with its specific suggestions, leads to follow-up work (including reanalysis of the data from this high-value field experiment, and further experiments) that helps fundraisers for high-impact global charities better understand how to optimize their advertising budgets, considering both the direct returns and the potential crowding-out of donations to related charities.

Read the Full Evaluation | Evaluation Summary
Joint Second Prize
$1,000
Eleanor Tsai

This evaluation takes a large, rigorous RCT seriously and assesses issues decision-makers would care about: measurement, interpretation, stress-testing comparisons, and real-world contextual considerations.

Read the Full Evaluation | Evaluation Summary
Joint Second Prize
$1,000
Cannon Cloud
“Annual investment in forests must more than triple from US$84 billion in 2023 to US$300 billion by 2030.” — UN Environment Programme, State of Finance for Forests 2025

Williams et al. (2024, Nature) present a model of natural regeneration potential across tropical forested countries, estimating that an area larger than Mexico could naturally regenerate as forest.

Cannon Cloud’s evaluation showed tremendous effort, follow-through, and specific technical insight. It went well beyond surface-level review, raising critical doubts about the methods and interpretation and offering detailed and actionable recommendations. The evaluation also included a constructive dialogue and collaborative synthesis with the second (anonymous) evaluator.

Some key issues raised:

  • Temporal data leakage, “such as including predictor variables that are derived from the outcome variable itself,” which “can lead to overly optimistic estimates of model performance and poor generalization to new data”
  • Reliance on older and superseded Global Forest Change data
  • Confounding by socioeconomic factors and predictor choice
  • Neglect of intensive margin regrowth

Cloud is in conversation with the authors and pursuing future work in this area—a testament to the kind of productive, ongoing scholarly dialogue that open evaluation can foster.

Read the Full Evaluation | Evaluation Summary

Honorable Mentions

Five additional evaluations received commendations for their quality, rigor, and usefulness. Two of these will be selected by a transparent random draw to receive $500 prizes.

Some honorable-mention evaluators chose to remain anonymous. We fully respect that choice, and we do not disclose identifying information beyond what appears on the public evaluation pages.

1. Matthew B. Jané
Animal Welfare

The authors aim to address a question of great interest to animal welfare advocates—do “interventions” reduce animal product consumption?—via a structured meta-analysis of 41 diverse, and often limited, research studies. Jané offers some praise, as well as substantial, clearly articulated critiques and specific suggestions aimed at strengthening the rigor of this work.

2. Anonymous evaluator
Development Economics

A careful, highly detailed, organized, and decision-relevant evaluation of one of the most prominent long-term anti-poverty experiments. It emphasizes interpretation, external validity (e.g., can this “manna from heaven” experiment be extrapolated to a standardized, tax-funded UBI?), and what can (and cannot) be inferred from short-term results within a long-run design. The evaluation provided insightful critiques of causal and statistical inference claims (e.g., regarding the claimed “insignificant” price effects on nearby markets) and of design and interpretation issues (e.g., “program goodwill bias”), and offered a plausible alternative psychological explanation for the differential impacts on lump-sum versus monthly recipients’ investment and consumption.

3. Gregory Lewis
Policy Economics

Exceptional reasoning, writing, quantification, and follow-up work. The author, Matt Clancy, described it as “quite a good critique” and responded in detail, adding new analysis. Although the other evaluators raised similar concerns, Lewis made the clearest case that the paper stacks the deck against concluding in favour of a science slowdown by comparing all the benefits of science to only a subset of the risks, namely those emerging from biotechnology. As he notes, this matters given that XPT AI extinction estimates are 38x higher than biocatastrophe estimates. He proposed a constructive “like-for-like” fix: narrowing the upside to bioscience returns only.

4. Anonymous evaluator
Environment Economics

The authors stated “the feedback was very helpful and we will make sure to take it into account in the next iteration of the paper.” The most recent draft strongly suggests that they did. For example, they addressed concerns about sample size and robustness by incorporating a much larger dataset. The evaluator noted that the impacts seemed to persist over a decade; the authors converted their estimates to consider a permanent 1°C temperature rise, yielding the much larger long-run figure of a 22–34% GDP reduction.

5. Anonymous evaluator
Global Health & Wellbeing

This evaluation was detailed, involved extensive follow-up, and provided a range of helpful suggestions on methods, domain relevance (leveraging the evaluator’s own clinical expertise), and communication. It engaged carefully and thoroughly with both the systematic review methodology and the charity-specific cost-effectiveness analysis.

Honorable Mention Lottery

Two of the five honorable mentions will receive an additional $500 prize each, selected through a transparent, verifiable random draw. Anyone can observe and verify the randomization process.
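
To illustrate what a verifiable draw can look like, here is a minimal sketch in Python. It assumes (hypothetically) a pre-published, fixed ordering of the five mentions and a public seed announced in advance; this is not necessarily The Unjournal’s exact procedure, but with the same inputs anyone would reproduce the same two winners.

```python
import hashlib

# Hypothetical sketch of a verifiable random draw (not necessarily The
# Unjournal's exact procedure). The entrant list and the choice of public
# seed are committed to in advance; anyone can re-run this to check the result.
entrants = [
    "Honorable mention 1",
    "Honorable mention 2",
    "Honorable mention 3",
    "Honorable mention 4",
    "Honorable mention 5",
]

def draw_winners(entrants, public_seed, k=2):
    """Deterministically select k winners: hash each entrant together with the
    pre-announced public seed, sort by digest, and take the first k."""
    ranked = sorted(
        entrants,
        key=lambda name: hashlib.sha256(f"{public_seed}|{name}".encode()).hexdigest(),
    )
    return ranked[:k]

# Once the public seed (e.g., a pre-specified future lottery number or block
# hash) is revealed, the outcome is fixed and checkable by anyone.
print(draw_winners(entrants, public_seed="example-public-seed"))
```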

Go to the Lottery Draw Page
Discussion honorarium: We have set aside an additional $500 to further compensate prize winners for their time participating in any post-prize discussions, seminars, or public communications related to their evaluation work.

How Winners Were Selected

Prize recipients were selected through a multi-step process involving evaluation managers, the broader Unjournal team, and advisory board members. The process included internal rating by evaluation managers, discussion of nominated evaluations, and a team-wide vote.

🔎 Depth & Rigor

How deeply did the evaluation engage with the paper's methods, assumptions, and evidence? Did it go beyond surface-level review to provide technical substance?

💡 Constructive Insight

Did the evaluation offer specific, actionable suggestions? Did it identify ways to strengthen the work rather than simply cataloging weaknesses?

🌎 Real-World Relevance

Did the evaluation consider the paper's implications for policy, practice, or global priorities? Did it help readers assess how much to rely on the findings?

📝 Communication

Was the evaluation clearly written and accessible? Could it be useful to researchers, practitioners, and policymakers—not just narrow specialists?

🔁 Follow-through

Did the evaluator engage in iterative dialogue—responding to manager queries, collaborating with co-evaluators, or providing additional analysis when needed?

Expertise

Did the evaluation demonstrate genuine subject-matter expertise and the ability to assess claims critically within the relevant field?

These criteria align with The Unjournal's evaluator guidelines, which ask evaluators to provide substantive evaluation (resembling a high-quality journal referee report, without the binary accept/reject focus), quantitative ratings across nine dimensions with uncertainty ranges, and considered assessments of each paper's main claims.
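
For a rough sense of what a quantitative rating with an uncertainty range might look like in structured form, here is a hypothetical sketch; the field names and the 0–100 scale are illustrative assumptions, not The Unjournal’s exact schema.

```python
# Hypothetical sketch of a single rating entry with an uncertainty range.
# Field names and the 0-100 scale are illustrative assumptions, not the exact
# schema of The Unjournal's evaluation form.
rating = {
    "dimension": "Methods: justification, reasonableness, validity, robustness",
    "midpoint": 72,                    # evaluator's best guess on an assumed 0-100 scale
    "uncertainty_range_90": (60, 82),  # assumed 90% interval: lower and upper bounds
}

# An evaluation would include one such entry per rated dimension.
print(rating["dimension"], rating["midpoint"], rating["uncertainty_range_90"])
```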


Get Involved

Whether you're a researcher, evaluator, funder, or policymaker—there's a role for you in improving how research is assessed.

Become an Evaluator | Submit Research | Follow Us