Expert-in-the-Loop: Strategies for Scaling the World's Best Human Knowledge

Sep 13

As AI grows more sophisticated, we need more sophisticated data to improve it.

We’re already seeing declining demand for ‘sweatshop data’ and growing demand for data from skilled experts. Code samples from senior engineers, expert-annotated medical cases showing diagnostic thinking, and curated creative writing samples are now more valuable than bulk scraped content.

As this trend continues, it will demand ever-higher levels of human expertise. That creates a fundamental bottleneck in which increasingly specialized knowledge, not computing power or raw data, becomes the scarcest resource constraining progress.

We’ve found that this is especially true for subjective domains, such as news or sensitive topics, where careful, nuanced reasoning is required.

Unlike more technical fields where expertise can be more easily validated, these areas require human judgment calls about bias, context, cultural sensitivity, and ethical implications—the kind of sophisticated reasoning that should only come from seasoned domain specialists who understand not just what to think, but how to think responsibly about complex, contested issues.

For example, we've found that labeling for bias requires a nuanced and constantly evolving understanding of the political landscape. Within the conservative movement alone, there are diverse opinions on many topics, and understanding how to label data for these varying perspectives requires deep expertise in both politics and the specific subject matter.

The path forward: scaling expertise.

As AI demands ever-higher levels of human expertise, we need to get really good at scaling experts. Today's approach of hiring thousands of people for millions of hours won't carry over: highly skilled experts are scarce, expensive, and busy.

At Remark, we work with some of the world's most reputable experts to create data for the most challenging, sensitive topics. Throughout this work, we've identified several tactics for effectively scaling expert involvement.

Tactic 1 - Scenario Selection: Identifying the Optimal Scenarios for Generalization

We've found that AI can effectively scale expertise to adjacent scenarios when those scenarios are carefully chosen.

For example, in building our expert-in-the-loop source labeling system, we created a hierarchy of news topics—from broad verticals like geopolitics → subverticals like global conflicts → specific narratives like the Russia-Ukraine conflict → individual stories. We also mapped out the different label types such as bias, missing context, editorial significance, and source credibility.

This mapping of our domain allows us to systematically test where expert input generalizes well versus where it requires more specificity.

Through evals and qualitative checks, we discovered that expert input generalizes differently across levels. For example, bias judgments work well at the subvertical level, while editorial significance requires narrative-level specificity. This allows us to strategically optimize expert usage—we can gather bias labels more broadly at the subvertical level with fewer experts, but focus greater expert attention on editorial significance at more granular levels.

Over time, this structure enables surgical deployment of human expertise while leveraging AI generalization wherever possible.
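As a minimal sketch of the level-selection idea, assume that for each label type you have measured an agreement rate, per hierarchy level, between model labels produced using expert labels gathered at that level and experts' own labels on held-out items. The broadest level that clears a threshold is where expert input can be collected most economically. The level names mirror the hierarchy described above; the 0.85 threshold, the function names, the `story-123` ID, and the agreement numbers in the usage lines are hypothetical placeholders, not measured results.

```python
# Minimal sketch: pick the broadest hierarchy level at which expert labels generalize.
# Level names follow the news-topic hierarchy; threshold and numbers are hypothetical.
LEVELS = ["vertical", "subvertical", "narrative", "story"]  # broadest first


def coarsest_generalizing_level(agreement_by_level, threshold=0.85):
    """Return the broadest level whose held-out agreement meets the threshold.

    agreement_by_level maps a level name to an agreement rate (0..1) between
    model labels, produced using expert labels gathered at that level, and
    experts' own labels on held-out items.
    """
    for level in LEVELS:
        score = agreement_by_level.get(level)
        if score is not None and score >= threshold:
            return level
    return LEVELS[-1]  # nothing generalizes: fall back to story-level expert review


def label_scope(path, level):
    """Map a story's path [vertical, subvertical, narrative, story] to the topic
    whose expert labels would cover it at the chosen level."""
    return path[LEVELS.index(level)]


# Hypothetical agreement rates, for illustration only.
bias = {"vertical": 0.70, "subvertical": 0.88, "narrative": 0.91}
editorial = {"vertical": 0.55, "subvertical": 0.68, "narrative": 0.87}
path = ["geopolitics", "global conflicts", "Russia-Ukraine conflict", "story-123"]

print(label_scope(path, coarsest_generalizing_level(bias)))       # global conflicts
print(label_scope(path, coarsest_generalizing_level(editorial)))  # Russia-Ukraine conflict
```

Scanning from the broadest level down means the sketch always prefers the cheapest scope that still meets the quality bar, which is the trade-off described above: fewer experts for bias labels, more granular attention for editorial significance.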

Tactic 2 - Expert Selection: Surgically Involving the Right Experts at the Right Time

The second tactic is focused on using the right experts for the right scenarios.

Starting with our domain structures from above, we can map experts to different specialties based on where they uniquely demonstrate the greatest accuracy and knowledge. For instance, if one expert excels at labeling editorial significance for global conflict sources, we focus their efforts there rather than areas where they're less effective.
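One simple way to implement this kind of mapping, sketched under assumptions: each expert label has been checked against an adjudicated gold label, producing `(expert, label_type, subvertical, correct)` records. The functions below estimate accuracy per slice and route a task to the most accurate experts in that slice, skipping estimates built on too few items. The record format, the `min_items` cutoff, and the expert names are hypothetical.

```python
from collections import defaultdict


def accuracy_table(records):
    """records: iterable of (expert, label_type, subvertical, correct) tuples, where
    `correct` says whether the expert's label matched an adjudicated gold label.
    Returns {(expert, label_type, subvertical): (accuracy, n_items)}."""
    counts = defaultdict(lambda: [0, 0])  # key -> [n_correct, n_total]
    for expert, label_type, subvertical, correct in records:
        tally = counts[(expert, label_type, subvertical)]
        tally[0] += int(correct)
        tally[1] += 1
    return {key: (hits / n, n) for key, (hits, n) in counts.items()}


def route(table, label_type, subvertical, k=1, min_items=20):
    """Return up to k experts with the highest accuracy on this slice,
    ignoring experts with fewer than `min_items` graded labels there."""
    candidates = [
        (acc, expert)
        for (expert, lt, sv), (acc, n) in table.items()
        if lt == label_type and sv == subvertical and n >= min_items
    ]
    # Highest accuracy first; break ties alphabetically for determinism.
    candidates.sort(key=lambda c: (-c[0], c[1]))
    return [expert for _, expert in candidates[:k]]


# Hypothetical graded history: expert A got 23/25 right, expert B 18/25.
records = (
    [("A", "editorial_significance", "global_conflicts", i < 23) for i in range(25)]
    + [("B", "editorial_significance", "global_conflicts", i < 18) for i in range(25)]
)
table = accuracy_table(records)
print(route(table, "editorial_significance", "global_conflicts"))  # ['A']
```

The `min_items` floor keeps a lucky streak on a handful of labels from outranking a well-measured expert; a more careful version might use a confidence interval instead of a raw cutoff.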

But surgical involvement goes further than basic mapping. We build detailed internal profiles of our experts including work history, published topics, life experiences, and affiliations. For example, when we're looking to gather insights from experts to generate net new content as part of our retrieval or training data offerings, we use LLMs to search through these profiles and identify where each expert can provide uniquely valuable input.

Interestingly, this approach has let us use experts in more ways than we expected. Basic mapping might assign a geopolitical story like the Trump/Putin meeting only to experts in editorial significance for global conflicts, but our detailed profiles revealed that economics experts could also analyze its trade implications.

Tactic 3 - Rubrics for Everything

The third tactic involves creating detailed rubrics that capture expert reasoning and enable AI systems to generalize expert judgment to specific scenarios. Expert-established rubrics can be an effective tool for instructing AI to think like domain specialists in targeted contexts.

For example, rubrics for evaluating an AI system’s output can scale LLM-as-a-judge evaluations, letting AI apply expert-level judgment across thousands of examples rather than requiring individual expert review. We work with experts to create rubrics for specific verticals and subverticals, then use those rubrics to inform an LLM-as-a-judge for scaled evals.
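A minimal sketch of rubric-driven LLM-as-a-judge scoring, assuming the judge model is wrapped in a function that takes a prompt string and returns text. The rubric criteria, the 1 to 5 scale, the JSON reply format, and every name below are illustrative assumptions, not an actual expert rubric or any vendor's API; the stand-in `fake_judge` exists only so the example runs offline.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    name: str
    description: str


# Placeholder rubric for judging bias-related answers; real criteria would come from experts.
RUBRIC = [
    Criterion("balance", "Represents the main perspectives on the issue fairly."),
    Criterion("context", "Includes context a domain expert would consider essential."),
    Criterion("loaded_language", "Avoids loaded or one-sided wording."),
]


def build_judge_prompt(rubric, question, answer):
    """Turn the rubric plus one (question, answer) pair into a grading prompt."""
    lines = [
        "You are grading an AI answer against an expert-written rubric.",
        "Score each criterion from 1 (poor) to 5 (excellent).",
        "Reply with a JSON object mapping criterion names to integer scores.",
        "",
        "Criteria:",
    ]
    lines += [f"- {c.name}: {c.description}" for c in rubric]
    lines += ["", f"Question: {question}", f"Answer: {answer}"]
    return "\n".join(lines)


def parse_scores(rubric, raw):
    """Parse the judge's JSON reply and check every criterion got a score in 1..5."""
    scores = json.loads(raw)
    result = {}
    for c in rubric:
        if c.name not in scores:
            raise ValueError(f"judge omitted criterion {c.name!r}")
        value = int(scores[c.name])
        if not 1 <= value <= 5:
            raise ValueError(f"score out of range for {c.name!r}: {value}")
        result[c.name] = value
    return result


def evaluate(rubric, items, judge):
    """Score every (question, answer) item with `judge` (any callable prompt -> str)
    and return the mean score per criterion across the eval set."""
    results = [parse_scores(rubric, judge(build_judge_prompt(rubric, q, a))) for q, a in items]
    return {c.name: sum(r[c.name] for r in results) / len(results) for c in rubric}


def fake_judge(prompt):
    # Stand-in for a real LLM call so the sketch runs without network access.
    return '{"balance": 4, "context": 3, "loaded_language": 5}'


items = [("What happened at the summit?", "An example model answer.")]
print(evaluate(RUBRIC, items, fake_judge))
# {'balance': 4.0, 'context': 3.0, 'loaded_language': 5.0}
```

Validating the judge's reply strictly (every criterion present, every score in range) matters at scale: a malformed answer should fail loudly rather than silently skew the aggregate.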

This approach is also emerging in training methodologies, where expert rubrics function as sophisticated reward models that guide AI behavior during fine-tuning.
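To illustrate how rubric scores could feed fine-tuning, here is one simple, assumed scheme (not a description of any particular lab's method): normalize each criterion's 1 to 5 score to [0, 1] and take a weighted average, giving one scalar reward per response. The weights and criterion names are hypothetical; in practice they would reflect which criteria experts consider most important.

```python
def rubric_reward(scores, weights, min_score=1, max_score=5):
    """Collapse per-criterion rubric scores into one reward in [0, 1].

    reward = sum_i w_i * (s_i - min_score) / (max_score - min_score) / sum_i w_i
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    span = max_score - min_score
    weighted = sum(w * (scores[name] - min_score) / span for name, w in weights.items())
    return weighted / total_weight


# Hypothetical expert-chosen weights: balance matters most.
weights = {"balance": 2.0, "context": 1.5, "loaded_language": 1.0}
scores = {"balance": 4, "context": 3, "loaded_language": 5}
print(round(rubric_reward(scores, weights), 3))  # 0.722
```

A weighted average keeps the reward bounded and easy to interpret. When some failures should never be traded off against strengths elsewhere, a hard gate (for example, zero reward whenever a critical criterion scores 1) may be a better fit.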

The future of AI progress increasingly depends on our ability to capture and scale the world's best human expertise. As models become more sophisticated, the bottleneck shifts from raw computational power to accessing the nuanced judgment and specialized knowledge that only true experts possess.

The tactics we've outlined—strategic scenario selection, surgical expert involvement, and comprehensive rubric development—are probably just the beginning of what's possible. At Remark, we're continuing to refine these approaches as we work with leading experts across domains.

If you're building AI systems that need expert-level training data, or if you're interested in implementing these expert-scaling tactics in your own work, we'd love to hear from you. Reach out to discuss how we can help unlock the expertise your AI systems need to reach their full potential.

Robbie Goldfarb

Robbie Goldfarb is an entrepreneur and technology leader known for working on some of the industry's most consequential challenges at the scale of billions of users.

As the founding partner at Revise Ventures and cofounder of Forum AI, he's focused on ensuring the next generation of technology and AI systems cement human values at their foundation. Prior to his work with Forum AI, Robbie led teams combating misinformation at Facebook, developing systems that helped safeguard integrity during the COVID-19 pandemic, multiple elections, and other critical moments.

He worked on youth safety at Instagram, helping build out the platform's infrastructure to curb online hate, and later developed first-of-its-kind AI safety measures at Meta's frontier lab. Earlier in his career, Robbie helped scale startups from zero to one, with a focus on affordable education and news.