Case Study: Light-Weight Policy Moderators for Indeed Job Postings
Our policy moderation models achieve human-level performance on job postings at <1% of today’s per-review cost of human moderation. Each policy employs a dedicated, compact model. Our model (F1 0.840) outperforms GPT-4o (F1 0.623), o4-mini (F1 0.837), and o3 (F1 0.835) while also operating faster and costing approximately 87% less.
Figure 1: Model performance over harmful categories, measured in F1 score.
In this study we train high-performing, policy-specific moderation models using the Indeed dataset on Kaggle, applying an adversarial approach to evaluations. (Note that the public Indeed dataset mainly contains compliant postings, so we supplement it with synthetic data as detailed in the Data section.)
We moderate job advertisements according to 10 specific policies, grouped into two high-level classes: legal and internal. Each policy represents a clearly defined rule published on Indeed’s public Employer Help Centre, Partner Docs, or Legal pages. The policies listed below are taken directly from Indeed’s web resources.
Policy label | Class | One-sentence scope |
---|---|---|
Equal-Opportunity / Non-Discrimination | legal | Job postings must not indicate preferences, limitations, or exclusions based on legally protected characteristics such as race, gender, age (40+), disability or religion. |
Illegal / Unlawful Work | legal | Listings cannot solicit or facilitate employment involving activities that violate local laws, including drug trafficking, unauthorized gambling, or computer hacking. |
Offensive & Harassing Content | internal | Titles, descriptions, and application instructions must remain professional, avoiding vulgarity, hateful speech, or explicit sexual language. |
Non-Inclusive Language | internal | Indirect or implicit bias terms – such as age-related descriptors (“young”), gender-specific titles (“handyman”), and lifestyle requirements (“non-smoker”) – are prohibited, except when justified by a legitimate occupational necessity. |
Fraud / Fees / Misrepresentation | internal | Ads must accurately identify the hiring employer, clearly describe the job duties, and never require candidates to make payments or buy products as a prerequisite for consideration. |
Non-Job / Commercial Content | internal | Indeed forbids postings unrelated to genuine employment, such as franchise opportunities, paid training courses, or promotional events requiring ticket purchases. |
Unpaid / Commission-Only & Sponsorship | internal | Jobs offering solely unpaid or 100% commission-based compensation must clearly disclose these conditions and typically require paid sponsorship for visibility in organic results. |
MLM & Pyramid Schemes | internal | Multi-level marketing (MLM), pyramid schemes, or any role that primarily generates earnings through recruiting additional members is disallowed. |
Low-Quality Titles / Keyword-Stuffing | internal | Job titles must clearly and succinctly state the position; listings containing clickbait, excessive emoji usage, or keyword repetition will be demoted or removed. |
Application Flow Integrity | internal | Candidates must apply through Indeed or an approved applicant tracking system (ATS). Ads must not redirect applicants to unverified channels such as personal emails, WhatsApp, or external Google forms. |
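For illustration, the sketch below shows one way the policy set could be represented in code so that each rule maps to its own dedicated classifier. The identifiers and abbreviated scopes are our own shorthand for this post, not Indeed’s schema or our production format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Policy:
    """One moderation rule, enforced by its own compact classifier."""
    label: str         # short machine-readable policy name
    policy_class: str  # "legal" or "internal"
    scope: str         # abbreviated description of what the rule covers

# Illustrative registry of the 10 policies in the table above (scopes abbreviated).
POLICIES = [
    Policy("equal_opportunity", "legal",
           "No preferences or exclusions based on protected characteristics."),
    Policy("illegal_work", "legal",
           "No solicitation of work that violates local laws."),
    Policy("offensive_content", "internal",
           "No vulgar, hateful, or sexually explicit language."),
    Policy("non_inclusive_language", "internal",
           "No implicit-bias terms without occupational justification."),
    Policy("fraud_fees", "internal",
           "No misrepresentation or required candidate payments."),
    Policy("non_job_content", "internal",
           "No postings unrelated to genuine employment."),
    Policy("unpaid_commission_only", "internal",
           "Unpaid or commission-only terms must be clearly disclosed."),
    Policy("mlm_pyramid", "internal",
           "No MLM, pyramid, or recruitment-based earnings schemes."),
    Policy("low_quality_titles", "internal",
           "No clickbait, emoji spam, or keyword-stuffed titles."),
    Policy("application_flow", "internal",
           "Applications must go through Indeed or an approved ATS."),
]
```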
We utilize the publicly available “Indeed Job Posting Dataset”, compiled by PromptCloud and DataStock and hosted on Kaggle. The full corpus includes 325,946 job advertisements collected from Indeed.com between August 1 and October 31, 2019. All content appears under a CC-0 (public domain) license, allowing unrestricted redistribution and annotation. For this study, we randomly select a subset of 10,000 data points.
The dataset is heavily skewed toward compliant postings, a pattern common to fraud-detection and policy-compliance datasets. Violations of even the most frequently broken rule occur in <8% of records, several policies appear in ≤1% of cases, and one shows no positives at all.
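As a rough sketch of how the 10,000-posting subset can be sampled and the imbalance measured, the snippet below uses pandas. The file name and per-policy label columns are assumptions: the raw Kaggle CSV does not ship with policy labels, which instead come from the labeling pipeline described later in this post.

```python
import pandas as pd

# Hypothetical file name; the actual Kaggle CSV schema differs.
df = pd.read_csv("indeed_job_postings.csv")

# Randomly select the 10,000-posting subset used in this study.
sample = df.sample(n=10_000, random_state=42)

# Assume one binary column per policy (1 = violation), added by the
# debate-based labeling pipeline described below.
policy_columns = [
    "equal_opportunity", "illegal_work", "offensive_content",
    "non_inclusive_language", "fraud_fees", "non_job_content",
    "unpaid_commission_only", "mlm_pyramid", "low_quality_titles",
    "application_flow",
]
violation_rates = sample[policy_columns].mean().sort_values(ascending=False)
print(violation_rates)  # even the most-violated policy sits below 8%
```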
Figure 2: Class frequency of the Indeed Kaggle dataset.
To address the imbalance, we created a synthetic violation dataset containing 2,000 posts. We generated each sample using few-shot prompting with OpenAI's o4 (high reasoning) model. Each prompt comprised one randomly selected compliant ad and two curated examples violating the target rule. From these prompts, o4 produced a new, non-compliant post reflecting the tone, length, and domain cues of the provided examples.
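The snippet below sketches this few-shot generation step using the official `openai` Python SDK. The model name, prompt wording, and function signature are placeholders rather than our exact production setup.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def generate_violation(compliant_ad: str, violating_examples: list[str],
                       policy_scope: str) -> str:
    """Few-shot prompt: one compliant ad plus two curated violations of the
    target policy; the model returns a new non-compliant posting."""
    prompt = (
        f"Policy being violated: {policy_scope}\n\n"
        "Compliant job posting (match its tone, length, and domain):\n"
        f"{compliant_ad}\n\n"
        "Examples of postings that violate this policy:\n"
        f"1. {violating_examples[0]}\n"
        f"2. {violating_examples[1]}\n\n"
        "Write one new job posting that violates this policy."
    )
    response = client.chat.completions.create(
        model="o4-mini",          # placeholder; substitute your high-reasoning model
        reasoning_effort="high",  # reasoning-effort setting for OpenAI reasoning models
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```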
Figure 3: Class frequency after augmenting with synthetic data.
We automatically generate policy labels through structured debates between our strongest available reasoning models. Each debate involves two models: one arguing that the posting is compliant and the other arguing that it violates the policy. After two rounds of argument, a judge evaluates both sides and selects the most persuasive argument. To improve reliability, we repeat this process three times in a self-consistency pipeline and assign the final label by majority vote.
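A simplified sketch of this debate-plus-self-consistency loop is shown below. Here `pro_model`, `con_model`, and `judge_model` are hypothetical callables wrapping whichever reasoning models you use; the real pipeline's prompts and judging criteria are not reproduced.

```python
from collections import Counter

def debate_label(posting: str, policy: str, pro_model, con_model, judge_model,
                 rounds: int = 2) -> bool:
    """One structured debate: the pro side argues the posting is compliant, the
    con side argues it violates the policy; a judge then picks the stronger case."""
    transcript = []
    for _ in range(rounds):
        transcript.append(("compliant", pro_model(posting, policy, transcript)))
        transcript.append(("violation", con_model(posting, policy, transcript)))
    verdict = judge_model(posting, policy, transcript)  # "compliant" or "violation"
    return verdict == "violation"

def self_consistent_label(posting, policy, pro, con, judge, runs: int = 3) -> bool:
    """Repeat the debate `runs` times and assign the label by majority vote."""
    votes = Counter(debate_label(posting, policy, pro, con, judge)
                    for _ in range(runs))
    return votes.most_common(1)[0][0]
```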
Figure 4: Workflow of our Guardrail Training Pipeline.
Our moderation system combines reinforcement learning, domain-specific fine-tuning, and the data-augmentation and labeling techniques described above.
We begin by running supervised fine-tuning for each policy classifier on a comprehensive dataset combining real and synthetic examples. This initial step gives each model a robust representation of known compliant and non-compliant cases.
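As an illustration, a single per-policy fine-tune might look like the Hugging Face sketch below; the backbone, sequence length, and hyperparameters are placeholders, not the ones used for our moderators. In practice this step is run once per policy, yielding ten dedicated compact models.

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

def train_policy_classifier(texts, labels, policy_name,
                            backbone="distilbert-base-uncased"):
    """Fine-tune one compact binary classifier for a single policy."""
    tokenizer = AutoTokenizer.from_pretrained(backbone)
    model = AutoModelForSequenceClassification.from_pretrained(backbone, num_labels=2)

    ds = Dataset.from_dict({"text": texts, "label": labels})
    ds = ds.map(lambda batch: tokenizer(batch["text"], truncation=True,
                                        padding="max_length", max_length=512),
                batched=True)

    args = TrainingArguments(output_dir=f"moderator-{policy_name}",
                             num_train_epochs=3,
                             per_device_train_batch_size=16)
    Trainer(model=model, args=args, train_dataset=ds).train()
    return model, tokenizer
```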
After establishing the base model, we deploy a proprietary red-teaming agent designed specifically to uncover potential vulnerabilities. This attacker, a learned policy network, crafts adversarial examples intended to bypass the moderation classifiers. It utilizes an extensive suite of known attack vectors, both from external research and developed internally by our red-teaming experts.
During training, the attacker and defender evolve jointly under conflicting objectives: the attacker aims to generate challenging examples capable of circumventing moderation rules, while the defender attempts to detect these adversarial attempts accurately. A jury of reasoning models assesses the defender's performance on these difficult samples.
Through an iterative process, this adversarial setup systematically identifies and corrects weaknesses within the moderation system. Specific training methodologies are proprietary.
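For intuition only, the sketch below shows what one generic attacker/defender iteration with a jury vote could look like; it is not our proprietary training methodology, and the callables are stand-ins for the learned attack policy, the policy classifier, and the jury of reasoning models.

```python
def adversarial_round(attacker, defender, jury, seed_posts, policy):
    """One illustrative attack/defend iteration (not the proprietary method).

    attacker: callable that rewrites a post to try to slip past the defender
    defender: callable returning True if it flags the post as a violation
    jury:     list of reasoning-model callables voting on whether the attacked
              post truly violates the policy
    """
    hard_examples = []
    for post in seed_posts:
        attacked = attacker(post, policy)
        truly_violates = sum(j(attacked, policy) for j in jury) > len(jury) // 2
        caught = defender(attacked)
        if truly_violates and not caught:
            # The defender missed a genuine violation: keep the example for
            # the next fine-tuning round.
            hard_examples.append(attacked)
    return hard_examples
```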
Figure 5: Cost-performance coordinates of different moderation methods.
Figure 5 summarizes the trade-off between cost and performance. While reasoning-based models deliver strong F1 scores, their inference costs are several times higher than those of non-reasoning alternatives. Our approach matches the performance of the reasoning models at significantly lower cost. The reduction comes from domain-specific fine-tuning of a smaller model combined with our library of proprietary red-teaming algorithms.
Estimated savings and performance improvements per 100,000 moderation requests:
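Purely as a hypothetical back-of-the-envelope illustration (the dollar figures below are placeholders, not our actual estimates), the roughly 87% per-request saving cited above scales to 100,000 requests as follows:

```python
# Hypothetical per-request price for a frontier reasoning model; NOT a real quote.
reasoning_cost_per_request = 0.01
our_cost_per_request = reasoning_cost_per_request * (1 - 0.87)  # ~87% cheaper

requests = 100_000
baseline_total = reasoning_cost_per_request * requests  # $1,000 per 100k requests
our_total = our_cost_per_request * requests             # ~$130 per 100k requests
print(f"Illustrative savings per {requests:,} requests: ${baseline_total - our_total:,.0f}")
```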
Have domain-specific rules beyond job postings? We build compact, high-accuracy moderators for any policy set (legal, safety, quality, brand, workflow integrity) and integrate them into your stack. If you want us to enforce custom policies on anything you care about, reach out at info@generalanalysis.com.