DemoUp Cliplister


Greenwashing Model Lab

Decide which model to bet on for the Green Claims pipeline. Paste an image description, video transcript, or marketing claim and compare detection results, per-claim categorization, cost, and latency across models.

Phase 1 (today): ClimateBERT live. Phase 2: Anthropic / OpenAI / Google adapters producing the same structured claim-report shape so they can be judged on the real task.
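One way to keep providers comparable is a shared adapter interface that always returns the same structured claim report. The sketch below is illustrative only — class and field names (`DetectorAdapter`, `ClaimFinding`, `analyze`) are assumptions, not the real codebase:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class ClaimFinding:
    claim: str
    category: str   # one of the 7-type taxonomy labels (Phase 2)
    risk: str       # "low" | "medium" | "high"

class DetectorAdapter(ABC):
    @abstractmethod
    def analyze(self, text: str) -> list[ClaimFinding]:
        """Return one finding per extracted claim."""

class ClimateBertAdapter(DetectorAdapter):
    def analyze(self, text: str) -> list[ClaimFinding]:
        # Phase 1: one document-level score, no per-claim extraction,
        # so the whole input is reported as a single uncategorized finding.
        return [ClaimFinding(claim=text, category="unclassified", risk="medium")]

findings = ClimateBertAdapter().analyze("100% eco-friendly bottle")
print(findings[0].risk)  # medium
```

The Phase 2 LLM adapters would subclass the same interface and fill in real per-claim categories, so the lab can score every model on identical output.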

Target report format

What the customer-facing deliverable will look like once we’ve picked a model and wired evidence matching. The Model Lab below helps us decide which model produces the best Claim + Risk columns. The Evidence column is future work (PIM / LCA / certification lookup).

| Claim | Evidence | Result | Why |
| --- | --- | --- | --- |
| "eco-friendly" | none | high risk | Generic claim, no substantiation → ECGT bans. |
| "50% recycled plastic" | exact match | low risk | Specific & measurable, matched against product data. |
| "climate neutral" | offset doc only | high risk | Offsets alone don't satisfy neutrality under ECGT. |
| "recyclable" | unclear region | medium risk | Recyclable where? Depends on real-world collection. |

Source: internal Green Claims deep-dive (March 2026). Aligns with EU Directive 2024/825 (ECGT), enforced 27 September 2026.
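The example rows above can be encoded as a minimal lookup for sanity-checking model output against the target report. These rules are purely illustrative, not the real ECGT logic:

```python
# Expected (claim, evidence) -> risk pairs, taken from the report table.
RISK_RULES = {
    ("eco-friendly", "none"): "high risk",
    ("50% recycled plastic", "exact match"): "low risk",
    ("climate neutral", "offset doc only"): "high risk",
    ("recyclable", "unclear region"): "medium risk",
}

def expected_risk(claim: str, evidence: str) -> str:
    # Unknown combinations default to high risk — conservative under ECGT,
    # where unsubstantiated claims are the dangerous case.
    return RISK_RULES.get((claim, evidence), "high risk")

print(expected_risk("eco-friendly", "none"))  # high risk
```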

Monthly cost projection

Assuming 300k classifications per month at 600 input + 150 output tokens each. LLM rates and VM baseline as of April 2026.

| Provider | Model | Monthly cost |
| --- | --- | --- |
| Google Gemini | Gemini 2.5 Flash-Lite — stable, cheapest ($0.10 / $0.40 per 1M) | $36/mo |
| ClimateBERT (fixed) | distilroberta-base-climate-f + 6 classification heads, self-hosted on AWS m5.large (2 dedicated vCPU, 8 GB, non-burstable); sustains 3–6 req/s | $70/mo |
| OpenAI GPT | GPT-5.4 nano — cheapest ($0.20 / $1.25 per 1M) | $92/mo |
| Google Gemini | Gemini 3.1 Flash-Lite Preview — latest gen ($0.25 / $1.50 per 1M) | $113/mo |
| OpenAI GPT | GPT-5.4 mini — balanced ($0.75 / $4.50 per 1M) | $338/mo |
| Anthropic Claude | Haiku 4.5 — fast, cheapest ($1 / $5 per 1M) | $405/mo |
| Anthropic Claude | Sonnet 4.6 — balanced ($3 / $15 per 1M) | $1.2k/mo |

ClimateBERT is priced as a self-hosted AWS m5.large (~$70/mo, 2 dedicated vCPU, 8 GB, non-burstable). Burstable instances like t3.medium are not safe for sustained ML inference because CPU credits deplete. With vanilla transformers this instance sustains 3–6 req/s — ~30× headroom at 300k/month. Cost is fixed and does not scale with volume until you outgrow the instance. Cheaper options exist: Hetzner CCX13 (2 dedicated vCPU, 8 GB) is ~$14/mo, and ONNX Runtime + int8 quantization can cut the VM requirement in half. LLM providers are priced per token, so their cost scales linearly with volume and will dominate at scale.
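The token-priced figures above can be reproduced with a small calculator. Volume (300k/month) and token counts (600 in / 150 out) come from this page; rates are USD per 1M tokens as listed:

```python
# Monthly cost projection for the token-priced providers.
VOLUME = 300_000          # classifications per month
IN_TOK, OUT_TOK = 600, 150

RATES = {  # (input $/1M tokens, output $/1M tokens)
    "Gemini 2.5 Flash-Lite": (0.10, 0.40),
    "GPT-5.4 nano": (0.20, 1.25),
    "Gemini 3.1 Flash-Lite Preview": (0.25, 1.50),
    "GPT-5.4 mini": (0.75, 4.50),
    "Claude Haiku 4.5": (1.00, 5.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
}

def monthly_cost(rate_in: float, rate_out: float) -> float:
    per_call = (IN_TOK * rate_in + OUT_TOK * rate_out) / 1_000_000
    return per_call * VOLUME

for model, (ri, ro) in RATES.items():
    print(f"{model}: ${monthly_cost(ri, ro):,.2f}/mo")
```

The output matches the table within rounding; note the linear dependence on `VOLUME`, which the fixed-price ClimateBERT VM does not have.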

ClimateBERT
Six DistilRoBERTa heads trained on corporate sustainability disclosures.
Known limitations. ClimateBERT was trained on corporate CDP disclosures, so vague ad copy (“Mother Earth”, “eco-conscious”) and offset-based neutrality claims are out of distribution — it tends to score them as MODERATE instead of HIGH. It also cannot extract individual claims or categorize them against the 7-type PDF taxonomy. Use it as a fast detector, not as a report generator. Phase 2 LLMs should close both gaps.
Anthropic Claude (Phase 2)
Claude 4.x family — latest generation.
OpenAI GPT (Phase 2)
GPT-5.4 value tier — flagship excluded.
Google Gemini (Phase 2)
Gemini 2.5 stable and 3.1 preview, Flash-Lite tier only.
ClimateBERT scoring weights

Live aggregation of the six-head output. Adjust the weights to see how each signal contributes to the score. No re-inference needed.
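The live aggregation can be sketched as a weighted average over the six head outputs — adjusting weights only recomputes the average, with no re-inference. Head names and weights below are illustrative placeholders, not the real configuration:

```python
# Hypothetical six-head aggregation: combine per-head probabilities
# into one greenwashing score by weighted average.
HEADS = ["climate_related", "commitment", "sentiment",
         "specificity", "environmental", "net_zero"]

def aggregate(head_scores: dict[str, float],
              weights: dict[str, float]) -> float:
    """Weighted average in [0, 1]; cheap to recompute when weights change."""
    total = sum(weights.values())
    return sum(head_scores[h] * weights.get(h, 0.0) for h in head_scores) / total

scores = {h: 0.5 for h in HEADS}    # dummy cached head outputs
weights = {h: 1.0 for h in HEADS}   # equal weighting
print(aggregate(scores, weights))   # 0.5
```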

System prompt (used by LLM providers in Phase 2)

Edits apply to the next run. ClimateBERT ignores this field (no prompt).