Greenwashing Model Lab
Decide which model to bet on for the Green Claims pipeline. Paste an image description, video transcript, or marketing claim and compare detection results, per-claim categorization, cost, and latency across models.
Phase 1 (today): ClimateBERT live. Phase 2: Anthropic / OpenAI / Google adapters producing the same structured claim-report shape so they can be judged on the real task.
Target report format
What the customer-facing deliverable will look like once we’ve picked a model and wired evidence matching. The Model Lab below helps us decide which model produces the best Claim and Result (risk) columns. The Evidence column is future work (PIM / LCA / certification lookup).
| Claim | Evidence | Result | Why |
|---|---|---|---|
| "eco-friendly" | none | high risk | Generic claim, no substantiation → ECGT bans. |
| "50% recycled plastic" | exact match | low risk | Specific & measurable, matched against product data. |
| "climate neutral" | offset doc only | high risk | Offsets alone don't satisfy neutrality under ECGT. |
| "recyclable" | unclear region | medium risk | Recyclable where? Depends on real-world collection. |
Source: internal Green Claims deep-dive (March 2026). Aligns with EU Directive 2024/825 (ECGT), enforced 27 September 2026.
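The claim-report rows above could be captured in a small shared schema, so the Phase 2 adapters all emit the same shape. A minimal sketch; the field names and `ClaimRow` type are illustrative assumptions, not a settled contract:

```python
# Illustrative schema for one row of the target report.
# Field names are assumptions mirroring the table columns above.
from dataclasses import dataclass
from typing import Literal

Risk = Literal["low risk", "medium risk", "high risk"]

@dataclass
class ClaimRow:
    claim: str      # verbatim marketing claim, e.g. "eco-friendly"
    evidence: str   # evidence-match status ("none", "exact match", ...)
    result: Risk    # risk verdict
    why: str        # one-line rationale referencing the ECGT rule

# Two rows from the example table above.
report = [
    ClaimRow("eco-friendly", "none", "high risk",
             "Generic claim, no substantiation -> ECGT bans."),
    ClaimRow("50% recycled plastic", "exact match", "low risk",
             "Specific & measurable, matched against product data."),
]
```

Any model adapter (ClimateBERT or an LLM provider) would return a list of these rows, which keeps the comparison in the Lab apples-to-apples.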
Monthly cost projection
Assuming 600 input + 150 output tokens per classification. LLM rates and VM baseline as of April 2026.
- ClimateBERT is priced as a self-hosted AWS m5.large (~$70/mo, 2 dedicated vCPUs, 8 GB RAM, non-burstable). Burstable instances like t3.medium are not safe for sustained ML inference because CPU credits deplete.
- With vanilla transformers, this instance sustains 3–6 req/s, roughly 30× headroom at 300k classifications/month. Cost is fixed and does not scale with volume until you outgrow the instance.
- Cheaper options exist: a Hetzner CCX13 (2 dedicated vCPUs, 8 GB) is ~$14/mo, and ONNX Runtime with int8 quantization can roughly halve the VM requirement.
- LLM providers are priced per token, so their cost scales linearly with volume and will dominate at scale.
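The fixed-VM vs per-token trade-off can be sketched as simple arithmetic. The token counts and $70 VM figure come from the text above; the per-million-token LLM rates below are placeholders, not real April 2026 provider prices:

```python
# Break-even sketch: fixed VM cost vs per-token LLM pricing.
VM_MONTHLY = 70.0                   # self-hosted m5.large, from the text
IN_TOKENS, OUT_TOKENS = 600, 150    # per classification, from the text

# Placeholder rates in $ per 1M tokens -- substitute real provider prices.
RATE_IN, RATE_OUT = 3.0, 15.0

def llm_cost(classifications: int) -> float:
    """Monthly LLM spend for a given classification volume."""
    return classifications * (IN_TOKENS * RATE_IN + OUT_TOKENS * RATE_OUT) / 1e6

def breakeven() -> int:
    """Volume at which the per-token LLM bill exceeds the fixed VM cost."""
    per_call = (IN_TOKENS * RATE_IN + OUT_TOKENS * RATE_OUT) / 1e6
    return int(VM_MONTHLY / per_call)
```

Under these placeholder rates the LLM bill passes $70/mo somewhere in the tens of thousands of classifications, well below the 300k/month volume the m5.large handles at a flat price, which is why per-token pricing dominates at scale.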
ClimateBERT scoring weights
Live aggregation of the six-head output. Adjust the weights to see how each signal contributes to the score. No re-inference needed.
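Re-weighting without re-inference works because the six per-head probabilities are cached and only the aggregation changes. A minimal sketch, assuming a weighted average normalized to [0, 1]; the head names and example values are illustrative, not the actual ClimateBERT head labels:

```python
# Re-score a cached six-head output under new slider weights,
# with no model inference. Head names below are assumptions.
HEADS = ["environmental", "specificity", "commitment",
         "sentiment", "net_zero", "transition"]

def score(head_probs: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-head probabilities, normalized by the
    weight total so the score stays in [0, 1] for any slider setting."""
    total = sum(weights[h] for h in HEADS)
    if total == 0:
        return 0.0
    return sum(head_probs[h] * weights[h] for h in HEADS) / total

# Example: one cached inference result, scored under equal weights.
probs = dict(zip(HEADS, [0.9, 0.2, 0.4, 0.6, 0.1, 0.3]))
equal = {h: 1.0 for h in HEADS}
```

Moving a slider only changes the `weights` dict and re-runs `score`, which is why the UI can update live.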
System prompt (used by LLM providers in Phase 2)
Edits apply to the next run. ClimateBERT ignores this field (no prompt).
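In Phase 2, the adapters would thread this editable field into each provider request. A hedged sketch using the common chat-completions message layout; the exact payload fields vary per provider and are assumptions here:

```python
# Sketch: build a provider request carrying the editable system prompt.
# Payload layout follows the common chat-completions shape; per-provider
# field names are assumptions.
def build_request(system_prompt: str, claim_text: str) -> dict:
    return {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": claim_text},
        ],
        "max_tokens": 150,  # matches the 150-output-token budget above
    }

req = build_request("Classify green claims per ECGT.",
                    "Our bottles are eco-friendly.")
```

Because ClimateBERT is a classifier with no prompt, its adapter would simply ignore `system_prompt`, which is why edits to this field only affect the LLM providers.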