Critic Gate v1: how we stopped shipping wrong atoms
Behind the four-step verification chain that runs on every chemistry answer Cheemly produces. Why we picked RDKit, OPSIN, and Semantic Scholar, and what we still want to fix.
What is the Critic Gate?
The Critic Gate is a deterministic verification layer that checks every chemistry answer Cheemly produces before it is shown to the user. Unlike the language model that drafts the answer, the gate runs no AI: it is plain, auditable code that either passes a result or sends it back to be rewritten. Its job is to keep hallucinated structures and invented citations from being presented as fact.
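To make "plain, auditable code that either passes a result or sends it back" concrete, here is a minimal sketch of how such a gate could be organised. It is not Cheemly's actual code: the names `Check`, `Verdict`, and `run_gate`, the dict-shaped answer, and the two toy checks are all assumptions made for illustration.

```python
from typing import Callable, NamedTuple, Optional

# Hypothetical shape: a check inspects a drafted answer and returns
# a failure message, or None when the answer passes that check.
Check = Callable[[dict], Optional[str]]


class Verdict(NamedTuple):
    passed: bool
    failures: tuple  # failure messages, in the order the checks ran


def run_gate(answer: dict, checks: list) -> Verdict:
    """Run every check; the answer passes only if no check reports a failure."""
    failures = []
    for check in checks:
        message = check(answer)
        if message is not None:
            failures.append(message)
    return Verdict(passed=not failures, failures=tuple(failures))


# Two toy checks, placeholders for the real ones described in this post.
def smiles_not_empty(answer: dict) -> Optional[str]:
    return None if answer.get("smiles") else "empty SMILES"


def dois_well_formed(answer: dict) -> Optional[str]:
    # DOIs begin with the directory indicator "10."
    bad = [d for d in answer.get("dois", []) if not d.startswith("10.")]
    return f"malformed DOI: {bad[0]}" if bad else None
```

Because `run_gate` only calls pure functions of the answer, the same draft always yields the same verdict, and the list of failure messages is exactly what can be handed back to the writer.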
Why a separate gate
Large language models are fluent but unreliable on exact facts: they invent plausible-looking SMILES and cite papers that do not exist. You cannot fix this by asking the model to "be careful." The only durable fix is to verify the output against ground truth with software that has no incentive to please. That separation — a fluent writer plus a strict checker — is the core of Cheemly's design.
The four checks
- SMILES validity: RDKit parses every structure. If it does not parse, it does not ship.
- Atom conservation: For any reaction, the count of each element on the reactant side must equal the count on the product side. A reaction that loses or invents atoms is rejected.
- Name round-trip: OPSIN converts each IUPAC name to a structure, which is compared with the structure the name is supposed to describe; a mismatch flags a naming error.
- Citation existence: Every DOI is checked against Semantic Scholar. A paper that cannot be verified is removed.
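The first two checks can be sketched together, since counting atoms starts with parsing the structure. The sketch below is an illustration under assumptions, not the gate's real code: the function names are hypothetical, and a reaction is represented simply as two lists of SMILES with each molecule repeated once per stoichiometric unit. The RDKit calls used (`Chem.MolFromSmiles`, which returns `None` for an unparseable SMILES, and `Chem.AddHs`) are part of RDKit's public API.

```python
from collections import Counter


def rdkit_atom_counts(smiles: str) -> Counter:
    """Count atoms per element, hydrogens included. Raises ValueError on invalid SMILES."""
    from rdkit import Chem  # lazy import: only needed when this counter is used

    mol = Chem.MolFromSmiles(smiles)  # None means RDKit could not parse it
    if mol is None:
        raise ValueError(f"invalid SMILES: {smiles!r}")
    mol = Chem.AddHs(mol)  # make implicit hydrogens explicit so they are counted
    return Counter(atom.GetSymbol() for atom in mol.GetAtoms())


def side_total(molecules, count_atoms) -> Counter:
    """Sum the per-element atom counts over one side of a reaction."""
    total = Counter()
    for molecule in molecules:
        total.update(count_atoms(molecule))
    return total


def atom_imbalance(reactants, products, count_atoms=rdkit_atom_counts) -> dict:
    """Return {element: products minus reactants} for each element that does not balance.

    An empty dict means the reaction conserves atoms.
    """
    left = side_total(reactants, count_atoms)
    right = side_total(products, count_atoms)
    elements = sorted(set(left) | set(right))
    return {el: right[el] - left[el] for el in elements if right[el] != left[el]}
```

With RDKit installed, methane combustion written as `atom_imbalance(["C", "O=O", "O=O"], ["O=C=O", "O", "O"])` would be expected to return an empty dict. Returning the per-element difference, rather than a bare boolean, gives the writer a specific failure to correct.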
What happens on failure
A failed check does not show the user an error message. Instead, the answer is routed back to the writer with the specific failure as feedback and, if needed, escalated to a stronger model. After a bounded number of retries, anything still failing is surfaced with a low-confidence badge rather than presented as fact. The user never silently receives a wrong structure.
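The failure path described above can be sketched as a bounded retry loop. Everything named here is hypothetical (`answer_with_gate`, `Result`, the writer and gate signatures, the default of three attempts), and the escalation policy is an assumption: this sketch moves to the next, stronger writer after every failure, whereas the real system escalates only "if needed".

```python
from dataclasses import dataclass
from typing import Callable

# Assumed signatures, for illustration only.
Writer = Callable[[str, list], str]  # (question, feedback) -> drafted answer
Gate = Callable[[str], list]         # draft -> failure messages, empty if it passes


@dataclass
class Result:
    answer: str
    low_confidence: bool  # True when the draft never passed the gate
    attempts: int


def answer_with_gate(question: str, writers: list, gate: Gate,
                     max_attempts: int = 3) -> Result:
    """Draft, verify, and retry with feedback; badge the answer if it never passes.

    `writers` is ordered from weakest to strongest model.
    """
    feedback: list = []
    draft = ""
    for attempt in range(1, max_attempts + 1):
        # Escalate one step per failed attempt, staying on the strongest writer.
        writer = writers[min(attempt - 1, len(writers) - 1)]
        draft = writer(question, feedback)
        feedback = gate(draft)
        if not feedback:
            return Result(draft, low_confidence=False, attempts=attempt)
    # Retries exhausted: surface the draft, but flagged rather than as fact.
    return Result(draft, low_confidence=True, attempts=max_attempts)
```

The bound on attempts is what guarantees the loop terminates, and the `low_confidence` flag is what keeps an unverified answer from looking like a verified one.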
Why this is the moat
Atom conservation is a law of nature, not a style preference. Encoding it as a hard gate means Cheemly's floor is set by chemistry, not by how confident the model sounds. That is the difference between a chemistry tool and a chatbot that talks about chemistry.
Frequently asked questions
- Does the Critic Gate use AI?
- No. The Critic Gate is deterministic code — RDKit parsing, atom-balance arithmetic, OPSIN round-trips, and citation lookups. It has no language model and no randomness, so its verdicts are reproducible and auditable.
- What happens when an answer fails the Critic Gate?
- The answer is sent back to the writer with the specific failure as feedback and may be escalated to a stronger model. After a bounded number of retries, anything still failing is shown with a low-confidence badge instead of being presented as fact.
- Why can’t ChatGPT just do this?
- General LLMs generate chemistry probabilistically and have no built-in ground-truth check, so they can ship plausible but wrong structures and citations. A deterministic verification gate is an architectural choice, not a prompt you can add.