Vibe check
Also known as: Vibe checking, Vibe-based evaluation
Definition
Eyeballing a handful of AI outputs and concluding the model 'seems fine.' It is the dominant industry practice for AI quality assessment, and the practice that systematic evaluation replaces.
Lovelaice did not coin the term, but the framing of vibe checking as the central methodological problem in shipping AI is Lovelaice's. A vibe check is an unstructured, intuition-based review of a small number of AI outputs that ends in a verdict like 'looks good.' Teams reach for it when nobody has yet documented what 'good' actually means for the product, the user, and the domain. It is also the implicit methodology behind a generic LLM-as-judge prompted with 'is this helpful?': automation that codifies a vibe, not a standard.
Origin
The framing behind the term
The term predates Lovelaice. The framing is Lovelaice's: a vibe check is the telltale sign of an evaluation system that hasn't yet earned the right to automate.
Why it matters
- It doesn't scale. Reviewing 5 outputs by gut feel can't tell you what happens at 5,000.
- It's inconsistent across team members. Two PMs reviewing the same response disagree more often than they agree.
- It misses edge cases entirely. Vibe checks happen on happy paths; failures live in the long tail.
- It produces false confidence. A generic LLM-as-judge running on vibe criteria achieves only 60-70% agreement with human evaluators — barely better than random for anything beyond surface formatting.
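Raw agreement with human labels can look respectable while carrying little information, because some agreement happens by chance, especially when most outputs share one label. Assuming the 60-70% figure above refers to raw percent agreement, the minimal Python sketch below compares percent agreement with Cohen's kappa, a standard chance-corrected agreement statistic. The labels are invented for illustration, not measured results, and the "judge" is a hypothetical one that approves every output.

```python
from collections import Counter


def percent_agreement(a, b):
    """Fraction of items on which two raters assign the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Chance-corrected agreement: (p_o - p_e) / (1 - p_e)."""
    p_o = percent_agreement(a, b)
    n = len(a)
    counts_a, counts_b = Counter(a), Counter(b)
    # Expected agreement if each rater labeled independently at their own base rates.
    p_e = sum(counts_a[label] * counts_b[label] for label in set(a) | set(b)) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label throughout; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


# Illustrative data: humans pass 7 of 10 outputs; the hypothetical judge passes all 10.
human = ["pass"] * 7 + ["fail"] * 3
judge = ["pass"] * 10

print(f"percent agreement: {percent_agreement(human, judge):.0%}")  # percent agreement: 70%
print(f"Cohen's kappa: {cohens_kappa(human, judge):.2f}")           # Cohen's kappa: 0.00
```

Here the judge matches the humans on 70% of items yet scores a kappa of 0.00: its agreement is exactly what its base rate would produce by chance. Measuring a chance-corrected statistic like kappa against a human-labeled set is one way to check whether a judge has earned trust before it is automated.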
Industry prevalence
In roughly 90% of teams shipping AI features, quality evaluation is manual and vibe-based — no structured criteria, no systematic testing, no grounding in actually observed failures.
What replaces it
Structured evaluation: deterministic checks for measurable criteria (format, length, allowed values, required fields) and a surgical, validated LLM-as-judge for the subjective dimensions that genuinely require human-like judgment. See the Evaluation Ladder.
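The deterministic half of that split is ordinary code. The minimal Python sketch below assumes a hypothetical classifier that must return a JSON object with a `category` drawn from a fixed set and an `answer` of at most 500 characters; the field names, allowed values, and limit are illustrative assumptions, not part of any Lovelaice specification. Each check maps to one of the measurable criteria named above: format, required fields, allowed values, and length.

```python
import json

# Hypothetical criteria for an imagined support-ticket classifier;
# substitute the criteria your own product documents.
REQUIRED_FIELDS = ("category", "answer")
ALLOWED_CATEGORIES = {"billing", "technical", "other"}
MAX_ANSWER_CHARS = 500


def deterministic_checks(raw_output: str) -> list[str]:
    """Return the list of failed criteria; an empty list means every check passed."""
    # Format: the output must parse as a JSON object.
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return ["format: not valid JSON"]
    if not isinstance(data, dict):
        return ["format: expected a JSON object"]

    failures = []
    # Required fields: every listed key must be present.
    for field in REQUIRED_FIELDS:
        if field not in data:
            failures.append(f"required field missing: {field}")

    # Allowed values: category must be a string from a closed set.
    if "category" in data:
        category = data["category"]
        if not isinstance(category, str) or category not in ALLOWED_CATEGORIES:
            failures.append(f"category not allowed: {category!r}")

    # Length: answer must be a string no longer than the cap.
    if "answer" in data:
        answer = data["answer"]
        if not isinstance(answer, str):
            failures.append("answer is not a string")
        elif len(answer) > MAX_ANSWER_CHARS:
            failures.append(f"answer too long: {len(answer)} > {MAX_ANSWER_CHARS} chars")

    return failures


if __name__ == "__main__":
    samples = [
        '{"category": "billing", "answer": "Your refund has been issued."}',
        '{"category": "sales", "answer": "Sure!"}',
        "Sure, here is your answer.",
    ]
    for sample in samples:
        print(deterministic_checks(sample))
    # []
    # ["category not allowed: 'sales'"]
    # ['format: not valid JSON']
```

Because checks like these are deterministic, they can run on every output in a large test set at negligible cost and always return the same verdict for the same output, which leaves the LLM-as-judge for the criteria code cannot express.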
Source
Developed in "Why Your AI Evaluation Is Lying to You."