Week 4 AI Showdown: Which Platform Wrote the Best Prompt Post?
Topic: Teaching AI Your Brand Voice in Five Examples
Week: Week 4
Rubric version: v2.0
Platforms compared: ChatGPT, Gemini, Claude
Winner: Statistical Tie — ChatGPT (79.8) and Claude (79.5)
Third place: Gemini (65.8 / 100)
Margin: 0.3 points (within 3.0-point tie threshold)
Tags: ai-comparison, prompt-engineering, chatgpt-vs-claude-vs-gemini, weekly-showdown, ai-quality, rubric, brand-voice, few-shot-prompting, week-4
Categories: AI Comparison, Prompt Engineering
Estimated reading time: 8-10 minutes
SEO title: Week 4 AI Showdown: ChatGPT vs Claude vs Gemini on Brand Voice Prompts
SEO description: ChatGPT and Claude tie in our 7-dimension scoring of brand voice prompts. See the evidence, the scores, and what each platform does best.
Every week, Ketelsen.ai runs the same prompt topic through three of the biggest AI platforms on the planet — ChatGPT, Gemini, and Claude — and publishes all three results side by side. Same topic, same template, same rules. The only variable is the AI doing the thinking. This is Week 4, and the topic is one that every entrepreneur and content creator has struggled with: getting AI to actually sound like your brand instead of producing polished but generic copy. All three platforms took their shot at teaching readers how to use few-shot prompting to transfer brand voice into AI-generated content — and for the first time in this series, the race ended in a dead heat at the top.
The Topic: Teaching AI Your Brand Voice in Five Examples
This week's topic asked each AI platform to produce a complete blog post featuring three prompt variations — Beginner, Intermediate, and Advanced — all built around the same idea: using few-shot prompting to teach AI your brand voice by showing it real examples of your writing. The topic matters because most AI-generated content fails not on accuracy but on voice — it sounds competent but anonymous, and few-shot prompting is the most reliable fix available today.
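To make the technique concrete before the scores, here is a minimal sketch of what a few-shot brand voice prompt looks like when assembled programmatically. The sample texts and the `build_voice_prompt` helper are hypothetical, not taken from any of the three posts; the structure (instructions, then labeled voice examples, then the new task) is the part that transfers.

```python
# Minimal sketch: assembling a few-shot brand voice prompt.
# The brand samples and helper name are hypothetical; the structure is the point.

BRAND_SAMPLES = [
    "We roast in small batches because flavor doesn't scale on a schedule.",
    "Your inbox has enough noise. We only write when the beans are worth it.",
    "Yes, we taste-test at 6 a.m. No, we are not sorry.",
]

def build_voice_prompt(samples: list[str], task: str) -> str:
    """Builds a few-shot prompt: show the voice, then state the task."""
    shots = "\n".join(f"Example {i + 1}:\n{s}\n" for i, s in enumerate(samples))
    return (
        "You are a copywriter. Match the voice of the examples below:\n"
        "their tone, rhythm, and vocabulary, not generic marketing copy.\n\n"
        f"{shots}\n"
        f"Task: {task}\n"
        "Write in the same voice as the examples."
    )

print(build_voice_prompt(BRAND_SAMPLES, "Announce our new decaf blend in two sentences."))
```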
How We Score: The 7-Dimension Quality Rubric
We do not pick winners by gut feeling. Every comparison post on Ketelsen.ai uses a structured rubric that scores each platform across seven dimensions, each weighted by how much it matters to you — the reader. The dimensions are not abstract quality labels; they measure whether the post actually does its job. Can you copy the prompt and use it today? Does the breakdown teach you something transferable? Are the industry examples specific enough that someone in that field would recognize their own problem? Those are the questions behind the numbers.
Each dimension is scored on a 1-to-10 scale with specific anchor descriptions at every level, so a "7" means the same thing whether we are scoring ChatGPT or Claude. The dimension scores are then weighted by importance and normalized to a 0-to-100 overall score. If two platforms land within 3 points of each other, we call it a statistical tie and explain the trade-offs instead of forcing a winner. The rubric is version 2.0 — it will continue to evolve as we discover new ways to measure what makes one AI post genuinely better than another.
| Dimension | Weight | What It Measures |
|---|---|---|
| D1: Prompt Quality | 20% | Are the prompts well-engineered, genuinely usable, and clearly differentiated across difficulty levels? |
| D2: Breakdown Clarity | 15% | Does the prompt breakdown teach you WHY each part matters, not just what it says? |
| D3: Industry Examples | 15% | Are the practical examples specific enough that a professional in that field would say "that is exactly my problem"? |
| D4: Writing Quality | 15% | Does the writing match the Ketelsen.ai voice — fun, informative, accessible, and publication-ready? |
| D5: Creative Use Cases | 10% | Do the suggested use cases go beyond the obvious — would the reader think "I never would have considered that"? |
| D6: Actionability | 15% | Can the reader immediately USE what they read — tools, checklists, workflows they walk away with? |
| D7: Completeness | 10% | Does the post cover all template sections fully without filler or thin spots? |
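The arithmetic behind the overall score is simple enough to sketch. Here is a minimal version using the weights above and this week's ChatGPT and Claude dimension scores (the function names are ours, not part of the published rubric):

```python
# Sketch of the rubric arithmetic: weighted 1-10 dimension scores,
# normalized to 0-100, with a 3-point statistical tie threshold.

WEIGHTS = {"D1": 0.20, "D2": 0.15, "D3": 0.15, "D4": 0.15,
           "D5": 0.10, "D6": 0.15, "D7": 0.10}

def overall(scores: dict[str, float]) -> float:
    """Weighted sum of 1-10 scores, scaled to 0-100."""
    return 10 * sum(WEIGHTS[d] * scores[d] for d in WEIGHTS)

chatgpt = {"D1": 8.0, "D2": 8.5, "D3": 7.5, "D4": 7.5, "D5": 8.0, "D6": 8.0, "D7": 8.5}
claude  = {"D1": 8.5, "D2": 7.5, "D3": 8.0, "D4": 8.0, "D5": 6.5, "D6": 8.5, "D7": 8.0}

a, b = overall(chatgpt), overall(claude)
print(a, b)               # 79.75 and 79.5, reported as 79.8 and 79.5
print(abs(a - b) <= 3.0)  # True: within the threshold, so a statistical tie
```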
Platform-by-Platform Breakdown
ChatGPT: 79.8 / 100
Strengths
ChatGPT delivered the most detailed post of the three at 100,431 characters, roughly 60% longer than Gemini's output. The standout is in prompt breakdown clarity, where ChatGPT leads all three platforms. Every breakdown segment follows a consistent and highly effective pattern: quote the specific line, explain what it does, describe what goes wrong if you remove it, and close with a transferable principle the reader can apply to any future prompt. The result reads like a prompt engineering course embedded inside a blog post. One representative example: "Transferable principle: when style matters, show the pattern instead of describing it abstractly." That kind of line teaches a mental model, not just a technique.
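The pattern is regular enough to capture as a template. A hypothetical sketch of the four-part structure, filled in with the quoted principle above (the quoted prompt line and phrasings are illustrative, not lifted from ChatGPT's post):

```python
from dataclasses import dataclass

# Hypothetical representation of the four-part breakdown pattern.
@dataclass
class BreakdownSegment:
    quoted_line: str   # the exact prompt line being explained
    what_it_does: str  # its function in the prompt
    if_removed: str    # the failure mode without it
    principle: str     # the transferable lesson

segment = BreakdownSegment(
    quoted_line="Match the tone and rhythm of the examples below.",
    what_it_does="Directs the model to imitate style, not just content.",
    if_removed="Output drifts back to polished but generic copy.",
    principle="When style matters, show the pattern instead of describing it abstractly.",
)
```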
The creative use cases are the strongest of the three platforms. ChatGPT's memorial and tribute writing suggestion — using few-shot prompting to help someone draft a remembrance in the voice of a loved one — is the single most emotionally resonant and unexpected application across all nine variations this week. The post also includes the most complete template adherence: 12 industry examples, 4 per variation, each with exact input format, expected output description, and why-it-matters reasoning. Nothing feels like filler.
Weaknesses
The writing voice is clean and professional but slightly less distinctive than Claude's elegance or Gemini's energy. ChatGPT reads like a capable consultant's memo — competent, thorough, trustworthy — but it does not have the personality signature that would make a reader say "that sounds like Ketelsen.ai" without seeing the logo. The Beginner variation's prompt, while well-engineered, is relatively long for a true beginner and might benefit from a shorter quick-start option.
Signature Move
ChatGPT treats prompt breakdown as a teaching discipline — every segment ends with a transferable principle that makes the reader a better prompt engineer, not just a better user of this one prompt.
Claude: 79.5 / 100
Strengths
Claude delivered the most sophisticated prompt engineering of the three platforms. The Advanced variation implements a five-step workflow (source analysis, voice modeling, assignment fit check, content generation, and editorial review with a quantified voice match score) that turns brand voice replication into a repeatable professional process rather than a one-shot experiment. The self-correction loop in the pro tips section, where readers paste the AI's draft back in and ask it to compare against the original examples, is the single most actionable technique across all three posts. Claude also leads in industry example quality: the independent coffee roaster scenario, with its specific voice description of "warm, nerdy about beans, slightly irreverent" and sample line, is vivid enough that a small business owner would immediately see their own brand in the example.
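The self-correction loop is easy to sketch. Assuming a generic `ask_llm` helper (hypothetical; substitute whatever client you actually use), the round trip looks like this:

```python
# Sketch of the self-correction loop: generate a draft, then ask the model
# to audit that draft against the original voice examples and revise.
# `ask_llm` is a hypothetical stand-in for any chat-completion client.

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("wire up your preferred LLM client here")

def self_correct(voice_examples: str, draft_prompt: str) -> str:
    """One pass of generate, audit against examples, and rewrite."""
    draft = ask_llm(draft_prompt)
    audit_prompt = (
        "Compare the draft below against the original voice examples.\n"
        "Score the voice match from 1 to 10, list the biggest mismatches,\n"
        "then rewrite the draft to close the gap.\n\n"
        f"Voice examples:\n{voice_examples}\n\nDraft:\n{draft}"
    )
    return ask_llm(audit_prompt)
```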
The writing voice is the most refined of the three — elegant, confident, and distinctly publication-ready. Claude opens with a sharp problem statement and sustains that energy without filler across 87,993 characters. The difficulty progression from Beginner to Advanced feels like a genuine escalation in capability, not just longer prompts with more words.
Weaknesses
The creative use cases are the least surprising of the three platforms. Journal consistency, volunteer newsletters, ghostwriter reference material, and onboarding documentation are all logical applications of brand voice prompting, but none of them make the reader rethink what the prompt could do. Where ChatGPT's memorial writing suggestion reframes the emotional potential of few-shot prompting, Claude's suggestions stay within the expected professional orbit.
Signature Move
Claude treats brand voice replication as a multi-stage engineering discipline — and the self-auditing workflow with quantified scoring makes the Advanced variation feel like a professional methodology, not just a prompt.
Gemini: 65.8 / 100
Strengths
Gemini brings the most distinctive personality of the three platforms. The writing opens with high energy and an immediate emotional hook, and sustains that conversational momentum throughout. The brand voice is the most fun to read — Gemini writes like someone who genuinely enjoys explaining things, not someone completing a template. The creative use cases include the most contemporary and personality-driven suggestion of the week: training AI to write a dating app bio by feeding it messages from a group chat, then asking for "a witty 50-word profile that captures your exact sense of humor." That is the kind of use case that makes a non-technical reader laugh and immediately want to try it.
Gemini also ties Claude for the lead in writing quality (both scored 8.0), with a voice that feels naturally engaging rather than carefully constructed.
Weaknesses
At 62,021 characters, Gemini's post is the shortest by a significant margin, roughly 70% of Claude's length and 62% of ChatGPT's. That size difference shows up in the scoring. The prompt breakdowns are the weakest of the three: they identify what each part of the prompt does but rarely explain what goes wrong if a part is removed or teach a transferable principle the reader can apply elsewhere. The difficulty progression across variations is harder to distinguish; the engineering escalation from Beginner to Advanced is not as structurally visible as ChatGPT's or Claude's. The overall impression is of a post with genuinely engaging writing and creative ideas that are not developed with the same depth as the other two platforms' posts.
Signature Move
Gemini has the best instinct for making AI concepts feel approachable and fun — if depth matched personality, this would have been a three-way tie.
The Verdict
For the first time in this series, we have a statistical tie. ChatGPT finishes at 79.8 and Claude at 79.5 — a margin of 0.3 points on a 100-point scale, well within the 3-point tie threshold. These two platforms arrived at the same overall quality from different directions. ChatGPT wins on breakdown clarity (8.5 vs 7.5), creative use cases (8.0 vs 6.5), and template completeness (8.5 vs 8.0). Claude wins on prompt engineering depth (8.5 vs 8.0), industry example quality (8.0 vs 7.5), and actionability (8.5 vs 8.0). Neither platform is clearly better — they are differently excellent, and which one serves you best depends on what you value most.
Gemini finishes third at 65.8, held back primarily by shallow prompt breakdowns and less visible difficulty progression across variations. But it would be a mistake to dismiss the Gemini version — it has the most engaging writing voice and the most contemporary creative suggestions of the three. If you want to be entertained while learning, Gemini is the place to start. Each platform brought something the others did not, and this week's results suggest that the gap between top-tier AI content platforms is narrowing.
What This Means for You
If you want the deepest education on how prompts work and a transferable framework for thinking about any future prompt you write, start with the ChatGPT version — its breakdown sections are a masterclass in prompt reasoning. If you want the most sophisticated prompt engineering and a self-auditing workflow you can use as a repeatable professional process, go with Claude — especially the Advanced variation. If you want the fastest, most enjoyable read and creative inspiration for unexpected applications of brand voice prompting, Gemini is your entry point. All three posts are published on Ketelsen.ai — read the one that matches your learning style, or read all three and decide for yourself which platform sounds most like your brand.
Score Summary
| Dimension | Weight | ChatGPT | Claude | Gemini |
|---|---|---|---|---|
| D1: Prompt Quality | 20% | 8.0 | 8.5 | 6.0 |
| D2: Breakdown Clarity | 15% | 8.5 | 7.5 | 5.0 |
| D3: Industry Examples | 15% | 7.5 | 8.0 | 6.0 |
| D4: Writing Quality | 15% | 7.5 | 8.0 | 8.0 |
| D5: Creative Use Cases | 10% | 8.0 | 6.5 | 7.0 |
| D6: Actionability | 15% | 8.0 | 8.5 | 7.5 |
| D7: Completeness | 10% | 8.5 | 8.0 | 7.0 |
| OVERALL SCORE (0-100) | 100% | 79.8 | 79.5 | 65.8 |
Source: Rubric scoring data (v2.0, 1-10 scale, weighted by dimension importance)
Visual Comparison
[Per-dimension score charts: ChatGPT, Claude, Gemini]
Methodology Note
This comparison uses Rubric v2.0 — the second version of the Ketelsen.ai cross-platform scoring system, featuring a 1-to-10 scale (anchors at 2/4/6/8/10, odd scores for intermediate quality), a pre-scoring calibration step that reads all three posts side by side before assigning any scores, and a statistical tie threshold of 3 points. The seven dimensions and their weights reflect our best current judgment about what makes a prompt-focused blog post genuinely useful to a reader. We expect the rubric to continue evolving. If you have opinions about what should be measured differently, we want to hear them — the whole point of publishing the methodology is to make it better.
Every score in this post is backed by specific evidence cited from the original blog posts. All three Week 4 posts are published on Ketelsen.ai so you can read them yourself, apply your own criteria, and decide whether you agree with our call. Transparency is the point. If the rubric is good, the verdict should be obvious to anyone who reads all three posts. If it is not obvious, the rubric needs work — and we will fix it.