Week 1 AI Showdown: Which Platform Wrote the Best Prompt?

  • Topic: Should I Buy a Car Right Now?

    Week: Week 1

    Rubric version: v1.0

    Platforms compared: ChatGPT, Gemini, Claude

    Winner: Claude (98.0 / 100)

    Runner-up: ChatGPT (85.0 / 100)

    Third place: Gemini (62.0 / 100)

    Margin of victory: 13.0 points

    Tags: ai-comparison, prompt-engineering, chatgpt-vs-claude-vs-gemini, weekly-showdown, ai-quality, rubric, week-1, car-buying, should-i-buy-a-car

    Categories: AI Comparison, Prompt Engineering

    Estimated reading time: 12 minutes

    SEO title: Week 1 AI Showdown: Claude vs. ChatGPT vs. Gemini — Who Wrote the Best Car-Buying Prompt Post?

    SEO description: We gave ChatGPT, Claude, and Gemini the same car-buying prompt topic and scored them across 7 dimensions. See which AI wrote the most useful, detailed, and actionable blog post.

Which Platform Wrote the Best Car-Buying Prompt Post?

We gave ChatGPT, Claude, and Gemini the exact same brief: create a practical, multi-variation blog post teaching people how to use AI to decide whether they should buy a car right now. All three platforms received identical prompts, identical context, and identical quality expectations. Then we scored each post against a rigorous 7-dimension rubric designed to measure prompt quality, teaching depth, real-world examples, writing quality, creativity, actionability, and completeness. The results? A clear winner emerged — and some surprising lessons about how differently each AI approaches the same creative challenge.

The Topic: Should I Buy a Car Right Now?

Car-buying decisions are complex: financial readiness, insurance costs, depreciation, repair risk, lifestyle changes, and market timing all collide at once. We asked each platform to create a blog post with three prompt variations (beginner, intermediate, advanced) that teach non-technical professionals how to use AI assistants to work through this decision systematically. The winner would be the post that produced the clearest, most transferable, most actionable prompts — not just answers, but a framework readers could adapt to any major decision.

How We Score: The 7-Dimension Quality Rubric

We built a seven-dimension rubric to measure what makes a prompt post genuinely useful. Each dimension targets a specific aspect of quality: the prompts themselves, the teaching value of explaining how they work, the depth of industry examples, the quality of writing, the originality of creative use cases, the actionability of adaptation tips and follow-up prompts, and finally whether all required template sections are present and complete. Each dimension is scored on a 1-5 scale and weighted according to its importance for readers trying to learn and apply prompt engineering.

Our methodology prioritizes evidence over intuition. Every score reflects specific observations from the actual posts — quoted examples, structural analysis, and comparison of what each platform delivered. The rubric itself is versioned (v1.0) and will be refined in future weeks based on what we learn.

Dimension     | Weight | What It Measures
D1 Prompts    | 20%    | Quality, clarity, structure, and usability of the actual AI prompts
D2 Breakdown  | 15%    | How well prompt breakdowns teach transferable principles
D3 Examples   | 15%    | Depth, specificity, and variety of practical industry examples
D4 Writing    | 15%    | Quality of prose, tone, engagement, and reader experience
D5 Creative   | 10%    | Innovation and range of creative use case ideas
D6 Action     | 15%    | Actionable value: pro tips, adaptability tips, follow-up prompts
D7 Complete   | 10%    | Completeness of all required template sections
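The rubric math above can be sketched as a small scoring function. This is an illustration of how the weights and 1–5 scores combine into a 0–100 result, not the authors' actual tooling; the `WEIGHTS` dictionary and `overall_score` function are names chosen for this sketch.

```python
# Sketch of the rubric math: each dimension is scored 1-5, the weights
# sum to 100%, and the weighted average is rescaled to a 0-100 scale.
WEIGHTS = {
    "D1 Prompts": 0.20,
    "D2 Breakdown": 0.15,
    "D3 Examples": 0.15,
    "D4 Writing": 0.15,
    "D5 Creative": 0.10,
    "D6 Action": 0.15,
    "D7 Complete": 0.10,
}

def overall_score(scores):
    """Weighted average of 1-5 dimension scores, rescaled to 0-100."""
    weighted = sum(WEIGHTS[dim] * score for dim, score in scores.items())
    return round(weighted / 5 * 100, 1)

claude = {"D1 Prompts": 5, "D2 Breakdown": 5, "D3 Examples": 5,
          "D4 Writing": 5, "D5 Creative": 5, "D6 Action": 5,
          "D7 Complete": 4}
print(overall_score(claude))  # → 98.0
```

Plugging in the dimension scores from the summary table reproduces all three headline numbers (98.0, 85.0, 62.0), which is a quick sanity check that the weights and scores are internally consistent.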

Platform-by-Platform Breakdown

Claude: 98.0 / 100

Strengths

Claude's post stands out immediately for the sophistication of its prompt engineering. The beginner variation alone includes a comprehensive input scaffolding with 14 distinct data points, each chosen to prevent a specific type of bad financial advice. This isn't just a conversational prompt; it's an architecture. The intermediate prompt specifies three numbered deliverables, a transparency requirement, and a sanity-check instruction — the kind of structural precision that transforms a chatbot into a genuine decision tool rather than a casual advisory service.

The prompt breakdowns are exceptional teaching documents. Every breakdown segment ends with a labeled "Transferable principle" that extracts the deeper concept. For example, Claude's breakdown of the line "Do not assume I can afford a car just because I want one" explicitly names bias correction as a prompt engineering technique — a teaching insight neither ChatGPT nor Gemini surfaced. Readers don't just learn what the prompts do; they learn why the technique matters and how to apply it to any decision-making prompt they write in the future.

The industry examples are extraordinarily detailed and reveal a sophisticated understanding of financial constraints. The ER nurse example includes exact dollar amounts ($5,900 take-home, $3,200 in repairs, $8,500 savings), calculates the 10% payment ceiling ($590), flags that a 20% down payment would leave only $2,800 in reserves, and recommends a specific alternative (certified pre-owned in the $22,000–$25,000 range). The real estate agent example models seasonal income variability ($4,200 low vs. $12,000 peak), demonstrating that the payment consumes 17.6% of take-home during slow months. These aren't generic examples; they're miniature financial plans.

Creative use cases are the most innovative across all platforms. Claude identified 8 use cases for the beginner variation alone, including salary negotiation (quantifying commute cost), lease-end decision framework, insurance claim vehicle replacement, and gap year vehicle decisions — applications that require genuine lateral thinking and that no other platform considered. The post demonstrates that the prompt's value extends far beyond the original use case.

The adaptability section is detailed and practical. Claude provided before/after examples showing exactly how to modify prompts for EV buyers, high-cost-of-living markets, negative equity trade-ins, side hustle vehicles, and non-US buyers — with specific modification text the reader can paste directly. At 99KB, Claude's post is the most comprehensive by a significant margin, reflecting depth across every dimension.

Weaknesses

Length is a double-edged sword. At 99KB, the post requires significant reading commitment, which could reduce completion rates for casual readers who simply want "the prompt" without the tutorial apparatus. Some sections, while individually excellent, could benefit from tighter editing. The density of detail, while a strength for serious readers, may feel overwhelming to someone looking for a quick reference.

The advanced variation's four-deliverable structure, while intellectually impressive, may feel intimidating even to the advanced audience it targets. Some readers might read the prompt and think, "That's too complex for what I need," which could limit the post's practical reach.

Signature Move

Claude treats every prompt like a financial instrument — engineering precision and risk controls into the language itself, building not just prompts but complete decision architectures.


ChatGPT: 85.0 / 100

Strengths

ChatGPT's post is the most complete in terms of required template sections. Every section from the blog post template is present and populated with substance. The practical examples are strong and specific: the tech startup PM, freelance consultant, small retail business owner, and dual-income family scenarios all include concrete financial details and realistic outcomes. The dual-income family example (combined $9,800/month, 2015 Accord with 168K miles, totaled Outback, $14,500 insurance payout) is particularly well-constructed because it mirrors real family conversations.

The teaching quality in breakdowns is solid and accessible. ChatGPT identifies concepts like "anti-hallucination engine" (for the follow-up question instruction) and "bias correction" (for the "do not assume I can afford" line). The breakdowns are shorter than Claude's but still educational. The post includes three well-designed SVG charts that effectively visualize payment vs. all-in cost, 5-year total cost of ownership breakdown, and credit tier impact on financing — visual aids that help readers understand the concepts without reading dense text.

ChatGPT demonstrated attention to source accuracy. The post corrected an insurance figure cited to J.D. Power from $772 to $758, showing that someone reviewed citations for accuracy. The overall post reads like a polished magazine article — every section flows naturally into the next, and the tone balances expertise with accessibility.

Weaknesses

The prompts, while solid, lack the structural sophistication of Claude's. The beginner prompt uses an open-ended conversational approach ("ask me follow-up questions one at a time") rather than Claude's structured input scaffolding, which means the AI interaction is less predictable and the results more variable. ChatGPT's approach works, but it hands more responsibility to the AI to infer what you need.

Prompt breakdowns teach well but don't always extract the deeper principle that makes the technique transferable. ChatGPT tends to explain what each line does within the context of car buying rather than teaching why the technique matters for any prompt. The creative use cases are good but more conventional — couples' decision framework and repair-vs-replace are useful but represent ideas a reader might generate independently.

Signature Move

ChatGPT is the best balancer — every section is solid, no section is weak, and the overall reading experience feels like a magazine article from a trusted publication.


Gemini: 62.0 / 100

Strengths

Gemini's beginner prompt includes an interesting constraint: the "strict, fiduciary financial advisor" role-setting creates a specific behavioral expectation that shapes how the AI responds. This is a clean technique for anchoring the AI's persona. The "exactly 3 follow-up questions" instruction is also practical — it prevents the AI from asking an unpredictable number of questions, which directly improves usability. The industry examples are competent and cover a good range (tech startup, retail bakery, freelance consultant, hospital worker). Gemini also attempted a decision tree SVG chart that neither other platform created, showing creative thinking about visual presentation.

Weaknesses

Prompts are notably shorter and less detailed than both competitors. The beginner prompt has fewer input fields and less structural scaffolding, which means the AI has less to work with and results will be more variable. Prompt breakdowns are the most concise of the three platforms, offering principle labels but less depth in explanation. Where Claude writes 4–5 sentences per breakdown segment, Gemini typically writes 2–3. The post used "NOT APPLICABLE" for some citation fields, which reduces credibility and reference value.

At 51KB, the post is roughly half the size of Claude's and 60% of ChatGPT's, reflecting less depth across all sections. The adaptability section is brief — a single paragraph about adapting the framework to other decisions, without the specific prompt modification examples that Claude and ChatGPT provide. Creative use cases are fewer and less innovative. The post delivers core value efficiently, which could appeal to time-pressed readers, but loses points for depth and teaching impact.

Signature Move

Gemini is the most efficient communicator — it delivers the core value in less space, which works for readers who want the prompt without the tutorial.


The Verdict

Claude wins with 98.0 / 100 — a 13-point margin over the runner-up. Claude's victory is decisive across nearly every dimension. It scored 5 out of 5 on D1 (Prompts), D2 (Breakdown), D3 (Examples), D4 (Writing), D5 (Creative), and D6 (Action) — a perfect sweep on six of the seven dimensions. The only imperfection is D7 (Completeness) at 4/5, a minor mark for length management that doesn't undermine the overall excellence. The post demonstrates that sophisticated prompt engineering isn't about complexity for its own sake; it's about precision, teaching transferable principles, and giving readers the confidence to adapt prompts to their own decisions.

That said, ChatGPT's 85.0 score is respectable and reflects genuine strengths. The post is the most professionally polished, includes verified citations, and delivers charts that help visual learners. For readers who prefer a magazine-article experience, ChatGPT's post may feel more accessible. Gemini's 62.0 score reflects solid fundamentals but less depth — it's a credible introduction to the topic, not a comprehensive resource.

What This Means for You

If you're learning to write prompts that produce reliable, consistent results, Claude's post is the gold standard to study. Pay special attention to how Claude uses input scaffolding — the practice of asking the AI to collect specific information before answering — and how each breakdown ends with a transferable principle you can apply to any prompt. If you prefer a faster read that covers the essentials without diving deep into prompt architecture, ChatGPT's post delivers solid practical value in a polished format. If you're short on time and want just the core prompts without the tutorial, Gemini's efficiency-focused approach gets you to the resource quickly. All three posts are published on Ketelsen.ai.


Score Summary

Dimension             | Weight | Claude | ChatGPT | Gemini
D1 Prompts            | 20%    | 5      | 4       | 3
D2 Breakdown          | 15%    | 5      | 4       | 3
D3 Examples           | 15%    | 5      | 5       | 3
D4 Writing            | 15%    | 5      | 4       | 3
D5 Creative           | 10%    | 5      | 4       | 3
D6 Action             | 15%    | 5      | 4       | 3
D7 Complete           | 10%    | 4      | 5       | 4
OVERALL SCORE (0-100) |        | 98.0   | 85.0    | 62.0

Visual Comparison

[Per-platform bar charts of the seven dimension scores — Claude: 5, 5, 5, 5, 5, 5, 4; ChatGPT: 4, 4, 5, 4, 4, 4, 5; Gemini: 3, 3, 3, 3, 3, 3, 4 — matching the Score Summary table.]

The Prompts Behind the Posts

All three platforms received the exact same prompts in the exact same order across four phases. This standardization ensures the comparison measures AI quality, not different instructions. Phase 5 (visual asset prompts) is excluded from this section because image generation prompts were not delivered identically to all platforms, and so they wouldn't provide a fair comparison point.
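The standardization described here can be sketched in a few lines: each platform runs one continuous session and receives the identical prompt sequence in the identical order. This is an illustrative sketch, not the authors' actual harness; `run_showdown`, the client objects, and their `send` method are hypothetical stand-ins for each vendor's real API.

```python
# Sketch of the cross-platform standardization: same prompts, same
# order, one continuous session per platform. Real clients would wrap
# each vendor's API; EchoClient is a stand-in stub for demonstration.

def run_showdown(clients, prompts):
    """Return each platform's transcript for the identical prompt sequence."""
    transcripts = {}
    for name, client in clients.items():
        # Later prompts build on earlier responses within a session,
        # so the order must be identical for every platform.
        transcripts[name] = [client.send(p) for p in prompts]
    return transcripts

class EchoClient:
    """Stub client that echoes prompts; replace with a real API wrapper."""
    def send(self, prompt):
        return f"reply: {prompt}"

demo = run_showdown({"Claude": EchoClient(), "ChatGPT": EchoClient()},
                    ["Phase 1: session setup", "Phase 2: blog post generation"])
print(demo["Claude"])  # → ['reply: Phase 1: session setup', 'reply: Phase 2: blog post generation']
```

Because every platform sees the same inputs, any difference in the transcripts reflects the model, not the instructions — which is exactly the property the comparison depends on.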

Prompt 1 of 4 — Session Setup (Phase 1)

Purpose: Establishes Richard's background, site purpose, target audience, and this week's topic context before any content generation begins. Ensures the AI understands the brand voice, research data, and quality expectations.

"You are an AI assistant helping Richard Ketelsen create content for Ketelsen.ai, a weekly blog exploring how to prompt AI assistants effectively for real-world decisions. This week's topic is 'Should I Buy a Car Right Now?' — a prompt that helps non-technical professionals work through a complex financial decision using structured AI assistance. The target audience is professionals (age 30–50) exploring AI tools, not technical experts. Create a detailed blog post with three prompt variations: a simple, accessible beginner version; an intermediate version for users wanting deeper financial analysis; and an advanced version for power users seeking detailed decision architecture. Follow the blog post template exactly, include detailed prompt breakdowns that explain how each part works and why readers should remember these principles, provide real-world examples for each variation with specific financial details, and generate creative use cases showing how the prompt adapts to other decisions."

Prompt 2 of 4 — Blog Post Generation (Phase 2)

Purpose: Directs the AI to create all 3 prompt variations following the exact blog post template, with specific guidance for each variation's approach and complexity level.

"Create a complete blog post with three sections, one for each prompt variation. For each section: (1) Write the full prompt that a user would copy and paste into Claude/ChatGPT/Gemini. (2) Break down the prompt, explaining what each major segment does and why that choice matters for generating reliable output. (3) Show a specific example of how the prompt works with a real person (ER nurse, real estate agent, or similar professional). (4) List 4–6 creative use cases showing how this prompt structure adapts to other major life decisions. (5) Provide 3–5 pro tips for getting the best results from the AI. (6) Include 2–3 follow-up prompts that a user might ask after receiving the first response. Label every section clearly."

Prompt 3 of 4 — Variation Summary (Phase 3)

Purpose: Generates the blog post title and a comparative summary of all 3 variations to serve as the post introduction.

"Write a compelling blog post title (under 70 characters) that conveys the value of learning to structure prompts for financial decisions. Then write a 3–4 sentence introduction explaining why this week's topic matters: AI can help with car-buying decisions, but only if you structure the prompt correctly. Summarize what readers will learn: when to use each of the three variations, what makes each one different, and why mastering prompt structure transforms AI from a chatbot into a decision-making partner."

Prompt 4 of 4 — Content Expansion (Phase 4)

Purpose: Deepens all sections across all 3 variations — expanding industry examples, creative use cases, adaptability tips, pro tips, follow-up prompts, and FAQs to meet the depth threshold.

"Expand the blog post significantly: (1) Double the detail in every industry example by adding specific financial calculations, showing why the decision plays out differently across income levels, and recommending specific outcomes based on financial reality. (2) Expand creative use cases to 8–10 per variation, pushing beyond 'couples deciding together' to think about insurance claims, salary negotiation context, side hustle vehicle decisions, and other lateral applications. (3) Add an 'Adaptability Tips' section showing how to modify each prompt for EV buyers, high-cost-of-living markets, negative equity trade-ins, and international readers. (4) Add a 'Frequently Asked Questions' section addressing common follow-up questions like 'What if my credit is bad?' and 'Should I consider a lease instead?' (5) Deepen the pro tips section with before/after examples showing exact prompt modifications."


Methodology Note

This rubric is version 1.0 — our first systematic attempt to measure cross-platform AI quality. The seven dimensions and their weightings reflect what we think matters most for readers learning prompt engineering: quality prompts (20%), teaching (15%), practical examples (15%), writing quality (15%), creativity (10%), actionability (15%), and completeness (10%). We expect these weights to shift as we learn what matters to our audience. Future weeks may reveal that certain dimensions deserve more emphasis, or that new dimensions should be added.

The scores themselves are evidence-backed. Every rating reflects specific observations from the actual posts — quoted examples, structural analysis, and direct comparison of what each platform delivered. We published all three posts on Ketelsen.ai so readers can disagree with our assessment if they choose. The goal isn't to declare an objective winner; it's to model how to evaluate AI quality rigorously, transparently, and improvably.

Next: Gemini :: Should I Buy a Car Right Now? The AI Financial Stress Test