Week 4 AI Showdown :: Claude vs. ChatGPT vs. Gemini :: Researching Dealers and Test Driving Like a Pro


Week 4 AI Showdown: Which Platform Wrote the Best Prompt Post?

Global Introduction

Every week, Ketelsen.ai publishes the same topic—this week, "Researching Dealers and Test Driving Like a Pro"—across three AI platforms: ChatGPT, Gemini, and Claude. Each platform produces independent prompt variations, breakdowns, practical examples, and creative extensions. The question readers ask: which version should I read first? Which platform gives me the most useful guidance? To answer that, we've developed a 7-dimension rubric that scores prompt quality, clarity, practical relevance, writing voice, creative novelty, actionability, and template completeness. Today we're releasing the scores for Week 4 and explaining why the winner won.

The Topic: Researching Dealers and Test Driving Like a Pro

This week's topic covers pre-visit dealer research, reputation analysis, structured test-drive scoring, and the strategic moves that transform a passive dealership visit into a controlled diagnostic evaluation. This is Week 4 of a 7-week "AI at the Dealership" series designed to help readers use AI at every stage of car buying.

How We Score: The 7-Dimension Quality Rubric

Each week, we evaluate all three platforms against seven dimensions that matter to readers: Does the prompt work? Is the breakdown clear? Are the examples realistic and specific? Does the voice fit our audience? Are there surprising creative uses? Is the advice actionable? Is the post complete? Each dimension is scored on a 1-10 scale and weighted by importance: prompt quality (20%), breakdown clarity (15%), practical examples (15%), writing quality (15%), creative novelty (10%), actionability (15%), and template completeness (10%).

We normalize all scores to a 0-100 scale for easy comparison. A margin of 3.0 points or higher indicates a clear winner; margins below 3.0 are treated as statistical ties, which invoke an editorial tiebreak to declare a publication winner. This rubric is versioned (currently v2.0) and will be refined based on reader feedback and evolving quality benchmarks.
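To make the math concrete, here is a minimal sketch of the scoring computation in Python, using this week's dimension scores from the summary table below. The weights and the 3.0-point threshold are the real rubric values; the code itself is illustrative, not our production scoring tool:

# Illustrative sketch of the v2.0 rubric math; not our production tooling.
# Each dimension is scored 1-10, weighted, then scaled to 0-100.
WEIGHTS = {"D1": 0.20, "D2": 0.15, "D3": 0.15, "D4": 0.15,
           "D5": 0.10, "D6": 0.15, "D7": 0.10}
TIE_THRESHOLD = 3.0  # margins below this are statistical ties

def overall(scores):
    """Weighted sum of 1-10 dimension scores, normalized to 0-100."""
    return round(10 * sum(scores[d] * w for d, w in WEIGHTS.items()), 2)

claude = overall({"D1": 8.5, "D2": 8.5, "D3": 9.0, "D4": 8.0,
                  "D5": 7.5, "D6": 8.5, "D7": 9.0})  # 84.5
gemini = overall({"D1": 8.0, "D2": 8.0, "D3": 8.5, "D4": 8.5,
                  "D5": 8.0, "D6": 8.5, "D7": 8.5})  # 82.75
statistical_tie = (claude - gemini) < TIE_THRESHOLD  # True: editorial tiebreak applies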


Rubric Dimensions

Dimension Weight What It Measures
D1: Prompt Quality & Engineering Depth 20% Role clarity, role modifiers, deliverable naming, variable structure, constraint specificity, and the precision of the engineering.
D2: Prompt Breakdown Clarity 15% Clarity of the breakdown section, depth of insight, and explicit teaching of transferable principles.
D3: Practical Examples & Industry Relevance 15% Specificity of financial/domain parameters, realism, depth of example personas, and transferability of insights across industries.
D4: Writing Quality & Brand Voice 15% Brand alignment with Ketelsen.ai voice (fun, entertaining, informative), distinctive platform voice, and readability.
D5: Creative Use Cases & Unexpected Angles 10% Novelty of creative use cases, cross-domain transferability, and how well they reframe the prompt's potential.
D6: Actionability & Reader Value 15% Practical tools, checklists, templates, specific URLs, defined scoring anchors, and measurable reader improvement.
D7: Completeness & Template Adherence 10% All required sections present and substantive, citations included, tags and prerequisites complete.

Platform-by-Platform Breakdown

Claude: 84.5 / 100

Strengths

Claude's prompt engineering is exceptional. The Beginner variation opens with "Act as my plain-English car-buying coach" — pairing the role with a register modifier to control both reasoning depth and vocabulary level. The Intermediate prompt doubles the role ("senior automotive consumer advocate and research analyst with 15+ years of experience") to force three-dimensional thinking across advocacy and analysis. Claude's breakdown section contains 29 explicit transferable principles, teaching readers not just what the prompts do but why each segment works — e.g., "The number 5 is doing real work here. Without a count, the AI will produce 12 items of uneven quality." This is education-grade clarity.

Claude's practical examples are the deepest across all three platforms, with specific financial parameters embedded in every persona: the pediatric nurse gets $35,000 at 6.4%, the landscaping contractor gets $32,000 at 8.1%, and the video editor's scenario flags arriving without pre-approval as a leverage gap. The Advanced section examples escalate in complexity (a CFO with CPO luxury sedan comparisons, a dual-income couple managing a $92,000 household replacement, a wealth manager coordinating a three-vehicle fleet refresh), each with specific scoring outcomes (19/20, 17/20, 20/20) that show the framework in action. The examples use persona rotation across difficulty tiers, escalating in analytical rigor rather than just varying the character names.

Weaknesses

Claude's voice, while authoritative and principle-teaching, is less conversational than ChatGPT's and less dramatic than Gemini's. The analytical tone can feel slightly clinical for an audience seeking entertainment alongside education. The creative use cases, while solid (boats, motorcycles, RVs, e-bikes, fine art), are mostly direct transpositions of the framework rather than surprising recontextualizations. The prompt for using it on private-party sellers is well-reasoned but not novel.

Signature Move

Claude's 29 explicit transferable principles transform a prompt post into a prompt-engineering masterclass, teaching readers not just this week's prompts but the reasoning patterns they can apply to every future prompt they write.


Gemini: 82.75 / 100

Strengths

Gemini's voice is the most distinctive of the three, using intensifiers and dramatic language that matches Forbes editorial style. The opening describes the dealership as "stepping onto a battlefield" and warns that buyers "surrender all that leverage the moment you hand over your driver's license." This language creates urgency and engagement. The Beginner prompt uses a structured parameter list (Intent, Condition, Vehicle Type, Region, Financials) that is slightly more scannable than Claude's fill-in-the-blank approach.

Gemini's practical examples are strong and role-diverse: a startup founder leasing a luxury EV, a graphic designer needing cargo space, a realtor seeking a professional vehicle, a fleet manager handling commercial vans, a rideshare contractor optimizing for passenger comfort, and a medical software sales rep prioritizing highway autonomy. The persona-rotation strategy shows escalating complexity—each example demonstrates how the framework adapts to different professional contexts. The creative use case "choosing a daycare or school" is genuinely unexpected and shows true cross-domain thinking.

Weaknesses

Gemini's prompt breakdown section contains only 12 explicit transferable principles, compared to Claude's 29. While the principles are well-explained, the teaching depth is lighter. The Beginner section deliverables are less granular than Claude's (three sections instead of four detailed subsections). Some intensifier language ("astonishing," "passive participant," "surrender") can feel slightly overwrought for readers seeking cool professionalism.

Signature Move

Gemini's dramatic, engaging voice makes dealership research feel like a strategic victory rather than administrative drudgery, and its cross-domain creative uses (daycare selection, music school evaluation) demonstrate that the framework transcends automotive purchasing.


ChatGPT: 82.5 / 100

Strengths

ChatGPT's voice is warm, folksy, and deeply conversational. The opening analogy—"walking into a dealership without a plan is a little like walking into a final exam after only reading the book jacket: technically possible, emotionally spicy, and rarely recommended"—immediately establishes a friendly, humorous tone. The Beginner prompt includes a "My biggest concern" field (pressure, hidden fees, test drive anxiety) that adds emotional scaffolding beyond what peers offer, making nervous first-time buyers feel heard.

ChatGPT's practical examples are thorough and realistic: a tech startup professional with a hybrid commute, a freelance consultant needing a client-facing vehicle, a healthcare worker managing early shifts and bad weather, a finance professional comparing three dealers, a young family with cargo needs, and a CPO early-adopter verifying warranty backing. Each example includes specific vehicle recommendations and real-world constraints. The most detailed example—a business owner buying a $55,000 truck—includes trade-in payoff ($4,500), private-sale estimate ($16,000), trade estimate ($13,500), and shows how dealer-installed accessories can hide the real OTD price. This is actionable financial specificity.

ChatGPT's FAQ section is more comprehensive than peers, with five questions instead of three or four. The questions address real concerns: "What if the AI gives me generic advice?" (expected and correct), "Do I really need all four sections?" (yes, especially for test-drive-only visits), "What if I'm shopping for an EV?" (add EV-specific criteria). The post is also the longest (755 lines, 96KB) with substantive coverage throughout.

Weaknesses

ChatGPT's prompt engineering is solid but less granular than Claude's on negative constraints and role modifiers. The conversational warmth, while engaging, occasionally sacrifices precision — some sections are longer without adding more actionable specificity. The creative use cases are good but not surprising (similar cross-domain territory as Gemini without a standout element like "daycare selection").

Signature Move

ChatGPT's warm, accessible voice and extensive FAQ section make it the most reader-friendly entry point for nervous or inexperienced dealership visitors, and its financial-specificity examples ("$16,000 private-sale estimate vs. $13,500 trade estimate") show the calculator-level detail that busy professionals appreciate.


The Verdict

Claude takes the win at 84.5 / 100, finishing 1.75 points ahead of Gemini (82.75) and 2.0 points ahead of ChatGPT (82.5). Because that 1.75-point margin sits within our 3.0-point statistical tie threshold, we treat the top of the leaderboard as a competitive tie window and apply the editorial tiebreak to confirm the publication choice. Claude wins the three prompt-engineering dimensions decisively (D1: Prompt Quality 8.5 vs. Gemini 8.0; D2: Breakdown Clarity 8.5 vs. Gemini 8.0; D3: Practical Examples 9.0 vs. Gemini 8.5). For a prompt-focused blog, the platform that engineers the most precise prompts, teaches the most transferable principles, and provides the most financially detailed examples should win publication; that's Claude on both raw score and editorial criteria. Gemini's distinctive dramatic voice (D4: 8.5 vs. Claude 8.0) and creative novelty (D5: 8.0 vs. Claude 7.5) make it a strong second choice: readers who prefer engaging narrative over analytical precision should read Gemini first.

All three posts are ship-ready and publishable. The differences are subtle and reflect different reader preferences: Claude for prompt engineers and precision-seekers, Gemini for narrative-driven readers who want urgency and engagement, and ChatGPT for warm, accessible guidance with financial specificity.

What This Means for You

If you're preparing for a dealership visit this week, start with Claude if you want to understand the "why" behind every prompt design choice and extract transferable principles for future prompting. Choose Gemini if you want a more dramatic, motivating narrative that makes dealership research feel strategic and high-stakes. Pick ChatGPT if you're nervous about the process and want warm, reassuring guidance with specific financial examples. All three posts are published on Ketelsen.ai and cover the same topic with different strengths — reading all three will give you the deepest understanding of dealer research and test-drive strategy. Consider reading them in this order: Claude (technical foundation), Gemini (strategic urgency), ChatGPT (practical warmth).


Score Summary

Dimension Weight Claude Gemini ChatGPT
D1: Prompt Quality 20% 8.5 8.0 8.0
D2: Breakdown Clarity 15% 8.5 8.0 8.0
D3: Examples 15% 9.0 8.5 8.5
D4: Writing Quality 15% 8.0 8.5 8.0
D5: Creative Novelty 10% 7.5 8.0 8.0
D6: Actionability 15% 8.5 8.5 8.5
D7: Completeness 10% 9.0 8.5 9.0
OVERALL SCORE (0-100) 100% 84.5 82.75 82.5
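As a worked check of the normalization, Claude's overall score comes straight from the weighted rows above: 10 × (8.5 × 0.20 + 8.5 × 0.15 + 9.0 × 0.15 + 8.0 × 0.15 + 7.5 × 0.10 + 8.5 × 0.15 + 9.0 × 0.10) = 10 × 8.45 = 84.5. The same arithmetic yields 82.75 for Gemini and 82.5 for ChatGPT.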

Visual Comparison

Platform D1 Prompts D2 Breakdown D3 Examples D4 Writing D5 Creative D6 Action D7 Complete
Claude 8.5 8.5 9.0 8.0 7.5 8.5 9.0
Gemini 8.0 8.0 8.5 8.5 8.0 8.5 8.5
ChatGPT 8.0 8.0 8.5 8.0 8.0 8.5 9.0

Source: Rubric scoring data


The Prompts Behind the Posts

All three platforms received the same four prompts in the same order: Session Setup (Phase 1), Blog Post Generation (Phase 2), Variation Summary (Phase 3), and Content Expansion (Phase 4, optional). Image prompts (Phases 5-14) are excluded from this comparison per Ketelsen.ai publication template guidelines. Here are the first four prompts that generated these posts, in full:

Prompt 1 of 4 — Session Setup (Phase 1)

Purpose: Establish context, audience, content goals, and quality expectations before generating the blog post itself.

"I need your help creating content for my blog, Ketelsen.ai. Let me give you the full context before we begin. PART 1 — PERSONAL BACKGROUND: I am Richard Ketelsen, based in Minneapolis, MN, USA. I have a professional background in Computer Science and Graphic Design. I currently serve as a Senior Cybersecurity Incident Responder at a Fortune 100 Company, with 6 years in cybersecurity, 10 years in Identity Design, and 10 years of entrepreneurship experience. PART 2 — SITE PURPOSE: Ketelsen.ai is an ongoing AI prompt crafting experiment. The blog section features an exclusive prompt collection of in-depth AI prompts covering real-world problems, generated weekly by multiple AI services (Claude, ChatGPT, Gemini). PART 3 — TARGET AUDIENCE: Demographics: Ages 25-45, global (English-speaking), professionals or entrepreneurs with moderate to high discretionary income. Psychographics: Enthusiastic about AI-driven innovation, enjoy experimenting with new technology, prefer transparent behind-the-scenes exploration. Please confirm you understand this context by summarizing my site, audience, and this week's topic in 2-3 sentences. Then wait for my next instruction."

Prompt 2 of 4 — Blog Post Generation (Phase 2)

Purpose: Generate three difficulty-tiered prompt variations (Beginner, Intermediate, Advanced) for dealer research and test-drive strategy, following the Blog Post Template structure exactly.

"I am also providing a BLOG POST TEMPLATE file as a separate attachment: CFT-PROJ-CP-059c_BLOG-POST-TEMPLATE-v1_0.txt. This template defines the EXACT structure and format for each prompt variation. You MUST follow the template for every section. Do not skip any section — if a section does not apply, write 'NOT APPLICABLE'. Now I need you to create 3 prompt variations for this week's topic. Each variation should target a different skill level so readers can copy and paste them into any AI platform to get help researching dealerships and running a structured test drive. TOPIC: 'Researching Dealers and Test Driving Like a Pro' — Pre-visit dealer research, dealer reputation analysis, digital dark pattern detection, structured test drive scoring, and OEM CPO verification. VARIATION 1 — BEGINNER: A prompt designed for a first-time dealership visitor. VARIATION 2 — INTERMEDIATE: A prompt for buyers visiting 2–4 dealerships. VARIATION 3 — ADVANCED: A prompt for buyers who want maximum control and institutional-grade analytical rigor. FOR EACH VARIATION, follow the attached Blog Post Template file EXACTLY."

Prompt 3 of 4 — Variation Summary (Phase 3)

Purpose: Summarize the three variations and provide guidance on which readers should use based on their situation.

"Now that you've created all three variations (Beginner, Intermediate, Advanced), create a comprehensive VARIATION SUMMARY section that readers can use to decide which prompt to try first. The summary should: (1) Recap the three variations in 2-3 sentences each; (2) Identify which readers would benefit from each variation based on their situation; (3) Suggest a reading order or usage pattern; (4) Note the transferable principles that apply across all three variations; (5) Recommend a follow-up workflow after using each variation."

Prompt 4 of 4 — Content Expansion (Phase 4)

Purpose: Expand any variation section that feels thin or underdeveloped, adding depth where needed.

"Review the three variations you've created and identify any sections that could use additional depth or detail. For each variation, expand 2-3 sections by adding more detailed examples, additional creative use cases, extended FAQ answers, or deeper prompt breakdown insights. Prioritize sections that would add the most value to a reader trying to use the prompts in real-world scenarios. Add specific financial figures, realistic timelines, detailed role descriptions, or scenario-specific customization tips. Do not dilute the quality with padding — only expand sections where more detail genuinely improves the reader's ability to use the prompt successfully."


Methodology Note

This rubric represents our current quality framework (v2.0) for comparing prompt-engineering posts. All seven dimensions and their weightings were designed to reflect what matters most to Ketelsen.ai readers: technical precision in prompt design (35% of weight: D1 + D2), practical relevance and specificity (45% of weight: D3 + D4 + D6), and comprehensive delivery (20% of weight: D5 + D7). As we publish more comparisons, we will refine these weights and thresholds based on reader feedback and emerging quality benchmarks. We invite reader disagreement: if you think a different platform should have won, tell us why, and we'll incorporate that feedback into future iterations.

All scores are evidence-backed and defensible with specific quotes and analysis. All three original posts are published in full on Ketelsen.ai, and readers can compare them directly. This comparison is not a judgment that one platform is categorically "better" than another — it's a reflection of how each platform performed on this specific topic for this specific audience, on this specific rubric. Readers' mileage may vary, and we encourage you to read all three and form your own opinions.

Claude finished 1.75 points ahead of Gemini (84.5 vs. 82.75) — within our 3.0-point statistical tie threshold. We confirmed the result via editorial tiebreak by evaluating which platform won the three core prompt-engineering dimensions (D1, D2, D3). Claude won those dimensions decisively, making it the publication winner for a prompt-focused blog on both raw score and editorial criteria. However, Gemini's dramatic voice and creative novelty make it a strong alternative for readers who prioritize narrative engagement over analytical precision. Both posts are equally valuable; the choice between them is one of reader preference.

Metadata

Topic: Researching Dealers and Test Driving Like a Pro

Week: Week 4 of 7 (AI at the Dealership series)

Rubric version: v2.0

Platforms compared: ChatGPT, Gemini, Claude

Winner: Claude (84.5 / 100, raw winner + editorial tiebreak confirmation)

Runner-up: Gemini (82.75 / 100, within tie window)

Third place: ChatGPT (82.5 / 100, within tie window)

Margin of victory: 1.75 points (within our 3.0-point statistical tie threshold; editorial tiebreak via D1+D2+D3 prompt-engineering dimension dominance confirms Claude as publication winner)

Tags: ai-comparison, prompt-engineering, chatgpt-vs-claude-vs-gemini, weekly-showdown, ai-quality, rubric, week-4, dealer-research, test-drive, car-buying

Categories: AI Comparison, Prompt Engineering, Weekly Showdown

Estimated reading time: 8 minutes

SEO title: Week 4 AI Showdown: Which Platform Wrote the Best Dealer Research & Test Drive Prompt Post?

SEO description: Claude, Gemini, and ChatGPT compete on prompt engineering quality. Detailed 7-dimension rubric scoring reveals who excels at dealer research and test-drive guidance. All three posts ship-ready. Read the verdict.
