Week 2 AI Showdown: Which Platform Wrote the Best New vs. CPO Post?

Every week at Ketelsen.ai, we take a single prompt topic and run it through all three major AI platforms (ChatGPT, Claude, and Gemini) using the exact same prompts in the exact same order. Then we score the results against a structured rubric to see which platform delivers the most useful, well-written, and genuinely actionable content for readers learning prompt engineering. This week's topic tackled one of the most expensive decisions most people make: should you buy new or go certified pre-owned? The three posts ranged from beginner-level decision support to institutional-grade financial analysis, and the scores were the closest we have seen yet. Two platforms finished within a single point of each other, making Week 2 the series' first statistical tie.

The Topic: New vs. Certified Pre-Owned: Let AI Make the Case

Week 2 of the "AI at the Dealership" series asked each platform to build a decision framework that forces AI to argue both sides of the new-versus-CPO question, commit to a clear recommendation, and then help the reader verify that recommendation at the dealership. The topic matters because the average new vehicle now tops $52,600 while CPO supply remains constrained by pandemic-era production cuts, and most buyers still do not understand the critical difference between manufacturer CPO and dealer-certified programs.

How We Score: The 7-Dimension Quality Rubric

We evaluate every post across seven dimensions that reflect what matters most to readers learning to use AI effectively. Prompt Quality measures whether the three variations are genuinely engineered with role-setting, constraints, and output formatting — not just rephrased versions of the same question. Breakdown Clarity asks whether the post teaches you transferable prompt engineering principles, not just what each line of the prompt says. Practical Examples checks whether the industry scenarios use real financial parameters that professionals would recognize. Writing Quality evaluates whether the prose matches the Ketelsen.ai brand voice — fun, confident, Forbes-accessible, and free of filler. Creative Use Cases rewards platforms that go beyond the obvious and suggest applications you would not have thought of. Actionability measures whether you can walk away from the post and immediately do something useful. Completeness checks that every template section is present, substantive, and genuinely scaled across difficulty levels.

Each dimension is scored on a 1-to-10 scale, with anchor descriptions at the even numbers to keep scoring consistent. Dimensions are weighted by importance — Prompt Quality carries the heaviest weight at 20%, while Creative Use Cases and Completeness each carry 10%. The weighted scores are normalized to a 0-100 scale. If two platforms finish within 3 points of each other, we declare a statistical tie and explain the trade-offs rather than splitting hairs over fractions of a point. This is rubric version 2.0, updated from Week 1 to use a 10-point scale for finer discrimination.

| Dimension | Weight | What It Measures |
| --- | --- | --- |
| D1: Prompt Quality & Engineering Depth | 20% | Are the 3 prompt variations genuinely engineered with role-setting, constraints, output formatting, and difficulty differentiation? |
| D2: Prompt Breakdown Clarity | 15% | Does the breakdown teach WHY each prompt element works, not just restate what it says? |
| D3: Practical Examples & Industry Relevance | 15% | Are the industry scenarios specific, financially grounded, and recognizable to professionals? |
| D4: Writing Quality & Brand Voice | 15% | Is the writing confident, fun, and Forbes-accessible without filler or hedging? |
| D5: Creative Use Cases & Unexpected Angles | 10% | Do the suggested applications go beyond the obvious and make readers think differently? |
| D6: Actionability & Reader Value | 15% | Can the reader immediately use what they read? Are pro tips, follow-ups, and checklists genuinely helpful? |
| D7: Completeness & Template Adherence | 10% | Are all template sections present, substantive, and genuinely scaled across difficulty levels? |

Source: Ketelsen.ai Quality Rubric v2.0
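To make the arithmetic concrete, here is a minimal Python sketch of how the weighted rubric rolls up to a 0-100 score and how the 3-point tie rule applies. The function names are ours for illustration, not part of any published scoring tool; the weights come from the table above, and the per-dimension scores come from the Score Summary later in this post.

```python
# Rubric weights from the Ketelsen.ai Quality Rubric v2.0 (they sum to 1.0).
WEIGHTS = {
    "D1": 0.20,  # Prompt Quality & Engineering Depth
    "D2": 0.15,  # Prompt Breakdown Clarity
    "D3": 0.15,  # Practical Examples & Industry Relevance
    "D4": 0.15,  # Writing Quality & Brand Voice
    "D5": 0.10,  # Creative Use Cases & Unexpected Angles
    "D6": 0.15,  # Actionability & Reader Value
    "D7": 0.10,  # Completeness & Template Adherence
}

def overall_score(scores: dict[str, int]) -> float:
    """Weight each 1-10 dimension score, then scale the 0-10 total to 0-100."""
    return round(10 * sum(WEIGHTS[d] * scores[d] for d in WEIGHTS), 1)

def is_statistical_tie(a: float, b: float, threshold: float = 3.0) -> bool:
    """Platforms within `threshold` normalized points are declared a tie."""
    return abs(a - b) <= threshold

# Week 2 per-dimension scores, as reported in the Score Summary table.
chatgpt = {"D1": 9, "D2": 9, "D3": 9, "D4": 8, "D5": 9, "D6": 9, "D7": 9}
claude  = {"D1": 9, "D2": 9, "D3": 9, "D4": 8, "D5": 8, "D6": 9, "D7": 9}
gemini  = {"D1": 8, "D2": 8, "D3": 8, "D4": 8, "D5": 8, "D6": 8, "D7": 9}

print(overall_score(chatgpt))  # 88.5
print(overall_score(claude))   # 87.5
print(overall_score(gemini))   # 81.0
print(is_statistical_tie(overall_score(chatgpt), overall_score(claude)))  # True
```

One consequence of the normalization: a single rubric point on a 10%-weighted dimension like D5 moves the overall score by exactly 1.0 point, which is how ChatGPT and Claude ended up inside the 3-point tie window.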


Platform-by-Platform Breakdown

ChatGPT: 88.5 / 100

Strengths

ChatGPT delivered the most structurally ambitious post of the three at 608 lines. Its prompt engineering across all three variations is meticulous — Variation 1 alone specifies 8 numbered output tasks with explicit anti-hedging guardrails, and Variation 2 scales to 16 structured input fields with same-model comparison controls that prevent brand noise from contaminating the analysis. The prompt breakdowns are exceptionally pedagogical, with 24 total breakdown points across all three variations, each featuring an "if you remove it, the AI may..." impact analysis that teaches readers what breaks when a prompt element is missing. This removal-impact technique is a genuinely useful teaching device that appeared most consistently in ChatGPT's version.

Where ChatGPT truly separated itself was in actionability. The post delivers 14 pro tips, 9 chained follow-up prompts, and specific dealership verification protocols across all three variations. The credit-tier specificity is outstanding — one example demonstrates how CPO at 6.1% actually beats new at 5.8% for a Tier 3 borrower in Nashville, a counterintuitive finding that only emerges when the financial parameters are granular enough. The manufacturer CPO program forensics go deep: Toyota 160-point inspection versus GM 172-point, with explicit warnings about coverage running from original in-service date versus CPO purchase date.

Weaknesses

At 608 lines, ChatGPT produced the longest post, which creates a scanning challenge. Some readers may find the sheer volume of content overwhelming before they reach the section they need. The writing quality, while clean and professional, occasionally prioritizes completeness over editorial tightness; a few passages could be sharper without losing substance. Writing Quality and Completeness were the only two dimensions where all three platforms scored identically.

Signature Move

ChatGPT's distinctive strength is exhaustive structural engineering — it builds prompts like architectural blueprints, with every input field, output requirement, and verification step explicitly specified and numbered.


Claude: 87.5 / 100

Strengths

Claude matched ChatGPT on six of the seven dimensions and delivered the most intellectually distinctive content of the three. The Variation 3 prompt introduces an "institutional-grade analytical standards" frame with an explicit epistemic protocol: the AI is instructed to flag assumptions, mark unverifiable data, and operate under capital-expenditure-level rigor. This framing produced a gated deliverable workflow where the AI generates one section, pauses for user confirmation, then proceeds, a multi-turn design that prevents overwhelming output and creates natural decision checkpoints.
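The gated pattern is easy to operationalize outside a single chat window, too. Below is a minimal Python sketch of the same idea under our own naming; `ask_ai` is a placeholder for whatever chat client you use, not a real API, and the deliverable list mirrors Variation 3's four deliverables.

```python
# Minimal sketch of the gated-deliverable pattern (our names, not Claude's).
# ask_ai() is a placeholder, not a real API.

DELIVERABLES = [
    "Category decision matrix with crossover-point analysis",
    "CPO program forensic analysis",
    "Curated shortlist with 1-10 scoring",
    "Decision risk assessment",
]

def ask_ai(prompt: str) -> str:
    """Placeholder: send one turn to the AI and return its reply."""
    raise NotImplementedError("Wire this to the chat client you use.")

def run_gated_workflow() -> None:
    """Request one deliverable per turn; pause for confirmation between turns."""
    for i, deliverable in enumerate(DELIVERABLES, start=1):
        print(ask_ai(
            f"Generate Deliverable {i} ({deliverable}) only. "
            "Stop and wait for my confirmation before continuing."
        ))
        if i < len(DELIVERABLES) and input("Continue? [y/n] ").lower() != "y":
            break
```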

Claude's prompt breakdowns go 3-4 levels deep per point, following a consistent structure: what the element does, why it matters, what happens if you remove it, and the transferable principle for other domains. The 11 practical examples across 6+ industries include specific financial parameters throughout — exact budgets, credit scores, interest rates, and ownership durations that make each scenario immediately recognizable. The dealer economics forensics in Variation 3 are a standout: Claude explicitly teaches that CPO certification costs dealers $800-$1,200 per vehicle but generates $1,800-$2,500 in additional front-end profit, enabling readers to evaluate whether their premium is justified by actual warranty value versus markup.

Weaknesses

Claude scored one point below ChatGPT on Creative Use Cases (D5). While its dealer economics analysis and cloud infrastructure transfer are genuinely creative, ChatGPT produced a higher volume of unexpected applications: 15 in total versus Claude's smaller set. Claude's nonprofit fleet optimization and military PCS examples appeared less frequently than ChatGPT's equivalent coverage of those same scenarios. That single point of creative breadth is the entire margin between the two platforms; without it, Week 2 would have been a dead heat rather than a 1.0-point statistical tie.

Signature Move

Claude's distinctive strength is analytical depth — it builds prompts that produce institutional-grade financial analysis with explicit epistemic standards, gated workflows, and adversarial transparency about dealer economics.


Gemini: 81.0 / 100

Strengths

Gemini produced the most efficient post at 530 lines — the shortest of the three, yet still comprehensive across all template sections. Its Variation 2 introduces a distinctive dual-role assignment: "Act as a Senior Automotive Analyst AND Consumer Protection Advocate" — a technique that balances quantitative analysis with skeptical consumer protection in a single prompt. The Variation 3 multi-session workflow design, which instructs the AI to generate only Deliverable 1 and wait for confirmation before proceeding, demonstrates sophisticated prompt architecture that prevents AI degradation on long-form outputs.

Gemini's creative use cases include some genuinely unexpected angles: a corporate sabbatical road trip scenario requiring a 14-month ownership horizon optimized for minimal-friction liquidation, an investment property vehicle acquisition modeling Turo rental ROI, and a legacy gift analyzer for a grandparent prioritizing advanced safety technology. The adaptability sections are well-crafted, explicitly teaching readers to transfer the decision framework to equipment purchases, SaaS evaluations, real estate leases, and hiring decisions.

Weaknesses

Gemini scored 8 on six of the seven dimensions, trailing both ChatGPT and Claude by one point on D1 (Prompt Quality), D2 (Breakdown Clarity), D3 (Examples), and D6 (Actionability), and trailing ChatGPT alone on D5 (Creative Cases), where it matched Claude. The intermediate variation's prompt constraints were slightly less explicit than the other two platforms', and the total number of breakdown points and pro tips was lower. The practical examples, while professional and diverse, included slightly less financial granularity: fewer exact interest rates, fewer credit-tier calculations, and fewer manufacturer-specific program comparisons than what ChatGPT and Claude delivered.

Signature Move

Gemini's distinctive strength is efficiency and framework design — it delivers a complete, well-structured post in fewer lines, with innovative prompt architecture like dual-role assignments and multi-session workflows.


The Verdict

Week 2 ends in a statistical tie between ChatGPT (88.5) and Claude (87.5) — a margin of just 1.0 point on the 100-point scale, well within the 3-point threshold where we refuse to split hairs. Both platforms scored 9 out of 10 on Prompt Quality, Breakdown Clarity, Practical Examples, Actionability, and Completeness. The single dimension that separated them was Creative Use Cases (D5), where ChatGPT's higher volume of unexpected applications — nonprofit fleet optimization, military PCS moves with SCRA rate caps, and EV battery health frameworks — earned a 9 versus Claude's 8. Claude answered with its own creative standout: an adversarial dealer economics analysis that teaches buyers the dealer's actual certification cost and profit margin, a level of transparency rare in consumer financial content.

Gemini finished at 81.0, a clear third place, 6.5 points behind Claude and 7.5 behind ChatGPT. That gap is meaningful but not damning. Gemini delivered the most efficient post (530 lines versus Claude's 598 and ChatGPT's 608), matched both leaders on Writing Quality (8/10) and Completeness (9/10), and introduced the week's most innovative prompt architecture with its dual-role assignment. The separation came from depth: fewer breakdown points, fewer pro tips, and less financial granularity in the practical examples. Each platform brought something the others did not: ChatGPT brought structural exhaustiveness, Claude brought analytical rigor, and Gemini brought design efficiency.

What This Means for You

If you want the most detailed prompt engineering education with the highest volume of actionable tips and follow-up prompts, start with the ChatGPT version — it is a comprehensive toolkit. If you want the deepest financial analysis with institutional-grade rigor and adversarial transparency about dealer economics, read the Claude version — it will change how you think about CPO certification pricing. If you want a clean, efficient framework you can absorb quickly and adapt to non-automotive decisions, the Gemini version delivers the core value in fewer lines. All three posts are published on Ketelsen.ai, and we recommend reading at least two to see how the same prompt produces genuinely different outputs across platforms.


Score Summary

| Dimension | Weight | ChatGPT | Claude | Gemini |
| --- | --- | --- | --- | --- |
| D1: Prompt Quality | 20% | 9 | 9 | 8 |
| D2: Breakdown Clarity | 15% | 9 | 9 | 8 |
| D3: Examples & Relevance | 15% | 9 | 9 | 8 |
| D4: Writing Quality | 15% | 8 | 8 | 8 |
| D5: Creative Use Cases | 10% | 9 | 8 | 8 |
| D6: Actionability | 15% | 9 | 9 | 8 |
| D7: Completeness | 10% | 9 | 9 | 9 |
| OVERALL SCORE (0-100) | 100% | 88.5 | 87.5 | 81.0 |

Source: Rubric scoring data (Ketelsen.ai Quality Rubric v2.0)

Visual Comparison

ChatGPT (88.5 / 100): D1 Prompts 9 · D2 Breakdown 9 · D3 Examples 9 · D4 Writing 8 · D5 Creative 9 · D6 Action 9 · D7 Complete 9

Claude (87.5 / 100): D1 Prompts 9 · D2 Breakdown 9 · D3 Examples 9 · D4 Writing 8 · D5 Creative 8 · D6 Action 9 · D7 Complete 9

Gemini (81.0 / 100): D1 Prompts 8 · D2 Breakdown 8 · D3 Examples 8 · D4 Writing 8 · D5 Creative 8 · D6 Action 8 · D7 Complete 9

Source: Rubric scoring data (Ketelsen.ai Quality Rubric v2.0)


The Prompts Behind the Posts

All three platforms received the exact same prompts in the exact same order across four phases. This standardization ensures the comparison measures AI quality, not different instructions. Phase 5 (visual asset prompts) is excluded from this section because image generation prompts were not delivered identically to all platforms.

Prompt 1 of 4 — Session Setup (Phase 1)

Purpose: Establishes Richard's background, site purpose, target audience, and this week's topic context. Provides research data on CPO market conditions, depreciation rates, manufacturer inspection programs, and financing differentials so all platforms work from the same factual foundation.

"I need your help creating content for my blog, Ketelsen.ai. [PART 1: Personal background — cybersecurity, graphic design, entrepreneurship.] [PART 2: Site purpose — AI prompt crafting experiment with weekly content across Claude, ChatGPT, Gemini.] [PART 3: Target audience — ages 25-45, professionals, 'Alex the AI Trailblazer' persona.] [PART 4-6: Value proposition, competitive edge, elevator pitch.] [PART 7: Content goals — Week 2 of 'AI at the Dealership' series, topic is new vs. CPO decision framework with 12 research data points including $52,600 average MSRP, 15-20% first-year depreciation, CPO certification costs $800-$1,200 but generates $1,800-$2,500 profit, and in-service date warranty distinction.] [PART 8: AI role — expert prompt engineer and content strategist.] [PARTS 9-12: Constraints, depth expectations (15,000 char minimum per variation), citation requirements (minimum 3 per variation).]"

Prompt 2 of 4 — Blog Post Generation (Phase 2)

Purpose: Directs the AI to create all 3 prompt variations following the exact blog post template, with specific guidance for each variation's approach — from beginner decision-maker through institutional-grade analysis engine.

"Create 3 prompt variations for this week's topic. VARIATION 1 — BEGINNER: 'The New vs. CPO Side-by-Side' — force the AI to argue BOTH sides then commit to a recommendation with 2-3 specific models, no hedging. User provides budget, vehicle type, ownership duration, annual mileage, credit score, priority ranking. VARIATION 2 — INTERMEDIATE: 'The New vs. CPO Financial & Risk Analysis' — 4-section analysis: financial comparison with same-model control, CPO program evaluation with in-service date distinction, vehicle shortlist with ratings, and red flags with verification questions. VARIATION 3 — ADVANCED: 'The Multi-Variable Vehicle Selection & Risk Framework' — 4 deliverables: category decision matrix with 7 factors and crossover-point analysis, CPO program forensic analysis, curated shortlist with 1-10 scoring across 6 dimensions, decision risk assessment across financial/mechanical/market/CPO risks. Multi-session design. Follow the attached Blog Post Template EXACTLY for all sections."

Prompt 3 of 4 — Variation Summary (Phase 3)

Purpose: Generates the blog post title and a comparative summary of all 3 variations to serve as the post introduction, helping readers immediately identify which variation suits their needs.

"Now that all 3 prompt variations have been completed, provide: (1) A TITLE for this blog post — engaging, SEO-friendly, under 70 characters, capturing the tension between the new-car premium and the CPO value proposition. (2) A BRIEF SUMMARY comparing all 3 variations — 3-5 sentences explaining the shared goal, how each variation approaches the topic differently, and helping readers decide which to try first. Format: TITLE: [title] SUMMARY: [summary]. Keep the tone fun, entertaining, and informative."

Prompt 4 of 4 — Content Expansion (Phase 4)

Purpose: Deepens all sections across all 3 variations — expanding industry examples with specific financial calculations, creative use cases to 3-5 per variation, adaptability tips for EV buyers and commercial vehicles, pro tips with before/after prompt modifications, and FAQs addressing common confusion points.

"Expand on ALL 3 variations: (1) PRACTICAL EXAMPLES — 3-4 detailed buyer profiles including budget-constrained first-time buyer, family upgrading to 3-row SUV, luxury/premium buyer where depreciation makes CPO compelling, small business owner needing Section 179 analysis. (2) CREATIVE USE CASES — add CPO vs. used + third-party warranty comparison, real-time dealership evaluation using verification questions, lease return cycle inventory prediction, co-buyer disagreement resolution, EV/PHEV battery health analysis. (3) ADAPTABILITY TIPS — EV modifications with battery degradation and tax credits, truck/commercial adaptations, luxury segment adjustments, first-time buyer simplifications. (4) PRO TIPS — before/after prompt modifications. (5) FAQs — common follow-ups like credit score uncertainty, actual pricing limitations, and when to consider leasing instead."


Methodology Note

This comparison uses rubric version 2.0, updated from Week 1 to use a 10-point scoring scale with anchors at 2, 4, 6, 8, and 10 for finer discrimination between platforms. The seven dimensions and their weightings remain the same: Prompt Quality (20%), Breakdown Clarity (15%), Practical Examples (15%), Writing Quality (15%), Creative Use Cases (10%), Actionability (15%), and Completeness (10%). We followed a calibration procedure where all three posts were read side by side for each dimension before any scores were assigned — this prevents the first-scored platform from anchoring the others. Dimensions and weights will continue to evolve as the series progresses and we learn what matters most to readers.

Every score is backed by specific evidence from the actual posts — quoted passages, structural counts, and direct comparison of what each platform delivered. All three original posts are published on Ketelsen.ai so readers can apply their own criteria and reach their own conclusions. We welcome feedback on whether the rubric dimensions capture what you care about most when evaluating AI-generated content.

Metadata

Topic: New vs. Certified Pre-Owned: Let AI Make the Case

Week: Week 2

Series: AI at the Dealership: 7 Weeks of Prompts That Could Save You Thousands

Rubric version: v2.0

Platforms compared: ChatGPT, Gemini, Claude

Result: Statistical Tie — ChatGPT (88.5) and Claude (87.5), margin 1.0 point

Third place: Gemini (81.0 / 100)

Margin to third: 6.5 points

Tags: ai-comparison, prompt-engineering, chatgpt-vs-claude-vs-gemini, weekly-showdown, ai-quality, rubric, new-vs-cpo, car-buying, week-2

Categories: AI Comparison, Prompt Engineering

Estimated reading time: 12 minutes

SEO title: Week 2 AI Showdown: ChatGPT vs Claude vs Gemini on New vs CPO Car Buying

SEO description: We ran the same new-vs-CPO car buying prompts through ChatGPT, Claude, and Gemini, then scored the results across 7 dimensions. Week 2 ends in the series' first statistical tie.
