The rivalry between Anthropic and OpenAI has intensified, from competing Super Bowl advertisements to launching new coding models on the same day. Anthropic’s Claude Opus 4.6 and OpenAI’s Codex 5.3 are now live. Both post strong benchmark numbers, but which one really stands out? I’ll put them to the test and compare their performance on the same tasks. Let’s see which one comes out on top.
OpenAI Codex 5.3 vs Claude Opus 4.6: Benchmarks
The entries marked * in the table below (SWE-Bench Pro and Cybersecurity) are approximate: Anthropic’s release notes describe Claude Opus 4.6’s scores there as “industry-leading” or “top of the chart,” with specific high-tier performance indicated in its system card.
| Benchmark | Claude 4.6 Opus | GPT-5.3-Codex | Notes |
|---|---|---|---|
| Terminal-Bench 2.0 | 81.4% | 77.3% | Agentic terminal skills and system tasks. |
| SWE-Bench Pro | ~57%* | 56.8% | Real-world software engineering (multi-language). |
| GDPval-AA | Leading (+144 Elo) | 70.9% (High) | Professional knowledge-work value. |
| OSWorld-Verified | 72.7% | 64.7% | Visual desktop environment use. |
| Humanity’s Last Exam | First Place | N/A | Complex multidisciplinary reasoning. |
| Context Window | 1 Million Tokens | 128k (Output) | Claude supports a 1M input / 128k output limit. |
| Cybersecurity (CTF) | ~78%* | 77.6% | Identifying and patching vulnerabilities. |
Claude 4.6 Opus (Anthropic):
- Focus: Exceptional at deep reasoning and long-context retrieval (1M tokens). It excels at Terminal-Bench 2.0, suggesting it is currently the strongest model for agentic planning and complex system-level tasks.
- New Features: Introduces “Adaptive Thinking” and “Context Compaction” to handle long-running tasks without losing focus.
Here’s our detailed review of Claude Opus 4.6.
GPT-5.3-Codex (OpenAI):
- Focus: Specialized for the full software lifecycle and visual computer use. It shows a big leap on OSWorld-Verified over earlier Codex releases, making it highly effective at navigating user interfaces to complete tasks.
- New Features: Optimized for speed (25% faster than 5.2) and “Interactive Collaboration,” which lets users steer the model in real time while it works.
Here’s our detailed blog on Codex 5.3.
How to Access?
- For Opus 4.6: I used my Claude Pro account ($17 per month).
- For Codex 5.3: I used the Codex macOS app, logged in with my ChatGPT Plus account (₹1,999/month).
Claude Opus 4.6 vs OpenAI Codex 5.3: Tasks
Now that the background is out of the way, let’s compare how these models perform. For each task, you’ll find my prompt, the model outputs, and my take:
Task 1: Twitter-style Clone (web app)
Prompt:
You are an expert full-stack engineer and product designer. Your task is to build a simple Twitter-style clone (web app) using dummy frontend data.
Use: Next.js (App Router) + React + TypeScript + Tailwind CSS. No authentication and no real backend; just mocked in-memory data in the frontend.
Core Requirements:
- Left Sidebar: Logo, main nav (Home, Explore, Notifications, Messages, Bookmarks, Lists, Profile, More), and a primary “Post” button.
- Center Feed: Timeline with tweets and a composer at the top (profile avatar + “What is going on?” input). Each tweet shows an avatar, name, handle, time, text, an optional image, and actions (Reply, Retweet, Like, View/Share).
- Right Sidebar: Search bar, a “Trends for you” box (topics with tweet counts), and a “Who to follow” card (3 dummy profiles).
- Top Navigation Bar: Fixed, with “Home” and 2 tabs: “For you” and “Following”.
- Mobile Behavior: On small screens, show a bottom nav bar with icons instead of the left sidebar.
Dummy Data:
- Create TypeScript types for Tweet, User, and Trend.
- Seed the app with:
  - 15 dummy tweets (short/long text, some with images, varied like/retweet/reply counts).
  - 5 dummy trends (name, category, tweet count).
  - 5 dummy users for “Who to follow”.
Behavior:
- Post Composer: Type a tweet and instantly add it to the top of the “For you” feed.
- Like Button: Toggle the liked/unliked state and update the like count.
- Tabs: “For you” shows all tweets; “Following” shows tweets from 2–3 specific users.
- Search Bar: Filter trends by name as the user types.
File and Component Structure:
- app/layout.tsx: Global layout.
- app/page.tsx: Main feed page.
- components/Sidebar.tsx: Left sidebar.
- components/Feed.tsx: Center feed.
- components/Tweet.tsx: Individual tweet cards.
- components/TweetComposer.tsx: Composer.
- components/RightSidebar.tsx: Trends + who-to-follow.
- components/BottomNav.tsx: Mobile bottom navigation.
- data/data.ts: Dummy data and TypeScript types.
Use Tailwind CSS to match Twitter’s design: dark text on a light background, rounded cards, subtle dividers.
Output:
- Provide a short overview (5–7 bullet points) of the architecture and data flow.
- Output all files with a comment at the top giving the file path, followed by full, copy-paste-ready code.
- Match imports to the file paths used.
Constraints:
- No backend, database, or external API; everything must run with `npm run dev`.
- Use a standard create-next-app + Tailwind setup.
- Keep all content dummy (no real usernames or copyrighted content).
How to Run:
After creating a Next.js + Tailwind project, run the app with the exact commands provided.
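The data model and behaviors this prompt asks for are small enough to sketch directly. Below is a minimal TypeScript sketch that assumes one possible shape for the Tweet, User, and Trend types (the types would live in `data/data.ts`). It is not output from either model. The field names (such as `authorId` and `likedByMe`) and the helpers `addTweet`, `toggleLike`, `tweetsForTab`, and `filterTrends` are hypothetical. Writing the behaviors as pure functions lets React components call them from state setters and makes them easy to test without rendering anything.

```typescript
// Hypothetical field names; the prompt only names the types Tweet, User, and Trend.
export interface User {
  id: string;
  name: string;
  handle: string; // e.g. "@dummy_user"
  avatarUrl: string;
}

export interface Tweet {
  id: string;
  authorId: string;
  text: string;
  imageUrl?: string; // optional image
  createdAt: number; // epoch milliseconds
  likes: number;
  retweets: number;
  replies: number;
  likedByMe: boolean;
}

export interface Trend {
  name: string;
  category: string;
  tweetCount: number;
}

export type Tab = "for-you" | "following";

// Post composer: prepend a new tweet so it appears at the top of the feed.
// Empty or whitespace-only drafts are ignored.
export function addTweet(feed: Tweet[], authorId: string, text: string, now: number = Date.now()): Tweet[] {
  const trimmed = text.trim();
  if (trimmed === "") return feed;
  const tweet: Tweet = {
    id: `t-${now}-${feed.length}`, // simple unique-enough id for an in-memory demo
    authorId,
    text: trimmed,
    createdAt: now,
    likes: 0,
    retweets: 0,
    replies: 0,
    likedByMe: false,
  };
  return [tweet, ...feed];
}

// Like button: flip the liked state and move the count by one, without mutating the input.
export function toggleLike(feed: Tweet[], tweetId: string): Tweet[] {
  return feed.map((t) =>
    t.id === tweetId
      ? { ...t, likedByMe: !t.likedByMe, likes: t.likes + (t.likedByMe ? -1 : 1) }
      : t
  );
}

// Tabs: "For you" shows everything; "Following" keeps only tweets by followed authors.
export function tweetsForTab(feed: Tweet[], tab: Tab, followingIds: ReadonlySet<string>): Tweet[] {
  return tab === "for-you" ? feed : feed.filter((t) => followingIds.has(t.authorId));
}

// Search bar: case-insensitive substring match on trend names.
export function filterTrends(trends: Trend[], query: string): Trend[] {
  const q = query.trim().toLowerCase();
  return q === "" ? trends : trends.filter((tr) => tr.name.toLowerCase().includes(q));
}
```

For example, a composer’s submit handler could call `setFeed((f) => addTweet(f, currentUserId, draft))`, where `setFeed`, `currentUserId`, and `draft` are hypothetical component state. Returning new arrays instead of mutating the old ones keeps React state updates predictable.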
Output:
My Take:
The Twitter clone Claude built was noticeably better. Codex did manage to create a sidebar panel, but its version had missing images and felt incomplete, while Claude’s looked far more polished and production-ready.
Task 2: Creating a Blackjack Game
Prompt:
Game Overview:
Build a simple, fair 1v1 Blackjack game in which a human player competes against a computer dealer, following standard casino rules. The computer should follow fixed dealer rules and must not cheat or peek at hidden information.
Tech & Structure:
- Use HTML, CSS, and JavaScript only.
- Single-page app with three files: `index.html`, `style.css`, `script.js`.
- No external libraries.
Game Rules (Standard Blackjack):
- Deck: 52 cards, 4 suits, values:
  - Number cards: face value.
  - J, Q, K: value 10.
  - Aces: value 1 or 11, whichever is more favorable without busting.
- Initial Deal:
  - Player: 2 cards face up.
  - Dealer: 2 cards, one face up, one face down.
- Player Turn:
  - Options: “Hit” (take a card) or “Stand” (end turn).
  - If the player goes over 21, they bust and lose immediately.
- Dealer Turn (Fixed Logic):
  - Reveal the hidden card.
  - The dealer must hit until reaching 17 or more, and must stand at 17 or above (choose “hit on soft 17” or “stand on all 17s” and state it clearly in the UI).
  - The dealer does not see future cards or override the rules.
- Outcome:
  - If the dealer busts and the player doesn’t, the player wins.
  - If neither busts, the higher total wins.
  - Equal totals = “Push” (tie).
Fairness / No Bias Requirements:
- Use a properly shuffled deck at the start of each round (e.g., Fisher-Yates shuffle).
- The dealer must not change behavior based on hidden information.
- Do not rearrange the deck mid-round.
- Keep all game logic in `script.js` for auditability.
- Display a message like: “Dealer follows fixed rules (hits until 17, stands at 17+). No rigging.”
UI Requirements:
- Layout:
  - Top: Dealer section – show the dealer’s cards and total.
  - Middle: Status text (e.g., “Your turn – Hit or Stand?”, “Dealer is drawing…”, “You win!”, “Dealer wins”, “Push”).
  - Bottom: Player section – show the player’s cards, total, and buttons for Hit, Stand, and New Round.
- Show cards as simple rectangles with rank and suit (text only, no images).
- Display win/loss/tie counters.
Interactions & Flow:
- When the page loads, show a “Start Game” button, then deal the initial cards.
- Enable the Hit/Stand buttons only during the player’s turn.
- After the player stands or busts, run the dealer’s automatic turn step by step (with small timeouts).
- At the end of the round, show the result message and update the counters.
- The “New Round” button resets the hands and reshuffles the deck.
Code Organization:
- Functions in `script.js`:
  - `createDeck()`: Returns a fresh 52-card deck.
  - `shuffleDeck(deck)`: Shuffles the deck (Fisher-Yates).
  - `dealInitialHands()`: Deals 2 cards each.
  - `calculateHandTotal(hand)`: Handles Aces as 1 or 11 optimally.
  - `playerHit()`, `playerStand()`, `dealerTurn()`, `checkOutcome()`.
- Track variables for `playerHand`, `dealerHand`, `deck`, and win/loss/tie counters.
Output Format:
- Briefly explain in 5–7 bullet points how fairness and no bias are ensured.
- Output the full content of:
  - `index.html`
  - `style.css`
  - `script.js`
- Ensure the code is copy-paste ready and consistent (no missing functions or variables).
- Add a “How to run” section instructing the user to place the three files in a folder and open `index.html` in a browser.
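The fairness rules in this prompt are concrete enough to check against any generated `script.js`. As a reference point, here is a minimal sketch of the core functions the prompt names, written in TypeScript (removing the type annotations gives plain JavaScript). It is not either model’s output. It assumes the “stand on all 17s” variant. The `drawCard` helper and the optional `random` parameter on `shuffleDeck`, which is there so the shuffle can be tested deterministically, are hypothetical additions. Blackjack naturals are not modeled because the prompt does not mention them.

```typescript
// Sketch of the Blackjack core described in the prompt (not either model's output).
// Assumes the "stand on all 17s" dealer rule.
type Suit = "♠" | "♥" | "♦" | "♣";
type Rank = "A" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" | "10" | "J" | "Q" | "K";

interface Card {
  rank: Rank;
  suit: Suit;
}

type Outcome = "player" | "dealer" | "push";

const SUITS: Suit[] = ["♠", "♥", "♦", "♣"];
const RANKS: Rank[] = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"];

// Returns a fresh, ordered 52-card deck.
function createDeck(): Card[] {
  const deck: Card[] = [];
  for (const suit of SUITS) {
    for (const rank of RANKS) deck.push({ rank, suit });
  }
  return deck;
}

// Fisher-Yates: walk backwards, swapping each slot with a uniformly chosen
// index at or below it. Shuffles in place and returns the same array.
// `random` is a hypothetical test hook; it defaults to Math.random.
function shuffleDeck(deck: Card[], random: () => number = Math.random): Card[] {
  for (let i = deck.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [deck[i], deck[j]] = [deck[j], deck[i]];
  }
  return deck;
}

// Hypothetical helper: always takes the top card (end of the array),
// so the deck order is never changed mid-round.
function drawCard(deck: Card[]): Card {
  const card = deck.pop();
  if (card === undefined) throw new Error("Deck is empty");
  return card;
}

// Deals alternately: player, dealer, player, dealer.
function dealInitialHands(deck: Card[]): { playerHand: Card[]; dealerHand: Card[] } {
  const playerHand = [drawCard(deck)];
  const dealerHand = [drawCard(deck)];
  playerHand.push(drawCard(deck));
  dealerHand.push(drawCard(deck));
  return { playerHand, dealerHand };
}

// Counts every Ace as 1, then upgrades one Ace to 11 if that does not bust.
// (Two Aces at 11 would already total 22, so at most one can count as 11.)
function calculateHandTotal(hand: Card[]): number {
  let total = 0;
  let hasAce = false;
  for (const { rank } of hand) {
    if (rank === "A") {
      total += 1;
      hasAce = true;
    } else if (rank === "J" || rank === "Q" || rank === "K") {
      total += 10;
    } else {
      total += Number(rank);
    }
  }
  return hasAce && total + 10 <= 21 ? total + 10 : total;
}

// Fixed dealer rule: draw until the total is 17 or more, then stand
// (including on soft 17). Only the dealer's current hand total is consulted.
function dealerTurn(dealerHand: Card[], deck: Card[]): Card[] {
  const hand = [...dealerHand];
  while (calculateHandTotal(hand) < 17) hand.push(drawCard(deck));
  return hand;
}

function checkOutcome(playerHand: Card[], dealerHand: Card[]): Outcome {
  const player = calculateHandTotal(playerHand);
  const dealer = calculateHandTotal(dealerHand);
  if (player > 21) return "dealer"; // a player bust loses immediately
  if (dealer > 21) return "player";
  if (player === dealer) return "push";
  return player > dealer ? "player" : "dealer";
}
```

This sketch resolves the dealer’s turn synchronously. The step-by-step animation the prompt asks for could instead draw one card per `setTimeout` tick, running the same `calculateHandTotal(hand) < 17` check before each draw.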
Output:
My Take:
The gap became even more obvious in the Blackjack game. Codex 5.3 produced a very plain, static output. Claude Opus 4.6, by contrast, was way ahead: it delivered a proper green casino mat, a much more attractive UI, and an overall engaging web experience.
Claude Opus 4.6 vs OpenAI Codex 5.3: Final Verdict
Opinions on whether Codex 5.3 or Opus 4.6 is better remain divided in the tech community. Codex 5.3 is favored for its speed, its reliability in producing bug-free code, and its effectiveness on complex engineering tasks, particularly backend fixes and autonomous execution. Opus 4.6, on the other hand, excels at deeper reasoning, agentic work, and long-context problems, and it produces more attractive UI designs. However, it can struggle with iterations and token efficiency.
After my hands-on time with both models, my pick in this Codex 5.3 vs Claude Opus 4.6 battle is Claude Opus 4.6 🏆.
Its overall performance, ease of use, and polished UI made it stand out in the tasks I tested, even though Codex 5.3 had its merits in speed and functionality.
Don’t just take my word for it. Put both models to the test yourself and see which one works best for you! Let me know your thoughts.
