Weekly AI Rankings — May 03 – May 10, 2026

Top 5 AI Models of the Week

#1 Claude Opus 4.7 ↑1

This week, Claude Opus 4.7 was widely discussed after the release of the new ProgramBench benchmark, which showed that no model, Claude included, could fully solve its tasks. The result raised questions about what coding agents can really do and how well they hold up in complex scenarios.

The lease of Colossus 1 from SpaceX raised Claude Opus 4.7's token limits: it can now process up to 10M input and 800K output tokens per minute.

Claude Opus 4.7 | Qwen3.6-35B-A3B on my laptop drew me a better pelican than Claude Opus 4.7 | Anthropic: Higher limits through expanded compute

#2 Claude Mythos Preview NEW

Claude Mythos Preview was widely discussed this week after an evaluation of its ability to find vulnerabilities in Firefox, where it helped identify 271 vulnerabilities. The result points to its potential in cybersecurity.

According to METR evaluations, Claude Mythos Preview reached a 50% success rate on tasks with a time horizon of at least 16 hours.

Hardening Firefox with Claude Mythos Preview | Evaluation of Claude Mythos Preview's cyber capabilities | Behind the Scenes Hardening Firefox with Claude Mythos Preview | METR Evals: Mythos Preview horizon eval

#3 GPT-5.4 NEW

GPT-5.4 was discussed in connection with the new ProgramBench benchmark, which showed that no model could fully solve its tasks, underscoring the limits of current coding agents.

GPT-5.4 showed strength on mathematical problems but struggled with multi-file design.

GPT-5.4 Pro solves Erdős Problem #1196 | Comparing GPT-5.4, Opus 4.6, GLM-5.1, Kimi K2.5, MiMo V2 Pro and MiniMax M2.7 | A GPT-5.4 bug led to OpenAI banning goblins and raccoons | ProgramBench

#4 GPT-5.5 Instant NEW

GPT-5.5 Instant became the new default model for ChatGPT, sparking discussions about its improvements, including reduced hallucinations and an updated memory interface.

The model scored 81.2 on AIME 2025, well above its predecessor.

GPT-5.5 Instant: Benchmarking the 52% Hallucination Reduction | GPT‑5.5 Instant | OpenAI releases GPT-5.5 Instant, a new default model for ChatGPT | OpenAI introduced GPT-5.5 Instant as the new default model in ChatGPT

#5 Claude Opus 4.6 NEW

Claude Opus 4.6 was discussed alongside new architectures such as SubQ, which promise faster inference and better handling of long contexts.

SubQ announced support for contexts of up to 12M tokens and a speedup of up to 52× over FlashAttention.

Changes in the system prompt between Claude Opus 4.6 and 4.7 | Claude Opus 4.6 accuracy on BridgeBench hallucination test drops from 83% to 68% | Google: Multi-Token Prediction for Gemma 4 | Gemma 4 MTP drafters collection

Top 5 AI Tools of the Week

#1 Claude Code →

Claude Code drew attention for its higher limits, made possible by the lease of Colossus 1 from SpaceX, which raised token limits for paid tiers.

Claude Code can now handle up to 10M input and 800K output tokens per minute.

Claude Code refuses requests or charges extra if your commits mention "OpenClaw" | An update on recent Claude Code quality reports | Claude Code Routines | Anthropic: Higher limits through expanded compute

#2 SubQ NEW

SubQ was announced this week and drew attention for an architecture that promises much faster inference.

SubQ supports contexts of up to 12M tokens and claims a speedup of up to 52× over FlashAttention.

SubQ: a sub-quadratic LLM with 12M-token context | SubQ: Sub-quadratic LLM built for 12M-token context | SubQ | Google: Multi-Token Prediction for Gemma 4

#3 CAPTCHA Verification NEW

CAPTCHA Verification was discussed in the context of increased limits for Claude, making it more accessible for long sessions and complex tasks.

The increase in limits is related to new computing deals, including a partnership with SpaceX.

Show HN: OQP – A verification protocol for AI agents | The Verge: Anthropic’s Claude usage limits are getting a boost after compute deals | CAPTCHA Verification

#4 OpenAI Codex NEW

OpenAI Codex was discussed this week after it received a Chrome extension that lets it work directly in the browser.

The extension is available on macOS and Windows but is not yet supported in the EU and UK.

OpenAI Codex system prompt includes directive: "never talk about goblins" | OpenAI Models, Codex, and Managed Agents Come to AWS | OpenAI Wants Codex to Shut Up About Goblins | OpenAI

#5 Claude vs ChatGPT vs Copilot ↑15

Comparisons of Claude, ChatGPT, and Copilot picked up this week as participants shared their experience with a multi-model approach.

A multi-model approach uses different models for different tasks, which makes work more efficient.

Claude vs ChatGPT vs Copilot for code: 2026 comparison | Claude 3.7 Sonnet release: the best LLM for coding | A year with Claude Code: how to build a working configuration from the first launch / Habr | InsForge
