Weekly AI Rankings — May 03 – May 10, 2026
Top 5 AI Models of the Week
This week, Claude Opus 4.7 was a focal point of discussions due to the release of the new ProgramBench benchmark, which revealed that none of the models, including Claude, could fully solve the tasks. This raised questions about the real capabilities of coding agents and their applicability in complex scenarios.
Claude Opus 4.7 also gained higher token limits and additional compute capacity through the lease of Colossus 1 from SpaceX, allowing it to process up to 10M input and 800K output tokens per minute.
Claude Mythos Preview was actively discussed this week following a successful evaluation of its capabilities in discovering vulnerabilities in Firefox, where it helped identify 271 vulnerabilities. This highlights its potential in the field of cybersecurity.
According to METR evaluations, Claude Mythos Preview reached a 50% time horizon of at least 16 hours, meaning it succeeded about half the time on tasks of that length.
GPT-5.4 was discussed in the context of the new ProgramBench benchmark, which showed that none of the models could fully solve the tasks, highlighting the limitations of current coding agents.
GPT-5.4 performed well on mathematical problems but struggled with multi-file design.
GPT-5.5 Instant became the new default model for ChatGPT, sparking discussions about its improvements, including reduced hallucinations and an updated memory interface.
The model scored 81.2 on AIME 2025, a significant improvement over its predecessor.
Claude Opus 4.6 was discussed in light of new architectural solutions like SubQ, which promise inference acceleration and improved handling of long contexts.
SubQ announced support for a context of up to 12M tokens and acceleration of up to 52× compared to FlashAttention.
Top 5 AI Tools of the Week
Claude Code became a topic of discussion after the lease of Colossus 1 from SpaceX made it possible to raise token limits for paid tiers.
As a result of the changes, Claude Code can now handle up to 10M input and 800K output tokens per minute.
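To put those per-minute figures in perspective, here is a rough back-of-the-envelope sketch. The two limits are taken from the paragraph above. The request sizes and the function name are hypothetical, and the sketch assumes both limits apply to a single pool of identical requests, which may not match how the limits are actually enforced.

```python
# Illustrative arithmetic only. The per-minute limits are the figures quoted
# in the text; the request sizes below are hypothetical.
INPUT_TOKENS_PER_MINUTE = 10_000_000
OUTPUT_TOKENS_PER_MINUTE = 800_000


def max_requests_per_minute(input_tokens: int, output_tokens: int) -> int:
    """Return how many identical requests fit in one minute under both limits."""
    if input_tokens <= 0 or output_tokens <= 0:
        raise ValueError("token counts must be positive")
    by_input = INPUT_TOKENS_PER_MINUTE // input_tokens
    by_output = OUTPUT_TOKENS_PER_MINUTE // output_tokens
    # The tighter of the two limits decides the overall rate.
    return min(by_input, by_output)


if __name__ == "__main__":
    # Hypothetical agent step: 150K tokens of context in, 4K tokens out.
    print(max_requests_per_minute(150_000, 4_000))
```

With these assumed sizes the input limit is the binding one. For short prompts that produce long outputs, the output limit would bind instead.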
SubQ was announced this week and attracted attention due to its architectural solutions that promise significant inference acceleration.
SubQ supports a context of up to 12M tokens and offers acceleration of up to 52× compared to FlashAttention.
CAPTCHA Verification was discussed in the context of increased limits for Claude, making it more accessible for long sessions and complex tasks.
The increase in limits is related to new computing deals, including a partnership with SpaceX.
OpenAI Codex received a Chrome extension, allowing it to work directly in the browser, which became a topic of discussion this week.
The extension is available on macOS and Windows but is not yet supported in the EU and UK.
A comparison of Claude, ChatGPT, and Copilot drew attention this week as participants shared their experiences with a multi-model approach.
The multi-model approach means choosing a different model for each kind of task, which can improve efficiency.