Weekly AI Rankings — May 10 – May 17, 2026

Top 5 AI Models of the Week

#1 Claude Mythos Preview ↑1

This week's discussion of Claude Mythos Preview centered on its use for finding vulnerabilities in macOS and for exploit development. Anthropic also introduced Project Glasswing, which uses Mythos to scan infrastructure for vulnerabilities.

Claude Mythos Preview proved highly effective at finding vulnerabilities, surpassing GPT-5.5 in offensive-security evaluations. It also improved significantly on multi-step attacks compared with previous versions.

Hardening Firefox with Claude Mythos Preview
Behind the Scenes Hardening Firefox with Claude Mythos Preview
Automating code security review: Mythos-level capabilities at lower cost
Anthropic Mythos and Apple macOS bug report

#2 Claude Opus 4.6 ↑3

Claude Opus 4.6 was discussed mainly through benchmark comparisons with Mythos and GPT-5.5. The discussion also touched on its shortcomings in programming tasks relative to newer models.

On benchmarks, Claude Opus 4.6 reached an 83% vulnerability-discovery rate, lower than both Mythos and GPT-5.5. This is consistent with users shifting toward newer models for code generation.

Changes in the system prompt between Claude Opus 4.6 and 4.7
GitHub Copilot Pro+ not allowing Claude Opus 4.6
XBOW evaluation of Mythos
AISI evaluation of GPT-5.5 cyber capabilities

#3 GPT-5.5 NEW

GPT-5.5 attracted attention this week for its benchmark results, including fully solving the cmatrix task. This is an important signal of its capabilities in agent-based programming.

GPT-5.5 fully solved a benchmark task for the first time, highlighting its effectiveness in real programming scenarios. It also outperformed Claude Opus 4.7 in various tasks.

GPT-5.5
Kimi K2.6 just beat Claude, GPT-5.5, and Gemini in a coding challenge
OpenAI releases GPT-5.5 and GPT-5.5 Pro in the API
ProgramBench — GPT 5.5 first solve

#4 Claude Opus 4.7 ↓3

Claude Opus 4.7 was discussed mainly in comparison with GPT-5.5, against which it performed worse on programming tasks. This raised questions about its relevance for code generation.

On benchmarks, Claude Opus 4.7 scored lower than GPT-5.5, in line with the trend toward choosing newer models for programming tasks.

Claude Opus 4.7 Intelligence, Performance and Price Analysis
Tell HN: Claude Opus 4.7 quota suddenly changed to 0 TPM in Bedrock
GPT-5.5 first solve on ProgramBench

#5 Llama 3.1 NEW

Llama 3.1 was discussed this week in the context of creating personal clones, demonstrating its flexibility and customization capabilities. This highlights the growing interest in personalized AI solutions.

Llama 3.1 allows for the creation of personalized models trained on individual data, opening new opportunities for users.

Tracing tokens through Llama 3.1 8B inference on H100s
Show HN: GlycemicGPT – Open-source AI-powered diabetes management
Tomás‑7B — personal clone example

Top 5 AI Tools of the Week

#1 Claude Code →

Claude Code was actively discussed this week for its use in agent-based development and code refactoring. Participants shared successful examples of applying it to complex tasks.

Claude Code proved highly effective at complex calculations and analytics. It also received updates that raised its limits and expanded its agent management capabilities.

Claude Code refuses requests or charges extra if your commits mention "OpenClaw"
An update on recent Claude Code quality reports
Claude Code to be removed from Anthropic's Pro plan?
agents-best-practices

#2 Codex ↑2

Codex was discussed in the context of new features, including mobile management and remote access, significantly expanding its functionality. This makes Codex more accessible to users.

Codex is now accessible through the ChatGPT mobile app, allowing users to manage agents and tasks from anywhere.

Codex is now in the ChatGPT mobile app
A Claude Code and Codex Skill for Deliberate Skill Development
Show HN: Ctx – a /resume that works across Claude Code and Codex
Work with Codex from anywhere

#3 OpenClaw workflow thread NEW

OpenClaw was discussed for its scalability and its use of hundreds of agents to automate development. This drew attention to the economics of running such systems.

OpenClaw demonstrated that it can automate the engineering process, but at a high token cost, which highlights the complexity and expense of such setups.

OpenClaw workflow thread

#4 Bun PR NEW

Bun was discussed for its rewrite from Zig to Rust using Claude Code, which became an example of agent-based programming applied successfully to a real project.

The transition to Rust took about 10 days and reached a ~99.8% test pass rate, demonstrating the effectiveness of Claude Code for infrastructure changes.

Analysis of changes in the Bun codebase after the rewrite from Zig to Rust
Bun PR: Rust reimplementation

#5 AISI NEW

AISI was discussed in the context of Anthropic's Glasswing project, which uses Claude Mythos for vulnerability discovery. This drew attention to the question of who gets access to such models.

The Glasswing project provides organizations with tools for scanning infrastructure for vulnerabilities with coordinated disclosure of findings.

Project Glasswing / Mythos Preview
AISI: How fast is autonomous AI cyber capability advancing?
