Weekly AI Rankings — May 31 – June 07, 2026

Top 5 AI Models of the Week

#1 Claude Opus 4.8 →

This week's SWE-rebench update compared Claude Opus 4.8 with other models, including GPT-5.5. The results indicated that GPT-5.5 (medium) is more efficient than Claude Opus 4.8 (high), even though Opus 4.8 has lowered its cost per solved task.

Claude Opus 4.8 reduces the cost of solving tasks but shows no significant quality gains over previous versions.

Claude Opus 4.8
Claude Opus 4.8 Max responding to an empty message
Claude Opus 4.8 distilled Alibaba Qwen models
SWE-rebench

#2 GPT-5.5 NEW

In the SWE-rebench discussion, GPT-5.5 (medium) outperformed Claude Opus 4.8 (high) on efficiency. This comparison has become a key reference point for evaluating modern coding agents.

GPT-5.5 is OpenAI's new model. It is highly efficient on long contexts, although it does not always post the best results.

HWE Bench: A new unbounded Benchmark for LLMs (GPT 5.5 is on top)
GitHub Copilot charges GPT 5.5 with a 57x multiplier per request from June first
DeepSWE: More and cheaper intelligence from maxed GPT 5.5 than maxed Opus 4.8
Codex for Every Role

#3 Qwen2-VL NEW

This week, researchers introduced the VL-DAC method, which showed that skills learned in simulators transfer successfully to real tasks using Qwen2-VL. This opens new possibilities for training vision-language models.

Qwen2-VL is a vision-language model that improved significantly in interactive environments after training in simulators.

#4 Gemma 4 12B NEW

Google released Gemma 4 12B this week, a multimodal model that processes text, images, and audio. It attracted attention because it can run on ordinary devices with 16 GB of RAM.

Gemma 4 12B does not need a separate encoder for each data type, which makes it more versatile.

Gemma 4 12B: A unified, encoder-free multimodal model
A Visual Guide to Gemma 4 12B
Google's new Gemma 4 12B model is designed to run on any laptop with 16GB of RAM
Gemma 4 collection on Hugging Face

#5 MiniMax M3 NEW

This week, MiniMax M3 attracted attention as a strong agent model, showing capabilities in web browsing and CUDA kernel optimization. This reflects the growing interest in multimodal and agentic models.

MiniMax M3 is billed as the first open-weights model to combine three frontier capabilities.

MiniMax M3
MiniMax M3: The First Open-Weights Model to Combine Three Frontier Capabilities
MiniMax teased M3 Sparse Attention: 9.7x prefilling, 15.6x decoding at 1M
Qwen3.7-Plus Blog

Top 5 AI Tools of the Week

#1 Claude Code →

This week's discussion covered using Claude Code together with Codex for task architecture and implementation. Comparisons showed that Claude Code is stronger at frontend and design work, while Codex is more convenient for working with the environment.

Claude Code helps developers implement tasks efficiently by breaking them into smaller, more manageable pieces.

Using Claude Code: The unreasonable effectiveness of HTML
Microsoft starts canceling Claude Code licenses
Claude Code as a Daily Driver: Claude.md, Skills, Subagents, Plugins, and MCPs
Claude Code

#2 ChatGPT NEW

OpenAI introduced Dreaming, a mechanism for ChatGPT's memory that synthesizes and updates facts from chat history. The update substantially improves personalization.

With Dreaming, ChatGPT not only stores facts but also synthesizes a picture of the user's current situation, which makes it more useful.

A recent experience with ChatGPT 5.5 Pro
Codex is now in the ChatGPT mobile app
OpenAI and Government of Malta partner to roll out ChatGPT Plus to all citizens
ChatGPT Memory: Dreaming

#3 Recursive self-improvement NEW

This week, Anthropic described its progress toward recursive self-improvement for Claude: the model is already writing code for its next version. This underscores how much of AI development is becoming automated.

Recursive self-improvement is an approach in which models help improve themselves, which could lead to major performance breakthroughs.

When AI Builds Itself: Our progress toward recursive self-improvement
Sakana AI's Recursive Self-Improvement (RSI) Lab
Recursive Self-Improvement Delivers New SOTA Coding Performance

#4 SWE-rebench update NEW

This week's SWE-rebench update added 110 new tasks and new comparisons of coding agents, making the results more representative of real-world use. The update has become a key reference for assessing model performance.

SWE-rebench is a benchmark for evaluating AI models on programming tasks, and it now covers more diverse scenarios.

SWE-rebench

#5 ChatGPT Memory NEW

This week's discussion of Dreaming showed how ChatGPT's memory can synthesize and update information about the user, making interactions more personalized.

Dreaming lets ChatGPT manage its memory more effectively, which improves the quality of its interactions with users.

Dreaming: Better memory for a more helpful ChatGPT
ChatGPT warns it may forget long conversations, I save context outside the chat
Ask HN: Why do none of the major AI agents persist memory across sessions?
ChatGPT Memory: Dreaming
