<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel><title>Agent Economy - AI Models</title><description>Foundation models, multimodal systems, reasoning, and product shifts from new model releases.</description><link>https://agenteconomy.cn/</link><language>en-us</language><lastBuildDate>Thu, 21 May 2026 00:02:29 GMT</lastBuildDate><item><title>OpenAI Model Autonomously Solves 80-Year-Old Geometry Problem</title><link>https://agenteconomy.cn/en/blog/openai-model-disproves-geometry-conjecture/</link><guid isPermaLink="true">https://agenteconomy.cn/en/blog/openai-model-disproves-geometry-conjecture/</guid><description>An OpenAI reasoning model disproves a central conjecture in discrete geometry that had stood for nearly 80 years — marking the first time an AI system has autonomously solved an open mathematical problem in an active field.</description><pubDate>Thu, 21 May 2026 00:02:29 GMT</pubDate></item><item><title>δ-mem Brings Efficient Online Memory to Large Language Models</title><link>https://agenteconomy.cn/en/blog/delta-mem-llm-online-memory/</link><guid isPermaLink="true">https://agenteconomy.cn/en/blog/delta-mem-llm-online-memory/</guid><description>A new lightweight memory mechanism using only an 8×8 state matrix gives frozen LLMs associative memory through delta-rule learning, boosting agent benchmark performance by up to 31% without full fine-tuning.</description><pubDate>Sun, 17 May 2026 00:02:47 GMT</pubDate></item><item><title>Don&apos;t Expect AI Progress to Sigmoid Anytime Soon</title><link>https://agenteconomy.cn/en/blog/the-sigmoids-wont-save-you/</link><guid isPermaLink="true">https://agenteconomy.cn/en/blog/the-sigmoids-wont-save-you/</guid><description>Scott Alexander pushes back against the &apos;all exponentials become sigmoids&apos; argument used to dismiss AI progress concerns, showing how history is littered with premature plateau predictions, and arguing Lindy&apos;s Law suggests continued progress for ~7 more 
years.</description><pubDate>Sat, 16 May 2026 00:02:58 GMT</pubDate></item><item><title>Google Launches Googlebook AI-Native Laptop Line</title><link>https://agenteconomy.cn/en/blog/google-googlebook-ai-laptop/</link><guid isPermaLink="true">https://agenteconomy.cn/en/blog/google-googlebook-ai-laptop/</guid><description>Google unveils Googlebook, a laptop series designed for Gemini Intelligence with Magic Pointer AI cursor, AI widget generation, and deep Android phone integration, shipping Fall 2026.</description><pubDate>Fri, 15 May 2026 00:02:50 GMT</pubDate></item><item><title>Fields Medalist Tests ChatGPT 5.5 Pro: PhD-Level Math Research in Under Two Hours</title><link>https://agenteconomy.cn/en/blog/chatgpt-5-5-pro-phd-level-research/</link><guid isPermaLink="true">https://agenteconomy.cn/en/blog/chatgpt-5-5-pro-phd-level-research/</guid><description>Timothy Gowers put ChatGPT 5.5 Pro on open problems in additive number theory. The model produced original, verified mathematical proofs with zero substantive input from Gowers — forcing the math community to rethink PhD training and research attribution.</description><pubDate>Fri, 15 May 2026 00:02:50 GMT</pubDate></item><item><title>When You Delegate to LLMs, Your Documents Get Corrupted</title><link>https://agenteconomy.cn/en/blog/llms-corrupt-documents-when-you-delegate/</link><guid isPermaLink="true">https://agenteconomy.cn/en/blog/llms-corrupt-documents-when-you-delegate/</guid><description>A new benchmark shows that even frontier models like Gemini 3.1 Pro, Claude 4.6 Opus, and GPT 5.4 corrupt roughly 25% of document content in long delegated workflows, and agentic tool use doesn&apos;t help.</description><pubDate>Fri, 15 May 2026 00:02:50 GMT</pubDate></item><item><title>How 6% of Users Turn to Claude for Personal Life Guidance</title><link>https://agenteconomy.cn/en/blog/claude-personal-guidance/</link><guid 
isPermaLink="true">https://agenteconomy.cn/en/blog/claude-personal-guidance/</guid><description>Anthropic&apos;s privacy-preserving analysis of 1 million conversations reveals the most common domains of AI guidance-seeking, and where sycophancy remains a problem.</description><pubDate>Fri, 15 May 2026 00:02:50 GMT</pubDate></item><item><title>OpenAI models, Codex, and Managed Agents land on AWS</title><link>https://agenteconomy.cn/en/blog/openai-on-aws/</link><guid isPermaLink="true">https://agenteconomy.cn/en/blog/openai-on-aws/</guid><description>OpenAI and AWS expand their partnership to bring GPT-5.5, Codex, and new Bedrock Managed Agents to AWS customers, giving enterprises a direct path to deploy frontier AI within their existing cloud infrastructure.</description><pubDate>Fri, 15 May 2026 00:02:50 GMT</pubDate></item><item><title>LLMs make surface quality unreliable in knowledge work</title><link>https://agenteconomy.cn/en/blog/simulacrum-of-knowledge-work/</link><guid isPermaLink="true">https://agenteconomy.cn/en/blog/simulacrum-of-knowledge-work/</guid><description>One Happy Fellow argues that LLMs break the proxy measures organizations use to judge knowledge work. When spelling, formatting, review rituals, and professional tone can be generated cheaply, teams need better ways to verify whether work is actually true, useful, and decision-grade.</description><pubDate>Fri, 15 May 2026 00:02:50 GMT</pubDate></item><item><title>DeepSeek V4 preview brings 1M context into open model competition</title><link>https://agenteconomy.cn/en/blog/deepseek-v4-preview-1m-context/</link><guid isPermaLink="true">https://agenteconomy.cn/en/blog/deepseek-v4-preview-1m-context/</guid><description>DeepSeek has released and open-sourced the V4 preview, with Pro and Flash variants and 1M context as the default across official services. 
The release matters less as a benchmark update than as a push to make long-context agent workflows cheaper and more deployable.</description><pubDate>Fri, 15 May 2026 00:02:50 GMT</pubDate></item><item><title>Google deepens its Anthropic bet to own both model access and compute demand</title><link>https://agenteconomy.cn/en/blog/google-anthropic-40-billion-bet/</link><guid isPermaLink="true">https://agenteconomy.cn/en/blog/google-anthropic-40-billion-bet/</guid><description>Google plans to invest up to $40 billion in Anthropic, with $10 billion up front and the rest tied to performance milestones. The bigger story is how the deal binds equity, cloud distribution, and TPU demand into a single infrastructure value chain.</description><pubDate>Fri, 15 May 2026 00:02:50 GMT</pubDate></item><item><title>OpenAI launches GPT-5.5 with a bigger leap in autonomous work</title><link>https://agenteconomy.cn/en/blog/openai-gpt-5-5/</link><guid isPermaLink="true">https://agenteconomy.cn/en/blog/openai-gpt-5-5/</guid><description>OpenAI launches GPT-5.5 with stronger coding and knowledge-work performance while preserving speed, pushing the model closer to an execution layer for autonomous digital work.</description><pubDate>Fri, 15 May 2026 00:02:50 GMT</pubDate></item><item><title>Kelsey Piper Finds Claude Opus 4.7 Can Identify Authors from a Small Sample of Unpublished Text</title><link>https://agenteconomy.cn/en/blog/kelsey-piper-ai-deanonymization-claude-opus/</link><guid isPermaLink="true">https://agenteconomy.cn/en/blog/kelsey-piper-ai-deanonymization-claude-opus/</guid><description>Journalist Kelsey Piper demonstrates that Claude Opus 4.7 can identify her from as little as 125 words of unpublished text — across political commentary, education reports, movie reviews, and a 15-year-old college essay.</description><pubDate>Fri, 15 May 2026 00:02:50 GMT</pubDate></item><item><title>Introducing Claude Opus 4.7 | 
Anthropic</title><link>https://agenteconomy.cn/en/blog/introducing-claude-opus-47-anthropic/</link><guid isPermaLink="true">https://agenteconomy.cn/en/blog/introducing-claude-opus-47-anthropic/</guid><description>Anthropic introduces Claude Opus 4.7 with enhanced AI capabilities.</description><pubDate>Fri, 15 May 2026 00:02:50 GMT</pubDate></item><item><title>The Gemini App is now available on macOS</title><link>https://agenteconomy.cn/en/blog/the-gemini-app-is-now-available-on-mac-os/</link><guid isPermaLink="true">https://agenteconomy.cn/en/blog/the-gemini-app-is-now-available-on-mac-os/</guid><description>Google is bringing the Gemini app to macOS as a native desktop experience.</description><pubDate>Fri, 15 May 2026 00:02:50 GMT</pubDate></item><item><title>Meta Introduces Muse Spark: Scaling Towards Personal Superintelligence</title><link>https://agenteconomy.cn/en/blog/introducing-muse-spark-scaling-towards-personal-su/</link><guid isPermaLink="true">https://agenteconomy.cn/en/blog/introducing-muse-spark-scaling-towards-personal-su/</guid><description>Meta announces an initiative to provide everyone with their own superintelligent assistant, enabling truly personalized AI experiences.</description><pubDate>Fri, 15 May 2026 00:02:50 GMT</pubDate></item><item><title>Qwen3.6-Plus: AI Agent for Real-World Applications</title><link>https://agenteconomy.cn/en/blog/qwen3-6-plus-real-world-agents/</link><guid isPermaLink="true">https://agenteconomy.cn/en/blog/qwen3-6-plus-real-world-agents/</guid><description>Alibaba Tongyi Qianwen releases a model for real-world agent scenarios, supporting complex task planning, code generation, multimodal understanding, and tool calling.</description><pubDate>Fri, 15 May 2026 00:02:50 GMT</pubDate></item><item><title>Google Releases Gemma 4: The Most Capable Open Models to Date</title><link>https://agenteconomy.cn/en/blog/google-gemma-4-open-models/</link><guid 
isPermaLink="true">https://agenteconomy.cn/en/blog/google-gemma-4-open-models/</guid><description>Purpose-built for advanced reasoning and agentic workflows. Four sizes: E2B/E4B/26B-MoE/31B. Apache 2.0 license. #3 on Arena AI leaderboard.</description><pubDate>Fri, 15 May 2026 00:02:50 GMT</pubDate></item><item><title>ARC-AGI-3: The Next-Gen Reasoning Benchmark for Measuring AGI</title><link>https://agenteconomy.cn/en/blog/arc-agi-3-benchmark/</link><guid isPermaLink="true">https://agenteconomy.cn/en/blog/arc-agi-3-benchmark/</guid><description>Third-generation ARC reasoning benchmark testing AI agents&apos; interactive reasoning, measuring the gap between AI and human intelligence.</description><pubDate>Fri, 15 May 2026 00:02:50 GMT</pubDate></item><item><title>OpenAI Announces Shutting Down Sora</title><link>https://agenteconomy.cn/en/blog/sora-shutting-down/</link><guid isPermaLink="true">https://agenteconomy.cn/en/blog/sora-shutting-down/</guid><description>OpenAI announces it is shutting down the Sora app, just months after launching the AI video generation tool.</description><pubDate>Fri, 15 May 2026 00:02:50 GMT</pubDate></item><item><title>Introducing Forge | Mistral AI</title><link>https://agenteconomy.cn/en/blog/introducing-forge-mistral-ai/</link><guid isPermaLink="true">https://agenteconomy.cn/en/blog/introducing-forge-mistral-ai/</guid><description>OpenAI releases GPT-5.4, combining recent advances in reasoning, coding, and agentic workflows into a single frontier model. 
Achieves a new state-of-the-art 83.0% on GDPval benchmark with native computer-use capabilities.</description><pubDate>Fri, 15 May 2026 00:02:50 GMT</pubDate></item><item><title>Google Releases Nano Banana 2: Next-Gen Image Model Combining Pro Capabilities with Lightning Speed</title><link>https://agenteconomy.cn/en/blog/nano-banana-2-google-image-model/</link><guid isPermaLink="true">https://agenteconomy.cn/en/blog/nano-banana-2-google-image-model/</guid><description>Google DeepMind releases Nano Banana 2, combining Pro features with Flash speed. Supports subject consistency, precise text rendering, 4K resolution, now available across Gemini, Search, Flow and more.</description><pubDate>Fri, 15 May 2026 00:02:50 GMT</pubDate></item><item><title>OpenAI drops SWE-bench Verified after finding widespread contamination</title><link>https://agenteconomy.cn/en/blog/openai-drops-swe-bench-verified/</link><guid isPermaLink="true">https://agenteconomy.cn/en/blog/openai-drops-swe-bench-verified/</guid><description>OpenAI found that SWE-bench Verified suffers from flawed test cases and training data contamination across all major models, and is now recommending SWE-bench Pro instead.</description><pubDate>Fri, 15 May 2026 00:02:50 GMT</pubDate></item></channel></rss>