Agent Economy - AI Models

Foundation models, multimodal systems, reasoning, and product shifts from new model releases.
https://agenteconomy.cn/en-us
Thu, 21 May 2026 00:02:29 GMT

OpenAI Model Autonomously Solves 80-Year-Old Geometry Problem
https://agenteconomy.cn/en/blog/openai-model-disproves-geometry-conjecture/
An OpenAI reasoning model disproves a central conjecture in discrete geometry that had stood for nearly 80 years, marking the first time an AI system has autonomously solved an open mathematical problem in an active field.
Thu, 21 May 2026 00:02:29 GMT

δ-mem Brings Efficient Online Memory to Large Language Models
https://agenteconomy.cn/en/blog/delta-mem-llm-online-memory/
A new lightweight memory mechanism using only an 8×8 state matrix gives frozen LLMs associative memory through delta-rule learning, boosting agent benchmark performance by up to 31% without full fine-tuning.
Sun, 17 May 2026 00:02:47 GMT

Don't Expect AI Progress to Sigmoid Anytime Soon
https://agenteconomy.cn/en/blog/the-sigmoids-wont-save-you/
Scott Alexander pushes back against the 'all exponentials become sigmoids' argument used to dismiss AI progress concerns, showing how history is littered with premature plateau predictions and arguing that Lindy's Law suggests continued progress for ~7 more years.
Sat, 16 May 2026 00:02:58 GMT

Google Launches Googlebook AI-Native Laptop Line
https://agenteconomy.cn/en/blog/google-googlebook-ai-laptop/
Google unveils Googlebook, a laptop series designed for Gemini Intelligence, with a Magic Pointer AI cursor, AI widget generation, and deep Android phone integration, shipping Fall 2026.
Fri, 15 May 2026 00:02:50 GMT

Fields Medalist Tests ChatGPT 5.5 Pro: PhD-Level Math Research in Under Two Hours
https://agenteconomy.cn/en/blog/chatgpt-5-5-pro-phd-level-research/
Timothy Gowers set ChatGPT 5.5 Pro to work on open problems in additive number theory. The model produced original, verified mathematical proofs with zero substantive input from Gowers, forcing the math community to rethink PhD training and research attribution.
Fri, 15 May 2026 00:02:50 GMT

When You Delegate to LLMs, Your Documents Get Corrupted
https://agenteconomy.cn/en/blog/llms-corrupt-documents-when-you-delegate/
A new benchmark shows that even frontier models like Gemini 3.1 Pro, Claude 4.6 Opus, and GPT 5.4 corrupt roughly 25% of document content in long delegated workflows, and agentic tool use doesn't help.
Fri, 15 May 2026 00:02:50 GMT

How 6% of Users Turn to Claude for Personal Life Guidance
https://agenteconomy.cn/en/blog/claude-personal-guidance/
Anthropic's privacy-preserving analysis of 1 million conversations reveals the most common domains of AI guidance-seeking, and where sycophancy remains a problem.
Fri, 15 May 2026 00:02:50 GMT

OpenAI models, Codex, and Managed Agents land on AWS
https://agenteconomy.cn/en/blog/openai-on-aws/
OpenAI and AWS expand their partnership to bring GPT-5.5, Codex, and new Bedrock Managed Agents to AWS customers, giving enterprises a direct path to deploy frontier AI within their existing cloud infrastructure.
Fri, 15 May 2026 00:02:50 GMT

LLMs make surface quality unreliable in knowledge work
https://agenteconomy.cn/en/blog/simulacrum-of-knowledge-work/
One Happy Fellow argues that LLMs break the proxy measures organizations use to judge knowledge work. When spelling, formatting, review rituals, and professional tone can be generated cheaply, teams need better ways to verify whether work is actually true, useful, and decision-grade.
Fri, 15 May 2026 00:02:50 GMT

DeepSeek V4 preview brings 1M context into open model competition
https://agenteconomy.cn/en/blog/deepseek-v4-preview-1m-context/
DeepSeek has released and open-sourced the V4 preview, with Pro and Flash variants and 1M context as the default across official services. The release matters less as a benchmark update than as a push to make long-context agent workflows cheaper and more deployable.
Fri, 15 May 2026 00:02:50 GMT

Google deepens its Anthropic bet to own both model access and compute demand
https://agenteconomy.cn/en/blog/google-anthropic-40-billion-bet/
Google plans to invest up to $40 billion in Anthropic, with $10 billion up front and the rest tied to performance milestones. The bigger story is how the deal binds equity, cloud distribution, and TPU demand into a single infrastructure value chain.
Fri, 15 May 2026 00:02:50 GMT

OpenAI launches GPT-5.5 with a bigger leap in autonomous work
https://agenteconomy.cn/en/blog/openai-gpt-5-5/
OpenAI launches GPT-5.5 with stronger coding and knowledge-work performance while preserving speed, pushing the model closer to an execution layer for autonomous digital work.
Fri, 15 May 2026 00:02:50 GMT

Kelsey Piper Finds Claude Opus 4.7 Can Identify Authors from a Small Sample of Unpublished Text
https://agenteconomy.cn/en/blog/kelsey-piper-ai-deanonymization-claude-opus/
Journalist Kelsey Piper demonstrates that Claude Opus 4.7 can identify her from as little as 125 words of unpublished text, across political commentary, education reports, movie reviews, and a 15-year-old college essay.
Fri, 15 May 2026 00:02:50 GMT

Introducing Claude Opus 4.7 | Anthropic
https://agenteconomy.cn/en/blog/introducing-claude-opus-47-anthropic/
Anthropic introduces Claude Opus 4.7 with enhanced AI capabilities.
Fri, 15 May 2026 00:02:50 GMT

The Gemini App is now available on Mac OS
https://agenteconomy.cn/en/blog/the-gemini-app-is-now-available-on-mac-os/
Google is bringing the Gemini app to macOS as a native desktop experience.
Fri, 15 May 2026 00:02:50 GMT

Meta Introduces Muse Spark: Scaling Towards Personal Superintelligence
https://agenteconomy.cn/en/blog/introducing-muse-spark-scaling-towards-personal-su/
Meta announces an initiative to provide everyone with their own superintelligent assistant, enabling truly personalized AI experiences.
Fri, 15 May 2026 00:02:50 GMT

Qwen3.6-Plus: AI Agent for Real-World Applications
https://agenteconomy.cn/en/blog/qwen3-6-plus-real-world-agents/
Alibaba Tongyi Qianwen releases a model for real-world agent scenarios, supporting complex task planning, code generation, multimodal understanding, and tool calling.
Fri, 15 May 2026 00:02:50 GMT

Google Releases Gemma 4: The Most Capable Open Models to Date
https://agenteconomy.cn/en/blog/google-gemma-4-open-models/
Purpose-built for advanced reasoning and agentic workflows. Four sizes: E2B/E4B/26B-MoE/31B. Apache 2.0 license. #3 on Arena AI leaderboard.
Fri, 15 May 2026 00:02:50 GMT

ARC-AGI-3: The Next-Gen Reasoning Benchmark for Measuring AGI
https://agenteconomy.cn/en/blog/arc-agi-3-benchmark/
Third-generation ARC reasoning benchmark testing AI agents' interactive reasoning, measuring the gap between AI and human intelligence.
Fri, 15 May 2026 00:02:50 GMT

OpenAI Announces Shutting Down Sora
https://agenteconomy.cn/en/blog/sora-shutting-down/
OpenAI announces it is shutting down the Sora app, just months after launching the AI video generation tool.
Fri, 15 May 2026 00:02:50 GMT

Introducing Forge | Mistral AI
https://agenteconomy.cn/en/blog/introducing-forge-mistral-ai/
OpenAI releases GPT-5.4, combining recent advances in reasoning, coding, and agentic workflows into a single frontier model. Achieves a new state-of-the-art 83.0% on GDPval benchmark with native computer-use capabilities.
Fri, 15 May 2026 00:02:50 GMT

Google Releases Nano Banana 2: Next-Gen Image Model Combining Pro Capabilities with Lightning Speed
https://agenteconomy.cn/en/blog/nano-banana-2-google-image-model/
Google DeepMind releases Nano Banana 2, combining Pro features with Flash speed. Supports subject consistency, precise text rendering, and 4K resolution; now available across Gemini, Search, Flow, and more.
Fri, 15 May 2026 00:02:50 GMT

OpenAI drops SWE-bench Verified after finding widespread contamination
https://agenteconomy.cn/en/blog/openai-drops-swe-bench-verified/
OpenAI found that SWE-bench Verified suffers from flawed test cases and training data contamination across all major models, and is now recommending SWE-bench Pro instead.
Fri, 15 May 2026 00:02:50 GMT