A Wave of Personal Agents is Coming

Generative AI, particularly ChatGPT, has inspired an entire industry and cultural conversation through the almost magical experience of generating images, text, and video from basic natural language prompts. This generative technology has produced some inspiring, even moving, early results. While we’re constantly impressed with the applications that sit on top of these foundation models, also called large language models (LLMs), Madrona continues to advance our perspective on the rapidly changing generative app ecosystem and on the game theory around which types of companies are more likely to win at various layers of the generative AI stack. To that end, we’ve been thinking about, and talking to, founders who are transferring the intelligence of these foundation models into autonomous personal agents that can sit inside applications and take the actions required to create personalized experiences and solve some of the biggest challenges in consumer-facing applications today.

One of ChatGPT’s early strengths was how quickly and extensively it trained people to interact with computers in natural language and to expect computers to understand and respond helpfully to questions and commands. Autonomous intelligent agents are going to take that one step further. They are systems designed to perform specific tasks with little to no human intervention. For example, say someone is shopping for a dress to wear at a wedding in Tahoe in August. With that prompt, an agent would make suggestions and curate options based on the user’s preferences (brand, style) and constraints (inventory available, price, size). Once the shopper makes a selection, the agent would be able to complete the purchase and monitor shipping rather than redirect the shopper to a specific shopping site to complete the purchase themselves.
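
The curation step in the dress example can be pictured as a small filter-then-rank routine. The sketch below is only an illustration under stated assumptions: the `Dress` record, the `curate` function, the scoring rule, and the sample catalog are all hypothetical, and a real agent would draw on live inventory and a learned model of the shopper rather than a hand-written score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dress:
    # Hypothetical catalog record; field names are assumptions for this sketch.
    name: str
    brand: str
    style: str
    price: float
    sizes_in_stock: frozenset


def curate(catalog, *, size, max_price, brands=(), styles=()):
    """Drop items that violate hard constraints, then rank by soft preferences."""
    # Constraints: the item must be in stock in the shopper's size and within budget.
    feasible = [d for d in catalog if size in d.sizes_in_stock and d.price <= max_price]

    def score(d):
        # Preferences: one point each for a liked brand and a liked style.
        return (d.brand in brands) + (d.style in styles)

    # Highest preference score first; the cheaper item wins ties.
    return sorted(feasible, key=lambda d: (-score(d), d.price))


# Made-up sample data for illustration only.
catalog = [
    Dress("Lakeside Midi", "Aster", "floral", 180.0, frozenset({"S", "M"})),
    Dress("Summit Gown", "Aster", "formal", 420.0, frozenset({"M"})),
    Dress("Pine Wrap", "Birch", "floral", 150.0, frozenset({"M", "L"})),
    Dress("Alpine Slip", "Birch", "minimal", 90.0, frozenset({"L"})),
]

picks = curate(catalog, size="M", max_price=250.0, brands={"Aster"}, styles={"floral"})
print([d.name for d in picks])  # ['Lakeside Midi', 'Pine Wrap']
```

Treating constraints as hard filters and preferences as a ranking signal is one possible design choice; it keeps out-of-stock or over-budget items from ever being suggested, while still letting a less-preferred item surface when nothing better is available.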

This is far from easy; foundation models are challenging to work with in many ways. Developers face issues with model memory, learning, and hallucinations, but the founders who overcome these issues to connect the intelligence in the models with systems of action will be ahead of the curve in giving the world what it wants. As Bill Gates said at a recent AI event, “Whoever wins the personal agent, that’s the big thing, because you will never go to a search site again, you will never go to a productivity site, you’ll never go to Amazon again.”

The Generative AI and Agent Application Landscape

Everyone is used to consulting experts in the physical world. We work with travel agents to book the perfect trip. We ask hotel concierges to book a table for five at a restaurant they recommend. But accomplishing these kinds of asks digitally is nearly impossible because of clunky, unfriendly interfaces. Most people do not have the patience for it, especially if they have to do it repeatedly, either within an app or across multiple siloed apps. When users do express their preferences, the products and services recommended are often low quality. This can stem from a lack of metadata that would enable effective preference matching, from intentional speed bumps such as ad-based business models, or from the limitations of the business partnerships that dictate results. The best places to find high-quality recommendations that match user preferences, such as social networks and community-driven content platforms, are often disconnected from the booking and purchasing systems required to complete the transaction.

Personal agents that connect systems of intelligence with systems of action are still in the early stages, but we see them as conversational applications that provide value to end-users through improved information retrieval, discovery, and action. What we’re seeing now are mostly chat assistants like Diem, HeyPi, and Character AI that still lack the ability to act autonomously.

Over the past year, we have seen a proliferation of content-focused generative-native applications with the likes of Runway, Jasper, Midjourney, and many others. With these, the user provides a prompt, and the application provides options.

We think autonomous intelligent agents have the potential to bridge the gap between physical and digital consumer experiences and can lead to new, efficient online shopping experiences.

Some technology companies have implemented copilot AI assistants, or as we call them, generative-enhanced applications, which get closer to the agent functionality. But still, they do not autonomously take action. For example, Expedia implemented an AI chat search for trip discovery within their mobile app. And companies like Kayak, Instacart, and Klarna have plugins within ChatGPT. But to complete a purchase through Kayak, the user is presented with the relevant link within their conversation window to go to Kayak and book the trip.

We anticipate that the next wave of growth will come from new companies building native personal agents and chat assistants on foundation models and emerging agent technologies. New companies have the advantage of starting from scratch to build native applications: they can test and implement entirely new user experiences. Some teams we have met with are starting with a domain-specific approach, while others are taking a horizontal approach to creating personal agents. Entrepreneurs who build applications within specific verticals will be able to train their models with relevant data that may be missing from a horizontal approach like ChatGPT (e.g., for apparel shopping, the app will need relevant data about inventory, fit, and size). The above map of companies is not exhaustive, but as it stands, we see opportunities in several areas of the consumer landscape, including apparel shopping, furniture/interior shopping, travel itineraries, restaurant reservations, real estate, food delivery/grocery ordering, and personal productivity.

Founder Opportunities

To achieve this vision and large-scale adoption of conversational applications, founders must overcome memory, learning, and hallucination challenges within foundation models. Autonomous intelligent agents do not always learn from mistakes, prompts, or prior attempts. Improving their memory and learning capabilities will help to provide a more accurate user experience. We think building domain-specific applications will help founders manage the mistakes and limitations of today’s agents. Agents can also sometimes become stuck in a loop, repeatedly attempting the same task or hallucinating the next step. Although agents break tasks into subtasks, they may still get stuck on the subtasks they create. Building human-in-the-loop applications will help prevent and mitigate hallucinations. As we continue to look for the next generation of applications that connect systems of intelligence with systems of action, we think founders who overcome these challenges and optimize for the following seven components will be the winners.

  • Model memory: The model needs to learn from user questions, behavior, and preferences, and to become continually more personalized as users interact with it.
  • Data: The application should connect to a data source that the application can access for tasks. Applications with access to a unique dataset or proprietary data may have an advantage. Proprietary data that is not portable can be used to train models for superior performance, creating a sticky user experience.
  • Integrations: After the system receives information from the user, it needs to be able to integrate with external systems and execute actions.
  • Compute: Compute costs associated with foundation models can be high. Teams will need to have a strategy to minimize compute resources and set their application up for longevity.
  • Security and authorization: Models could be misused or harmful. The team should have a strategy for safety controls to prevent abuse or bad actions within their application.
  • UX & UI: The app should have a compelling user experience that is intuitive to the broader population. We are in the realm of new user experiences, so revolutionary interfaces could attract and maintain more users, creating a flywheel around usage and data. Are there interfaces that could be unique to an application and hard for the incumbents to develop? And could that interface create a 10x better user experience? The user experience design will be important for teaching people how to use agents and understanding how their data is used to create the app experience.
  • Distribution: The team should have a strategy for the application to reach target users and continually improve its dataset. Incumbent and startup applications, such as Milo, are now available as ChatGPT plugins. We are closely watching to see if ChatGPT could become a horizontal platform for verticalized applications to reach users, like the iOS App Store.
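
The looping and human-in-the-loop concerns raised above, together with the integration and authorization components, can be made concrete with a small control loop. This is a minimal sketch under assumptions, not a description of any particular agent framework: `run_agent`, `scripted`, the `(kind, argument)` action tuples, and the status strings are all hypothetical names chosen for illustration, and the "model" here is just a replayed plan.

```python
def run_agent(propose_action, execute, approve, *, max_steps=8, irreversible=("purchase",)):
    """Run propose -> (approve) -> execute until done, guarding against loops."""
    seen = set()    # actions already attempted in this run
    history = []    # (action, result) pairs fed back to the proposer
    for _ in range(max_steps):
        action = propose_action(history)
        if action is None:  # proposer signals the task is complete
            return "done", history
        if action in seen:  # same step proposed again: likely stuck in a loop
            return "escalated: repeated action", history
        seen.add(action)
        kind = action[0]
        # Human-in-the-loop: irreversible steps need explicit sign-off.
        if kind in irreversible and not approve(action):
            return "stopped: not approved", history
        history.append((action, execute(action)))
    return "escalated: step budget exhausted", history


def scripted(plan):
    """Stand-in for a model: replays a fixed plan, one action per call."""
    steps = iter(plan)
    return lambda history: next(steps, None)


plan = [("search", "dress"), ("add_to_cart", "Lakeside Midi"), ("purchase", "cart")]
status, history = run_agent(scripted(plan), execute=lambda a: "ok", approve=lambda a: True)
print(status, len(history))  # done 3

looping = [("search", "dress"), ("search", "dress")]
status, _ = run_agent(scripted(looping), execute=lambda a: "ok", approve=lambda a: True)
print(status)  # escalated: repeated action
```

The step budget also bounds how many model calls a single task can consume, which relates to the compute concern; in this sketch a repeated action hands control back to a person rather than retrying.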

We believe connecting systems of intelligence with systems of action to create personalized experiences and solve the most frustrating gaps in the digital consumer experience today will create immense value.

Applications of The Future – Personal Agents

We think autonomous intelligent agents have the potential to bridge the gap between physical and digital consumer experiences and can lead to new, efficient online shopping experiences. Shopping carts that automatically populate with our desired purchases could replace e-commerce catalogs, and we would no longer need to scroll endlessly through hotel, restaurant, and other online options. Ultimately, we believe connecting foundation models to systems of action to create personalized experiences and solve the most frustrating gaps in the digital consumer experience today will create immense value.

At Madrona, we have invested in AI technology companies for over a decade, including RunwayML, OthersideAI, Fixie, Deepgram, OctoML, Turi, Visual Layer, and more. We are interested in hearing from entrepreneurs who see the potential for generative AI to unlock personalization and address consumers’ most pressing needs. Please contact [email protected] or [email protected] if this sounds like you. We look forward to hearing from you!
