【译】深入解析现代边缘计算函数的技术架构与代码实现(Exploring the infrastructure and code behind modern edge functions)【原文】
At this point, the internet has a fully global reach. If you create a successful web or mobile app, you could have users on every continent (maybe not Antarctica). When they load up your app, they want it to be fast and relevant to them. That’s where edge computing comes in — it runs code and serves data from servers (points-of-presence) as close as possible to the client.
如今,互联网已经实现了全球化覆盖。如果你开发了一个成功的网络或移动应用,你可能会在世界各大洲都拥有用户 (也许南极洲除外)。当用户打开你的应用时,他们都期望获得快速的响应和与己相关的内容。这就是边缘计算 (Edge Computing) 的用武之地 —— 它通过部署在离用户最近的节点 (POP,服务接入点) 来运行代码和提供数据服务。
Companies like Vercel, Netlify, and Supabase have taken that a step further and created edge functions. These are bits of code that, when you deploy your site to these providers, get syndicated around the world to be executed as close and as fast as possible for local users who hit the site or app. This not only maximizes web performance for users worldwide, but also enables other just-in-time modifications that customize the web app for the local viewer.
Vercel、Netlify 和 Supabase 等公司在这一基础上更进一步,推出了边缘函数 (Edge Functions) 服务。当你将网站部署到这些平台时,这些代码片段会被分发到全球各地,为访问网站或应用的本地用户提供最近距离、最快速度的执行环境。这不仅能为全球用户优化网络性能,还能实现实时定制,为不同地区的用户提供本地化的网络应用体验。
It can make the world feel like your data center, but it’s an extension of content delivery networks: instead of serving heavy assets like images or video, they execute code. “There's these other traditional network companies that help connect the world's data transmission,” said Dana Lawson, Senior Vice President of Engineering at Netlify, “but there's this new abstraction of that where you have the ability to execute code.”
这项技术让全球各地都成为你的虚拟数据中心,它是内容分发网络 (CDN) 的一种延伸:不同的是,它执行的是代码,而不是分发图片或视频等大型文件。正如 Netlify 工程高级副总裁 Dana Lawson 所说:"传统的网络公司致力于连接全球数据传输,而现在我们有了新的技术抽象层,使得我们能够在全球范围内执行代码。"
This article will talk about that abstraction layer and the hardware it runs on, as well as dive into the use cases for code that runs as close to your users as possible. For information on how it all works, I spoke with Malte Ubl, CTO at Vercel; Chris Daley, Principal Software Engineer at Akamai Technologies; and Lawson. The folks at Deno also gave me a brief statement and a link to a blog post that covered the same ground as this article.
本文将详细探讨这一技术抽象层及其底层硬件架构,同时深入分析为用户提供本地化代码执行的应用场景。为了深入了解其运作机制,我采访了 Vercel 的首席技术官 Malte Ubl、Akamai Technologies 的首席软件工程师 Chris Daley 以及 Lawson。Deno 团队也提供了一份简短说明和一篇相关博客文章,内容与本文主题不谋而合。
依托科技巨擘的基础设施(Building on the shoulders of tech giants)
When I was initially looking into this, I was interested in the infrastructure behind edge functions. To be able to call a function and have it execute wherever in the world the user is feels like a bit of magic. And all computing magic in the end is supported by silicon physically located in the world. But it turns out that the silicon these edge functions run on doesn't belong to the companies that run them.
当我最初研究这个主题时,我对边缘函数 (Edge Functions) 背后的基础设施产生了浓厚的兴趣。能够调用一个函数,并让它在世界上任何一个用户所在的位置执行,这简直就像魔法一样神奇。当然,所有的计算"魔法"最终都要依赖于现实世界中实实在在的硅片。但有趣的是,运行这些边缘函数的硅片实际上并不属于那些提供服务的公司。
As mentioned in the intro, CDNs have been around for a while. Now with cloud companies covering the world with cheap compute, building server racks in every time zone seems redundant, especially when someone else has already handled the hard work of deploying physical infrastructure. “We're always thinking about scalability and climate change and how we serve the world and be good citizens,” said Lawson. “If you're trying to do it yourself, you're gonna miss out on some of those important details. You're gonna spend a lot of time, energy, and effort on stuff that's already been done—innovate. That's why you piggyback on these behemoths that have already done that hard work.”
正如在引言中提到的,CDN (内容分发网络, Content Delivery Network) 已经存在很长时间了。如今,随着各大云服务公司用低成本的计算资源覆盖全球,在每个时区都建立自己的服务器机架似乎显得多余,尤其是当其他公司已经解决了部署物理基础设施这一棘手问题时。正如 Lawson 所说:"我们始终在考虑可扩展性、气候变化的问题,以及如何更好地服务全球用户,同时做一个负责任的企业公民。如果想要独自完成这些,你可能会忽略很多重要细节,还会在一些已经解决的问题上耗费大量时间和精力,而不是专注于创新。这就是为什么我们选择依托这些已经完成艰苦工作的科技巨头。"
Netlify and Supabase both run their edge functions on Deno Deploy as an extra abstraction layer (Supabase has even open-sourced their edge runtime if you want to give it a go yourself). According to Deno creator Ryan Dahl, Deno “runs on the public clouds (GCP and AWS) but otherwise is built from scratch. The system is meant to be as user friendly as possible, so users shouldn't need to think about regions when writing their code.” Vercel runs on Cloudflare’s edge worker infrastructure.
Netlify 和 Supabase 都选择在 Deno Deploy 平台上运行他们的边缘函数,将其作为一个额外的抽象层 (Supabase 甚至开源了他们的边缘运行时环境,如果你感兴趣的话可以自己尝试)。根据 Deno 的创始人 Ryan Dahl 的说法,Deno "运行在公共云服务平台 (Google Cloud Platform 和 Amazon Web Services) 上,但其他部分都是从零开始构建的。这个系统的设计理念是尽可能对用户友好,因此开发者在编写代码时无需考虑地理区域的问题。"而 Vercel 则选择在 Cloudflare 的边缘计算基础设施上运行。
Most IP lookups use the unicast routing scheme. DNS resolves a URL to an IP address, which takes you to a particular server. However, Deno Deploy and Cloudflare both use anycast, in which an IP address maps to a pool of computers. The network (at least in a WAN, aka the internet) then resolves the address to whichever computer is closest.
大多数 IP 查找使用单播 (unicast) 路由方案。DNS 将 URL 解析为 IP 地址,这会将你引导到特定的服务器。然而,Deno Deploy 和 Cloudflare 都使用任播 (anycast),在这种方式下,一个 IP 地址会映射到一组计算机。然后网络 (至少在广域网 WAN,也就是互联网中) 会将地址解析到距离最近的计算机。
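To make the contrast concrete, here is a toy TypeScript sketch of the two schemes: unicast maps an address to exactly one server, while anycast maps it to a pool and resolves to whichever member is closest. The tables, hostnames, and numeric "distance" values are all invented for illustration; real anycast resolution happens in routing protocols, not in application code.

```typescript
// Unicast: one IP address, one server.
const unicastTable: Record<string, string> = {
  "203.0.113.10": "server-nyc",
};

// Anycast: one IP address, a pool of points-of-presence. The "distance"
// stands in for routing-protocol path cost.
const anycastTable: Record<string, { host: string; distance: number }[]> = {
  "198.51.100.1": [
    { host: "pop-nyc", distance: 70 },
    { host: "pop-ams", distance: 15 },
    { host: "pop-tyo", distance: 210 },
  ],
};

function resolveUnicast(ip: string): string {
  return unicastTable[ip]; // always lands on the same machine
}

function resolveAnycast(ip: string): string {
  // The network picks the closest member of the pool for this client.
  const pool = anycastTable[ip];
  return pool.reduce((a, b) => (a.distance <= b.distance ? a : b)).host;
}

console.log(resolveUnicast("203.0.113.10")); // → "server-nyc"
console.log(resolveAnycast("198.51.100.1")); // → "pop-ams"
```

A client in Amsterdam and a client in Tokyo would each see a different `resolveAnycast` answer for the very same IP, which is the whole point.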
While Daley says Akamai uses unicast for most routing, they do offer anycasting for edge DNS resolution. More importantly, they have a bit of mathematical magic that speeds traffic through the internal network to the fastest server. That magic is an extension of the algorithms that brought the company to prominence over 25 years ago.
Daley 提到,虽然 Akamai 在大多数路由中使用单播,但他们也为边缘 DNS 解析提供任播服务。更重要的是,他们开发了一套独特的数学算法,可以优化内部网络流量,将请求引导到响应最快的服务器。这套算法是对该公司 25 年前奠定其行业地位的原始算法的扩展。
In general, when a client requests something from an edge worker, whether through an edge function or in a deployed code bundle, it hits a thin reverse proxy server. That proxy routes it to a server close to the client (close in this case means fastest for that location) and executes the requested function. The server where the code actually executes is known as the origin. There it can provide typical server-side functions: pull data from databases, fill in dynamic information, and render portions as static HTML to avoid taxing the client with heavy JavaScript loads. “Turn the thing that worked on your local machine and wrap it such that when we deploy it to the infrastructure it behaves exactly the same way,” said Ubl. “That makes our edge functions product a more abstract notion because you don't use them directly. We call this framework-defined infrastructure.”
通常,当客户端通过边缘函数 (edge function) 或部署的代码包向边缘工作器 (edge worker) 发起请求时,请求首先会到达一个轻量级的反向代理服务器。该代理会将请求路由到距离客户端最近的服务器 (这里的"最近"是指响应最快的),并在那里执行请求的函数。这个实际执行代码的服务器被称为源站 (origin)。在源站上,系统可以执行典型的服务器端功能:从数据库获取数据,填充动态信息,并将部分内容渲染为静态 HTML,从而避免客户端承受过重的 JavaScript 负载。正如 Ubl 所说:"我们的目标是确保本地运行的程序在部署到基础设施后能保持完全相同的行为。这使得我们的边缘函数产品变得更加抽象,因为用户不需要直接使用它们。我们称这种方式为框架定义的基础设施 (framework-defined infrastructure)。"
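The routing step described above can be sketched as follows. This is a hypothetical illustration, not any provider's real API: the thin proxy chooses the point-of-presence that is fastest for the client (which is what "close" means here) and forwards the request there for execution.

```typescript
// Illustrative sketch only: names and latency numbers are invented.
interface Pop {
  region: string;
  latencyMs: number; // measured latency from this client to this POP
}

// "Closest" means fastest for the client, not nearest in kilometers.
function pickOrigin(pops: Pop[]): Pop {
  if (pops.length === 0) throw new Error("no POPs available");
  return pops.reduce((best, pop) =>
    pop.latencyMs < best.latencyMs ? pop : best
  );
}

const pops: Pop[] = [
  { region: "iad", latencyMs: 85 },
  { region: "fra", latencyMs: 12 },
  { region: "sin", latencyMs: 190 },
];

// A European client gets routed to Frankfurt, where the function executes.
console.log(pickOrigin(pops).region); // → "fra"
```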
How you use it depends on the provider. Netlify seems to be pretty straightforward: deploy the function, then call it the same as you would any other server code. It does provide a number of server events on which to hang functions. Vercel offers the standard server-side version as well as a middleware option that executes before a request is processed. Akamai, as a provider of an underlying edge worker network, offers a number of events along the request path in which to execute code:
不同的服务提供商有着不同的边缘函数使用方式。Netlify 的方案相对简单:部署函数后,你可以像调用普通服务器代码一样使用它。它还提供了多个服务器事件来触发这些函数。Vercel 则提供了标准的服务器端版本,以及可以在请求处理前执行的中间件选项。作为底层边缘计算网络的提供商,Akamai 在请求处理过程中提供了多个可执行代码的时机:
- When the client first requests an edge worker (onClientRequest) 客户端首次请求边缘服务时
- When the request first reaches the origin server (onOriginRequest) 请求首次到达源服务器时
- When the origin responds after running the code bundle (onOriginResponse) 源服务器执行代码包并作出响应时
- Right before the response payload reaches the client (onClientResponse) 响应内容即将发送给客户端前
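The order of these hooks can be simulated with a tiny framework-agnostic pipeline. The handler names below match the events listed above, but the pipeline itself is illustrative and is not Akamai's actual EdgeWorkers API.

```typescript
// Simulate the four hooks firing in order along the request path.
type Hook = (ctx: { log: string[] }) => void;

const handlers: Record<string, Hook> = {
  onClientRequest: (ctx) => ctx.log.push("onClientRequest"),
  onOriginRequest: (ctx) => ctx.log.push("onOriginRequest"),
  onOriginResponse: (ctx) => ctx.log.push("onOriginResponse"),
  onClientResponse: (ctx) => ctx.log.push("onClientResponse"),
};

// One request/response round trip through the code bundle.
function runPipeline(): string[] {
  const ctx = { log: [] as string[] };
  for (const name of [
    "onClientRequest",  // client -> edge
    "onOriginRequest",  // edge -> origin
    "onOriginResponse", // origin -> edge
    "onClientResponse", // edge -> client
  ]) {
    handlers[name](ctx);
  }
  return ctx.log;
}

console.log(runPipeline().join(" -> "));
```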
This allows apps to do some complex trickery on the backend. “We allow you to do something like go to a different origin or rewrite the path that I'm going to talk to the origin,” said Daley. “You might say no, I don't actually want that website, I want you to serve something else completely instead. You can remove headers from it. You could add new headers. You could look at the headers that are there, manipulate them, and send that back. Then right before you go to onClientResponse again, you could do some more edits. When you think about what we call a code bundle, there's a good chance it's not all running on the same machine.”
这使得应用程序能够在后端实现一些复杂的处理逻辑。Daley 解释道:"我们允许开发者执行各种操作,比如重定向到不同的源服务器,或者重写与源服务器通信的路径。开发者可能会说'我不想使用原来的网站,而是要提供完全不同的内容'。你可以删除或添加 HTTP 头部信息,也可以查看并修改现有的头部信息再发送回去。在最后的 OnClientResponse 阶段,你还可以进行更多修改。说到我们所说的代码包,它们很可能是分布在不同机器上运行的。"
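The header manipulation Daley describes can be sketched with the standard Headers API (available in Deno and in Node 18+). The specific header names and values here are invented examples, not anything a particular provider requires.

```typescript
// Remove, add, and rewrite headers on the way back to the client.
function rewriteResponseHeaders(headers: Headers): Headers {
  const out = new Headers(headers);
  out.delete("server");             // remove a header entirely
  out.set("x-edge-region", "fra");  // add a new one
  // Inspect an existing header, manipulate it, and send it back.
  const cache = out.get("cache-control");
  if (cache && !cache.includes("max-age")) {
    out.set("cache-control", `${cache}, max-age=60`);
  }
  return out;
}

const original = new Headers({
  server: "origin-1",
  "cache-control": "public",
});
const rewritten = rewriteResponseHeaders(original);
console.log(rewritten.get("cache-control")); // → "public, max-age=60"
```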
Regardless of whether the edge function performs a simple retrieval or a series of complex steps, it’s all about maximizing performance. Each extra second a site takes to load can cost a business money. “It's time to first byte,” said Lawson. “With some of these applications, they're completely being manifested on the served assets and origins—whole websites are being created right there on the edge.”
无论边缘函数是执行简单的数据获取还是复杂的处理流程,其核心目标都是为了提升性能。网站加载时间每多一秒,企业就可能损失收益。正如 Lawson 所说:"关键在于首字节时间 (Time to First Byte)。在某些应用中,整个网站都是在边缘节点上即时生成的,完全依赖于边缘服务的资源和源站。"
As anyone working on high-traffic websites knows, there’s one thing that can greatly speed up your time to first byte.
对于那些维护高流量网站的开发者来说,他们都清楚有一个关键因素可以显著提升首字节响应速度。
缓存:系统性能的决定因素(Cache rules everything around me)
One of the ironies of edge functions is that the abstraction layers built on top of these global server networks slow things down. “We're adding a little bit of latencies, right?” said Lawson. “We have our traditional content delivery network. We have proxies that are taking those small little requests, shipping them over to these run times.” Each of these stops adds a little time, as does the code execution on the origin.
边缘函数有一个颇具讽刺意味的现象:为全球服务器网络构建的抽象层反而降低了系统速度。正如 Lawson 所说:"确实,我们增加了一些延迟。我们有传统的内容分发网络,还有代理服务器负责接收这些小型请求并将它们转发到运行时环境。"每一个环节都会增加一点延迟,包括在源站执行代码的时间。
How do edge function networks minimize this latency so that getting the request to the edge doesn’t cancel out the gains made by executing it there? “The fair answer is many layers of caching,” said Ubl. “And lots of Redis. There are three primary layers involved. One does TLS termination and IP-layer firewalling that looks agnostically at traffic and tries to filter out the bad stuff without paying the price of knowing really what's going on underneath. Going one layer down is the layer that has the biggest footprint. That one understands who the customers are, what their deployments are, and so forth. That's driven by substantial caching.”
那么,边缘函数网络如何降低这种延迟,确保请求到达边缘节点的成本不会抵消掉在边缘执行带来的收益呢?Ubl 给出了一个中肯的答案:"多层缓存机制,再加上大量的 Redis。主要包含三个层次。第一层负责 TLS 终止和 IP 层防火墙,它会以通用方式检查流量并过滤恶意内容,而无需深入了解底层细节。第二层是规模最大的,它掌握着客户信息、部署状态等内容,主要依靠大规模缓存来运作。"
“It's so fast and it's just amazing how quickly we're transmitting. It's almost like a no-op.” Dana Lawson
正如 Dana Lawson 所说:"系统运行速度快得惊人,数据传输几乎感觉不到延迟。"
This makes getting from the client to the origin server extremely fast. “There is some overhead right between when you get the request and then you have to now deal with the JavaScript instead of hard-coded things,” said Daley, “but it's zero-copy shared memory. There is overhead, but it's extremely, extremely low to go in there—I think it's less than microseconds. The bigger overhead is usually whatever problem they're trying to solve.”
这种架构使得从客户端到源服务器的访问变得极其迅速。Daley 解释道:"当然,从处理请求到需要处理 JavaScript (而不是硬编码内容) 时会有一些开销,但我们使用了零拷贝共享内存 (zero-copy shared memory) 技术。虽然确实存在开销,但这个过程的延迟极低,甚至不到微秒级别。真正的性能瓶颈通常在于用户要解决的具体问题。"
That’s the final layer: the origin server, where the code gets executed. That code, depending on what it is, is probably going to be the biggest source of latency overhead. But caching can help mitigate that as well. “If we've seen your stuff before, you're in memory as best we can within memory limits,” said Daley. “That overhead will be fairly low depending on how you structured your code — we have some best practices about things to avoid.”
最后一层是负责执行代码的源服务器。根据具体业务逻辑的不同,这里可能会产生最大的延迟开销。不过,缓存机制同样可以帮助缓解这个问题。如 Daley 所说:"在内存容量允许的范围内,我们会将之前处理过的内容保留在内存中。这样的开销会相当低,具体取决于代码的组织方式 —— 我们也提供了一些最佳实践指南,说明了哪些做法需要避免。"
Once a client has completed their first request, the origin server has the response for that request cached. There’s a cost to replicating that cached response to other servers in the edge network, so maintaining a link between that server and the client can shave precious milliseconds off of requests. “Our edge function invocation service primarily acts as a load balancer,” said Ubl. “We load balance as ourselves and see a worker that can take a little bit more traffic and then multiplex another request on the same connection. It's basically just HTTP Keep-Alive. That's really fast.”
当客户端完成第一次请求后,服务器就会保存这个请求的响应内容。虽然把这些缓存的内容同步到边缘网络的其他服务器会消耗资源,但保持服务器和客户端之间的连接可以让后续请求节省宝贵的毫秒级时间。正如 Ubl 所说:"我们的边缘函数调用服务主要扮演着流量分配器的角色。我们通过自主调度,当发现某个工作节点还能承担更多流量时,就会在同一个连接上处理新的请求。这其实就是 HTTP Keep-Alive 机制,效率非常高。"
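A minimal sketch of that multiplexing idea, with invented names: keep persistent connections open to a pool of workers and route each new request onto the least-loaded one, which is the same effect HTTP Keep-Alive gives you at the protocol level by reusing a connection instead of opening a fresh one.

```typescript
// Hypothetical load balancer: all identifiers here are illustrative.
interface EdgeWorker {
  id: string;
  inFlight: number; // requests currently multiplexed on this connection
}

// Find the worker that can take a little more traffic.
function pickWorker(pool: EdgeWorker[]): EdgeWorker {
  if (pool.length === 0) throw new Error("empty pool");
  return pool.reduce((best, w) => (w.inFlight < best.inFlight ? w : best));
}

// Multiplex the request onto an already-open connection (the Keep-Alive
// effect): no new connection setup, just one more request on the wire.
function dispatch(pool: EdgeWorker[], request: string): string {
  const worker = pickWorker(pool);
  worker.inFlight += 1;
  return `${request} -> ${worker.id}`;
}

const pool: EdgeWorker[] = [
  { id: "worker-a", inFlight: 3 },
  { id: "worker-b", inFlight: 1 },
];
console.log(dispatch(pool, "GET /")); // → "GET / -> worker-b"
```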
Another spot where the backend can slow down is in accessing your databases. Most edge function and edge worker providers also have fast “serverless” key-value store databases that you can add (or you can use other serverless database providers). But if you have a DB-heavy workload, you can use the routing and caching features of the network to speed things up. “From a latency perspective, once you talk to your database twice, it's always cheaper to cache data,” said Ubl. “It comes with the trade-offs of caching — you have to invalidate things. The other thing that users can opt into is to invoke the code next to their database.”
后端系统变慢的另一个常见原因是数据库访问。大多数边缘函数和边缘工作节点服务商都提供快速的"无服务器" (serverless) 键值存储数据库供选择(你也可以使用其他无服务器数据库服务商)。如果你的应用需要频繁访问数据库,可以利用网络的路由和缓存功能来提升速度。Ubl 表示:"从延迟角度来看,如果需要多次访问数据库,使用缓存总是更有效率的。当然,使用缓存也有其利弊 —— 你需要考虑何时更新缓存。另外,用户还可以选择在靠近数据库的位置运行代码。"
Caching can cause queuing issues for sites in less common languages, especially in functions and code bundles with multiple requests. “We changed how we were doing queuing at one point,” said Daley, “because when a subrequest goes off and it's cacheable, it's going to look to execute on the individual machine. Certain machines tend to be busier with certain customers, so their content is going to be on those machines often. If you have a lot of those stacking up, and you're waiting on all these sub requests to finish, requests can fail when they hit resource limits. Most of the time, it takes ten milliseconds to run. We did a lot of work dealing with the outliers. I think it was like 900% improvement in people not hitting a resource limit.”
对于使用小众编程语言的网站来说,缓存可能会导致请求排队的问题,尤其是在处理包含多个请求的函数和代码包时。Daley 解释说:"我们曾经改变了请求排队的方式。当系统发出一个可以缓存的附属请求时,它会在单台服务器上执行。某些服务器可能因为特定客户的需求而特别繁忙,这些客户的内容经常存储在这些服务器上。如果大量这样的请求堆积起来,而系统又在等待所有附属请求完成,当达到系统资源上限时,请求就可能失败。虽然大多数请求只需要十毫秒就能处理完,但我们还是投入了大量精力来解决这些极端情况。最终,用户触及资源上限的情况改善了约 900%。"
These systems are built for speed and repeatability — a CDN for code, essentially — so not every use case is a good fit. But those that are can see big gains.
这些系统的设计初衷是追求速度和一致性 —— 本质上就是一个用于代码的内容分发网络 (CDN) —— 并不是所有场景都适合使用。不过,对于适合的应用场景来说,性能提升确实相当可观。
动态定制网站架构(Custom websites on the fly)
Not all applications will benefit from functions that run on the edge. Of those that do, not all of their code will need to be executed on the edge. The functions that benefit will be I/O-bound, not CPU-bound. They’ll still use CPUs, obviously, but they provide some logic around moving more static assets or calling APIs and transforming the returned data. Said Daley, “It's not general purpose compute as much as is shaping traffic.”
并非所有应用都能从边缘计算函数 (Edge Functions) 中获益。即使是那些适合使用边缘计算的应用,也不是所有代码都需要在边缘节点执行。真正受益的是那些 I/O 密集型函数,而非 CPU 密集型函数。当然,这些函数仍然需要使用 CPU,但它们主要负责处理静态资源的传输、API 调用以及数据转换等逻辑。正如 Daley 所说:"这更像是在优化和调度网络流量,而不是进行通用计算。"
This means a lot of conditional logic on pieces of websites, even on whole pages. You could serve language-specific pages based on the region. You could A/B test portions of sites. You can automatically redirect broken bookmarks. You can implement incremental static regeneration. You could inject a GDPR warning if the site didn’t see a cookie. You could geofence users and serve different content based on their location — sale on berets only in Paris, for example. “If you're very sophisticated, you can create an entire visual experience that's been separated and globally distributed,” said Lawson.
这种技术为网站的局部甚至整个页面提供了丰富的条件控制能力。例如,你可以根据用户所在地区提供相应语言版本的页面;可以对网站特定部分进行 A/B 测试 (A/B Testing);可以自动重定向失效的书签;可以实现增量式静态内容更新 (Incremental Static Regeneration);在未检测到 cookie 时插入 GDPR 合规提示;甚至可以基于用户地理位置进行内容差异化展示 —— 比如针对不同地区显示不同的促销信息。正如 Lawson 所说:"如果你掌握了这项技术,就能创建一个完全解耦且全球分布式的视觉体验。"
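A few of those conditions can be sketched in one routing function. The country field and cookie names here are assumptions standing in for whatever geo and consent signals a given platform actually exposes to its edge functions.

```typescript
// Illustrative conditional logic at the edge; all names are invented.
interface EdgeRequest {
  country: string;                  // e.g. derived from a geo-IP header
  cookies: Record<string, string>;
}

function chooseVariant(req: EdgeRequest): string[] {
  const parts: string[] = [];
  // Serve a language-specific page based on region.
  parts.push(req.country === "FR" ? "page-fr" : "page-en");
  // Geofenced promotion: the beret sale only shows in France.
  if (req.country === "FR") parts.push("beret-sale-banner");
  // Inject a GDPR notice when no consent cookie is present.
  if (!("consent" in req.cookies)) parts.push("gdpr-banner");
  return parts;
}

console.log(chooseVariant({ country: "FR", cookies: {} }));
// → ["page-fr", "beret-sale-banner", "gdpr-banner"]
```

Because this runs at the edge, each visitor gets their variant without a round trip to a central server or any client-side branching.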
If you want to get really fancy, you can chain together multiple pieces and custom-create a website on the fly. “We have a fifth event and it's called responseProvider, and it's a synthetic origin,” said Daley. “There are some internal demos where I've seen people do impressive things. If you wanted to, say, call a bunch of different APIs, get all the JSON from those, and stitch it all together and call EdgeKV—which is the distributed database—then put it all together, you could actually rewrite a web page right there and send it back.”
如果想要实现更高级的功能,你还可以将多个组件串联起来,实现即时的网站定制。Daley 解释道:"我们有第五种事件类型,称为 responseProvider,它是一个合成数据源 (Synthetic Origin)。在内部演示中,我见过一些令人印象深刻的应用案例。比如,开发者可以同时调用多个不同的 API,获取它们返回的 JSON 数据,然后结合分布式数据库 Edge KV 的内容,将这些数据整合起来,从而即时生成并返回一个全新的网页。"
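A toy version of that synthetic-origin pattern: gather several JSON fragments, merge them, and render the page directly at the edge. The fetchers below are synchronous stubs standing in for real API and EdgeKV calls.

```typescript
// Each fetcher stands in for an API call or an EdgeKV read.
type Fetcher = () => Record<string, string>;

const apiA: Fetcher = () => ({ headline: "Edge functions" });
const apiB: Fetcher = () => ({ body: "Code running at the edge" });
const edgeKv: Fetcher = () => ({ footer: "served from the edge" });

// Stitch all the fragments together and rewrite a page on the fly.
function synthesizePage(sources: Fetcher[]): string {
  const data = Object.assign({}, ...sources.map((f) => f()));
  return (
    `<h1>${data.headline}</h1>` +
    `<p>${data.body}</p>` +
    `<footer>${data.footer}</footer>`
  );
}

console.log(synthesizePage([apiA, apiB, edgeKv]));
```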
What it enables now is pretty impressive, but it gets even more interesting when considering how this functionality will help enable future AI features. “It basically enables the AI revolution because you can't afford to run it on a traditional serverless function,” said Ubl. “But in the I/O-bound use case, which edge functions are ideal for, you outsource the model and inference to somewhere else and just call that API.”
目前边缘函数实现的功能已经相当令人印象深刻,但当我们思考这些功能将如何助力未来的 AI 应用时,前景更加令人期待。Ubl 表示:"边缘函数 (Edge Functions) 实际上为 AI 革命铺平了道路,因为在传统的无服务器函数上运行 AI 的成本过高。但在 I/O 密集型的应用场景中,边缘函数是理想的解决方案,你可以将模型和推理过程交给其他服务处理,仅需调用相应的 API 即可。"
With the increasing prevalence of generative AI, what’s to stop people from combining the conditional logic that edge functions excel at with generative code? “We're gonna see more AI building these websites and generating them and calling functions,” said Lawson. “It'd be really cool to see it on traffic patterns too, for it to be smart. Where you're coming in and saying, okay, we wanna make sure this campaign hits this amount of audience. It hits a threshold, hits a metric, maybe it cascades it. Just automatic detection. I think it will be personalized experiences. We will not be as much driven by humans doing research and looking at analytics, but analytics calling code and doing it.”
随着生成式 AI (Generative AI) 的快速发展,将边缘函数擅长的条件逻辑与代码生成能力结合似乎是大势所趋。Lawson 说道:"我们将看到越来越多的 AI 参与网站构建、内容生成和函数调用。如果能将这项技术应用到流量模式分析中会很酷,让系统变得更智能。比如,当你设定某个营销活动需要覆盖特定数量的受众时,系统可以自动监控阈值达成情况,追踪关键指标,甚至触发连锁反应。这些都将是自动化的检测过程。我认为未来会出现更多个性化的体验。我们将不再过度依赖人工研究和数据分析,而是让数据分析系统自动触发相应的代码执行。"
What looks fast and seamless to an end user takes a lot of behind-the-scenes work to maximize the speed at which a request hits an origin server, processes, and returns to their screen. With multiple layers of abstraction behind them, edge functions can make all the difference for the I/O-heavy web applications of today and the AI-enhanced experiences of the future.
虽然用户体验到的是快速流畅的服务,但要实现请求快速到达源服务器、完成处理并返回到用户屏幕上,背后需要大量的优化工作。在多层抽象技术的支持下,边缘函数不仅能够显著提升当今 I/O 密集型 Web 应用的性能,还将为未来的 AI 增强体验提供强大支持。