You Want Modules, Not Microservices
TL;DR: Architecture is hard sometimes. People keep offering up some new idea that quickly becomes the mainstream "way to do it" without any context or nuance, and the industry, desperate to find ways to improve its architecture, snaps it up without hesitation. Microservices was the latest example of this trend, and it's time we dissected the idea and got to the real root of what's going on.
At the heart of microservices, we're told we'll find ... Lots of Good Things (TM)!
- "Scalability": "Code can be broken into smaller parts that can be developed, tested, deployed, and updated independently."
- "Focus": "... developer focuses on solving business problems and business logic."
- "Availability": "back-end data must always be available for a wide range of devices... ."
- "Simplicity": "provides simplified development of large scale enterprise level application."
- "Responsiveness": "... enables distributed applications to scale in response to changing transaction loads... ."
- "Reliability": "Ensures no single point of failure by providing replicated server groups that can continue when something breaks. Restores the running application to good condition after failures occur."
These all sound relatively familiar, I'd imagine, but the fun part about those six quotes is that two were taken from microservices literature (blog posts, papers, etc), two from twenty-years-ago EJB literature, and two from Oracle Tuxedo, which is forty-plus-years-ago technology. Can you spot which went to which?
We have a tendency in this industry to re-use our hype points over and over again.
"Those who cannot remember the past are condemned to repeat it." --George Santayana, The Life of Reason (1905)
With respect to the microservices hype, one company's blog post offers 10 reasons to charge into microservices:
- They promote big data best practices. Microservices naturally fit within a data pipeline-oriented architecture, which aligns with the way big data should be collected, ingested, processed and delivered. Each step in a data pipeline handles one small task in the form of a microservice.
- They are relatively easy to build and maintain. Their single-purpose design means they can be built and maintained by smaller teams. Each team can be cross-functional while also specialise in a subset of the microservices in a solution.
- They enable higher-quality code. Modularising an overall solution into discrete components helps application development teams focus on one small part at a time. This simplifies the overall coding and testing process.
- They simplify cross-team co-ordination. Unlike traditional service-oriented architectures (SOAs), which typically involve heavyweight inter-process communications protocols, microservices use event-streaming technologies to enable easier integration.
- They enable real-time processing. At the core of a microservices architecture is a publish-subscribe framework, enabling data processing in real time to deliver immediate output and insights.
- They facilitate rapid growth. Microservices enable code and data reuse the modular architecture, making it easier to deploy more data-driven use cases and solutions for added business value.
- They enable more outputs. Data sets often are presented in different ways to different audiences; microservices simplify the way data can be extracted for various end users.
- Easier to assess updates in the application life cycle. Advanced analytics environments, including those for machine learning, need ways to assess existing computational models against newly created models. A-B and multivariate testing in a microservices architecture enable users to validate their updated models.
- They enable scale. Scalability is about more than the ability to handle more volume. It’s also about the effort involved. Microservices make it easier to identify scaling bottlenecks and then resolve those bottlenecks at a per-microservice level.
- Many popular tools are available. A variety of technologies in the big data world, including the open-source community, work well in a microservices architecture. Apache Hadoop, Apache Spark, NoSQL databases and many streaming analytics tools can be used for microservices. We are also proud to partner with Pivotal in this area.
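Several of these claims, the "publish-subscribe framework" behind real-time processing in particular, describe a pattern rather than a deployment topology. As a hedged illustration, the sketch below is a minimal synchronous event bus that lives entirely inside one Python process. `EventBus`, the topic name, and the order events are all hypothetical. A production bus would also need asynchronous delivery, error isolation between subscribers, and persistence.

```python
from collections import defaultdict
from typing import Any, Callable

Handler = Callable[[Any], None]


class EventBus:
    """A minimal, synchronous, in-process publish-subscribe bus (illustrative only)."""

    def __init__(self) -> None:
        # Topic name -> subscribers, kept in subscription order.
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: Any) -> int:
        """Deliver `event` to every subscriber of `topic` immediately.

        Returns the number of subscribers that received it.
        """
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(event)
        return len(handlers)


if __name__ == "__main__":
    bus = EventBus()
    running_total = {"amount": 0.0}
    large_orders: list[str] = []

    def update_total(order: dict) -> None:
        running_total["amount"] += order["amount"]

    def flag_large(order: dict) -> None:
        if order["amount"] > 100:
            large_orders.append(order["id"])

    # Two independent consumers react to the same event stream.
    bus.subscribe("order-placed", update_total)
    bus.subscribe("order-placed", flag_large)

    bus.publish("order-placed", {"id": "o-1", "amount": 40.0})
    bus.publish("order-placed", {"id": "o-2", "amount": 150.0})
    print(running_total["amount"], large_orders)  # 190.0 ['o-2']
```

Each subscriber can be written, tested, and versioned on its own. That independence comes from the pattern, not from running the subscribers as separate services.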
Let's take a second and examine each of those, but this time in light of prior art:
- They promote big data best practices. Pipes-and-filters architectures have been a part of the software scene since the 70s, when Unixes promoted several ideas:
- Make each program do one thing well. To do a new job, build afresh rather than complicate old programs by adding new "features".
- Expect the output of every program to become the input to another, as yet unknown, program. Don't clutter output with extraneous information. Avoid stringently columnar or binary input formats. Don't insist on interactive input.
- They are relatively easy to build and maintain. See the Unix philosophy, above.
- They enable higher-quality code. If focusing on one small part at a time helps improve quality, then see the Unix philosophy, above.
- They simplify cross-team co-ordination. This one is interesting; it suggests that "service-oriented architectures (SOAs) ... typically involve heavyweight inter-process communications protocols"--like JSON over HTTP? Or is that taken to mean that all SOA requires SOAP, WSDL, XML Schema and the full collection of WS-* specifications? Ironically, nothing about a microservice in any way prevents it from using any of those "heavyweight" protocols, and some microservice advocates are even suggesting the use of gRPC, a binary protocol that bears closer resemblance to IIOP, from CORBA, which was the "heavyweight protocol" predecessor to... SOAP, WSDL, XML Schema, and the full collection of WS-* specifications.
- They enable real-time processing. Real-time processing has actually been a "thing" for quite a while, and while many such systems use a pub-sub or "event bus" model to do it, it hardly requires microservices to do it.
- They facilitate rapid growth. "Reuse the modular architecture"--do we even have a count of how many different things have all promoted "reuse" as a selling point? Languages certainly have done it (OOP, functional languages, procedural languages), libraries, frameworks.... One day I want to see something hyped that explicitly says "Screw reuse. We don't care about that."
- They enable more outputs. "Data sets often are presented in different ways to different audiences"--that sounds a great deal like the Crystal Reports home page.
- Easier to assess updates in the application life cycle. The need to "assess existing computational models against newly created models" for machine learning and advanced analytics environments... kinda sounds like a large pile of action words thrown together with little substance behind them.
- They enable scale. How funny--the same was said of EJB, transactional middleware processing (a la Tuxedo), and mainframes.
- Many popular tools are available. I don't think I have to really work hard to point out that tools have always been available for every major hype that's come through our industry--particularly after the hype has taken root for a while. Most readers won't even be old enough to remember CASE tools but maybe they'll remember UML.
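The pipes-and-filters idea above needs no network at all. Here is a minimal sketch in Python (the essay names no language, so that choice is an assumption). Each filter is a generator that consumes one stream of lines and yields another, and `pipeline` connects the filters output-to-input, much as a Unix shell's `|` does. The filter names are hypothetical.

```python
from typing import Callable, Iterable, Iterator

# A filter consumes one stream of lines and yields another: one job per filter.
Filter = Callable[[Iterable[str]], Iterator[str]]


def strip_blank(lines: Iterable[str]) -> Iterator[str]:
    """Drop empty or whitespace-only lines."""
    for line in lines:
        if line.strip():
            yield line


def lowercase(lines: Iterable[str]) -> Iterator[str]:
    """Normalize case so downstream filters need not care about it."""
    for line in lines:
        yield line.lower()


def grep(pattern: str) -> Filter:
    """Build a filter that keeps only lines containing `pattern`."""
    def _grep(lines: Iterable[str]) -> Iterator[str]:
        for line in lines:
            if pattern in line:
                yield line
    return _grep


def pipeline(source: Iterable[str], *filters: Filter) -> Iterator[str]:
    """Connect filters output-to-input, like `a | b | c` in a shell."""
    stream: Iterator[str] = iter(source)
    for f in filters:
        stream = f(stream)
    return stream


if __name__ == "__main__":
    log = ["Error: disk full", "", "ok", "ERROR: network down"]
    for line in pipeline(log, strip_blank, lowercase, grep("error")):
        print(line)
```

Run as a script, this prints `error: disk full` and `error: network down`. Each filter knows nothing about its neighbours beyond the shared "stream of lines" contract, which is the Unix idea the vendor list rebrands as a microservice benefit.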
But the discerning reader will notice that there is a pretty common theme to about half of the points above--the idea of creating and maintaining small, independent "chunks" of code and data, versioned apart from one another, using common inputs and outputs to enable a larger integration of the system. It's almost as if...
At the heart of microservices, we find ... Modules.
Yup, the lowly "module", the core concept that has been at the heart of most programming languages since the 1970s. (It goes back even earlier, though it was harder to do in older languages that didn't incorporate the module as a first-class core concept.) Call them "assemblies" on the CLR (C#, F#, Visual Basic, ...), "JARs" or "packages" on the JVM (Java, Kotlin, Clojure, Scala, Groovy, ...), or dynamically-linked libraries from your favorite operating system (DLLs on Windows, .so shared objects (and their static .a cousins) on *nixes, and of course macOS has the "Frameworks" tucked away inside the /Library directories), but at a conceptual level, they're all modules. Each has a different internal format, but each serves the same basic purpose: an independently-built, -managed, -versioned, and -deployed unit of code that can be reused.
Consider this working definition of a module, quoted from one of Computer Science's foundational papers:
"A well-defined segmentation of the project effort ensures system modularity. Each task forms a separate, distinct program module. At implementation time each module and its inputs and outputs are well-defined, there is no confusion in the intended interface with other system modules. At checkout time the integrity of the module is tested independently; there are few scheduling problems in synchronizing the completion of several tasks before checkout can begin. Finally, the system is maintained in modular fashion; system errors and deficiencies can be traced to specific system modules, thus limiting the scope of detailed error searching."
This passage opens David Parnas' seminal paper, "On the Criteria To Be Used in Decomposing Systems into Modules", written in 1971 and published in Communications of the ACM in 1972, over 50 years ago at the time of this writing. (Parnas himself quotes it from Gauthier and Pont's 1970 textbook on designing systems programs.) The well-defined "separate, distinct program modules" it describes cover about half of the suggested benefits of microservices, and we've been able to build them for fifty years.
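In practice, a module of this kind is mostly a well-defined interface with the implementation hidden behind it. The sketch below is loosely modeled on the "line storage" module from the KWIC (key word in context) example in Parnas' paper, but the names and details are assumptions, not Parnas' own design. It uses a `typing.Protocol` as the interface, so the client function depends only on the declared operations and each piece can be checked out independently, as the quoted passage describes.

```python
from typing import Protocol


class LineStorage(Protocol):
    """The module's interface: the only operations other modules may use."""

    def add_line(self, words: list[str]) -> None: ...
    def line_count(self) -> int: ...
    def word_count(self, line: int) -> int: ...
    def word(self, line: int, index: int) -> str: ...


class ListLineStorage:
    """One implementation; its list-of-lists representation stays private."""

    def __init__(self) -> None:
        self._lines: list[list[str]] = []

    def add_line(self, words: list[str]) -> None:
        self._lines.append(list(words))

    def line_count(self) -> int:
        return len(self._lines)

    def word_count(self, line: int) -> int:
        return len(self._lines[line])

    def word(self, line: int, index: int) -> str:
        return self._lines[line][index]


def circular_shifts(storage: LineStorage) -> list[str]:
    """A client module that depends only on the LineStorage interface."""
    shifts: list[str] = []
    for i in range(storage.line_count()):
        words = [storage.word(i, j) for j in range(storage.word_count(i))]
        for k in range(len(words)):
            # Rotate the line so each word takes a turn at the front.
            shifts.append(" ".join(words[k:] + words[:k]))
    return shifts


if __name__ == "__main__":
    storage = ListLineStorage()
    storage.add_line(["modules", "not", "microservices"])
    for entry in sorted(circular_shifts(storage)):
        print(entry)
```

You could replace `ListLineStorage` with a different representation, for example one that packs all words into a single string, without touching `circular_shifts`. That property is information hiding, and it needs no process or network boundary.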
So why the hullabaloo over microservices?
Because microservices were really never about microservices, or services, or even distributed systems.
At the heart of microservices, we should find ... Organizational clarity.
Amazon, one of the first companies to openly discuss the microservice concept, really wasn't trying to push the architectural principle as much as they were trying to push the idea of an independent development team whose blockers were few and far between. Waiting on the DBA team for schema changes? QA needs a build to test so they can find bugs? Or are we waiting on the infrastructure team to procure a server? Or the UX team to create a prototype for the presentation?
SCHHHLLLUURRRRRRRPPPPpppp...
That sound you hear is the development team aggregating ownership of any and all of those dependencies that could (and frequently would) block them from moving forward. It meant that the teams were a small microcosm of the average IT team's various parts (analysis, development, design, testing, data management, deployment, administration, and more). It did mean that now teams either had to be assembled from a variety of disparate skillsets, or else we had to require the complete set of skills in each team member (the so-called "Full Stack Developer"), which meant that hiring these folks became infinitely trickier. It also meant that now the team was responsible for its own production outages, meaning the team itself now has to be given on-call responsibilities (and the commensurate payroll and legal implications that go along with that). But, when all that was navigated, it meant that each team could build their artifact independently of one another, constrained by nothing other than time and the physics of how fast fingers can fly over a keyboard.
In theory, anyway.
At the heart of microservices, we often find ... The Fallacies of Distributed Computing.
For those not familiar with them, the Fallacies were first written down at Sun Microsystems, where Peter Deutsch, who is usually credited with the list, circulated them among his peers in the 1990s. A closely related argument appears in the seminal 1994 paper "A Note on Distributed Computing" by Jim Waldo, Geoff Wyant, Ann Wollrath, and Sam Kendall, and the two essentially say the same thing:
"Getting distributed systems right - performance, reliability, scalability, whatever "right" means - is hard." (loosely paraphrased)
When we decomposed the system into in-memory modules running on a single operating system node, the cost of passing data across process or library boundaries was pretty negligible, even fifty years ago. Passing that data across network lines, though (as most microservices do), adds five to seven orders of magnitude of latency to the communication. That is not simply something we can "scale away" by adding more nodes to the network; adding nodes actually makes the problem worse, since every additional network hop brings latency of its own.
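The sketch below makes the arithmetic concrete by computing the gap in powers of ten. The latency figures are assumed ballpark values chosen for illustration, not measurements: about 1 ns for an in-process call, 0.5 ms for a round trip inside one datacenter, and 10 ms for a round trip to another region. `orders_of_magnitude` and `serial_chain_ns` are hypothetical helper names.

```python
import math

# ASSUMED ballpark latencies in nanoseconds, for illustration only;
# real numbers depend heavily on hardware, network, and serialization.
ASSUMED_LATENCY_NS = {
    "in-process function call": 1,
    "round trip within one datacenter": 500_000,    # ~0.5 ms
    "round trip to another region": 10_000_000,     # ~10 ms
}


def orders_of_magnitude(slow_ns: float, fast_ns: float) -> float:
    """Number of powers of ten between two latencies: log10(slow / fast)."""
    return math.log10(slow_ns / fast_ns)


def serial_chain_ns(hops: int, per_hop_ns: float) -> float:
    """Latency of a request that must pass through `hops` services one after another."""
    return hops * per_hop_ns


if __name__ == "__main__":
    baseline = ASSUMED_LATENCY_NS["in-process function call"]
    for name, ns in ASSUMED_LATENCY_NS.items():
        print(f"{name:34s} ~{orders_of_magnitude(ns, baseline):.1f} orders of magnitude")
    # Five services called in series inside one datacenter:
    print(serial_chain_ns(5, ASSUMED_LATENCY_NS["round trip within one datacenter"]) / 1e6, "ms")
```

Under these assumptions, the gap comes out at about 5.7 and 7.0 orders of magnitude. A request that crosses five services in series pays the round-trip cost five times (2.5 ms here, versus a few nanoseconds in-process). Adding nodes shortens none of those hops.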
Yes, some of that can be made less relevant by hosting the microservices on the same machine, usually by loading them into a cluster of virtual machines running containerized images of the independent microservice. (As in, using Docker Compose or Kubernetes to host a collection of Docker containers.) Doing so, however, adds latency between the virtual machine process boundaries (because we have to move data up and down the virtual networking stack, in accordance with the rules of the seven-layer model, even if some of those layers are being entirely emulated), and still creates the reliability issue of running on a single node.
What's worse, even as we start to wrestle with the Fallacies of Distributed Computing, we begin to run into a related, but separate, set of problems: The Fallacies of Enterprise Computing.
At the heart of microservices, we need ... To start rethinking what we really need.
Do you need to decompose the problem into independent entities? You can do that by embracing standalone processes hosted in Docker containers, or you can do that by embracing standalone modules in an application server that obey a standardized API convention, or a variety of other options. This isn't a technical problem that requires abandoning anything that's already been built. It can be done using technologies from anywhere in the last twenty years, including servlets, ASP.NET, Ruby, Python, C++, maybe even (shudder) Perl. The key is to establish that common architectural backplane with well-understood integration and communication conventions, whatever you want or need it to be.
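The following is a hedged sketch of "standalone modules in an application server that obey a standardized API convention". It assumes a convention in which every module is a function that takes a dict request and returns a dict response, all running in one process. `Backplane`, `pricing`, `inventory`, and the prices and stock figures are hypothetical.

```python
from typing import Any, Callable, Dict

# The shared convention every module obeys: a dict request in, a dict response out.
Request = Dict[str, Any]
Response = Dict[str, Any]
Handler = Callable[[Request], Response]


class Backplane:
    """Hypothetical in-process host that routes requests to registered modules."""

    def __init__(self) -> None:
        self._modules: Dict[str, Handler] = {}

    def register(self, name: str, handler: Handler) -> None:
        if name in self._modules:
            raise ValueError(f"module {name!r} is already registered")
        self._modules[name] = handler

    def call(self, name: str, request: Request) -> Response:
        handler = self._modules.get(name)
        if handler is None:
            return {"status": 404, "error": f"no module named {name!r}"}
        return handler(request)


# Two independently written modules that follow the same convention.
def pricing(request: Request) -> Response:
    unit_price_cents = 250  # hypothetical price
    return {"status": 200, "total": int(request["quantity"]) * unit_price_cents}


def inventory(request: Request) -> Response:
    stock = {"widget": 12}  # hypothetical in-memory stock table
    return {"status": 200, "in_stock": stock.get(str(request["sku"]), 0)}


if __name__ == "__main__":
    host = Backplane()
    host.register("pricing", pricing)
    host.register("inventory", inventory)
    print(host.call("pricing", {"quantity": 3}))   # {'status': 200, 'total': 750}
    print(host.call("shipping", {}))               # {'status': 404, 'error': "no module named 'shipping'"}
```

Callers reach modules only through `call`, so the dispatch inside `call` is the one place that would change if a module were later moved behind HTTP or a message queue. That is also the point at which the distributed-computing costs described above come back into play.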
Do you need to reduce the dependencies your development team is facing? Then begin by looking at those dependencies and working with partners to determine which of them you can bring into the team's wheelhouse. If the organization doesn't want to officially break up the "skill-centric" ontology of its org chart (meaning you have a "database" group, an "infrastructure" group, and a "QA" group as peers to your "development" group), then work with the senior executives to at least allow for a "dotted-line" reporting structure, so that individuals from each group are "matrixed" into a single team. But, most importantly, make sure that team has a crystal-clear vision of what it is they're trying to build, and that they can confidently describe the heart of their service/microservice/module to any random stranger walking by on the street. The key is to give the team the direction and goal, the autonomy to accomplish it, and the clarion call to get it done.
It really boils down to these two things, which really, really, really have nothing to do with each other except tangentially.