My Understanding of Platform Engineering
Gartner has listed Platform Engineering as one of the top strategic technology trends for 2024.
Platform Engineering is often described as old wine (DevOps) in new bottles (Platform Engineering).
Today, drawing on my past experience with self-service platforms, I want to share my understanding of Platform Engineering.
Cloud-Native Platforms
Any discussion of Platform Engineering inevitably touches on cloud-native technology, but this won’t be a debate on whether to shift to cloud-native or not.
Cloud-native rests on three pillars: Microservices, Kubernetes, and DevOps. Based on my experience, the core capabilities of a cloud-native technology platform include, but are not limited to, the following:
- Container Platform: Focuses on containerization technology and Kubernetes orchestration, achieving application elasticity, efficient storage, and network communication. This provides infrastructure support for the implementation of Microservices and DevOps.
- Microservices Platform: Manages microservices centrally, including unified service governance, configuration management, API gateways, and support for various microservice frameworks, accommodating complex service interactions and flexible development needs.
- Monitoring Platform: Offers comprehensive monitoring, including log collection, performance metrics, distributed tracing, real-time alerts, and visualization of monitoring data, supporting stable system operation and rapid fault localization.
- DevOps Integration Platform: Integrates continuous integration and continuous deployment (CI/CD) processes, as well as a documentation center and code quality management, achieving automated, efficient, and standardized software development and operational processes.
Cloud-native technology has been popular for many years (although it seems to be cooling down lately?), and opinions in the industry vary. Supporters credit it with improving development efficiency and resource utilization; critics point to wasted resources, difficult deployment and maintenance, and worse observability.
Much of the negative feedback about cloud-native technology stems from its inherent complexity. Cloud-native platforms have exposed too much complexity to developers.
The Complexity of Cloud-Native
Cloud-native technology, despite its many advantages such as flexibility, scalability, and efficient resource utilization, also introduces significant complexity:
- Complexity of the Technology Stack: Cloud-native environments typically involve containerization, microservices architecture, CI/CD, and container orchestration based on Kubernetes. Each of these technologies has its learning curve, and they need to be integrated and work together, adding to the overall complexity of the system.
- Complexity of Management and Operations: In cloud-native environments, applications are typically decomposed into multiple microservices, each deployed in separate containers, making monitoring, logging, troubleshooting, and performance optimization more complex.
- Network Complexity: The microservices architecture means a lot of inter-service network communication. Coupled with containers and hybrid multi-cloud network environments, managing the network traffic between these services, ensuring high availability and network security, and implementing service discovery all add to the complexity of network management.
- Observability and Monitoring Challenges: Ensuring sufficient visibility into the multitude of microservices running in a constantly changing environment requires complex monitoring and logging systems.
- Security Challenges: The distributed and dynamic nature of cloud-native architecture introduces new security challenges. Examples include securing containers, securing communication between services, and continuously managing and updating security policies in a dynamic environment.
These complexities add significant friction and cognitive load for developers and degrade the development experience. A blurred boundary forms between developers focused on business logic and the underlying infrastructure. Platform Engineering focuses on this gray area, aiming to bridge the gap and simplify the development process.
Platform Engineering acts as a glue layer.
What is Platform Engineering
Platform engineering is the discipline of designing and building toolchains and workflows that enable self-service capabilities for software engineering organizations in the cloud-native era. Platform engineers provide an integrated product most often referred to as an “Internal Developer Platform” covering the operational necessities of the entire lifecycle of an application.
Platform Engineering is dedicated to building and maintaining a bridge that turns complex infrastructure into simplified abstractions, so that developers can focus on business logic without delving into the underlying technical details. Platform engineers also strive to consolidate functionality that is common across the business logic layer and push it down into the platform, further simplifying development.
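To make the idea of a "simplified abstraction" concrete, here is a minimal sketch in Python. It assumes a hypothetical developer-facing `AppSpec` with four fields and a hypothetical `render_deployment` helper; neither name comes from any real platform. The helper expands the small spec into a dictionary shaped like a Kubernetes `apps/v1` Deployment manifest, which is the kind of detail a platform might hide from developers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppSpec:
    """Hypothetical developer-facing spec: the only fields a developer fills in."""
    name: str
    image: str
    replicas: int = 2   # platform default, an assumption for this sketch
    port: int = 8080    # platform default, an assumption for this sketch


def render_deployment(spec: AppSpec) -> dict:
    """Expand the small spec into a Kubernetes Deployment manifest (apps/v1)."""
    labels = {"app": spec.name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": spec.name, "labels": labels},
        "spec": {
            "replicas": spec.replicas,
            # The selector must match the pod template labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": spec.name,
                            "image": spec.image,
                            "ports": [{"containerPort": spec.port}],
                        }
                    ]
                },
            },
        },
    }


# The developer supplies two values; the platform fills in the rest.
manifest = render_deployment(
    AppSpec(name="orders", image="registry.example.com/orders:1.0")
)
```

The point of the sketch is the ratio: the developer writes a name and an image, while the platform owns labels, selectors, defaults, and the manifest structure. A real platform would likely add far more (resource limits, probes, network policy), but the division of responsibility would be similar.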
Here, I prefer to understand “engineering” as a verb, meaning the process of engineering a platform.
What is Engineering
Engineering, in this sense, means standardizing, systematizing, and normalizing the software development process in order to improve the efficiency, quality, and maintainability of software development. It typically covers three aspects:
- Process Standardization: Defining standard processes, document standards, code conventions, reuse, and modularization.
- Systematization: Unifying and automating tools and platforms, and managing them as services.
- Normalization: Quality control standards, security and compliance, performance standards.
As platform engineering matures, developers gain more self-service capabilities. Self-service platforms further shorten the distance between developers and infrastructure platforms, allowing developers to use platform resources more conveniently without understanding the underlying technical details.
A self-service platform is the product of engineering, or the tangible form of Platform Engineering.
Self-Service Platform
The self-service platform lets developers manage and operate resources directly. They take part in the entire software lifecycle without having to concern themselves with the underlying infrastructure and implementation details, which reduces the need for direct involvement from the platform team. The platform follows the “You build it, You run it” philosophy, but does not require developers to deeply understand the underlying technology (“Know-how”).
From my perspective, a self-service platform is more like a combination of processes and tools. At the tool level, whether open-source or commercial software, they provide abstractions of common capabilities. However, in terms of process, due to the specific management needs of each enterprise and dependencies on supporting departments, process designs vary. Although tools may be standardized, it is difficult to achieve this with processes.
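The split between standardized tools and enterprise-specific processes can be sketched as follows. This is an illustration under stated assumptions, not a description of any real platform: `ResourceRequest`, `handle_request`, and the two rule factories are hypothetical names. The provisioning step stands in for the common tool, while the list of rules stands in for the process that each enterprise defines for itself.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set


@dataclass(frozen=True)
class ResourceRequest:
    """Hypothetical self-service request submitted by a developer."""
    team: str
    kind: str          # e.g. "database", "namespace"
    environment: str   # e.g. "dev", "prod"


# A process rule returns an error message, or None if the request passes.
Rule = Callable[[ResourceRequest], Optional[str]]


def require_registered_team(registered: Set[str]) -> Rule:
    """Process rule: only teams known to the platform may request resources."""
    def rule(req: ResourceRequest) -> Optional[str]:
        if req.team not in registered:
            return f"team '{req.team}' is not registered"
        return None
    return rule


def forbid_kind_in_env(kind: str, environment: str) -> Rule:
    """Process rule: a given resource kind needs a manual path in one environment."""
    def rule(req: ResourceRequest) -> Optional[str]:
        if req.kind == kind and req.environment == environment:
            return f"'{kind}' cannot be self-provisioned in '{environment}'"
        return None
    return rule


def handle_request(req: ResourceRequest, rules: List[Rule]) -> dict:
    """Common tool: run the organization's rules, then provision if all pass."""
    errors = []
    for rule in rules:
        message = rule(req)
        if message is not None:
            errors.append(message)
    if errors:
        return {"status": "rejected", "errors": errors}
    # A real platform would call its provisioning tooling here (assumption).
    resource = f"{req.team}-{req.kind}-{req.environment}"
    return {"status": "provisioned", "resource": resource}


# One enterprise's process, expressed as configuration of the shared tool.
rules = [
    require_registered_team({"payments"}),
    forbid_kind_in_env("database", "prod"),
]
ok = handle_request(ResourceRequest("payments", "database", "dev"), rules)
denied = handle_request(ResourceRequest("payments", "database", "prod"), rules)
```

Here `handle_request` never changes between organizations; only the `rules` list does. That is one way to keep the tool standardized while leaving room for the process differences described above.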
Conclusion
I believe that simply dismissing Platform Engineering as ‘old wine in new bottles’ does not do it justice. Platform Engineering consolidates and builds on earlier methods (such as DevOps), giving them new forms and capabilities.
In the interaction between technology and people, we should not overlook the importance of connection and coordination. As a reflection of the real world, software systems should always adhere to the principle of people-first.