Skip to content

The features has added the most in the past two years! Kubernetes 1.27 officially released

k8s 1.27 release

On April 11, 2023 Pacific Time, Kubernetes 1.27 was officially released. This version is 4 months after the last version was released and is the first version of **2023. **

The release team tracked 60 enhancements in the new version, which is much more than the previous version. Among them, 13 features have been upgraded to the stable version, 29 existing features have been optimized and upgraded to Beta, and another 18 Alpha-level features, Most are new features. This version contains many important features and user experience optimization . This page will introduce some important features in detail in the next section.

Important features

Container registry switched from k8s.gcr.io to registry.gcr.io

**Since Kubernetes version 1.25, the Kubernetes container registry domain name has changed from k8s.gcr.io to registry.k8s.io. ** After March 20, 2023, k8s.gcr.io will no longer continue to be maintained directly, but will be proxied to registry.k8s.io.

KEP-1847: StatefulSet PVC Auto Delete Feature Beta

The **StatefulSetAutoDeletePVC feature introduced in v1.23 will be upgraded to Beta in version 1.27 and enabled by default. ** However, enabling by default does not mean that all StatefulSet PVCs will be deleted automatically.

Users can configure whenDeleted or whenScaled stages to trigger Retain or Delete behavior. Among them, Retain is the default behavior. Only when the StatefulSet configured with the Delete policy is deleted, the corresponding PVC deletion action will be triggered.

KEP-3453: Optimize iptables mode performance of kube-proxy in large clusters

Feature MinimizeIPTablesRestore was introduced in version 1.26 and upgraded to Beta and enabled by default in version 1.27. This feature is intended to improve iptables mode performance of kube-proxy in large clusters.

If you encounter the problem that the Service information is not correctly synchronized to iptables, you can set the kube-proxy startup parameter to --feature-gates=MinimizeIPTablesRestore=false to disable the feature (and submit an issue to the community). You can also view the sync_proxy_rules_iptables_partial_restore_failures_total metric in the metrics information of kube-proxy to monitor the number of rule synchronization failures.

KEP-2831 and KEP-647: APIServer and Kubelet Tracing Feature Beta

**APIServerTracing has been promoted to Beta status and is enabled by default. ** Currently only supports Tracing of kube-apiserver and etcd components, but support for client-go will be added in the future. Tracing capabilities will also be gradually added to other components. Tracing in kube-apiserver needs to specify a configuration file to enable, and the receiver must be specified.

APIServerTracing

**Tracing invoked through CRI by Kubelet and container runtime has also been enabled by default. ** Similarly, the KubeletTracing feature of kubelet has been enabled by default, but the receiving end of Tracing needs to be configured to work properly.

KEP-3077: Context Logging

Context logs can help users understand the context information of the logs, **make logs better for users to troubleshoot and understand, and improve the observability of logs. ** Currently, the migration of kube-controller-manager has been completed, and the migration of kube-scheduler will be completed in version 1.28.

KEP-1287: Vertical Elastic Scaling of Pod Resources

Kubernetes now supports in-place resizing of Pod resources, pod reconstruction is no longer necessary, which largely alleviates problems such as cold start of horizontal elasticity.

  • In previous versions, the Pod API did not support modifying resources. That is, container-defined resource limits and requests, such as CPU and memory, are immutable. In version 1.25, the CRI API began to support hot updates of Pod resource limits.
  • Added a resizePolicy field to a Pod's container to allow users to control whether containers are restarted when resources change.
  • Added allocatedResources field to the container state to describe the node resources allocated for the Pod.
  • Added resources field to container status to report the actual resources applied to the running container.
  • Added resize field to Pod state to describe the status of a request to resize a Pod. This field can be Proposed, InProgress (in progress), Deferred (delayed) or Infeasible (not feasible).

KEP-3386: Kubelet Event Driven PLEG Upgrade to Beta

**In the case of a large number of node Pods, the Pod status update can be driven by the Event when the container is running, which can effectively improve the efficiency. ** In 1.27, the feature has reached Beta condition and basic E2E testing tasks have been added. The reason why this feature is turned off by default is that the community believes that this feature needs to be supplemented with the following verifications: stress testing, recovery testing, and retries with backoff logic.

  1. The stress test needs to create a large number of containers in a single Pod to generate CRI events, and observe whether the latency value exceeds 1 second.
  2. The recovery test is to verify that the Kubelet can correctly update the container state after restarting.
  3. The retry with backoff logic is to solve the problem that the Kubelet may not be able to connect when the CRI Runtime is down.

KEP-3476: Volume Group Snapshot Alpha (API)

** Being able to take snapshots on all volumes of a Pod at the same time will be a major technological breakthrough in disaster recovery backup and failure recovery use cases. ** Now you don't have to worry about applications not working correctly due to a few seconds difference between the backed up volumes. In addition, in terms of security research, the group snapshot feature of storage volume will also be a major change. When troubleshooting, your snapshots and the state of your pods are comparable. It should be noted that this feature is not in the Kubernetes registry, the definition of VolumeGroupSnapshots API is currently maintained in https://github.com/kubernetes-csi/external-snapshotter

KEP-3838 and KEP-3521: Pod scheduling ready state enhancements

Scheduling readiness feature PodSchedulingReadiness, introduced as an Alpha feature in v1.26, **This feature has been upgraded to Beta since v1.27, and it is enabled by default. **

This feature introduces the .spec.schedulingGates field in the Pod object to define whether the Pod is allowed to start scheduling. One of the most common cases for this feature is: batch scheduling, where a group of Pods need to wait for the cluster resources to satisfy the entire group of Pods to start at the same time before scheduling.

For pods whose hover (Gated) state is restricted by SchedulingGate, in order to allow third-party controllers to more flexibly control the final scheduling policy of these pods, The new version allows the NodeAffinity and NodeSelector of the Gated Pod to be modified, but only allows narrowing down the node range. Narrowing the scope means that new policies can be added, and existing policies cannot be deleted or modified.

KEP-3243: Scheduling optimization during rolling update of Deployment

Before the PodTopologySpread scheduling strategy only paid attention to the key of the label, not the value of the label, so when rolling to update the Deployment, Scheduling cannot distinguish between old and new instances, which may lead to uneven scheduling of new instances. For example, the last set of 4 old Pods for a rollover are all on the same node, Then, in order to distribute the new pod more evenly, it will be scheduled to other nodes with a high probability; after rolling and deleting this group of old pods at the end, it is possible that a node does not schedule the deployment.

In v1.27, the PodTopologySpread scheduling strategy can distinguish the value of the scheduling Pod label (This usually refers to the Pod's pod-template-hash label, which has different values for Pods corresponding to different replica sets), After this rolling update, new Pod instances will be scheduled more evenly.

KEP-2876: Use the Common Expression Language (CEL) to validate CRDs

**CustomResourceValidationExpressions has been upgraded to Beta in v1.25. ** Validation rules use the Common Expression Language (CEL) to validate the value of a custom resource. In v1.27, ValidationRule adds a new field messageExpression, which can better display prompt information. In previous versions, only fixed failure messages were supported.

x-kubernetes-validations:
- rule: "self.x <= self.maxLimit"
   messageExpression: '"x exceeded max limit of " + string(self.maxLimit)'

KEP-2258: Node log query

v1.27 added NodeLogQuery feature gate (Feature Gate), Provides cluster administrators with the ability to use kubectl to stream view node logs without logging into the node. This feature is currently Alpha, and it needs to configure kubelet to enable feature gate control, and also needs to set enableSystemLogHandler and enableSystemLogQuery are true. On Linux, We assume that system service logs are available through journald. On Windows, We assume that system service logs are available at the application log provider. In both operating systems, Logs can also be obtained by reading the files under the /var/log/ directory. The support for this feature on Windows is also gradually improving, Currently using Get-WinEvent to get system and application logs.

Not only that, but the current node log acquisition feature also supports some parameters. Where query represents the service name, You can specify kubelet, containerd, etc. patern filters log entries by the provided PERL-compatible regular expressions. In addition, the supported parameters include sinceTime, untilTime, tailLines, and boot.

# Fetch kubelet logs from a node named node-1. example that have the word "error"
kubectl get --raw "/api/v1/nodes/node-1.example/proxy/logs/?query=kubelet&pattern=error"

KEP-3659: kubectl apply --prune redesign

--prune was introduced as an Alpha feature in v1.5, which provides automatic cleaning of some objects deleted by apply yaml, But this process has some performance problems and defects, and in some cases it will cause object leaks. Common causes include allowlist (formerly called whitelist) GVK content does not match apply content, or a namespace change. The case of namespace manipulation changes, For example, the first apply operates namespaces A and B; and if the second apply only applies the resources of namespace A, then the resources of namespace B will not be cleaned up.

In v1.27, kubectl redesigned the feature of apply --prune, ** added --applyset to be used together, and the feature is still at Alpha level** at present, Therefore, you need to configure the KUBECTL_APPLYSET environment variable to 1 or true to enable it. When kubectl apply, kubectl will add to the object applyset.kubernetes.io/part-of label, which is used when cleaning. In order to meet more complex cases, applyset.kubernetes.io/id is also introduced To identify the Parent object, and toolling, additional-namespaces to help distinguish objects and increase namespace information, etc. For more details, please read the KEP content.

DaoCloud participation function

In this release, DaoCloud mainly contributed sig-node, sig-scheduling and kubeadm related content, the specific features are as follows:

  • [client-go] Fix waiting time when trying to get leader lease
  • [Scheduling] MinDomainsInPodTopologySpread feature upgraded to Beta
  • [Scheduling] When any scheduler plugin returns unschedulableAndUnresolvable status in PostFilter, the pod's scheduling cycle terminates immediately
  • [logging] Migrate controllers to use contextual logging
  • [kubeadm] Kubeadm: Add feature gating EtcdLearnerMode, which allows newly added control node's etcd to join as a Learner, and then upgrade to a voting member
  • [Node] Pods whose spec.terminationGracePeriodSeconds property value is negative will be modified to a terminationGracePeriodSeconds of 1 second
  • [Node] Added a new feature that can limit the number of parallel image downloads for a node
  • [Node] Improve the current Memory QoS feature and optimize its adaptability in cgroup v2 cases
  • [node] Kubelet: migrate "--container-runtime-endpoint" and "--image-service-endpoint" into kubelet configuration
  • [Node] Kubelet allows Pods to use net.ipv4.ip_local_reserved_ports sysctl by default, requires kernel version 3.16+
  • [CLI] The kubectl.kubernetes.io/default-container label is officially GA, mainly used for kubectl logs, exec and other commands to determine the default container

During the v1.27 release process, DaoCloud participated in hundreds of bug fixes and feature development, as the author of about 90 submissions, See the contribution list for details (15 of the more than 200 contributors to this version are from DaoCloud). During the Kubernetes v1.27 release cycle, Many R&D engineers of DaoCloud have achieved a lot.

  • wzshiming mainly maintains the project KWOK (Kubernetes Without Kubelet) which has become a community hotspot and effectively saves resources and improves efficiency in large-scale cluster simulation.
  • windsonsea almost took over the translation of the recent official website blog and became the maintainer of SIG-docs-zh.
  • mengjiao-liu is also a senior Chinese and English maintainer of SIG-docs.
  • kerthcet will share two interesting scheduling topics at the upcoming KubeCon Europe 2023, namely "Sig Scheduling Deep Dive" and "Building a Batch System for the Cloud with Kueue" (part of Kubernetes Batch + HPC Day).
  • pacoxu will share the theme of "Kubeadm Deep Dive".

Other features that need to be understood

APPS:

  • PodDisruptionBudget did not support specifying how to deal with unhealthy Pods. Unhealthy Pods refer to Pod Running But the status is not Ready. We added a new field unhealthyPodEvictionPolicy that allows users to specify the What should happen to the pod. This field was upgraded to Beta in v1.27.
  • The "StatefulSetStartOrdinal" feature has been upgraded to Beta, which allows the configuration of the start ordinal in the StatefulSet by default.
  • CronJob supports Timezone feature advanced to GA.
  • The StatefulSetStartOrdinal feature gate has been advanced to Beta, and the DownwardAPIHugePages kubelet feature has been advanced to GA.
  • API validation for Indexed Job has been relaxed, allowing to change both parallelism and completions To expand or shrink the Indexed Job, but it is necessary to keep parallelism == completions to be modified synchronously.

API:

  • Based on the KEP-2876 CRD verification expression language provided by Kubernetes v1.25, this feature adds a new resource—— ValidatingAdmissionPolicy, which allows field validation when not using a Validation Webhook.
  • In 1.27, Kubernetes provides Beta support for aggregate discovery, publishing all resources supported by the cluster through /api and /apis instead of providing each Group separately.
  • OpenAPIV3 feature GA, allowing API servers to publish OpenAPI V3. The community recommends using OpenAPI v3, which has many advantages, These include a lossless representation of the CustomResourceDefinition OpenAPI v3 validation schema, while OpenAPI v2 does a lossy conversion in CRD validation. kubectl explain already supports OpenAPI v3, but it needs to configure the environment variable KUBECTL_EXPLAIN_OPENAPIV3 to enable it.
  • Promoted SelfSubjectReview to beta level.

Auth:

  • KMSv2 has been upgraded to Beta. This feature has been optimized in 1.27. For example: when the plug-in key ID remains unchanged, The DEK data encryption key is reused, and the DEK is randomly regenerated when the Server starts.
  • Added a new Alpha API: ClusterTrustBundle (certificates.k8s.io/v1alpha1).
  • AdmissionWebhookMatchConditions feature has entered Alpha: in v1Beta and v1 API, Added MatchConditions field for ValidatingWebhookConfiguration and MutatingWebhookConfiguration.
  • Upgraded the LegacyServiceAccountTokenTracking feature to Beta for tracking Sercet-based SA token usage.

CLI:

  • The --subresource support of kubectl is upgraded to Beta. Currently, subresource only supports status and scale.
  • Improved kubectl plugin parsing to support non-shadowing subcommands. Need to configure the environment variable KUBECTL_ENABLE_CMD_SHADOW=true When this feature is enabled, for example, when kubectl create foo is executed, it will first find that create does not have the foo subcommand, and kubectl will automatically try to run the kubectl-create-foo plugin.

network:

  • Enable the CloudNodeIPs feature in the kubelet when the external cloud provider supports providing dual-stack IPs, You can then specify --node-ip for dual stack. This feature is currently in Alpha and needs to be turned on manually. ValidatingAdmissionPolicy Added matchConditions field to support custom matching conditions based on CEL. This feature is currently still in Alpha.
  • Allows dynamic scaling of the number of IPs available to serve a Service. Added MultiCIDRServiceAllocator function, currently at Alpha level.
  • New feature ServiceNodePortStaticSubrange to enable new strategy in NodePort service port subranger, So the node port range is subdivided and the dynamically allocated NodePort ports from above are preferred for services.
  • Added a warning about invalid DNS labels for workload resource (Pods, ReplicaSets, Deployments, Jobs, CronJobs, or ReplicationControllers) names.

elasticity:

  • HPAContainerMetrics is upgraded to Beta, which allows HorizontalPodAutoscaler to perform scaling operations based on the metrics of the ContainerResource type of each container in the target Pod.

node:

  • Dynamic resource allocation function, use the feature DynamicResourceAllocation. The new API is better than Kubernetes' existing device plugins The Device Plugin feature is more flexible. Because it allows Pods to request (claim) specific types of resources, which can be used at the node level, cluster level, or any other user-defined implementation model.
  • user namespace support expanded, the feature is still Alpha, but supports StatefulSets compared to v1.26.
  • GRPC probe feature GA.
  • Bump default API QPS limits for Kubelet.
  • Graduate Kubelet Topology Manager to GA.
  • Increased Kubelet's default API QPS limit, where both kubeAPIQPS and kubeAPIBurst were increased by a factor of 10.
  • Topology Manager for Kubelet Topology Manager feature GA.
  • The seccomp profile default is upgraded to GA level.

LOG:

  • kube-proxy, kube-scheduler and kubelet have HTTP API to change the log level at runtime. This feature also works for JSON formatted log output.

Metrics:

  • /metrics/slis is now available for control plane components and can be used to get health check metrics for the current component.

Lease:

  • Kubernetes component election now only supports Lease.

Scheduling:

  • Added Metric plugin_evaluation_total to scheduler. This metric shows how many times a particular plugin affected the scheduling results. -The scheduling framework can use the Skip state to skip this process in the Filter and Score stages to improve performance. In the PreFilter stage, if the Plugin returns Skip information, then the corresponding Filter and Score stages of the Plugin can be skipped later.

storage:

  • NewVolumeManagerReconstruction feature upgraded to Beta. This is a refactoring of VolumeManager , Allows the kubelet to carry additional information about how existing volumes are mounted during startup.
  • SELinuxMountReadWriteOncePod feature upgraded to Beta. This feature uses the correct SELinux labels during volume mounts, This feature speeds up container startup compared to recursively changing each file one by one.
  • ReadWriteOncePod PV access mode feature upgraded to Beta. This feature introduces a new ReadWriteOncePod access mode, Used to restrict a PV's access to a single Pod on a single node. While the ReadWriteOnce mode restricts single-node access, it does not restrict simultaneous access by multiple Pods on the same node.
  • CSINodeExpandSecret feature upgraded to Beta level.

version flag

The subject of this Kubernetes release is "Chill Vibes" relaxed vibes.

Chill Vibes

This was inspired by a recent release where nothing unusual happened after a feature freeze, the only one we've had in multiple releases. We were able to make this release easier because of the hard work that went into improving release management behind the scenes. And that's exactly what **this thread wants to celebrate: all the hard work that goes into improving the community experience. **

Special thanks to Britnee Laverack for creating the logo design, and she also designed the Kubernetes 1.24: Stargazer logo design.

Upgrade Notes

This section mainly introduces the API changes in v1.27, as well as the removal and deprecation of features. Deprecated features are usually removed after 1-2 versions. See Features removed and major changes in Kubernetes v1.27 for more details.

Features and major changes removed in Kubernetes v1.27: https://kubernetes.io/blog/2023/03/17/upcoming-changes-in-kubernetes-v1-27/

Among them **need to focus on, IPv6DualStack external cloud provider feature gate has been deleted. ** (The feature became GA in version 1.23, and feature gating was removed for all other components a few versions ago.) If you still enable it manually, you must stop now.

k8s.gcr.io redirects to registry.k8s.io related instructions

**Again, for hosting its container images, the Kubernetes project uses a community-owned registry called registry.k8s.io. ** Starting March 20, all traffic from the outdated k8s.gcr.io repository will be redirected to registry.k8s.io. The deprecated k8s.gcr.io repository will eventually be shut down in the future.

Other changes to note

  • The storage.k8s.io/v1beta1 API version of CSIStorageCapacity was deprecated in v1.24 and will be removed in v1.27.
  • Removed NetworkPolicyEndPort, LocalStorageCapacityIsolation, StatefulSetMinReadySeconds, IdentifyPodOS, DaemonSetUpdateSurge, EphemeralContainers, CSIInlineVolume, CSIMigration, ControllerManagerLeaderMigration Feature gating, most of these features are officially GA in versions before v1.25.
  • kube-apiserver removes the --master-service-namespace command line argument.
  • kube-controller-manager command line arguments --enable-taint-manager and --pod-eviction-timeout are removed.
  • kubelet removes the command-line parameter --container-runtime, which currently has only one optional value "remote" and was deprecated in previous versions.
  • Deprecated the alpha state seccomp annotations seccomp.security.Alpha.kubernetes.io/pod and container.seccomp.security.Alpha.kubernetes.io.
  • The SecurityContextDeny feature gate is deprecated and will be removed in a future release.
  • Linux/arm will not ship in Kubernetes 1.27 due to issues building with golang 1.20.2. Note, arm64 is still supported.

historic version

References

Comments