Resource monitoring and pod autoscaling
Effectively managing application resources is critical for ensuring performance, stability, and cost-efficiency. Cloudfleet provides powerful tools for this: the kubectl top command for real-time resource monitoring, and Horizontal Pod Autoscaling (HPA), a standard Kubernetes feature, for automatically adjusting your application’s scale to meet demand.
Understanding your resource consumption is the first step. The kubectl top command gives you immediate insights into how your nodes and pods are utilizing CPU and memory. This visibility is essential for identifying potential bottlenecks, optimizing resource requests and limits, and making informed decisions about scaling.
Once you have a grasp of your resource patterns, Horizontal Pod Autoscaling (HPA) enables your applications to respond dynamically to changes in load. This means your application can automatically scale out (add more pods) during peak traffic and scale in (remove pods) during quieter periods, ensuring high availability without overprovisioning.
The benefits of HPA are significantly amplified by Cloudfleet’s Node Autoprovisioning feature:
- Efficient Scale-Out: When HPA needs to add more pods due to increased demand, and your current nodes lack capacity, Node Autoprovisioning seamlessly provisions new nodes. This ensures your application can scale without manual intervention.
- Cost-Effective Scale-In: When HPA reduces the number of pods as demand subsides, Node Autoprovisioning can deprovision underutilized or empty nodes. This directly translates to cost savings, as you only pay for the infrastructure you actively use.
This integrated approach ensures that your entire stack, from pods to nodes, scales efficiently and economically. For more details on Node Autoprovisioning, see the Cloudfleet Node Provisioner documentation.
Monitoring Resource Usage with kubectl top
The kubectl top command provides a snapshot of resource consumption (CPU and memory) for nodes and pods within your cluster. It’s an indispensable tool for real-time monitoring.
Overview
The kubectl top command fetches resource metrics from the Metrics Server, an integral component of your Cloudfleet Kubernetes cluster that collects CPU and memory usage data.
Viewing Node Metrics:
To display CPU and memory usage for all nodes in the cluster:
kubectl top nodes
Example Output:
NAME            CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
node-worker-1   150m         7%     1750Mi          45%
node-worker-2   220m         11%    2500Mi          62%
...
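To surface the busiest nodes first, you can sort the output by usage (supported in recent kubectl versions):
kubectl top nodes --sort-by=cpu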
Viewing Pod Metrics:
To display CPU and memory usage for pods:
For pods in the current (default) namespace:
kubectl top pods
For pods in a specific namespace:
kubectl top pods -n <your-namespace>
For pods across all namespaces:
kubectl top pods --all-namespaces
Example Output (for a single namespace):
NAME                                 CPU(cores)   MEMORY(bytes)
my-app-deployment-7b5d8f9c4f-abcde   55m          130Mi
my-app-deployment-7b5d8f9c4f-fghij   60m          135Mi
another-service-pod-xyz123           100m         240Mi
...
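Two useful variations: --containers breaks usage down per container within each pod, and --sort-by ranks pods by consumption:
kubectl top pods --containers
kubectl top pods --sort-by=memory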
For more command options, refer to the official kubectl top documentation.
Horizontal Pod Autoscaling (HPA)
Horizontal Pod Autoscaling (HPA) automatically scales the number of pods in a deployment, replica set, stateful set, or replication controller based on observed CPU utilization or other custom metrics.
How HPA Works
The HPA controller, a part of the Kubernetes control plane, periodically checks metrics from the Metrics Server. It compares these observed metrics against the target values you define in a HorizontalPodAutoscaler resource.
- Scaling Out: If the observed metric (e.g., average CPU utilization) exceeds your target, the HPA controller increases the number of pod replicas.
- Scaling In: If the observed metric falls below your target, the HPA controller decreases the number of pod replicas, but not below the minimum you’ve set.
This ensures your application maintains performance under load and conserves resources when demand is low.
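Concretely, the desired replica count is computed by the formula documented for the upstream Kubernetes HPA controller:
desiredReplicas = ceil( currentReplicas * currentMetricValue / desiredMetricValue )

# Example: 3 replicas averaging 90% CPU against a 60% target:
#   ceil(3 * 90 / 60) = ceil(4.5) = 5 replicas (scale out)
# Example: 3 replicas averaging 30% CPU against a 60% target:
#   ceil(3 * 30 / 60) = ceil(1.5) = 2 replicas (scale in)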
Cloudfleet supports standard Kubernetes HPA. Configuration and management follow standard Kubernetes practices. The Metrics Server, which HPA relies on for CPU and memory data, is a core component of your Cloudfleet cluster.
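Before creating an HPA, you can verify that the metrics pipeline is available. Any kubectl top command returning data is sufficient proof; assuming the standard APIService registration used by the upstream Metrics Server, you can also check it directly:
kubectl get apiservice v1beta1.metrics.k8s.io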
This example creates an HPA that targets a Deployment named my-app-deployment. It aims to maintain an average CPU utilization of 60% across all its pods, scaling between 1 and 10 replicas.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
  namespace: default # Or your specific namespace
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app-deployment # Name of the Deployment to scale
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          # The target average CPU utilization over all the pods,
          # represented as a percentage of the requested CPU.
          averageUtilization: 60
To apply this example, save the YAML to a file (e.g., my-app-hpa.yaml) and apply it using kubectl:
kubectl apply -f my-app-hpa.yaml
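Alternatively, the same CPU-based autoscaler can be created imperatively with kubectl autoscale (this shorthand supports only a CPU utilization target):
kubectl autoscale deployment my-app-deployment --cpu-percent=60 --min=1 --max=10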
You can check the status of your HPA using:
kubectl get hpa my-app-hpa
And get an output similar to this:
NAME         REFERENCE                      TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-app-hpa   Deployment/my-app-deployment   25%/60%   1         10        3          15m
In this output you can see the following information:
- NAME: The name of your HPA resource (my-app-hpa).
- REFERENCE: The workload being scaled (Deployment/my-app-deployment).
- TARGETS: The current metric utilization compared to the target. In this example, 25%/60% means the current average CPU utilization across pods is 25%, and the target is 60%. If no pods are running or metrics are not yet available, you might see <unknown>/60%.
- MINPODS: The minimum number of replicas configured (1).
- MAXPODS: The maximum number of replicas configured (10).
- REPLICAS: The current number of running replicas (3).
You can get more details about the HPA by describing it:
kubectl describe hpa my-app-hpa
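While generating load against the application, you can also watch the HPA’s decisions update in real time:
kubectl get hpa my-app-hpa --watch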
For comprehensive details on HPA configuration, supported metrics (including custom and external metrics), and advanced features, refer to the official Kubernetes HPA documentation.
Scenario: HPA Scales Up and Node Autoprovisioning Responds
Understanding how HPA interacts with Cloudfleet’s Node Autoprovisioning feature is key to appreciating the fully automated scaling capabilities of your cluster. Here’s what happens when HPA scales up your application and the existing nodes lack capacity:
- Increased Load & HPA Reaction: Your application experiences an increase in load (e.g., higher CPU usage). The HPA controller detects that the current average resource utilization (e.g., CPU) across your pods has exceeded the target defined in your HorizontalPodAutoscaler resource.
- HPA Scales Out Pods: HPA decides to increase the number of pod replicas for your Deployment, ReplicaSet, or StatefulSet to meet the demand.
- Pod Scheduling Attempt: The Kubernetes scheduler attempts to place these newly created pods onto available nodes in the cluster.
- Insufficient Resources & Pending Pods: If the existing nodes do not have enough allocatable resources (CPU, memory, etc.) to accommodate one or more of the new pods, these pods will enter a Pending state. You can observe this by running kubectl get pods. The events for a pending pod (viewable via kubectl describe pod <pod-name>) might indicate reasons like FailedScheduling due to insufficient resources; see the commands after this list.
- Node Autoprovisioning Detects Need: Cloudfleet’s Node Autoprovisioning feature constantly monitors for unschedulable pods. It identifies that pods are pending due to a lack of resources.
- New Node Provisioning: Node Autoprovisioning checks its configuration (e.g., defined node pools, instance types, and scaling limits). If it determines that adding a new node would allow the pending pods to be scheduled, it initiates the process of creating a new virtual machine and adding it to your Kubernetes cluster. This node creation process typically takes a few minutes, during which the pods remain Pending.
- Node Joins Cluster: Once the new node is provisioned and successfully joins the Kubernetes cluster, it becomes available for scheduling workloads.
- Pods Scheduled on New Node: The Kubernetes scheduler, now aware of the new node and its available resources, schedules the previously Pending pods onto this new node.
- Application Capacity Increased: The new pods transition to a Running state. Your application now has increased capacity to handle the higher load, all achieved automatically.
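The commands below, referenced from the pending-pods step above, let you follow this sequence as it happens; <pod-name> is a placeholder for one of your pending pods:
# Pods waiting for capacity
kubectl get pods --field-selector=status.phase=Pending

# Scheduling events (look for FailedScheduling)
kubectl describe pod <pod-name>

# Watch the newly provisioned node join the cluster
kubectl get nodes --watch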
This seamless integration between Horizontal Pod Autoscaling and Node Autoprovisioning ensures that your applications can scale efficiently from the pod level up to the infrastructure level without manual intervention, providing true elasticity and optimizing resource utilization.