Control plane scalability
Cloudfleet provides a fully managed Kubernetes service, meaning we take on the operational burden of scaling and maintaining your cluster’s control plane. A well-performing control plane is the backbone of a healthy and responsive cluster, and our team ensures it can handle your growing demands. While Cloudfleet handles the underlying scaling, this guide explains how the control plane components interact and shares best practices for optimizing your usage patterns, so that your interactions with the API Server don’t degrade the performance of other managed components like the Controller Manager and Scheduler.
Understanding API Server Scalability
The API Server is often the most heavily utilized component. Efficiently managing requests and interactions with it is key.
API Priority and Fairness (APF)
Imagine the Kubernetes API server as a busy intersection in a city. It’s the central point where all requests for your cluster – every kubectl command, every deployment update, every application query – must pass. Without a good traffic management system, this intersection could quickly become gridlocked, especially if one or two sources send a flood of requests. This is where Kubernetes’ API Priority and Fairness (APF) steps in, acting like an intelligent traffic controller for your cluster’s brain.
APF’s main job is to keep the API server responsive and stable for everyone. It does this by categorizing incoming requests into different priority levels and then using flow schemas to decide which priority a request gets. Think of it like having express lanes for emergency vehicles (critical system operations) while ensuring that regular traffic (user and application requests) still flows smoothly and fairly, without any single driver hogging all the lanes. This prioritization helps prevent any single user or poorly behaving application from overwhelming the API server and degrading performance for other critical cluster functions.
At Cloudfleet, we utilize APF to ensure a stable platform for all our users. To manage resources effectively and maintain fairness, especially on our Basic clusters, we have specific APF settings. This includes a particular priority class for Basic cluster users, which comes with defined concurrency shares (how many requests can be handled at once) and queuing mechanisms (what happens if too many requests arrive simultaneously). Pro cluster users, on the other hand, are not subject to these Basic cluster limitations. They benefit from higher, dedicated concurrency limits, and their control plane operates with dedicated resources, ensuring their production workloads receive the priority and throughput they need.
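If you’re curious how this classification works on your own cluster, and your role grants read access to these cluster-scoped resources, you can list the APF objects directly. As a quick sketch, the following commands show the flow schemas and priority levels the API server is currently using:
# List the flow schemas that classify incoming requests
kubectl get flowschemas
# List the priority levels and their concurrency configuration
kubectl get prioritylevelconfigurations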
Determining whether APF is impacting your API requests involves observing specific symptoms and metrics. You might notice increased latency for your kubectl commands or even receive HTTP 429 “Too Many Requests” errors from the API server. If you suspect this, and you have the necessary permissions, you can inspect the API server’s metrics. By querying the /metrics endpoint (e.g., via kubectl get --raw /metrics), you can look for specific indicators. Keep in mind these commands can produce a lot of data; in a production environment, you’d typically use a monitoring system like Prometheus to track these:
# Check for any rejected requests by APF (filter by 'priority_level' if needed)
kubectl get --raw /metrics | grep "apiserver_flowcontrol_rejected_requests_total"
# Check current in-queue requests (filter by 'priority_level' if needed)
kubectl get --raw /metrics | grep "apiserver_flowcontrol_current_inqueue_requests"
# Check request wait duration (filter by 'priority_level' if needed, shows buckets)
kubectl get --raw /metrics | grep "apiserver_flowcontrol_request_wait_duration_seconds_bucket"
If you see a significant number of queued or rejected requests that correspond with your activity, that’s a good cue to review your API usage patterns (which we’ll discuss next). For Basic cluster users consistently hitting these limits, upgrading to Pro clusters can provide the higher throughput needed.
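Beyond the metrics above, recent Kubernetes versions also expose APF debug endpoints on the API server. Access to these may be restricted on a managed platform, but where available they give a live snapshot of queue state:
# Dump the current state of each APF priority level
kubectl get --raw /debug/api_priority_and_fairness/dump_priority_levels
# Dump per-priority-level queue occupancy
kubectl get --raw /debug/api_priority_and_fairness/dump_queues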
Optimizing Your Interactions with the API Server
While APF helps manage overall traffic, the way your own applications, scripts, and tools communicate with the API server plays a crucial role in its load and, consequently, your application’s performance. Adopting efficient API usage patterns is key:
- Prefer watches over polling: When you need to monitor changes to resources, it’s far more efficient to establish a WATCH connection than to repeatedly poll with GET or LIST requests. A watch provides a continuous stream of updates, significantly reducing the number of individual requests.
- Be specific when listing: Avoid overly broad LIST requests like kubectl get pods --all-namespaces if you don’t need information from every namespace. Instead, leverage Field Selectors and Label Selectors to narrow down your query to only the objects you’re interested in.
- Paginate large result sets: Use pagination (the limit and continue parameters) to retrieve objects in manageable chunks.
- Request lighter output: If you only need the names of resources, a specific output format like kubectl get … -o name is much lighter on the API server than fetching full YAML or JSON manifests.
- Throttle your clients: If you’re developing custom controllers or applications that interact heavily with the API server, implement client-side throttling. Kubernetes client libraries, such as client-go, often include built-in rate limiters that can help prevent your application from overwhelming the API.
- Batch your updates: Avoid making many small, frequent updates to the same object; if possible, batch these into fewer, larger updates, while still being mindful of the object size limits enforced by the cluster’s storage backend (which Cloudfleet manages).
- Avoid tight GET loops: If you find yourself repeatedly fetching the same object, consider caching it locally if it doesn’t change often, or use a WATCH for updates.
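As a rough illustration, here are kubectl equivalents of several of these patterns (the namespace my-namespace and the label app=web are placeholders for your own workloads):
# Stream changes with a watch instead of polling in a loop
kubectl get pods -n my-namespace -l app=web --watch
# Narrow a LIST with label and field selectors
kubectl get pods -n my-namespace -l app=web --field-selector=status.phase=Running
# Retrieve large result sets in pages of 500 objects
kubectl get pods -n my-namespace --chunk-size=500
# Fetch only resource names instead of full manifests
kubectl get deployments -n my-namespace -o name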
Considerations for Custom Resource Definitions (CRDs)
Custom Resource Definitions (CRDs) are a powerful way to extend the Kubernetes API, allowing you to define your own resource types. While invaluable, it’s important to be mindful of how CRDs can influence API server performance. Each CRD you introduce adds new API endpoints that the server must handle. A very large number of CRDs, or CRDs with extremely complex schemas or validation rules, can increase the API server’s memory footprint and potentially slow down operations like API discovery. When designing and using CRDs, aim for clarity and efficiency. Keep your CRD schemas as concise as practical for your use case. If your custom controllers interact with these CRDs, ensure they follow the efficient API usage patterns discussed earlier to minimize unnecessary load. Thoughtful CRD management contributes to a smoother experience for all API server interactions.
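To get a feel for how many custom API endpoints your cluster is serving, you can review the installed CRDs. A brief sketch, where widgets.example.com is a hypothetical CRD name to replace with one of your own:
# Count the CRDs installed in the cluster
kubectl get crds | wc -l
# Review a specific CRD's schema and served versions (widgets.example.com is hypothetical)
kubectl get crd widgets.example.com -o yaml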
Impact of Your Workloads on Other Control Plane Components
While Cloudfleet ensures the Controller Manager and Scheduler are appropriately scaled, the nature and volume of your workloads can influence their operational efficiency. Users should be aware of a few key aspects:
- Volume of Objects: A very high number of Kubernetes objects (like Pods, Deployments, Services, Endpoints, etc.) naturally increases the amount of work these components need to perform. The Controller Manager must watch and reconcile a larger set of resources, and the Scheduler has a wider array of Pods and Nodes to consider when making placement decisions.
- Controller Efficiency: If you develop custom controllers, ensure they use efficient selectors and minimize unnecessary API calls to avoid placing undue strain on the system.
- Pod Churn: Creating a very large number of very short-lived Pods (high churn) can also load these components, as they constantly process creation and deletion events.
- Resource Definitions: Clearly defining resource requests and limits for your Pods is crucial. This allows the Scheduler to make optimal placement decisions, contributing to overall cluster stability and efficient resource utilization (see the example after this list).
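For instance, one way to set requests and limits on an existing workload from the command line is kubectl set resources; the Deployment name and resource values below are illustrative only:
# Set CPU/memory requests and limits on a Deployment's containers
kubectl set resources deployment web --requests=cpu=100m,memory=128Mi --limits=cpu=500m,memory=512Mi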
Being mindful of these factors when designing and deploying your applications can help ensure that the managed Controller Manager and Scheduler operate as efficiently as possible. Understanding how your workloads interact with the managed Kubernetes control plane is key to a robust and performant cluster. By following best practices in API usage and workload design, you contribute to the overall stability and efficiency that Cloudfleet’s managed service provides. Our APF configurations, including specific considerations for the Basic cluster, alongside the dedicated resources and higher limits for Pro cluster users, are designed to offer a stable platform for all. Should you encounter specific scaling challenges or have further questions, the Cloudfleet support team is available to assist.