CPU-based autoscaling on the StatefulSet path, with optional
(opt-in) tenant-side cleanup on scale-down.
Prerequisites
- workload.type: statefulset (HPA doesn’t make sense for a
DaemonSet; one pod per node is the design).
- enrollment.mode: api so the chart can self-enrol new replicas
without operator intervention.
- A working metrics-server in the cluster
(kubectl top pods -n npa-publisher should return data). Most
managed Kubernetes flavours ship it by default; some managed
offerings and bare-metal clusters may need it installed.
Enable
workload:
Apply with the usual helm upgrade --install. The chart renders a
HorizontalPodAutoscaler targeting the StatefulSet:
$ kubectl get hpa -n npa-publisher
What happens on scale-up
- HPA observes average CPU above target → patches the StatefulSet’s
replicas.
- Kubernetes creates a new pod (<release>-N).
- The new pod runs npa-bootstrap, calls the Netskope API with
commonName-<pod-name>, gets a publisher_id, enrols.
- Once NPACONNECTED shows up in the publisher logs, Netskope
load-balancers route new private-app sessions to the new
replica.
No manual token shuffling. Each replica is an independent
identity in the Netskope console.
What happens on scale-down
- HPA observes CPU below target → patches replicas down.
- Kubernetes terminates the highest-ordinal pod first.
- The Publisher record stays in the Netskope tenant. By default
the chart does not delete it — see the warning below.
⚠️ Why scale-down doesn’t auto-delete by default
The Netskope API refuses to delete a Publisher that has Private
Apps attached (/api/v2/infrastructure/publishers/{id} returns
an error). Many tenants attach apps to every Publisher in a
region for load balancing, including the auto-scaled replicas.
An automatic DELETE would silently fail on those, leaving both
orphan Publisher records and stranded app attachments to chase.
Reconciling orphans periodically via the delete-publisher
flow is safer than racing the lifecycle.
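A periodic reconciliation pass could start from a comparison like the one below. This is a minimal sketch, not the chart's delete-publisher flow: it assumes Publisher names follow the commonName-<pod-name> pattern used at enrolment, and find_orphans and its arguments are hypothetical names. Fetching the Publisher list from the tenant and the pod list from the cluster is left out.

```python
def find_orphans(publisher_names, live_pod_names, common_name):
    """Return Publisher names that match this release but have no live pod.

    Assumption: each replica enrols as "<common_name>-<pod-name>".
    Pick a common_name that is not a prefix of another release's name,
    otherwise that release's Publishers would be reported as orphans too.
    """
    prefix = f"{common_name}-"
    expected = {prefix + pod for pod in live_pod_names}
    return sorted(
        name
        for name in publisher_names
        if name.startswith(prefix) and name not in expected
    )


if __name__ == "__main__":
    # Illustrative data only; real names come from the tenant and the cluster.
    tenant = ["npa-prod-npa-0", "npa-prod-npa-1", "npa-prod-npa-2", "branch-office"]
    pods = ["npa-0", "npa-1"]
    print(find_orphans(tenant, pods, "npa-prod"))  # ['npa-prod-npa-2']
```

Publishers reported here still need their app attachments removed before a delete can succeed.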
Opt-in: auto-delete on scale-down
If you’re certain auto-scaled replicas never carry app
assignments (e.g. you only attach apps to a fixed baseline
Publisher and let the scaled replicas inherit traffic via DTLS
load-balancing only), you can opt in:
enrollment:
The pod’s preStop hook then fires on termination:
# Inside the pod, at termination time:
The hook is best-effort. It exits 0 on any failure (including
the “Publisher has apps attached” rejection) so the pod terminates
promptly. If you turn this on and your assumption later changes,
the worst outcome is silent orphans rather than blocked pods.
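The best-effort contract can be sketched as follows. This is an illustration in Python rather than the chart's actual hook script. Here prestop_cleanup and the delete_publisher callable are hypothetical names standing in for whatever issues the DELETE against /api/v2/infrastructure/publishers/{id}.

```python
def prestop_cleanup(publisher_id, delete_publisher, log=print):
    """Attempt to delete the Publisher record, but never block termination.

    delete_publisher is a hypothetical callable that raises on any failure,
    including the "Publisher has apps attached" rejection.
    Returns the exit code the hook would use, which is always 0.
    """
    try:
        delete_publisher(publisher_id)
        log(f"deleted publisher {publisher_id}")
    except Exception as exc:  # deliberately broad: any failure is tolerated
        log(f"cleanup skipped for publisher {publisher_id}: {exc}")
    return 0
```

Because the failure is only logged, the one trace of an orphaned record is that log line, so it is worth shipping the hook output somewhere durable.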
Why CPU and not tunnel-count
CPU is what’s available without extra infrastructure. The Publisher
does report active SNAT connection counts (num_snat_conns)
internally, but it doesn’t expose them as Prometheus metrics — they
go up to the Netskope stitcher control plane instead. Wiring those
into HPA would require a sidecar that reads the internal
publisher_metrics file and serves it in Prometheus format, plus
prometheus-adapter or KEDA in the cluster. That’s tracked on the
roadmap.
In practice, CPU tracks tunnel count well enough: more active
sessions → more packet processing → more CPU. Tune
targetCPUUtilizationPercentage based on observed load.
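To get a feel for how the target value changes replica counts, the sketch below reproduces the replica formula from the Kubernetes HPA documentation: desired = ceil(current × currentUtilization / targetUtilization), skipped when the ratio is within the default 10% tolerance. The function name, argument names, and the min/max bounds are illustrative, not chart values. Utilization is measured relative to the pods' CPU requests.

```python
import math


def desired_replicas(current_replicas, current_cpu_pct, target_cpu_pct,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Approximate the HPA replica calculation for a CPU utilization target.

    min_replicas and max_replicas are illustrative defaults, not chart values.
    """
    ratio = current_cpu_pct / target_cpu_pct
    # Within tolerance of the target: HPA leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    print(desired_replicas(3, 90, 60))  # 5: 3 * 1.5 = 4.5, rounded up
    print(desired_replicas(3, 63, 60))  # 3: ratio 1.05 is within tolerance
    print(desired_replicas(4, 20, 60))  # 2: 4 * 0.33 = 1.33, rounded up
```

A lower target scales out earlier at the cost of more idle replicas, each of which is a separate Publisher identity in the tenant.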
Tuning scale policies
The HPA behavior block (Kubernetes autoscaling/v2 spec)
lets you slow down scale-up or scale-down. This is useful because
Publisher enrollment takes ~30–60 seconds, so a new replica does
not absorb load immediately; you may want to delay scale-up
reactions to avoid flapping:
hpa:
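As a rough illustration of what the stabilization windows in the behavior block do, the sketch below models the selection rule described in the Kubernetes HPA documentation. A scale-up is limited to the lowest recommendation seen inside the scale-up window, and a scale-down to the highest recommendation seen inside the scale-down window. The function and argument names are hypothetical, and rate policies (pods or percent per period) are not modelled.

```python
def stabilized_replicas(current, latest, history, now,
                        up_window_s, down_window_s):
    """Pick the replica count after applying stabilization windows.

    current: replicas running now
    latest:  recommendation computed in this HPA cycle
    history: list of (timestamp_s, recommendation) from earlier cycles
    """
    up_candidates = [latest] + [r for t, r in history if now - t < up_window_s]
    down_candidates = [latest] + [r for t, r in history if now - t < down_window_s]
    up_limit = min(up_candidates)      # scale up only as far as every recent cycle agreed
    down_limit = max(down_candidates)  # scale down only as far as every recent cycle agreed
    if current < up_limit:
        return up_limit
    if current > down_limit:
        return down_limit
    return current


if __name__ == "__main__":
    history = [(0, 3), (30, 3), (45, 5)]
    # A 60 s scale-up window absorbs a spike that started 15 s ago.
    print(stabilized_replicas(3, 5, history, now=60, up_window_s=60, down_window_s=300))  # 3
    # With no scale-up window (the Kubernetes default) the HPA reacts at once.
    print(stabilized_replicas(3, 5, history, now=60, up_window_s=0, down_window_s=300))   # 5
```

A scale-up window of roughly the enrollment time keeps a short CPU spike from adding replicas that would only connect after the spike has passed.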