Provisioning compute

In this section we will configure Karpenter to allow the creation of Inferentia and Trainium EC2 instances. When Karpenter detects pending Pods that require an inf2 or trn1 instance, it launches a suitable instance so that the Pods can be scheduled.

Tip: You can learn more about Karpenter in the Karpenter module provided in this workshop.

Karpenter has been installed in our EKS cluster, and runs as a Deployment:

~$ kubectl get deployment -n kube-system
NAME        READY   UP-TO-DATE   AVAILABLE   AGE
...
karpenter   2/2     2            2           11m

Karpenter requires a NodePool to provision nodes. This is the NodePool, together with the EC2NodeClass it references, that we will create:

~/environment/eks-workshop/modules/aiml/inferentia/nodepool/nodepool.yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: aiml
spec:
  template:
    metadata:
      labels:
        instanceType: "neuron"
        provisionerType: "karpenter"
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values:
            - on-demand
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values:
            - inf2
            - trn1
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: aiml
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: aiml
spec:
  amiFamily: AL2023
  amiSelectorTerms:
    - alias: al2023@latest
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        deleteOnTermination: true
        volumeSize: 100Gi
        volumeType: gp3
        iops: 16000
        throughput: 1000
  role: ${KARPENTER_NODE_ROLE}
  userData: |
    MIME-Version: 1.0
    Content-Type: multipart/mixed; boundary="//"

    --//
    Content-Type: text/x-shellscript; charset="us-ascii"

    #!/bin/bash
    sed -i "s/^max_concurrent_downloads_per_image = .*$/max_concurrent_downloads_per_image = 10/" /etc/soci-snapshotter-grpc/config.toml
    sed -i "s/^max_concurrent_unpacks_per_image = .*$/max_concurrent_unpacks_per_image = 10/" /etc/soci-snapshotter-grpc/config.toml

    --//
    Content-Type: application/node.eks.aws

    apiVersion: node.eks.aws/v1alpha1
    kind: NodeConfig
    spec:
      featureGates:
        FastImagePull: true
    --//
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: ${EKS_CLUSTER_NAME}
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: ${EKS_CLUSTER_NAME}
  tags:
    app.kubernetes.io/created-by: eks-workshop
The requirements section of the NodePool defines which instances it is allowed to provision for us.

Within it, the karpenter.k8s.aws/instance-family requirement restricts this NodePool to inf2 and trn1 instances.
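To illustrate how the requirements list shapes what Karpenter may launch, here is a sketch of a narrower variant. It is not part of the workshop manifest. It assumes you wanted to pin the NodePool to specific instance types rather than whole families, using the well-known node.kubernetes.io/instance-type label; the two sizes listed are arbitrary examples.

```yaml
# Illustrative fragment only; the workshop manifest selects by instance family instead.
requirements:
  - key: karpenter.sh/capacity-type
    operator: In
    values:
      - on-demand
  # Assumption: restrict by exact instance type rather than by family.
  - key: node.kubernetes.io/instance-type
    operator: In
    values:
      - inf2.xlarge    # example size, chosen for illustration
      - trn1.2xlarge   # example size, chosen for illustration
```

Selecting by family, as the workshop does, leaves Karpenter free to choose any size within inf2 and trn1 that fits the pending Pods.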

Apply the NodePool and EC2NodeClass manifest:

~$ kubectl kustomize ~/environment/eks-workshop/modules/aiml/inferentia/nodepool \
  | envsubst | kubectl apply -f-

Now the NodePool is ready for the creation of our training and inference Pods.
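As a rough sketch of the kind of Pod this NodePool is meant to serve, the manifest below selects the labels defined in the NodePool template and requests one Neuron device. The Pod name, container name, image, and command are placeholders invented for illustration. The aws.amazon.com/neuron resource is the one advertised by the Neuron device plugin, which this sketch assumes is installed in the cluster.

```yaml
# Hypothetical Pod for illustration; name, image, and command are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: neuron-example
spec:
  # These labels match the ones set in the aiml NodePool template.
  nodeSelector:
    instanceType: neuron
    provisionerType: karpenter
  containers:
    - name: app
      image: public.ecr.aws/docker/library/busybox:stable  # placeholder image
      command: ["sleep", "3600"]
      resources:
        limits:
          # Assumes the Neuron device plugin is installed so nodes advertise this resource.
          aws.amazon.com/neuron: 1
```

While a Pod like this is pending, Karpenter would be expected to match it against the aiml NodePool and launch an inf2 or trn1 instance that can satisfy the request.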