Configuring the chat bot
The sample retail application includes a built-in chat interface that allows customers to interact with the store using natural language. The feature can help customers find products, get recommendations, and get answers to questions about store policies. In this module, we'll configure the chat component to use our Mistral-7B model served through vLLM.
Let's reconfigure the UI component to enable the chat bot functionality and point it to our vLLM endpoint:
First, the Kustomize configuration that applies the patch:

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../../../base-application/ui
patches:
  - path: deployment.yaml
```
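If you want to inspect what the patch produces before applying anything, kustomize can render the manifests locally; the path here is hypothetical and should point at the directory containing this `kustomization.yaml`:

```bash
# Render the patched manifests without applying them (path is hypothetical)
kubectl kustomize ~/environment/eks-workshop/modules/aiml/chatbot/ui
```

The rendered output should match the `Deployment/ui` manifest shown next.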
The full rendered `Deployment/ui`:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/created-by: eks-workshop
    app.kubernetes.io/type: app
  name: ui
  namespace: ui
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: service
      app.kubernetes.io/instance: ui
      app.kubernetes.io/name: ui
  template:
    metadata:
      annotations:
        prometheus.io/path: /actuator/prometheus
        prometheus.io/port: "8080"
        prometheus.io/scrape: "true"
      labels:
        app.kubernetes.io/component: service
        app.kubernetes.io/created-by: eks-workshop
        app.kubernetes.io/instance: ui
        app.kubernetes.io/name: ui
    spec:
      containers:
        - env:
            - name: RETAIL_UI_CHAT_ENABLED
              value: "true"
            - name: RETAIL_UI_CHAT_PROVIDER
              value: openai
            - name: RETAIL_UI_CHAT_MODEL
              value: /models/mistral-7b-v0.3
            - name: RETAIL_UI_CHAT_OPENAI_BASE_URL
              value: http://mistral.vllm:8080
            - name: JAVA_OPTS
              value: -XX:MaxRAMPercentage=75.0 -Djava.security.egd=file:/dev/urandom
            - name: METADATA_KUBERNETES_POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: METADATA_KUBERNETES_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: METADATA_KUBERNETES_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          envFrom:
            - configMapRef:
                name: ui
          image: public.ecr.aws/aws-containers/retail-store-sample-ui:1.2.1
          imagePullPolicy: IfNotPresent
          livenessProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8080
            initialDelaySeconds: 45
            periodSeconds: 20
          name: ui
          ports:
            - containerPort: 8080
              name: http
              protocol: TCP
          resources:
            limits:
              memory: 1.5Gi
            requests:
              cpu: 250m
              memory: 1.5Gi
          securityContext:
            capabilities:
              add:
                - NET_BIND_SERVICE
              drop:
                - ALL
            readOnlyRootFilesystem: true
            runAsNonRoot: true
            runAsUser: 1000
          volumeMounts:
            - mountPath: /tmp
              name: tmp-volume
      securityContext:
        fsGroup: 1000
      serviceAccountName: ui
      volumes:
        - emptyDir:
            medium: Memory
          name: tmp-volume
```
And the diff, showing the environment variables the patch adds:

```diff
         app.kubernetes.io/name: ui
     spec:
       containers:
         - env:
+            - name: RETAIL_UI_CHAT_ENABLED
+              value: "true"
+            - name: RETAIL_UI_CHAT_PROVIDER
+              value: openai
+            - name: RETAIL_UI_CHAT_MODEL
+              value: /models/mistral-7b-v0.3
+            - name: RETAIL_UI_CHAT_OPENAI_BASE_URL
+              value: http://mistral.vllm:8080
             - name: JAVA_OPTS
               value: -XX:MaxRAMPercentage=75.0 -Djava.security.egd=file:/dev/urandom
             - name: METADATA_KUBERNETES_POD_NAME
               valueFrom:
```
This configuration makes the following important changes:
- Enables the chat bot component in the UI
- Configures the application to use the OpenAI provider, which works with vLLM's OpenAI-compatible API
- Specifies the model name, which the OpenAI-compatible endpoint requires in each request
- Sets the endpoint URL to `http://mistral.vllm:8080`, connecting to our Kubernetes Service for the vLLM Deployment (you can verify this endpoint directly, as shown below)
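Before testing through the UI, it can be worth confirming that the endpoint speaks the OpenAI protocol. The sketch below sends a minimal chat completion request from a throwaway pod inside the cluster; the pod name and curl image are arbitrary choices, while the URL and model name come from the configuration above.

```bash
# Launch a temporary pod with curl and send a minimal OpenAI-style chat request
kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl -- \
  curl -s http://mistral.vllm:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "/models/mistral-7b-v0.3",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 50
  }'
```

A `GET` to `/v1/models` on the same base URL lists the model names vLLM is serving, which is useful if the `model` value is in doubt.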
Let's apply these changes to our running application:
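This is a standard `kubectl apply -k` against the directory containing the kustomization above; the path below is hypothetical and depends on where the workshop files are checked out in your environment. You should see output similar to the following:

```bash
# Path is hypothetical: point -k at the directory holding the kustomization.yaml above
kubectl apply -k ~/environment/eks-workshop/modules/aiml/chatbot/ui
```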
```text
namespace/ui unchanged
serviceaccount/ui unchanged
configmap/ui unchanged
service/ui unchanged
deployment.apps/ui configured
```
With these changes applied, the UI will now display a chat interface that connects to our locally deployed language model. In the next section, we'll test this configuration to see our AI-powered chat bot in action.
While the UI is now configured to use the vLLM endpoint, the model needs to be fully loaded before it can respond to requests. If you encounter any delays or errors when testing, this may be because the model is still being initialized.
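One way to check is to look at the vLLM pods and probe the API directly. This sketch assumes the vLLM workload runs in the `vllm` namespace, which the Service DNS name `mistral.vllm` implies:

```bash
# Check that the vLLM pods are Running and Ready (namespace inferred from the Service DNS name)
kubectl get pods -n vllm

# The OpenAI-compatible server typically starts answering only once the model
# weights are loaded, so a successful response here is a good readiness signal
kubectl run curl-check --rm -it --restart=Never --image=curlimages/curl -- \
  curl -s http://mistral.vllm:8080/v1/models
```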