Running Watchful Scale in Production

Overview

The Watchful product is a set of containerized applications designed to work together so that a team of annotators can label data quickly and efficiently. The product consists of two parts: the application (APP) and the hub (HUB). Both components are distributed as Docker images stored in AWS ECR and can be run using the container orchestration tool of your choice.

Versioning

We follow the SemVer versioning system using the <major>.<minor>.<patch> pattern. All updates will be announced with Release Notes to let you know what has changed. Major and minor updates will also contain Upgrade notes, if necessary.

Pulling the Images

Component | URL
App | 610410161133.dkr.ecr.us-west-1.amazonaws.com/production/watchful:{TAG}
Hub | 610410161133.dkr.ecr.us-west-1.amazonaws.com/production/watchful-hub:{TAG}

We do not offer a latest tag on the images. You must specify the major, minor, and patch versions when pulling images.

This guide assumes you have downloaded and configured the AWS CLI and Docker.

## Login
aws ecr get-login-password --region us-west-1 | \
    docker login \
    --username AWS \
    --password-stdin \
    610410161133.dkr.ecr.us-west-1.amazonaws.com

## Pull the image
docker pull \
  610410161133.dkr.ecr.us-west-1.amazonaws.com/production/watchful-hub:3.0.1234
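
Because there is no latest tag, a small helper can catch an incomplete tag before it reaches docker pull. The following is a minimal sketch, not part of the Watchful tooling: the function name watchful_image is hypothetical, and it assumes tags are purely numeric <major>.<minor>.<patch> values as in the examples above.

```shell
#!/bin/sh
# Registry host and repository names are taken from the image table above.
REGISTRY="610410161133.dkr.ecr.us-west-1.amazonaws.com"

# watchful_image COMPONENT TAG   (hypothetical helper)
# Prints the full image reference, or fails if the tag is not a
# complete <major>.<minor>.<patch> version.
watchful_image() {
    component=$1
    tag=$2
    case $component in
        app) repo="production/watchful" ;;
        hub) repo="production/watchful-hub" ;;
        *)   echo "unknown component: $component" >&2; return 1 ;;
    esac
    # Require exactly three dot-separated numeric fields.
    if ! printf '%s\n' "$tag" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$'; then
        echo "tag must be <major>.<minor>.<patch>, got: $tag" >&2
        return 1
    fi
    printf '%s/%s:%s\n' "$REGISTRY" "$repo" "$tag"
}
```

For example, docker pull "$(watchful_image hub 3.0.1234)" pulls the Hub image, while watchful_image hub latest fails with an error instead of producing a reference that does not exist.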

Updates

The Watchful product is designed to run on air-gapped systems as well as those connected to the outside world. When we publish a new version, the Customer Support team will notify you. You can also check the Product Release Notes at any time for the latest version and its contents.

To adopt a new version, update the tag you use to pull the images from ECR. We will never push updates to your system.

Recommendations

When updating the App or Hub, we recommend taking the following steps:

  1. Inform users of an impending update and ask them to stop any active sessions
  2. Back up the persistent volume storage attached to each instance
  3. Update your manifest files to point to the new tagged version
  4. Restart the running containers
  5. Verify persistent volumes are re-attached as necessary
  6. Inform users they can restart their sessions (they will need to re-authenticate and then reload projects)
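
The backup in step 2 can be as simple as archiving the host directory that backs the volume. The following is a minimal sketch under that assumption (a bind-mounted host directory, as in the Docker examples in this guide); the function name backup_volume and both directory arguments are hypothetical. Volumes managed by an orchestrator will need that platform's own snapshot mechanism instead.

```shell
#!/bin/sh
# backup_volume SRC_DIR DEST_DIR   (hypothetical helper)
# Archives SRC_DIR into DEST_DIR as a timestamped .tar.gz and prints
# the path of the archive it created.
backup_volume() {
    src=$1
    dest=$2
    [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
    mkdir -p "$dest" || return 1
    archive="$dest/$(basename "$src")-$(date +%Y%m%d-%H%M%S).tar.gz"
    # -C keeps paths in the archive relative to the parent of SRC_DIR.
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    printf '%s\n' "$archive"
}
```

A typical call would be backup_volume ~/watchful ~/backups, run after the container has been stopped so that no project files change while the archive is written.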

Logs

Logs for both the App and the Hub are shipped to STDERR. You can change the log level at runtime for both the App and the Hub by setting the WATCHFUL_LOG environment variable (described later in this document).

Running the App

The App is the core of the Watchful product and contains all the components necessary for an individual to complete a labeling task. The App can be run in “standalone” mode and does not require the Hub. Without the Hub, however, sharing projects will not be available.

The App is a self-contained application written in Rust containing a server, the core functionality, and a frontend that are all compiled and distributed as a single binary within the Docker image.

Requirements

  1. Each unique Watchful user requires a dedicated App instance. Do not put App instances behind a load balancer or otherwise share one instance among different people in concurrent sessions.
  2. The Watchful App works by keeping project state in memory and by writing project files to the local file system. Ensure app instances are durable (NOT ephemeral) and are backed by persistent storage.
  3. For general workloads, App instances require a minimum of four (4) dedicated CPU cores. To determine memory requirements, look at the largest data set you’ll label and allocate 4x that amount in RAM. For a 200 MB project, you’ll need at least 800 MB of RAM per instance.
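
The memory rule in requirement 3 is a single multiplication. The sketch below only restates that arithmetic; the function name required_ram_mb is hypothetical, and sizes are assumed to be whole megabytes.

```shell
#!/bin/sh
# required_ram_mb DATASET_MB   (hypothetical helper)
# Prints the minimum RAM in MB for one App instance, which is
# 4x the size of the largest data set to be labeled.
required_ram_mb() {
    dataset_mb=$1
    case $dataset_mb in
        # Reject empty input, non-digits, and leading zeros (which the
        # shell would otherwise read as octal).
        ''|*[!0-9]*|0?*)
            echo "data set size must be a whole number of MB" >&2
            return 1 ;;
    esac
    echo $((dataset_mb * 4))
}
```

Calling required_ram_mb 200 prints 800, matching the example in the requirement.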

Docker

Running the App via Docker is a great way to work on a dataset locally. To run the App:

## Login
aws ecr get-login-password --region us-west-1 | \
    docker login \
        --username AWS \
        --password-stdin \
        610410161133.dkr.ecr.us-west-1.amazonaws.com

## Run the Container
docker run \
  --name watchful \
  -d -p 9001:9001 \
  -v ~/watchful:/root/watchful \
  -e WATCHFUL_LOG=info \
  610410161133.dkr.ecr.us-west-1.amazonaws.com/production/watchful:3.0.1234

The following environment variables are available when running the App:

Name | Required | Key | Description
Log level | N | WATCHFUL_LOG | Set the log level you desire. Available options are info, trace, error, warn, and debug. The default is info.
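
A launch script can check WATCHFUL_LOG against the documented options before starting a container. This is a minimal sketch; the function names are hypothetical, and it assumes the values are accepted only in lowercase exactly as listed, which this guide does not state.

```shell
#!/bin/sh
# valid_log_level LEVEL   (hypothetical helper)
# Succeeds only for the levels listed in the table above.
valid_log_level() {
    case $1 in
        trace|debug|info|warn|error) return 0 ;;
        *) return 1 ;;
    esac
}

# resolve_log_level [LEVEL]   (hypothetical helper)
# Prints LEVEL, falling back to the documented default (info) when
# LEVEL is empty. Fails on any value outside the documented set.
resolve_log_level() {
    level=${1:-info}
    if valid_log_level "$level"; then
        printf '%s\n' "$level"
    else
        echo "unsupported WATCHFUL_LOG value: $level" >&2
        return 1
    fi
}
```

The result can be passed straight to the container, for example with -e WATCHFUL_LOG="$(resolve_log_level "$WATCHFUL_LOG")" in the docker run command.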

Running the Hub

What is Watchful Hub?

The Hub is responsible for facilitating project collaboration, user management, and sharing. It is a self-contained application written in Rust containing a server, the core functionality, and a frontend that are all compiled and distributed as a single binary within the Docker image.

Requirements

  1. The Watchful Hub requires a durable file system for project storage. Additionally, it uses a SQLite database to track users and permissions. Ensure the attached volume will persist after the container restarts.
  2. The Hub is designed to be stateless and can handle restarts. However, like the App, do not run the Hub behind a load balancer, as both the database and file system access need to be dedicated to the single running Hub instance.
  3. For general workloads, Hub instances require a single dedicated CPU core and 1 GB of RAM. Provision enough disk storage to handle 4x the total volume of projects you’ll run across all App instances.

Docker

Running the Hub via Docker is fairly straightforward, though the Hub provides little functionality on its own. To run the Hub via Docker:

## Login
aws ecr get-login-password --region us-west-1 | \
    docker login \
    --username AWS \
    --password-stdin \
    610410161133.dkr.ecr.us-west-1.amazonaws.com

## Run the container
docker run \
  --name watchful-hub \
  -d -p 9005:9005 \
  -v ~/hub:/root \
  -e DATABASE_URL=/root/watchful.db \
  -e SHARED_SECRET=$(openssl rand -hex 36) \
  -e AWS_ACCESS_KEY_ID=OMITTED \
  -e AWS_SECRET_ACCESS_KEY=OMITTED \
  -e AWS_REGION=OMITTED \
  -e AWS_CUSTOMER_BUCKET=OMITTED \
  -e WATCHFUL_LOG=info \
  610410161133.dkr.ecr.us-west-1.amazonaws.com/production/watchful-hub:3.0.1234

Note:

When running Watchful within Azure, pay attention to the mount options recommended for Azure Kubernetes. Specifically, use the nobrl mount option. Failure to do so can result in a "database is locked" error, caused by a Windows Explorer process sending a byte-range lock to the SQLite database file.

The following environment variables are available when running the Hub:

Name | Required | Key | Description
Database URL | Y | DATABASE_URL | Path to the SQLite database. This must be a path on the mounted, durable storage volume or you will lose data every time you update the Hub.
Shared Secret | Y | SHARED_SECRET | A seed value used in authenticating and authorizing users. Any sufficiently long random string works. $(openssl rand -hex 36) will generate a secure random string for you.
AWS Access Key | N | AWS_ACCESS_KEY_ID | Optional. Set this value if you want to use AWS S3 Buckets for data import + export.
AWS secret key | N | AWS_SECRET_ACCESS_KEY | Optional. Set this value if you want to use AWS S3 Buckets for data import + export.
AWS storage region | N | AWS_REGION | Optional. Set this value if you want to use AWS S3 Buckets for data import + export.
AWS customer bucket | N | AWS_CUSTOMER_BUCKET | Optional. Set this value if you want to use AWS S3 Buckets for data import + export.
Log level | N | WATCHFUL_LOG | Set the log level you desire. Available options are info, trace, error, warn, and debug. The default is info.
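
Generating SHARED_SECRET inline, as the docker run example does, yields a different value on every invocation. If the value should stay the same across restarts (an assumption; this guide does not say how the Hub reacts to a changed seed), it can be generated once and stored in a file. The sketch below shows one way to do that; the function name ensure_shared_secret and the file location are hypothetical.

```shell
#!/bin/sh
# ensure_shared_secret FILE   (hypothetical helper)
# Prints the secret stored in FILE, generating one first if FILE is
# missing or empty. FILE is a location chosen by the operator.
ensure_shared_secret() {
    file=$1
    if [ ! -s "$file" ]; then
        (
            # Create the file readable by its owner only.
            umask 077
            if command -v openssl >/dev/null 2>&1; then
                # Same command the table above recommends: 36 random
                # bytes printed as 72 hex characters.
                openssl rand -hex 36
            else
                # Fallback when openssl is not installed.
                od -An -tx1 -N36 /dev/urandom | tr -d ' \n'
                echo
            fi > "$file"
        ) || return 1
    fi
    cat "$file"
}
```

The stored value can then be supplied with -e SHARED_SECRET="$(ensure_shared_secret ~/hub-secret)". Keep the file outside any directory that is shared with users.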

Common deployment scenarios

The following sections show how to run an App and Hub instance in both Docker Compose and Kubernetes.

Docker Compose

version: '3'

services:
  hub:
    image: 610410161133.dkr.ecr.us-west-1.amazonaws.com/production/watchful-hub:3.0.1234
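    ports:
      # Publish the Hub on the same host port used in the docker run example
      - "9005:9005"
    volumes:
      # Mount over /root so the database at DATABASE_URL lives on durable storage
      - ~/hub/:/root/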
    environment:
      - CUSTOMER_ID=thisisanid
      - WATCHFUL_KEY=thisisakey
      - WATCHFUL_SECRET=thisisasecret
      - SHARED_SECRET=thisisnotasecurenorrandomsecret
      - DATABASE_URL=/root/watchful.db
      - AUTH_CONF=/app/auth/auth_model.conf
  app:
    image: 610410161133.dkr.ecr.us-west-1.amazonaws.com/production/watchful:3.0.1234
    ports:
      - "9001:9001"
    volumes:
      - ~/watchful/:/root/watchful/

Kubernetes

apiVersion: v1
kind: Namespace
metadata:
  labels:
    app: watchful
  name: watchful
---
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app: watchful
  name: watchful-secret-robot
  namespace: watchful
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  labels:
    app: watchful
  name: role-watchful-secret-robot
  namespace: watchful
rules:
- apiGroups:
  - ""
  resources:
  - secrets
  verbs:
  - create
  - delete
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  labels:
    app: watchful
  name: role-watchful-secret-robot-binding
  namespace: watchful
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: role-watchful-secret-robot
subjects:
- kind: ServiceAccount
  name: watchful-secret-robot
  namespace: watchful
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: watchful
  name: watchful-app
  namespace: watchful
spec:
  ports:
  - name: app-port
    port: 9001
  selector:
    # The component label keeps this Service from also selecting the Hub pod
    app: watchful
    component: app
  type: LoadBalancer
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: watchful
  name: watchful-hub
  namespace: watchful
spec:
  ports:
  - name: hub-port
    port: 9005
  selector:
    # The component label keeps this Service from also selecting the App pod
    app: watchful
    component: hub
  type: LoadBalancer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  labels:
    app: watchful
  name: watchful-app-pvc
  namespace: watchful
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  labels:
    app: watchful
  name: watchful-hub-pvc
  namespace: watchful
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: watchful
  name: watchful-app
  namespace: watchful
spec:
  replicas: 1
  selector:
    matchLabels:
      app: watchful
      component: app
  template:
    metadata:
      labels:
        app: watchful
        component: app
    spec:
      automountServiceAccountToken: false
      # Registry secret created by the ecr-cred-helper CronJob in this manifest
      imagePullSecrets:
      - name: watchful-us-west-1-ecr-registry
      containers:
      - image: 610410161133.dkr.ecr.us-west-1.amazonaws.com/production/watchful:3.0.1234
        imagePullPolicy: Always
        name: watchful-app
        ports:
        - containerPort: 9001
        volumeMounts:
        - mountPath: /root/watchful
          name: watchful-app-storage
      volumes:
      - name: watchful-app-storage
        persistentVolumeClaim:
          claimName: watchful-app-pvc
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: watchful
  name: watchful-hub
  namespace: watchful
spec:
  replicas: 1
  selector:
    matchLabels:
      app: watchful
      component: hub
  template:
    metadata:
      labels:
        app: watchful
        component: hub
    spec:
      automountServiceAccountToken: false
      # Registry secret created by the ecr-cred-helper CronJob in this manifest
      imagePullSecrets:
      - name: watchful-us-west-1-ecr-registry
      containers:
      - env:
        - name: DATABASE_URL
          value: /root/watchful.db
        - name: SHARED_SECRET
          value: <insert_value>
        - name: AWS_CUSTOMER_BUCKET
          value: null
        - name: AWS_ACCESS_KEY_ID
          value: null
        - name: AWS_SECRET_ACCESS_KEY
          value: null
        - name: AWS_REGION
          value: us-west-1
        image: 610410161133.dkr.ecr.us-west-1.amazonaws.com/production/watchful-hub:3.0.1234
        imagePullPolicy: Always
        livenessProbe:
          httpGet:
            path: /ruok
            port: 9005
          periodSeconds: 5
        name: watchful-hub
        ports:
        - containerPort: 9005
        volumeMounts:
        - mountPath: /root
          name: watchful-hub-storage
      volumes:
      - name: watchful-hub-storage
        persistentVolumeClaim:
          claimName: watchful-hub-pvc
---
apiVersion: batch/v1
kind: CronJob
metadata:
  labels:
    app: watchful
  name: ecr-cred-helper
  namespace: watchful
spec:
  concurrencyPolicy: Allow
  failedJobsHistoryLimit: 1
  jobTemplate:
    metadata:
      creationTimestamp: null
      labels:
        app: watchful
    spec:
      template:
        metadata:
          creationTimestamp: null
          labels:
            app: watchful
        spec:
          containers:
          - command:
            - /bin/sh
            - -c
            - |-
              SECRET_NAME=watchful-${AWS_REGION}-ecr-registry
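              EMAIL=user@example.com  # placeholder; substitute your own address
              # "aws ecr get-login" exists only in AWS CLI v1; it was removed in v2.
              TOKEN=`aws ecr get-login --region ${AWS_REGION} --registry-ids ${ACCOUNT_ID} | cut -d' ' -f6`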
              echo "ENV variable setup done"
              kubectl delete secret --ignore-not-found $SECRET_NAME
              kubectl create secret docker-registry $SECRET_NAME \
              --docker-server=https://${ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com \
              --docker-username=AWS \
              --docker-password="${TOKEN}" \
              --docker-email="${EMAIL}"
              echo "Secret created with name: $SECRET_NAME"
              echo "Finished"
            env:
            - name: ACCOUNT_ID
              value: "610410161133"
            - name: AWS_REGION
              value: us-west-1
            - name: AWS_ACCESS_KEY_ID
              value: < Watchful, Inc. will provide this value to you >
            - name: AWS_SECRET_ACCESS_KEY
              value: < Watchful, Inc. will provide this value to you >

            image: odaniait/aws-kubectl:latest
            imagePullPolicy: IfNotPresent
            name: ecr-cred-helper
            resources: {}
            securityContext:
              capabilities: {}
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
          dnsPolicy: Default
          hostNetwork: true
          restartPolicy: Never
          schedulerName: default-scheduler
          securityContext: {}
          serviceAccountName: watchful-secret-robot
          terminationGracePeriodSeconds: 30
  schedule: 0 */6 * * *
  successfulJobsHistoryLimit: 3
  suspend: false
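
The liveness probe in the Hub Deployment polls /ruok on port 9005. The same endpoint can be used from a script to wait for the Hub after a restart. The following is a minimal sketch, assuming curl is installed and that /ruok answers with a non-error status when the Hub is healthy (implied by its use as an httpGet probe). The function names and the localhost address are hypothetical; substitute the address of your Hub Service.

```shell
#!/bin/sh
# probe URL   (hypothetical helper)
# Succeeds when URL answers with an HTTP status below 400.
# Kept as a separate function so another HTTP client can be swapped in.
probe() {
    curl -fsS -o /dev/null --max-time 2 "$1"
}

# wait_for_hub URL [ATTEMPTS] [DELAY_SECONDS]   (hypothetical helper)
# Polls URL until probe succeeds or ATTEMPTS is exhausted.
wait_for_hub() {
    url=$1
    attempts=${2:-30}
    delay=${3:-1}
    i=1
    while [ "$i" -le "$attempts" ]; do
        if probe "$url" 2>/dev/null; then
            echo "hub is healthy after $i attempt(s)"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "hub did not become healthy after $attempts attempts" >&2
    return 1
}
```

For example, wait_for_hub http://localhost:9005/ruok 60 2 polls every two seconds for up to two minutes, which fits between restarting the containers and telling users they can resume their sessions.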