Prerequisites
Info
Please run only one container instance per machine; running multiple containers results in vastly reduced performance.
The following prerequisites are required to run the container:
- Container engine, such as Docker (can be installed using the official instructions)
- (GPU only) Nvidia Container Toolkit with Nvidia driver version 515 or higher (can be installed using the installation guide)
All other dependencies, such as CUDA, are included with the container and don’t need to be installed separately.
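The driver-version requirement above can be checked mechanically. The sketch below is a hypothetical POSIX-shell helper, not part of the official tooling; it assumes a version string in the form reported by `nvidia-smi --query-gpu=driver_version --format=csv,noheader` (e.g. `535.129.03`):

```shell
# Hypothetical helper: check that the installed Nvidia driver meets the
# minimum major version (515) required by the GPU container.
check_driver_version() {
  # $1 is a driver version string such as "535.129.03"
  major="${1%%.*}"                 # keep only the major component
  if [ "$major" -ge 515 ]; then
    echo "driver OK ($1)"
  else
    echo "driver too old ($1); need 515 or higher"
  fi
}

check_driver_version "535.129.03"
check_driver_version "470.82.01"
```

A failing check means the host driver must be upgraded before the GPU container will start.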
System Requirements
- Docker and `docker-compose` installed.
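A quick way to confirm both tools are on the `PATH` is sketched below; the helper name is hypothetical, and note that recent Docker installs ship compose as the `docker compose` plugin rather than a standalone `docker-compose` binary:

```shell
# Hypothetical quick check that a required tool is installed and on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1 found"
  else
    echo "$1 missing"
  fi
}

check_tool docker
check_tool docker-compose
```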
The image comes in two build flavours:
- A compact, CPU-only container that runs on any Intel or AMD CPU. The CPU container is highly optimised for the majority of use cases: it uses hand-coded AMX/AVX2/AVX512/AVX512 VNNI instructions in conjunction with neural-network compression techniques to deliver a ~25x speedup over a reference transformer-based system.
- A GPU-accelerated container designed for large-scale deployments making billions of API calls or processing terabytes of data per month.
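For the GPU flavour, the GPU must be passed through to the container. A minimal `docker-compose` sketch is shown below; the image name is hypothetical (the real registry path is deployment-specific), but the `deploy.resources.reservations.devices` stanza is the standard Compose way to reserve an Nvidia GPU:

```yaml
services:
  cobalt:
    image: example.com/cobalt:gpu   # hypothetical image name; substitute your registry path
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1              # one GPU for the single container instance
              capabilities: [gpu]
```

This requires the Nvidia Container Toolkit listed in the prerequisites; without it the device reservation will fail at startup.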
Minimum Requirements
The minimum system requirements for the container image are as follows:
| | Minimum | Recommended (Text only) | Recommended (All Features) | Recommended Concurrency |
|---|---|---|---|---|
| CPU | Any x86 (Intel or AMD) processor with 7.5GB free RAM and 50GB disk volume | Intel Sapphire Rapids or newer CPUs supporting AMX with 16GB RAM and 50GB disk volume | Intel Sapphire Rapids or newer CPUs supporting AMX with 64GB RAM and 100GB disk volume | 1 |
| GPU | Any x86 (Intel or AMD) processor with 28GB free RAM. Nvidia GPU with compute capability 7.0 or higher (Volta or newer) and at least 16GB VRAM. 100GB disk volume | Any x86 (Intel or AMD) processor with 32GB RAM and Nvidia Tesla T4 GPU. 100GB disk volume | Any x86 (Intel or AMD) processor with 64GB RAM and Nvidia Tesla T4 GPU. 100GB disk volume | 32 |
Recommended Requirements
While the CPU-based container will run on any x86-compatible instance, the cloud instance types below give optimal throughput and latency per dollar:
| Platform | Recommended Instance Type (Text only) | Recommended Instance Type (All Features) |
|---|---|---|
| Azure | Standard_E2_v5 (2 vCPUs, 16GB RAM) | Standard_E8_v5 (8 vCPUs, 64GB RAM) |
| AWS | m7i.large (2 vCPUs, 8GB RAM) | m7i.4xlarge (16 vCPUs, 64GB RAM) |
| GCP | n2-standard-2 (2 vCPUs, 8GB RAM) | n2-standard-16 (16 vCPUs, 64GB RAM) |
- If lower latency is required, scale up the instance type; e.g. use an m7i.xlarge in place of an m7i.large. While the Cobalt Docker solution can make use of all available CPU cores, it delivers the best throughput per dollar on a single-CPU-core machine; scaling up CPU cores does not result in a linear increase in performance.
Similarly, for the GPU-based image the following Nvidia T4 GPU-equipped instance types are recommended:
| Platform | Recommended Instance Type (Text only) | Recommended Instance Type (All Features) |
|---|---|---|
| Azure | Standard_NC8as_T4_v3 | Standard_NC8as_T4_v3 |
| AWS | g4dn.2xlarge | g4dn.4xlarge |
| GCP | n1-standard-8 + Tesla T4 | n1-standard-16 + Tesla T4 |