Prerequisites
Info
Please run only one container instance per machine; running multiple containers results in vastly reduced performance.
The following prerequisites are required to run the container:
- Container engine, such as Docker (can be installed using the official instructions)
- (GPU only) Nvidia Container Toolkit with Nvidia driver version 515 or higher (can be installed using the installation guide)
All other dependencies, such as CUDA, are included with the container and don’t need to be installed separately.
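The driver-version requirement above can be checked mechanically. The sketch below is a hypothetical POSIX-shell helper, not part of the official tooling; it assumes a version string in the form reported by `nvidia-smi --query-gpu=driver_version --format=csv,noheader` (e.g. `535.129.03`):

```shell
# Hypothetical helper: check that the installed Nvidia driver meets the
# minimum major version (515) required by the GPU container.
check_driver_version() {
  # $1 is a driver version string such as "535.129.03"
  major="${1%%.*}"                 # keep only the major component
  if [ "$major" -ge 515 ]; then
    echo "driver OK ($1)"
  else
    echo "driver too old ($1); need 515 or higher"
  fi
}

check_driver_version "535.129.03"
check_driver_version "470.82.01"
```

A failing check means the host driver must be upgraded before the GPU container will start.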
System Requirements
- Docker and `docker-compose` installed.
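A quick way to confirm both tools are on the `PATH` is sketched below; the helper name is hypothetical, and note that recent Docker installs ship compose as the `docker compose` plugin rather than a standalone `docker-compose` binary:

```shell
# Hypothetical quick check that a required tool is installed and on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1 found"
  else
    echo "$1 missing"
  fi
}

check_tool docker
check_tool docker-compose
```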
The image comes in two build flavours:
- A compact, CPU-only container that runs on any Intel or AMD CPU. The CPU container is highly optimised for the majority of use cases: it uses hand-coded AMX/AVX2/AVX512/AVX512 VNNI instructions in conjunction with neural-network compression techniques to deliver a ~25x speedup over a reference transformer-based system.
- A GPU-accelerated container designed for large-scale deployments making billions of API calls or processing terabytes of data per month.
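For the GPU flavour, the GPU must be passed through to the container. A minimal `docker-compose` sketch is shown below; the image name is hypothetical (the real registry path is deployment-specific), but the `deploy.resources.reservations.devices` stanza is the standard Compose way to reserve an Nvidia GPU:

```yaml
services:
  cobalt:
    image: example.com/cobalt:gpu   # hypothetical image name; substitute your registry path
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1              # one GPU for the single container instance
              capabilities: [gpu]
```

This requires the Nvidia Container Toolkit listed in the prerequisites; without it the device reservation will fail at startup.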
Minimum Requirements
The minimum system requirements for the container image are as follows:
| | Minimum | Recommended (Text only) | Recommended (All Features) | Recommended Concurrency |
|---|---|---|---|---|
| CPU | Any x86 (Intel or AMD) processor with 7.5GB free RAM and 50GB disk volume | Intel Sapphire Rapids or newer CPUs supporting AMX with 16GB RAM and 50GB disk volume | Intel Sapphire Rapids or newer CPUs supporting AMX with 64GB RAM and 100GB disk volume | 1 |
| GPU | Any x86 (Intel or AMD) processor with 28GB free RAM. Nvidia GPU with compute capability 7.0 or higher (Volta or newer) and at least 16GB VRAM. 100GB disk volume | Any x86 (Intel or AMD) processor with 32GB RAM and Nvidia Tesla T4 GPU. 100GB disk volume | Any x86 (Intel or AMD) processor with 64GB RAM and Nvidia Tesla T4 GPU. 100GB disk volume | 32 |
Recommended Requirements
While the CPU-based container will run on any x86-compatible instance, the cloud instance types below give optimal throughput and latency per dollar:
| Platform | Recommended Instance Type (Text only) | Recommended Instance Type (All Features) |
|---|---|---|
| Azure | Standard_E2_v5 (2 vCPUs, 16GB RAM) | Standard_E8_v5 (8 vCPUs, 64GB RAM) |
| AWS | m7i.large (2 vCPUs, 8GB RAM) | m7i.4xlarge (16 vCPUs, 64GB RAM) |
| GCP | n2-standard-2 (2 vCPUs, 8GB RAM) | n2-standard-16 (16 vCPUs, 64GB RAM) |
- If lower latency is required, scale up the instance type; e.g. use an m7i.xlarge in place of an m7i.large. While the Cobalt Docker solution can make use of all available CPU cores, it delivers the best throughput per dollar on a single-CPU-core machine; scaling up CPU cores does not result in a linear increase in performance.
Similarly, for the GPU-based image the following Nvidia T4 GPU-equipped instance types are recommended:
| Platform | Recommended Instance Type (Text only) | Recommended Instance Type (All Features) |
|---|---|---|
| Azure | Standard_NC8as_T4_v3 | Standard_NC8as_T4_v3 |
| AWS | g4dn.2xlarge | g4dn.4xlarge |
| GCP | n1-standard-8 + Tesla T4 | n1-standard-16 + Tesla T4 |