Getting Started

How to get a VoiceGen Server running on your system

Using Cobalt VoiceGen

A typical VoiceGen release, provided as a compressed archive, contains a Linux binary (voicegen-server) built for the required CPU architecture, a Dockerfile, and the models.
Cobalt VoiceGen runs either directly on Linux or in Docker.
Cobalt VoiceGen serves the gRPC API on port 2727. A web demo is enabled on port 8080.
To quickly try out VoiceGen, first start the server as shown below, then open the web demo at http://localhost:8080 in your browser to enter text and play or download the synthesized audio. You can also use the SDK in your preferred language to use VoiceGen from the command line or within your application.

Info

The cobalt.license.key file is provided separately and must be copied into the directory created by decompressing the archive. Please do this before running the steps below.

Running VoiceGen Server Locally on Linux

./voicegen-server

By default, the binary expects a configuration file named voicegen-server.cfg.toml in the same directory. A different config file may be specified using the --config argument.
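As a minimal sketch of using the --config argument, the helper below checks that the chosen config file exists before launching the server. The function name start_voicegen and the file name custom.cfg.toml are hypothetical, and the sketch assumes the config path is given relative to the release directory.

```shell
#!/bin/sh
# Start the server from a release directory with an explicit config file.
# Runs in a subshell so the caller's working directory is left unchanged.
# Usage: start_voicegen RELEASE_DIR [CONFIG_FILE]
start_voicegen() (
  cd "$1" || exit 1
  # Fall back to the default config name when none is given.
  config="${2:-voicegen-server.cfg.toml}"
  if [ ! -f "$config" ]; then
    echo "config file not found: $config" >&2
    exit 1
  fi
  exec ./voicegen-server --config "$config"
)
```

For example, `start_voicegen . custom.cfg.toml` would start the server from the current directory with the alternative config file.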

Running VoiceGen Server as a Docker Container

To build and run the Docker image for VoiceGen, run:

docker build -t cobalt-voicegen .
docker run -p 2727:2727 -p 8080:8080 cobalt-voicegen
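Once the container is running, you may want to confirm that the web demo is reachable before opening it in a browser. The following is a sketch only: wait_for_url is a hypothetical helper, it relies on curl being installed, and it assumes the demo answers a plain HTTP GET on port 8080 with a success status.

```shell
#!/bin/sh
# Poll a URL until it answers with a successful HTTP status,
# or give up after the given number of attempts (default 30).
# Usage: wait_for_url URL [ATTEMPTS]
wait_for_url() {
  url="$1"
  attempts="${2:-30}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -s silent, -f treat HTTP errors as failure, -o discard the body
    if curl -sf --max-time 2 -o /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

Because docker run stays in the foreground, call this from a second terminal, for example `wait_for_url http://localhost:8080 && echo "web demo is up"`.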

How to Get a Copy of the VoiceGen Server and Models

The release you receive is a compressed archive (tar.bz2), generally structured as follows:

release.tar.bz2
├── COPYING
├── README.md
├── voicegen-server
├── voicegen-server.cfg.toml
├── Dockerfile
├── models
│   └── en_US-multispeaker-22050hz
│
└── cobalt.license.key [ provided separately, needs to be copied over ]

The README.md file contains information about this release and instructions for how to start the server on your system.
voicegen-server is the server program. It is configured using the voicegen-server.cfg.toml file.
The Dockerfile can be used to create a container that lets you run VoiceGen server on non-Linux systems such as macOS and Windows.
The models directory contains the speech synthesis models. The contents of this directory depend on the models you are provided.
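The layout above can be checked with a short script before starting the server. This is a sketch under the assumption that your release matches the typical structure shown here; check_release_dir is a hypothetical helper, and the list of entries should be adjusted if your release differs.

```shell
#!/bin/sh
# Report entries from the typical release layout that are absent from a directory.
# Prints one "missing: NAME" line per absent entry and fails if any are absent.
# Usage: check_release_dir DIR
check_release_dir() {
  dir="$1"
  status=0
  for entry in voicegen-server voicegen-server.cfg.toml Dockerfile models cobalt.license.key; do
    if [ ! -e "$dir/$entry" ]; then
      echo "missing: $entry"
      status=1
    fi
  done
  return "$status"
}
```

Running `check_release_dir .` inside the decompressed directory is a quick way to catch a license key that has not been copied over yet.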

System Requirements

Cobalt VoiceGen runs on Linux. You can run it directly as a Linux application.

You can evaluate the product on Windows or Linux using Docker Desktop, but we do not recommend this setup for production environments.

A Cobalt VoiceGen release typically includes a single model together with binaries and config files. VoiceGen models may take up to 250MB of disk space and need a minimum of 2GB RAM when evaluating locally. For production workloads, we recommend running containerized instances, each allocated 4 CPUs and 4GB RAM.
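With Docker, one way to apply the recommended allocation is through the --cpus and --memory options of docker run. The sketch below is illustrative only: run_voicegen_limited and the DOCKER override variable are hypothetical names, the image tag cobalt-voicegen is the one built earlier in this guide, and an orchestrator such as Kubernetes would express the same limits in its own configuration.

```shell
#!/bin/sh
# Command used to invoke Docker; override (for example DOCKER=echo) for a dry run.
DOCKER="${DOCKER:-docker}"

# Start a detached VoiceGen container capped at the given CPU and memory limits.
# Usage: run_voicegen_limited CPUS MEMORY   (for example: run_voicegen_limited 4 4g)
run_voicegen_limited() {
  "$DOCKER" run -d \
    --cpus "$1" \
    --memory "$2" \
    -p 2727:2727 -p 8080:8080 \
    cobalt-voicegen
}
```

Calling `run_voicegen_limited 4 4g` starts one instance with 4 CPUs and 4GB RAM.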

Cobalt VoiceGen runs on x86_64 CPUs. We also support Arm64 CPUs, including processors such as AWS Graviton (used in C7g EC2 instances). VoiceGen is significantly more cost-effective to run on C7g instances than on similarly sized instances with Intel or AMD processors, and we can provide you with an Arm64 release on request.
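To find out which release matches a given machine, you can inspect the output of uname -m. The mapping below is a sketch: release_arch is a hypothetical helper, and the labels it prints are illustrative rather than official release names.

```shell
#!/bin/sh
# Map a machine name, as printed by `uname -m`, to a release architecture label.
# Usage: release_arch MACHINE
release_arch() {
  case "$1" in
    x86_64) echo "x86_64" ;;
    # Linux reports Arm64 as aarch64; macOS reports it as arm64.
    aarch64|arm64) echo "arm64" ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}
```

On the target host, `release_arch "$(uname -m)"` prints the label for the local CPU.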

To integrate Cobalt VoiceGen into your application, please follow the next steps to install or generate the SDK in a language of your choice.