VoiceBio

Low latency, high accuracy on-prem / on-cloud solutions for speaker verification and identification.

1: Getting Started

2: Generating SDKs

3: Connecting to the Server

4: Streaming Enrollment

5: Streaming Verification

6: Streaming Identification

7: Comparing Voiceprints

8: Vectorizing Voiceprints

9: API Reference

1 - Getting Started

How to get a VoiceBio Server running on your system

Using Cobalt VoiceBio

A typical VoiceBio release, provided as a compressed archive, contains a Linux binary (voicebio-server) for the required native CPU architecture, an appropriate Dockerfile, and models.
Cobalt VoiceBio runs either locally on Linux or using Docker.
Cobalt VoiceBio serves the VoiceBio gRPC API on port 2727.
To quickly try out VoiceBio, first start the server as shown below, then use the SDK in your preferred language to call VoiceBio from the command line or from within your application.

Info

The cobalt.license.key file is provided separately and must be copied into the directory that results from decompressing the archive. Please do this before running the steps below.

Running VoiceBio Server Locally on Linux

./voicebio-server

By default, the binary assumes the presence of a configuration file, located in the same directory, named: voicebio-server.cfg.toml. A different config file may be specified using the --config argument.

Running VoiceBio Server as a Docker Container

To build and run the Docker image for VoiceBio, run:

docker build -t cobalt-voicebio .
docker run -p 2727:2727 -p 8080:8080 cobalt-voicebio
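
Once the server is running, a plain TCP connection check is a quick way to confirm that the published gRPC port is reachable. The sketch below is a minimal illustration and not part of the VoiceBio SDK; the helper name port_is_open is hypothetical, and a successful connection only shows that something is listening on the port, not that it is a healthy VoiceBio server.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection raises OSError (e.g. connection refused, timeout)
        # when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For a local server started as above, port_is_open("localhost", 2727) would be the corresponding check.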

How to Get a Copy of the VoiceBio Server and Models

The release you receive is a compressed archive (tar.bz2) and is generally structured as follows:

release.tar.bz2
├── COPYING
├── README.md
├── voicebio-server
├── voicebio-server.cfg.toml
├── Dockerfile
├── models
│   └── en_US-16khz
│
└── cobalt.license.key [ provided separately, needs to be copied over ]

The README.md file contains information about this release and instructions for how to start the server on your system.
The voicebio-server is the server program which is configured using the voicebio-server.cfg.toml file.
The Dockerfile can be used to create a container that lets you run VoiceBio server on non-Linux systems such as macOS and Windows.
The models directory contains the speaker ID models. The contents of this directory will depend on the models you are provided.

System Requirements

Cobalt VoiceBio runs on Linux. You can run it directly as a Linux application.

You can evaluate the product on Windows or Linux using Docker Desktop, but we do not recommend this setup for use in a production environment.

A Cobalt VoiceBio release typically includes a single VoiceBio model together with binaries and config files. The general purpose VoiceBio models take up to 100MB of disk space and need a minimum of 2GB RAM when evaluating locally. For production workloads, we recommend configuring containerized applications with each instance allocated 4 CPUs and 4GB RAM.

Cobalt VoiceBio runs on x86_64 CPUs. We also support Arm64 CPUs, including processors such as the Graviton (AWS c7g EC2 instances). VoiceBio is significantly more cost effective to run on C7g instances compared to similarly sized Intel or AMD processors, and we can provide you an Arm64 release on request.

To integrate Cobalt VoiceBio into your application, please follow the next steps to install or generate the SDK in a language of your choice.

2 - Generating SDKs

Instructions for generating an SDK for your project from the proto API definition.

APIs for all of Cobalt’s services are defined as a protocol buffer specification, or simply a proto file, and can be found in the cobaltspeech/proto GitHub repository.
The proto file allows a developer to auto-generate client SDKs for a number of different programming languages. Step by step instructions for generating your own SDK can be found below.
We provide pre-generated SDKs for a couple of languages. You can choose to use these instead of generating your own. These are listed here along with instructions on how to install / import them into your projects.

Pre-generated SDKs

Golang

Pre-generated SDK files for Golang can be found in the cobaltspeech/go-genproto repo.
To use it in your Go project, simply import it:

import voicebiopb "github.com/cobaltspeech/go-genproto/cobaltspeech/voicebio/v1"

An example client using the above repo can be found here.

Python

Pre-generated SDK files for Python can be found in the cobaltspeech/py-genproto repo.
The Python SDK depends on Python >= 3.5. You may use pip to perform a system-wide install, or use virtualenv for a local install. To use it in your Python project, install it:

pip install --upgrade pip
pip install "git+https://github.com/cobaltspeech/py-genproto"
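
After installing, it can be useful to confirm that the SDK modules are visible to the interpreter you intend to use. The following is a minimal sketch using only the standard library; the helper name module_available is hypothetical, and the module paths mentioned in the note after it are taken from the import statements used elsewhere in this documentation.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if the named module can be located by the import system."""
    try:
        # For a dotted name, find_spec imports the parent packages but not
        # the module itself.
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # Raised when a parent package of a dotted name is missing.
        return False
```

For example, module_available("grpc") and module_available("cobaltspeech.voicebio.v1.voicebio_pb2") should both return True in an environment where the SDK and its dependencies are installed.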

Generating SDKs

Step 1. Installing `buf`

To work with proto files, we recommend using buf, a user-friendly command line tool that can be configured to generate documentation, schemas and SDK code for different languages.

# Latest version as of March 14th, 2023.
COBALT="${HOME}/cobalt"
mkdir -p "${COBALT}/bin"

VERSION="1.15.1"
URL="https://github.com/bufbuild/buf/releases/download/v${VERSION}/buf-$(uname -s)-$(uname -m)"
curl -L "${URL}" -o "${COBALT}/bin/buf"

# Give executable permissions and add to $PATH.
chmod +x "${COBALT}/bin/buf"
export PATH="${PATH}:${COBALT}/bin"

# Alternatively, using Homebrew:
brew install bufbuild/buf/buf

Step 2. Getting `proto` files

Clone the cobaltspeech/proto repository:

COBALT="${HOME}/cobalt"
mkdir -p "${COBALT}/git"

# Change this to where you want to clone the repo to.
PROTO_REPO="${COBALT}/git/proto"

git clone https://github.com/cobaltspeech/proto "${PROTO_REPO}"

Step 3. Generating code

The cobaltspeech/proto repo provides a buf.gen.yaml config file to get you started with a couple of languages.
Other plugins can be added to the buf.gen.yaml file to generate SDK code for more languages.
To generate the SDKs, run the following (assuming the buf binary is in your $PATH):

cd "${PROTO_REPO}"

# Removing any previously generated files.
rm -rf ./gen

# Generating code for all proto files inside the `proto` directory.
buf generate proto

You should now have a folder called gen inside ${PROTO_REPO} that contains the generated code. The latest version of the VoiceBio API is v1. You can import / include / copy the generated files into your projects as per the conventions of different languages.

Python
Golang

gen
├── ... other languages ...
└── py
  └── cobaltspeech
    ├── ... other services ...
    └── voicebio
      └── v1
        ├── voicebio_pb2_grpc.py
        ├── voicebio_pb2.py
        └── voicebio_pb2.pyi

gen
├── ... other languages ...
└── go
    ├── cobaltspeech
    │   ├── ...
    │   └── voicebio
    │       └── v1
    │           ├── voicebio_grpc.pb.go
    │           └── voicebio.pb.go
    └── gw
        └── cobaltspeech
            ├── ...
            └── voicebio
                └── v1
                    └── voicebio.pb.gw.go
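
To check that code generation produced the expected Python files, the directory layout shown above can be inspected programmatically. This is a minimal sketch that assumes the gen/py/cobaltspeech/<service>/<version> layout from the tree above; the function name find_generated_modules is hypothetical.

```python
from pathlib import Path


def find_generated_modules(gen_root: str, service: str = "voicebio", version: str = "v1"):
    """Return the sorted file names generated for one service under gen/py."""
    service_dir = Path(gen_root) / "py" / "cobaltspeech" / service / version
    if not service_dir.is_dir():
        # Nothing was generated for this service (or the path is wrong).
        return []
    return sorted(p.name for p in service_dir.iterdir() if p.is_file())
```

One option for using the generated Python code without copying it is to add the gen/py directory to sys.path before importing cobaltspeech.voicebio.v1.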

Step 4. Installing gRPC and protobuf

A couple of gRPC and protobuf dependencies are required along with the code generated above. The method of installing them depends on the programming language being used.
These dependencies and the most common way of installing / including them are listed below for some chosen languages.

# It is encouraged to do this inside a Python virtual environment
# to avoid creating version conflicts for other scripts that may
# be using these libraries.
pip install --upgrade protobuf
pip install --upgrade grpcio
pip install --upgrade google-api-python-client

go get google.golang.org/protobuf
go get google.golang.org/grpc
go get google.golang.org/genproto

# More details on grpc installation can be found at:
# https://grpc.io/docs/languages/cpp/quickstart/

COBALT="${HOME}/cobalt"
mkdir -p "${COBALT}/git"

# Latest version as of 14th March, 2023.
VERSION="v1.52.0"
GRPC_REPO="${COBALT}/git/grpc-${VERSION}"

git clone \
 --recurse-submodules --depth 1 --shallow-submodules \
 -b "${VERSION}" \
 https://github.com/grpc/grpc "${GRPC_REPO}"

cd "${GRPC_REPO}"
mkdir -p cmake/build

# Change this to where you want to install libprotobuf and libgrpc.
# It is encouraged to install gRPC locally as there is no easy way to
# uninstall gRPC after you’ve installed it globally.
INSTALL_DIR="${COBALT}"

cd cmake/build
cmake \
 -DgRPC_INSTALL=ON \
 -DgRPC_BUILD_TESTS=OFF \
 -DCMAKE_INSTALL_PREFIX="${INSTALL_DIR}" \
 ../..

make -j
make install

3 - Connecting to the Server

Describes how to connect to a running Cobalt VoiceBio server instance.

Once you have your VoiceBio server up and running, and have installed or generated the SDK for your project, you can connect to a running instance of VoiceBio server by “dialing” a gRPC connection.
First, you need to know the address where the server is running, e.g. host:grpc_port. By default, this is localhost:2727, and it is logged to the terminal as grpcAddr when you first start VoiceBio server:

2023/08/14 10:49:38 info  {"license":"Copyright © 2023--present. Cobalt Speech and Language, Inc.  For additional details, including information about open source components used in this software, please see the COPYING file bundled with this program."}
2023/08/14 10:49:38 info  {"msg":"reading config file","path":"configs/voicebio-server.config.toml"}
2023/08/14 10:49:38 info  {"msg":"server initializing"}
2023/08/14 10:49:38 info  {"msg":"license verified"}
2023/08/14 10:49:41 info  {"msg":"runtime initialized","model_count":"2","init_time_taken":"2.512935646s"}
2023/08/14 10:49:41 info  {"msg":"server started","grpcAddr":"[::]:2727","httpApiAddr":"[::]:8080","httpOpsAddr":"[::]:8081"}
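
If you start the server from a script, the listening address can be read from the startup log rather than hard-coded. The sketch below assumes log lines shaped like the sample output above (a timestamp and level followed by a JSON payload); that shape is inferred only from this sample, and the helper names are hypothetical.

```python
import json


def parse_log_line(line: str) -> dict:
    """Parse the JSON payload of a log line shaped like the sample above."""
    start = line.index("{")  # the payload starts at the first brace
    return json.loads(line[start:])


def port_from_addr(addr: str) -> int:
    """Extract the port from an address such as "[::]:2727" or "localhost:2727"."""
    return int(addr.rsplit(":", 1)[1])
```

Applied to the "server started" line above, parse_log_line(line)["grpcAddr"] yields the gRPC listen address, and port_from_addr turns it into the port number to dial.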

Info

If you are hosting your server with Transport Layer Security (TLS) enabled, then please follow the instructions under Connect with TLS. Otherwise, you can follow the instructions for the Default Connection method.

Default Connection

The following code snippet connects to the server and queries its version. It connects to the server using an “insecure” gRPC channel. This would be the case if you have just started up a local instance of VoiceBio server without TLS enabled.

Python
Go

import grpc
import cobaltspeech.voicebio.v1.voicebio_pb2_grpc as stub
import cobaltspeech.voicebio.v1.voicebio_pb2 as voicebio

serverAddress = "localhost:2727"

# Using a channel without TLS enabled.
channel = grpc.insecure_channel(serverAddress)
client = stub.VoiceBioServiceStub(channel)

# Get server version.
versionResp = client.Version(voicebio.VersionRequest())
print(versionResp)

package main

import (
	"context"
	"fmt"
	"os"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"

	voicebiopb "github.com/cobaltspeech/go-genproto/cobaltspeech/voicebio/v1"
)

func main() {
	const (
		serverAddress  = "localhost:2727"
	)

	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	opts := []grpc.DialOption{
		grpc.WithTransportCredentials(insecure.NewCredentials()), // Using a channel without TLS enabled.
		grpc.WithBlock(),
		grpc.WithReturnConnectionError(),
		grpc.FailOnNonTempDialError(true),
	}

	conn, err := grpc.DialContext(ctx, serverAddress, opts...)
	if err != nil {
		fmt.Printf("failed to dial gRPC connection: %v\n", err)
		os.Exit(1)
	}

	client := voicebiopb.NewVoiceBioServiceClient(conn)

	// Get server version.
	versionResp, err := client.Version(ctx, &voicebiopb.VersionRequest{})
	if err != nil {
		fmt.Printf("failed to get server version: %v\n", err)
		os.Exit(1)
	}

	fmt.Printf("%v\n", versionResp)
}

Connect with TLS

In our recommended setup for deployment, TLS is enabled in the gRPC connection, and when connecting to the server, clients validate the server’s SSL certificate to make sure they are talking to the right party. This is similar to how “https” connections work in web browsers.
The following snippets show how to connect to a VoiceBio Server that has TLS enabled. They use Cobalt’s self-hosted demo server at demo.cobaltspeech.com:2727, but you should use your own server instance.

Note

Commercial use of the demo server at demo.cobaltspeech.com:2727 is not permitted. This server is for testing and demonstration purposes only and is not guaranteed to support high availability or high volume. Data uploaded to the server may be stored for internal purposes.

Python
Go

import grpc
import cobaltspeech.voicebio.v1.voicebio_pb2_grpc as stub
import cobaltspeech.voicebio.v1.voicebio_pb2 as voicebio

serverAddress = "demo.cobaltspeech.com:2727"

# Setup a gRPC connection with TLS. You can optionally provide your own
# root certificates and private key to grpc.ssl_channel_credentials()
# for mutually authenticated TLS.
creds = grpc.ssl_channel_credentials()
channel = grpc.secure_channel(serverAddress, creds)
client = stub.VoiceBioServiceStub(channel)

# Get server version.
versionResp = client.Version(voicebio.VersionRequest())
print(versionResp)

package main

import (
	"context"
	"crypto/tls"
	"fmt"
	"os"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials"

	voicebiopb "github.com/cobaltspeech/go-genproto/cobaltspeech/voicebio/v1"
)

func main() {
	const (
		serverAddress  = "demo.cobaltspeech.com:2727"
		connectTimeout = 10 * time.Second
	)

	// Setup a gRPC connection with TLS. You can optionally provide your own
	// root certificates and private key through tls.Config for mutually
	// authenticated TLS.
	tlsCfg := tls.Config{}
	creds := credentials.NewTLS(&tlsCfg)

	ctx, cancel := context.WithTimeout(context.Background(), connectTimeout)
	defer cancel()

	opts := []grpc.DialOption{
		grpc.WithTransportCredentials(creds),
		grpc.WithBlock(),
		grpc.WithReturnConnectionError(),
		grpc.FailOnNonTempDialError(true),
	}

	conn, err := grpc.DialContext(ctx, serverAddress, opts...)
	if err != nil {
		fmt.Printf("failed to dial gRPC connection: %v\n", err)
		os.Exit(1)
	}

	client := voicebiopb.NewVoiceBioServiceClient(conn)

	// Get server version.
	versionResp, err := client.Version(ctx, &voicebiopb.VersionRequest{})
	if err != nil {
		fmt.Printf("failed to get server version: %v\n", err)
		os.Exit(1)
	}

	fmt.Printf("%v\n", versionResp)
}

Client Authentication

In some setups, it may be desired that the server should also validate clients connecting to it and only respond to the ones it can verify. If your VoiceBio server is configured to do client authentication, you will need to present the appropriate certificate and key when connecting to it.
Please note that in the client-authentication mode, the client will still also verify the server’s certificate, and therefore this setup uses mutually authenticated TLS.
The following snippets show how to present client certificates when setting up the credentials. These could then be used in the same way as the examples above to connect to a TLS enabled server.

Python
Go

creds = grpc.ssl_channel_credentials(
  root_certificates=root_certificates,  # PEM certificate as byte string
  private_key=private_key,              # PEM client key as byte string 
  certificate_chain=certificate_chain,  # PEM client certificate as byte string
)

package main

import (
	// ...

	"crypto/tls"
	"crypto/x509"
	"fmt"
	"os"

	"google.golang.org/grpc/credentials"

	// ...
)

func main() {
	// ...

	// Root PEM certificate for validating self-signed server certificate
	var rootCert []byte

	// Client PEM certificate and private key.
	var certPem, keyPem []byte

	caCertPool := x509.NewCertPool()
	if ok := caCertPool.AppendCertsFromPEM(rootCert); !ok {
		fmt.Printf("unable to use given caCert\n")
		os.Exit(1)
	}

	clientCert, err := tls.X509KeyPair(certPem, keyPem)
	if err != nil {
		fmt.Printf("unable to use given client certificate and key: %v\n", err)
		os.Exit(1)
	}

	tlsCfg := tls.Config{
		RootCAs:      caCertPool,
		Certificates: []tls.Certificate{clientCert},
	}

	creds := credentials.NewTLS(&tlsCfg)

	// ...
}
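
The Python snippet above expects the certificates and key as byte strings. A minimal sketch of loading them from disk is shown below; the helper names and any file names you pass in are hypothetical, and the PEM check is only a light sanity test rather than certificate validation.

```python
from pathlib import Path


def load_pem(path: str) -> bytes:
    """Read a PEM file and return its contents as bytes."""
    data = Path(path).read_bytes()
    if b"-----BEGIN " not in data:
        raise ValueError(f"{path} does not look like a PEM file")
    return data


def load_client_credentials(ca_path: str, cert_path: str, key_path: str) -> dict:
    """Collect the three byte strings under the keyword names used above."""
    return {
        "root_certificates": load_pem(ca_path),
        "certificate_chain": load_pem(cert_path),
        "private_key": load_pem(key_path),
    }
```

The returned dictionary can then be unpacked into the call shown earlier, for example grpc.ssl_channel_credentials(**load_client_credentials(ca_path, cert_path, key_path)).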

4 - Streaming Enrollment

Describes how to stream audio to VoiceBio server for enrollment.

The following example shows how to stream audio using VoiceBio’s StreamingEnroll request and generate a voiceprint. The stream can come from a file on disk or directly from a microphone in real time.

Streaming from an audio file

We support several headered file formats, including WAV, MP3, and FLAC. For more details, please see the protocol buffer specification here. For best accuracy, we recommend an uncompressed or losslessly compressed audio format such as WAV or FLAC.
The examples below use a WAV file as input. We will query the server for available models and use the first model to generate the voiceprint.
Generated voiceprints can be updated and made more robust by re-enrolling them with additional audio. Please see the re-enrollment section.

Python
Go

import grpc
import cobaltspeech.voicebio.v1.voicebio_pb2_grpc as stub
import cobaltspeech.voicebio.v1.voicebio_pb2 as voicebio

serverAddress = "localhost:2727"

# Using a channel without TLS enabled.
channel = grpc.insecure_channel(serverAddress)
client = stub.VoiceBioServiceStub(channel)

# Get server version.
versionResp = client.Version(voicebio.VersionRequest())
print(versionResp)

# Get list of models on the server.
modelResp = client.ListModels(voicebio.ListModelsRequest())

print("Models:")
for model in modelResp.models:
    print(model)

# Select a model ID from the list above. Going with the first model
# in this example.
modelID = modelResp.models[0].id

# Set the enrollment config. We don't set the audio format and let the
# server auto-detect the format from the file header.
cfg = voicebio.EnrollmentConfig(
    model_id=modelID,
    previous_voiceprint=None,
)

# The first request to the server should only contain the
# configuration. Subsequent requests should contain audio
# bytes. We can write a simple generator to do this.
def stream(cfg, audio, bufferSize=1024):
    yield voicebio.StreamingEnrollRequest(config=cfg)
    
    data = audio.read(bufferSize)
    while len(data) > 0:
        yield voicebio.StreamingEnrollRequest(audio=voicebio.Audio(data=data))
        data = audio.read(bufferSize)

# Streaming audio to the server.
with open("test.wav", "rb") as audio:
    result = client.StreamingEnroll(stream(cfg, audio))

# A certain minimum duration of speech is required for completing enrollment.
# The enrollment status contains information on whether that has been met or
# whether additional audio is required.
print(f"Enrollment status:\n{result.enrollment_status}\n")

# Saving the voiceprint data to a file. This can be provided again
# in another StreamingEnroll request (for continuing enrollment) or
# submitted for verification / identification requests.
with open("voiceprint.bin", "w") as f:
    f.write(result.voiceprint.data)

package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"os"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"

	voicebio "github.com/cobaltspeech/go-genproto/cobaltspeech/voicebio/v1"
)

func main() {
	const (
		serverAddress = "localhost:2727"
	)

	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	opts := []grpc.DialOption{
		grpc.WithTransportCredentials(insecure.NewCredentials()), // Using a channel without TLS enabled.
		grpc.WithBlock(),
		grpc.WithReturnConnectionError(),
		grpc.FailOnNonTempDialError(true),
	}

	conn, err := grpc.DialContext(ctx, serverAddress, opts...)
	if err != nil {
		fmt.Printf("failed to dial gRPC connection: %v\n", err)
		os.Exit(1)
	}

	client := voicebio.NewVoiceBioServiceClient(conn)

	// Get server version.
	versionResp, err := client.Version(ctx, &voicebio.VersionRequest{})
	if err != nil {
		fmt.Printf("failed to get server version: %v\n", err)
		os.Exit(1)
	}

	fmt.Printf("%v\n", versionResp)

	// Get the list of models on the server.
	modelResp, err := client.ListModels(ctx, &voicebio.ListModelsRequest{})
	if err != nil {
		fmt.Printf("failed to get model list: %v\n", err)
		os.Exit(1)
	}

	fmt.Println("Models:")
	for _, m := range modelResp.Models {
		fmt.Println(m)
	}
	fmt.Println()

	// Selecting the first model.
	cfg := &voicebio.EnrollmentConfig{
		ModelId:            modelResp.Models[0].Id,
		PreviousVoiceprint: nil,
	}

	// Opening audio file.
	audio, err := os.Open("test.wav")
	if err != nil {
		fmt.Printf("failed to open audio file: %v\n", err)
		os.Exit(1)
	}

	defer audio.Close()

	// Starting enrollment.
	result, err := StreamingEnroll(ctx, client, cfg, audio)
	if err != nil {
		fmt.Printf("failed to run streaming enrollment: %v\n", err)
		os.Exit(1)
	}

	// A certain minimum duration of speech is required for completing enrollment.
	// The enrollment status contains information on whether that has been met or
	// whether additional audio is required.
	fmt.Printf("Enrollment Status: %v\n", result.EnrollmentStatus)

	// Saving the voiceprint data to a file. This can be provided again
	// in another StreamingEnroll request (for continuing enrollment) or
	// submitted for verification / identification requests.
	if err := os.WriteFile("voiceprint.bin", []byte(result.Voiceprint.Data), os.ModePerm); err != nil {
		fmt.Printf("failed to write voiceprint data: %v\n", err)
		os.Exit(1)
	}
}

// StreamingEnroll wraps the streaming API for performing speaker enrollment
// (i.e. voiceprint generation) using the given cfg.
//
// Data is read from the given audio reader into a buffer of streamingBufSize
// bytes and streamed to VoiceBio server.
//
// If any error occurs while reading the audio or sending it to the server, this
// method will immediately exit, returning that error.
func StreamingEnroll(
	ctx context.Context,
	client voicebio.VoiceBioServiceClient,
	cfg *voicebio.EnrollmentConfig,
	audio io.Reader,
) (*voicebio.StreamingEnrollResponse, error) {
	const (
		streamingBufSize = 1024
	)

	// Creating stream.
	stream, err := client.StreamingEnroll(ctx)
	if err != nil {
		return nil, err
	}

	// Sending audio.
	if err := sendAudio(stream, cfg, audio, streamingBufSize); err != nil && !errors.Is(err, io.EOF) {
		// if sendAudio encountered io.EOF, it's only a
		// notification that the stream has closed.  The actual
		// status will be obtained in the CloseAndRecv call. We
		// therefore return on non-EOF errors here.
		return nil, err
	}

	// Returning result.
	return stream.CloseAndRecv()
}

// sendAudio sends the config and audio to a stream.
func sendAudio(
	stream voicebio.VoiceBioService_StreamingEnrollClient,
	cfg *voicebio.EnrollmentConfig,
	audio io.Reader,
	bufSize uint32,
) error {
	// The first message needs to be a config message, and all subsequent
	// messages must be audio messages.

	// Send the config.
	if err := stream.Send(&voicebio.StreamingEnrollRequest{
		Request: &voicebio.StreamingEnrollRequest_Config{Config: cfg},
	}); err != nil {
		// if this failed, we don't need to CloseSend
		return err
	}

	// Stream the audio.
	buf := make([]byte, bufSize)
	for {
		n, err := audio.Read(buf)
		if n > 0 {
			if err2 := stream.Send(&voicebio.StreamingEnrollRequest{
				Request: &voicebio.StreamingEnrollRequest_Audio{
					Audio: &voicebio.Audio{Data: buf[:n]},
				},
			}); err2 != nil {
				// if we couldn't Send, the stream has
				// encountered an error and we don't need to
				// CloseSend.
				return err2
			}
		}

		if err != nil {
			// err could be io.EOF, or some other error reading from
			// audio.  In any case, we need to CloseSend and return
			// the appropriate error from the function.
			if err2 := stream.CloseSend(); err2 != nil {
				return err2
			}

			if err != io.EOF {
				return err
			}

			return nil
		}
	}
}
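
Before streaming a WAV file as in the examples above, it can help to inspect its header, for instance to compare the file's sample rate with the one the selected model expects. The following is a minimal sketch using Python's standard wave module, which reads PCM WAV files only; the function name describe_wav is hypothetical and is not part of the VoiceBio SDK.

```python
import wave


def describe_wav(path: str) -> dict:
    """Return basic header fields of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "sample_rate": rate,
            "channels": wav.getnchannels(),
            "bit_depth": wav.getsampwidth() * 8,  # sample width is in bytes
            "duration_seconds": frames / rate if rate else 0.0,
        }
```

The duration reported here is the total audio length, not the amount of speech in it, so it is only a rough guide to whether a file is long enough for enrollment.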

Streaming from microphone

Streaming audio from microphone input requires a reader interface that can provide audio samples recorded from a microphone; typically this requires interaction with system libraries. Another option is to use an external command line tool like sox to record and pipe audio into the client.
The examples below use the latter approach, using the rec command provided with sox to record and stream the audio.

Python
Go

#!/usr/bin/env python3

# This example assumes sox is installed on the system and is available
# in the system's PATH variable. Instead of opening a regular file from
# disk, we open a subprocess that executes sox's rec command to record
# audio from the system's default microphone.

import subprocess
import grpc
import cobaltspeech.voicebio.v1.voicebio_pb2_grpc as stub
import cobaltspeech.voicebio.v1.voicebio_pb2 as voicebio

serverAddress = "localhost:2727"

# Using a channel without TLS enabled.
channel = grpc.insecure_channel(serverAddress)
client = stub.VoiceBioServiceStub(channel)

# Get server version.
versionResp = client.Version(voicebio.VersionRequest())
print(versionResp)

# Get list of models on the server.
modelResp = client.ListModels(voicebio.ListModelsRequest())

print("Models:")
for model in modelResp.models:
    print(model)

# Select a model ID from the list above. Going with the first model
# in this example.
m = modelResp.models[0]
modelID = m.id

# Setting audio format to be raw 16-bit signed little endian audio samples
# recorded at the sample rate expected by the model.
cfg = voicebio.EnrollmentConfig(
    model_id=modelID,
    previous_voiceprint=None,
    audio_format=voicebio.AudioFormat(
      audio_format_raw=voicebio.AudioFormatRAW(
        encoding="AUDIO_ENCODING_SIGNED",
        bit_depth=16,
        byte_order="BYTE_ORDER_LITTLE_ENDIAN",
        sample_rate=m.attributes.sample_rate,
        channels=1,
      )
    ),
)

# Open microphone stream using sox's rec command and record
# audio using the config specified above for *10 seconds*.
maxDuration = 10
cmd = f"rec -t raw -r {m.attributes.sample_rate} -e signed -b 16 -L -c 1 - trim 0 {maxDuration}"
mic = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE)
audio = mic.stdout

try:
    _ = audio.read(1024)  # Trying to read some bytes as a sanity check.
except Exception as err:
    print(f"[ERROR] failed to read audio from mic stream: {err}")

print(f"\n[INFO] recording {maxDuration} seconds of audio from microphone ... \n")

# The first request to the server should only contain the
# enrollment configuration. Subsequent requests should contain
# audio bytes. We can write a simple generator to do this.
def stream(cfg, audio, bufferSize=1024):
    yield voicebio.StreamingEnrollRequest(config=cfg)

    data = audio.read(bufferSize)
    while len(data) > 0:
        yield voicebio.StreamingEnrollRequest(audio=voicebio.Audio(data=data))
        data = audio.read(bufferSize)

# Streaming audio to the server.
result = client.StreamingEnroll(stream(cfg, audio))

# A certain minimum duration of speech is required for completing enrollment.
# The enrollment status contains information on whether that has been met or
# whether additional audio is required.
print(f"Enrollment status:\n{result.enrollment_status}\n")

# Saving the voiceprint data to a file. This can be provided again
# in another StreamingEnroll request (for continuing enrollment) or
# submitted for verification / identification requests.
with open("voiceprint.bin", "w") as f:
    f.write(result.voiceprint.data)

audio.close()
mic.kill()

package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"os"
	"os/exec"
	"strings"

	"golang.org/x/sync/errgroup"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"

	voicebio "github.com/cobaltspeech/go-genproto/cobaltspeech/voicebio/v1"
)

func main() {
	const (
		serverAddress = "localhost:2727"
	)

	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	opts := []grpc.DialOption{
		grpc.WithTransportCredentials(insecure.NewCredentials()), // Using a channel without TLS enabled.
		grpc.WithBlock(),
		grpc.WithReturnConnectionError(),
		grpc.FailOnNonTempDialError(true),
	}

	conn, err := grpc.DialContext(ctx, serverAddress, opts...)
	if err != nil {
		fmt.Printf("failed to dial gRPC connection: %v\n", err)
		os.Exit(1)
	}

	client := voicebio.NewVoiceBioServiceClient(conn)

	// Get server version.
	versionResp, err := client.Version(ctx, &voicebio.VersionRequest{})
	if err != nil {
		fmt.Printf("failed to get server version: %v\n", err)
		os.Exit(1)
	}

	fmt.Printf("%v\n", versionResp)

	// Get the list of models on the server.
	modelResp, err := client.ListModels(ctx, &voicebio.ListModelsRequest{})
	if err != nil {
		fmt.Printf("failed to get model list: %v\n", err)
		os.Exit(1)
	}

	fmt.Println("Models:")
	for _, m := range modelResp.Models {
		fmt.Println(m)
	}
	fmt.Println()

	// Selecting first model.
	m := modelResp.Models[0]

	// Setting audio format to be raw 16-bit signed little endian audio samples
	// recorded at the sample rate expected by the model.
	cfg := &voicebio.EnrollmentConfig{
		ModelId:            m.Id,
		PreviousVoiceprint: nil,
		AudioFormat: &voicebio.AudioFormat{AudioFormat: &voicebio.AudioFormat_AudioFormatRaw{
			AudioFormatRaw: &voicebio.AudioFormatRAW{
				Encoding:   voicebio.AudioEncoding_AUDIO_ENCODING_SIGNED,
				SampleRate: m.Attributes.SampleRate,
				BitDepth:   16,
				ByteOrder:  voicebio.ByteOrder_BYTE_ORDER_LITTLE_ENDIAN,
				Channels:   1,
			},
		},
		},
	}

	// Open microphone stream using sox's rec command and record
	// audio using the config specified above for *10 seconds*.
	maxDuration := 10
	args := fmt.Sprintf("-t raw -r %d -e signed -b 16 -L -c 1 - trim 0 %d", m.Attributes.SampleRate, maxDuration)
	cmd := exec.CommandContext(ctx, "rec", strings.Fields(args)...)
	cmd.Stderr = os.Stderr

	audio, err := cmd.StdoutPipe()
	if err != nil {
		fmt.Printf("failed to open microphone stream: %v\n", err)
		os.Exit(1)
	}

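	// Start recording. Because audio is read from the StdoutPipe, the command
	// is started with Start, and Wait is only called after all reads from the
	// pipe have completed (calling Run here would race with those reads).
	fmt.Printf("\n[INFO] recording %d seconds from microphone \n", maxDuration)

	if err := cmd.Start(); err != nil {
		fmt.Printf("failed to record from microphone: %v\n", err)
		os.Exit(1)
	}

	// Streaming to the server inside an errgroup.Group, whose context is
	// cancelled if streaming encounters an error.
	eg, ctx := errgroup.WithContext(ctx)

	var result *voicebio.StreamingEnrollResponse

	// Starting enrollment.
	eg.Go(func() error {
		var err error

		result, err = StreamingEnroll(ctx, client, cfg, audio)
		if err != nil {
			return fmt.Errorf("run streaming enrollment: %w", err)
		}

		return nil
	})

	if err := eg.Wait(); err != nil {
		fmt.Printf("%v\n", err)
		os.Exit(1)
	}

	// The stream has been fully read; wait for the rec command to exit.
	if err := cmd.Wait(); err != nil {
		fmt.Printf("record from microphone: %v\n", err)
		os.Exit(1)
	}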

	// A certain minimum duration of speech is required for completing enrollment.
	// The enrollment status contains information on whether that has been met or
	// whether additional audio is required.
	fmt.Printf("Enrollment Status: %v\n", result.EnrollmentStatus)

	// Saving the voiceprint data to a file. This can be provided again
	// in another StreamingEnroll request (for continuing enrollment) or
	// submitted for verification / identification requests.
	if err := os.WriteFile("voiceprint.bin", []byte(result.Voiceprint.Data), os.ModePerm); err != nil {
		fmt.Printf("failed to write voiceprint data: %v\n", err)
		os.Exit(1)
	}
}

// StreamingEnroll wraps the streaming API for performing speaker enrollment
// (i.e. voiceprint generation) using the given cfg.
//
// Data is read from the given audio reader into a buffer of streamingBufSize
// bytes and streamed to VoiceBio server.
//
// If any error occurs while reading the audio or sending it to the server, this
// method will immediately exit, returning that error.
func StreamingEnroll(
	ctx context.Context,
	client voicebio.VoiceBioServiceClient,
	cfg *voicebio.EnrollmentConfig,
	audio io.Reader,
) (*voicebio.StreamingEnrollResponse, error) {
	const (
		streamingBufSize = 1024
	)

	// Creating stream.
	stream, err := client.StreamingEnroll(ctx)
	if err != nil {
		return nil, err
	}

	// Sending audio.
	if err := sendAudio(stream, cfg, audio, streamingBufSize); err != nil && !errors.Is(err, io.EOF) {
		// if sendAudio encountered io.EOF, it's only a
		// notification that the stream has closed.  The actual
		// status will be obtained in the CloseAndRecv call. We
		// therefore return on non-EOF errors here.
		return nil, err
	}

	// Returning result.
	return stream.CloseAndRecv()
}

// sendAudio sends the config and audio to a stream.
func sendAudio(
	stream voicebio.VoiceBioService_StreamingEnrollClient,
	cfg *voicebio.EnrollmentConfig,
	audio io.Reader,
	bufSize uint32,
) error {
	// The first message needs to be a config message, and all subsequent
	// messages must be audio messages.

	// Send the config.
	if err := stream.Send(&voicebio.StreamingEnrollRequest{
		Request: &voicebio.StreamingEnrollRequest_Config{Config: cfg},
	}); err != nil {
		// if this failed, we don't need to CloseSend
		return err
	}

	// Stream the audio.
	buf := make([]byte, bufSize)
	for {
		n, err := audio.Read(buf)
		if n > 0 {
			if err2 := stream.Send(&voicebio.StreamingEnrollRequest{
				Request: &voicebio.StreamingEnrollRequest_Audio{
					Audio: &voicebio.Audio{Data: buf[:n]},
				},
			}); err2 != nil {
				// if we couldn't Send, the stream has
				// encountered an error and we don't need to
				// CloseSend.
				return err2
			}
		}

		if err != nil {
			// err could be io.EOF, or some other error reading from
			// audio. In either case, we need to CloseSend and then
			// return: nil on io.EOF, the read error otherwise.
			if err2 := stream.CloseSend(); err2 != nil {
				return err2
			}

			if err != io.EOF {
				return err
			}

			return nil
		}
	}
}

Re-enrollment

Voiceprints can be updated and made more robust by re-enrolling them with additional audio. To do this, provide the previous voiceprint data in the EnrollmentConfig, along with the additional audio, in a new StreamingEnroll request.

Python
Go

# Connect to server ...

with open("voiceprint.bin", 'r') as f:
  voiceprint = f.read().strip()

cfg = voicebio.EnrollmentConfig(
  model_id=modelID,
  previous_voiceprint=voicebio.Voiceprint(data=voiceprint),
)

# Send audio to server ...

package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"os"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"

	voicebio "github.com/cobaltspeech/go-genproto/cobaltspeech/voicebio/v1"
)

func main() {

	// Connect to server ...

	// Reading old voiceprint data.
	data, err := os.ReadFile("voiceprint.bin")
	if err != nil {
		fmt.Printf("\nfailed to read voiceprint data: %v\n", err)
		os.Exit(1)
	}

	cfg := &voicebio.EnrollmentConfig{
		ModelId:            modelResp.Models[0].Id,
		PreviousVoiceprint: &voicebio.Voiceprint{Data: string(data)},
	}

	// Send audio to server ...
}
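
One practical detail when re-enrolling is that the very first enrollment has no previous voiceprint to load. The sketch below shows one way a client could handle both cases with a single helper. The name load_previous_voiceprint is hypothetical and not part of the VoiceBio SDK, and the sketch assumes the voiceprint was saved as text in voiceprint.bin, as in the examples above.

```python
import os
import tempfile


def load_previous_voiceprint(path):
    """Return the saved voiceprint string, or None if nothing usable is stored.

    Hypothetical helper for illustration; not part of the VoiceBio SDK.
    """
    if not os.path.exists(path):
        return None
    with open(path, "r") as f:
        data = f.read().strip()
    # Treat an empty file the same as a missing one.
    return data or None


# Demo using a temporary directory in place of real storage.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "voiceprint.bin")
    first = load_previous_voiceprint(path)  # no file has been written yet
    with open(path, "w") as f:
        f.write("example-voiceprint-data\n")
    second = load_previous_voiceprint(path)  # file now exists

print(first, second)
```

When the helper returns None, the EnrollmentConfig would be built with only the model ID; otherwise the returned string would be passed as the previous voiceprint data.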

5 - Streaming Verification

Describes how to stream audio to VoiceBio server for verification against a voiceprint.

The following example shows how to stream audio using VoiceBio’s StreamingVerify request and verify whether the audio matches the provided voiceprint. The stream can come from a file on disk or directly from a microphone in real time.

Streaming from an audio file

We support several headered file formats, including WAV, MP3, and FLAC. For more details, please see the protocol buffer specification here. For best accuracy, we recommend an uncompressed or losslessly compressed audio format such as WAV or FLAC.
The examples below use a WAV file as input. We will query the server for available models and use the first model to score and verify given audio against a given voiceprint.

Info

Voiceprints provided in StreamingVerify requests must be generated using the same or compatible model via StreamingEnroll.
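
One way a client application might guard against mixing voiceprints and models is to record the model ID next to the voiceprint at enrollment time and check it before verifying. The sketch below is an assumption about application-side bookkeeping: the JSON layout and both function names are hypothetical and not part of the VoiceBio API. It only detects an exact model ID match and makes no attempt to decide whether two different models are compatible.

```python
import json
import os
import tempfile


def save_voiceprint(path, model_id, voiceprint_data):
    """Store the voiceprint together with the ID of the model that produced it."""
    with open(path, "w") as f:
        json.dump({"model_id": model_id, "voiceprint": voiceprint_data}, f)


def load_voiceprint(path, model_id):
    """Load a voiceprint, rejecting it if it was enrolled with another model."""
    with open(path, "r") as f:
        record = json.load(f)
    if record["model_id"] != model_id:
        raise ValueError(
            f"voiceprint was enrolled with model {record['model_id']!r}, "
            f"but model {model_id!r} was requested"
        )
    return record["voiceprint"]


# Demo with placeholder model IDs and voiceprint data.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "voiceprint.json")
    save_voiceprint(path, "model-a", "example-voiceprint-data")
    loaded = load_voiceprint(path, "model-a")
    try:
        load_voiceprint(path, "model-b")
        mismatch_detected = False
    except ValueError:
        mismatch_detected = True

print(loaded, mismatch_detected)
```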

Python
Go

import grpc
import cobaltspeech.voicebio.v1.voicebio_pb2_grpc as stub
import cobaltspeech.voicebio.v1.voicebio_pb2 as voicebio

serverAddress = "localhost:2727"

# Using a channel without TLS enabled.
channel = grpc.insecure_channel(serverAddress)
client = stub.VoiceBioServiceStub(channel)

# Get server version.
versionResp = client.Version(voicebio.VersionRequest())
print(versionResp)

# Get list of models on the server.
modelResp = client.ListModels(voicebio.ListModelsRequest())

print("Models:")
for model in modelResp.models:
    print(model)

# Select a model ID from the list above. Going with the first model
# in this example.
modelID = modelResp.models[0].id

# Loading reference voiceprint.
with open("voiceprint.bin", 'r') as f:
    voiceprint = voicebio.Voiceprint(data=f.read().strip())

# Set the verification config. We don't set the audio format and let the
# server auto-detect the format from the file header.
cfg = voicebio.VerificationConfig(
    model_id=modelID,
    voiceprint=voiceprint,
)

# The first request to the server should only contain the
# configuration. Subsequent requests should contain audio
# bytes. We can write a simple generator to do this.
def stream(cfg, audio, bufferSize=1024):
    yield voicebio.StreamingVerifyRequest(config=cfg)
    
    data = audio.read(bufferSize)
    while len(data) > 0:
        yield voicebio.StreamingVerifyRequest(audio=voicebio.Audio(data=data))
        data = audio.read(bufferSize)

# Streaming audio to the server.
with open("test.wav", "rb") as audio:
    resp = client.StreamingVerify(stream(cfg, audio))

# Server returns a similarity score along with whether the score
# exceeded the server-configured threshold for being a match.
print(f"Verification Score: {resp.result.similarity_score:1.3f}, Match: {resp.result.is_match}")

package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"os"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"

	voicebio "github.com/cobaltspeech/go-genproto/cobaltspeech/voicebio/v1"
)

func main() {
	const (
		serverAddress = "localhost:2727"
	)

	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	opts := []grpc.DialOption{
		grpc.WithTransportCredentials(insecure.NewCredentials()), // Using a channel without TLS enabled.
		grpc.WithBlock(),
		grpc.WithReturnConnectionError(),
		grpc.FailOnNonTempDialError(true),
	}

	conn, err := grpc.DialContext(ctx, serverAddress, opts...)
	if err != nil {
		fmt.Printf("failed to dial gRPC connection: %v\n", err)
		os.Exit(1)
	}

	client := voicebio.NewVoiceBioServiceClient(conn)

	// Get server version.
	versionResp, err := client.Version(ctx, &voicebio.VersionRequest{})
	if err != nil {
		fmt.Printf("failed to get server version: %v\n", err)
		os.Exit(1)
	}

	fmt.Printf("%v\n", versionResp)

	// Get list of models on the server.
	modelResp, err := client.ListModels(ctx, &voicebio.ListModelsRequest{})
	if err != nil {
		fmt.Printf("failed to get model list: %v\n", err)
		os.Exit(1)
	}

	fmt.Println("Models:")
	for _, m := range modelResp.Models {
		fmt.Println(m)
	}
	fmt.Println()

	// Reading voiceprint data.
	data, err := os.ReadFile("voiceprint.bin")
	if err != nil {
		fmt.Printf("\nfailed to read voiceprint data: %v\n", err)
		os.Exit(1)
	}

	// Selecting the first model.
	cfg := &voicebio.VerificationConfig{
		ModelId:    modelResp.Models[0].Id,
		Voiceprint: &voicebio.Voiceprint{Data: string(data)},
	}

	// Opening audio file.
	audio, err := os.Open("test.wav")
	if err != nil {
		fmt.Printf("failed to open audio file: %v\n", err)
		os.Exit(1)
	}

	defer audio.Close()

	// Starting verification.
	resp, err := StreamingVerify(ctx, client, cfg, audio)
	if err != nil {
		fmt.Printf("failed to run streaming verification: %v\n", err)
		os.Exit(1)
	}

	// Server returns a similarity score along with whether the score
	// exceeded the server-configured threshold for being a match.
	fmt.Printf("Verification Score: %1.3f, Match: %v\n", resp.Result.SimilarityScore, resp.Result.IsMatch)
}

// StreamingVerify wraps the streaming API for performing speaker verification
// using the given cfg.
//
// Data is read from the given audio reader into a buffer and streamed to the
// VoiceBio server in chunks of at most streamingBufSize bytes.
//
// If any error occurs while reading the audio or sending it to the server, this
// method will immediately exit, returning that error.
func StreamingVerify(
	ctx context.Context,
	client voicebio.VoiceBioServiceClient,
	cfg *voicebio.VerificationConfig,
	audio io.Reader,
) (*voicebio.StreamingVerifyResponse, error) {
	const (
		streamingBufSize = 1024
	)

	// Creating stream.
	stream, err := client.StreamingVerify(ctx)
	if err != nil {
		return nil, err
	}

	// Sending audio.
	if err := sendAudio(stream, cfg, audio, streamingBufSize); err != nil && !errors.Is(err, io.EOF) {
		// if sendAudio encountered io.EOF, it's only a
		// notification that the stream has closed.  The actual
		// status will be obtained in the CloseAndRecv call. We
		// therefore return on non-EOF errors here.
		return nil, err
	}

	// Returning result.
	return stream.CloseAndRecv()
}

// sendAudio sends the config and audio to a stream.
func sendAudio(
	stream voicebio.VoiceBioService_StreamingVerifyClient,
	cfg *voicebio.VerificationConfig,
	audio io.Reader,
	bufSize uint32,
) error {
	// The first message needs to be a config message, and all subsequent
	// messages must be audio messages.

	// Send the config.
	if err := stream.Send(&voicebio.StreamingVerifyRequest{
		Request: &voicebio.StreamingVerifyRequest_Config{Config: cfg},
	}); err != nil {
		// if this failed, we don't need to CloseSend
		return err
	}

	// Stream the audio.
	buf := make([]byte, bufSize)
	for {
		n, err := audio.Read(buf)
		if n > 0 {
			if err2 := stream.Send(&voicebio.StreamingVerifyRequest{
				Request: &voicebio.StreamingVerifyRequest_Audio{
					Audio: &voicebio.Audio{Data: buf[:n]},
				},
			}); err2 != nil {
				// if we couldn't Send, the stream has
				// encountered an error and we don't need to
				// CloseSend.
				return err2
			}
		}

		if err != nil {
			// err could be io.EOF, or some other error reading from
			// audio. In either case, we need to CloseSend and then
			// return: nil on io.EOF, the read error otherwise.
			if err2 := stream.CloseSend(); err2 != nil {
				return err2
			}

			if err != io.EOF {
				return err
			}

			return nil
		}
	}
}

Streaming from microphone

Streaming audio from a microphone requires a reader interface that can provide audio samples recorded from the microphone; this typically requires interaction with system libraries. Another option is to use an external command-line tool such as sox to record and pipe audio into the client.
The examples below use the latter approach by using the rec command provided with sox to record and stream the audio.
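
Before the full examples, the rec invocation they rely on can be looked at in isolation. In the sketch below, -t raw selects headerless output, -r sets the sample rate, -e signed and -b 16 select 16-bit signed samples, -L selects little-endian byte order, -c 1 records a single channel, the lone - writes to standard output, and trim 0 N stops after N seconds. The helper name build_rec_command is hypothetical, and the 16000 Hz sample rate is an assumed value; the full examples take it from the selected model's attributes.

```python
import shlex


def build_rec_command(sample_rate, max_duration):
    """Build the argument list for sox's rec command.

    Records raw, signed 16-bit, little-endian, mono audio to stdout and
    stops after max_duration seconds.
    """
    cmd = (
        f"rec -t raw -r {sample_rate} -e signed -b 16 -L -c 1 - "
        f"trim 0 {max_duration}"
    )
    return shlex.split(cmd)


# 16000 Hz is an assumed sample rate for illustration only.
args = build_rec_command(16000, 10)
print(args)
```

The resulting list can be passed directly to subprocess.Popen with stdout=subprocess.PIPE, which is what the Python example does.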

Python
Go

#!/usr/bin/env python3

# This example assumes sox is installed on the system and is available
# in the system's PATH variable. Instead of opening a regular file from
# disk, we open a subprocess that executes sox's rec command to record
# audio from the system's default microphone.

import subprocess
import grpc
import cobaltspeech.voicebio.v1.voicebio_pb2_grpc as stub
import cobaltspeech.voicebio.v1.voicebio_pb2 as voicebio

serverAddress = "localhost:2727"

# Using a channel without TLS enabled.
channel = grpc.insecure_channel(serverAddress)
client = stub.VoiceBioServiceStub(channel)

# Get server version.
versionResp = client.Version(voicebio.VersionRequest())
print(versionResp)

# Get list of models on the server.
modelResp = client.ListModels(voicebio.ListModelsRequest())

print("Models:")
for model in modelResp.models:
    print(model)

# Select a model ID from the list above. Going with the first model
# in this example.
m = modelResp.models[0]
modelID = m.id

# Loading reference voiceprint.
with open("voiceprint.bin", 'r') as f:
    voiceprint = voicebio.Voiceprint(data=f.read().strip())

# Setting audio format to be raw 16-bit signed little endian audio samples
# recorded at the sample rate expected by the model.
cfg = voicebio.VerificationConfig(
    model_id=modelID,
    voiceprint=voiceprint,
    audio_format=voicebio.AudioFormat(
      audio_format_raw=voicebio.AudioFormatRAW(
        encoding="AUDIO_ENCODING_SIGNED",
        bit_depth=16,
        byte_order="BYTE_ORDER_LITTLE_ENDIAN",
        sample_rate=m.attributes.sample_rate,
        channels=1,
      )
    ),
)

# Open microphone stream using sox's rec command and record
# audio using the config specified above for *10 seconds*.
maxDuration = 10
cmd = f"rec -t raw -r {m.attributes.sample_rate} -e signed -b 16 -L -c 1 - trim 0 {maxDuration}"
mic = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE)
audio = mic.stdout

try:
    _ = audio.read(1024) # Trying to read some bytes as sanity check.
except Exception as err:
    print(f"[ERROR] failed to read audio from mic stream: {err}")

print(f"\n[INFO] recording {maxDuration} seconds of audio from microphone ... \n")

# The first request to the server should only contain the
# verification configuration. Subsequent requests should contain
# audio bytes. We can write a simple generator to do this.
def stream(cfg, audio, bufferSize=1024):
    yield voicebio.StreamingVerifyRequest(config=cfg)

    data = audio.read(bufferSize)
    while len(data) > 0:
        yield voicebio.StreamingVerifyRequest(audio=voicebio.Audio(data=data))
        data = audio.read(bufferSize)

# Streaming audio to the server.
resp = client.StreamingVerify(stream(cfg, audio))

# Server returns a similarity score along with whether the score
# exceeded the server-configured threshold for being a match.
print(f"Verification Score: {resp.result.similarity_score:1.3f}, Match: {resp.result.is_match}")

audio.close()
mic.kill()

package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"os"
	"os/exec"
	"strings"

	"golang.org/x/sync/errgroup"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"

	voicebio "github.com/cobaltspeech/go-genproto/cobaltspeech/voicebio/v1"
)

func main() {
	const (
		serverAddress = "localhost:2727"
	)

	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	opts := []grpc.DialOption{
		grpc.WithTransportCredentials(insecure.NewCredentials()), // Using a channel without TLS enabled.
		grpc.WithBlock(),
		grpc.WithReturnConnectionError(),
		grpc.FailOnNonTempDialError(true),
	}

	conn, err := grpc.DialContext(ctx, serverAddress, opts...)
	if err != nil {
		fmt.Printf("failed to dial gRPC connection: %v\n", err)
		os.Exit(1)
	}

	client := voicebio.NewVoiceBioServiceClient(conn)

	// Get server version.
	versionResp, err := client.Version(ctx, &voicebio.VersionRequest{})
	if err != nil {
		fmt.Printf("failed to get server version: %v\n", err)
		os.Exit(1)
	}

	fmt.Printf("%v\n", versionResp)

	// Get list of models on the server.
	modelResp, err := client.ListModels(ctx, &voicebio.ListModelsRequest{})
	if err != nil {
		fmt.Printf("failed to get model list: %v\n", err)
		os.Exit(1)
	}

	fmt.Println("Models:")
	for _, m := range modelResp.Models {
		fmt.Println(m)
	}
	fmt.Println()

	// Selecting first model.
	m := modelResp.Models[0]

	// Reading voiceprint data.
	data, err := os.ReadFile("voiceprint.bin")
	if err != nil {
		fmt.Printf("\nfailed to read voiceprint data: %v\n", err)
		os.Exit(1)
	}

	// Setting audio format to be raw 16-bit signed little endian audio samples
	// recorded at the sample rate expected by the model.
	cfg := &voicebio.VerificationConfig{
		ModelId:    m.Id,
		Voiceprint: &voicebio.Voiceprint{Data: string(data)},
		AudioFormat: &voicebio.AudioFormat{AudioFormat: &voicebio.AudioFormat_AudioFormatRaw{
			AudioFormatRaw: &voicebio.AudioFormatRAW{
				Encoding:   voicebio.AudioEncoding_AUDIO_ENCODING_SIGNED,
				SampleRate: m.Attributes.SampleRate,
				BitDepth:   16,
				ByteOrder:  voicebio.ByteOrder_BYTE_ORDER_LITTLE_ENDIAN,
				Channels:   1,
			},
		},
		},
	}

	// Open microphone stream using sox's rec command and record
	// audio using the config specified above for *10 seconds*.
	maxDuration := 10
	args := fmt.Sprintf("-t raw -r %d -e signed -b 16 -L -c 1 - trim 0 %d", m.Attributes.SampleRate, maxDuration)
	cmd := exec.CommandContext(ctx, "rec", strings.Fields(args)...)
	cmd.Stderr = os.Stderr

	audio, err := cmd.StdoutPipe()
	if err != nil {
		fmt.Printf("failed to open microphone stream: %v\n", err)
		os.Exit(1)
	}

	// Starting routines to record from microphone and stream to server
	// using an errgroup.Group that returns if either one encounters an error.
	eg, ctx := errgroup.WithContext(ctx)

	eg.Go(func() error {
		fmt.Printf("\n[INFO] recording %d seconds from microphone \n", maxDuration)

		if err := cmd.Run(); err != nil {
			return fmt.Errorf("record from microphone: %w", err)
		}

		return nil
	})

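	// Starting verification.
	resp, err := StreamingVerify(ctx, client, cfg, audio)
	if err != nil {
		fmt.Printf("failed to run streaming verification: %v\n", err)
		os.Exit(1)
	}

	// Waiting for the recording routine to finish and checking its error.
	if err := eg.Wait(); err != nil {
		fmt.Printf("%v\n", err)
		os.Exit(1)
	}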

	// Server returns a similarity score along with whether the score
	// exceeded the server-configured threshold for being a match.
	fmt.Printf("Verification Score: %1.3f, Match: %v\n", resp.Result.SimilarityScore, resp.Result.IsMatch)
}

// StreamingVerify wraps the streaming API for performing speaker verification
// using the given cfg.
//
// Data is read from the given audio reader into a buffer and streamed to the
// VoiceBio server in chunks of at most streamingBufSize bytes.
//
// If any error occurs while reading the audio or sending it to the server, this
// method will immediately exit, returning that error.
func StreamingVerify(
	ctx context.Context,
	client voicebio.VoiceBioServiceClient,
	cfg *voicebio.VerificationConfig,
	audio io.Reader,
) (*voicebio.StreamingVerifyResponse, error) {
	const (
		streamingBufSize = 1024
	)

	// Creating stream.
	stream, err := client.StreamingVerify(ctx)
	if err != nil {
		return nil, err
	}

	// Sending audio.
	if err := sendAudio(stream, cfg, audio, streamingBufSize); err != nil && !errors.Is(err, io.EOF) {
		// if sendAudio encountered io.EOF, it's only a
		// notification that the stream has closed.  The actual
		// status will be obtained in the CloseAndRecv call. We
		// therefore return on non-EOF errors here.
		return nil, err
	}

	// Returning result.
	return stream.CloseAndRecv()
}

// sendAudio sends the config and audio to a stream.
func sendAudio(
	stream voicebio.VoiceBioService_StreamingVerifyClient,
	cfg *voicebio.VerificationConfig,
	audio io.Reader,
	bufSize uint32,
) error {
	// The first message needs to be a config message, and all subsequent
	// messages must be audio messages.

	// Send the config.
	if err := stream.Send(&voicebio.StreamingVerifyRequest{
		Request: &voicebio.StreamingVerifyRequest_Config{Config: cfg},
	}); err != nil {
		// if this failed, we don't need to CloseSend
		return err
	}

	// Stream the audio.
	buf := make([]byte, bufSize)
	for {
		n, err := audio.Read(buf)
		if n > 0 {
			if err2 := stream.Send(&voicebio.StreamingVerifyRequest{
				Request: &voicebio.StreamingVerifyRequest_Audio{
					Audio: &voicebio.Audio{Data: buf[:n]},
				},
			}); err2 != nil {
				// if we couldn't Send, the stream has
				// encountered an error and we don't need to
				// CloseSend.
				return err2
			}
		}

		if err != nil {
			// err could be io.EOF, or some other error reading from
			// audio. In either case, we need to CloseSend and then
			// return: nil on io.EOF, the read error otherwise.
			if err2 := stream.CloseSend(); err2 != nil {
				return err2
			}

			if err != io.EOF {
				return err
			}

			return nil
		}
	}
}

6 - Streaming Identification

Describes how to stream audio to VoiceBio server for identification using given voiceprints.

The following example shows how to stream audio using VoiceBio’s StreamingIdentify request and identify the speaker in the audio using provided voiceprints. The stream can come from a file on disk or directly from a microphone in real time.

Info

If you want to compare against a large number of voiceprints in multiple batches, it will be more efficient to extract the voiceprint from the audio once using the StreamingEnroll request, and then compare voiceprints directly without audio via the CompareVoiceprints request.
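
The batching idea can be sketched as follows. This only illustrates the control flow: compare_batch is a hypothetical stand-in for a call to CompareVoiceprints, and its signature and return value here (a list of similarity scores, one per reference voiceprint) are assumptions rather than the actual API. The Comparing Voiceprints section describes the real request.

```python
def batched(items, batch_size):
    """Yield successive slices of items with at most batch_size elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def best_match(probe, references, compare_batch, batch_size=100):
    """Return (index, score) of the highest-scoring reference, or (-1, None).

    compare_batch(probe, batch) is a hypothetical callable that returns one
    similarity score per reference in the batch.
    """
    best_index, best_score = -1, None
    offset = 0
    for batch in batched(references, batch_size):
        scores = compare_batch(probe, batch)
        for i, score in enumerate(scores):
            if best_score is None or score > best_score:
                best_index, best_score = offset + i, score
        offset += len(batch)
    return best_index, best_score


# Toy comparison for demonstration only: 1.0 on equality, 0.0 otherwise.
def fake_compare(probe, batch):
    return [1.0 if ref == probe else 0.0 for ref in batch]


refs = [f"vp-{i}" for i in range(250)]
result = best_match("vp-137", refs, fake_compare, batch_size=100)
print(result)
```

Note that this sketch returns the highest score regardless of any match threshold; a real client would also check whether the server reports that score as a match.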

Streaming from an audio file

We support several headered file formats, including WAV, MP3, and FLAC. For more details, please see the protocol buffer specification here. For best accuracy, we recommend an uncompressed or losslessly compressed audio format such as WAV or FLAC.
The examples below use a WAV file as input. We will query the server for available models and use the first model to score and identify the given audio against a given set of voiceprints.

Info

Voiceprints provided in StreamingIdentify requests must be generated using the same or compatible model via StreamingEnroll.

Python
Go

import grpc
import cobaltspeech.voicebio.v1.voicebio_pb2_grpc as stub
import cobaltspeech.voicebio.v1.voicebio_pb2 as voicebio

serverAddress = "localhost:2727"

# Using a channel without TLS enabled.
channel = grpc.insecure_channel(serverAddress)
client = stub.VoiceBioServiceStub(channel)

# Get server version.
versionResp = client.Version(voicebio.VersionRequest())
print(versionResp)

# Get list of models on the server.
modelResp = client.ListModels(voicebio.ListModelsRequest())

print("Models:")
for model in modelResp.models:
    print(model)

# Select a model ID from the list above. Going with the first model
# in this example.
modelID = modelResp.models[0].id

# Loading reference voiceprints.
voiceprints = []
for p in ["user1.bin", "user2.bin", "user3.bin"]:
    with open(p, 'r') as f:
        voiceprints.append(voicebio.Voiceprint(data=f.read().strip()))

# Set the identification config. We don't set the audio format and let the
# server auto-detect the format from the file header.
cfg = voicebio.IdentificationConfig(
    model_id=modelID,
    voiceprints=voiceprints,
)

# The first request to the server should only contain the
# configuration. Subsequent requests should contain audio
# bytes. We can write a simple generator to do this.
def stream(cfg, audio, bufferSize=1024):
    yield voicebio.StreamingIdentifyRequest(config=cfg)
    
    data = audio.read(bufferSize)
    while len(data) > 0:
        yield voicebio.StreamingIdentifyRequest(audio=voicebio.Audio(data=data))
        data = audio.read(bufferSize)

# Streaming audio to the server.
with open("test.wav", "rb") as audio:
    result = client.StreamingIdentify(stream(cfg, audio))

# Server returns the index of the voiceprint that matches the best, a similarity
# score for each voiceprint along with whether the score exceeded the server-configured
# threshold for being a match.
#
# If none of the voiceprints were a good match, the best match index will be negative.
matched = "❌ No Match found"
if result.best_match_index >= 0:
    best_score = result.voiceprint_comparison_results[result.best_match_index].similarity_score
    matched = f"✅ Match found: Index: {result.best_match_index}, Score: {best_score:1.3f}"

print(f"\nIdentification Result:\n")

print("Scores:")
for i, r in enumerate(result.voiceprint_comparison_results):
    print(f"Index: {i}, Score: {r.similarity_score:1.3f}, IsMatch: {r.is_match}")

print(f"\n{matched}")

package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"os"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"

	voicebio "github.com/cobaltspeech/go-genproto/cobaltspeech/voicebio/v1"
)

func main() {
	const (
		serverAddress = "localhost:2727"
	)

	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	opts := []grpc.DialOption{
		grpc.WithTransportCredentials(insecure.NewCredentials()), // Using a channel without TLS enabled.
		grpc.WithBlock(),
		grpc.WithReturnConnectionError(),
		grpc.FailOnNonTempDialError(true),
	}

	conn, err := grpc.DialContext(ctx, serverAddress, opts...)
	if err != nil {
		fmt.Printf("failed to dial gRPC connection: %v\n", err)
		os.Exit(1)
	}

	client := voicebio.NewVoiceBioServiceClient(conn)

	// Get server version.
	versionResp, err := client.Version(ctx, &voicebio.VersionRequest{})
	if err != nil {
		fmt.Printf("failed to get server version: %v\n", err)
		os.Exit(1)
	}

	fmt.Printf("%v\n", versionResp)

	// Get list of models on the server.
	modelResp, err := client.ListModels(ctx, &voicebio.ListModelsRequest{})
	if err != nil {
		fmt.Printf("failed to get model list: %v\n", err)
		os.Exit(1)
	}

	fmt.Println("Models:")
	for _, m := range modelResp.Models {
		fmt.Println(m)
	}
	fmt.Println()

	// Reading voiceprint data.
	voiceprints := make([]*voicebio.Voiceprint, 0)

	for i, p := range []string{"user1.bin", "user2.bin", "user3.bin"} {
		data, err := os.ReadFile(p)
		if err != nil {
			fmt.Printf("\nfailed to read voiceprint[%d] data: %v\n", i, err)
			os.Exit(1)
		}

		voiceprints = append(voiceprints, &voicebio.Voiceprint{Data: string(data)})
	}

	// Selecting the first model.
	cfg := &voicebio.IdentificationConfig{
		ModelId:     modelResp.Models[0].Id,
		Voiceprints: voiceprints,
	}

	// Opening audio file.
	audio, err := os.Open("test.wav")
	if err != nil {
		fmt.Printf("failed to open audio file: %v\n", err)
		os.Exit(1)
	}

	defer audio.Close()

	// Starting identification.
	result, err := StreamingIdentify(ctx, client, cfg, audio)
	if err != nil {
		fmt.Printf("failed to run streaming identification: %v\n", err)
		os.Exit(1)
	}

	// Server returns the index of the voiceprint that matches the best, a similarity
	// score for each voiceprint along with whether the score exceeded the server-configured
	// threshold for being a match.
	//
	// If none of the voiceprints were a good match, the best match index will be negative.
	matched := "❌ No Match found"
	if result.BestMatchIndex >= 0 {
		bestScore := result.VoiceprintComparisonResults[result.BestMatchIndex].SimilarityScore
		matched = fmt.Sprintf("✅ Match found: Index: %d, Score: %1.3f", result.BestMatchIndex, bestScore)
	}

	fmt.Printf("\nIdentification Result:\n")

	fmt.Printf("Scores:\n")
	for i, r := range result.VoiceprintComparisonResults {
		fmt.Printf("Index: %d, Score: %1.3f, IsMatch: %v\n", i, r.SimilarityScore, r.IsMatch)
	}

	fmt.Printf("\n%s\n", matched)
}

// StreamingIdentify wraps the streaming API for performing speaker identification
// using the given cfg.
//
// Data is read from the given audio reader into a buffer and streamed to the
// VoiceBio server in chunks of at most streamingBufSize bytes.
//
// If any error occurs while reading the audio or sending it to the server, this
// method will immediately exit, returning that error.
func StreamingIdentify(
	ctx context.Context,
	client voicebio.VoiceBioServiceClient,
	cfg *voicebio.IdentificationConfig,
	audio io.Reader,
) (*voicebio.StreamingIdentifyResponse, error) {
	const (
		streamingBufSize = 1024
	)

	// Creating stream.
	stream, err := client.StreamingIdentify(ctx)
	if err != nil {
		return nil, err
	}

	// Sending audio.
	if err := sendAudio(stream, cfg, audio, streamingBufSize); err != nil && !errors.Is(err, io.EOF) {
		// if sendAudio encountered io.EOF, it's only a
		// notification that the stream has closed.  The actual
		// status will be obtained in the CloseAndRecv call. We
		// therefore return on non-EOF errors here.
		return nil, err
	}

	// Returning result.
	return stream.CloseAndRecv()
}

// sendAudio sends the config and audio to a stream.
func sendAudio(
	stream voicebio.VoiceBioService_StreamingIdentifyClient,
	cfg *voicebio.IdentificationConfig,
	audio io.Reader,
	bufSize uint32,
) error {
	// The first message needs to be a config message, and all subsequent
	// messages must be audio messages.

	// Send the config.
	if err := stream.Send(&voicebio.StreamingIdentifyRequest{
		Request: &voicebio.StreamingIdentifyRequest_Config{Config: cfg},
	}); err != nil {
		// if this failed, we don't need to CloseSend
		return err
	}

	// Stream the audio.
	buf := make([]byte, bufSize)
	for {
		n, err := audio.Read(buf)
		if n > 0 {
			if err2 := stream.Send(&voicebio.StreamingIdentifyRequest{
				Request: &voicebio.StreamingIdentifyRequest_Audio{
					Audio: &voicebio.Audio{Data: buf[:n]},
				},
			}); err2 != nil {
				// if we couldn't Send, the stream has
				// encountered an error and we don't need to
				// CloseSend.
				return err2
			}
		}

		if err != nil {
			// err could be io.EOF, or some other error reading from
			// audio. In either case, we need to CloseSend and then
			// return: nil on io.EOF, the read error otherwise.
			if err2 := stream.CloseSend(); err2 != nil {
				return err2
			}

			if err != io.EOF {
				return err
			}

			return nil
		}
	}
}

Streaming from microphone

Streaming audio from a microphone requires a reader interface that can provide audio samples recorded from the microphone; this typically requires interaction with system libraries. Another option is to use an external command-line tool such as sox to record and pipe audio into the client.
The examples below use the latter approach by using the rec command provided with sox to record and stream the audio.
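
Because the microphone examples send raw audio, the amount of data involved follows directly from the audio format: bytes per second equals sample rate times bytes per sample times channel count. The sketch below works through that arithmetic for the 16-bit mono format and the 1024-byte buffer used in the examples. The 16000 Hz sample rate is an assumption for illustration; the actual value comes from the selected model's attributes, and the function names are hypothetical.

```python
def bytes_per_second(sample_rate, bit_depth=16, channels=1):
    """Data rate of raw PCM audio in bytes per second."""
    return sample_rate * (bit_depth // 8) * channels


def chunk_duration_ms(chunk_bytes, sample_rate, bit_depth=16, channels=1):
    """Duration of audio, in milliseconds, carried by one chunk of raw PCM."""
    return 1000.0 * chunk_bytes / bytes_per_second(sample_rate, bit_depth, channels)


sample_rate = 16000  # assumed; read it from the model attributes in practice
rate = bytes_per_second(sample_rate)             # 16000 * 2 * 1 = 32000
total = rate * 10                                # bytes in a 10 second recording
chunk_ms = chunk_duration_ms(1024, sample_rate)  # audio per 1024-byte message

print(rate, total, chunk_ms)
```

Under these assumptions each streamed message carries 32 ms of audio, and a 10 second recording amounts to 320000 bytes.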

Python
Go

#!/usr/bin/env python3

# This example assumes sox is installed on the system and is available
# in the system's PATH variable. Instead of opening a regular file from
# disk, we open a subprocess that executes sox's rec command to record
# audio from the system's default microphone.

import subprocess
import grpc
import cobaltspeech.voicebio.v1.voicebio_pb2_grpc as stub
import cobaltspeech.voicebio.v1.voicebio_pb2 as voicebio

serverAddress = "localhost:2727"

# Using a channel without TLS enabled.
channel = grpc.insecure_channel(serverAddress)
client = stub.VoiceBioServiceStub(channel)

# Get server version.
versionResp = client.Version(voicebio.VersionRequest())
print(versionResp)

# Get list of models on the server.
modelResp = client.ListModels(voicebio.ListModelsRequest())

print("Models:")
for model in modelResp.models:
    print(model)

# Select a model ID from the list above. Going with the first model
# in this example.
m = modelResp.models[0]
modelID = m.id

# Loading reference voiceprints.
voiceprints = []
for p in ["user1.bin", "user2.bin", "user3.bin"]:
    with open(p, 'r') as f:
        voiceprints.append(voicebio.Voiceprint(data=f.read().strip()))

# Setting audio format to be raw 16-bit signed little endian audio samples
# recorded at the sample rate expected by the model.
cfg = voicebio.IdentificationConfig(
    model_id=modelID,
    voiceprints=voiceprints,
    audio_format=voicebio.AudioFormat(
      audio_format_raw=voicebio.AudioFormatRAW(
        encoding="AUDIO_ENCODING_SIGNED",
        bit_depth=16,
        byte_order="BYTE_ORDER_LITTLE_ENDIAN",
        sample_rate=m.attributes.sample_rate,
        channels=1,
      )
    ),
)

# Open microphone stream using sox's rec command and record
# audio using the config specified above for *10 seconds*.
maxDuration = 10
cmd = f"rec -t raw -r {m.attributes.sample_rate} -e signed -b 16 -L -c 1 - trim 0 {maxDuration}"
mic = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE)
audio = mic.stdout

try:
    _ = audio.read(1024) # Trying to read some bytes as sanity check.
except Exception as err:
    print(f"[ERROR] failed to read audio from mic stream: {err}")

print(f"\n[INFO] recording {maxDuration} seconds of audio from microphone ... \n")

# The first request to the server should only contain the
# identification configuration. Subsequent requests should contain
# audio bytes. We can write a simple generator to do this.
def stream(cfg, audio, bufferSize=1024):
    yield voicebio.StreamingIdentifyRequest(config=cfg)

    data = audio.read(bufferSize)
    while len(data) > 0:
        yield voicebio.StreamingIdentifyRequest(audio=voicebio.Audio(data=data))
        data = audio.read(bufferSize)

# Streaming audio to the server.
result = client.StreamingIdentify(stream(cfg, audio))

# Server returns the index of the voiceprint that matches the best, a similarity
# score for each voiceprint along with whether the score exceeded the server-configured
# threshold for being a match.
#
# If none of the voiceprints were a good match, the best match index will be negative.
matched = "❌ No Match found"
if result.best_match_index >= 0:
    best_score = result.voiceprint_comparison_results[result.best_match_index].similarity_score
    matched = f"✅ Match found: Index: {result.best_match_index}, Score: {best_score:1.3f}"

print(f"\nIdentification Result:\n")

print("Scores:")
for i, r in enumerate(result.voiceprint_comparison_results):
    print(f"Index: {i}, Score: {r.similarity_score:1.3f}, IsMatch: {r.is_match}")

print(f"\n{matched}")

audio.close()
mic.kill()

package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"os"
	"os/exec"
	"strings"

	"golang.org/x/sync/errgroup"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"

	voicebio "github.com/cobaltspeech/go-genproto/cobaltspeech/voicebio/v1"
)

func main() {
	const (
		serverAddress = "localhost:2727"
	)

	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	opts := []grpc.DialOption{
		grpc.WithTransportCredentials(insecure.NewCredentials()), // Using a channel without TLS enabled.
		grpc.WithBlock(),
		grpc.WithReturnConnectionError(),
		grpc.FailOnNonTempDialError(true),
	}

	conn, err := grpc.DialContext(ctx, serverAddress, opts...)
	if err != nil {
		fmt.Printf("failed to dial gRPC connection: %v\n", err)
		os.Exit(1)
	}

	client := voicebio.NewVoiceBioServiceClient(conn)

	// Get server version.
	versionResp, err := client.Version(ctx, &voicebio.VersionRequest{})
	if err != nil {
		fmt.Printf("failed to get server version: %v\n", err)
		os.Exit(1)
	}

	fmt.Printf("%v\n", versionResp)

	// Get list of models on the server.
	modelResp, err := client.ListModels(ctx, &voicebio.ListModelsRequest{})
	if err != nil {
		fmt.Printf("failed to get model list: %v\n", err)
		os.Exit(1)
	}

	fmt.Println("Models:")
	for _, m := range modelResp.Models {
		fmt.Println(m)
	}
	fmt.Println()

	// Selecting first model.
	m := modelResp.Models[0]

	// Reading voiceprint data.
	voiceprints := make([]*voicebio.Voiceprint, 0)

	for i, p := range []string{"user1.bin", "user2.bin", "user3.bin"} {
		data, err := os.ReadFile(p)
		if err != nil {
			fmt.Printf("\nfailed to read voiceprint[%d] data: %v\n", i, err)
			os.Exit(1)
		}

		voiceprints = append(voiceprints, &voicebio.Voiceprint{Data: string(data)})
	}

	// Setting audio format to be raw 16-bit signed little endian audio samples
	// recorded at the sample rate expected by the model.
	cfg := &voicebio.IdentificationConfig{
		ModelId:     m.Id,
		Voiceprints: voiceprints,
		AudioFormat: &voicebio.AudioFormat{
			AudioFormat: &voicebio.AudioFormat_AudioFormatRaw{
				AudioFormatRaw: &voicebio.AudioFormatRAW{
					Encoding:   voicebio.AudioEncoding_AUDIO_ENCODING_SIGNED,
					SampleRate: m.Attributes.SampleRate,
					BitDepth:   16,
					ByteOrder:  voicebio.ByteOrder_BYTE_ORDER_LITTLE_ENDIAN,
					Channels:   1,
				},
			},
		},
	}

	// Open microphone stream using sox's rec command and record
	// audio using the config specified above for *10 seconds*.
	maxDuration := 10
	args := fmt.Sprintf("-t raw -r %d -e signed -b 16 -L -c 1 - trim 0 %d", m.Attributes.SampleRate, maxDuration)
	cmd := exec.CommandContext(ctx, "rec", strings.Fields(args)...)
	cmd.Stderr = os.Stderr

	audio, err := cmd.StdoutPipe()
	if err != nil {
		fmt.Printf("failed to open microphone stream: %v\n", err)
		os.Exit(1)
	}

	// Starting routines to record from microphone and stream to server
	// using an errgroup.Group that returns if either one encounters an error.
	eg, ctx := errgroup.WithContext(ctx)

	eg.Go(func() error {
		fmt.Printf("\n[INFO] recording %d seconds from microphone \n", maxDuration)

		if err := cmd.Run(); err != nil {
			return fmt.Errorf("record from microphone: %w", err)
		}

		return nil
	})

	// Starting identification.
	result, err := StreamingIdentify(ctx, client, cfg, audio)
	if err != nil {
		fmt.Printf("failed to run streaming identification: %v\n", err)
		os.Exit(1)
	}

	// Server returns the index of the voiceprint that matches the best, a similarity
	// score for each voiceprint along with whether the score exceeded the server-configured
	// threshold for being a match.
	//
	// If none of the voiceprints were a good match, the best match index will be negative.
	matched := "❌ No Match found"
	if result.BestMatchIndex >= 0 {
		bestScore := result.VoiceprintComparisonResults[result.BestMatchIndex].SimilarityScore
		matched = fmt.Sprintf("✅ Match found: Index: %d, Score: %1.3f", result.BestMatchIndex, bestScore)
	}

	fmt.Printf("\nIdentification Result:\n")

	fmt.Printf("Scores:\n")
	for i, r := range result.VoiceprintComparisonResults {
		fmt.Printf("Index: %d, Score: %1.3f, IsMatch: %v\n", i, r.SimilarityScore, r.IsMatch)
	}

	fmt.Printf("\n%s\n", matched)
}

// StreamingIdentify wraps the streaming API for performing speaker identification
// using the given cfg.
//
// Data is read from the given audio reader into a fixed-size buffer and
// streamed to the VoiceBio server.
//
// If any error occurs while reading the audio or sending it to the server, this
// method will immediately exit, returning that error.
func StreamingIdentify(
	ctx context.Context,
	client voicebio.VoiceBioServiceClient,
	cfg *voicebio.IdentificationConfig,
	audio io.Reader,
) (*voicebio.StreamingIdentifyResponse, error) {
	const (
		streamingBufSize = 1024
	)

	// Creating stream.
	stream, err := client.StreamingIdentify(ctx)
	if err != nil {
		return nil, err
	}

	// Sending audio.
	if err := sendAudio(stream, cfg, audio, streamingBufSize); err != nil && !errors.Is(err, io.EOF) {
		// if sendAudio encountered io.EOF, it's only a
		// notification that the stream has closed.  The actual
		// status will be obtained in the CloseAndRecv call. We
		// therefore return on non-EOF errors here.
		return nil, err
	}

	// Returning result.
	return stream.CloseAndRecv()
}

// sendAudio sends the config and audio to a stream.
func sendAudio(
	stream voicebio.VoiceBioService_StreamingIdentifyClient,
	cfg *voicebio.IdentificationConfig,
	audio io.Reader,
	bufSize uint32,
) error {
	// The first message needs to be a config message, and all subsequent
	// messages must be audio messages.

	// Send the config.
	if err := stream.Send(&voicebio.StreamingIdentifyRequest{
		Request: &voicebio.StreamingIdentifyRequest_Config{Config: cfg},
	}); err != nil {
		// if this failed, we don't need to CloseSend
		return err
	}

	// Stream the audio.
	buf := make([]byte, bufSize)
	for {
		n, err := audio.Read(buf)
		if n > 0 {
			if err2 := stream.Send(&voicebio.StreamingIdentifyRequest{
				Request: &voicebio.StreamingIdentifyRequest_Audio{
					Audio: &voicebio.Audio{Data: buf[:n]},
				},
			}); err2 != nil {
				// if we couldn't Send, the stream has
				// encountered an error and we don't need to
				// CloseSend.
				return err2
			}
		}

		if err != nil {
			// err could be io.EOF, or some other error reading from
			// audio.  In either case, we need to CloseSend, and then
			// return any non-EOF read error to the caller.
			if err2 := stream.CloseSend(); err2 != nil {
				return err2
			}

			if err != io.EOF {
				return err
			}

			return nil
		}
	}
}

7 - Comparing Voiceprints

Describes how to compare pre-extracted voiceprints using VoiceBio’s CompareVoiceprints API.

The CompareVoiceprints endpoint allows the user to compare pre-extracted voiceprints and get similarity scores and match results without needing to send audio data.
This is useful in cases where the user wants to compare a given voiceprint against a large number of other voiceprints, and sending audio data for each comparison would be inefficient. The client can enroll the voiceprint once using the StreamingEnroll method, and then use this method to compare it against a large number of other voiceprints in batches.
The following example shows how to compare pre-extracted voiceprints using VoiceBio’s CompareVoiceprints API, without streaming audio. The voiceprints can be loaded from files on disk or obtained from previous enrollment sessions.

Info

Voiceprints provided in CompareVoiceprints requests must be generated using the same or compatible model via StreamingEnroll.

Python
Go

import grpc
import cobaltspeech.voicebio.v1.voicebio_pb2_grpc as stub
import cobaltspeech.voicebio.v1.voicebio_pb2 as voicebio

serverAddress = "localhost:2727"

# Using a channel without TLS enabled.
channel = grpc.insecure_channel(serverAddress)
client = stub.VoiceBioServiceStub(channel)

# Get server version.
versionResp = client.Version(voicebio.VersionRequest())
print(versionResp)

# Get list of models on the server.
modelResp = client.ListModels(voicebio.ListModelsRequest())

print("Models:")
for model in modelResp.models:
    print(model)

# Select a model ID from the list above. Going with the first model
# in this example. The model ID should be the same as the one used to
# generate the voiceprints being compared.
modelID = modelResp.models[0].id

# Loading reference voiceprints.
reference_voiceprints = []
for p in ["user1.bin", "user2.bin", "user3.bin"]:
    with open(p, 'r') as f:
        reference_voiceprints.append(voicebio.Voiceprint(data=f.read().strip()))

# Load the target voiceprint that we want to compare against the reference voiceprints.
with open("unknown.bin", 'r') as f:
    target_voiceprint = voicebio.Voiceprint(data=f.read().strip())

# Set the comparison config.
req = voicebio.CompareVoiceprintsRequest(
    model_id=modelID,
    target_voiceprint=target_voiceprint,
    reference_voiceprints=reference_voiceprints,
)

# Compare voiceprints.
result = client.CompareVoiceprints(req)

# Server returns the index of the voiceprint that matches the best, a similarity
# score for each voiceprint along with whether the score exceeded the server-configured
# threshold for being a match.
#
# If none of the voiceprints were a good match, the best match index will be negative.
matched = "❌ No Match found"
if result.best_match_index >= 0:
    best_score = result.voiceprint_comparison_results[result.best_match_index].similarity_score
    matched = f"✅ Match found: Index: {result.best_match_index}, Score: {best_score:1.3f}"

print(f"\nComparison Result:\n")

print("Scores:")
for i, r in enumerate(result.voiceprint_comparison_results):
    print(f"Index: {i}, Score: {r.similarity_score:1.3f}, IsMatch: {r.is_match}")

print(f"\n{matched}")

package main

import (
	"context"
	"fmt"
	"os"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"

	voicebio "github.com/cobaltspeech/go-genproto/cobaltspeech/voicebio/v1"
)

func main() {
	const (
		serverAddress = "localhost:2727"
	)

	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	opts := []grpc.DialOption{
		grpc.WithTransportCredentials(insecure.NewCredentials()), // Using a channel without TLS enabled.
		grpc.WithBlock(),
		grpc.WithReturnConnectionError(),
		grpc.FailOnNonTempDialError(true),
	}

	conn, err := grpc.DialContext(ctx, serverAddress, opts...)
	if err != nil {
		fmt.Printf("failed to dial gRPC connection: %v\n", err)
		os.Exit(1)
	}

	client := voicebio.NewVoiceBioServiceClient(conn)

	// Get server version.
	versionResp, err := client.Version(ctx, &voicebio.VersionRequest{})
	if err != nil {
		fmt.Printf("failed to get server version: %v\n", err)
		os.Exit(1)
	}

	fmt.Printf("%v\n", versionResp)

	// Get list of models on the server.
	modelResp, err := client.ListModels(ctx, &voicebio.ListModelsRequest{})
	if err != nil {
		fmt.Printf("failed to get model list: %v\n", err)
		os.Exit(1)
	}

	fmt.Println("Models:")
	for _, m := range modelResp.Models {
		fmt.Println(m)
	}
	fmt.Println()

	// Reading voiceprint data.
	referenceVoiceprints := make([]*voicebio.Voiceprint, 0)

	for i, p := range []string{"user1.bin", "user2.bin", "user3.bin"} {
		data, err := os.ReadFile(p)
		if err != nil {
			fmt.Printf("\nfailed to read voiceprint[%d] data: %v\n", i, err)
			os.Exit(1)
		}

		referenceVoiceprints = append(referenceVoiceprints, &voicebio.Voiceprint{Data: string(data)})
	}

	// Load the target voiceprint that we want to compare against the reference voiceprints.
	data, err := os.ReadFile("unknown.bin")
	if err != nil {
		fmt.Printf("failed to read target voiceprint data: %v\n", err)
		os.Exit(1)
	}

	targetVoiceprint := &voicebio.Voiceprint{Data: string(data)}

	// Selecting the first model. The model ID should be the same as the one used to generate the
	// voiceprints being compared.
	req := &voicebio.CompareVoiceprintsRequest{
		ModelId:              modelResp.Models[0].Id,
		TargetVoiceprint:     targetVoiceprint,
		ReferenceVoiceprints: referenceVoiceprints,
	}

	// Compare voiceprints.
	result, err := client.CompareVoiceprints(ctx, req)
	if err != nil {
		fmt.Printf("failed to compare voiceprints: %v\n", err)
		os.Exit(1)
	}

	// Server returns the index of the voiceprint that matches the best, a similarity
	// score for each voiceprint along with whether the score exceeded the server-configured
	// threshold for being a match.
	//
	// If none of the voiceprints were a good match, the best match index will be negative.
	matched := "❌ No Match found"
	if result.BestMatchIndex >= 0 {
		bestScore := result.VoiceprintComparisonResults[result.BestMatchIndex].SimilarityScore
		matched = fmt.Sprintf("✅ Match found: Index: %d, Score: %1.3f", result.BestMatchIndex, bestScore)
	}

	fmt.Printf("\nComparison Result:\n")

	fmt.Printf("Scores:\n")
	for i, r := range result.VoiceprintComparisonResults {
		fmt.Printf("Index: %d, Score: %1.3f, IsMatch: %v\n", i, r.SimilarityScore, r.IsMatch)
	}

	fmt.Printf("\n%s\n", matched)
}
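
The batching mentioned at the start of this section could be sketched roughly as follows. This is only an illustration, not part of the VoiceBio SDK: `compare_in_batches`, `CompareFn` and `fake_compare` are hypothetical names, voiceprints are represented as plain strings, and the `compare` callable stands in for a wrapper around `client.CompareVoiceprints` that returns one `(similarity_score, is_match)` pair per reference. Because the server reports `best_match_index` per request only, the overall best match is recomputed here from the merged results; picking the highest-scoring reference among those flagged as a match is an assumption of this sketch.

```python
from typing import Callable, List, Sequence, Tuple

# compare(target, references) -> one (similarity_score, is_match) pair per reference.
# In a real client this would wrap a CompareVoiceprints call (hypothetical wrapper).
CompareFn = Callable[[str, Sequence[str]], List[Tuple[float, bool]]]


def compare_in_batches(compare: CompareFn, target: str,
                       references: Sequence[str], batch_size: int = 100):
    """Compare target against references in batches and merge the results."""
    results: List[Tuple[float, bool]] = []
    for start in range(0, len(references), batch_size):
        batch = references[start:start + batch_size]
        results.extend(compare(target, batch))

    # Highest-scoring reference among those flagged as a match; -1 if none.
    best_index = -1
    for i, (score, is_match) in enumerate(results):
        if is_match and (best_index < 0 or score > results[best_index][0]):
            best_index = i
    return best_index, results


def fake_compare(target, references):
    """Toy stand-in for the server: score is the count of matching character positions."""
    out = []
    for ref in references:
        score = float(sum(1 for a, b in zip(target, ref) if a == b))
        out.append((score, score >= 3))
    return out


refs = ["abxx", "abcx", "zzzz", "abcd", "xbcd"]
best, results = compare_in_batches(fake_compare, "abcd", refs, batch_size=2)
print(best, results[best])
```

The returned index refers to the position in the full reference list, not to a position within an individual batch, so it can be mapped directly back to the user IDs stored alongside the voiceprints.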

8 - Vectorizing Voiceprints

Describes how to convert voiceprints into vector representations using VoiceBio’s VectorizeVoiceprints API.

Voiceprints can also be vectorized using the VectorizeVoiceprints API, which returns a vector representation of each voiceprint. These vectors can be used for downstream tasks such as clustering, custom scoring, input features for other machine learning models, or similarity search in vector databases.
See the API reference for more details.
The following example shows how to use the VectorizeVoiceprints API to vectorize voiceprints. The voiceprints can be loaded from files on disk or obtained from previous enrollment sessions.

Info

Voiceprints provided in VectorizeVoiceprints requests must be generated using the same or compatible model via StreamingEnroll.

Python
Go

import numpy as np

import grpc
import cobaltspeech.voicebio.v1.voicebio_pb2_grpc as stub
import cobaltspeech.voicebio.v1.voicebio_pb2 as voicebio

serverAddress = "localhost:2727"

# Using a channel without TLS enabled.
channel = grpc.insecure_channel(serverAddress)
client = stub.VoiceBioServiceStub(channel)

# Get server version.
versionResp = client.Version(voicebio.VersionRequest())
print(versionResp)

# Get list of models on the server.
modelResp = client.ListModels(voicebio.ListModelsRequest())

print("Models:")
for model in modelResp.models:
    print(model)

# Select a model ID from the list above. Going with the first model
# in this example. The model ID should be the same as the one used to
# generate the voiceprints being vectorized.
modelID = modelResp.models[0].id

# Loading voiceprints.
voiceprints = []
for p in ["user1.bin", "user2.bin", "user3.bin"]:
    with open(p, 'r') as f:
        voiceprints.append(voicebio.Voiceprint(data=f.read().strip()))

# Set the vectorization config.
req = voicebio.VectorizeVoiceprintsRequest(
    model_id=modelID,
    voiceprints=voiceprints,
)

# Vectorize voiceprints.
result = client.VectorizeVoiceprints(req)

# The server returns a list of vectorized voiceprints in the same order as the input voiceprints.
#
# In most cases, the vectorized voiceprints can be compared using simple distance metrics such as
# cosine similarity or euclidean distance. This is not guaranteed, however, and depends on the model
# used to generate the voiceprints and vectorize them.

# Example using cosine similarity.
n = len(result.voiceprints)
similarity = np.zeros((n, n), dtype=np.float32)

for i, vi in enumerate(result.voiceprints):
    for j, vj in enumerate(result.voiceprints):
        similarity[i, j] = np.dot(vi.data, vj.data) / (np.linalg.norm(vi.data) * np.linalg.norm(vj.data))

print("Cosine Similarity Matrix:")
print(similarity)

package main

import (
	"context"
	"fmt"
	"math"
	"os"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"

	voicebio "github.com/cobaltspeech/go-genproto/cobaltspeech/voicebio/v1"
)

func main() {
	const (
		serverAddress = "localhost:2727"
	)

	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	opts := []grpc.DialOption{
		grpc.WithTransportCredentials(insecure.NewCredentials()), // Using a channel without TLS enabled.
		grpc.WithBlock(),
		grpc.WithReturnConnectionError(),
		grpc.FailOnNonTempDialError(true),
	}

	conn, err := grpc.DialContext(ctx, serverAddress, opts...)
	if err != nil {
		fmt.Printf("failed to dial gRPC connection: %v\n", err)
		os.Exit(1)
	}

	client := voicebio.NewVoiceBioServiceClient(conn)

	// Get server version.
	versionResp, err := client.Version(ctx, &voicebio.VersionRequest{})
	if err != nil {
		fmt.Printf("failed to get server version: %v\n", err)
		os.Exit(1)
	}

	fmt.Printf("%v\n", versionResp)

	// Get list of models on the server.
	modelResp, err := client.ListModels(ctx, &voicebio.ListModelsRequest{})
	if err != nil {
		fmt.Printf("failed to get model list: %v\n", err)
		os.Exit(1)
	}

	fmt.Println("Models:")
	for _, m := range modelResp.Models {
		fmt.Println(m)
	}
	fmt.Println()

	// Reading voiceprint data.
	voiceprints := make([]*voicebio.Voiceprint, 0)

	for i, p := range []string{"user1.bin", "user2.bin", "user3.bin"} {
		data, err := os.ReadFile(p)
		if err != nil {
			fmt.Printf("\nfailed to read voiceprint[%d] data: %v\n", i, err)
			os.Exit(1)
		}

		voiceprints = append(voiceprints, &voicebio.Voiceprint{Data: string(data)})
	}

	// Selecting the first model. The model ID should be the same as the one used to generate the
	// voiceprints being vectorized.
	req := &voicebio.VectorizeVoiceprintsRequest{
		ModelId:     modelResp.Models[0].Id,
		Voiceprints: voiceprints,
	}

	// Vectorize voiceprints.
	result, err := client.VectorizeVoiceprints(ctx, req)
	if err != nil {
		fmt.Printf("failed to vectorize voiceprints: %v\n", err)
		os.Exit(1)
	}

	// The server returns a list of vectorized voiceprints in the same order as the input voiceprints.
	//
	// In most cases, the vectorized voiceprints can be compared using simple distance metrics such as
	// cosine similarity or euclidean distance. This is not guaranteed, however, and depends on the model
	// used to generate the voiceprints and vectorize them.

	// Example using cosine similarity.
	n := len(result.Voiceprints)
	similarity := make([][]float32, n)
	for i := range similarity {
		similarity[i] = make([]float32, n)
	}

	for i, vi := range result.Voiceprints {
		for j, vj := range result.Voiceprints {
			dotProduct := float32(0.0)
			normVi := float32(0.0)
			normVj := float32(0.0)

			for k := range vi.Data {
				dotProduct += vi.Data[k] * vj.Data[k]
				normVi += vi.Data[k] * vi.Data[k]
				normVj += vj.Data[k] * vj.Data[k]
			}

			denom := float32(math.Sqrt(float64(normVi)) * math.Sqrt(float64(normVj)))
			similarity[i][j] = dotProduct / denom
		}
	}

	fmt.Printf("Cosine Similarity Matrix:\n")
	for i := range similarity {
		for j := range similarity[i] {
			fmt.Printf("%1.3f ", similarity[i][j])
		}

		fmt.Println()
	}
}
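
As one example of the downstream tasks mentioned above, the vectors could be used for a simple nearest-neighbour lookup. The sketch below is illustrative only: `cosine` and `top_k` are hypothetical helper names, the vectors are plain lists of floats standing in for the `data` field of each returned vector, and it assumes cosine similarity is meaningful for the model in use, which (as noted in the examples above) is not guaranteed.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def top_k(query: Sequence[float], vectors: Sequence[Sequence[float]],
          k: int = 3) -> List[Tuple[int, float]]:
    """Return (index, similarity) for the k vectors most similar to query."""
    scored = [(i, cosine(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy 2-dimensional vectors; real vectorized voiceprints have model-dependent size.
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for index, score in top_k([1.0, 0.1], vectors, k=2):
    print(f"Index: {index}, Cosine similarity: {score:1.3f}")
```

For large collections, a linear scan like this would typically be replaced by a vector database or an approximate nearest-neighbour index.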

9 - API Reference

Detailed reference for API requests and types.

The API is defined as a protobuf spec, so native bindings can be generated in any language with gRPC support. We recommend using buf to generate the bindings.

This section of the documentation is auto-generated from the protobuf spec. The service contains the methods that can be called, and the “messages” are the data structures (objects, classes or structs in the generated code, depending on the language) passed to and from the methods.

VoiceBioService

Service that implements the Cobalt VoiceBio API.

Version

Version(VersionRequest) VersionResponse

Returns version information from the server.

ListModels

ListModels(ListModelsRequest) ListModelsResponse

Returns information about the models available on the server.

StreamingEnroll

StreamingEnroll(StreamingEnrollRequest) StreamingEnrollResponse

Uses new audio data to perform enrollment of new users, or to update enrollment of existing users. Returns a new or updated voiceprint.

Clients should store the returned voiceprint against the ID of the user that provided the audio. This voiceprint can be provided later, with the Verify or Identify requests to match new audio against known speakers.

If this call is used to update an existing user’s voiceprint, the old voiceprint can be discarded and only the new one can be stored for that user.

StreamingVerify

StreamingVerify(StreamingVerifyRequest) StreamingVerifyResponse

Compares audio data against the provided voiceprint and verifies whether or not the audio matches against the voiceprint.

StreamingIdentify

StreamingIdentify(StreamingIdentifyRequest) StreamingIdentifyResponse

Compares audio data against the provided list of voiceprints and identifies which (or none) of the voiceprints is a match for the given audio.

VectorizeVoiceprints

VectorizeVoiceprints(VectorizeVoiceprintsRequest) VectorizeVoiceprintsResponse

Converts the given voiceprints into numerical vector representations that can be used for various downstream tasks such as clustering, visualization, or as input features for other machine learning models. The specific format and dimensionality of these vectors may vary depending on the model used.

CompareVoiceprints

CompareVoiceprints(CompareVoiceprintsRequest) CompareVoiceprintsResponse

Compares pre-extracted voiceprints and returns similarity scores and match results without needing to send audio data. This is useful in cases where the user wants to compare a given voiceprint against a large number of other voiceprints, and sending audio data for each comparison would be inefficient. The client can enroll the voiceprint once using the StreamingEnroll method, and then use this method to compare it against a large number of other voiceprints in batches.

Messages

If two or more fields in a message are labeled oneof, then each method call using that message must have exactly one of the fields populated.
If a field is labeled repeated, then the generated code will accept an array (or struct, or list, depending on the language).

Audio

Audio to be sent to VoiceBio.

Fields

data (bytes )

AudioFormat

Format of the audio to be sent for recognition.

Depending on how they are configured, server instances of this service may not support all the formats provided in the API. One format that is guaranteed to be supported is the RAW format with little-endian 16-bit signed samples with the sample rate matching that of the model being requested.

Fields

oneof audio_format.audio_format_raw (AudioFormatRAW ) Audio is raw data without any headers
oneof audio_format.audio_format_headered (AudioFormatHeadered ) Audio has a self-describing header. Headers are expected to be sent at the beginning of the entire audio file/stream, and not in every Audio message.

The default value of this type is AUDIO_FORMAT_HEADERED_UNSPECIFIED. If this value is used, the server may attempt to detect the format of the audio. However, it is recommended that the exact format be specified.

AudioFormatRAW

Details of audio in raw format

Fields

encoding (AudioEncoding ) Encoding of the samples. It must be specified explicitly and using the default value of AUDIO_ENCODING_UNSPECIFIED will result in an error.
bit_depth (uint32 ) Bit depth of each sample (e.g. 8, 16, 24, 32, etc.). This is a required field.
byte_order (ByteOrder ) Byte order of the samples. This field must be set to a value other than BYTE_ORDER_UNSPECIFIED when the bit_depth is greater than 8.
sample_rate (uint32 ) Sampling rate in Hz. This is a required field.
channels (uint32 ) Number of channels present in the audio. E.g.: 1 (mono), 2 (stereo), etc. This is a required field.
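
The field rules above, and the relationship between the raw format and the size of each audio chunk, could be sketched as follows. This is a rough illustration rather than SDK code: `RawFormat`, `validate`, `bytes_per_second` and `chunk_duration_ms` are hypothetical names, `RawFormat` is a plain dataclass standing in for the generated AudioFormatRAW type, the 16000 Hz sample rate is an assumed value, and the arithmetic assumes a bit depth that is a multiple of 8.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RawFormat:
    """Plain stand-in for AudioFormatRAW (not the generated protobuf class)."""
    encoding: str
    bit_depth: int
    byte_order: str
    sample_rate: int
    channels: int


def validate(fmt: RawFormat) -> List[str]:
    """Return a list of problems according to the field rules documented above."""
    problems = []
    if fmt.encoding == "AUDIO_ENCODING_UNSPECIFIED":
        problems.append("encoding must be specified explicitly")
    if fmt.bit_depth <= 0:
        problems.append("bit_depth is required")
    if fmt.bit_depth > 8 and fmt.byte_order == "BYTE_ORDER_UNSPECIFIED":
        problems.append("byte_order must be set when bit_depth > 8")
    if fmt.sample_rate <= 0:
        problems.append("sample_rate is required")
    if fmt.channels <= 0:
        problems.append("channels is required")
    return problems


def bytes_per_second(fmt: RawFormat) -> int:
    """Bytes of raw audio per second (assumes bit_depth is a multiple of 8)."""
    return fmt.sample_rate * (fmt.bit_depth // 8) * fmt.channels


def chunk_duration_ms(fmt: RawFormat, chunk_bytes: int) -> float:
    """Duration of audio, in milliseconds, carried by a chunk of the given size."""
    return 1000.0 * chunk_bytes / bytes_per_second(fmt)


fmt = RawFormat("AUDIO_ENCODING_SIGNED", 16, "BYTE_ORDER_LITTLE_ENDIAN", 16000, 1)
print(validate(fmt))                 # no problems for this format
print(bytes_per_second(fmt))         # 32000 bytes per second
print(chunk_duration_ms(fmt, 1024))  # 32.0 ms per 1024-byte chunk
```

Under these assumptions, the 1024-byte buffers used in the streaming examples of this documentation each carry 32 ms of audio.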

CompareVoiceprintsRequest

The top level message sent by the client for the CompareVoiceprints method. This is similar to StreamingIdentifyRequest, but operates on pre-extracted voiceprints without sending any audio data.

Fields

model_id (string ) ID of the model to use for comparison. The model used for comparison must match with the model used for enrollment of the voiceprints. A list of supported IDs can be found using the ListModels call.
target_voiceprint (Voiceprint ) The voiceprint to compare against the reference voiceprints.
reference_voiceprints (Voiceprint repeated) Voiceprints that should be compared against the target voiceprint.

CompareVoiceprintsResponse

The message returned by the server for the CompareVoiceprints method. This contains the similarity scores and match results for comparing the target voiceprint against each of the reference voiceprints, as well as the index of the best matching voiceprint in the reference list, if any of them is a match. This is similar to StreamingIdentifyResponse, but operates on pre-extracted voiceprints without sending any audio data.

Fields

best_match_index (int32 ) Index (0-based) of the best matching voiceprint in the list of reference voiceprints provided in the CompareVoiceprintsRequest message. If none of the voiceprints was a match, a negative value is returned.
voiceprint_comparison_results (VoiceprintComparisonResult repeated) Result of comparing the given target voiceprint against each of the reference voiceprints. The order of this list is the same as the reference voiceprint list provided in the CompareVoiceprintsRequest message.

EnrollmentConfig

Configuration for Enrollment of speakers.

Fields

model_id (string ) ID of the model to use for enrollment. A list of supported IDs can be found using the ListModels call.
audio_format (AudioFormat ) Format of the audio to be sent for enrollment.
previous_voiceprint (Voiceprint ) Leave empty for new users. To re-enroll an existing user with additional audio data, set this to that user’s previous voiceprint. The previous voiceprint needs to have been generated using the same model as specified in this config.

EnrollmentStatus

The message returned as part of StreamingEnrollResponse, to provide information about whether the voiceprint is sufficiently trained.

Fields

enrollment_complete (bool ) Whether sufficient data has been provided as part of this user’s enrollment. If this is false, more audio should be collected from the user and re-enrollment should be done. If this is true, it is still OK to enroll more data for the same user to update the voiceprint.
additional_audio_required_seconds (uint32 ) If enrollment is not yet complete, how many more seconds of user’s speech are required to complete the enrollment. If enrollment is completed successfully, this value will be set to 0.
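
A client might use these two fields to drive a re-enrollment loop along the following lines. This is a minimal sketch under stated assumptions, not the SDK's API: `enroll_until_complete`, `EnrollFn` and `fake_enroll` are hypothetical names, voiceprints are represented as plain strings, and `enroll_once` stands in for a wrapper around one StreamingEnroll call that passes the previous voiceprint in the config and returns the new voiceprint together with `enrollment_complete` and `additional_audio_required_seconds`.

```python
from typing import Callable, Iterable, Optional, Tuple

# enroll_once(previous_voiceprint, audio) -> (voiceprint, complete, seconds_still_needed)
# In a real client this would wrap one StreamingEnroll call (hypothetical wrapper).
EnrollFn = Callable[[Optional[str], bytes], Tuple[str, bool, int]]


def enroll_until_complete(enroll_once: EnrollFn,
                          recordings: Iterable[bytes]) -> Tuple[Optional[str], bool]:
    """Feed recordings one at a time, carrying the voiceprint forward each round."""
    voiceprint = None  # no previous voiceprint for a new user
    for audio in recordings:
        voiceprint, complete, remaining = enroll_once(voiceprint, audio)
        if complete:
            return voiceprint, True
        print(f"enrollment needs about {remaining} more seconds of speech")
    return voiceprint, False


def fake_enroll(previous, audio):
    """Toy stand-in for the server: requires 10 seconds of audio in total and
    treats every 32000 bytes as one second (16 kHz, 16-bit, mono)."""
    heard = (int(previous.split(":")[1]) if previous else 0) + len(audio) // 32000
    return f"vp:{heard}", heard >= 10, max(0, 10 - heard)


four_seconds = b"\x00" * 32000 * 4
voiceprint, done = enroll_until_complete(fake_enroll, [four_seconds] * 3)
print(voiceprint, done)
```

Only the most recent voiceprint needs to be stored for the user, since each round returns an updated voiceprint that supersedes the previous one.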

IdentificationConfig

Configuration for Identification of a speaker.

Fields

model_id (string ) ID of the model to use for identification. A list of supported IDs can be found using the ListModels call. The model used for identification must match with the model used for enrollment.
audio_format (AudioFormat ) Format of the audio to be sent for identification.
voiceprints (Voiceprint repeated) Voiceprints of potential speakers that need to be identified in the given audio.

ListModelsRequest

The top-level message sent by the client for the ListModels method.

ListModelsResponse

The message returned to the client by the ListModels method.

Fields

models (Model repeated) List of models available for use that match the request.

Model

Description of a VoiceBio model.

Fields

id (string ) Unique identifier of the model. This identifier is used to choose the model that should be used for enrollment, verification or identification requests. This ID needs to be specified in the respective config messages for these requests.
name (string ) Model name. This is a concise name describing the model, and may be presented to the end-user, for example, to help choose which model to use for their voicebio task.
attributes (ModelAttributes ) Model Attributes

ModelAttributes

Attributes of a VoiceBio model

Fields

sample_rate (uint32 ) Audio sample rate (native) supported by the model

StreamingEnrollRequest

The top level messages sent by the client for the StreamingEnroll method. In this streaming call, multiple StreamingEnrollRequest messages should be sent. The first message must contain an EnrollmentConfig message, and all subsequent messages must contain Audio only. All Audio messages must contain non-empty audio. If audio content is empty, the server may choose to interpret it as end of stream and stop accepting any further messages.

Fields

oneof request.config (EnrollmentConfig )
oneof request.audio (Audio )
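
The message ordering described above (one config message first, then only non-empty audio messages) could be sketched as a generator. This is an illustration only: `request_sequence` is a hypothetical name, the config is a plain dictionary with a made-up model ID, and `("config", ...)` / `("audio", ...)` tuples stand in for request messages with the corresponding oneof field populated.

```python
import io
from typing import BinaryIO, Iterator, Tuple


def request_sequence(config: dict, audio: BinaryIO,
                     chunk_bytes: int = 1024) -> Iterator[Tuple[str, object]]:
    """Yield ("config", config) once, then ("audio", chunk) for each non-empty chunk."""
    yield ("config", config)
    while True:
        chunk = audio.read(chunk_bytes)
        if not chunk:
            # Stop instead of emitting an empty audio message, which the
            # server may interpret as end of stream.
            break
        yield ("audio", chunk)


# 2500 bytes of silence standing in for recorded audio.
fake_audio = io.BytesIO(b"\x00" * 2500)
messages = list(request_sequence({"model_id": "example-model"}, fake_audio))

print([kind for kind, _ in messages])
print([len(payload) for kind, payload in messages if kind == "audio"])
```

The same ordering applies to the StreamingVerify and StreamingIdentify calls, with their respective config messages sent first.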

StreamingEnrollResponse

The message returned by the server for the StreamingEnroll method.

Fields

voiceprint (Voiceprint )
enrollment_status (EnrollmentStatus )

StreamingIdentifyRequest

The top level messages sent by the client for the StreamingIdentify method. In this streaming call, multiple StreamingIdentifyRequest messages should be sent. The first message must contain an IdentificationConfig message, and all subsequent messages must contain Audio only. All Audio messages must contain non-empty audio. If audio content is empty, the server may choose to interpret it as end of stream and stop accepting any further messages.

Fields

oneof request.config (IdentificationConfig )
oneof request.audio (Audio )

StreamingIdentifyResponse

The message returned by the server for the StreamingIdentify method.

Fields

best_match_index (int32 ) Index (0-based) of the best matching voiceprint in the list of input voiceprints provided in the IdentificationConfig message. If none of the voiceprints was a match, a negative value is returned.
voiceprint_comparison_results (VoiceprintComparisonResult repeated) Result of comparing the given audio against each of the input voiceprints. The order of this list is the same as the input voiceprint list provided in the IdentificationConfig message.

StreamingVerifyRequest

The top level messages sent by the client for the StreamingVerify method. In this streaming call, multiple StreamingVerifyRequest messages should be sent. The first message must contain a VerificationConfig message, and all subsequent messages must contain Audio only. All Audio messages must contain non-empty audio. If audio content is empty, the server may choose to interpret it as end of stream and stop accepting any further messages.

Fields

oneof request.config (VerificationConfig )
oneof request.audio (Audio )

StreamingVerifyResponse

The message returned by the server for the StreamingVerify method.

Fields

result (VoiceprintComparisonResult )

VectorVoiceprint

Voiceprint represented in vector form. The specific format and dimensionality of this vector may vary depending on the model used. The VectorizeVoiceprints method can be used to convert a Voiceprint into a VectorVoiceprint representation.

Fields

data (float repeated) List of floating point values representing the voiceprint in vector form.

VectorizeVoiceprintsRequest

The top level message sent by the client for the VectorizeVoiceprints method.

Fields

model_id (string ) ID of the model to use for vectorization. The model used for vectorization must match with the model used for enrollment of the voiceprints. A list of supported IDs can be found using the ListModels call.
voiceprints (Voiceprint repeated) Voiceprints to be vectorized.

VectorizeVoiceprintsResponse

The message returned by the server for the VectorizeVoiceprints method.

Fields

voiceprints (VectorVoiceprint repeated) Voiceprint data converted into a vector representation, which can be used for various downstream tasks such as clustering, visualization, or as input features for other machine learning models. The specific format and dimensionality of these vectors may vary depending on the model used.

The order of this list is the same as the input voiceprint list provided in the VectorizeVoiceprintsRequest message.
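
As one example of a downstream use, vectors from the `data` field of two VectorVoiceprint messages could be compared with cosine similarity. This is a sketch only: cosine similarity is a generic choice made here for illustration, it is not stated to be how the server computes similarity_score, and `cosine_similarity` is a hypothetical helper.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cannot compare a zero vector")
    return dot / norm
```

Vectors produced by different models may differ in dimensionality, which is why the length check comes first.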

VerificationConfig

Configuration for Verification of a speaker.

Fields

model_id (string ) ID of the model to use for verification. A list of supported IDs can be found using the ListModels call. This must be the same model that was used for enrollment.
audio_format (AudioFormat ) Format of the audio to be sent for verification.
voiceprint (Voiceprint ) Voiceprint with which audio should be compared.

VersionRequest

The top-level message sent by the client for the Version method.

VersionResponse

The message sent by the server for the Version method.

Fields

version (string ) Version of the server handling these requests.

Voiceprint

Voiceprint extracted from a user's audio.

Fields

data (string ) Voiceprint data serialized to a string.

VoiceprintComparisonResult

Message describing the result of comparing a voiceprint against given audio.

Fields

is_match (bool ) Whether or not the audio successfully matches with the provided voiceprint.
similarity_score (float ) Similarity score representing how closely the audio matched the voiceprint. The score is unbounded and may be negative or positive. A lower value means the audio and voiceprint are less similar; a higher value means they are more similar. Use the is_match field, not the raw score, to decide whether the result counts as a valid match.

Enums

AudioEncoding

The encoding of the audio data to be sent to the server.

Name	Number	Description
AUDIO_ENCODING_UNSPECIFIED	0	AUDIO_ENCODING_UNSPECIFIED is the default value of this type and will result in an error.
AUDIO_ENCODING_SIGNED	1	PCM signed-integer
AUDIO_ENCODING_UNSIGNED	2	PCM unsigned-integer
AUDIO_ENCODING_IEEE_FLOAT	3	PCM IEEE-Float
AUDIO_ENCODING_ULAW	4	G.711 mu-law
AUDIO_ENCODING_ALAW	5	G.711 a-law

AudioFormatHeadered

Name	Number	Description
AUDIO_FORMAT_HEADERED_UNSPECIFIED	0	AUDIO_FORMAT_HEADERED_UNSPECIFIED is the default value of this type.
AUDIO_FORMAT_HEADERED_WAV	1	WAV with RIFF headers
AUDIO_FORMAT_HEADERED_MP3	2	MP3 format with a valid frame header at the beginning of data
AUDIO_FORMAT_HEADERED_FLAC	3	FLAC format
AUDIO_FORMAT_HEADERED_OGG_OPUS	4	Opus format with OGG header

ByteOrder

Byte order of multi-byte data

Name	Number	Description
BYTE_ORDER_UNSPECIFIED	0	BYTE_ORDER_UNSPECIFIED is the default value of this type.
BYTE_ORDER_LITTLE_ENDIAN	1	Little Endian byte order
BYTE_ORDER_BIG_ENDIAN	2	Big Endian byte order
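
To make the encoding and byte-order values concrete, the sketch below decodes raw PCM signed-integer samples (AUDIO_ENCODING_SIGNED) under either byte order. It assumes 16-bit samples, which is an assumption made for this example, and `decode_pcm16` is a hypothetical helper that is not part of the SDK.

```python
import struct
from typing import List


def decode_pcm16(data: bytes, little_endian: bool = True) -> List[int]:
    """Decode 16-bit signed PCM bytes into integer samples."""
    if len(data) % 2:
        raise ValueError("16-bit PCM data must have an even number of bytes")
    # "<" selects little endian, ">" big endian; "h" is a signed 16-bit integer.
    fmt = ("<" if little_endian else ">") + f"{len(data) // 2}h"
    return list(struct.unpack(fmt, data))
```

Decoding the same bytes with the wrong byte order yields valid but incorrect sample values rather than an error, so the declared byte order has to match the audio actually sent.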

Scalar Value Types

.proto Type	C++ Type	C# Type	Go Type	Java Type	PHP Type	Python Type	Ruby Type
double	double	double	float64	double	float	float	Float
float	float	float	float32	float	float	float	Float
int32	int32	int	int32	int	integer	int	Bignum or Fixnum (as required)
int64	int64	long	int64	long	integer/string	int/long	Bignum
uint32	uint32	uint	uint32	int	integer	int/long	Bignum or Fixnum (as required)
uint64	uint64	ulong	uint64	long	integer/string	int/long	Bignum or Fixnum (as required)
sint32	int32	int	int32	int	integer	int	Bignum or Fixnum (as required)
sint64	int64	long	int64	long	integer/string	int/long	Bignum
fixed32	uint32	uint	uint32	int	integer	int	Bignum or Fixnum (as required)
fixed64	uint64	ulong	uint64	long	integer/string	int/long	Bignum
sfixed32	int32	int	int32	int	integer	int	Bignum or Fixnum (as required)
sfixed64	int64	long	int64	long	integer/string	int/long	Bignum
bool	bool	bool	bool	boolean	boolean	boolean	TrueClass/FalseClass
string	string	string	string	String	string	str/unicode	String (UTF-8)
bytes	string	ByteString	[]byte	ByteString	string	str	String (ASCII-8BIT)
