A client that allows inference against existing OctoAI endpoints. Sets various headers, establishes clients for Chat under Client.chat, AssetLibrary under Client.asset, and FineTuningClient under Client.tune, and checks the OCTOAI_TOKEN environment variable if no token is provided.


OctoAIClientError - For client-side failures (throttled, no token)


OctoAIServerError - For server-side failures (unreachable, etc)


You can create an OctoAI API token by following the guide at How to Create an OctoAI Access Token


  • Constructor for the Client class.


    • Optional token: null | string

OctoAI token. If none is provided, the OCTOAI_TOKEN environment variable is checked; if that is also unset, defaults to null.

    • secureLink: boolean = false

Set to true to use the SecureLink API instead of the public API.

    Returns Client
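The token-resolution order described above can be sketched in plain TypeScript. This is a sketch of the documented behavior, not the SDK source; `resolveToken` is a hypothetical helper name.

```typescript
// Sketch of the token resolution described above; resolveToken is a
// hypothetical helper, not part of the SDK.
function resolveToken(token?: string | null): string | null {
  // An explicit token wins; otherwise fall back to the OCTOAI_TOKEN
  // environment variable, and finally to null.
  return token ?? process.env.OCTOAI_TOKEN ?? null;
}
```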


asset: AssetLibrary

The AssetLibrary client, accessible with Client.asset.

chat: Chat

The Chat client, accessible with Client.chat.

completions: CompletionsAPI

The CompletionsAPI client, accessible with Client.completions.

headers: {
    Accept: string;
    Authorization: string;
    Content-Type: string;
    User-Agent: string;
    X-OctoAI-Async: string;
}

Headers used to interact with OctoAI servers. Communicates authorization and request type.

Type declaration

  • Accept: string
  • Authorization: string
  • Content-Type: string
  • User-Agent: string
  • X-OctoAI-Async: string
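The client populates these headers for you; a representative shape looks like the following. The concrete values here (User-Agent string, async flag default) are illustrative assumptions, not the SDK's actual constants.

```typescript
// Illustrative header set matching the type declaration above; the
// concrete values are assumptions, not the SDK's actual strings.
const token = process.env.OCTOAI_TOKEN ?? "";
const headers: Record<string, string> = {
  Accept: "application/json",
  Authorization: `Bearer ${token}`,
  "Content-Type": "application/json",
  "User-Agent": "octoai-typescript-sdk", // hypothetical value
  "X-OctoAI-Async": "false",
};
```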
secureLink: boolean

Set to true to use the SecureLink API.

tune: FineTuningClient

The FineTuningClient, accessible with Client.tune.


  • Check the health of an endpoint using a GET request, retrying until the timeout elapses.


    • endpointUrl: string

      Target URL to run the health check.

    • timeoutMS: number = 900000

      Milliseconds before request times out. Default is 15 minutes.

    • intervalMS: number = 1000

      Interval in milliseconds between successive health check queries.

    Returns Promise<number>

    HTTP status code.


    The default timeout is set to 15 minutes to allow for potential cold start.

    For custom containers, please follow Health Check Paths in Custom Containers to set a health check endpoint.

    Information about health check endpoint URLs are available on relevant QuickStart Templates.
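The retry-until-timeout behavior can be sketched in plain TypeScript. This is a minimal sketch of the documented behavior, not the SDK source; `pollHealth` is a hypothetical name, and the fetch function is injected so the loop can be exercised without a live endpoint.

```typescript
type FetchLike = (url: string) => Promise<{ ok: boolean; status: number }>;

// Hypothetical sketch of healthCheck's polling loop, not the SDK source.
async function pollHealth(
  endpointUrl: string,
  timeoutMS = 900_000, // 15-minute default allows for cold starts
  intervalMS = 1_000,
  fetchFn: FetchLike = fetch
): Promise<number> {
  const deadline = Date.now() + timeoutMS;
  while (true) {
    const res = await fetchFn(endpointUrl);
    // Stop on a healthy response, or once the timeout elapses.
    if (res.ok || Date.now() >= deadline) return res.status;
    await new Promise((resolve) => setTimeout(resolve, intervalMS));
  }
}
```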

  • Send a request to the given endpoint with inputs as request body. For LLaMA2 LLMs, this requires "stream": false in the inputs. To stream for LLMs, please see the inferStream method.

    Type Parameters

    • T


    • endpointUrl: string

      Target URL to run inference

    • inputs: Record<string, any>

      Necessary inputs for the endpointUrl to run inference

    Returns Promise<T>

    JSON outputs from the endpoint
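Under the hood this amounts to a JSON POST to the endpoint. The sketch below shows that assumed behavior; `inferOnce` is a hypothetical name, not the SDK method, and the POST function is injected so the round trip can be verified without a live endpoint.

```typescript
type PostFn = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{ json(): Promise<any> }>;

// Hypothetical sketch of a single synchronous inference call.
async function inferOnce<T>(
  endpointUrl: string,
  inputs: Record<string, any>,
  token: string,
  postFn: PostFn = fetch
): Promise<T> {
  const res = await postFn(endpointUrl, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${token}`,
    },
    body: JSON.stringify(inputs), // inputs become the request body
  });
  return (await res.json()) as T;
}
```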

  • Execute an inference in the background on the server.


    • endpointUrl: string

      Target URL to send inference request.

    • inputs: Record<string, any>

      Contains necessary inputs for endpoint to run inference.

    Returns Promise<InferenceFuture>

    Future allows checking if results are ready then accessing them.


    Please read the Async Inference Reference for more information. Client.inferAsync returns an InferenceFuture, which can then be used with Client.isFutureReady to see the status. Once it returns true, you can use the Client.getFutureResult to get the response for your InferenceFuture.

    Assuming you have a variable with your target endpoint URL and the inputs the model needs, and an OCTOAI_TOKEN set as an environment variable, you can run a server-side asynchronous inference from QuickStart Template endpoints with something like the below.

    const client = new Client();
    const future = await client.inferAsync(url, inputs);
    if (await client.isFutureReady(future) === true) {
      return await client.getFutureResult(future);
    }
  • Stream text event response body for supporting endpoints. This is an alternative to loading all response body into memory at once. Recommended for use with LLM models. Requires "stream": true in the inputs for LLaMA2 LLMs.


    • endpointUrl: string

      Target URL to run inference

    • inputs: Record<string, any>

      Necessary inputs for the endpointUrl to run inference

    Returns Promise<Response>

    Compatible with getReader method.


    This allows you to stream back tokens from the LLMs. Below is an example of how to do this with a LLaMA2 LLM using a completions-style API.

    HuggingFace-style APIs will usually use the done variable below to indicate the end of the stream. OpenAI-style APIs will often send the string "data: [DONE]\n" in the stream to indicate the stream is complete.

    This example concatenates all values from the tokens into a single text variable. How you choose to use the tokens will likely be different, so please modify the code.

    This example assumes:

    1. You've followed the guide at How to Create an OctoAI Access Token to create and set your OctoAI access token
    2. Either that you will set this token as an OCTOAI_TOKEN envvar or edit the snippet to pass it as a value in the {@link Client.constructor}.
    3. You have assigned your endpoint URL and inputs into variables named llamaEndpoint and streamInputs.
    const client = new Client();
    const readableStream = await client.inferStream(llamaEndpoint, streamInputs);
    let text = ``;
    const streamReader = readableStream.getReader();
    for (
      let { value, done } = await streamReader.read();
      !done;
      ({ value, done } = await streamReader.read())
    ) {
      const decoded = new TextDecoder().decode(value);
      if (
        decoded === "data: [DONE]\n" ||
        decoded.includes('"finish_reason": "')
      ) {
        break;
      }
      const token = JSON.parse(decoded.substring(5));
      if (token.object === "chat.completion.chunk") {
        text += token.choices[0].delta.content;
      }
    }

    The const token = JSON.parse(decoded.substring(5)) line strips the "data:" prefix from the returned text/event-stream message, then parses the remaining JSON as an object.

Generated using TypeDoc