Open Inference Protocol API Specification¶

REST¶

GRPC¶

ServerLive¶

The ServerLive API indicates if the inference server is able to receive and respond to metadata and inference requests.

rpc inference.GRPCInferenceService/ServerLive(ServerLiveRequest) returns ServerLiveResponse

ServerReady¶

The ServerReady API indicates if the server is ready for inferencing.

rpc inference.GRPCInferenceService/ServerReady(ServerReadyRequest) returns ServerReadyResponse

ModelReady¶

The ModelReady API indicates if a specific model is ready for inferencing.

rpc inference.GRPCInferenceService/ModelReady(ModelReadyRequest) returns ModelReadyResponse

ServerMetadata¶

The ServerMetadata API provides information about the server. Errors are indicated by the google.rpc.Status returned for the request. The OK code indicates success and other codes indicate failure.

rpc inference.GRPCInferenceService/ServerMetadata(ServerMetadataRequest) returns ServerMetadataResponse

ModelMetadata¶

The per-model metadata API provides information about a model. Errors are indicated by the google.rpc.Status returned for the request. The OK code indicates success and other codes indicate failure.

rpc inference.GRPCInferenceService/ModelMetadata(ModelMetadataRequest) returns ModelMetadataResponse

ModelInfer¶

The ModelInfer API performs inference using the specified model. Errors are indicated by the google.rpc.Status returned for the request. The OK code indicates success and other codes indicate failure.

rpc inference.GRPCInferenceService/ModelInfer(ModelInferRequest) returns ModelInferResponse

Messages¶

InferParameter¶

An inference parameter value. The Parameters message describes a “name”/”value” pair, where the “name” is the name of the parameter and the “value” is a boolean, integer, or string corresponding to the parameter.

Field	Type	Description
oneof parameter_choice.bool_param	bool	A boolean parameter value.
oneof parameter_choice.int64_param	int64	An int64 parameter value.
oneof parameter_choice.string_param	string	A string parameter value.

InferTensorContents¶

The data contained in a tensor represented by the repeated type that matches the tensor's data type. Protobuf oneof is not used because oneofs cannot contain repeated fields.

Field	Type	Description
bool_contents	repeated bool	Representation for BOOL data type. The size must match what is expected by the tensor's shape. The contents must be the flattened, one-dimensional, row-major order of the tensor elements.
int_contents	repeated int32	Representation for INT8, INT16, and INT32 data types. The size must match what is expected by the tensor's shape. The contents must be the flattened, one-dimensional, row-major order of the tensor elements.
int64_contents	repeated int64	Representation for INT64 data types. The size must match what is expected by the tensor's shape. The contents must be the flattened, one-dimensional, row-major order of the tensor elements.
uint_contents	repeated uint32	Representation for UINT8, UINT16, and UINT32 data types. The size must match what is expected by the tensor's shape. The contents must be the flattened, one-dimensional, row-major order of the tensor elements.
uint64_contents	repeated uint64	Representation for UINT64 data types. The size must match what is expected by the tensor's shape. The contents must be the flattened, one-dimensional, row-major order of the tensor elements.
fp32_contents	repeated float	Representation for FP32 data type. The size must match what is expected by the tensor's shape. The contents must be the flattened, one-dimensional, row-major order of the tensor elements.
fp64_contents	repeated double	Representation for FP64 data type. The size must match what is expected by the tensor's shape. The contents must be the flattened, one-dimensional, row-major order of the tensor elements.
bytes_contents	repeated bytes	Representation for BYTES data type. The size must match what is expected by the tensor's shape. The contents must be the flattened, one-dimensional, row-major order of the tensor elements.

ModelInferRequest¶

Field	Type	Description
model_name	string	The name of the model to use for inferencing.
model_version	string	The version of the model to use for inference. If not given the server will choose a version based on the model and internal policy.
id	string	Optional identifier for the request. If specified will be returned in the response.
parameters	map ModelInferRequest.ParametersEntry	Optional inference parameters.
inputs	repeated ModelInferRequest.InferInputTensor	The input tensors for the inference.
outputs	repeated ModelInferRequest.InferRequestedOutputTensor	The requested output tensors for the inference. Optional, if not specified all outputs produced by the model will be returned.
raw_input_contents	repeated bytes	The data contained in an input tensor can be represented in "raw" bytes form or in the repeated type that matches the tensor's data type. To use the raw representation 'raw_input_contents' must be initialized with data for each tensor in the same order as 'inputs'. For each tensor, the size of this content must match what is expected by the tensor's shape and data type. The raw data must be the flattened, one-dimensional, row-major order of the tensor elements without any stride or padding between the elements. Note that the FP16 and BF16 data types must be represented as raw content as there is no specific data type for a 16-bit float type.

If this field is specified then InferInputTensor::contents must not be specified for any input tensor. |

ModelInferRequest.InferInputTensor¶

An input tensor for an inference request.

Field	Type	Description
name	string	The tensor name.
datatype	string	The tensor data type.
shape	repeated int64	The tensor shape.
parameters	map ModelInferRequest.InferInputTensor.ParametersEntry	Optional inference input tensor parameters.
contents	InferTensorContents	The tensor contents using a data-type format. This field must not be specified if "raw" tensor contents are being used for the inference request.

ModelInferRequest.InferInputTensor.ParametersEntry¶

Field	Type	Description
key	string	N/A
value	InferParameter	N/A

ModelInferRequest.InferRequestedOutputTensor¶

An output tensor requested for an inference request.

Field	Type	Description
name	string	The tensor name.
parameters	map ModelInferRequest.InferRequestedOutputTensor.ParametersEntry	Optional requested output tensor parameters.

ModelInferRequest.InferRequestedOutputTensor.ParametersEntry¶

Field	Type	Description
key	string	N/A
value	InferParameter	N/A

ModelInferRequest.ParametersEntry¶

Field	Type	Description
key	string	N/A
value	InferParameter	N/A

ModelInferResponse¶

Field	Type	Description
model_name	string	The name of the model used for inference.
model_version	string	The version of the model used for inference.
id	string	The id of the inference request if one was specified.
parameters	map ModelInferResponse.ParametersEntry	Optional inference response parameters.
outputs	repeated ModelInferResponse.InferOutputTensor	The output tensors holding inference results.
raw_output_contents	repeated bytes	The data contained in an output tensor can be represented in "raw" bytes form or in the repeated type that matches the tensor's data type. To use the raw representation 'raw_output_contents' must be initialized with data for each tensor in the same order as 'outputs'. For each tensor, the size of this content must match what is expected by the tensor's shape and data type. The raw data must be the flattened, one-dimensional, row-major order of the tensor elements without any stride or padding between the elements. Note that the FP16 and BF16 data types must be represented as raw content as there is no specific data type for a 16-bit float type.

If this field is specified then InferOutputTensor::contents must not be specified for any output tensor. |

ModelInferResponse.InferOutputTensor¶

An output tensor returned for an inference request.

Field	Type	Description
name	string	The tensor name.
datatype	string	The tensor data type.
shape	repeated int64	The tensor shape.
parameters	map ModelInferResponse.InferOutputTensor.ParametersEntry	Optional output tensor parameters.
contents	InferTensorContents	The tensor contents using a data-type format. This field must not be specified if "raw" tensor contents are being used for the inference response.

ModelInferResponse.InferOutputTensor.ParametersEntry¶

Field	Type	Description
key	string	N/A
value	InferParameter	N/A

ModelInferResponse.ParametersEntry¶

Field	Type	Description
key	string	N/A
value	InferParameter	N/A

ModelMetadataRequest¶

Field	Type	Description
name	string	The name of the model.
version	string	The version of the model to check for readiness. If not given the server will choose a version based on the model and internal policy.

ModelMetadataResponse¶

Field	Type	Description
name	string	The model name.
versions	repeated string	The versions of the model available on the server.
platform	string	The model's platform. See Platforms.
inputs	repeated ModelMetadataResponse.TensorMetadata	The model's inputs.
outputs	repeated ModelMetadataResponse.TensorMetadata	The model's outputs.

ModelMetadataResponse.TensorMetadata¶

Metadata for a tensor.

Field	Type	Description
name	string	The tensor name.
datatype	string	The tensor data type.
shape	repeated int64	The tensor shape. A variable-size dimension is represented by a -1 value.

ModelReadyRequest¶

Field	Type	Description
name	string	The name of the model to check for readiness.
version	string	The version of the model to check for readiness. If not given the server will choose a version based on the model and internal policy.

ModelReadyResponse¶

Field	Type	Description
ready	bool	True if the model is ready, false if not ready.

ServerLiveRequest¶

ServerLiveResponse¶

Field	Type	Description
live	bool	True if the inference server is live, false if not live.

ServerMetadataRequest¶

ServerMetadataResponse¶

Field	Type	Description
name	string	The server name.
version	string	The server version.
extensions	repeated string	The extensions supported by the server.

ServerReadyRequest¶

ServerReadyResponse¶

Field	Type	Description
ready	bool	True if the inference server is ready, false if not ready.