API Documentation Overview #
This document provides an overview of the endpoints available in our minimal API, covering chat completions, general text completions, token counting, and model listing.
1. Understanding the POST Method #
The POST method is one of the main HTTP methods used in API interactions, designed to send data to a server to create or update a resource. Here are the key aspects of the POST method:
- Purpose: Submits data to the server for processing, for example form data or a prompt for a completion.
- Data Submission: Data is included in the body of the request, which can be in formats like JSON, form data, or XML.
- Idempotency: POST requests are not idempotent, meaning that sending the same request several times may create duplicate resources or trigger the same processing more than once.
- Response: Typically returns a status code of 200 (OK), 201 (Created), or 204 (No Content).
Structure of a POST request: #
For these models:
- Mixtral-8x7B-Instruct-v0.1, Qwen2-72B-Instruct, deepseek-coder-33b-instruct
    {
      "model": "[Name of the model]",
      "prompt": "[your prompt here]",
      "max_tokens": 7000,
      "temperature": 0.7,
      "top_k": 40,
      "repetition_penalty": 1.2,
      "messages": [
        {
          "role": "user",
          "content": "[your prompt here]"
        }
      ]
    }
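The request body above can be sent from any HTTP client. The following is a minimal sketch in Python using only the standard library. The base URL `http://localhost:8000` and the helper name `build_post_request` are assumptions made for illustration and are not part of the API; replace them with the values of your deployment. The request is built but not sent, so the sketch can be run without a server.

```python
import json
import urllib.request

# ASSUMPTION: hypothetical base URL, replace with your deployment's address.
BASE_URL = "http://localhost:8000"


def build_post_request(path, payload):
    """Build (but do not send) a POST request with a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Body fields mirror the structure documented above.
payload = {
    "model": "Mixtral-8x7B-Instruct-v0.1",
    "prompt": "Say hello.",
    "max_tokens": 7000,
    "temperature": 0.7,
    "top_k": 40,
    "repetition_penalty": 1.2,
    "messages": [{"role": "user", "content": "Say hello."}],
}

request = build_post_request("/v1/chat/completions", payload)

# To actually send the request against a running server:
#     with urllib.request.urlopen(request) as response:
#         print(response.status, response.read().decode("utf-8"))
```

Because POST is not idempotent, a client that retries a failed request should expect that the server may process the same prompt twice.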
2. Chat Completions #
Endpoints designed to handle chat completion requests:
- POST /queue/chat/completions: Handles asynchronous queue requests for chat completions.
- POST /openai/deployments/{model}/chat/completions: Submits chat completion requests for a specific model.
- POST /engines/{model}/chat/completions: Engages a specific model for chat completion.
- POST /chat/completions: General endpoint for chat completion requests.
- POST /v1/chat/completions: Version 1 endpoint for chat completion requests.
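Some of the chat completion paths above contain a `{model}` placeholder, while others take the model only in the request body. As a rough sketch, a client could resolve the path as follows. The short keys (`queue`, `deployment`, and so on) and the function name `chat_path` are hypothetical labels chosen for this example; only the path templates come from the list above.

```python
from urllib.parse import quote

# Path templates copied from the endpoint list; the dictionary keys are
# hypothetical labels used only in this example.
CHAT_PATHS = {
    "queue": "/queue/chat/completions",
    "deployment": "/openai/deployments/{model}/chat/completions",
    "engine": "/engines/{model}/chat/completions",
    "general": "/chat/completions",
    "v1": "/v1/chat/completions",
}


def chat_path(kind, model=""):
    """Return the chat completion path, filling in {model} when required."""
    template = CHAT_PATHS[kind]
    if "{model}" not in template:
        return template
    if not model:
        raise ValueError(f"endpoint '{kind}' needs a model name in the path")
    # Percent-encode the model name so it is safe as a single path segment.
    return template.format(model=quote(model, safe=""))


print(chat_path("v1"))
print(chat_path("engine", "Qwen2-72B-Instruct"))
```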
3. General Completions #
Endpoints for general text completions:
- POST /openai/deployments/{model}/completions: Handles completion requests for specific deployments.
- POST /engines/{model}/completions: Directly submits text completion requests to a specified engine.
- POST /completions: General endpoint for obtaining text completions.
- POST /v1/completions: Version 1 of the text completions endpoint.
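A text completion request might look like the sketch below. It assumes that the completion endpoints read the `prompt` field of the request body rather than `messages`; this document does not state that explicitly, so check the Swagger documentation before relying on it. The base URL and the function name `build_completion_request` are likewise hypothetical.

```python
import json
import urllib.request

# ASSUMPTION: hypothetical base URL, replace with your deployment's address.
BASE_URL = "http://localhost:8000"


def build_completion_request(model, prompt, max_tokens=256, temperature=0.7):
    """Build (but do not send) a POST request for /v1/completions."""
    # ASSUMPTION: the completion endpoints take the text in "prompt".
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=BASE_URL + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


completion_request = build_completion_request(
    "deepseek-coder-33b-instruct",
    "Write a function that reverses a string.",
)

# To send it against a running server:
#     with urllib.request.urlopen(completion_request) as response:
#         result = json.loads(response.read())
```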
4. Token Counting #
Endpoint for managing model usage based on token constraints:
- POST /utils/token_counter: Counts the number of tokens for a given input.
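Counting tokens before submitting a prompt helps keep a request within the `max_tokens` budget. The sketch below only builds the request. The body schema for `/utils/token_counter` is not described in this document, so the `model` and `prompt` field names used here are assumptions, as are the base URL and the function name.

```python
import json
import urllib.request

# ASSUMPTION: hypothetical base URL, replace with your deployment's address.
BASE_URL = "http://localhost:8000"


def build_token_count_request(model, text):
    """Build (but do not send) a POST request for /utils/token_counter."""
    # ASSUMPTION: the field names "model" and "prompt" are guesses; consult
    # the Swagger documentation for the actual request schema.
    payload = {"model": model, "prompt": text}
    return urllib.request.Request(
        url=BASE_URL + "/utils/token_counter",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


token_request = build_token_count_request(
    "Qwen2-72B-Instruct", "How many tokens is this sentence?"
)
```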
5. Model Management #
Operations that can be performed on models via the API:
- GET /models or GET /v1/models: Retrieves a list of all available models in the system.
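Listing models is a plain GET request without a body. The sketch below builds that request and extracts model names from a response. The response shape, a JSON object with a `data` list whose entries carry an `id`, is an assumption borrowed from OpenAI-style APIs and is not confirmed by this document; the base URL, function names, and sample response are also illustrative.

```python
import json
import urllib.request

# ASSUMPTION: hypothetical base URL, replace with your deployment's address.
BASE_URL = "http://localhost:8000"


def build_models_request(versioned=True):
    """Build (but do not send) a GET request for /v1/models or /models."""
    path = "/v1/models" if versioned else "/models"
    return urllib.request.Request(url=BASE_URL + path, method="GET")


def extract_model_ids(response_body):
    """Return the model identifiers from a JSON response body."""
    # ASSUMPTION: OpenAI-style shape {"data": [{"id": "..."}, ...]}.
    parsed = json.loads(response_body)
    return [entry["id"] for entry in parsed.get("data", [])]


models_request = build_models_request()

# Illustrative response body, not captured from a real server.
sample_body = '{"data": [{"id": "Qwen2-72B-Instruct"}, {"id": "Mixtral-8x7B-Instruct-v0.1"}]}'
model_ids = extract_model_ids(sample_body)
```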
For more detailed information, please refer to our Swagger documentation.