Request limits

infermatic — Wed, 12 Jun 2024 19:20:56 +0000

The current request limit is 18 requests per minute for the API and 60 requests per minute for the UI. If you need more, feel free to reach out so we can find an alternative for you. You can contact us at admin@infermatic.ai or join the Discord server and ask a moderator.
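A client can stay under the API limit by pacing its own calls. The following is a minimal sketch of a sliding-window limiter, not an official client: the class name `MinuteRateLimiter` and its parameters are hypothetical, and only the figure of 18 requests per minute comes from the text above.

```python
import time
from collections import deque


class MinuteRateLimiter:
    """Sliding-window limiter: at most `limit` calls per `window` seconds.

    Hypothetical helper for illustration; the clock and sleep functions
    are injectable so the limiter can be exercised without real waiting.
    """

    def __init__(self, limit=18, window=60.0, clock=time.monotonic, sleep=time.sleep):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.sleep = sleep
        self.sent = deque()  # timestamps of calls still inside the window

    def wait_time(self):
        """Return the seconds to wait before the next call (0.0 if none)."""
        now = self.clock()
        # Forget calls that have already left the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.limit:
            return 0.0
        return self.window - (now - self.sent[0])

    def acquire(self):
        """Block until a call is allowed, then record it."""
        while True:
            delay = self.wait_time()
            if delay <= 0:
                break
            self.sleep(delay)
        self.sent.append(self.clock())
```

Calling `limiter.acquire()` immediately before each API request keeps the client at or below the configured rate; for the UI limit, the same class could be created with `limit=60`.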

The post Request limits appeared first on Infermatic.

API Key

infermatic — Wed, 12 Jun 2024 19:20:45 +0000

To get your API key, you need to:

1. Be a Plus user
2. Log in to the Infermatic.ai website

You will find your key in the sidebar, under ‘API Keys’, just below the Join Discord button.

Overview

infermatic — Wed, 12 Jun 2024 19:20:32 +0000

API Documentation Overview

This document gives an overview of the endpoints available in our API, covering chat completions, general text completions, token counting, and model listing.

1. Understanding the POST Method

The POST method is one of the main HTTP methods used in API interactions, designed to send data to a server to create or update a resource. Here are the key aspects of the POST method:

Purpose: Used to submit data to the server, such as a JSON payload or form data.
Data Submission: Data is included in the body of the request, in a format such as JSON, form data, or XML.
Idempotency: POST requests are not idempotent, meaning that sending the same request several times may create the same resource several times.
Response: Typically returns a status code of 200 (OK), 201 (Created), or 204 (No Content).

Structure of a POST request:

For these models:

Mixtral-8x7B-Instruct-v0.1, Qwen2-72B-Instruct, deepseek-coder-33b-instruct

{
    "model": "[Name of the model]",
    "prompt": "[your prompt here]",
    "max_tokens": 7000,
    "temperature": 0.7,
    "top_k": 40,
    "repetition_penalty": 1.2,
    "messages": [
        {
            "role": "user",
            "content": "[your prompt here]"
        }
    ]
}
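The body above can be assembled and attached to a POST request with Python's standard library. This is a sketch under stated assumptions rather than a reference client: the base URL is a placeholder (the host is not given here), bearer-token authentication is assumed and not confirmed by this document, and the function name `build_chat_request` is hypothetical. The sketch omits `prompt` on the assumption that chat endpoints read `messages`.

```python
import json
import urllib.request

# Assumption: placeholder host; replace with the real API base URL.
BASE_URL = "https://api.example.invalid"


def build_chat_request(model, user_prompt, api_key, max_tokens=7000):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    body = {
        "model": model,
        "max_tokens": max_tokens,
        "temperature": 0.7,
        "top_k": 40,
        "repetition_penalty": 1.2,
        "messages": [{"role": "user", "content": user_prompt}],
    }
    return urllib.request.Request(
        BASE_URL + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),  # JSON goes in the request body
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token auth; check the Swagger docs.
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Because POST is not idempotent, a client that retries on failure should expect that the same request may be processed more than once.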

2. Chat Completions

Endpoints designed to handle chat completion requests:

POST /queue/chat/completions: Handles asynchronous queue requests for chat completions.
POST /openai/deployments/{model}/chat/completions: Submits chat completion requests for a specific model.
POST /engines/{model}/chat/completions: Engages a specific model for chat completion.
POST /chat/completions: General endpoint for chat completion requests.
POST /v1/chat/completions: Version 1 endpoint for chat completion requests.

3. General Completions

Endpoints for general text completions:

POST /openai/deployments/{model}/completions: Handles completion requests for specific deployments.
POST /engines/{model}/completions: Directly submits text completion requests to a specified engine.
POST /completions: General endpoint for obtaining text completions.
POST /v1/completions: Version 1 of the text completions endpoint.

4. Token Counting

Endpoint for managing model usage based on token constraints:

POST /utils/token_counter: Counts the number of tokens for a given input.

5. Model Management

Operations that can be performed on models via the API:

GET /models or GET /v1/models: Retrieves a list of all available models in the system.
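Once the model list has been fetched, a client usually only needs the model names to fill the `model` field of a request. The sketch below assumes the response follows the common OpenAI-style shape, `{"data": [{"id": "..."}]}`; this document does not confirm that shape, so verify it against the Swagger documentation. The function name `model_ids` is hypothetical.

```python
import json


def model_ids(response_text):
    """Extract model ids from the body of a GET /v1/models response.

    Assumption: the body looks like {"data": [{"id": "..."}, ...]}.
    Entries without an "id" key are skipped.
    """
    payload = json.loads(response_text)
    return [entry["id"] for entry in payload.get("data", []) if "id" in entry]


# Hand-written sample body, not real API output.
sample = json.dumps(
    {"data": [{"id": "Mixtral-8x7B-Instruct-v0.1"}, {"id": "Qwen2-72B-Instruct"}]}
)
print(model_ids(sample))
```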

For more detailed information, please refer to our Swagger documentation.
