## Supported Models
| Model name          | Model source | Sample workspace | Kubernetes Workload | Distributed inference |
|---------------------|--------------|------------------|---------------------|-----------------------|
| falcon-7b-instruct  | tiiuae       | link             | Deployment          | false                 |
| falcon-7b           | tiiuae       | link             | Deployment          | false                 |
| falcon-40b-instruct | tiiuae       | link             | Deployment          | false                 |
| falcon-40b          | tiiuae       | link             | Deployment          | false                 |
## Image Source
- Public: Kaito maintainers manage the lifecycle of the inference service images that contain model weights. The images are available in Microsoft Container Registry (MCR).
## Usage
The inference service endpoint is `/chat`.
### Basic example

```bash
curl -X POST "http://<SERVICE>:80/chat" -H "accept: application/json" -H "Content-Type: application/json" -d '{"prompt":"YOUR_PROMPT_HERE"}'
```
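The same request can be issued programmatically. Below is a minimal Python sketch, assuming the `requests` package is installed and that the service returns a JSON body; `SERVICE` is a placeholder for your service address.

```python
import requests

SERVICE = "localhost"  # placeholder: replace with your inference service host

resp = requests.post(
    f"http://{SERVICE}:80/chat",
    headers={"accept": "application/json"},
    json={"prompt": "YOUR_PROMPT_HERE"},  # json= also sets Content-Type: application/json
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```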
### Example with full configurable parameters

```bash
curl -X POST \
  -H "accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{
        "prompt":"YOUR_PROMPT_HERE",
        "return_full_text": false,
        "clean_up_tokenization_spaces": false,
        "prefix": null,
        "handle_long_generation": null,
        "generate_kwargs": {
          "max_length":200,
          "min_length":0,
          "do_sample":true,
          "early_stopping":false,
          "num_beams":1,
          "num_beam_groups":1,
          "diversity_penalty":0.0,
          "temperature":1.0,
          "top_k":10,
          "top_p":1,
          "typical_p":1,
          "repetition_penalty":1,
          "length_penalty":1,
          "no_repeat_ngram_size":0,
          "encoder_no_repeat_ngram_size":0,
          "bad_words_ids":null,
          "num_return_sequences":1,
          "output_scores":false,
          "return_dict_in_generate":false,
          "forced_bos_token_id":null,
          "forced_eos_token_id":null,
          "remove_invalid_values":null
        }
      }' \
  "http://<SERVICE>:80/chat"
```
## Parameters
- `prompt`: The initial text provided by the user, from which the model continues generating text.
- `return_full_text`: If false, only the generated text is returned; otherwise the prompt plus the generated text is returned.
- `clean_up_tokenization_spaces`: true/false; determines whether to remove potential extra spaces in the text output.
- `prefix`: Prefix added to the prompt.
- `handle_long_generation`: Provides strategies to address generations beyond the model's maximum length capacity.
- `max_length`: The maximum total number of tokens in the generated text.
- `min_length`: The minimum total number of tokens that should be generated.
- `do_sample`: If true, sampling methods are used for text generation, which can introduce randomness and variation.
- `early_stopping`: If true, generation stops early when certain conditions are met, for example when a satisfactory number of candidates has been found in beam search.
- `num_beams`: The number of beams used in beam search. More beams can lead to better results but are more computationally expensive.
- `num_beam_groups`: Divides the beams into groups to promote diversity in the generated results.
- `diversity_penalty`: Penalizes the score of tokens that make the current generation too similar to other groups, encouraging diverse outputs.
- `temperature`: Controls the randomness of the output by scaling the logits before sampling.
- `top_k`: Restricts sampling to the k most likely next tokens.
- `top_p`: Uses nucleus sampling to restrict the sampling pool to tokens comprising the top p probability mass.
- `typical_p`: Adjusts the probability distribution to favor tokens that are "typically" likely, given the context.
- `repetition_penalty`: Penalizes tokens that have been generated previously, aiming to reduce repetition.
- `length_penalty`: Modifies scores based on sequence length to encourage shorter or longer outputs.
- `no_repeat_ngram_size`: Prevents any n-gram of the given size from being generated more than once.
- `encoder_no_repeat_ngram_size`: Similar to `no_repeat_ngram_size`, but applies to the encoder part of encoder-decoder models.
- `bad_words_ids`: A list of token IDs that should not be generated.
- `num_return_sequences`: The number of different sequences to generate.
- `output_scores`: Whether to output the prediction scores.
- `return_dict_in_generate`: If true, the method returns a dictionary containing additional information.
- `pad_token_id`: The token ID used for padding sequences to the same length.
- `eos_token_id`: The token ID that signifies the end of a sequence.
- `forced_bos_token_id`: The token ID that is forcibly used as the first generated token.
- `forced_eos_token_id`: The token ID that is forcibly used as the last generated token when `max_length` is reached.
- `remove_invalid_values`: If true, filters out invalid values such as NaNs or infs from model outputs to prevent crashes.

A short sketch after this list shows how a few of these parameters combine in practice.
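To illustrate how the decoding parameters interact, here is a hedged Python sketch that sends two contrasting requests to the same `/chat` endpoint: deterministic-leaning beam search and nucleus sampling. It assumes the `requests` package; `SERVICE`, the `chat` helper, and all parameter values are illustrative choices, not recommendations.

```python
import requests

SERVICE = "localhost"  # placeholder: replace with your inference service host

def chat(prompt, **generate_kwargs):
    """POST a prompt with the given generate_kwargs and return the JSON reply."""
    payload = {
        "prompt": prompt,
        "return_full_text": False,
        "generate_kwargs": generate_kwargs,
    }
    resp = requests.post(f"http://{SERVICE}:80/chat", json=payload, timeout=120)
    resp.raise_for_status()
    return resp.json()

# Beam search without sampling: deterministic, explores several candidates.
print(chat("Explain Kubernetes in one sentence.",
           do_sample=False, num_beams=4, early_stopping=True, max_length=100))

# Nucleus sampling: randomness controlled by temperature, top_p, and top_k.
print(chat("Explain Kubernetes in one sentence.",
           do_sample=True, temperature=0.7, top_p=0.9, top_k=50, max_length=100))
```

Note that sampling-only knobs such as `temperature`, `top_k`, and `top_p` take effect only when `do_sample` is true, while `num_beams` and `early_stopping` govern beam search.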