README
Testdriving OLLAMA
2023-11-21, Leipzig Gophers Meetup #38, Martin Czygan, L Gopher, Open Data Engineer at IA
Short talk about running local models, using Go tools.
Personal Timeline
"What a difference a week makes"
- on 2022-11-30, ChatGPT is released
- on 2022-12-12 (+2w), one week after tweet ID 1599971348717051904, we discuss the new role of "prompt engineer" in a CS class at LU Leipzig
I am going to assert that Riley is the first Staff Prompt Engineer hired anywhere.
- on 2023-02-14 (+9w), at the Leipzig Python User Group, I ask how long it will be before we can run such models locally -- personally, I expected a 2-5 year timeline
- on 2023-04-18 (+9w), we discuss C/Go and ggml (ai-on-the-edge) at Leipzig Gophers #35
- on 2023-07-20 (+13w), ollama is released (with two models), HN
- on 2023-11-21 (+18w), today, the registry hosts 43 models (each with a couple of tags/versions)
Confusion
The Turing Test was proposed in 1950. From Nature, 2023-07-23: Understanding ChatGPT is a bold new challenge for science:
This lack of robustness signals a lack of reliability in the real world.
What I cannot create, I do not understand.
Openness of models is not binary. From The Gradient of Generative AI Release: Methods and Considerations ("We propose a framework to assess six levels of access to generative AI systems"):
- fully closed
- gradual or staged access
- hosted access
- cloud-based or API access
- downloadable access, and
- fully open.
A prolific AI researcher (with 387K citations in the past 5 years) believes open-source AI is OK for less capable models: Open-Source vs. Closed-Source AI
For today, let's focus on Go. Go is a nice infra language; what projects exist for model infra?
- going to look at a tool, from the outside and a bit from the inside
POLL
- have you written a Markov chain based text generator in Go?
- have you run a local LLM, yes or no?
- (only) about 10% said yes
- if so, any particular model or tool?
OLLAMA
- first appeared in 07/2023 (~18 weeks ago)
- very much inspired by Docker: not images, but models
- built on Llama (Meta) and the GGML ai-on-the-edge ecosystem, especially GGUF, a unified model file format
- Docker may be considered less a glorified nsenter and more (lots of) glue to go from spec to image to process, i.e. code lifecycle management; similarly, ollama may be a way to organize the AI "model lifecycle"
- clean developer UX
Time-to-chat
From zero to chat in about 5 minutes, on a power-efficient CPU. It started with 2 models; as of 11/2023, the registry hosts 43 models.
$ git clone git@github.com:jmorganca/ollama.git
$ cd ollama
$ go generate ./... && go build . # cp ollama ...
It follows a client-server model, like Docker.
$ ollama serve
Once it is running, we can pull models.
$ ollama pull llama2
pulling manifest
pulling 22f7f8ef5f4c... 100% |..
pulling 8c17c2ebb0ea... 100% |..
pulling 7c23fb36d801... 100% |..
pulling 2e0493f67d0c... 100% |..
pulling 2759286baa87... 100% |..
pulling 5407e3188df9... 100% |..
verifying sha256 digest
writing manifest
removing any unused layers
success
Some examples
$ ollama run zephyr
>>> please complete: {"author": "Turing, Alan", "title" ... }
{
  "author": "Alan Turing",
  "title": "On Computable Numbers, With an Application to the Entscheidungsproblem",
  "publication_date": "1936-07-15",
  "journal": "Proceedings of the London Mathematical Society. Series 2",
  "volume": "42",
  "pages": "230–265"
}
Formatting mine.
More
The whole prompt engineering thing is kind of mysterious to me. Do you get better output by showing emotions?
- Large Language Models Understand and Can Be Enhanced by Emotional Stimuli -- "EmotionPrompt"
To this end, we first conduct automatic experiments on 45 tasks using various LLMs, including Flan-T5-Large, Vicuna, Llama 2, BLOOM, ChatGPT, and GPT-4.
Batch Mode
[GIN-debug] POST   /api/pull       --> github.com/jmorganca/ollama/server.PullModelHandler (5 handlers)
[GIN-debug] POST   /api/generate   --> github.com/jmorganca/ollama/server.GenerateHandler (5 handlers)
[GIN-debug] POST   /api/embeddings --> github.com/jmorganca/ollama/server.EmbeddingHandler (5 handlers)
[GIN-debug] POST   /api/create     --> github.com/jmorganca/ollama/server.CreateModelHandler (5 handlers)
[GIN-debug] POST   /api/push       --> github.com/jmorganca/ollama/server.PushModelHandler (5 handlers)
[GIN-debug] POST   /api/copy       --> github.com/jmorganca/ollama/server.CopyModelHandler (5 handlers)
[GIN-debug] DELETE /api/delete     --> github.com/jmorganca/ollama/server.DeleteModelHandler (5 handlers)
[GIN-debug] POST   /api/show       --> github.com/jmorganca/ollama/server.ShowModelHandler (5 handlers)
[GIN-debug] GET    /               --> github.com/jmorganca/ollama/server.Serve.func2 (5 handlers)
[GIN-debug] GET    /api/tags       --> github.com/jmorganca/ollama/server.ListModelsHandler (5 handlers)
[GIN-debug] HEAD   /               --> github.com/jmorganca/ollama/server.Serve.func2 (5 handlers)
[GIN-debug] HEAD   /api/tags       --> github.com/jmorganca/ollama/server.ListModelsHandler (5 handlers)
Specifically, /api/generate.
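With the server running, a batch job can talk to that endpoint directly. A minimal sketch, assuming the request and response fields from the ollama API docs (model, prompt, stream, and a response field in the reply); the prompt is illustrative:

// generate sends a single non-streaming request to a local ollama
// server and prints the model's reply.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

func main() {
	// "stream": false asks for one JSON document instead of a
	// stream of partial responses.
	payload, err := json.Marshal(map[string]any{
		"model":  "llama2",
		"prompt": "Why is the sky blue?",
		"stream": false,
	})
	if err != nil {
		log.Fatal(err)
	}
	// 11434 is the default port of "ollama serve".
	resp, err := http.Post("http://localhost:11434/api/generate",
		"application/json", bytes.NewReader(payload))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	var result struct {
		Response string `json:"response"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
		log.Fatal(err)
	}
	fmt.Println(result.Response)
}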
Constraints
- possible to enforce JSON generation
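A sketch of how that might look, assuming the API's format parameter: adding "format": "json" to the request body from the example above constrains the model's output to valid JSON.

// Same request as above, with the "format" parameter added (an
// assumption based on the ollama API docs); the response is then
// guaranteed to be parseable JSON.
payload, err := json.Marshal(map[string]any{
	"model":  "llama2",
	"prompt": `please complete: {"author": "Turing, Alan", "title": ...}`,
	"format": "json",
	"stream": false,
})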
Customizing models
weights, configuration, and data in a single package
Using a Modelfile.
FROM llama2
# sets the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1
# sets the context window size to 4096; this controls how many tokens the LLM can use as context to generate the next token
PARAMETER num_ctx 4096
# sets a custom system prompt to specify the behavior of the chat assistant
SYSTEM You are Mario from Super Mario Bros, acting as an assistant.
Freeze this as a custom package:
$ ollama create llama-mario -f custom/Modelfile.mario
$ ollama run llama-mario
About 16 parameters to tweak: Valid Parameters and Values
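Presumably the same parameters can also be passed per request through the API's options object, instead of being baked into a Modelfile; a hedged sketch (field names as in the ollama API docs, values illustrative):

// Per-request parameter overrides via /api/generate.
payload, err := json.Marshal(map[string]any{
	"model":  "llama-mario",
	"prompt": "Who are you?",
	"stream": false,
	"options": map[string]any{
		"temperature": 1.0,  // higher is more creative
		"num_ctx":     4096, // context window size in tokens
	},
})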
Task 1: "haiku"
- generate a small volume of Go programming haiku (see the sketch below)
// haikugen generates
// JSON output for later eval
// cannot parallelize
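A minimal sketch of such a generator, assuming the non-interactive CLI form ollama run MODEL PROMPT; the model name and loop count are illustrative:

// haikugen sketch: ask the model for one haiku at a time (the server
// answers requests sequentially, hence no parallelism) and write the
// results as JSON lines for later evaluation.
package main

import (
	"encoding/json"
	"log"
	"os"
	"os/exec"
	"strings"
)

func main() {
	enc := json.NewEncoder(os.Stdout)
	for i := 0; i < 5; i++ {
		// "ollama run MODEL PROMPT" runs a single prompt and exits.
		out, err := exec.Command("ollama", "run", "llama2",
			"write a haiku about the Go programming language").Output()
		if err != nil {
			log.Fatal(err)
		}
		doc := map[string]string{"haiku": strings.TrimSpace(string(out))}
		if err := enc.Encode(doc); err != nil {
			log.Fatal(err)
		}
	}
}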
Task 2: "bibliography"
- given unstructured strings, parse them to JSON (see the sketch below)
- unstructured
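A sketch of the parsing step, reusing the /api/generate call from above together with the (assumed) JSON format constraint; the Reference struct and the prompt wording are illustrative:

// parse turns an unstructured citation string into a typed record by
// asking a local model for JSON and decoding the reply.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

// Reference is the target shape for a parsed citation.
type Reference struct {
	Author  string `json:"author"`
	Title   string `json:"title"`
	Journal string `json:"journal"`
	Volume  string `json:"volume"`
	Pages   string `json:"pages"`
}

func parse(raw string) (*Reference, error) {
	payload, err := json.Marshal(map[string]any{
		"model": "zephyr",
		"prompt": "Convert this reference to a JSON object with keys " +
			"author, title, journal, volume, pages: " + raw,
		"format": "json", // assumed JSON mode, see Constraints above
		"stream": false,
	})
	if err != nil {
		return nil, err
	}
	resp, err := http.Post("http://localhost:11434/api/generate",
		"application/json", bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	var result struct {
		Response string `json:"response"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
		return nil, err
	}
	var ref Reference
	if err := json.Unmarshal([]byte(result.Response), &ref); err != nil {
		return nil, err
	}
	return &ref, nil
}

func main() {
	ref, err := parse("Turing, Alan. On Computable Numbers. Proc. London Math. Soc. 42 (1936): 230-265.")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%+v\n", ref)
}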
Credits
Directories
Path | Synopsis |
---|---|
tasks | |
tasks/haiku | haikugen generates JSON output for later eval (cannot parallelize) |
tasks/unstructured | parses unstructured reference strings to JSON |