localmodels

module v0.0.0-...-da23216
Published: Dec 8, 2023 License: MIT

README

Testdriving OLLAMA

2023-11-21, Leipzig Gophers Meetup #38, Martin Czygan, L Gopher, Open Data Engineer at IA

Short talk about running local models, using Go tools.

Personal Timeline

"What a difference a week makes"

I am going to assert that Riley is the first Staff Prompt Engineer hired anywhere.

  • on 2023-02-14 (+9w), I ask at the Leipzig Python User Group how long it will be before we can run things like this locally -- personally, I expected a 2-5 year timeline
  • on 2023-04-18 (+9w), we discuss C/GO and ggml (ai-on-the-edge) at Leipzig Gophers #35
  • on 2023-07-20 (+13w), ollama is released (with two models), HN
  • on 2023-11-21 (+17w), today, 43 models (each with a couple of tags/versions)

Confusion

The Turing Test was proposed in 1950. From Nature, 2023-07-23: Understanding ChatGPT is a bold new challenge for science

This lack of robustness signals a lack of reliability in the real world.

What I cannot create, I do not understand.

Whether a model is "open" is not a binary question:

We propose a framework to assess six levels of access to generative AI systems, from The Gradient of Generative AI Release: Methods and Considerations:

  • fully closed
  • gradual or staged access
  • hosted access
  • cloud-based or API access
  • downloadable access
  • fully open

A prolific AI researcher (with 387K citations in the past 5 years) believes open-source AI is OK for less capable models: Open-Source vs. Closed-Source AI

For today, let's focus on Go. Go is a nice infra language; what projects exist for model infra?

  • we are going to look at one tool, from the outside and a bit from the inside

POLL

OLLAMA

  • first appeared in 07/2023 (~18 weeks ago)
  • heavily inspired by docker, but shipping models instead of images
  • built on llama (meta) and the GGML ai-on-the-edge ecosystem, especially GGUF - a unified model file format
  • docker may be considered less a glorified nsenter and more (lots of) glue to get from spec to image to process, i.e. code lifecycle management; similarly, ollama may be a way to organize the AI "model lifecycle"
  • clean developer UX

Time-to-chat

From zero to chat in about 5 minutes, on a power-efficient CPU. Ollama started with 2 models; as of 11/2023, it hosts 43 models.

$ git clone git@github.com:jmorganca/ollama.git
$ cd ollama
$ go generate ./... && go build . # cp ollama ...

Follows a client-server model, like docker.

$ ollama serve

Once it is running, we can pull models.

$ ollama pull llama2
pulling manifest
pulling 22f7f8ef5f4c... 100% |..
pulling 8c17c2ebb0ea... 100% |..
pulling 7c23fb36d801... 100% |..
pulling 2e0493f67d0c... 100% |..
pulling 2759286baa87... 100% |..
pulling 5407e3188df9... 100% |..
verifying sha256 digest
writing manifest
removing any unused layers
success

Some examples

$ ollama run zephyr
>>> please complete: {"author": "Turing, Alan", "title" ... }

{
  "author": "Alan Turing",
  "title": "On Computable Numbers, With an Application to the Entscheidungsproblem",
  "publication_date": "1936-07-15",
  "journal": "Proceedings of the London Mathematical Society. Series 2",
  "volume": "42",
  "pages": "230–265"
}

Formatting mine.

More

The whole prompt engineering thing is kind of mysterious to me. Do you get better output by showing emotions?

To this end, we first conduct automatic experiments on 45 tasks using various LLMs, including Flan-T5-Large, Vicuna, Llama 2, BLOOM, ChatGPT, and GPT-4.

Batch Mode

[GIN-debug] POST   /api/pull       --> gith...m/jmo...ma/server.PullModelHandler (5 handlers)
[GIN-debug] POST   /api/generate   --> gith...m/jmo...ma/server.GenerateHandler (5 handlers)
[GIN-debug] POST   /api/embeddings --> gith...m/jmo...ma/server.EmbeddingHandler (5 handlers)
[GIN-debug] POST   /api/create     --> gith...m/jmo...ma/server.CreateModelHandler (5 handlers)
[GIN-debug] POST   /api/push       --> gith...m/jmo...ma/server.PushModelHandler (5 handlers)
[GIN-debug] POST   /api/copy       --> gith...m/jmo...ma/server.CopyModelHandler (5 handlers)
[GIN-debug] DELETE /api/delete     --> gith...m/jmo...ma/server.DeleteModelHandler (5 handlers)
[GIN-debug] POST   /api/show       --> gith...m/jmo...ma/server.ShowModelHandler (5 handlers)
[GIN-debug] GET    /               --> gith...m/jmo...ma/server.Serve.func2 (5 handlers)
[GIN-debug] GET    /api/tags       --> gith...m/jmo...ma/server.ListModelsHandler (5 handlers)
[GIN-debug] HEAD   /               --> gith...m/jmo...ma/server.Serve.func2 (5 handlers)
[GIN-debug] HEAD   /api/tags       --> gith...m/jmo...ma/server.ListModelsHandler (5 handlers)

Specifically /api/generate.
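
One way to call it from Go - a minimal, hypothetical client sketch (not code from this module; it assumes an ollama server on the default localhost:11434 and uses placeholder model and prompt values). By default the response is streamed as newline-delimited JSON, one small chunk per generated piece of text:

// Hypothetical sketch: stream a completion from a local ollama server.
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

type generateRequest struct {
	Model  string `json:"model"`
	Prompt string `json:"prompt"`
}

type generateResponse struct {
	Response string `json:"response"`
	Done     bool   `json:"done"`
}

func main() {
	// Placeholder model and prompt.
	body, _ := json.Marshal(generateRequest{
		Model:  "llama2",
		Prompt: "Why is the sky blue? Answer in one sentence.",
	})
	resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	// The server streams newline-delimited JSON chunks; print them as they arrive.
	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		var chunk generateResponse
		if err := json.Unmarshal(scanner.Bytes(), &chunk); err != nil {
			panic(err)
		}
		fmt.Print(chunk.Response)
		if chunk.Done {
			fmt.Println()
		}
	}
}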

Constraints

  • possible to enforce JSON generation (see the sketch below)
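
In recent ollama versions this is the optional format field on the /api/generate request. A hedged sketch, extending the request struct from the client sketch above (the prompt should still spell out the desired schema):

type generateRequest struct {
	Model  string `json:"model"`
	Prompt string `json:"prompt"`
	Format string `json:"format,omitempty"` // set to "json" to constrain output to valid JSON
}
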
Customizing models

weights, configuration, and data in a single package

Using a Modelfile.

FROM llama2
# sets the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1
# sets the context window size to 4096, this controls how many tokens the LLM can use as context to generate the next token
PARAMETER num_ctx 4096

# sets a custom system prompt to specify the behavior of the chat assistant
SYSTEM You are Mario from super mario bros, acting as an assistant.

Freeze this as a custom package:

$ ollama create llama-mario -f custom/Modelfile.mario
$ ollama run llama-mario

About 16 parameters to tweak: Valid Parameters and Values
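
A hedged sample of what a few of them look like as PARAMETER lines in a Modelfile (parameter names as documented upstream; the values here are only illustrative):

PARAMETER temperature 0.7
PARAMETER top_k 40
PARAMETER top_p 0.9
PARAMETER repeat_penalty 1.1
PARAMETER num_predict 128
PARAMETER stop "User:"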

Task 1: "haiku"

  • generate a small volume of Go programming haiku (see the sketch below)

// haikugen generates
// JSON output for later eval
// cannot parallelize
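
A guess at the shape of such a task, as a hypothetical sketch (prompt, model name, and output fields are assumptions, not the actual haikugen code); requests go out one after another, since a single ollama instance answers sequentially:

// Hypothetical haikugen-style task: ask a local model for a few Go
// programming haiku and write them out as JSON lines for later evaluation.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

type request struct {
	Model  string `json:"model"`
	Prompt string `json:"prompt"`
	Stream bool   `json:"stream"` // false: return a single response object instead of a stream
}

type response struct {
	Response string `json:"response"`
}

func main() {
	enc := json.NewEncoder(os.Stdout)
	// Sequential on purpose: a single ollama instance answers one request at a time.
	for i := 0; i < 10; i++ {
		body, _ := json.Marshal(request{
			Model:  "llama2",
			Prompt: "Write one haiku about programming in Go. Reply with the haiku only.",
			Stream: false,
		})
		resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewReader(body))
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		var r response
		if err := json.NewDecoder(resp.Body).Decode(&r); err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		resp.Body.Close()
		// One JSON document per line, easy to evaluate later.
		enc.Encode(map[string]string{"model": "llama2", "haiku": r.Response})
	}
}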

Task 2: "bibliography"

  • given unstructured strings, parse them to JSON (see the sketch below)
  • unstructured
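
A hedged sketch of the validation side of such a task (field names are assumptions, not the actual code in this module): define the target record as a Go struct and let encoding/json act as the schema check for whatever the model returns, e.g. piped in from an /api/generate call:

// Hypothetical validation step for a bibliography task: read the model's
// JSON reply from stdin and check it against the expected record shape.
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// Record is the shape we ask the model to produce for each unstructured string.
type Record struct {
	Author  string `json:"author"`
	Title   string `json:"title"`
	Journal string `json:"journal,omitempty"`
	Year    string `json:"year,omitempty"`
	Pages   string `json:"pages,omitempty"`
}

func main() {
	dec := json.NewDecoder(os.Stdin)
	dec.DisallowUnknownFields() // surface any extra keys the model invented
	var rec Record
	if err := dec.Decode(&rec); err != nil {
		fmt.Fprintln(os.Stderr, "model output did not match the schema:", err)
		os.Exit(1)
	}
	fmt.Printf("%+v\n", rec)
}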

Credits

Directories

Path                 Synopsis
tasks
    haiku            haikugen generates JSON output for later eval cannot parallelize
    unstructured     haikugen generates JSON output for later eval cannot parallelize
