Part 2

Tool use is a meme. Well, not tool use specifically, but the APIs that LLM providers ship for tool use. Every one comes with a different stateful API, complex enough to warrant a client library for each programming language the provider expects its users to use. Streaming? Nope! You're going to have to wait until the response is complete, giving your user a spinning bar until your LLM is done. A spinning bar... like it's 1995. Like you're downloading in_the_end.mp3 from LimeWire. Come on, man.

<Personal opinion, as a user who builds tools for herself on top of these things>

What is an LLM? A thing that takes a sequence of tokens (usually) and outputs a set of probabilities for what the next token could be. We wrap these things in (usually) pretty garbage software that samples from them until the LLM yields a special <stop> token with high enough probability, which tells the program to stop asking the LLM for the next candidates.
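If you want that loop in code: here's a minimal sketch, where model.next_token_probs is a stand-in for whatever gives you the next-token distribution, not a real API.

```python
import random

STOP = "<stop>"

def generate(model, prompt_tokens, max_tokens=512):
    """Sample one token at a time until the model emits <stop>.

    model.next_token_probs is a placeholder for whatever returns a
    dict of token -> probability; it is not a real library call.
    """
    tokens = list(prompt_tokens)
    for _ in range(max_tokens):
        probs = model.next_token_probs(tokens)
        next_token = random.choices(list(probs), weights=list(probs.values()))[0]
        if next_token == STOP:
            break
        tokens.append(next_token)
    return tokens
```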

That's what an LLM is. It's not using tools. It's outputting a sequence of tokens given its training regime. That's all it's doing.

Adding dumb-ass abstractions isn't going to be helpful. With the current form of tool use, we are trying to fit a square peg into a round hole. I don't know what model providers actually do for tool use, but I'd imagine it's some context-free grammar constraining the output of the LLM, plus some training to have it do the "right" thing given the preceding sequence.
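I'm guessing here, so treat this as a cartoon rather than anything a provider actually runs: constrained decoding mostly comes down to zeroing out every token the grammar won't allow next, renormalizing, and sampling from what's left. The allowed_tokens set below is a stand-in for walking a real grammar.

```python
def constrained_step(probs, allowed_tokens):
    """Mask a next-token distribution down to what a grammar allows.

    probs: dict of token -> probability from the model.
    allowed_tokens: set of tokens the (hypothetical) grammar permits next.
    """
    masked = {t: p for t, p in probs.items() if t in allowed_tokens}
    total = sum(masked.values())
    if total == 0:
        # The grammar demands something the model gave ~zero mass to;
        # fall back to a uniform choice over the legal tokens.
        return {t: 1.0 / len(allowed_tokens) for t in allowed_tokens}
    return {t: p / total for t, p in masked.items()}
```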

What tool use should be, and how I build my things to have LLMs use tools

What am I actually doing when I use tools around me? I'm kind of just outputting some action, every frame of my existence. I have some crude mockery of a sequence in my memory, my world model, and I simply exert the next thing. I'm predicting, doing. I can do it because evolution is a good god.

These LLMs cannot do that. They can't coherently exist in a feedback loop with the environment.

They produce so. many. tokens. It's like driving a mile with your eyes closed. You're going to veer off course. You need the feedback.

I can. I can sense the world around me, and it grounds me.

Fuck a tool use API. My LLMs will not produce tokens in isolation, dreaming. They produce streams. I'll prompt them to output specific formats, and they'll stream into an environment that will run their code and immediately give them feedback.
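Stripped down, the loop I want looks something like the sketch below. stream_tokens and run_in_environment are placeholders for your provider's streaming call and whatever sandbox you trust; the only real idea is to watch the stream, and the instant a complete code block closes, stop reading, run it, and splice the result back into the context.

```python
FENCE = "`" * 3  # a triple-backtick code fence

def extract_closed_code_block(text):
    """Return the body of the first complete fenced code block, else None."""
    start = text.find(FENCE)
    if start == -1:
        return None
    end = text.find(FENCE, start + 3)
    if end == -1:
        return None
    body = text[start + 3:end]
    # Drop an optional language tag on the first line of the fence.
    return body.split("\n", 1)[1] if "\n" in body else body


def drive(stream_tokens, run_in_environment, prompt):
    """Stream the model, run each completed code block, feed results back.

    stream_tokens(prompt) yields text chunks; run_in_environment(code)
    returns the environment's output. Both are placeholders, not real APIs.
    """
    transcript = prompt
    while True:
        buffer = ""
        for chunk in stream_tokens(transcript):
            buffer += chunk
            code = extract_closed_code_block(buffer)
            if code is not None:
                break  # Kill the stream: we already have something runnable.
        else:
            return transcript + buffer  # Model finished without emitting code.

        result = run_in_environment(code)
        transcript += buffer + "\n[environment]\n" + result + "\n"
```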

Sadly, the way LLMs are set up has them overfit on what low-taste users expect: a chat interface. But that's a crude mockery of what they really are. They are balls of life that do a thing. A thing they're trained to do.

(I'm using aider's diff-fenced prompt in a Neovim buffer where an LLM streams its output; hotkeys let me kill the model mid-stream, give it immediate feedback, and reprompt it.)
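For the curious, the edit blocks that prompt asks for look roughly like the example in the sketch below; the SEARCH/REPLACE markers are aider's, but the parsing here is my own simplification, not aider's code.

```python
EXAMPLE_BLOCK = """\
path/to/file.py
<<<<<<< SEARCH
old_line = 1
=======
new_line = 2
>>>>>>> REPLACE
"""

def apply_search_replace(file_text, block):
    """Apply one SEARCH/REPLACE edit block to a file's contents.

    This is a simplified reading of the format, not aider's implementation.
    """
    _head, rest = block.split("<<<<<<< SEARCH\n", 1)
    search, rest = rest.split("\n=======\n", 1)
    replace, _tail = rest.split("\n>>>>>>> REPLACE", 1)
    if search not in file_text:
        # The model misremembered the original text; this is exactly the
        # kind of error I want streamed straight back to it.
        raise ValueError("SEARCH text not found in file")
    return file_text.replace(search, replace, 1)
```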
