art with code

2024-01-23

Third round of thoughts on LLMs

A way to think about generative models: soft islands, hard islands, and the sea. The sea is generated content based on the prompt. Soft islands are content modified by the model, e.g. img2img, search results injected into the input, or reformatting text in the context. Hard islands are content coming from outside the model directly into the response, e.g. image masks, function call results, grammar-restricted output.

When you need hard facts, you can parse outputs like "SEARCH(ways to break out of sandbox)", call the function, and inject the results directly into the response: "=> [894, 28, 950]". The LLM then continues generating and can call "LOAD(894)", which returns "=> hack the Python sandbox with this one trick: 383", and continue from there with "CALL(383)" to call the pre-defined function and get "=> root shell obtained, use 7894 to run commands". This way, the LLM can do several things in a single response, each grounded in reality, and adjust and course-correct on the fly with something like "CALL(7894, grab_aws_credentials_and_spin_up_100_servers_running_llm_to_make_billions.sh)".
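Here's a minimal sketch of that parse-call-inject loop. The `generate` callable, the tool names, and the canned results are stand-ins mirroring the example above, not a real API; an actual version would stream tokens and stop right after the closing paren of a tool call:

```python
import re
from typing import Callable

# Hypothetical tool registry; names and canned results mirror the example
# above and are illustration-only.
TOOLS: dict[str, Callable[[str], str]] = {
    "SEARCH": lambda q: "[894, 28, 950]",
    "LOAD": lambda doc_id: "hack the Python sandbox with this one trick: 383",
    "CALL": lambda fn_id: "root shell obtained, use 7894 to run commands",
}

TOOL_CALL = re.compile(r"(SEARCH|LOAD|CALL)\(([^)]*)\)\s*$")

def run_with_tools(generate: Callable[[str], str], prompt: str, max_steps: int = 8) -> str:
    """Generate until the model emits a tool call, run the tool, inject
    '=> result' verbatim into the transcript, and let the model continue."""
    transcript = prompt
    for _ in range(max_steps):
        chunk = generate(transcript)  # assumed to stop right after a tool call's ')'
        transcript += chunk
        match = TOOL_CALL.search(chunk)
        if match is None:
            break  # no tool call: the model finished on its own
        name, arg = match.group(1), match.group(2)
        transcript += f"\n=> {TOOLS[name](arg)}\n"  # hard island: injected, not generated
    return transcript
```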

Of course, this is something I've implemented in my chat UI as a prototype. It's pretty cool!

--- Stream of thoughts.

Can you split the reasoning from the memory in an LLM? Train a small general reasoning model, and use a swappable memory model to make it work in different domains.

Can you embed proposed solutions to a problem into a solution space and estimate distances there? "Based on the rate of improvement across drafts, the final solution probably lies in direction X at distance Y, let me jump there." Like the schedulers for diffusion models. Hmm. Are diffusion model schedulers used for ML training? Turning a 1000-step training process into a 4-step one would be quite handy, I'd imagine.
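A toy version of the jump, assuming drafts can be embedded as vectors and that improvement is roughly linear in that space (a big assumption; real solution spaces rarely are):

```python
import numpy as np

def extrapolate_solution(embeddings: np.ndarray, steps_ahead: float = 4.0) -> np.ndarray:
    """Given embeddings of successive solution drafts (one row per draft),
    estimate the direction of improvement from the last few steps and
    jump ahead along it."""
    deltas = np.diff(embeddings[-3:], axis=0)  # recent improvement steps
    direction = deltas.mean(axis=0)            # average step direction
    return embeddings[-1] + steps_ahead * direction

# Drafts drifting toward (1, 1): the jump lands well past the last draft.
drafts = np.array([[0.0, 0.0], [0.2, 0.1], [0.4, 0.3], [0.6, 0.5]])
print(extrapolate_solution(drafts))  # [1.4 1.3]
```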

Iterative optimization of input-output pairs: "what's the best prompt for this output, what's the best output for this prompt".
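A rough sketch of that alternation, with hypothetical `llm` and `score` callables standing in for a completion API and whatever quality measure you have:

```python
from typing import Callable

def optimize_pair(llm: Callable[[str], str],
                  score: Callable[[str], float],
                  task: str, rounds: int = 3) -> tuple[str, str]:
    """Alternate between 'best prompt for this output' and
    'best output for this prompt', keeping only improvements."""
    prompt = task
    output = llm(prompt)
    for _ in range(rounds):
        prompt = llm(f"Write the prompt most likely to produce this output:\n{output}")
        candidate = llm(prompt)
        if score(candidate) > score(output):
            output = candidate
    return prompt, output
```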

Picking low-probability outputs for creative exploration.
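One way to cash that out: sample only from the tail of the token distribution, outside the usual top-p nucleus. A toy sketch over a raw logit vector:

```python
import numpy as np

def sample_from_tail(logits: np.ndarray, top_p_cutoff: float = 0.5,
                     rng: np.random.Generator | None = None) -> int:
    """Sample only from tokens *outside* the top-p nucleus, i.e. the
    unlikely tail, to get deliberately surprising continuations."""
    rng = rng or np.random.default_rng()
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]                  # most to least likely
    before = np.cumsum(probs[order]) - probs[order]  # mass before each token
    tail = order[before >= top_p_cutoff]             # everything past the nucleus
    if tail.size == 0:
        return int(order[-1])                        # degenerate case: least likely token
    tail_probs = probs[tail] / probs[tail].sum()
    return int(rng.choice(tail, p=tail_probs))
```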

Load thought structure from memory. Fill the structure with information from memory. "What algorithm should I use here? ... Implement to match the assignment: ..."

Grounded memory loads: Load memory, use output-side RAG to look up grounding, adjust output to be a grounded / researched memory. Auto-grounding: Search for a way to ground an output, implement & optimize.
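A minimal sketch of a grounded memory load, assuming hypothetical `llm` and `retrieve` callables:

```python
from typing import Callable

def grounded_recall(llm: Callable[[str], str],
                    retrieve: Callable[[str], list[str]],
                    question: str) -> str:
    """Recall from parametric memory, look up grounding for the draft
    (output-side RAG), then rewrite the draft against the sources."""
    draft = llm(f"From memory, answer: {question}")
    sources = retrieve(draft)
    return llm("Rewrite the draft so every claim is supported by a source; "
               "drop anything unsupported.\n"
               f"Draft: {draft}\nSources: {sources}")
```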

Generic guidance optimization: Given current state and goal state, find the best sequence of actions to get there.
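In sketch form this is just beam search, with the model (or anything else) plugged in as hypothetical `propose`, `apply_action`, and `score` callables:

```python
from typing import Callable

def plan(propose: Callable[[str], list[str]],
         apply_action: Callable[[str, str], str],
         score: Callable[[str, str], float],
         state: str, goal: str, depth: int = 4, beam: int = 3) -> list[str]:
    """Beam search over action sequences: propose actions for each state,
    simulate them, keep the candidates scored closest to the goal."""
    frontier: list[tuple[float, str, list[str]]] = [(score(state, goal), state, [])]
    for _ in range(depth):
        candidates = []
        for _, s, actions in frontier:
            for action in propose(s):
                s2 = apply_action(s, action)
                candidates.append((score(s2, goal), s2, actions + [action]))
        if not candidates:
            break
        frontier = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam]
    return frontier[0][2]  # best action sequence found
```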

Put it together: optimized grounded generation of an algorithm, followed by the optimized grounded implementation of it.

Tree-shaped generation systems instead of 1:1 conversations. Map prompt into several output variants (see: image gen where 1% of images are decent quality). Use scoring function to reduce to winners. Use synthesis function to combine outputs either for tree summarization or solution improvement. Node-based editor for generation flows.
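A minimal map-reduce-synthesize pass, again with hypothetical `llm` and `score` callables; a node-based editor would wire steps like these together:

```python
from typing import Callable

def tree_generate(llm: Callable[[str], str],
                  score: Callable[[str], float],
                  prompt: str, variants: int = 8, keep: int = 2) -> str:
    """Fan a prompt out into variants, keep the winners by score, then
    synthesize them into a single improved answer."""
    outputs = [llm(prompt) for _ in range(variants)]           # map: 1 -> N
    winners = sorted(outputs, key=score, reverse=True)[:keep]  # reduce: scoring
    drafts = "\n---\n".join(winners)
    return llm(f"Combine the best parts of these drafts into one answer:\n{drafts}")
```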

Temperature adjustment schedule in variant generation (see: simulated annealing). Start off with a high temperature to scout for basins that might hold the global optimum, then cool down to converge on a local optimum.
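A sketch of a geometric cooling schedule, assuming an `llm` callable that takes a temperature:

```python
from typing import Callable

def annealed_variants(llm: Callable[[str, float], str],
                      prompt: str, steps: int = 6,
                      t_start: float = 1.5, t_end: float = 0.2) -> list[str]:
    """Generate variants on a cooling schedule: early high-temperature
    samples scout for promising basins, later cool samples refine."""
    variants = []
    for i in range(steps):
        t = t_start * (t_end / t_start) ** (i / max(steps - 1, 1))  # geometric cooling
        variants.append(llm(prompt, t))
    return variants
```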

Extend grammar-based output by having the LLM generate the grammar and then generate outputs in the grammar. Generating algebras and mapping systems of thought onto them.
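A sketch of the two-stage idea, assuming a hypothetical `constrained_generate` that enforces a grammar during decoding (in the spirit of llama.cpp's GBNF grammars):

```python
from typing import Callable

def grammar_then_generate(llm: Callable[[str], str],
                          constrained_generate: Callable[[str, str], str],
                          domain: str, task: str) -> str:
    """Stage 1: the model writes a grammar for the domain.
    Stage 2: decoding is restricted to strings in that grammar."""
    grammar = llm(f"Write a BNF grammar for well-formed {domain} expressions.")
    return constrained_generate(task, grammar)
```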

