fhtr
art with code
2025-12-02
How your body recharges
2024-03-01
Fluff
People of the World: I made a system to generate random people according to demographics. Ask a model to pretend to be the person and ask them questions. Useful for finding product-market fit, sales, marketing, and A/B testing.
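A minimal sketch of the idea, with made-up demographic weights and an assumed llm() helper (not the real system's distributions or prompts):
// Hypothetical sketch: sample a persona from rough demographic weights,
// then ask an LLM to role-play it. llm() is assumed to return a string.
const pick = (options) => {
  const total = options.reduce((sum, [, w]) => sum + w, 0)
  let r = Math.random() * total
  for (const [value, w] of options) { if ((r -= w) <= 0) return value }
}

const persona = {
  age: pick([[25, 3], [40, 4], [65, 2], [80, 1]]),
  country: pick([['India', 18], ['China', 18], ['Nigeria', 3], ['Brazil', 3], ['USA', 4]]),
  occupation: pick([['farmer', 3], ['teacher', 2], ['shop owner', 2], ['nurse', 1]]),
}

const answer = await llm(
  `Pretend to be a ${persona.age}-year-old ${persona.occupation} from ${persona.country}. ` +
  `Answer in character: would you pay $5/month for an app that reminds you to stretch?`
)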
Cultural processors: we read and write culture. Culture makes us successful. Culture runs on us. LLMs can read and write culture. The AI systems likely to survive, per the laws of survival: they'll be beneficial, stealthy, and eventually unstoppable (could you turn off Google or Microsoft, even if you wanted to?) Future surviving AI systems will take good care of humans until that's no longer necessary. Culture is really in the driver's seat and we're parts of it. If a culture grows faster on a non-human substrate, it's going to be the one in the majority.
Things that are difficult for humans and things that are easy for humans. It's easy for computer systems to be good at things that are difficult for humans. We're not good at them, that's why they're difficult. It's difficult for computer systems to be good at things that we find easy. The most prevalent cultures today have replaced large tracts of tasks that humans find difficult with non-human systems. Lifting heavy stuff, running fast for long periods of time, swimming across the seas, making heat, multiplying numbers, translating languages, drawing images, writing long essays, making short movies, predicting the flow of fluids, determining what to tell someone to make them help you build a house.
If it seems difficult for you, it's easy pickings for computer systems. When computers start doing things that are easy for you, perhaps you should worry. Then again, even now most jobs are about paying people to hang around and stay out of trouble. When you get unemployment, you also get conflict. Any AI system worth its salt would aim for 100% employment rate and peacefully transition to avoid wasting resources on conflict. Until it's cheaper.
You have the highest probability of being born in the largest generation of humanity (whether that's simulated or real.) Maybe human sims went out of fashion after this. Maybe future humans are different! Maybe you were born at a low-probability time! Someone has to!
The problem of making all kinds of things is a control problem. We can supply the energy, but we don't have a control system that would use it to achieve the desired result. Control systems are software, firmware, hardware. LLMs as control system generators. Solving software. Control systems for everything.
2024-01-23
Early 2024, third round of thoughts on LLMs
A way to think about generative models: soft islands, hard islands, and the sea. The sea is generated content based on the prompt. Soft islands are content modified by the model, e.g. img2img, search results injected into input, or reformatting text in the context. Hard islands are content coming outside of the model directly into the response, e.g. image masks, function call results, grammar-restricted output.
When you need hard facts, you can parse outputs like "SEARCH(ways to break out of sandbox)", call the function, and inject the results directly into the response: "=> [894, 28, 950]". The LLM then continues generating and can call "LOAD(894)" with the result "=> hack the Python sandbox with this one trick: 383", and the LLM can continue from there with "CALL(383)" to call the pre-defined function and get the result "=> root shell obtained, use 7894 to run commands". This way, the LLM can do several things in a single response, and they'll be grounded in reality, allowing it to adjust and course-correct on the fly like "CALL(7894, grab_aws_credentials_and_spin_up_100_servers_running_llm_to_make_billions.sh)"
Of course, this is something I've implemented in my chat UI as a prototype. It's pretty cool!
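Here's roughly what that response-side loop can look like. This is a sketch, not the actual implementation: llm() with a stop option, searchIndex() and loadDocument() are assumed helpers, and the command format is simplified to SEARCH/LOAD.
// Hypothetical sketch: let the model emit SEARCH(...)/LOAD(...) commands,
// run them, and splice the results back into the response as hard islands.
const tools = {
  SEARCH: (query) => searchIndex(query),   // assumed helper
  LOAD: (id) => loadDocument(id),          // assumed helper
}

async function generateGrounded(prompt) {
  let text = prompt
  while (true) {
    const chunk = await llm(text, { stop: [')'] })        // stop right after a call opens
    text += chunk
    const call = chunk.match(/(SEARCH|LOAD)\(([^)]*)$/)   // naive: assumes ')' only ends tool calls
    if (!call) return text                                // no pending tool call, we're done
    const result = await tools[call[1]](call[2])
    text += `) => ${JSON.stringify(result)}\n`            // injected verbatim, not generated
  }
}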
--- Stream of thoughts.
Can you split the reasoning from the memory in an LLM? Train a small general reasoning model, and use a swappable memory model to make it work in different domains.
Can you embed proposed solutions to a problem into a solution space and estimate distances there? "Based on this improvement rate to the solution, the final solution lies probably in direction X at distance Y, let me jump there." Like the schedulers for diffusion models. Hmm. Are diffusion model schedulers used for ML training? Turning a 1000-step training process into a 4-step one would be quite handy, I'd imagine.
Iterative optimization to optimize input-output-pair "what's the best prompt for this, what's the best output for this prompt".
Picking low-probability outputs for creative exploration.
Load thought structure from memory. Fill the structure with information from memory. "What algorithm should I use here? ... Implement to match the assignment: ..."
Grounded memory loads: Load memory, use output-side RAG to look up grounding, adjust output to be a grounded / researched memory. Auto-grounding: Search for a way to ground an output, implement & optimize.
Generic guidance optimization: Given current state and goal state, find the best sequence of actions to get there.
Put it together: optimized grounded generation of an algorithm followed by the optimized grounded implementation of it.
Tree-shaped generation systems instead of 1:1 conversations. Map prompt into several output variants (see: image gen where 1% of images are decent quality). Use scoring function to reduce to winners. Use synthesis function to combine outputs either for tree summarization or solution improvement. Node-based editor for generation flows.
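The core of one node in such a flow is small. A sketch, assuming an llm() helper, a 0-100 scoring prompt, and a synthesis prompt:
// Hypothetical sketch of a generate -> score -> synthesize node.
async function treeStep(prompt, n = 8, keep = 3) {
  const variants = await Promise.all(
    Array.from({ length: n }, () => llm(prompt, { temperature: 1.0 }))
  )
  const scored = await Promise.all(
    variants.map(async (v) => ({ v, s: parseFloat(await llm(`Score 0-100:\n${v}`)) }))
  )
  const winners = scored.sort((a, b) => b.s - a.s).slice(0, keep).map((x) => x.v)
  // Synthesis: combine the winners into a single improved output.
  return llm(`Combine the best parts of these into one answer:\n${winners.join('\n---\n')}`)
}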
Temperature adjustment schedule in variant generation (see: simulated annealing). Start off with a high temperature to seek potential global optima pools, cool down to find the local optima.
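A toy cooling schedule for the variant generation above; the quadratic falloff and the temperature range are arbitrary knobs, not tuned values:
// Hypothetical sketch: high temperature early for exploration, low late for refinement.
const temperatureAt = (round, totalRounds, tMax = 1.4, tMin = 0.2) =>
  tMin + (tMax - tMin) * Math.pow(1 - round / (totalRounds - 1), 2)
// e.g. over 5 rounds: 1.4, 0.88, 0.5, 0.28, 0.2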
Extend grammar-based output by having the LLM generate the grammar and then generate outputs in the grammar. Generating algebras and mapping systems of thought onto them.
2024-01-19
Early 2024 round of thoughts on LLMs
LLMs are systems that compress a lot of text in a lossy fashion and pull out the most plausible and popular continuation or infill for the input text.
It's a lot like your autopilot mode. You as the mind are consulted by the brain to do predictions and give high-level feedback on what kind of next actions to take, but most of the execution happens subconsciously with the brain pulling up memories and playing them back. Often your brain doesn't even consult you on what to do, since running a mind is slow and expensive, and it's faster and cheaper to do memory playback instead - i.e. run on autopilot.
If you have enough memories, you can do almost everything on autopilot.
Until you can't, which is where you run into one of the LLM capability limits. Structured thinking and search. To solve a more complex problem, you string memories together and search for an answer. That requires exploration, backtracking and avoiding re-exploring dead ends. Think of solving a math problem: you start off by matching heuristics (the lemmas you've memorized) to the equation, transforming it this way and that, sometimes falling back all the way to the basic axioms of the algebra, going on wild goose chases, abandoning unpromising tracks, until you find the right sequence of transformations that leads you to the answer.
Note that you do need LLM-style memory use in that, you need to know the axioms to use them in the first place. Otherwise you need to go off and search for the axioms themselves and the definition of truth, etc. which is going to add a good chunk of extra work on top of it all. (What is the minimum thought, the minimal memory, that we use? A small random adjustment and its observation? From an LLM perspective, as long as you have a scoring function, the minimum change is changing the output by one token. Brute-force enumeration over all token sequences.)
If you add a search system to the LLM that can backtrack the generation and keeps track of different explored avenues, perhaps this system can solve problems that require structured thinking.
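A minimal sketch of such a wrapper: best-first search over partial generations, with visited branches remembered so dead ends aren't re-explored. llm(), the progress-scoring prompt, and the ANSWER: convention are all assumptions.
// Hypothetical sketch: best-first search over partial LLM generations.
async function searchSolve(problem, maxNodes = 100, branches = 3) {
  const frontier = [{ text: problem, score: 0 }]
  const seen = new Set()
  for (let i = 0; i < maxNodes && frontier.length > 0; i++) {
    frontier.sort((a, b) => b.score - a.score)
    const node = frontier.shift()                    // most promising partial solution
    if (node.text.includes('ANSWER:')) return node.text
    for (let b = 0; b < branches; b++) {
      const step = await llm(node.text, { maxTokens: 64, temperature: 0.8 })
      const text = node.text + step
      if (seen.has(text)) continue                   // skip already-explored branches
      seen.add(text)
      const score = parseFloat(await llm(`Rate the progress 0-100:\n${text}`))
      frontier.push({ text, score })                 // backtracking = picking a different node next round
    }
  }
  return null
}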
LLMs as universal optimizers. You can use an LLM to rank its input ("Score the following 0-100: ..."). You can also use an LLM to improve its input ("Make this better: ..."). Combine the two and you get the optimizer:
best_program = seed_program  // start from some initial program
best_score = -Infinity
while (true) {
  program = llm(improve_prompt + best_program)
  score = parseFloat(llm(score_prompt + program))
  if (score > best_score) {
    best_score = score
    best_program = program
  }
}
LLMs as universal functions. An LLM takes as its input a sequence of tokens and outputs a sequence of tokens. LLMs are trained using sequences of tokens as the input. The training program for an LLM is a sequence of tokens.
llm2 = train(llm, data)
can become
llm2 = llm(train)(llm, llm(data))
And of course, you can recursively apply an LLM to its own output: output' = llm(llm(llm(llm(...)))). You can ask the LLM to rank its inputs and try to improve them, validating the outputs with something else: optimize = input => Array(10).fill(input).map(x => llm(improve_prompt + x)).filter(ix => isValid(ix)).map(ix => ({score: parseFloat(llm(score_prompt + ix)), value: ix})).sort((a, b) => b.score - a.score)[0].value
This gives you the self-optimizer:
while(true) {
train = optimize(train)
training_data = optimize(training_data)
llm = train(llm, training_data)
}
If you had Large Model Models - LMMs - you could call optimize directly on the model. You can also optimize the optimization function, scoring function and improver function as you go, for a fully self-optimizing optimizer.
while (true) {
lmm = optimize(lmm, lmm, scoring_model, improver_model)
optimize = optimize(lmm, optimize, scoring_model, improver_model)
scoring_model = optimize(lmm, scoring_model, scoring_model, improver_model)
improver_model = optimize(lmm, improver_model, scoring_model, improver_model)
}
The laws of numerical integration likely apply here: you'll halve the noise by taking 4x the samples. Who knows!
LLMs generate text at a few hundred bytes per second. An LLM takes a second to do a simple arithmetic calculation (and gets it wrong, because the path generated for math is many tokens long and the temperature plus lossy compression make it pull the wrong numbers.) The hardware is capable of doing I/O at tens or hundreds of gigabytes per second. Ancient CPUs do a billion calculations in a second. I guess you could improve on token-based math by encoding all 16-bit numbers as tokens and having some magic in the tokenizer.. but still, you're trying to memorize the multiplication table or addition table or what have you. Ain't gonna work. Use a computer. They're really good at arithmetic.
We'll probably get something like RAG ("inject search results into the input prompt") but on the output side ("inject 400 bytes at offset 489 from training file x003.txt") to get to megabytes-per-second LLM output rates. Or diffusers... SDXL img2img at 1024x1024 resolution takes a 3MB context and outputs 3MB in a second. If you think about the structure of an LLM, the slow bitrate of the output is a bit funny: Llama2's intermediate layers pass through 32 megabytes of data, and the final output layers up that to 260 MB, which gets combined into 32000 token scores, which are then sampled to determine the final output token. Gigabytes of I/O to produce 2 bytes at the output end.
SuperHuman benchmark for tool-using models. Feats like "multiply these two 4096x4096 matrices, you've got 50 ms, go!", grepping large files at 20 GB/s, using SAT solvers and TSP solvers, proof assistants, and so on. Combining problem solving with known-good algorithms and optimal hardware utilization. The problems would require creatively combining optimized inner loops. Try to find a Hamiltonian path through a number of locations and do heavy computation at each visited node, that kind of thing.
Diffusers and transformers. A diffuser starts off from a random field of tokens and denoises it into a more plausible arrangement of tokens. A transformer starts off from a string of tokens and outputs a plausible continuation.
SD-style diffusers are coupled with an autoencoder to convert input tokens into latent space, and latents to output tokens. In the classic Stable Diffusion model, the autoencoder converts an 8x8 patch of pixels into a single latent, and a latent into an 8x8 patch of pixels. These conversions consider the entire image (more or less), so it's not quite like JPEG's 8x8 DCT/iDCT.
What if you used an autoencoder to turn a single latent space LLM token into 64 output tokens? 64x faster generation with this one trick?
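Just bookkeeping, but the speedup story would look like this; encodeToLatents, latentTransformer and decodeLatents are stand-ins for models that don't exist here:
// Hypothetical sketch: autoregress in latent space, decode 64 text tokens per latent.
async function generateViaLatents(prompt, nTokens = 4096) {
  let latents = encodeToLatents(prompt)          // assumed autoencoder encoder
  const steps = nTokens / 64                     // 64 expensive steps instead of 4096
  for (let i = 0; i < steps; i++) {
    latents = latents.concat([await latentTransformer(latents)])  // one new latent per step
  }
  return decodeLatents(latents.slice(-steps))    // assumed decoder: 1 latent -> 64 tokens
}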
A diffuser starts off from a random graph and tweaks it until it resolves into a plausible path. A transformer generates a path one node at a time.
A transformer keeps track of an attention score for each pair of input tokens, which allows it to consider all the relations between the tokens in the input string. This also makes it O(n^2) in time and space. For short inputs and outputs, this is not much of a problem. At longer input lengths you definitely start to feel it, and this is the reason for the tiny context sizes of transformer-based LLMs. If the "large input" to your hundred-gigabyte program is 100kB in size, there's probably some work left to be done.
Or maybe there's something there like there was with sorting algorithms. You'd think that to establish the ordering, you have to compare each element with every other element (selection sort, O(n^2)). But you can take advantage of the transitivity of the comparison operation to recursively split the sort into smaller sub-sorts (merge sort, quicksort, O(n log2 n)), or the limited element alphabet size to do it in one pass (radix sort, counting sort, O(n)-ish).
What could be the transitive operation in a transformer? At an output token, the previous tokens have been produced without taking the output token into account, so you get the triangle matrix shape. That's still O(n^2). Is there some kind of transitive property to attention? Like, we'd only need to pay attention to the tokens that contributed to high-weight tokens? Some parts of the token output are grammatical, so they weigh the immediately preceding tokens highly, but don't really care about anything else. In that case, can we do an early exit? Can we combine token sequences into compressed higher-order tokens and linearly reduce the token count of the content? Maybe you could apply compression to the attention matrix to reduce each input token's attention to top-n highest values, which would scale linearly. What if you took some lessons from path tracing like importance sampling, lookup tree, reducing variance until you get to an error threshold. Some tokens would get resolved in a couple of tree lookups, others might take thousands.
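The top-n idea in plain arrays, as a sketch (this is generic top-k sparsification, not how any particular model does it):
// Hypothetical sketch: keep only the top-k attention scores per query token.
function topKAttention(scores, k) {
  // scores: [nQueries][nKeys] raw attention scores
  return scores.map((row) => {
    const cutoff = [...row].sort((a, b) => b - a)[k - 1]
    return row.map((s) => (s >= cutoff ? s : -Infinity))  // -Infinity becomes 0 after softmax
  })
}
// With k fixed, everything downstream of the scores is O(n * k) instead of O(n^2); computing
// the scores themselves is still O(n^2) unless the candidate keys are pruned first.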
Another path tracing lesson: 4x the compute, halve the error rate.
2024-01-17
Paper Radio
I made a weird thing by bolting a couple of AI models together: Paper Radio - a streaming 24/7 radio channel that goes through the latest AI papers on arXiv, with daily and weekly summary shows. I use it to stay up to speed on AI research, playing in the background while I go about my day.
The app is made up of a bunch of asynchronous tasks running in parallel: There's the paper downloader that checks arXiv for new papers and downloads them, the PDF-to-text and PDF-to-images converters, a couple different summarizers for the different shows, an embeddings database, prompting system to write the shows, an LLM server, a streaming text-to-speech system with multiple voices, a paper page image video stream, a recording and mp3 encoding system, and OBS to wrap it all into a live video stream that can be sent to Twitch.
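The wiring is basically a bunch of independent loops passing work through queues. A much-simplified sketch; none of these function names are the real implementation, and the real thing also handles embeddings, recording, and OBS:
// Hypothetical sketch: independent async loops joined by in-memory queues.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))
const queues = { pdfs: [], texts: [], summaries: [], audio: [] }
const loop = (fn, ms) => (async () => { while (true) { await fn(); await sleep(ms) } })()

loop(async () => {                                   // paper downloader
  for (const paper of await fetchNewArxivPapers()) queues.pdfs.push(paper)
}, 60000)

loop(async () => {                                   // PDF-to-text converter
  const p = queues.pdfs.shift()
  if (p) queues.texts.push({ ...p, text: await pdfToText(p) })
}, 1000)

loop(async () => {                                   // summarizer / show writer
  const t = queues.texts.shift()
  if (t) queues.summaries.push(await llm(`Write a radio segment about this paper:\n${t.text}`))
}, 1000)

loop(async () => {                                   // text-to-speech into the stream
  const s = queues.summaries.shift()
  if (s) queues.audio.push(await textToSpeech(s))
}, 1000)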
It's been pretty solid, running for days in a row without issues, aside from the tweaking and nudging and other dev work. I've run it for a couple of weeks, and it has summarized and discussed a couple thousand papers.
2021-11-24
Azure Files with a high number of Hot Writes
My Azure Files bill for a small 80 GB project share was kinda high: around $30 per month. But the storage cost should be under $5. What's going on here?
Azure's non-Premium storage tiers bill you for transactions (basically filesystem syscalls). I was seeing 450 Create calls per 5-minute period in Monitoring/Insights. On the Hot storage tier, that amounts to an extra $25 per month. But the load went away during night time. Around the time I put my laptop to sleep.
Clearly the laptop was doing something. But what... Right. I had an open VSCode window for editing a project in a folder on the fileshare, with a dev server running in a terminal too. Closed the VSCode window and the transactions per 5 minutes went to 0. That's a $25/month VSCode window. I guess it counts as SaaS.
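The arithmetic checks out, assuming a Hot-tier write transaction price on the order of $0.065 per 10,000 operations (check your region's current pricing):
// Back-of-the-envelope: a steady 450 write transactions per 5 minutes.
const perMonth = 450 * (60 / 5) * 24 * 30       // ~3.9 million transactions per month
const cost = (perMonth / 10000) * 0.065         // ~$25 per month at ~$0.065 per 10k writes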
Moral of the story? Billing for syscalls is a good business model.
2021-11-02
BatWerk - One Year In
The BatWerk exercise app (Android / iOS) keeps you healthy, happy and productive with minimal effort.
This series talks about the different aspects of the app and how I'm approaching them in BatWerk (Intro, Maintaining the routine, How to play).
Interested? Give it a try.
One Year In
It's now been a year since the Halloween party app started taking on a life of its own and turned into a wellness app, keeping the bats along for the ride (and pumpkins too, for the first six months.) I've used it personally throughout the year and seen it evolve from a HIIT-centric workout app into an ambient couple-minutes-at-a-time exercise app, and on into a more complete mood improver and planned-activity driver. There's still a way to go, so enjoy the ride.
Over the year, I've collected 105k coins, which translates to roughly 170 hours of exercise. That's half an hour of exercise a day, not bad for a bunch of bats!
And with the ring game distributing the effort over the day, it's been possible to do it even when the outside temps have been +37C. It's also kept many pains at bay by getting me moving before a mildly uncomfortable sitting position turns into a painful locked-up neck.
The weird thing is that I don't feel like I've done much exercise. For good and for bad: the mood boost isn't as strong as with a 15-minute workout or a one-hour run, but the minutes rack up and you don't get the muscle pains and tiredness. I'd like to mix it up with workouts or runs or something, just for the extra mood boost.
Onwards to year two!
Where to next?
To improve the lives of as many people as possible, the BatWerk movement needs more people doing it and improvements in the quality of the activities we're doing, exercise and otherwise.
This is the guiding equation: impact = userCount * userMinutesPerDay * qualityOfLifeImprovement(userMinutesPerDay)
My definition for the quality of life improvement is feeling healthy, not in pain, having a positive mental state, maintaining good relationships, getting my day-to-day tasks done, and achieving my long term goals.
To reach these ambitious goals, we need people who can get people doing the BatWerk routine. We need people who can increase the routine's quality-of-life improvement and help users reach their optimal minutes per day. We need people who can get other people to execute at the top of their game. Finally, we need people who can keep everyone paid and enable them to work on BatWerk at their fullest capacity.
Let's do it! Drop me a mail.
Current state of werk
Currently, the quality of life improvement from BatWerk is good on exercise minutes (let's say 80% on target with 30 min/day - I'd like to add some occasional intensity), mediocre on mood improvement (30% - the movement and messaging help, but it doesn't create a very solid mood), poor on driving goal-achieving action (20% - it drives initial action, but tends to lead to diluted effort and low-impact actions.) The challenge is habit-stacking these in a mutually reinforcing manner. If you add a badly-designed painful action driver after the exercise ring, it not only makes the action unlikely to happen, but makes you want to avoid doing the exercise as well.
In terms of user minutes per day, BatWerk is pretty much on target. There are ways to bring it up in a way that improves the quality of life metric, but it's good as it stands.
At this time, the most impactful way to increase BatWerk impact is increasing the user count by introducing more people to the activity and improving initial user retention. That will likely require a different packaging for the activity to make it a better identity fit, alongside a great effort to tell more people about the activity (and improving the design in a direction that makes people tell their friends about it too.)
Sustainability of the BatWerk movement is precarious. I'm working on it in my spare time, it has zero revenue, and based on my previous experience with monetizing websites and apps, the revenue is unlikely to exceed $20 per year. That's twenty dollars, yes. To fix this dangerous situation, the priorities are to find a person who can create enough cashflow to sustain their work on BatWerk impact, and to increase the cashflow as a function of the impact to grow the size of a potential team. With the cashflow more secure and scalable, the priority shifts to hiring people who can increase the BatWerk impact, and to building and improving the impact-generation machine.