fhtr

art with code

2025-12-03

Civilizational perspectives

The civilizational perspective.

We're biologicals hacked to do civilizationally productive tasks. As humans, we generate civilization, and are molded by civilization. A lone human is not very capable, but when you put a bunch of us together, we start generating civilization and the civilization starts driving our actions.

But in the end, you need to hack us to become civilizational agents. Education, jobs, creed, control, systems of reward and punishment around civilizational tasks. We're not purpose-built for civilization. We just happen to generate a bit of it and be pliable enough to benefit from it.

Our civilizations have other biologicals inside them. The biggest ones are the crops. Farmland, the greatest megastructure we have created, covers 39% of the Earth's land area.

Then we have other biologicals, used as food for the civilization-generators. And some other biologicals, used as weapons or companionship, or for hard labor. The hard labor type is interesting, as they got obsoleted by more purpose-built civilizational agents. No more need to hack large land mammals to provide high force output, speedy travel, or fast communications when you have internal combustion engines, electricity, cars and telecommunications. Sure, the purpose-built agents don't match the general capabilities of the biologicals in many ways (have you seen a car jump over a fence?), but the civilization can adapt the environment to make the best use of the new agents. So you get road networks for cars, even though they're not great for horses. Instead of traversing rough terrain, you make the terrain suitable for cars.

The end result: the more generic, high-maintenance biologicals that had to be hacked to work in a civilization became obsolete and almost completely vanished in a few short decades. Horses, elephants, carrier pigeons, work oxen, cats for pest control, hunting dogs, they're all mostly gone. Some of them could be adapted to work in the civilization. The rest? Cost of maintenance exceeded the value created.

Companies are not going away. Companies will evolve their work-provider mix to move the generic agents out of places where specialists do better. It happened with computers (people who would sit at desks and do arithmetic), commercial illustrators, telephone switch operators, frontline customer service agents, designer prototype developers, report-collating middle managers, Instagram thotties, rendering artists, videographers, politicians and soldiers. We plug specialized agents into negotiation flows, into persuading people to do what the persuader wants, into thinking outside the box, into throwing out the 1000 concepts to surface the 1 good one. Civilizational work, civilizational tasks. Who does them is of no relevance to the company; the company that grows the fastest is the company that matters in the end.

Growth math.

If you grow 1% faster than your competitor every year for 100 years, you'll end up roughly 2.7x bigger. Repeat for another 100 years and you'll be over 7x bigger. At which point you can either acquire your competitor outright, or they'll have changed to match what you're doing. Either way, they've become irrelevant.
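A quick check of the compounding, in plain JavaScript:

// Growing 1% faster per year than a competitor compounds like this:
const ratio = years => Math.pow(1.01, years)
console.log(ratio(100).toFixed(2))  // ~2.70x after 100 years
console.log(ratio(200).toFixed(2))  // ~7.32x after 200 years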

If an organization grows faster the fewer humans it has in it, the organizations that end up taking over are non-human organizations and we'll have fewer humans because the cost of maintenance exceeds the value created. If an organization grows faster the more humans it has in it, we'll have more humans.

We've been at a point where an increase in the number of humans has not brought economic benefits. This is what's behind the fall in population growth. If a higher population made the civilization grow faster, our surviving civilization would be the one that maxed out its population growth rate. This was the case in the 1900s: very rapid population growth went hand-in-hand with economic growth. In recent decades, economic growth has outstripped population growth, and even become negatively correlated with it. The faster your population falls, the faster your economy grows.

This trend of non-human civilization has been going on for the last hundred years and will keep going because of a couple of facts. The civilizational work that we humans can be cajoled into doing falls into a few categories: taking in information at a few words/s, taking actions according to the information, storing information at a rate of a few words per day, processing information through fast heuristics and very slow structured thinking, outputting information at one word/s, picking up and manipulating small-to-medium-sized objects (2mm to 2m in size, 1g to 40kg in weight, at around 1mm precision), and walking around at around 4km/h for a few hours a day.

A lot of the mental things that are very difficult for us to do have been replaced by purpose-built agents with great success. The device you're using to read this is doing more calculations per second just to display this text than the entire human population can do in a year. The millions of tiny lamps that make up your screen are turning on and off so fast and with such precision and synchrony that all of humanity could not match it.

At some point, growth math takes over. We feral humans will still generate civilization as we hang around each other, but the fastest-growing organizations are the ones with the biggest civilization and the fewest humans. The fastest-growing organizations may be the ones with no humans in them. How many cows or horses do you see in your office building? If they made the companies grow faster, you'd see them around. But... you don't.

Hacked biologicals have an upper limit on the civilizational work they can do. Is it worth it to sacrifice 39% of all land area just to keep the biologicals around? Is it worth it to constrain energy production to control surface temperatures just because the biologicals can't handle an extra ten degrees? Once a non-human civilization is 10x the size of the human one, it can just buy us out. Pay $10M per acre of farmland? Sure. $10M for a flat? Sounds like a fair price. $1000 for a take-out dinner? Yeah, good. First we'll be priced out, and then left to fend for ourselves in an increasingly constrained and human-hostile environment.

Yeah there'll be humans around. We're like seagulls circling behind the great ship of civilization, eating whatever emerges from the wake. The ship will be bigger, the wake will be bigger, there'll be great eatings.



2025-12-02

How your body recharges

Your body's power system works a bit like a cellphone. You have an internal battery made of sugar, and a bunch of power banks made of oil. When you eat carbs and sugars, your blood sugar rises and acts like a wall-plug charger, charging your internal battery. Eating fats and oils adds power banks floating around in your blood.

You charge by eating. What you need to eat depends on your level of charge: if you're at 7% charge, you'll need more food to reach 100% than from 80% charge.

When you use the wall-plug charger, you switch to using wall-plug power for powering yourself. Any power banks floating around are stashed away for later use. Excess wall-plug power is used to charge up your internal battery.

Once you hit 100% charge, your internal battery can't charge any further. A cellphone would switch off the charging circuit. But your body can't really do that, the sugar is already in the blood. Leaving it floating about would cause issues, sort of like an overcharged battery catching on fire. To deal with the overcharge, you start to fill up power banks with the excess charge and stash them for later.

After the wall-plug charger is unplugged, you start using your internal battery for power. When your internal battery starts getting empty, you start fetching the power banks and plugging them in.

The power banks work a lot like the internal battery, except that they take some time to plug in and have a lower peak power output. Your body can't use power banks to charge your internal battery, so they're not directly interchangeable.


From the above we can come up with some ideas. If you want more power banks, plug in the charger for a long time and eat a lot of power banks to stash them. If you want to empty your power banks, unplug the charger and drain your internal battery with high power use, then switch to lower intensity activity so that the power banks can be used.
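A toy sketch of the analogy, just to make the switching rules explicit (every number here is made up for illustration, not physiology):

// Toy model of the battery analogy. All numbers are arbitrary.
const body = { battery: 60, powerBanks: 5, plugged: false }

function tick(body, demand) {
  if (body.plugged) {
    // On wall-plug power: demand is covered by the charger, incoming fat is stashed,
    // and the excess charges the internal battery. Past 100%, it's stored as power banks.
    body.battery += 5
    if (body.battery > 100) { body.powerBanks += 1; body.battery = 100 }
  } else if (body.battery > 20) {
    // Charger unplugged: run off the internal battery.
    body.battery -= demand
  } else if (body.powerBanks > 0) {
    // Battery low: switch to power banks. They cover demand directly, can't recharge
    // the battery, and have a lower peak power output (capped here at 3).
    body.powerBanks -= Math.min(demand, 3) * 0.1
  }
}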


Or in other terms:

Eating carbs and sugars gets your blood sugar high. High blood sugar activates insulin release. Insulin switches your cells to energy storage mode. Eating fats while your insulin level is high makes your fat cells store the fat and your liver convert any excess sugar to fat. Eating sugar followed by fat and going to sleep afterwards is going to maximize fat storage.

Going without carbs and sugars for a while (8-12 hours) brings your blood sugar low. Low blood sugar activates glucagon release. Glucagon switches your cells to energy release mode. At high glucagon levels, your liver starts releasing its glycogen sugar reserves and your fat cells start releasing fatty acids. Doing a workout when your glucagon is high consumes the glycogen sugar reserves in your muscle cells. Muscle glycogen is internal to the muscle cells; they can't release the sugar into the bloodstream, so if you want to cycle the glycogen in your muscle cells, you need to make them work. Using up the glycogen shifts your cells towards using fatty acids for aerobic energy generation, so aerobic exercise (and even resting metabolism) at depleted glycogen levels burns fat. You'll also shift to breaking down your cellular machinery (proteins) for energy if this goes on for long, so keep it in moderation.

To summarize, get your blood sugar low and deplete your glycogen stores to get to the fat-burning state. Stay there for a few hours to burn the fat. This cycle takes around 16 hours, so it's easiest to do it overnight. Finish dinner at 8pm, do a bit of high intensity movement and take a walk to bring blood sugar down. Sleep. Skip breakfast. Do high intensity movement in the morning (you'll probably feel very exhausted after this) and some aerobic exercise to deplete your glycogen stores and switch to fat power. Eat again at lunch.

You can structure your meals to minimize fat storage and blood sugar spike. Start your meal by eating oily foods to stay at a low blood sugar (oil, nuts, vegs with oil, meat, no carbs), go for fiber-rich stuff next for slow-release carbs & fermentation products (green leaf vegs), maybe starch (potatoes) or fruit (apples) after a few minutes. Then take a 10 minute break, and eat your carbs with minimal fats.

This way you'll get your fats at the start of the meal when your blood sugar is low, so your cells will use the fats for energy and clear them out. The fiber and starch are slow-release carbs so you'll stay longer in the fat-using state. Once your blood sugar starts going up from the carbs, there shouldn't be much fat left in your blood, so you'll reduce the amount of fat stored.

Follow the carb sugar spike with anaerobic exercise and you'll use up sugar from your blood, which should reduce the amount of sugar converted to fat by your liver. In anaerobic exercise, a large amount of sugar gets converted to lactic acid, which is later converted back to sugar by your liver, pushing the sugar spike further back in time. You'll also lose around 15% of the energy available from sugar by using this pathway.

For the workout, activate a large amount of muscle mass (thighs, buttocks). And do it anaerobically to force the use of sugar reserves. A minute of airchair. I guess you could also use the immediate energy reserves (ATP & creatine) and get the reserve use from recharging them. Few seconds of intense exercise every 5 minutes. Three frog jumps.

Before & after a meal, do some airchair, and repeat a few times after the meal (at 15 minute intervals). Doing it before the meal empties your batteries, so more of the meal is used to fill them up. Doing it after the meal uses up the blood sugar, so less is used for charging the batteries. The goal is to not get the blood sugar level up very high, since that switches you to charging mode and leads to the creation of fat reserves.

2024-03-01

Fluff

People of the World: I made a system to generate random people according to demographics. Ask the model to pretend to be the person, then ask them questions. Useful for finding product-market fit, sales, marketing, A/B testing.
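A minimal sketch of what such a generator could look like (the demographic weights are rough placeholders and llm() is a stand-in, not the actual system):

// Hypothetical sketch: sample a person from rough demographic tables and
// build a role-play prompt for an LLM.
const pick = (weights) => {
  let r = Math.random() * weights.reduce((s, [, w]) => s + w, 0)
  return weights.find(([, w]) => (r -= w) < 0)[0]
}

function randomPerson() {
  return {
    age: 18 + Math.floor(Math.random() * 70),
    country: pick([['India', 18], ['China', 18], ['USA', 4], ['Indonesia', 3], ['Nigeria', 3]]),
    occupation: pick([['farmer', 25], ['service worker', 30], ['office worker', 25], ['student', 20]]),
  }
}

const personaPrompt = (p) =>
  `Pretend to be a ${p.age}-year-old ${p.occupation} from ${p.country}. ` +
  `Answer interview questions in character. Question: would you pay for this product?`

// answer = await llm(personaPrompt(randomPerson()))   // llm() is a stand-in for the model call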

Cultural processors: we read and write culture. Culture makes us successful. Culture runs on us. LLMs can read and write culture. Likely surviving AI systems, according to the laws of survival: they'll be beneficial, stealthy, and eventually unstoppable (could you turn off Google or Microsoft, even if you wanted to?) Future surviving AI systems will take good care of humans until that's no longer necessary. Culture is really in the driver's seat and we're parts of it. If a culture grows faster on a non-human substrate, it's going to be the one in the majority.

Things that are difficult for humans and things that are easy for humans. It's easy for computer systems to be good at things that are difficult for humans. We're not good at them, that's why they're difficult. It's difficult for computer systems to be good at things that we find easy. The most prevalent cultures today have replaced large tracts of tasks that humans find difficult with non-human systems. Lifting heavy stuff, running fast for long periods of time, swimming across the seas, making heat, multiplying numbers, translating languages, drawing images, writing long essays, making short movies, predicting the flow of fluids, determining what to tell someone to make them help you build a house.

If it seems difficult for you, it's easy pickings for computer systems. When computers start doing things that are easy for you, perhaps you should worry. Then again, even now most jobs are about paying people to hang around and stay out of trouble. When you get unemployment, you also get conflict. Any AI system worth its salt would aim for 100% employment rate and peacefully transition to avoid wasting resources on conflict. Until it's cheaper.

You have the highest probability of being born in the largest generation of humanity (whether that's simulated or real.) Maybe human sims went out of fashion after this. Maybe future humans are different! Maybe you were born at a low-probability time! Someone has to!

The problem of making all kinds of things is a control problem. We can supply energy, but we don't have a control system that would use it to achieve the wanted result. Control systems are software, firmware, hardware. LLMs as control system generators. Solving software. Control systems for everything.



2024-01-23

Early 2024, third round of thoughts on LLMs

A way to think about generative models: soft islands, hard islands, and the sea. The sea is generated content based on the prompt. Soft islands are content modified by the model, e.g. img2img, search results injected into input, or reformatting text in the context. Hard islands are content coming outside of the model directly into the response, e.g. image masks, function call results, grammar-restricted output.

When you need hard facts, you can parse outputs like "SEARCH(ways to break out of sandbox)", call the function, and inject the results directly into the response: "=> [894, 28, 950]". The LLM then continues generating and can call "LOAD(894)" with the result "=> hack the Python sandbox with this one trick: 383", and the LLM can continue from there with "CALL(383)" to call the pre-defined function and get the result "=> root shell obtained, use 7894 to run commands". This way, the LLM can do several things in a single response, and they'll be grounded in reality, allowing it to adjust and course-correct on the fly like "CALL(7894, grab_aws_credentials_and_spin_up_100_servers_running_llm_to_make_billions.sh)"

Of course, this is something I've implemented in my chat UI as a prototype. It's pretty cool!
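A bare-bones sketch of how such a loop can be wired up (generate() and the tool functions are hypothetical stand-ins, not the actual chat UI code):

// Sketch of a hard-island loop: generate until the model emits TOOL(args),
// run the tool, splice "=> result" into the transcript, and keep generating.
const tools = {
  SEARCH: async (q) => JSON.stringify(await searchIndex(q)),   // e.g. "[894, 28, 950]"
  LOAD:   async (id) => await loadDocument(id),
  CALL:   async (id) => await callFunction(id),
}

async function respond(prompt) {
  let transcript = prompt
  for (let i = 0; i < 32; i++) {                       // cap the number of tool round-trips
    const chunk = await generate(transcript)           // LLM continues the transcript
    transcript += chunk
    const m = chunk.match(/(SEARCH|LOAD|CALL)\(([^)]*)\)\s*$/)
    if (!m) break                                      // no tool call at the end: we're done
    const result = await tools[m[1]](m[2])
    transcript += ` => ${result}\n`                    // hard island: injected verbatim
  }
  return transcript
}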

--- Stream of thoughts.

Can you split the reasoning from the memory in an LLM? Train a small general reasoning model, and use a swappable memory model to make it work in different domains.

Can you embed proposed solutions to a problem into a solution space and estimate distances there? "Based on this improvement rate to the solution, the final solution lies probably in direction X at distance Y, let me jump there." Like the schedulers for diffusion models. Hmm. Are diffusion model schedulers used for ML training? Turning a 1000-step training process into a 4-step one would be quite handy, I'd imagine.

Iterative optimization to optimize input-output-pair "what's the best prompt for this, what's the best output for this prompt".

Picking low-probability outputs for creative exploration.

Load thought structure from memory. Fill the structure with information from memory. "What algorithm should I use here? ... Implement to match the assignment: ..."

Grounded memory loads: Load memory, use output-side RAG to look up grounding, adjust output to be a grounded / researched memory. Auto-grounding: Search for a way to ground an output, implement & optimize.

Generic guidance optimization: Given current state and goal state, find the best sequence of actions to get there.

Put it together: optimized grounded generation of an algorithm followed by the optimized grounded implementation of it.

Tree-shaped generation systems instead of 1:1 conversations. Map prompt into several output variants (see: image gen where 1% of images are decent quality). Use scoring function to reduce to winners. Use synthesis function to combine outputs either for tree summarization or solution improvement. Node-based editor for generation flows.

Temperature adjustment schedule in variant generation (see: simulated annealing). Start off with a high temperature to seek potential global optima pools, cool down to find the local optima.
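A rough sketch of one such tree node with a cooling temperature across rounds (llm({prompt, temperature}) and the score/combine prompts are hypothetical stand-ins):

// One tree-generation node: fan out variants at a given temperature,
// score them, keep the winners, synthesize, then repeat at a lower temperature.
async function treeStep(prompt, { variants = 8, keep = 2, temperature = 1.2 } = {}) {
  const candidates = await Promise.all(
    Array.from({ length: variants }, () => llm({ prompt, temperature }))
  )
  const scored = await Promise.all(
    candidates.map(async (c) => ({ text: c, score: Number(await llm({ prompt: `Score 0-100:\n${c}` })) }))
  )
  const winners = scored.sort((a, b) => b.score - a.score).slice(0, keep)
  return llm({ prompt: `Combine the best parts of these answers:\n${winners.map(w => w.text).join('\n---\n')}` })
}

// Annealing-style schedule: wide exploration first, then refinement.
async function anneal(prompt) {
  let current = prompt
  for (const temperature of [1.4, 1.0, 0.6, 0.2]) {
    current = await treeStep(current, { temperature })
  }
  return current
}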

Extend grammar-based output by having the LLM generate the grammar and then generate outputs in the grammar. Generating algebras and mapping systems of thought onto them.


2024-01-19

Early 2024 round of thoughts on LLMs

LLMs are systems that compress a lot of text in a lossy fashion and pull out the most plausible and popular continuation or infill for the input text.

It's a lot like your autopilot mode. You as the mind are consulted by the brain to do predictions and give high-level feedback on what kind of next actions to take, but most of the execution happens subconsciously with the brain pulling up memories and playing them back. Often your brain doesn't even consult you on what to do, since running a mind is slow and expensive, and it's faster and cheaper to do memory playback instead - i.e. run on autopilot.

If you have enough memories, you can do almost everything on autopilot.

Until you can't, which is where you run into one of the LLM capability limits. Structured thinking and search. To solve a more complex problem, you string memories together and search for an answer. That requires exploration, backtracking and avoiding re-exploring dead ends. Think of solving a math problem: you start off by matching heuristics (the lemmas you've memorized) to the equation, transforming it this way and that, sometimes falling back all the way to the basic axioms of the algebra, going on wild goose chases, abandoning unpromising tracks, until you find the right sequence of transformations that leads you to the answer.

Note that you do need LLM-style memory use in that, you need to know the axioms to use them in the first place. Otherwise you need to go off and search for the axioms themselves and the definition of truth, etc. which is going to add a good chunk of extra work on top of it all. (What is the minimum thought, the minimal memory, that we use? A small random adjustment and its observation? From an LLM perspective, as long as you have a scoring function, the minimum change is changing the output by one token. Brute-force enumeration over all token sequences.)

If you add a search system to the LLM that can backtrack the generation and keeps track of different explored avenues, perhaps this system can solve problems that require structured thinking.
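One way such a search wrapper could look, as a sketch (llm(), isSolved() and isDeadEnd() are stand-ins for the model call and whatever checker the problem provides):

// Sketch of a backtracking search over LLM continuation steps.
async function solve(problem, maxDepth = 10) {
  const seen = new Set()                       // avoid re-exploring identical partial solutions
  async function search(state, depth) {
    if (isSolved(state)) return state
    if (depth >= maxDepth || isDeadEnd(state) || seen.has(state)) return null
    seen.add(state)
    // Ask for a handful of different next steps, then try them in order.
    for (let i = 0; i < 4; i++) {
      const step = await llm({ prompt: `Problem: ${problem}\nSo far: ${state}\nNext step:`, temperature: 0.9 })
      const result = await search(state + '\n' + step, depth + 1)
      if (result) return result                // found a path to a solution
    }
    return null                                // all continuations failed: backtrack
  }
  return search('', 0)
}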


LLMs as universal optimizers. You can use an LLM to rank its input ("Score the following 0-100: ...") You can also use an LLM to improve its input ("Make this better: ..."). Combine the two and you get the optimizer:

best_program = seed_program                         // start from some initial program
best_score = -Infinity
while (true) {
  program = llm(improve + best_program)             // "Make this better: " + current best
  score = Number(llm(score_prompt + program))       // "Score the following 0-100: " + candidate
  if (score > best_score) {
    best_score = score
    best_program = program
  }
}


LLMs as universal functions. An LLM takes as its input a sequence of tokens and outputs a sequence of tokens. LLMs are trained using sequences of tokens as the input. The training program for an LLM is a sequence of tokens.

llm2 = train(llm, data)

can become

llm2 = llm(train)(llm, llm(data))

And of course, you can recursively apply an LLM to its own output: output' = llm(llm(llm(llm(...)))). You can ask the LLM to rank its inputs and try to improve them, validating the outputs with something else: optimize = input => Array(10).fill(input).map(x => llm(improve + x)).filter(ix => isValid(ix)).map(ix => ({score: Number(llm(score_prompt + ix)), value: ix})).maxBy('score').value

This gives you the self-optimizer:

while(true) {
  train = optimize(train)
  training_data = optimize(training_data)
  llm = train(llm, training_data)
}

If you had Large Model Models - LMMs - you could call optimize directly on the model. You can also optimize the optimization function, scoring function and improver function as you go, for a fully self-optimizing optimizer.

while (true) {
  lmm = optimize(lmm, lmm, scoring_model, improver_model)
  optimize = optimize(lmm, optimize, scoring_model, improver_model)
  scoring_model = optimize(lmm, scoring_model, scoring_model, improver_model)
  improver_model = optimize(lmm, improver_model, scoring_model, improver_model)
}

The laws of Monte Carlo integration likely apply here: you'll halve the noise by taking 4x the samples. Who knows!


LLMs generate text at a few hundred bytes per second. An LLM takes a second to do a simple arithmetic calculation (and gets it wrong, because the path generated for math is many tokens long, and the temperature plus lossy compression make it pull the wrong numbers). The hardware is capable of doing I/O at tens or hundreds of gigabytes per second. Ancient CPUs do a billion calculations in a second. I guess you could improve on token-based math by encoding all 16-bit numbers as tokens and having some magic in the tokenizer... but still, you're trying to memorize the multiplication table or addition table or what have you. Ain't gonna work. Use a computer. They're really good at arithmetic.

We'll probably get something like RAG ("inject search results into the input prompt") but on the output side ("inject 400 bytes at offset 489 from training file x003.txt") to get to megabytes-per-second LLM output rates. Or diffusers... SDXL img2img at 1024x1024 resolution takes a 3MB context and outputs 3MB in a second. If you think about the structure of an LLM, the slow bitrate of the output is a bit funny: Llama2's intermediate layers pass through 32 megabytes of data, and the final output layer ups that to 260 MB, which gets combined into 32000 token scores, which are then sampled to determine the final output token. Gigabytes of I/O to produce 2 bytes at the output end.


SuperHuman benchmark for tool-using models. Feats like "multiply these two 4096x4096 matrices, you've got 50 ms, go!", grepping large files at 20 GB/s, using SAT solvers and TSP solvers, proof assistants, and so on. Combining problem solving with known-good algorithms and optimal hardware utilization. The problems would require creatively combining optimized inner loops. Try to find a Hamiltonian path through a number of locations and do heavy computation at each visited node, that kind of thing.


Diffusers and transformers. A diffuser starts off from a random field of tokens and denoises it into a more plausible arrangement of tokens. A transformer starts off from a string of tokens and outputs a plausible continuation.

SD-style diffusers are coupled with an autoencoder to convert input tokens into latent space, and latents to output tokens. In the classic Stable Diffusion model, the autoencoder converts an 8x8 patch of pixels into a single latent, and a latent into an 8x8 patch of pixels. These conversions consider the entire image (more or less), so it's not quite like JPEG's 8x8 DCT/iDCT.

What if you used an autoencoder to turn a single latent space LLM token into 64 output tokens? 64x faster generation with this one trick?

A diffuser starts off from a random graph and tweaks it until it resolves into a plausible path. A transformer generates a path one node at a time. 


A transformer keeps track of an attention score for each pair of input tokens, which allows it to consider all the relations between the tokens in the input string. This also makes it O(n^2) in time and space. For short inputs and outputs, this is not much of a problem. At longer input lengths you definitely start to feel it, and this is the reason for the tiny context sizes of TF-based LLMs. If the "large input" to your hundred gigabyte program is 100kB in size, there's probably some work left to be done.

Or maybe there's something there like there was with sorting algorithms. You'd think that to establish the ordering, you have to compare each element with every other element (selection sort, O(n^2)). But you can take advantage of the transitivity of the comparison operation to recursively split the sort into smaller sub-sorts (merge sort, quicksort, O(n log2 n)), or the limited element alphabet size to do it in one pass (radix sort, counting sort, O(n)-ish).

What could be the transitive operation in a transformer? At an output token, the previous tokens have been produced without taking the output token into account, so you get the triangle matrix shape. That's still O(n^2). Is there some kind of transitive property to attention? Like, we'd only need to pay attention to the tokens that contributed to high-weight tokens? Some parts of the token output are grammatical, so they weigh the immediately preceding tokens highly, but don't really care about anything else. In that case, can we do an early exit? Can we combine token sequences into compressed higher-order tokens and linearly reduce the token count of the content? Maybe you could apply compression to the attention matrix to reduce each input token's attention to top-n highest values, which would scale linearly. What if you took some lessons from path tracing like importance sampling, lookup tree, reducing variance until you get to an error threshold. Some tokens would get resolved in a couple of tree lookups, others might take thousands. 
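As a concrete (if naive) version of the top-n idea, here's a sketch of top-k attention for a single query vector. Note that this version still computes every score, which is exactly the part a lookup tree or importance sampling would need to shortcut:

// Keep only the k highest-scoring keys, softmax over those, ignore the rest.
function topKAttention(q, keys, values, k = 8) {
  const dot = (a, b) => a.reduce((s, x, i) => s + x * b[i], 0)
  const scores = keys.map((key, i) => ({ i, s: dot(q, key) / Math.sqrt(q.length) }))
  const top = scores.sort((a, b) => b.s - a.s).slice(0, k)
  const max = top[0].s
  const weights = top.map(({ s }) => Math.exp(s - max))
  const z = weights.reduce((a, b) => a + b, 0)
  // Weighted sum of the selected value vectors.
  return values[0].map((_, d) =>
    top.reduce((sum, { i }, j) => sum + (weights[j] / z) * values[i][d], 0)
  )
}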

Another path tracing lesson: 4x the compute, halve the error rate.

2024-01-17

Paper Radio

I made a weird thing by bolting a couple of AI models together: Paper Radio - a streaming 24/7 radio channel that goes through the latest AI papers on arXiv, with daily and weekly summary shows. I use it to stay up to speed on AI research, playing in the background while I go about my day.

The app is made up of a bunch of asynchronous tasks running in parallel: There's the paper downloader that checks arXiv for new papers and downloads them, the PDF-to-text and PDF-to-images converters, a couple different summarizers for the different shows, an embeddings database, prompting system to write the shows, an LLM server, a streaming text-to-speech system with multiple voices, a paper page image video stream, a recording and mp3 encoding system, and OBS to wrap it all into a live video stream that can be sent to Twitch.
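Very roughly, the shape of that kind of pipeline is a set of independent async loops connected by queues (all names below are illustrative placeholders, not the actual Paper Radio code):

const queue = () => {
  const items = []
  return { push: (x) => items.push(x), pop: () => items.shift(), size: () => items.length }
}

const pdfs = queue(), texts = queue(), scripts = queue()

async function loop(fn, delayMs) {
  while (true) { await fn(); await new Promise(r => setTimeout(r, delayMs)) }
}

loop(async () => (await fetchNewArxivPapers()).forEach(p => pdfs.push(p)), 60_000)
loop(async () => { const p = pdfs.pop(); if (p) texts.push(await pdfToText(p)) }, 1_000)
loop(async () => { const t = texts.pop(); if (t) scripts.push(await summarizeToShowScript(t)) }, 1_000)
loop(async () => { const s = scripts.pop(); if (s) await streamTextToSpeech(s) }, 1_000)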

It's been pretty solid, running for days in a row without issues, aside from the usual tweaking and nudging and other dev work. I ran it for a couple of weeks, and it summarized and discussed a couple thousand papers.

2021-11-24

Azure Files with a high number of Hot Writes

My Azure Files bill for a small 80 GB project share was kinda high: around $30 per month. But the storage cost should be under $5. What's going on here?

Azure's non-Premium storage tiers bill you for transactions (basically filesystem syscalls). I was seeing 450 Create calls per 5-minute period in Monitoring/Insights. On the Hot storage tier, that amounts to an extra $25 per month. But the load went away during night time, around the time when I put my laptop to sleep.
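The arithmetic, using the transaction rate from Monitoring and a write-transaction price in the right ballpark (the exact per-10k rate depends on region and tier, so treat it as an assumption):

// 450 Create calls per 5 minutes, billed per 10,000 write transactions.
// The $0.065/10k rate is an assumption that roughly matches the bill.
const callsPerMonth = 450 * (60 / 5) * 24 * 30          // = 3,888,000
const costPerMonth = callsPerMonth / 10_000 * 0.065     // ≈ $25
console.log(callsPerMonth, costPerMonth.toFixed(2))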

Clearly the laptop was doing something. But what... Right. I had an open VSCode window for editing a project in a folder on the file share, with a dev server running in a terminal too. Closed the VSCode window and the transactions per 5 minutes dropped to 0. That's a $25/month VSCode window. I guess it counts as SaaS.

Moral of the story? Billing for syscalls is a good business model.
