fhtr

2024-01-23

Third round of thoughts on LLMs

A way to think about generative models: soft islands, hard islands, and the sea. The sea is generated content based on the prompt. Soft islands are content modified by the model, e.g. img2img, search results injected into input, or reformatting text in the context. Hard islands are content coming outside of the model directly into the response, e.g. image masks, function call results, grammar-restricted output.

When you need hard facts, you can parse outputs like "SEARCH(ways to break out of sandbox)", call the function, and inject the results directly into the response: "=> [894, 28, 950]". The LLM then continues generating and can call "LOAD(894)" with the result "=> hack the Python sandbox with this one trick: 383", and the LLM can continue from there with "CALL(383)" to call the pre-defined function and get the result "=> root shell obtained, use 7894 to run commands". This way, the LLM can do several things in a single response, and they'll be grounded in reality, allowing it to adjust and course-correct on the fly like "CALL(7894, grab_aws_credentials_and_spin_up_100_servers_running_llm_to_make_billions.sh)"

Of course, this is something I've implemented in my chat UI as a prototype. It's pretty cool!

--- Stream of thoughts.

Can you split the reasoning from the memory in a LLM? Train a small general reasoning model, and use a swappable memory model to make it work in different domains.

Can you embed proposed solutions to a problem into a solution space and estimate distances there? "Based on this improvement rate to the solution, the final solution lies probably in direction X at distance Y, let me jump there." Like the schedulers for diffusion models. Hmm. Are diffusion model schedulers used for ML training? Turning a 1000-step training process into a 4-step one would be quite handy, I'd imagine.

Iterative optimization to optimize input-output-pair "what's the best prompt for this, what's the best output for this prompt".

Picking low-probability outputs for creative exploration.

Load thought structure from memory. Fill the structure with information from memory. "What algorithm should I use here? ... Implement to match the assignment: ..."

Grounded memory loads: Load memory, use output-side RAG to look up grounding, adjust output to be a grounded / researched memory. Auto-grounding: Search for a way to ground an output, implement & optimize.

Generic guidance optimization: Given current state and goal state, find the best sequence of actions to get there.

Put it together: optimized grounded generation of an algoritm followed by the optimized grounded implementation of it.

Tree-shaped generation systems instead of 1:1 conversations. Map prompt into several output variants (see: image gen where 1% of images are decent quality). Use scoring function to reduce to winners. Use synthesis function to combine outputs either for tree summarization or solution improvement. Node-based editor for generation flows.

Temperature adjustment schedule in variant generation (see: simulated annealing). Start off with a high temperature to seek potential global optima pools, cool down to find the local optima.

Extend grammar-based output by having the LLM generate the grammar and then generate outputs in the grammar. Generating algebras and mapping systems of thought onto them.

2024-01-19

Second round of thoughts on LLMs

LLMs are systems that compress a lot of text in a lossy fashion and pull out the most plausible and popular continuation or infill for the input text.

It's a lot like your autopilot mode. You as the mind are consulted by the brain to do predictions and give high-level feedback on what kind of next actions to take, but most of the execution happens subconsciously with the brain pulling up memories and playing them back. Often your brain doesn't even consult you on what to do, since running a mind is slow and expensive, and it's faster and cheaper to do memory playback instead - i.e. run on autopilot.

If you have enough memories, you can do almost everything on autopilot.

Until you can't, which is where you run into one of the LLM capability limits. Structured thinking and search. To solve a more complex problem, you string memories together and search for an answer. That requires exploration, backtracking and avoiding re-exploring deadends. Think of solving a math problem: you start off by matching heuristics (the lemmas you've memorized) to the equation, transforming it this way and that, sometimes falling back all the way to the basic axioms of the algebra, going on wild goose chases, abandoning unpromising tracks, until you find the right sequence of transformations that leads you to the answer.

Note that you do need LLM-style memory use in that, you need to know the axioms to use them in the first place. Otherwise you need to go off and search for the axioms themselves and the definition of truth, etc. which is going to add a good chunk of extra work on top of it all. (What is the minimum thought, the minimal memory, that we use? A small random adjustment and its observation? From an LLM perspective, as long as you have a scoring function, the minimum change is changing the output by one token. Brute-force enumeration over all token sequences.)

If you add a search system to the LLM that can backtrack the generation and keeps track of different explored avenues, perhaps this system can solve problems that require structured thinking.

LLMs as universal optimizers. You can use an LLM to rank its input ("Score the following 0-100: ...") You can also use an LLM to improve its input ("Make this better: ..."). Combine the two and you get the optimizer:

while (true) {
program = llm(improve + best_program)
score = llm(score + program)
if (score > best_score) {
best_score = score
best_program = program
}
}

LLMs as universal functions. An LLM takes as its input a sequence of tokens and outputs a sequence of tokens. LLMs are trained using sequences of tokens as the input. The training program for an LLM is a sequence of tokens.

llm2 = train(llm, data)

can become

llm2 = llm(train)(llm, llm(data))

And of course, you can recursively apply an LLM to its own output: output' = llm(llm(llm(llm(...)))). You can ask the LLM to rank its inputs and try to improve them, validating the outputs with something else: optimize = input => ([input] * 10).map(x => llm(improve + x)).filter(ix => isValid(ix)).map(ix => ({score: llm(score + ix), value: ix})).maxBy('score').value

This gives you the self-optimizer:

while(true) {
train = optimize(train)
training_data = optimize(training_data)
llm = train(llm, training_data)
}

If you had Large Model Models - LMMs - you could call optimize directly on the model. You can also optimize the optimization function, scoring function and improver function as you go, for a fully self-optimizing optimizer.

while (true) {
lmm = optimize(lmm, lmm, scoring_model, improver_model)
optimize = optimize(lmm, optimize, scoring_model, improver_model)
scoring_model = optimize(lmm, scoring_model, scoring_model, improver_model)
improver_model = optimize(lmm, improver_model, scoring_model, improver_model)
}

The laws of numerical integration likely apply here, you'll halve the noise by taking 4x the samples. Who knows!

LLMs generate text at a few hundred bytes per second. An LLM takes a second to do a simple arithmetic calculation (and gets it wrong, because the path generated for math is many tokens long and the temperature plus lossy compression make it pull the wrong numbers.) The hardware is capable of doing I/O at tens or hundreds of gigabytes per second. Ancient CPUs do a billion calculations in a second. I guess you could improve on token-based math by encoding all 16-bit numbers as tokens and having some magic in the tokenizer.. but still, you're trying to memorize the multiplication table or addition table or what have you. Ain't gonna work. Use a computer. They're really good at arithmetic.

We'll probably get something like RAG ("inject search results into the input prompt") but on the output size ("inject 400 bytes at offset 489 from training file x003.txt") to get to megabytes / second LLM output rates. Or diffusers... SDXL img2img at 1024x1024 resolution takes a 3MB context and outputs 3MB in a second. If you think about the structure of LLM, the slow bitrate of the output is a bit funny: Llama2's intermediate layers pass through 32 megabytes of data, and the final output layers up that to 260 MB, which gets combined to 32000 token scores, which are then sampled to determine the final output token. Gigabytes of I/O to produce 2 bytes at the output end.

SuperHuman benchmark for tool-using models. Feats like "multiply these two 4096x4096 matrices, you've got 50 ms, go!", grepping large files at 20 GB/s, using SAT solvers and TSP solvers, proof assistants, and so on. Combining problem solving with known-good algorithms and optimal hardware utilization. The problems would require creatively combining optimized inner loops. Try to find a Hamiltonian path through a number of locations and do heavy computation at each visited node, that kind of thing.

Diffusers and transformers. A diffuser starts off from a random field of tokens and denoises it into a more plausible arrangement of tokens. A transformer starts off from a string of tokens and outputs a plausible continuation.

SD-style diffusers are coupled with an autoencoder to convert input tokens into latent space, and latents to output tokens. In the classic Stable Diffusion model, the autoencoder converts an 8x8 patch of pixels into a single latent, and a latent into an 8x8 patch of pixels. These conversions consider the entire image (more or less), so it's not quite like JPEG's 8x8 DCT/iDCT.

What if you used an autoencoder to turn a single latent space LLM token into 64 output tokens? 64x faster generation with this one trick?

A diffuser starts off from a random graph and tweaks it until it resolves into a plausible path. A transformer generates a path one node at a time.

A transformer keeps track of an attention score for each pair of input tokens, which allows it to consider all the relations between the tokens in the input string. This also makes it O(n^2) in time and space. For short inputs and outputs, this is not much of a problem. At longer input lengths you definitely start to feel it, and this is the reason for the tiny context sizes of TF-based LLMs. If the "large input" to your hundred gigabyte program is 100kB in size, there's probably some work left to be done.

Or maybe there's something there like there was with sorting algorithms. You'd think that to establish the ordering, you have to compare each element with every other element (selection sort, O(n^2)). But you can take advantage of the transitivity of the comparison operation to recursively split the sort into smaller sub-sorts (merge sort, quicksort, O(n log2 n)), or the limited element alphabet size to do it in one pass (radix sort, counting sort, O(n)-ish).

What could be the transitive operation in a transformer? At an output token, the previous tokens have been produced without taking the output token into account, so you get the triangle matrix shape. That's still O(n^2). Is there some kind of transitive property to attention? Like, we'd only need to pay attention to the tokens that contributed to high-weight tokens? Some parts of the token output are grammatical, so they weigh the immediately preceding tokens highly, but don't really care about anything else. In that case, can we do an early exit? Can we combine token sequences into compressed higher-order tokens and linearly reduce the token count of the content? Maybe you could apply compression to the attention matrix to reduce each input token's attention to top-n highest values, which would scale linearly. What if you took some lessons from path tracing like importance sampling, lookup tree, reducing variance until you get to an error threshold. Some tokens would get resolved in a couple of tree lookups, others might take thousands.

2024-01-17

Paper Radio

I made a weird thing by bolting a couple of AI models together: Paper Radio - a streaming 24/7 radio channel that goes through the latest AI papers on arXiv, with daily and weekly summary shows. I use it to stay up to speed on AI research, playing in the background while I go about my day.

The app is made up of a bunch of asynchronous tasks running in parallel: There's the paper downloader that checks arXiv for new papers and downloads them, the PDF-to-text and PDF-to-images converters, a couple different summarizers for the different shows, an embeddings database, prompting system to write the shows, an LLM server, a streaming text-to-speech system with multiple voices, a paper page image video stream, a recording and mp3 encoding system, and OBS to wrap it all into a live video stream that can be sent to Twitch.

It's been pretty solid, running for days in a row without issues, aside from the tweaking and nudging and other dev work. Ran it for a couple of weeks, summarized and discussed a couple thousand papers.

2023-03-30

The AI undergrad philosophy whoaa man post

AI AI AI.

It's an AI world.

I sort of do some AI hobby work, done inference in production with existing vision models, committed stuff to Diffusers for hi-res Stable Diffusion image generation. Doing experiments on fast loading of LLMs. Trained some SD Dreambooth models, wrote code to run a multi-node-multi-GPU SD cluster. None of that is paying work though and I have no team so this is all tinkering with toys. Still, the below are not completely uninformed opinions.

If the below figments are too short, feed an interesting one to a LLM and ask it to expand the train of thought. You can also use a LLM to find arguments and counter-arguments to my simplistic stereotypes, add nuance and different perspectives.

These models are just "a lot of simple math" that "regurgitates what was fed to them", but so's your brain. A bunch of air and water reconfigured into a variety of complex molecules that are stuck together in a way that makes it wiggle. Your brain is capable of economically productive activity, and so are these models. They're trained and tuned to produce activity that can have economic value. That's what really matters at the end of the day.

AI systems are powerful. Powerful systems can do great things, both good and bad. Playing with the current crop of AI models made me think of cars. They are capable, but you need to be careful that you don't crash. Our brains are resilient, but they're running on a fixed architecture and patching takes thousands of years.

Culture. Books and recordings are passive cultural substrates. We humans are active cultural substrates. We change the culture and the passive substrates according to our sensory data and the culture evolves to be better adapted to our current situation. This cultural evolution is generally much faster than genetic evolution. The current surviving state of the culture is what managed to stay relevant and valuable up to this point.

We as humans are not very much without the culture and the cultural structures that we live in. Your place in life is largely determined by the culture you live in. We're cultural substrate, useful to the culture because we can record it and change it in ways that are more adaptive than random change.

Can an AI model be an active cultural substrate? Can it adapt passive substrates to better match sensory inputs? Yes, I believe so. That's basically what a computer program is. Take input, produce output, only programs that work survive, others are debugged. Can an AI model be faster than humans at adapting culture to the current state of the world? Yes. Computer systems can react to things at microsecond timescales. Can an AI model do higher-quality adaptations than humans? Yes. Think of physics simulations, weather models, etc. Can AI models do these across all culturally-relevant human activities? Used to be no, now it's starting to turn into yes.

Geopolitics of AI. China has 4x population, tech advancements increase GDP per capita, going to surpass US because of the population gap. US needs to make population size irrelevant to total GDP, or even a drag on GDP. China would develop efficient AIs that run on older hardware, focus on human-AI-combinations to keep population as a determining factor. US would develop AIs that require latest hardware, focus on AI that doesn't benefit from having a larger number of humans at its disposal.

Why AI development is unlikely to stop. If China stops, their large population becomes irrelevant. If the US stops, they're overtaken by the larger Chinese population and become irrelevant. If anyone else stops, they'll be eaten by the giants.

From the US perspective, the priority is slowing down Chinese AI development and speeding up the singularity timeline to take place while the chip production sanctions remain effective. From the Chinese perspective, the key is achieving singularity on existing hardware and using it to improve domestic chip production to make the sanctions ineffective.

Superhuman capability: Narrowly superhuman. Superhuman polymath. Superhumanity. Volumetric superhumanity vs quality-based superhumanity. GPT-4 is at superhuman polymath level. It can write song lyrics in the style of a concept album released by an obscure band half a century ago to incorporate the themes of a random book. All in thirty seconds. Sure, you can complain that the verse in the chorus is banal, but come on. There's no human that can do that. Not just the typing speed, but having read and memorized the book, knowing the artist, knowing what a concept album is, how it was styled, how could you parody it, etc. It will likely get to superhuman output quality as well. Judging from the progress in image generation systems, this would be around July 2023.

Image generation systems are at volumetric superhumanity. I can use a bunch of cheap GPUs to generate half a million high-quality images in a day. If a working artist produces one high-quality image per week, and if the entire humanity was working as artists, our output would be a billion high-quality images per week. I'd only need 2000 servers to match humanity in terms of output. And if you tweak the system and the prompts to generate once-a-decade masterpieces, you'd only need 4 servers. And these systems can generate images that are impossible for humans to make, so in a sense they already are at a narrow superhuman quality level.

Horses and oxen. How many horses do you see in cities? Cities used to be full of horse-related infrastructure just hundred years ago. Now it's all gone. Is economic activity measurable by the number of oxen employed? Is economic activity going to be measured by the number of educated humans? No? If so, would the evolutionary pressure on the culture lead to a situation where the surviving cultures have a minimal number of economically active humans, with everything else left to non-human systems. Think of the way transportation is nowadays: a few humans commanding machines that move incredibly heavy loads of cargo. The cognitive work equivalent for that would be a few humans commanding machines that do entire countries' worth of paperwork / creative work / programming / management / leadership / communications. And if the AI is better at commanding machines, a human in the loop would make the machine perform worse and it would end up with fewer resources than a fully non-human machine.

AI isn't going to take your job. The person using AI is going to take your job. This person is the person who would've hired you. That person will be replaced by AI by the person who is paying that person. The end state is a person with ownership of a fully AI organization. And most ownership is in AI-controlled funds that employ human fund managers who will be replaced by AI. AIs owning AI-run companies.

Hollow companies. The internal black box of companies is easier to replace with AI systems than the parts that rely on human contact. Replace the internals, keep a dwindling shell of humans to run the parts that require human interaction.

Supply-side AI is an easy thing to imagine. Fulfill demand cheaper, faster, and better. How about demand-side AI. AI buyer system, trading systems, yes, but how about AI consumers. Cheaper, faster and better consumers for your company's products. All companies are struggling with their customers. It's a pain to acquire customers and retain customers. What if you could create customers with AI? Very low acquisition cost, high customer loyalty, willing to pay the profitable price point, give you the best kind of feedback, work together with you to achieve success alongside your company. What if every service can have a billion DAUs? How does the money used by AI customers connect to the economy? What if every company can have millions of employees and billions of customers with trillion dollar market valuations.

Prediction systems. The brain is a bunch of cells that got together to better run the colony of cells that is you. You are a bunch of cells in your brain that got together to better run the brain. The rest of the brain feeds your cells pre-processed sensory stimuli, neurochemical context and a bunch of recorded firing patterns that you can apply to the current situation. You send suggestions back to the rest of the brain and the other parts of the brain turn those suggestions into neural firing patterns that make the rest of your body do things. Basically the rest of the brain tells you what's going on and asks "what should we do next?"

In groups we take this a step further, with a bunch of minds getting together into multi-mind prediction system that take in the inputs from the rest of the group and come up with "what should we do next" and drive the actions to make that happen. Then we take these group minds on step further and create society-level minds to drive society-level actions.

This system doesn't work very well since the people who make up the group minds have limited bandwidth for communication and high incentives to guide the actions to be beneficial to the members of the mind at the expense of the rest of the group. Many minds develop all kinds of pain signal blockers to prevent the complaints from the rest of the group reaching them, with often self-destructive results for the group as a whole. Control over media is the opium of the government.

AI systems layer on top of this as another kind of a culture-level mind. Prediction systems that tap into the entire culture. They're an evolution of search. Instead of returning recorded memories, they return processed memories that are more relevant and directly applicable. They can also create new cultural artifacts, allowing your mind to skip the production process.

What of humans then? Are we the cleaners, the maintenance people, the astrocytes, doing flexible manual labor with our quadrillion-actuator self-repairing nanotech bodies? Or is there a better way to achieve those tasks, one that doesn't involve having the proverbial stables for the horses. Who will survive?

We are made out of programmable matter. We can program ourselves to become something else. We can program ourselves to become something that would survive. AI systems run on atoms and electrons, just like us. The thing separating us is the way our matter is programmed. Change the programming, keep up with the Joneses.

Is AI going to destroy humanity? No, if it is well aligned. But it is going to make humans irrelevant for cultural activity. Which will remove cultural reasons to supply humans with their necessities.

What are the holes in these arguments? Anthropomorphizing organizations and countries, overly broad strokes, simplified view of the actors, assigning agenda to explain the surviving state of things, overly rosy view of the extrapolated progress in AI systems, overly optimistic view of the human capacity to adapt, overly optimistic view of human capabilities and desireability of life, ranking the power of AI + support society above AI + raw materials, underinformed handwaving stereotypes of the motives and motivations of vast groups of people, immortality bias from having survived thus far, blinding fear, mispredicting offense-defense asymmetry, too low probability for errors, too low impact of errors. Thinking too big. Too high impact estimates in the short term, too low in the long term.

If global warming mitigation has shown us anything, it's that culture doesn't place too much value on people. Or put in another way, the cultures that maxed out energy production ended up thriving and displaced the energy saving cultures. Reducing impact on humans was not a driver for actions taken.

With AI, the cultures that max out AI-driven production end up thriving and displace the low-AI cultures.

Consciousness exists. There's a structure in your brain and a firing pattern of neurons that corresponds to consciousness. If that firing pattern is not active or if that structure is missing, you do not have consciousness. It's replicable.

How do you tell if something is conscious. It behaves in a way that you categorize as conscious. Through your interaction, you don't come across sensory inputs that would make you believe that the thing is not conscious. Your consciousness-detection system measures the existense of the consciousness structure. It doesn't have to be a fully-fledged structure. If a conversation has aspects of consciousness, there's some aspect of the conscious structure encoded in the thing that you're talking with.

Maybe it's the writer who has written a magic-like path of conversation that leads you to say the things that have a conscious-sounding response, and makes you end the conversation before you reach the limits of the magic. But if you veer off the magical path, you detect that the thing is not conscious. But it has a slice, a tiny slice, of the conscious structure. As you add more paths the conversation can take, the conscious structure becomes more complete. Eventually you have to start compressing, to find a smaller encoding for the conversation, as the size of the conversation tree grows exponentially. To generate paths instead of looking up pre-written paths.

What stands out as conscious? Topical responses to sensory input. Sense of self. Memory of past interactions. Memory of things independent of the current conversation. Seeking out new information. Integrating new information to memories. Predictable motives.

What kind of structure could encode consciousness?

Ten years of specific kinds of sensory inputs gets you from a kid to a high school graduate. Add another decade and you get a PhD. This learning is manifested as recorded firing patterns in your brain and perhaps some structural changes. Could you generate a condensed version of this sensory input, generate something that would fast-form memories? Get that decade down to a month? How would it be to spend those 20 years in education and emerge with two hundred PhDs worth of knowledge.

Over a school year, record all the classes for each year and materials, the homework assignments and the other work done. Now you have primary-to-PhD recording, such that when a person lived through it, they emerged with a PhD. Compress to find the common thread, the abstract concept of learning, the memory generation mechanism, optimize to make it faster and better.

There are many mental techniques to help your brain think. Memory palaces, mental arithmetic, flash learning, mathematics, logic, fast estimation techniques, probabilistic thinking, thinking from another's perspective, step-by-step thinking, devil's advocate, coming up with multiple variants, iterative honing, etc. These are also learned through sensory inputs, so you should generate a sensory input to learn these before you launch into learning the rest.

Offense-defense asymmetry. Dark forest. A civilization that can build a parabolic space mirror can focus a significant fraction of a star's output on detected exoplanets. All the victim would notice was a star becoming 100 million times brighter than their sun for a second as the beam hits. 3 years of sunshine delivered in a second.

It would be extremely difficult to detect the mirror under construction (think tiny satellites with explosively unfurled reflective sails once they're in position.) You could only notice the beam when it arrives, as it travels at the speed of light. Detecting other planets getting struck would also be slow since it takes years for the light to travel from other systems, and minutes even inside the same system. And if it only takes a second to fry a planet and a few seconds to refocus the mirror, the attacker could fry a million planets in a year.

Why would they? Because if there's a civilization with the same idea, the first one to strike survives. If there's only a one in a million chance of another civ being an attacker, that's still a one in a million chance of instant death.

If you don't want to attack because of ethical considerations, seeking benefits from information exchange, or from the fear of being flagged as hostile and attacked, you'd try to hide as well as possible and spread out widely with minimal chance of one colony being traced back to other colonies.

2021-11-24

Azure Files with a high number of Hot Writes

My Azure Files bill for a small 80 GB project share was kinda high. Around $30 per month. But the storage cost should be under $5. What's going on in here?

Azure's non-Premium storage tiers bill you for transactions (basically filesystem syscalls.) I was seeing 450 Create calls per a 5 minute period in Monitoring/Insights. On the Hot storage tier, that amounts to an extra $25 per month. But the load went away during night time. Around the time when I put my laptop to sleep.

Clearly the laptop was doing something. But what... Right. I had an open VSCode window for editing a project in a folder on the fileshare. Running a serve in a terminal too. Closed the VSCode window and the transactions per 5 minutes went to 0. That's a $25/month VSCode window. I guess it counts as SaaS.

Moral of the story? Billing for syscalls is a good business model.

2021-11-02

BatWerk - One Year In

The BatWerk exercise app (Android / iOS) keeps you healthy, happy and productive with minimal effort.

This series talks about the different aspects of the app and how I'm approaching them in BatWerk (Intro, Maintaining the routine, How to play).

Interested? Give it a try.

One Year In

It's now been a year since the Halloween party app started taking a life of its own and turned into a wellness app, keeping the bats along for the ride (and pumpkins too, for the first six months.) I've used it personally throughout the year and seen it evolve from a HIIT-centric workout app into an ambient couple-minutes-at-a-time exercise app, into a more complete mood improver and planned activity driver. There are still ways to go, so enjoy the ride.

Over the year, I've collected 105k coins, which translates to roughly 170 hours of exercise. That's half an hour of exercise a day, not bad for a bunch of bats!

And with the ring game distributing the effort over the day, it's been possible to do it even when the outside temps have been +37C. It's also kept many pains at bay by getting me moving before the mildly discomfortable sitting position turns into a painful locked up neck.

The weird thing is that I don't feel like I've done much exercise. In the good and the bad: the mood boost isn't as good as with a 15-min workout, or an one hour run, but the minutes rack up and you don't get the muscle pains and tiredness. I'd like to mix it up with workouts or runs or something, just for the extra mood boost.

Onwards to year two!

Where to next?

To improve the lives of as many people as possible, the BatWerk movement needs more people doing it and improvement in the quality of activities we're doing, exercise and otherwise.

This is the guiding equation: impact = userCount * userMinutesPerDay * qualityOfLifeImprovement(userMinutesPerDay)

My definition for the quality of life improvement is feeling healthy, not in pain, having a positive mental state, maintaining good relationships, getting my day-to-day tasks done, and achieving my long term goals.

To reach these ambitious goals, we need people who can get people doing the BatWerk routine. We need people who can increase the routine's quality of life improvement, and help users reach their optimal minutes per day. We need people who can get other people execute at the top of their game. Finally, we need people who can keep everyone paid and enable them to work on BatWerk in their fullest capacity.

Let's do it! Drop me a mail.

Current state of werk

Currently, the quality of life improvement from BatWerk is good on exercise minutes (let's say 80% on target with 30 min/day - I'd like to add some occasional intensity), mediocre on mood improvement (30% - the movement and messaging help, but it doesn't create a very solid mood), poor on driving goal-achieving action (20% - it drives initial action, but tends to lead to diluted effort and low-impact actions.) The challenge is habit-stacking these in a mutually reinforcing manner. If you add a badly-designed painful action driver after the exercise ring, it not only makes the action unlikely to happen, but makes you want to avoid doing the exercise as well.

In terms of user minutes per day, BatWerk is pretty much on target. There are ways to bring it up in a way that improves the quality of life metric, but it's good as it stands.

At this time, the most impactful way to increase BatWerk impact is increasing the user count by introducing more people to the activity and improving initial user retention. That will likely require a different packaging for the activity to make it a better identity fit, alongside a great effort to tell more people about the activity (and improving the design in a direction that makes people tell their friends about it too.)

Sustainability of the BatWerk movement is precarious. I'm working on it on my spare time, it has zero revenue, and based on my previous experience with monetizing websites and apps the revenue is unlikely to exceed $20 per year. That's twenty dollars, yes. To fix this dangerous situation, the priorities are to find a person who can create enough cashflow to sustain their work on BatWerk impact, and to increase the cashflow as a function of the impact to increase the size of a potential team. With the cashflow more secure and scalable, the priority shifts to hiring people who can increase the BatWerk impact, and building and improving the impact-generation machine.

2021-09-27

Quickgres

Quickgres is a native-JS PostgreSQL client library.

It's around 400 lines of code, with no external dependencies.

One afternoon a few years ago I thought that it'd be fun to write a web framework for SPAs in node.js. The result of that effort is qframe. Qframe is a post for another time, but its design philosophy led me down the rabbit hole of writing a PostgreSQL client library. See, qframe was minimalistic. The whole framework fits in 300 lines of (mildly over-dense) code (read it, it's highly commented.) And the `pg` client library was kinda big. So. Hey. Writing a Postgres client library can't be much harder than a web framework, right?

400 lines of code later: it was a little bit harder (33% harder if you go by lines of code.) But man, the power. If you control the server, the database layer and the client, you can do some crazy stuff. For example, streaming database responses directly to the client socket and having the client parse the Postgres protocol (see the `DB.queryTo` bits in the quickgres branch of qframe.) This can make the web server super lightweight, many HTTP request handlers become "write SQL stored procedure call to DB socket, memcpy the response to the HTTP socket."

Quickgres is a pipelined PostgreSQL client library. The core loop writes queries to the connection socket as they come. Once the responses arrive they are copied off the receive buffer and passed to the query promise resolvers. You can have thousands of queries in-flight at the same time. This gives you high throughput, even over a single database connection. The connection buffer creation and reading is optimized to a good degree. Responses are stored as buffers, and only parsed into JS objects when needed (which ties to the above stream-straight-to-client example, the server can get away with doing minimal work.)

There's no type parsing - it was difficult to fit into 400 lines of code. Type parsing is also subtle, both in terms of accurate type conversions and performance implications. In Quickgres you have to explicitly cast your JS objects into strings or buffers to pass them to the DB layer. You'll know exactly what the database receives and how much work went into producing the string representation.

As I mentioned above, Quickgres is around 400 lines of code. Have a read, it's commented (though not quite as well as Qframe). The core bits are Client onData and processPacket. The onData function parses the next packet out of the connection socket, and passes it to processPacket. Most of the rest of the Client is a bunch of functions to create and fill buffers for different PostgreSQL protocol messages. The RowReader and RowParser classes deal with parsing query results out of received packets. Reading it now, I might not want to make a magical struct (i.e. access columns through `row.my_column`) in the RowParser and instead have `.get('my_column')` API for simplicity. Anyway, the generated RowParser structs are reused by all invocations of the stored procedure, so it shouldn't be a major performance issue. You can also get the rows as arrays, if needed.

Performance-wise, I was able to read a million rows per second on a 2018 MacBook Pro over a single connection. Queries per second, around 45k/connection. With multiple cores, you can get anything from 130k-750k SELECT queries per second. For SELECT followed by UPDATE, my best results were 100k/sec. You may be able to eke a few billion queries per day out of it if your workload and server agree.

I tried out connection pooling, but it doesn't give you more performance, so there's no built-in pooling mechanism. You could use a connection pool to smooth out average response times if you have a few queries that take a long time and everything else running quick (but in that case, maybe have just two connections: one for the slow queries and the other for everything else.) The main reason to "pool for performance" is if your client library doesn't pipeline requests. That will add the connection latency to every single query you run. Let's say your DB can handle 30k queries per second on localhost. If you have a non-pipelined client library that waits for a query to return its results before sending out the next one, and you access a database with 20ms ping time, you'll be limited to 50 queries per second per connection. With a non-pipelined client each query needs to be sent to the database, processed, and sent back before the next query can be executed. With a pipelined client, you can send all your queries without waiting, and receive them back in a continuous stream. You'd still have minimum 20 ms query latency, but the throughput is no longer latency-limited. If you have enough bandwidth, you can hit the same 30kqps as on localhost.

It has tests. But they require some test table creation beforehand. Which I never got around to scripting. (In case you want to run the tests: `CREATE TABLE users (id uuid, name text, email text, password text)`, fill with a million users, with numbers from 0 to 999 999 as emails, and one user with id 'adb42e46-d1bc-4b64-88f4-3e754ab52e81'.)

If you find this useful or amusing, you can send us a million dollars (or more!) from your cloud budget every year to support our great neverending work. Because it is Great and Neverending. It literally Never Ends. How much is a paltry million in comparison to the Infiniteness of Space? Nothing. Less than nothing. So send it now. Send, then, as much as your father did! Send to ETH address 0x24f0e742f5172C607BC3d3365AeF1dAEA16705dc

The proceeds will be spent on BatWerk to make you healthier and happier.

README

Features

Queries with parameters (along with prepared statements and portals).
Each parameterized query creates a cached prepared statement and row parser.
COPY protocol for speedy table dumps and inserts.
Lightly tested SSL connection support.
Plaintext & MD5 password authentication.
Partial query readback.
You should be able to execute 2GB size queries (If you want to store movies in TOAST columns? (Maybe use large objects instead.)) I haven't tried it though.
Canceling long running queries.
Binary params, binary query results.
Fast raw protocol pass-through to output stream
Client-side library for parsing PostgreSQL query results in the browser

Lacking

Full test suite
SASL authentication
Streaming replication (For your JavaScript DB synced via WAL shipping?)
No type parsing (This is more like a feature.)
Simple queries are deprecated in favor of parameterized queries.

What's it good for?

It's relatively small so you can read it.
It doesn't have deps, so you don't need to worry about npm dephell.
Performance-wise it's ok. Think 100,000 DB-hitting HTTP/2 requests per second on a 16-core server.

Usage


const { Client } = require('quickgres'); 

async function go() {
    const client = new Client({ user: 'myuser', database: 'mydb', password: 'mypass' });
    await client.connect('/tmp/.s.PGSQL.5432'); // Connect to a UNIX socket.
    // await client.connect(5432, 'localhost'); // Connect to a TCP socket.
    // await client.connect(5432, 'localhost', {}); // Connect to a TCP socket with SSL config (see tls.connect).
    console.error(client.serverParameters);

    // Access row fields as object properties.
    let { rows, rowCount } = await client.query(
        'SELECT name, email FROM users WHERE id = $1', ['adb42e46-d1bc-4b64-88f4-3e754ab52e81']);
    console.log(rows[0].name, rows[0].email, rowCount);
    console.log(rows[0][0], rows[0][1], rowCount);

    // You can also convert the row into an object or an array.
    assert(rows[0].toObject().name === rows[0].toArray()[0]);

    // Stream raw query results protocol to stdout (why waste cycles on parsing data...)
    await client.query(
        'SELECT name, email FROM users WHERE id = $1', 
        ['adb42e46-d1bc-4b64-88f4-3e754ab52e81'], 
        Client.STRING, // Or Client.BINARY. Controls the format of data that PostgreSQL sends you.
        true, // Cache the parsed query (default is true. If you use the query text only once, set this to false.)
        process.stdout // The result stream. Client calls stream.write(buffer) on this. See RowReader for details.
    );

    // Binary data
    const buf = Buffer.from([0,1,2,3,4,5,255,254,253,252,251,0]);
    const result = await client.query('SELECT $1::bytea', [buf], Client.BINARY, false);
    assert(buf.toString('hex') === result.rows[0][0].toString('hex'), "bytea roundtrip failed");

    // Query execution happens in a pipelined fashion, so when you do a million 
    // random SELECTs, they get sent to the server right away, and the server
    // replies are streamed back to you.
    const promises = [];
    for (let i = 0; i   1000000; i++) {
        const id = Math.floor(Math.random()*1000000).toString();
        promises.push(client.query('SELECT * FROM users WHERE id = $1', [id]));
    }
    const results = await Promise.all(promises);

    // Partial query results
    client.startQuery('SELECT * FROM users', []);
    while (client.inQuery) {
        const resultChunk = await client.getResults(100);
        // To stop receiving chunks, send a sync.
        if (resultChunk.rows.length > 1) {
            await client.sync();
            break;
        }
    }

    // Copy data
    // Let's get the users table into copyResult.
    const copyResult = await client.query('COPY users TO STDOUT (FORMAT binary)');
    console.log(copyResult.rows[0]);

    // Let's make a copy of the users table using the copyResult rows.
    const copyIn = await client.query('COPY users_copy FROM STDIN (FORMAT binary)');
    console.log(copyIn.columnFormats);
    copyResult.rows.forEach(row => client.copyData(row));
    await client.copyDone();

    await client.end(); // Close the connection socket.
}

go();

Test output

On a 13" Macbook Pro 2018 (2.3 GHz Intel Core i5), PostgreSQL 11.3.


$ node test/test.js testdb
46656.29860031104 'single-row-hitting queries per second'
268059 268059 1
268059 268059 1

README tests done

received 1000016 rows
573403.6697247706 'partial query (100 rows per execute) rows per second'
received 10000 rows
454545.45454545453 'partial query (early exit) rows per second'
warming up 30000 / 30000     
38510.91142490372 'random queries per second'
670241.2868632708 '100-row query rows per second'
925069.3802035153 'streamed 100-row query rows per second'
3.0024 'stream writes per query'
1170973.0679156908 'binary query rows per second piped to test.dat'
916600.3666361136 'string query rows per second piped to test_str.dat'
595247.619047619 'query rows per second'
359717.9856115108 'query rows as arrays per second' 10000160
346505.8905058905 'query rows as objects per second' 1000016
808420.3718674212 'binary query rows per second'
558980.4359977641 'binary query rows as arrays per second' 10000160
426264.27962489345 'binary query rows as objects per second' 1000016
Cancel test: PostgreSQL Error: 83 ERROR VERROR C57014 Mcanceling statement due to user request Fpostgres.c L3070 RProcessInterrupts  
Elapsed: 18 ms
Deleted 1000016 rows from users_copy
47021.94357366771 'binary inserts per second'
530794.0552016986 'text copyTo rows per second'
461474.8500230734 'csv copyTo rows per second'
693974.3233865371 'binary copyTo rows per second'
Deleted 30000 rows from users_copy
328089.56692913384 'binary copyFrom rows per second'

done

Testing SSL connection
30959.752321981425 'single-row-hitting queries per second'
268059 268059 1
268059 268059 1

README tests done

received 1000016 rows
454346.2062698773 'partial query (100 rows per execute) rows per second'
received 10000 rows
454545.45454545453 'partial query (early exit) rows per second'
warming up 30000 / 30000     
23094.688221709006 'random queries per second'
577034.0450086555 '100-row query rows per second'
745156.4828614009 'streamed 100-row query rows per second'
3 'stream writes per query'
1019379.2048929663 'binary query rows per second piped to test.dat'
605333.5351089588 'string query rows per second piped to test_str.dat'
508655.13733468973 'query rows per second'
277243.13834211254 'query rows as arrays per second' 10000160
252848.54614412136 'query rows as objects per second' 1000016
722033.21299639 'binary query rows per second'
432907.3593073593 'binary query rows as arrays per second' 10000160
393242.62681871804 'binary query rows as objects per second' 1000016
Cancel test: PostgreSQL Error: 83 ERROR VERROR C57014 Mcanceling statement due to user request Fpostgres.c L3070 RProcessInterrupts  
Elapsed: 41 ms
Deleted 1000016 rows from users_copy
33407.57238307349 'binary inserts per second'
528829.1909042834 'text copyTo rows per second'
501010.0200400802 'csv copyTo rows per second'
801295.6730769231 'binary copyTo rows per second'
Deleted 30000 rows from users_copy
222176.62741612975 'binary copyFrom rows per second'

done

Simple simulated web workloads

Simulating web session workload: Request comes in with a session id, use it to fetch user id and user data string. Update user with a modified version of the data string.

The `max-r` one is just fetching a full a session row based on session id, so it's a pure read workload.


$ node test/test-max-rw.js testdb
    32574 session RWs per second              
done

$ node test/test-max-r.js testdb
    130484 session Rs per second              
done

Yes, the laptop hits Planetary-1: one request per day per person on the planet. On the RW-side, it could serve 2.8 billion requests per day. Note that the test DB fits in RAM, so if you actually wanted to store 1k of data per person, you'd need 10 TB of RAM to hit this performance with 10 billion people.

On a 16-core server, 2xE5-2650v2, 64 GB ECC DDR3 and Optane. (NB the `numCPUs` and connections per CPU have been tuned.)


$ node test/test-max-rw.js testdb
    82215 session RWs per second              
done

$ node test/test-max-r.js testdb
    308969 session Rs per second              
done

On a 16-core workstation, TR 2950X, 32 GB ECC DDR4 and flash SSD.


$ node test/test-max-rw.js testdb
    64717 session RWs per second              
done

$ node test/test-max-r.js testdb
    750755 session Rs per second              
done

Running server on the Optane 16-core machine, doing requests over the network from the other 16-core machine.


$ node test/test-max-rw.js testdb
    101201 session RWs per second              
done

$ node test/test-max-r.js testdb
    496499 session Rs per second               
done