coral
real-time entity visualization powered by jetstream
https://coral.waow.tech/

coral ^ is my fun "world zeitgeist" tracker (according to bsky) that notably does not depend on insider trading or facilitate gambling on the lives of humans (🫵 polymarket/kalshi)

while building this several weeks ago, i was following along with an article on how bsky pbc's Trending Topics were implemented, which is basically:

  • stream bsky posts from the firehose

  • do NER on the posts (entity recognition)

  • track baselines of entities

  • use LLM to de-dupe / eval "trending above baseline" entities
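
the baseline part of that recipe is simple enough to sketch. here's a toy stdlib-python version (the class name, window size, and spike threshold are all mine, not bsky's):

```python
# toy baseline tracker: count entity mentions per time bucket and flag
# anything running well above its trailing average.
from collections import defaultdict, deque


class BaselineTracker:
    def __init__(self, window: int = 12, spike_ratio: float = 3.0):
        self.window = window            # how many past buckets form the baseline
        self.spike_ratio = spike_ratio  # "trending" = current > ratio * baseline
        self.history = defaultdict(lambda: deque(maxlen=window))
        self.current = defaultdict(int)

    def observe(self, entity: str) -> None:
        # called once per entity the NER step emits
        self.current[entity] += 1

    def rollover(self) -> list[str]:
        """close the current bucket; return entities trending above baseline."""
        trending = []
        for entity, count in self.current.items():
            past = self.history[entity]
            baseline = sum(past) / len(past) if past else 0.0
            if past and count > self.spike_ratio * baseline:
                trending.append(entity)
            past.append(count)
        self.current = defaultdict(int)
        return trending
```

an entity whose count jumps well past its trailing average gets flagged, and the LLM step then dedupes/names whatever survives.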

i started with just the NER part, no LLM aggregation, but then later added a lil periodic background process where claude haiku would:

  • group some entities into topics over a recent time-bucket

  • literally write a haiku about world zeitgeist

and it generally has worked great! but...

python is wasteful af

to do all the NER'ding of entities, spaCy pulls in numpy, thinc (ML backend), blis (BLAS bindings), and many transitive dependencies. the en_core_web_sm model itself is a ~12 MB wheel. running NER on every firehose post (apparently!) required a perf-class CPU on fly.io and 2 GB RAM to keep up... and it still only managed ~40 msg/s bc of all the memory overhead inherent to a python runtime.

fast-forward to today.... the python service now only runs the LLM curator (claude haiku curating entity clusters into named groups every 5 minutes). that's one HTTP fetch + one API call per cycle, so it runs on the smallest fly.io VM... and entity throughput is higher!
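
the curator cycle is small enough to sketch. everything concrete here (the endpoint path, the prompt wording) is invented; only the shape (one HTTP fetch, one model call) matches what coral actually does:

```python
# hedged sketch of one curator cycle: fetch recent entity buckets from
# the zig backend, build one prompt, make one LLM call.
import json
import urllib.request


def fetch_recent_entities(base_url: str) -> list[dict]:
    # the one HTTP fetch per cycle (path is hypothetical)
    with urllib.request.urlopen(f"{base_url}/entities/recent") as resp:
        return json.load(resp)


def build_curator_prompt(entities: list[dict]) -> str:
    # flatten the entity buckets into a compact prompt
    lines = [f"{e['name']} x{e['count']}" for e in entities]
    return (
        "group these entities into named topics, then write a haiku "
        "about the current world zeitgeist:\n" + "\n".join(lines)
    )


def run_cycle(base_url: str, llm_call) -> str:
    # the one API call per cycle; llm_call would wrap claude haiku
    # via the anthropic sdk's messages.create
    entities = fetch_recent_entities(base_url)
    return llm_call(build_curator_prompt(entities))
```

wire llm_call to the anthropic sdk, run it every 5 minutes, and that's roughly the whole python service now.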

how?

zzstoatzz.io/spacez
A repository on Tangled
https://tangled.org/zzstoatzz.io/spacez/

this ^ implements the NER part of spaCy, but all in zig.

yes, i have a bit of a zig problem as of late (as in, obsession)

from spaCy's full NLP framework (tokenizer, POS tagger, dependency parser, lemmatizer, text classifier, co-reference resolver, entity linker, matchers, knowledge bases), coral used exactly one component: the EntityRecognizer. before, bridge.py quite literally disabled every other pipeline component on startup, dead weight that still had to be installed and loaded.

spacez re-implements only the two things needed for NER inference with en_core_web_sm weights:

  • the tokenizer (prefix/suffix/infix rules matching spaCy's training tokenization)

  • the NER model (HashEmbed → CNN → transition-based parser)
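
the tokenizer bullet, as a toy: peel prefix punctuation, peel suffix punctuation, split what's left on infixes. the real en_core_web_sm rules are regex tables plus special cases, so this is shape only (the rule sets below are invented):

```python
# toy spaCy-style tokenizer loop over whitespace chunks
import re

PREFIXES = ("(", "[", '"', "'")               # invented, tiny rule set
SUFFIXES = (")", "]", '"', "'", ",", ".", "!", "?")
INFIX = re.compile(r"[-~]")


def tokenize(text: str) -> list[str]:
    tokens: list[str] = []
    for chunk in text.split():
        prefix_toks: list[str] = []
        suffix_toks: list[str] = []
        # peel prefix characters one at a time
        while chunk and chunk.startswith(PREFIXES):
            prefix_toks.append(chunk[0])
            chunk = chunk[1:]
        # peel suffix characters one at a time (collected outside-in)
        while chunk and chunk.endswith(SUFFIXES):
            suffix_toks.append(chunk[-1])
            chunk = chunk[:-1]
        tokens.extend(prefix_toks)
        if chunk:
            # split the core on infix matches, keeping the separators
            pieces = INFIX.split(chunk)
            seps = INFIX.findall(chunk)
            for i, piece in enumerate(pieces):
                if piece:
                    tokens.append(piece)
                if i < len(seps):
                    tokens.append(seps[i])
        tokens.extend(reversed(suffix_toks))
    return tokens
```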

the weights are the same — spacez loads the actual en_core_web_sm parameters, just from a single binary blob (~6 MB) embedded at compile time instead of a python wheel.
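
the transition-based end of that model bottoms out in something like BILOU decoding: the network scores per-token actions, and greedily applying them yields entity spans. a sketch with a hand-fed action sequence (the labels are invented), since in the real model the actions come from the HashEmbed → CNN encoder plus learned transition scores:

```python
# greedy BILOU-style span assembly: B=begin, I=inside, L=last, O=outside, U=unit
def decode_bilou(tokens: list[str], actions: list[str]) -> list[tuple[str, str]]:
    entities: list[tuple[str, str]] = []
    start = None
    label = None
    for i, (tok, act) in enumerate(zip(tokens, actions)):
        if act.startswith("U-"):
            # unit: single-token entity
            entities.append((tok, act[2:]))
            start = None
        elif act.startswith("B-"):
            # begin a multi-token entity
            start, label = i, act[2:]
        elif act.startswith("L-") and start is not None:
            # last token: close the open entity
            entities.append((" ".join(tokens[start:i + 1]), label))
            start = None
        elif act == "O":
            # outside any entity
            start = None
        # "I-" actions just continue the open entity
    return entities
```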

i am not an expert in NLP, so practically speaking i outsource a meaningful portion of my understanding of these tools' details to claude, which could be (and someday will be) a whole post (tldr: there are higher-level objectives i build coral in service of, like having a trending topics API for other systems to use).

what spacez replaced in coral

before spacez, i was reading the firehose in python, running NER in python, and then POSTing the entities to the zig backend, just because spaCy was directly available to me in python.

but replacing the python bridge with spacez running inside the zig backend collapsed that entire pipeline into a single in-process call!

numbers

  • throughput: ~40 msg/s → ~70 msg/s (bc of less mem overhead)

  • VM spec (fly.io): perf 1cpu / 2 GB → shared-cpu-1x / 256 MB

  • estimated cost: ~$40/mo → ~$5/mo

  • bridge code: 802 lines python → 273 lines zig (in-process)

  • serialization hops:

    • from: 3 (json.loads → spaCy → HTTP POST → json.parse)

    • to: 0 (pointer into websocket buffer → NER → graph)
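
the same contrast in toy python (the ner function is fake, and the jetstream commit.record.text shape is an assumption here):

```python
import json


def old_bridge_hop(raw: bytes, ner) -> bytes:
    # hop 1: decode the firehose json in python
    event = json.loads(raw)
    entities = ner(event["commit"]["record"]["text"])
    # hop 2: re-encode the result to POST to the zig backend
    # (hop 3 was the zig side's json.parse of this body)
    return json.dumps({"entities": entities}).encode()


def in_process_hop(text: str, ner, graph: dict) -> None:
    # in zig, `text` is a slice pointing into the websocket buffer;
    # entities go straight into the graph, no (de)serialization between
    for entity in ner(text):
        graph[entity] = graph.get(entity, 0) + 1
```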

in review

the zig backend now handles: jetstream consumption, NER, entity graph, websocket broadcasting, and HTTP API in a smol docker image. the python service dropped from "the most expensive piece of infrastructure" to a trivially cheap cron job that calls claude.


this is a companion piece to:

increasing my compute spend - n8
via relay and jetstream instances
https://nate.leaflet.pub/3mfg72xdkec2r