there's a new member of the waow.tech family!
"ken" isn't only Barbie's man, it's Scots for "know"
ken · /kɛn/
Scots / Northern English dialect
Origin: From Middle English kennen (“to make known, see, know”), from Old English cennan (“to make known, declare”), originally a causative of cunnan (“to become acquainted with, to know”). Also reinforced by Old Norse kenna (“to know, perceive”). Cognate with German kennen, Dutch kennen, Swedish känna, Danish kende. Ultimately from Proto-Germanic *kannijaną, the causative of *kunnaną (“be able”) — the same root that gives us the English verb “can.” The word appeared in its noun sense on the English horizon in the 16th century, originally referring to the distance bounding the range of ordinary vision at sea (about 20 miles).
Definitions:
— “D’ye ken what I mean?” (Do you know what I mean?)
— “I dinnae ken.” (I don’t know.)
— “It’s beyond my ken.”
Conjugation: Third-person singular kens, present participle kenning, past tense and past participle kenned or kent.
Notable phrase: The Dictionary of the Scots Language records the expression “kent his faither” — a disparaging, dismissive term for a successful person whose background is known. (Essentially: “Don’t be impressed, I knew his dad.”)
Usage notes: In everyday spoken Scots, ken functions much like “know” or “y’know” as a conversational filler — “It was just, ken, one of those days.” Today the noun form rarely suggests literal sight, but rather the extent of what one can metaphorically “see.”
ok but what does this do
type in a query, get your "related" atproto records back!
you can share a read-only view of results for a given query!
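a rough sketch of what "related" could mean here: embed the query, then rank stored record vectors by similarity. this assumes cosine similarity, which is a common choice for bge-style embeddings; the post doesn't say what ken actually uses, and the names (cosine, top_k, the toy 2-d vectors) are made up for illustration.

```python
import math


def cosine(a, b):
    """cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, index, k=3):
    """index is a list of (record_uri, vector) pairs; returns the k best (uri, score) pairs."""
    scored = [(uri, cosine(query_vec, vec)) for uri, vec in index]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # toy 2-d vectors standing in for real embeddings (hypothetical data)
    index = [("at://example/a", [1.0, 0.0]), ("at://example/b", [0.0, 1.0])]
    print(top_k([1.0, 0.1], index, k=1))
```

a linear scan like this is fine for one person's repo; anything fancier (an ANN index, say) would be a separate design decision.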
methodology
there's only 1 vendor here, fly.io for compute. how does it work?
walks repo, bge-small-en-v1.5 to embed "text-based" records
llama.cpp for inference (zig can build C!)
store vectors as blobs on your PDS to "save an index"
you don't have to save the index if you don't want to. i just do because if i show up again, having created new records, i don't want to wait for it to stream my CAR and re-embed everything. ken will incrementally index what is new if you save the index!
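the incremental part could look something like this. it's a minimal sketch, not ken's actual code: it assumes each record is identified by its uri and cid (so a changed cid means "re-embed"), that vectors are packed as little-endian float32 bytes, and that embed is whatever function turns text into a vector. all of those names and the blob layout are assumptions.

```python
import struct


def pack_vector(vec):
    """serialize a list of floats as little-endian float32 bytes."""
    return struct.pack(f"<{len(vec)}f", *vec)


def unpack_vector(blob):
    """inverse of pack_vector: 4 bytes per float32."""
    count = len(blob) // 4
    return list(struct.unpack(f"<{count}f", blob))


def update_index(saved, records, embed):
    """
    saved:   {uri: (cid, packed_vector)} from a previous run (may be empty)
    records: {uri: (cid, text)} as seen in the repo right now
    embed:   callable text -> list of floats (hypothetical stand-in for the model)

    returns (new_index, number_of_records_embedded_this_run).
    records that disappeared from the repo are dropped from the new index.
    """
    fresh = {}
    embedded = 0
    for uri, (cid, text) in records.items():
        old = saved.get(uri)
        if old is not None and old[0] == cid:
            fresh[uri] = old  # unchanged record: reuse the stored vector
        else:
            fresh[uri] = (cid, pack_vector(embed(text)))
            embedded += 1
    return fresh, embedded
```

on a second visit only the new or edited records hit the embedding model; everything else is a dictionary lookup.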
limitations and expansions
only text embeddings (for now! https://find-bufo.com/)
i limit how many records i index from realllly big repos
i should probably turn the current walk-then-materialize-then-embed pipeline into a streaming walk-to-embed pipeline to minimize peak resource utilization
i am not yet sure on the best architecture for this app in general, i just thought storing the pack on my PDS so i didn't have to re-embed was kinda fun
i will likely find a way to give this to so it can find its own records on demand (it uses records on its PDS often for "global state things") but beyond that i have no immediate plans.
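for the streaming walk-to-embed idea mentioned in the list above, here's one way it might be shaped: treat the repo walk as a generator and embed in small batches, so only one batch of text is held in memory at a time. this is a sketch under assumptions; stream_embed, batched, and embed_batch are hypothetical names, and the real walk and model calls would replace the stand-ins.

```python
from itertools import islice


def batched(iterable, size):
    """yield lists of up to `size` items without materializing the whole input."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def stream_embed(records, embed_batch, batch_size=32):
    """
    records:     any iterable of (uri, text), e.g. a generator walking the repo
    embed_batch: callable list[str] -> list[vector] (hypothetical model wrapper)

    yields (uri, vector) pairs as each batch finishes, so peak memory is
    bounded by batch_size rather than by the size of the repo.
    """
    for batch in batched(records, batch_size):
        texts = [text for _, text in batch]
        vectors = embed_batch(texts)
        for (uri, _), vector in zip(batch, vectors):
            yield uri, vector
```

the tradeoff is that the caller has to consume results as they arrive (write them out, append to the pack) instead of getting one finished list at the end.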
here's the code:
have a great weekend :)
P.S.
this is partially inspired by talking to and about