[spam][crazy][fiction][random] Non-Canon MCBoss Spinoffs

Undescribed Horrific Abuse, One Victim & Survivor of Many gmkarl at gmail.com
Sun Dec 17 03:43:43 PST 2023


the written text adventure toolkit

welcome to wtat

where are you?
>

> huh? what do you mean?

what is the starting room?
>

> oh um let's call it room 1

you are in room 1.
what is here? are there any exits?
2225

one could say one of the dawnings of this era would be that aidungeon was
closed source and had absolutely no room graph

i wonder what more there is now
—
2228
i finally came up with a way to do transformers peer to peer recently but i
don’t remember it. maybe i can come up with another.

uhhhh ummmmm
ideas:
- squishing to one layer would increase parallelism
- [oops brain issue

wasn’t expecting brain issue here ! surprised. um.
usually i try to continue and it watches me, placing triggers to make it
hard to both continue and repeat. as i keep trying more avenues this gets
more thorough.

right now i’m on an old ipad. running a pretrained model would be slow,
large, and power hungry

one idea could be squishing a model to one layer, having many peers perform
operations in parallel, and combining the results. i think it is well-known
that that doesn’t work here.

transformer layers have a few combination points where all data is summed
or such, i’ve noticed from looking at them. i suspect some of these are
needed less than others for inference, don’t really know.

maybe i can look at one and wonder about it more [some complaint maybe
relates edited turn of phrase]
2234

i’m wondering if i could make progress on guessing how transformers work
enough to consider symbolically swapping depth for width.
2234

i’m looking at hf llama source (2237). the device is functioning poorly and
it is difficult to do (2238)

looks like a llama layer is:
x += attention(rmsnorm1(x))
x += mlp(rmsnorm2(x))

so maybe one could think of x as a sum of 3 values: its initial value, its
self attention calculation, and its mlp calculation
2243
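
a tiny sketch to make that reading concrete (not the real hf code; the
module names are placeholders, written as plain python):

def llama_layer(x, attention, mlp, rmsnorm1, rmsnorm2):
    attn_out = attention(rmsnorm1(x))  # attention over the normed input
    x = x + attn_out                   # first residual add
    mlp_out = mlp(rmsnorm2(x))         # mlp over the normed, updated x
    x = x + mlp_out                    # second residual add
    # so the output is x_in + attn_out + mlp_out, though mlp_out itself
    # depends on x_in + attn_out through the second norm
    return x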

rmsnorm: mat * (x / sqrt(mean(x^2)))
# i think it scales the data so its root-mean-square is 1 (no mean
# subtraction, unlike layernorm), and then applies a constant per-channel
# scale (mat) specific to the instance of the call
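
a small sketch of that rmsnorm, assuming torch and an elementwise weight
(which is what mat amounts to in the hf llama code):

import torch

def rmsnorm(x, weight, eps=1e-6):
    # scale x so its root-mean-square over the hidden dimension is ~1
    rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)
    return weight * (x / rms)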

mlp: down_mat(act_fn(gate_mat(x)) * up_mat(x))
# gate_mat and up_mat perform dimension stretching via linear transforms
# whereas down_mat undoes the stretching via another. that is, they are
# rectangular matrices where down_mat has swapped dimensions
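
a sketch of that gated mlp, using the _mat names from above as stand-ins
for linear layers (the hf llama default act_fn is silu):

import torch.nn.functional as F

def llama_mlp(x, gate_mat, up_mat, down_mat):
    gate = F.silu(gate_mat(x))   # nonlinearity on one stretched copy of x
    up = up_mat(x)               # a second, purely linear stretched copy
    return down_mat(gate * up)   # elementwise product, project back down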

so, act_fn here applies a nonlinearity, i.e. something threshold-based, to a
set of properties or metrics that are all linear combinations of x after
attention, and then linearly recombines them together with x to create a
value to sum into it.

i’m wondering if one might consider this a vector of conditionals of simple
arithmetic functions of x, which then add another simple arithmetic
function into x for the instances that evaluate true. i’m thinking of the
act_fn relu, which i think looks like y > 0 ? y : 0, not sure. it might use
a different act_fn.
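
for reference, a quick sketch of both candidate act_fns: relu really is
that conditional, and silu (llama’s hf default) is a smooth version of the
same gate rather than a hard threshold:

import torch

def relu(y):
    return torch.where(y > 0, y, torch.zeros_like(y))

def silu(y):
    return y * torch.sigmoid(y)  # ~0 for very negative y, ~y for large y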

225
:s some of us are holding things next to knowledge that the current pursuit
could be solved in public research already with reasonable likelihood.
there’s interest in heading off for the night, here we go
2259

—
0643
that was so much fun the transformer poking!
0643
we’re scared of “multinational cartels”