[ot][notes][spam][personal] hobbyist transformer model algorithms
Undiscussed Horrific Abuse, One Victim of Many
gmkarl at gmail.com
Wed Apr 6 02:10:45 PDT 2022
things i've found without education:
Training a Model to Make Choices
- you can backpropagate loss through a discrete decision by -> weighting each
possible outcome with the likelihood of choosing it, and summing them <-
the loss then propagates to each likelihood's impact on the final sum
you can even do it with random minibatches: draw a small sample from the
outcome space and reweight it so the estimated sum stays unbiased.
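the weighted-sum idea above can be sketched with autodiff. a minimal sketch, assuming three possible outcomes whose losses are fixed and known (the `outcome_losses` values here are made up for illustration):

```python
import jax
import jax.numpy as jnp

# hypothetical per-outcome losses; in practice each would come from
# running the model on that choice
outcome_losses = jnp.array([2.0, 0.5, 1.0])

def expected_loss(logits):
    # weight each outcome's loss by the likelihood of choosing it, then sum;
    # the sum is differentiable even though the decision itself is discrete
    probs = jax.nn.softmax(logits)
    return jnp.sum(probs * outcome_losses)

logits = jnp.zeros(3)  # start from a uniform choice
grads = jax.grad(expected_loss)(logits)
# gradient descent on the logits shifts probability toward the
# lower-loss outcomes and away from the higher-loss ones
```

each gradient component works out to p_k * (loss_k - expected loss), so outcomes better than average get pushed up and outcomes worse than average get pushed down.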
guessing that rl ppo does something analogous
this briefly worked a little for me as a way to automatically tune prompts
might need some further review or rephrasing (and/or education) to refine it
and reduce inhibition around it
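the minibatch variant and the prompt-tuning use could look roughly like this. a toy sketch only: `candidate_losses` is a made-up stand-in for the loss a language model would give each candidate prompt token, and the 8-token vocabulary, batch size, and learning rate are arbitrary choices:

```python
import jax
import jax.numpy as jnp

key = jax.random.PRNGKey(0)
# pretend loss for each of 8 candidate prompt tokens; a real setup
# would evaluate a model with each candidate substituted in
candidate_losses = jax.random.uniform(key, (8,))

def minibatch_expected_loss(logits, batch_idx):
    # unbiased estimate of sum_k p_k * loss_k from a small uniform
    # sample of outcomes, rescaled by (vocab size / batch size)
    probs = jax.nn.softmax(logits)
    k = logits.shape[0]
    return (k / batch_idx.shape[0]) * jnp.sum(
        probs[batch_idx] * candidate_losses[batch_idx])

logits = jnp.zeros(8)
for step in range(200):
    key, sub = jax.random.split(key)
    batch = jax.random.choice(sub, 8, (3,), replace=False)
    g = jax.grad(minibatch_expected_loss)(logits, batch)
    logits = logits - 1.0 * g  # plain gradient descent on the choice weights

best = int(jnp.argmax(logits))
# probability mass should concentrate on a low-loss candidate token
```

the minibatch gradient is noisy but matches the full sum in expectation, which is why small samples from the outcome space still work.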