[ot][spam][crazy][data] transformer model 'attention' improvement

Undiscussed Horrific Abuse, One Victim & Survivor of Many gmkarl at gmail.com
Wed Feb 2 02:54:25 PST 2022


- gptj pregenerates a constant causal mask that takes O(n^2) memory in the
sequence length. Since each mask entry is simply a constant function of the
query and key indices, it could instead be produced on the fly, via a
callback or inside the attention loop, avoiding the quadratic buffer.
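A minimal sketch of the idea (not gptj's actual code; the function names here are illustrative). The precomputed mask stores n*n booleans, while the callback computes the same value from the two indices in O(1):

```python
import numpy as np

def precomputed_causal_mask(n):
    # O(n^2) memory: full lower-triangular boolean matrix,
    # True where a query position may attend to a key position
    return np.tril(np.ones((n, n), dtype=bool))

def causal_allowed(q_idx, k_idx):
    # O(1): the mask entry is a pure function of the indices,
    # so no stored matrix is needed
    return k_idx <= q_idx

# the callback reproduces every entry of the stored matrix
n = 8
full = precomputed_causal_mask(n)
assert all(full[q, k] == causal_allowed(q, k)
           for q in range(n) for k in range(n))
```

An attention implementation could call `causal_allowed` (or apply the equivalent comparison vectorized per row) while computing scores, rather than materializing the full mask up front.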


More information about the cypherpunks mailing list