[ot][spam][crazy][data] transformer model 'attention' improvement

Undiscussed Horrific Abuse, One Victim & Survivor of Many gmkarl at gmail.com
Wed Feb 2 02:54:25 PST 2022


- gptj pregenerates a constant causal mask that takes O(n^2) memory in the
sequence length. Since each mask entry is simply a constant function of the
query and key indices, it could instead be produced on the fly, via a
callback or inside the attention loop, avoiding the quadratic buffer.
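A minimal sketch of the idea (not gptj's actual code; the function names here are illustrative). The precomputed mask stores n*n booleans, while the callback computes the same value from the two indices in O(1):

```python
import numpy as np

def precomputed_causal_mask(n):
    # O(n^2) memory: full lower-triangular boolean matrix,
    # True where a query position may attend to a key position
    return np.tril(np.ones((n, n), dtype=bool))

def causal_allowed(q_idx, k_idx):
    # O(1): the mask entry is a pure function of the indices,
    # so no stored matrix is needed
    return k_idx <= q_idx

# the callback reproduces every entry of the stored matrix
n = 8
full = precomputed_causal_mask(n)
assert all(full[q, k] == causal_allowed(q, k)
           for q in range(n) for k in range(n))
```

An attention implementation could call `causal_allowed` (or apply the equivalent comparison vectorized per row) while computing scores, rather than materializing the full mask up front.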


More information about the cypherpunks mailing list