[ot][spam][crazy][data] transformer model 'attention' improvement
k
gmkarl at gmail.com
Sat Jan 22 01:40:10 PST 2022
- their example gpu code is built around an attention() function on
line 42 that takes the query, key, and value as function parameters,
along with a chunk size.
- this engages the concept of 'heads'. i _think_ a 'head' here reads
like a chunk of the input data, but i'm not sure; in transformers a
'head' is usually one of several parallel attention computations, each
with its own learned projections of the same input.
- their attention() function breaks the query into chunks of the
passed size; each chunk attends over all keys and all values, and is
passed to _query_chunk_attention() ...
More information about the cypherpunks mailing list