[crazy][hobby][spam] Automated Reverse Engineering
k
gmkarl at gmail.com
Sun Jan 16 04:39:33 PST 2022
[after a number of psychotic breaks] The training loop runs now, though
it is likely not running very effectively. For the notebook to run at
the moment, an uncommitted change is needed:
 # compute loss
 loss = optax.softmax_cross_entropy(logits,
     flax.training.common_utils.onehot(labels, logits.shape[-1]))
-padding_mask = decoder_attention_mask
+padding_mask = batch['decoder_attention_mask']
 loss = (loss * padding_mask).sum() / padding_mask.sum()
More information about the cypherpunks mailing list