Jan 8 2026

Alex Shapiro

Things I Tried Today

Worked

Shared initial layers and a shared loss in an actor-critic model
Additional simulation head in the AC net (big improvement in training speed in an ablation study)
Resetting weights after catastrophic forgetting
Parquet for storing traces (20x more efficient than pickle)
Tracking weight magnitude CDF (useful for debugging)
Tracking overall + component losses over time
Normalizing value net loss
Using weight decay to reduce maximum weight magnitudes
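
For the weight magnitude CDF item, here is a minimal sketch of one way to do it, not the exact code I ran. It assumes the weights have already been flattened into a plain list of floats. The function name and the quantile choices are made up for illustration.

```python
import math
from typing import Dict, Sequence


def weight_magnitude_cdf(
    weights: Sequence[float],
    quantiles: Sequence[float] = (0.5, 0.9, 0.99, 1.0),
) -> Dict[float, float]:
    """Return the |weight| value at each requested quantile of the empirical CDF.

    Hypothetical helper: `weights` is assumed to be a flat, non-empty list of
    parameter values, and the default quantiles are arbitrary.
    """
    if not weights:
        raise ValueError("weights must be non-empty")
    mags = sorted(abs(w) for w in weights)
    n = len(mags)
    cdf = {}
    for q in quantiles:
        # Smallest magnitude m such that at least a fraction q of weights have |w| <= m.
        idx = min(n - 1, max(0, math.ceil(q * n) - 1))
        cdf[q] = mags[idx]
    return cdf
```

Logging these few numbers every N steps is enough to see whether the tail is growing: the 1.0 quantile is the maximum weight magnitude, which is the quantity the weight decay item is meant to pull down.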

Did Not Work

Sine Layer: replace a fully connected linear layer with a fully connected sine layer, where each output is defined by sum[wi_a * sin(xi * wi_b)]
Absolute Value Layer: replace a fully connected linear layer with a fully connected abs layer, where each output is defined by sum[wi * abs(xi)]
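
To make the two layer definitions concrete, here is a rough forward-pass sketch in plain Python. It assumes each output has its own weight row (so two weight matrices for the sine layer and one for the abs layer) and no bias term. The notes above don't pin either of those down, and the function and argument names are hypothetical.

```python
import math
from typing import List, Sequence


def sine_layer(
    x: Sequence[float],
    w_a: Sequence[Sequence[float]],
    w_b: Sequence[Sequence[float]],
) -> List[float]:
    """Output j = sum_i w_a[j][i] * sin(x[i] * w_b[j][i]).

    Assumption: w_a and w_b both have shape (n_outputs, n_inputs), no bias.
    """
    return [
        sum(a * math.sin(xi * b) for xi, a, b in zip(x, row_a, row_b))
        for row_a, row_b in zip(w_a, w_b)
    ]


def abs_layer(x: Sequence[float], w: Sequence[Sequence[float]]) -> List[float]:
    """Output j = sum_i w[j][i] * abs(x[i]).

    Assumption: w has shape (n_outputs, n_inputs), no bias.
    """
    return [sum(wi * abs(xi) for xi, wi in zip(x, row)) for row in w]
```

One thing the sketch makes visible: the abs layer throws away the sign of every input, and the sine layer has twice the parameters of the linear layer it replaces.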

Meh

Tracking individual parameter salience via integrated gradients. It wasn't very actionable. Salience may be more useful when tracked at a higher level, e.g. tracking the % of neurons in a layer that are salient. That might give some insight into the need for larger or smaller layers.
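
A sketch of what the layer-level version might look like. I haven't tried this. It assumes a per-neuron salience score already exists (for example, integrated-gradients attributions aggregated per neuron), and it calls a neuron "salient" when its score is at least some fraction of the layer's largest score. Both the helper name and the 10% default cutoff are hypothetical.

```python
from typing import Sequence


def salient_fraction(neuron_salience: Sequence[float], rel_threshold: float = 0.1) -> float:
    """Fraction of neurons whose |salience| is at least rel_threshold * layer max.

    Hypothetical metric: the relative cutoff is an arbitrary choice, not
    something validated in these experiments.
    """
    if not neuron_salience:
        raise ValueError("neuron_salience must be non-empty")
    mags = [abs(s) for s in neuron_salience]
    peak = max(mags)
    if peak == 0.0:
        # No neuron carries any attribution at all.
        return 0.0
    cutoff = rel_threshold * peak
    return sum(1 for m in mags if m >= cutoff) / len(mags)
```

A fraction that stays low over training could suggest the layer is larger than it needs to be, and one near 1.0 could suggest the opposite, though that reading is a guess.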