Alex Shapiro
Jan 8 2026
Things I Tried Today
Worked
- Shared initial layers and a shared loss in an actor-critic model
- Additional simulation head in the AC net (big improvement in training speed in ablation study)
- Resetting weights after catastrophic forgetting
- Parquet for storing traces (20x more efficient than pickle)
- Tracking weight magnitude CDF (useful for debugging)
- Tracking overall + component losses over time
- Normalizing value net loss
- Using weight decay to reduce maximum weight magnitudes
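A minimal sketch of the weight-magnitude CDF tracking, assuming the weights are available as NumPy arrays. The function name, the chosen quantiles, and the toy parameters are hypothetical, not the setup actually used in these experiments.

```python
import numpy as np

def weight_magnitude_cdf(weights, quantiles=(0.5, 0.9, 0.99, 1.0)):
    """Summarize the empirical CDF of |w| at a few quantiles.

    weights: iterable of arrays (one per layer or parameter tensor).
    Returns {quantile: magnitude}, so the 1.0 entry is the largest |w|.
    """
    magnitudes = np.abs(np.concatenate([np.ravel(w) for w in weights]))
    return {q: float(np.quantile(magnitudes, q)) for q in quantiles}

# Hypothetical parameters for a tiny two-layer net.
rng = np.random.default_rng(0)
params = [rng.normal(size=(4, 8)), rng.normal(size=(8, 2))]

# Log this once per checkpoint and compare the curves over time.
summary = weight_magnitude_cdf(params)
```

Logging the same quantiles at every checkpoint makes it easy to see whether the tail (the 0.99 and 1.0 entries) is growing, which is where weight decay would be expected to show up.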
Did Not Work
- Sine Layer: replace a fully connected linear layer with a fully connected sine layer, where each output is defined by
sum_i[w_a,i * sin(x_i * w_b,i)], i.e. two weights (w_a,i and w_b,i) per input x_i
- Absolute Value Layer: replace a fully connected linear layer with a fully connected abs layer, where each output is defined by
sum_i[w_i * abs(x_i)]
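For reference, the two layer definitions above can be sketched as NumPy forward passes. The function names, the (n_out, n_in) weight shapes, and the example values are assumptions. The notes only give the per-output sum, and no bias term is included because none is mentioned.

```python
import numpy as np

def sine_layer(x, w_a, w_b):
    """Each output j is sum_i w_a[j, i] * sin(x[i] * w_b[j, i]).

    x: (..., n_in); w_a and w_b: (n_out, n_in). Returns (..., n_out).
    """
    x = np.asarray(x, dtype=float)
    # Insert an output axis so x broadcasts against the per-output weights.
    return np.sum(w_a * np.sin(x[..., None, :] * w_b), axis=-1)

def abs_layer(x, w):
    """Each output j is sum_i w[j, i] * abs(x[i]).

    x: (..., n_in); w: (n_out, n_in). Returns (..., n_out).
    """
    x = np.asarray(x, dtype=float)
    return np.abs(x) @ w.T

# Small hand-checkable examples (values are illustrative only).
w_a = np.array([[2.0, 3.0]])
w_b = np.ones((1, 2))
sine_out = sine_layer([np.pi / 2, 0.0], w_a, w_b)  # 2*sin(pi/2) + 3*sin(0)

w = np.array([[1.0, 1.0], [0.5, -1.0]])
abs_out = abs_layer([-1.0, 2.0], w)  # [1*1 + 1*2, 0.5*1 - 1*2]
```

Both functions accept a single input vector or a batch with a leading dimension.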
Meh
- Tracking individual parameter salience via integrated gradients. It wasn't very actionable. Salience may be more useful when tracked at a higher level, e.g. tracking the % of neurons in a layer that are salient. That might give some insight into the need for larger or smaller layers.
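A rough sketch of the layer-level aggregation suggested above, assuming per-parameter salience scores (e.g. from integrated gradients) are already computed and stored as an (n_neurons, n_inputs) array. The function name, the sum-of-magnitudes aggregation, and the threshold are hypothetical choices, not something tested in these notes.

```python
import numpy as np

def salient_neuron_fraction(param_salience, threshold):
    """Fraction of neurons in a layer whose total salience exceeds threshold.

    param_salience: (n_neurons, n_inputs) array of per-weight salience scores.
    A neuron's salience is taken as the sum of |salience| over its incoming
    weights (an assumed aggregation; other choices are possible).
    """
    neuron_salience = np.abs(param_salience).sum(axis=1)
    return float(np.mean(neuron_salience > threshold))

# Illustrative scores for a layer with 4 neurons and 3 inputs each.
layer_salience = np.array([
    [0.50, 0.40, 0.30],
    [0.01, 0.00, 0.02],
    [0.20, 0.10, 0.60],
    [0.00, 0.01, 0.00],
])
fraction = salient_neuron_fraction(layer_salience, threshold=0.1)
```

Tracked per layer over training, a persistently low fraction might hint that a layer is larger than it needs to be, and a fraction near 1.0 might hint the opposite.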