Research notes on machine learning, interpretability and the things we find inside the models.
Long-form write-ups from open-ended experiments — mechanistic interpretability, fine-tuning recipes, unexpected failure modes. Each post tries to leave a small, transferable lesson behind.
How a 162-second retrain recovers SAM3's open-vocab refusal
Post 1 left SAM3 with a catastrophic-forgetting failure: open-vocab refusal collapsed from 95.8% to 3.2% on SA-Co/Gold. This post tests three data-side recipes (replay, replay with negatives, post-hoc recovery), explains mechanically why each one stops where it does, and shows that retraining a single 132k-parameter MLP recovers more refusal than any of them in 162 seconds of training. Post 2 of the SAM3 series.
How a vision transformer learns a new task — what we found inside SAM3
A mechanistic-interpretability tour of fine-tuning SAM3 on 37 watch-component concepts. The weights move in a low-rank subspace, the task crystallises at a single 256-dim mid-stack tensor, the text encoder turns out to be optional — and the same checkpoint fails catastrophically out of domain. Post 1 of the SAM3 series.