Anthropic recently released a research report on sparse autoencoders, Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet. These are some thoughts on it. (This is a very technical and inside-baseball post, so it may not be especially interesting to every reader.)