Friday, May 31, 2024

Dwarkesh Patel with Sholto Douglas and Trenton Bricken


AI scaling

grokking

monosemanticity

sparse penalty
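Monosemanticity work is commonly framed as dictionary learning with a sparse autoencoder. Under that framing, the sparse penalty is an L1 term on the feature activations that is added to the reconstruction error. The sketch below assumes that common formulation. The names `sae_loss` and `l1_coeff` and the toy weights are hypothetical and chosen only for illustration.

```python
def relu(v):
    return [max(0.0, x) for x in v]

def matvec(W, v):
    # Multiply matrix W (a list of rows) by vector v
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def sae_loss(x, W_enc, b_enc, W_dec, l1_coeff):
    """Hypothetical sketch: reconstruction error plus an L1 sparse penalty."""
    # Encode into an overcomplete feature vector f = ReLU(W_enc x + b_enc)
    f = relu([a + b for a, b in zip(matvec(W_enc, x), b_enc)])
    # Decode back to the original activation space: x_hat = W_dec f
    x_hat = matvec(W_dec, f)
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    # Sparse penalty: the L1 norm of f encourages most features to be zero
    sparsity = l1_coeff * sum(abs(a) for a in f)
    return reconstruction + sparsity, f

# Toy example: 2-dimensional activations, 3 dictionary features (made-up weights)
x = [1.0, 0.0]
W_enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b_enc = [0.0, 0.0, -1.0]
W_dec = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
loss, features = sae_loss(x, W_enc, b_enc, W_dec, l1_coeff=0.1)
print(features)  # [1.0, 0.0, 0.0]
```

Only one of the three features fires here. The L1 coefficient sets the trade-off: raising it buys sparser and, ideally, more interpretable features at the cost of worse reconstruction.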

Distilled models

"one-hot vector that says, 'this is the token that you should have predicted.'"
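Distillation and the one-hot target can be contrasted directly. With a one-hot target, cross-entropy reduces to the negative log-probability of the single observed token. A distillation target is the teacher's full next-token distribution, so the student also learns how probable the teacher considers every other token. The sketch below is a minimal illustration assuming a Hinton-style softened (temperature `T`) teacher distribution. The logits, vocabulary size and function names are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def one_hot(index, size):
    # Hard training target: all probability mass on the observed next token
    return [1.0 if i == index else 0.0 for i in range(size)]

def cross_entropy(target_probs, student_logits, temperature=1.0):
    q = softmax(student_logits, temperature)
    # Tokens with zero target probability contribute nothing to the loss
    return -sum(t * math.log(p) for t, p in zip(target_probs, q) if t > 0)

# Hypothetical 4-token vocabulary; the observed next token has index 2
student_logits = [0.5, 0.2, 1.0, -0.3]
teacher_logits = [1.0, 0.5, 2.0, -1.0]

hard_loss = cross_entropy(one_hot(2, 4), student_logits)
# Distillation target: the teacher's whole distribution, softened by T
T = 2.0
soft_loss = cross_entropy(softmax(teacher_logits, T), student_logits, T)
```

The one-hot loss ignores every entry except index 2. The soft loss gives the student a graded signal across all four tokens.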


chain-of-thought as adaptive compute.
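A transformer spends roughly the same compute on every token it generates. Chain-of-thought therefore lets a model spend more compute on a hard question simply by emitting more tokens before it answers. The back-of-the-envelope sketch below assumes the common rule of thumb of about 2 FLOPs per parameter per token for a forward pass, and it ignores attention cost over the context. The 7B model size is hypothetical.

```python
def forward_flops(n_params, n_tokens):
    # Rule of thumb: ~2 FLOPs per parameter per generated token
    # (one multiply and one add per weight); attention over the context is ignored
    return 2 * n_params * n_tokens

n_params = 7_000_000_000                 # hypothetical model size
direct = forward_flops(n_params, 1)      # answer in a single token
with_cot = forward_flops(n_params, 101)  # 100 reasoning tokens, then the answer
print(with_cot // direct)  # 101
```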

key-value weights
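The key and value weights are the projection matrices used in attention. Each position's input is projected into a key, which is matched against the query, and into a value, which is the content passed along when that key matches. The sketch below shows minimal single-head scaled dot-product attention for one query position. The identity projection matrices and the function name `attend` are hypothetical and chosen only for illustration.

```python
import math

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query_input, inputs, W_q, W_k, W_v):
    """Single-head scaled dot-product attention for one query position."""
    q = matvec(W_q, query_input)
    keys = [matvec(W_k, x) for x in inputs]    # what each position is matched on
    values = [matvec(W_v, x) for x in inputs]  # what each position passes along
    scale = math.sqrt(len(q))
    scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
    weights = softmax(scores)
    # Output: the attention-weighted average of the value vectors
    out = [sum(w * v[i] for w, v in zip(weights, values))
           for i in range(len(values[0]))]
    return out, weights

I = [[1.0, 0.0], [0.0, 1.0]]  # identity projections, purely illustrative
out, weights = attend([2.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], I, I, I)
```

The query is more similar to the first key, so the first position receives more of the attention weight.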

Roam notes on Dwarkesh Patel's conversation with Sholto Douglas and Trenton Bricken
