
Supervised Fine Tuning on Curated Data is Reinforcement Learning (and can be improved)
Large language models have made great progress in recent years. Many of us already use these models in our daily lives: to get suggestions, to brainstorm ideas, to do research. As we suddenly find ourselves in a situation of societal co-evolution with the models we train, it