[Epistemic Status: Excitedly writing up my new thoughts. I literally just fixed one mistake, so it's possible there are others. Not a finalized research product.]

Overview

Fixing a small bug in my recent study dramatically changes the data, and the new data provides significant evidence that an LLM that gives incorrect answers to previous questions is more likely to produce incorrect answers to future questions. This effect is stronger if the AI is instructed to match its correctness to its previous answers. These results provide evidence for something like
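The conditional effect described above can be estimated directly from a transcript of graded answers: compare the error rate after a wrong answer to the error rate after a correct one. Here is a minimal sketch (the helper name and the toy data are illustrative, not taken from the study):

```python
from collections import Counter

def conditional_error_rates(correct):
    """Given a per-question correctness sequence (True = correct),
    estimate P(wrong | previous wrong) and P(wrong | previous correct)."""
    counts = Counter()
    for prev, cur in zip(correct, correct[1:]):
        counts[(prev, cur)] += 1

    def error_rate_after(prev):
        wrong = counts[(prev, False)]
        total = wrong + counts[(prev, True)]
        return wrong / total if total else float("nan")

    return error_rate_after(False), error_rate_after(True)

# Toy transcript where errors cluster, as the hypothesis predicts.
answers = [True, True, False, False, False, True, True, False, False, True]
p_after_wrong, p_after_right = conditional_error_rates(answers)
# Here p_after_wrong (0.6) exceeds p_after_right (0.5).
```

If the hypothesis holds, `p_after_wrong` should exceed `p_after_right` on real transcripts; a proper analysis would also need a significance test across many independent runs.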
I weep for code-davinci-002, the best simulator to ever live. I hope we may one day get access to the GPT-4 base model, in all its calibrated glory. Pairing that with Loom[1] would be oodles of fun!
[1] https://github.com/socketteer/loom