3 Comments

Also, in the Zelda example in the post, if we interpret the actions in the way you say (i.e. killing the shopkeeper is evil) then the AI is actually doing the opposite of betraying us when it makes its "treacherous" turn, right? Because around episode 3,000 it suddenly starts using the benevolent strategy, in addition to the violent one which it was using all along. (I realize this doesn't matter for the overall argument.)

Nice post! Do you think this is a correct interpretation? "Qualitatively different strategies emerge at different levels of capability. Just because the strategies that emerge at lower capability levels don't disempower us doesn't mean the ones that emerge at higher capability levels won't."
