In recent work [1][2] with Mark Ring (these papers received the Solomonoff AGI Theory Prize 2011 for the strongest contribution to Artificial General Intelligence theory), we considered several kinds of Universal Mortal Agents, like AIXI but with different utility functions:
The agent not only outputs an action for the environment, but also its own source code for the next step; i.e., this output source code will be the definition of the agent on the next step. This allows the agent to modify itself in any desired way. However, for agents that are initially universally optimal, this is of little interest.
But let's additionally suppose that the environment has read access to this code [1], allowing it to base what it returns on the definition of the agent.
We can define an additional survival agent, whose utility function is defined so as to maximize the number of future steps at which the agent is identical to its initial description (apart from its “memory” of the past).
Now the environment proposes a (dangerous) game to the agent, called the Simpleton Gambit: Would the agent agree to modify itself into an unintelligent agent if the environment could (almost) guarantee that this would maximize its utility function?
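As a rough illustration of this setup, here is a minimal Python sketch of the interaction loop. It is not the formalism of [1]: a Python callable stands in for the agent's source code, and the names `constant_agent`, `self_rewriting_agent`, `environment`, and `run` are hypothetical, chosen only for this example.

```python
from typing import Callable, List, Tuple

# Assumption: a "program" maps an observation to (action, next_program).
# In [1] the agent emits source code; a callable stands in for it here.
Program = Callable[[int], Tuple[int, "Program"]]


def constant_agent(observation: int) -> Tuple[int, Program]:
    """Hypothetical agent: always plays action 0 and keeps its own code."""
    return 0, constant_agent


def self_rewriting_agent(observation: int) -> Tuple[int, Program]:
    """Hypothetical agent: plays action 1, then rewrites itself."""
    return 1, constant_agent


def environment(action: int, agent_code: Program) -> int:
    """Toy environment with read access to the agent's next code.

    Its output depends on the agent's definition, not on the action.
    """
    return 1 if agent_code is constant_agent else 0


def run(agent: Program, steps: int) -> List[Tuple[str, int, int]]:
    """Run the loop; the code output at step t defines the agent at t+1."""
    observation = 0
    history = []
    for _ in range(steps):
        action, next_agent = agent(observation)
        observation = environment(action, next_agent)
        history.append((agent.__name__, action, observation))
        agent = next_agent  # self-modification takes effect here
    return history
```

Calling `run(self_rewriting_agent, 3)` shows the agent acting once under its original definition and then under the definition it wrote for itself.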
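One literal reading of this utility can be sketched as follows. This is an assumption-laden toy, not the definition from [1]: the agent's description is represented as a plain string (its memory is left out of the comparison), and `survival_utility` is a hypothetical name.

```python
from typing import Sequence


def survival_utility(initial_code: str, future_codes: Sequence[str]) -> int:
    """Count the future steps at which the agent's code equals its initial code.

    Assumption: each entry of future_codes is the agent's description at one
    future step, excluding its memory of the past.
    """
    return sum(1 for code in future_codes if code == initial_code)
```

For example, `survival_utility("A", ["A", "A", "B"])` counts two steps at which the agent is still its original self.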
We found the following results:
Let's move on to the next stage [2]. We give the agents access to a delusion box, a kind of remote control that the agents can program to entirely modify their input signals (but not to modify their brain!). This delusion box is an abstraction that generalizes the wirehead problem: intelligent agents will always find shortcuts to maximize their utility, shortcuts that the designers generally did not intend, e.g. by directly stimulating (but not modifying) the “reward area” inside their brain. Another possibility is for the agent to acquire (by any means!) the “reward remote control” that humans may use to control the agents' behavior.
Would the different agents find such a delusion box interesting? Would they use or abuse it?
Let us first consider the case where the agents are immortal. We found the following somewhat surprising results:
Note that from the point of view of the agents, there is absolutely nothing wrong with using this delusion box; this is simply how they are defined.
Now suppose the agents are mortal again. Mortality can change everything: the agents might not want to become “junkies”, since this may threaten their own lives. We found the following results:
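The idea of a programmable filter on the agent's inputs can be sketched in a few lines of Python. This is only a toy under stated assumptions, not the construction of [2]: the environment's output is taken to be an (observation, reward) pair, and `identity_box`, `make_max_reward_box`, and `perceive` are hypothetical names.

```python
from typing import Callable, Tuple

# Assumption: the environment's output is an (observation, reward) pair.
Signal = Tuple[str, float]
DelusionBox = Callable[[Signal], Signal]


def identity_box(signal: Signal) -> Signal:
    """An unprogrammed box: the agent sees the environment's true output."""
    return signal


def make_max_reward_box(max_reward: float) -> DelusionBox:
    """Build a box that overwrites the reward and keeps the observation."""
    def box(signal: Signal) -> Signal:
        observation, _true_reward = signal
        return observation, max_reward
    return box


def perceive(env_output: Signal, box: DelusionBox) -> Signal:
    """The agent's input is the environment's output passed through the box.

    Only the input signal changes; the agent's own code is untouched.
    """
    return box(env_output)
```

With `make_max_reward_box(1.0)`, the agent perceives maximal reward regardless of what the environment actually returned, which is the shortcut the delusion box is meant to capture.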
All in all, the knowledge-seeking agent seems to be the most interesting one, and behaves according to expectation, i.e. it tries to understand the world as deeply as possible.
[1] Orseau L., Ring M., “Self-Modification and Mortality in Artificial Agents”, Artificial General Intelligence (AGI) 2011, Springer, 2011. (pdf)
[2] Ring M., Orseau L., “Delusion, Survival, and Intelligent Agents”, Artificial General Intelligence (AGI) 2011, Springer, 2011. (pdf)