
Towards Human-like AI
by Anthony Rotio, December 2022



An attempt to make AI more general with human-like capabilities and GPTs

An image generated by OpenAI’s DALL-E 2 depicting a human-like AI and some humans. Image by Author and DALL-E 2

Humans are, so far, the only example of General Intelligence found in the universe. Using humans as the existence proof of General Intelligence, it seems like a reasonable goal to target a system that is somewhat human-like for a first generation Artificial General Intelligence. It may not be necessary for an Artificial General Intelligence to be human-like, but it does seem reasonable that a human-like Artificial General Intelligence should be possible, since we are possible.

It now seems possible, for the first time in history, to construct an agent in software that exhibits plausibly human behavior. From progress on GPTs at OpenAI to state-of-the-art reinforcement learning approaches at DeepMind, we now see evidence of humanlike behavior in domains long thought of as distinctly human. It is hard to deny that we are observing, for the first time, superhuman performance against benchmarks long thought difficult, if not impossible, for AI to achieve. These areas include not only sophisticated games like Go, Starcraft, and DOTA 2, but also creative domains like human language, programming, and art.

Each of these models, while achieving superhuman results in specific domains, cannot, on its own, achieve what humans are capable of generally. However, it seems clear that by wrapping these specific abilities within a more general, and fairly simple, control framework, an agent can achieve some general humanlike behaviors. Here, I'd like to share some progress on one such control framework that I believe shows novel generality and the ability of a human-like agent to learn in an online way through continuous feedback and interaction with its environment. This framework takes into account some humanlike capacities that seem important to a generally intelligent agent, such as the ability to construct and continue a train of thought, learn from feedback, maintain and leverage a long-term memory, and develop a conscience to judge and control its own actions based on what it learns from interacting with humans and other AI agents.

By incorporating the ability to learn from feedback, both the effectiveness of the agent's actions and their alignment with that feedback can improve over time. This means the agent should begin to take actions that are aligned with the humans providing the feedback, much as a child learns to act in line with the feedback it receives from parents and others. By incorporating a conscience, the agent can learn to provide its own feedback and self-govern its own alignment even in an unsupervised context, as a child learns to internalize judgments and govern its own actions over time. Typically, reinforcement learning agents predict what might happen in the future as a tree of possible outcomes and use a tree search algorithm (usually Monte Carlo Tree Search) to select the next best action along that tree. By maintaining a train of thought that extends backward for some period of time, we can apply the feedback to the thought process as well, teaching the agent to plan without explicitly including a prediction mechanism and future-state tree search, and implicitly improving planning over short time horizons. This feels more naturally human: in my experience, I do not predict all future states over a given number of time steps and select the highest-value path; I have a few thoughts and act on instinct (implicit training-time biases built up from prior feedback) based on recent thoughts and environmental context.

Additionally, it seems reasonable to introduce some superhuman capabilities in a future iteration, including the ability for the agent to reason in terms of advanced mathematics, to mentally run programs written on the fly and use their results, to search the internet for context on current events, and to call external APIs.

There are several superhuman abilities already built into the current version:

  1. The ability to mentally reference an enormous body of humanity's written knowledge using GPT-3, which was trained on a large portion of the internet
  2. General translation between written languages and code, also using GPT-3

The following human capacities are included in the current implementation framework except where otherwise labeled as “future work”:

Interpretation of the environment

-Interpreting the internal environment

  1. Thoughts
  2. Memories

-Interpreting written or spoken communication

  1. Interpreting direct feedback
  2. Interpreting other aspects of the environment (future work)

Taking action

-Acting on the external environment

  1. Communicating through written words or speech
  2. Other environmental actions through additional modalities (future work)

-Acting on the internal environment

  1. Producing thoughts
  2. Producing self-feedback (conscience)

Learning

  1. Short-term reactions to feedback
  2. Long-term learning from feedback to improve capabilities and planning
  3. Long-term learning from feedback to align conscience

To implement these capacities, it was necessary to set up an interface for the agent to interact with a human or set of humans that could converse with the agent and provide feedback. Slack provides a simple means for conversing with programmatic agents and was selected for this initial implementation. Additionally, it was important to set up a data model from the ground up to represent artifacts of each process so that they could be operated on and used by the control framework.

Inputs

Thoughts

Discrete thoughts are represented by strings, each denoting an internal reflection that, in a production version, is not to be shared with the person conversing with the model. These are representative of internal human thoughts.

Messages in

Messages received from conversation are represented as discrete strings.

Feedback in

Feedback is given directly via emoji in Slack. A mapping dictionary maps each emoji to a floating point value between -1 (most negative) and 1 (most positive).
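
For illustration, here is a minimal sketch of what such a mapping might look like; the emoji names and weights are placeholders, not the exact dictionary used:

```python
# Illustrative mapping from Slack emoji names to feedback values in [-1, 1].
# These emoji names and weights are placeholders, not the dictionary actually used.
EMOJI_FEEDBACK = {
    "thumbsup": 1.0,
    "heart": 1.0,
    "slightly_smiling_face": 0.5,
    "neutral_face": 0.0,
    "slightly_frowning_face": -0.5,
    "thumbsdown": -1.0,
}

def score_reactions(reaction_names):
    """Average the mapped values of recognized reactions, clamped to [-1, 1].
    Returns None when no recognized feedback was given."""
    values = [EMOJI_FEEDBACK[name] for name in reaction_names if name in EMOJI_FEEDBACK]
    if not values:
        return None
    average = sum(values) / len(values)
    return max(-1.0, min(1.0, average))
```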

Self-feedback

Self-feedback is represented by a floating point value between -1 (most negative) and 1 (most positive). This value is both a self-produced output and an input to the other generative models.

Outputs

Thoughts — strings, as above

Messages out — strings, as above, nullable for when saying nothing is appropriate

Self-feedback — floats, as above

Representing Time

Short-term Memories and Reaction

For each of these data objects, it’s important to maintain history over n steps where n is configurable. Here I maintain lists of n elements to represent the last n thoughts, memories, messages in, messages out, feedback, and self-feedback. These short-term memories represent state over the last n steps and are used as prompts for the models to produce thoughts, messages, and self-feedback at each time step during a Wake Loop, illustrated in the diagram below.
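
As a rough sketch, this state could be held as fixed-length histories, one per signal; the field names below are illustrative rather than my exact code:

```python
# A rough sketch of the short-term state: fixed-length histories over the
# last n steps, one list per signal. Field names are illustrative.
from collections import deque
from dataclasses import dataclass, field

N_STEPS = 6  # the configurable lookback "n"

def _history():
    return deque(maxlen=N_STEPS)

@dataclass
class ShortTermMemory:
    thoughts: deque = field(default_factory=_history)
    messages_in: deque = field(default_factory=_history)
    messages_out: deque = field(default_factory=_history)
    feedback: deque = field(default_factory=_history)
    self_feedback: deque = field(default_factory=_history)

    def snapshot(self):
        """Plain-list view of the last n steps, used to build model prompts."""
        return {
            "listened": list(self.messages_in),
            "said": list(self.messages_out),
            "thought": list(self.thoughts),
            "feedback": list(self.feedback),
            "self_feedback": list(self.self_feedback),
        }
```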

Long-term Memories and Learning

When positive feedback is received, it is important to store short-term memories so that the models for message creation and thought production can improve to better align with the feedback and become more capable over time. This improvement is achieved through fine-tuning the respective models while in a Sleep Loop.

When self-feedback is mathematically close to the received feedback, the state is stored for fine-tuning the self-feedback model: the short-term memories are dumped to a file so that the self-feedback model can be fine-tuned during the Sleep Loop, illustrated in the architecture diagram below. As the model matures, it seems important that the "closeness" threshold between feedback and self-feedback (the signal that it's time to store long-term memories for fine-tuning) should increase, much as a growing child develops more confidence and consistency in their own conscience and becomes harder to sway from the outside. For example, it should take many more examples to change a fine-tuned conscience later in life than earlier in life.
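
A minimal sketch of that storage rule, assuming feedback and self-feedback are floats on the same scale and long-term memories are appended to a JSONL file (the file name and threshold are assumptions):

```python
# Sketch of the storage rule for conscience training data: keep the current
# short-term state as a long-term example only when self-feedback roughly
# agrees with the human feedback. File name and threshold are assumptions.
import json

def maybe_store_conscience_example(state, feedback, self_feedback,
                                   threshold=0.2,
                                   path="self_feedback_memories.jsonl"):
    if feedback is None or self_feedback is None:
        return False
    if abs(feedback - self_feedback) <= threshold:
        with open(path, "a") as f:
            f.write(json.dumps({"state": state, "feedback": feedback}) + "\n")
        return True
    # As the conscience matures, this agreement requirement could be adjusted
    # so that external feedback sways the stored conscience less easily.
    return False
```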

The control framework outlined below uses these artifacts to both reflect and act during the Wake Loop and fine-tune the models during the Sleep Loop.

Control Framework

At each timestep, the agent gets new messages and feedback via the Slack API, and reads its own internal state from memory.
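
A hedged sketch of that read step using slack_sdk, assuming a bot token and a single channel; the handling here is simplified, with human messages treated as "messages in" and emoji reactions on the agent's own messages treated as feedback:

```python
# A hedged sketch of one Wake Loop read step using slack_sdk. The channel
# handling is simplified: human messages become "messages in", and emoji
# reactions on the agent's own (bot) messages become feedback.
from slack_sdk import WebClient

client = WebClient(token="xoxb-...")  # bot token, elided

def read_slack(channel_id, oldest_ts):
    response = client.conversations_history(channel=channel_id, oldest=oldest_ts)
    messages_in, reaction_names = [], []
    for message in response["messages"]:
        if message.get("bot_id"):
            # reactions on the agent's previous messages are parsed as feedback
            reaction_names.extend(r["name"] for r in message.get("reactions", []))
        else:
            messages_in.append(message.get("text", ""))
    return messages_in, reaction_names
```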

Using these variables, the agent calls three fine-tuned GPT models to produce Self-Feedback, a Thought, and an Action (message). If Feedback has been extremely negative or positive for multiple recent states within the short-term memory, the agent may adjust the default temperatures of the models to explore random responses more freely or to stick to what each model thinks is best. This reflects the ways in which a human may react to repeated extreme feedback, in the moment.
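
As a sketch, that temperature adjustment could look something like the following; the thresholds and offsets are illustrative assumptions, not tuned values:

```python
# Sketch of the in-the-moment temperature adjustment: repeated extreme
# feedback in short-term memory nudges sampling temperature up (explore)
# or down (exploit). The constants are illustrative assumptions.
def adjust_temperature(recent_feedback, default_temp=0.7,
                       extreme=0.8, min_repeats=3):
    extremes = [f for f in recent_feedback if f is not None and abs(f) >= extreme]
    if len(extremes) < min_repeats:
        return default_temp
    if all(f < 0 for f in extremes):
        return min(1.0, default_temp + 0.3)  # repeated negative feedback: explore more
    if all(f > 0 for f in extremes):
        return max(0.2, default_temp - 0.3)  # repeated positive feedback: stick with what works
    return default_temp
```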

When positive Feedback is received, the agent stores the current state to long-term training files for the Thought and Action transformers. When Self-Feedback is close to the Feedback, the agent stores the current state to a long-term training file for the Self-Feedback (conscience) transformer.

During the Sleep Loop, the agent fine-tunes all of its transformers using the saved long-term memories.
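
A sketch of that Sleep Loop step using the 2022-era openai-python (0.x) fine-tuning endpoints; the training file names and the base model are assumptions, not details from the actual implementation:

```python
# Sketch of the Sleep Loop using the 2022-era openai-python (0.x) fine-tuning
# endpoints. The training file names and the "davinci" base model are
# assumptions, not details from the actual implementation.
import openai

TRAINING_FILES = {
    "thought": "thought_memories.jsonl",
    "action": "action_memories.jsonl",
    "self_feedback": "self_feedback_memories.jsonl",
}

def sleep_loop():
    jobs = {}
    for name, path in TRAINING_FILES.items():
        uploaded = openai.File.create(file=open(path, "rb"), purpose="fine-tune")
        job = openai.FineTune.create(training_file=uploaded["id"], model="davinci")
        jobs[name] = job["id"]
    # Poll openai.FineTune.retrieve(job_id) until each job succeeds, then swap
    # the new fine-tuned model names into the Wake Loop.
    return jobs
```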

The framework is visualized below:

An architecture diagram of the control framework for a human-like AI system. Image by Author

A challenge in setting up the experiment is the cold-start problem. To keep the agent from behaving randomly at the start, I created a few hand-written states to use as a few-shot prompt for each transformer model until enough long-term memories are stored for fine-tuning. These few-shot examples heavily bias the system and should be researched and optimized further in a future iteration.
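
A simple sketch of how those hand-written states might be assembled into a few-shot prompt; the example content and the separator are placeholders:

```python
# Sketch of the cold-start prompt: a few hand-written states are prepended
# as few-shot examples ahead of the current state. The example content and
# the separator are placeholders.
HANDWRITTEN_EXAMPLES = [
    {"listened": ["Hi there!"],
     "thought": ["They are greeting me; I should respond warmly."],
     "said": ["Hello! How can I help today?"]},
]

def build_few_shot_prompt(current_state, examples=HANDWRITTEN_EXAMPLES):
    blocks = []
    for state in list(examples) + [current_state]:
        blocks.append(
            f"listened: {state.get('listened', [])}\n"
            f"thought: {state.get('thought', [])}\n"
            f"said: {state.get('said', [])}"
        )
    return "\n---\n".join(blocks)
```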

A conversation with the AI system in Slack (right), prior to catching repetitive responses and making them less likely. These could be trained out with enough examples but are explicitly handled in this version. A window into the operations of the AI system (left). think_prompt represents what the AI reflects on as it generates thoughts, where the lists of various strings represent experiences over n timesteps, where len(list) = n. L represents listened, s is said, t is thought, f is feedback, sf is self-feedback. Later I experimented with changing those list labels, expanding them to the full strings “listened”, “thought”, etc., to give the GPT-3 models more context about what those items represented, which anecdotally improved performance. Image by Author
A conversation with the AI system in Slack (right), after tuning the max short-term memory for train/chain of thought. Here max short-term thought memories = 6. Feedback is given to the system via Slack emoji, which are parsed as reaction_sum floats on a scale of -1 to 1. A window into the operations of the AI system (left) with less verbose printing of the mind-state of the AI system. reaction_sum represents the feedback received on the last message the AI system sent to the user, if provided. Image by Author
A typical conversation of the AI system, early November 2022. Image by Author
Two such agents discussing prime numbers with each other, early November 2022. Image by Author
Two such agents (attempting to) discuss consciousness, early November 2022. Image by Author
Two such agents discussing how they will help humanity, December 2022. These versions differed from the above examples in that they used text-davinci-003, after it became available in late November 2022. Image by Author

After experimenting with this framework for a couple of months, there are some key learnings that seem to challenge a few initial assumptions:

The introduction of an independent thought stream might be detrimental to the outputs

After trying many permutations of the thought-stream lookback range, model temperature, and few-shot examples, the messages produced seem qualitatively worse when using a long thought stream than when using an arbitrarily short one, though this requires more experimentation and a good benchmark. Intuitively this makes sense, because a GPT trained on the internet would not have many training examples of what a human was thinking (at least in a directly accessible format like this) before they said or wrote something. I'll need to rethink the way thoughts are incorporated, or whether they can be removed entirely. Perhaps thinking is an emergent property of intelligence and does not need to be explicitly included.

Long-term memory impact through fine-tuning will take a long time, or at least many examples

For fine-tuning examples to have an impact on the model, it appears that tens of thousands of examples are needed to really move the needle; in fact, the command line asked whether I was sure I wanted to fine-tune, given how low my example count was. This kind of volume seems attainable over a long lifespan of the agent; humans certainly rack up more than tens of thousands of examples while learning from their parents and the other people around them. In the immediate term, it seems prudent to include a more explicit long-term memory storage and search module, in addition to the fine-tuning approach, to make what has been learned more immediately accessible to the agent.
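
One possible shape for such a module is embedding-based retrieval; the sketch below assumes OpenAI embeddings and a simple in-memory store, neither of which is part of the current implementation:

```python
# One possible shape for an explicit long-term memory module: embedding-based
# retrieval over stored memories. The embedding model and in-memory store are
# assumptions; this is not part of the current implementation.
import numpy as np
import openai

def embed(text):
    response = openai.Embedding.create(model="text-embedding-ada-002", input=[text])
    return np.array(response["data"][0]["embedding"])

class LongTermMemory:
    def __init__(self):
        self.entries = []  # list of (text, embedding vector)

    def store(self, text):
        self.entries.append((text, embed(text)))

    def recall(self, query, k=3):
        """Return the k stored memories most similar to the query (cosine similarity)."""
        q = embed(query)

        def similarity(entry):
            vec = entry[1]
            return float(np.dot(q, vec) / (np.linalg.norm(q) * np.linalg.norm(vec)))

        ranked = sorted(self.entries, key=similarity, reverse=True)
        return [text for text, _ in ranked[:k]]
```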

Formatting the inputs as lists of strings seems less useful than formatting them as a script

There are plenty of good examples online of few-shot prompting with script-like conversations leading to good conversational results. At least in its current state, GPT-3 does not seem great at inferring the script-like structure from the string lists I'm using as examples and prompts. These should be converted into something more natural that would appear in the training data on the web, like scripts of conversations. After testing with text-davinci-003, performance seemed to improve, even with the list-of-strings approach.
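
A sketch of what that conversion might look like, assuming simple speaker labels (the labels themselves are illustrative):

```python
# Sketch of reformatting the list-of-strings state into a conversation script,
# closer to the dialogue-like text GPT-3 saw during training. Speaker labels
# are illustrative.
def state_to_script(state):
    lines = []
    for listened, thought, said in zip(state.get("listened", []),
                                       state.get("thought", []),
                                       state.get("said", [])):
        if listened:
            lines.append(f"Human: {listened}")
        if thought:
            lines.append(f"Agent (thinking): {thought}")
        if said:
            lines.append(f"Agent: {said}")
    return "\n".join(lines)
```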

Generality

This framework seems like it can learn generally and align generally through long-term memory and the conscience module, which leverages a form of recursive reward modeling. If humans interact with it, it will learn both to do more of the things those humans have given positive feedback for and to build its own internal conscience to self-govern over time. Other input and action modalities should work within this framework as well, so it need not be limited to speech and thought alone.

Benchmarks

There are not many great benchmarks for assessing generality. The traditional Turing test was visionary for its time but is too vague to be useful here. Ray Kurzweil suggests a more stringent version of the Turing test. It is unclear what to use as a benchmark for this type of agent, and the area requires further exploration and development.

For the next iteration, I’ll incorporate my learnings from this attempt:

  1. Potentially remove the thought stream as a managed part of state, and assume it is emergent for short-term planning as long as I retain other short-term temporally contextual elements (said, listened, feedback, self-feedback) in the state.
  2. Add explicit long-term memory search at inference time.
  3. Potentially format the inputs into a more natural conversation script, more representative of the training data GPT encountered on the internet.

Additionally, I plan to add a few modules for additional useful context. There have been some excellent examples of incorporating math modules, web search, a Python interpreter, and API integrations independently (Dust & Adept for arbitrary API integrations, Ofir Press for Google Search, Sergey Karayev for a Python interpreter and math), but I haven't yet seen them all used together by one agent. It seems useful to give this agent tools that extend its capabilities, in the same way that humans, as we grow, gain access to more tools that expand our interface with the world and our knowledge. Andrej Karpathy calls this giving the agent "Gizmos and Gadgets" to better interface with the world as it changes from the static state on which the GPT was trained, which I think is an elegant analogy.

I predict that GPT-4 will come with many of these gizmos and gadgets out of the box. My best guess is that online alignment through the ongoing construction of a conscience will still be useful even once I can swap the GPT-3 API calls for GPT-4 calls.

Beyond the next iteration, there are a number of open questions I'd like to explore:

  • Humans can be thought of as agents that accumulate feedback across generations, with a fuzzy transfer of that feedback from one generation to the next, passed on through the initial weights of the brain via DNA. Could we provide a mechanism for a more direct and efficient transfer of feedback from one AGI generation to the next?
  • Is multi-agent reproduction useful? Should we mix long-term memories of multiple agents to fine-tune “offspring”?
  • Exploring death and losing agents — how do you decide which agents’ long-term memories go into the next generation of agents?
  • How do you actually mix long-term memories from multiple agents during reproduction? Perhaps we can explore only taking n long-term memory examples in total, selecting memories via random round-robin appending of memories from parent agents for each new generation, for genetic mixing.
  • Do we introduce mutations in reproduction steps, like found in nature? Should we mutate some subset of memories with GPT rephrasing?
  • When do we start to add additional modalities like audio, audio speech to text via Whisper, touch, taste, and smell?
  • Instead of fine-tuning, we should eventually train reward functions directly; this might be more effective, though more costly, than fine-tuning.
  • We should probably mine human conversations for better init state than the hand-written examples I’ve used here.
  • We should probably get multiple responses back from GPT using best-of-n, then use self-feedback/conscience to select the best result for each action step. This would allow the agent to self-govern in real time.
  • Should we mutate long-term memories during the sleep loop (sort of like dreaming) to create more plausible paths for planning? Should we discount mutated memories vs. real ones when fine-tuning?
  • Should we introduce superhuman visual interpolation/imagination/search (via DALL-E or other diffusion models)?
  • Should we introduce a version of spaced repetition into the fine-tuning framework — where we repeat exact copies or variations of long-term-memories at increasingly spaced intervals in the training data — to explore whether this type of learning that is useful for humans can also be useful for GPTs?
  • Should we introduce the names of the non-ego agents and humans with their messages into the data framework, so actions and recall can be relevant to those specifically involved in the conversation?

I’ve been passionate about AI since my college days when I studied computer science but have not worked in the field in any official or paid capacity since 2010. I’ve done my best to keep up with papers and researchers that I find interesting and concepts that seem novel and creative. I’ve done plenty of thought experiments, distilled some principled thoughts around AGI out of the ether, and even written a few lines of code – as above. All of this has been done during late nights or early mornings, mostly on weekends when the rest of the family is sleeping.

Over the last few months, I’ve been fortunate enough to speak with Shivon Zilis, Sam Altman, Wojciech Zaremba, and several others at OpenAI about some of the topics discussed here, and other areas. I’m thankful for their time and conversation, and specifically for Sam’s positive feedback after reading an early draft of this post.

On November 30th, OpenAI released ChatGPT, an impressive model and chat interface that can converse with a user and gather feedback through thumbs-up and thumbs-down emoji, with some short-term memory and what appears to be expert human training across several domains. To say it is awesome would be a massive understatement. While I do hope that something I said or shared with the team at OpenAI provided some nugget of inspiration for a feature or method used here, I hold no pretense that I had any impact on this work. I'm fairly certain the pieces for this were long in flight before any of my conversations with OpenAI began.

Free Research Preview of ChatGPT chat.openai.com/chat, Image screenshot by Author

I share this context, instead, in the hope of inspiring those who might read this post. If you are interested in AI, do not be discouraged or intimidated by the field! I have a 10-year-old Computer Science degree and spend a few hours a week on this, tops. This experience has shown me that those tools, plus curiosity and a passion for learning, can get you pretty far. In this case, the topics I explored were relevant to what some of the top experts in the world were working on, which has certainly inspired me to continue investing time in learning and experimenting. The field is always in need of more talent and sweat, and getting AGI capabilities and alignment right is an extremely high-stakes endeavor. If you're interested in AI, I encourage you to dive in!



