Anthropic just shared the rules its Claude 3 chatbot follows—but how t

The latest step forward in the development of large language models (LLMs) took place earlier this week, with the release of a new version of Claude, the LLM developed by AI company Anthropic—whose founders left OpenAI in late 2020 over concerns about the company’s pace of development.

But alongside the release of Claude 3, which sets new records in popular tests used to assess the prowess of LLMs, there was a second, more unusual innovation. Two days after Anthropic released Claude 3 to the world, Amanda Askell, a philosopher and ethicist at Anthropic who researches AI alignment and worked on the LLM, shared the model’s system prompt on X.

Claude’s system prompt is just over 200 words, but it outlines the model’s worldview. “It should give concise responses to very simple questions, but provide thorough responses to more complex and open-ended questions,” the prompt reads. It will assist with tasks provided the views expressed are shared by “a significant number of people”—“even if it personally disagrees with the views being expressed.” And it doesn’t engage in stereotyping, “including the negative stereotyping of majority groups.”
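
For readers unfamiliar with the mechanics, a system prompt is simply a block of instructions sent to the model ahead of the user’s messages, steering how it responds. The short Python sketch below, using Anthropic’s published SDK, shows roughly how a developer supplies such a prompt; the model identifier and the abridged prompt text are illustrative placeholders rather than Anthropic’s verbatim wording.

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Abridged, illustrative stand-in for the kind of instructions Askell shared
system_prompt = (
    "The assistant is Claude, created by Anthropic. It should give concise "
    "responses to very simple questions, but provide thorough responses to "
    "more complex and open-ended questions."
)

response = client.messages.create(
    model="claude-3-opus-20240229",  # assumed identifier for a Claude 3 model
    max_tokens=512,
    system=system_prompt,  # the system prompt shapes every reply in the conversation
    messages=[{"role": "user", "content": "Summarise the plot of Hamlet in two sentences."}],
)

print(response.content[0].text)

The prompt sits outside the visible conversation, which is why users of a hosted chatbot never normally see it.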

In addition to sharing the text, Askell went on to contextualize the decisions the company made in writing the system prompt. The paragraph encouraging Claude to help provided that a significant number of people share the views expressed was inserted specifically because Claude was a little more likely to refuse tasks when the user expressed right-wing views, Askell admitted.

Rumman Chowdhury, cofounder and CEO of Humane Intelligence, welcomes the transparency behind sharing the system prompt and thinks more companies ought to outline the foundational principles behind how their models are coded to respond. “I think there’s an appropriate ask for transparency and it’s a good step to be sharing prompts,” she says.

Others are also pleasantly surprised by Anthropic’s openness. “It’s really refreshing to see one of the big AI vendors demonstrate more transparency about how their system works,” says Simon Willison, a British programmer and AI watcher. “System prompts for other systems such as ChatGPT can be read through prompt leaking hacks, but given how useful they are for understanding how best to use these tools it’s frustrating that we have to use advanced tricks to read them.”

Anthropic, which declined to make Askell available for interview, is the only major LLM developer to have shared its system prompt.

Mike Katell, ethics fellow at the Alan Turing Institute, is cautiously supportive of Anthropic’s decision. “It is possible that system prompts will help developers implement Claude in more contextually sensitive ways, which could make Claude more useful in some settings,” he says. However, Katell says “this doesn’t do much to address the underlying problems of model design and training that lead to undesirable outputs, such as the racism, misogyny, falsehoods, and conspiracy theory content that chat agents frequently spew out.”

Katell also worries that such radical transparency could serve an ulterior purpose, whether intended or not. “Making system prompts available also clouds the lines of responsibility for such outputs,” he says. “Anthropic would like to shift all responsibility for the model onto downstream users and developers, and providing the appearance of configurability is one way to do that.”

On that front, Chowdhury agrees. While this is transparency of a type—and anything is better than nothing—it’s far from the whole story when it comes to how these models work. “It’s good to know what the system prompt is but it’s not a complete picture of model activity,” says Chowdhury. As with everything to do with the current set of generative AI tools, it’s far more complicated than that, she explains: “Much of it will be based on training data, fine tuning, safeguards, and user interaction.”




