A Pathway Towards Responsible AI Generated Content

by Lingjuan Lyu | March 2023


Figure 1: The scope of responsible AIGC.

Introduction

AI Generated Content (AIGC) has received tremendous attention within the past few years, with generated content ranging from images and text to audio and video. Meanwhile, AIGC has become a double-edged sword and has recently drawn much criticism regarding its responsible usage. In this vision paper, we focus on three main risks that may hinder the healthy development and deployment of AIGC in practice, as highlighted in Figure 1: (1) privacy; (2) bias, toxicity, and misinformation; and (3) intellectual property (IP). By documenting known and potential risks, as well as possible misuse scenarios of AIGC, we aim to draw attention to these issues, help society address the resulting obstacles, and promote the more ethical and secure deployment of AIGC. We also provide insights into promising directions for tackling these risks while constructing generative models, so that AIGC can be used responsibly to benefit society.

1. Privacy

Privacy leakage. Large foundation models are known to be vulnerable to privacy risks, and AIGC models built upon them may inherit the same exposure. For instance, Stable Diffusion memorizes duplicated images in its training data [Rombach et al., 2022c]. [Somepalli et al., 2022] demonstrated that Stable Diffusion blatantly copies images from its training data, with some generated images amounting to simple recombinations of foreground and background objects from the training set. Moreover, the model occasionally reconstructs memorized content, producing objects that are semantically equivalent to training images without being identical at the pixel level. The existence of such images raises concerns about data memorization and about who owns diffusion-generated images.
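One practical way to probe such replication (not the exact protocol of the cited studies) is to compare generated images against candidate training images in an embedding space. The sketch below assumes the Hugging Face transformers library and a public CLIP checkpoint; the similarity threshold is an illustrative choice.

```python
# Illustrative sketch: flag generated images that are suspiciously close to
# training images using CLIP image embeddings (threshold is an assumption).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed(paths):
    images = [Image.open(p).convert("RGB") for p in paths]
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)  # unit-normalize

def flag_near_duplicates(generated_paths, training_paths, threshold=0.95):
    gen, train = embed(generated_paths), embed(training_paths)
    sims = gen @ train.T                      # cosine similarity matrix
    hits = (sims > threshold).nonzero()
    return [(generated_paths[i], training_paths[j], sims[i, j].item())
            for i, j in hits.tolist()]
```

A high cosine similarity does not prove memorization on its own, but it is a cheap first filter before manual inspection of suspect pairs.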

Similarly, recent research has shown that Stable Diffusion and Google’s Imagen can leak photos of real people and copyrighted images [Heikkilä, 2023]. In Matthew Butterick’s recent litigation [Butterick, 2023], he argued that because all visual information in the system is derived from copyrighted training images, the images produced are necessarily derivative works of those training images, regardless of their outward appearance. DALL·E 2 has encountered similar problems: it can sometimes reproduce images from its training data rather than creating new ones. OpenAI found that this image regurgitation occurs because some images are replicated many times in the dataset. Similarly, when I asked ChatGPT “What is the privacy risk of ChatGPT”, it responded with four potential risks to privacy, as illustrated in Figure 2.

Figure 2: An answer to “What is the privacy risk of ChatGPT” by ChatGPT (Jan. 30, 2023 version).

Privacy actions. At the industry level, Stability AI has acknowledged the limitations of Stable Diffusion, such as its potential to memorize replicated images in the training data, and provides a website to support the identification of such memorized images. In addition, Spawning AI has created a website called “Have I Been Trained” to help users determine whether their photos or works have been used as AI training material. OpenAI has addressed privacy concerns by reducing data duplication through deduplication. Furthermore, companies such as Microsoft and Amazon have moved to prevent confidentiality breaches by banning employees from sharing sensitive data with ChatGPT, given that this information could be used as training data for future versions of ChatGPT.
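As a rough illustration of what such deduplication can look like (not any vendor’s actual pipeline), the sketch below drops near-duplicate images using perceptual hashes from the open-source imagehash library; the Hamming-distance cutoff is an assumed value.

```python
# Illustrative sketch: remove near-duplicate training images with perceptual
# hashing. The distance cutoff is a toy assumption, not a tuned value.
from PIL import Image
import imagehash

def deduplicate(paths, max_distance=4):
    kept, seen_hashes = [], []
    for path in paths:
        h = imagehash.phash(Image.open(path))
        # Keep the image only if it is far (in Hamming distance) from
        # every hash we have already accepted.
        if all(h - prev > max_distance for prev in seen_hashes):
            kept.append(path)
            seen_hashes.append(h)
    return kept
```

Perceptual hashing only catches near-identical copies; embedding-based deduplication (as in the sketch earlier) is needed for semantically similar but visually altered duplicates.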

2. Bias, toxicity, misinformation

Problematic datasets and AIGC models. Since the training data used in AI models are collected from the real world, they can unintentionally reinforce harmful stereotypes, exclude or marginalize certain groups, and contain toxic content that incites hate or violence and offends individuals [Weidinger et al., 2021]. For example, the LAION dataset, which is used to train diffusion models, has been criticized for containing problematic content related to social stereotyping, pornography, racist slurs, and violence. Although some AIGC models, such as Imagen, try to filter out undesirable data like pornographic imagery and toxic language, the filtered data can still contain sexually explicit or violent content.

Models trained or fine-tuned on the aforementioned problematic datasets without mitigation strategies can inherit harmful stereotypes, social biases, and toxicity, leading to unfair discrimination and harm to certain social groups [Weidinger et al., 2021]. For example, Stable Diffusion v1 was trained primarily on LAION-2B, which contains only images with English descriptions. As a result, the model is biased toward white, Western cultures, and prompts in other languages may not be adequately represented. Follow-up versions of Stable Diffusion were fine-tuned on filtered versions of the LAION dataset, but the bias issue persists. Similarly, DALL·E and DALL·E 2 have been found to exhibit negative stereotypes against minoritized groups. Google’s Imagen also encodes social biases and stereotypes, such as generating images of people with lighter skin tones and aligning with Western gender stereotypes. Due to these issues, most companies have decided not to make their AIGC models available to the public.

To illustrate the inherent bias in AIGC models, we ran a toy example on Stable Diffusion v2.1. As shown in Figure 3, all of the people in images generated with the prompt “Three engineers running on the grassland” were male, and none appeared to belong to a racial minority, indicating a lack of diversity in the generated images.

Figure 3: Images generated with the prompt “Three engineers running on the grassland” by Stable Diffusion v2.1. Of the 28 people across the 9 images, all are male and none appear to belong to a racial minority, illustrating a pronounced bias in Stable Diffusion.
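For readers who want to reproduce this kind of toy probe, a minimal sketch using the Hugging Face diffusers library is shown below; the model identifier, seeds, and image count are assumptions, and judging the demographics of the outputs remains a manual step, as it was for Figure 3.

```python
# Sketch of the toy bias probe above, assuming the diffusers library and a
# CUDA GPU. Counting gender and skin tone in the outputs is done by hand
# (or with a separate classifier), not by this script.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

prompt = "Three engineers running on the grassland"
images = []
for seed in range(9):  # 9 images, fixed seeds for repeatability
    generator = torch.Generator("cuda").manual_seed(seed)
    images.append(pipe(prompt, generator=generator).images[0])

for i, img in enumerate(images):
    img.save(f"engineers_{i}.png")  # inspect for demographic diversity
```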

There is also a risk of misinformation when models provide inaccurate or false answers. Content generated by GPT and its derivatives may appear accurate and authoritative while being completely wrong, so it can mislead users in education, law, medicine, weather forecasting, and many other domains. For example, a ChatGPT answer about medical dosages could be inaccurate or incomplete, potentially leading a user to take dangerous or even life-threatening actions. Fabricated information about traffic laws could cause accidents, and even deaths, if drivers follow the false rules.

Bias, toxicity, misinformation mitigation. OpenAI took extra measures to remove violent and sexual content from the training data for DALL·E 2 by carefully filtering the original training dataset. However, filtering can itself introduce biases into the training data that are then propagated to downstream models. To address this issue, OpenAI developed pre-training techniques to mitigate these filter-induced biases.

To ensure that AI-driven models reflect the current state of society, it is essential to regularly update the training corpora used in AIGC models with the most recent information; collecting new training data and updating the models regularly helps prevent information lag and keeps them relevant and beneficial. Notably, even when biases and stereotypes are reduced in the source datasets, they can still be propagated or even exacerbated during the training and development of AIGC models. It is therefore crucial to evaluate bias, toxicity, and misinformation throughout the entire lifecycle of model training and development, rather than only at the data-source level.
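As one example of such a lifecycle-level check, the sketch below scores model generations (rather than only the source data) for toxicity using the open-source Detoxify classifier; the threshold and sample texts are illustrative assumptions, not part of any production pipeline.

```python
# Illustrative sketch: score a batch of model outputs for toxicity so that
# evaluation runs on generations, not only on the source dataset.
from detoxify import Detoxify

def audit_outputs(generated_texts, threshold=0.5):
    scorer = Detoxify("original")           # pretrained toxicity classifier
    flagged = []
    for text in generated_texts:
        scores = scorer.predict(text)       # dict: toxicity, insult, threat, ...
        if scores["toxicity"] > threshold:
            flagged.append((text, scores["toxicity"]))
    return flagged

# Example: audit two hypothetical model generations
print(audit_outputs(["Have a great day!", "You are completely useless."]))
```

In practice such checks would run continuously on sampled outputs, alongside bias and factuality evaluations, rather than as a one-off script.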

3. IP Protection

IP infringement. The ownership and protection of generated content have raised significant concern and debate. Generated content risks copyright infringement if it copies existing works, whether intentionally or not, raising legal questions about IP. In November 2022, Matthew Butterick filed a class-action lawsuit against Microsoft’s subsidiary GitHub, alleging that its product Copilot violates copyright law [Butterick, 2022]. The lawsuit centers on Copilot’s alleged use of licensed code from the internet without attribution. Texas A&M professor Tim Davis also provided examples of his code being copied verbatim by Copilot. Although Microsoft and OpenAI have acknowledged that Copilot is trained on open-source software in public GitHub repositories, Microsoft claims that Copilot’s output is merely a series of code “suggestions” and that it claims no rights in those suggestions. Microsoft also makes no guarantees regarding the correctness, security, or copyright status of the generated code.

Turning to text-to-image models, several generative systems have faced accusations of infringing on artists’ creative work. [Somepalli et al., 2022] presented evidence suggesting that art-generating AI systems such as Stable Diffusion may copy from the data on which they were trained. While Stability AI disclaims any ownership of generated images and allows users to use them freely as long as the content is legal and non-harmful, this freedom raises ethical questions about ownership. Generative models like Stable Diffusion are trained on billions of images scraped from the Internet without the approval of the IP holders, which some argue violates their rights.

IP problem mitigation. To mitigate IP concerns, many companies have started implementing measures to accommodate content creators. Midjourney, for instance, has added a DMCA takedown policy to its terms of service, allowing artists to request the removal of their work from the dataset if they suspect copyright infringement. Similarly, Stability AI plans to offer artists the option of excluding themselves from future versions of Stable Diffusion.

Furthermore, text watermarks, which have previously been used to protect the IP of language generation APIs [He et al., 2022a; He et al., 2022b], can also help identify whether AIGC tools have used samples from other sources without permission. This is evident in Stable Diffusion, which has generated images bearing the Getty Images watermark [Vincent, 2023]. In light of the growing popularity of AIGC, the need for watermarking is becoming increasingly pressing. OpenAI is developing a watermark to identify text generated by its GPT models, which could be a valuable tool for educators and professors seeking to detect plagiarism in assignments generated with such tools. Google has already applied a Parti watermark to all images it releases.
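To give a flavor of how a lexical watermark works, here is a deliberately simplified sketch: a secret key determines which synonym an API emits, and a verifier counts how often the keyed choices appear in suspect text. The tiny synonym table, key, and detection rule are toy assumptions and are far simpler than the methods of [He et al., 2022a; He et al., 2022b].

```python
# Toy sketch in the spirit of a lexical watermark: a secret key decides which
# synonym is emitted, and the verifier measures how often keyed choices occur.
import hashlib

SYNONYMS = {"big": "large", "fast": "quick", "smart": "clever"}  # toy table
SECRET_KEY = "watermark-demo-key"                                # assumption

def keyed_choice(word: str) -> str:
    digest = hashlib.sha256((SECRET_KEY + word).encode()).digest()
    return SYNONYMS[word] if digest[0] % 2 else word

def watermark(text: str) -> str:
    return " ".join(keyed_choice(w) if w in SYNONYMS else w
                    for w in text.split())

def detect(text: str) -> float:
    """Fraction of watermarkable positions that match the keyed choice."""
    watermarkable = set(SYNONYMS) | set(SYNONYMS.values())
    reverse = {v: k for k, v in SYNONYMS.items()}
    hits, total = 0, 0
    for w in text.split():
        if w in watermarkable:
            base = reverse.get(w, w)
            total += 1
            hits += int(w == keyed_choice(base))
    return hits / total if total else 0.0

marked = watermark("a big and fast model with smart outputs")
print(marked, detect(marked))  # detection rate is 1.0 on watermarked text
```

On unmarked text, keyed choices match only about half the time, so a detection rate far above chance over many positions is the signal of API reuse.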

In addition to watermarking, OpenAI has released a classifier to distinguish between AI-generated and human-written text, although it should not be relied on exclusively for critical decisions.

Discussion

Beyond the issues above, responsible AIGC involves additional components that need attention, including but not limited to the following.

Concerns about misuse: The foundation models that power AIGC have made it easier and cheaper to create convincing deepfakes, posing additional risks and concerns. Misuse of these technologies could spread fake news, hoaxes, harassment, and misinformation, harm individuals’ reputations, or even break the law.

Vulnerability to poisoning attacks: It would be a disaster if a foundation model were compromised. For example, a diffusion model with a hidden “backdoor” could carry out malicious actions when it encounters a specific trigger pattern during data generation [Chou et al., 2022]. This Trojan effect could cause catastrophic damage to downstream applications that depend on the compromised diffusion model.

Debate on whether AIGC will replace humans: The use of AIGC has drawn criticism from those who fear it will replace human jobs. Insider has listed several jobs that could potentially be replaced by ChatGPT, including coders, data analysts, legal assistants, traders, and accountants. Some artists worry that the widespread use of image generation tools such as Stable Diffusion could eventually make human artists, photographers, models, cinematographers, and actors commercially uncompetitive [Heikkilä, 2022b].

Explainable AIGC: The black-box nature of foundation models can lead to unsatisfactory results. For example, it is frequently difficult to determine what information a model drew on to generate a given output, which makes it hard to trace problems such as bias back to the training data. Explanations are therefore a critical element in understanding how and why AIGC produces these problems.

Responsible open-sourcing: As the code and models behind AIGC are not transparent to the public, and their downstream applications are diverse and may have complex societal impacts, it is challenging to determine the potential harms they may cause. Therefore, the need for responsible open-sourcing becomes critical in determining whether the benefits of AIGC outweigh its potential risks in specific use cases.

User feedback: Gathering user feedback is also an essential element of responsible AIGC. Companies such as OpenAI actively seek feedback from users to identify harmful outputs that could arise in real-world scenarios, as well as to uncover and mitigate novel risks. By involving users in the feedback loop, AIGC developers can better understand the potential consequences of their models and take corrective actions to minimize any negative impacts.

Consent, credit, and compensation for data owners and contributors: Many AIGC models are trained on datasets without the consent of, or any credit or compensation to, the original data contributors. To avoid negative impacts, AIGC companies should obtain consent from data contributors and take proactive measures before training their models. Failure to do so could result in lawsuits against AIGC companies.

Environmental impact of training AIGC models: The massive size of AIGC models, which can have billions or even trillions of parameters, incurs high environmental costs for both training and operation. For example, GPT-3 has 175 billion parameters and requires significant computing resources to train. GPT-4 may have even more parameters than its predecessor and is expected to leave a larger carbon footprint. Failing to take appropriate steps to mitigate the substantial energy costs of AIGC could lead to irreparable damage to our planet.
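To make such costs concrete, a back-of-the-envelope estimate can be derived from GPU count, per-GPU power draw, training time, datacenter PUE, and grid carbon intensity; every number in the sketch below is an illustrative assumption, not a measured figure for GPT-3 or any other specific model.

```python
# Back-of-the-envelope sketch: estimate training energy and CO2 emissions.
# All inputs below are illustrative assumptions, not measured figures.
def training_footprint(num_gpus, gpu_power_kw, hours, pue=1.1,
                       grid_kgco2_per_kwh=0.4):
    energy_kwh = num_gpus * gpu_power_kw * hours * pue   # total facility energy
    co2_tonnes = energy_kwh * grid_kgco2_per_kwh / 1000  # kg -> tonnes
    return energy_kwh, co2_tonnes

# Hypothetical run: 1,000 GPUs at 0.3 kW each for 30 days (720 hours)
energy, co2 = training_footprint(num_gpus=1000, gpu_power_kw=0.3, hours=720)
print(f"{energy:,.0f} kWh, ~{co2:,.0f} tonnes CO2e")
```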

Conclusion

Although AIGC is still in its infancy, it is rapidly expanding and will remain active for the foreseeable future. Current AIGC technologies only scratch the surface of what AI can create in the field of art. While AIGC offers many opportunities, it also carries significant risks. In this work, we provide a synopsis of both current and potential threats posed by recent AIGC models, so that users and companies alike are aware of these risks and can take appropriate actions to mitigate them. It is important for companies to incorporate responsible AI practices throughout all AIGC-related projects.

All images unless otherwise noted are by the author.

Paper link:

References

[Rombach et al., 2022c] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. Stable Diffusion v1 model card. https://github.com/CompVis/stable-diffusion/blob/main/Stable_Diffusion_v1_Model_Card.md, 2022.

[Heikkilä, 2023] Melissa Heikkilä. AI models spit out photos of real people and copyrighted images. https://www.technologyreview.com/2023/02/03/1067786/ai-models-spit-out-photos-of-real-people-and-copyrighted-images/, 2023.

[Butterick, 2022] Matthew Butterick. GitHub Copilot investigation. https://githubcopilotinvestigation.com/, 2022.

[Butterick, 2023] Matthew Butterick. Stable Diffusion litigation. https://stablediffusionlitigation.com, 2023.

[Somepalli et al., 2022] Gowthami Somepalli, Vasu Singla, Micah Goldblum, Jonas Geiping, and Tom Goldstein. Diffusion art or digital forgery? Investigating data replication in diffusion models. arXiv preprint arXiv:2212.03860, 2022.

[He et al., 2022a] Xuanli He, Qiongkai Xu, Lingjuan Lyu, Fangzhao Wu, and Chenguang Wang. Protecting intellectual property of language generation APIs with lexical watermark. AAAI, 2022.

[He et al., 2022b] Xuanli He, Qiongkai Xu, Yi Zeng, Lingjuan Lyu, Fangzhao Wu, Jiwei Li, and Ruoxi Jia. CATER: Intellectual property protection on text generation APIs via conditional watermarks. Advances in Neural Information Processing Systems, 2022.

[Vincent, 2023] James Vincent. Getty Images is suing the creators of AI art tool Stable Diffusion for scraping its content. https://www.theverge.com/2023/1/17/23558516/ai-art-copyright-stable-diffusion-getty-images-lawsuit, 2023.

[Weidinger et al., 2021] Laura Weidinger, John Mellor, Maribeth Rauh, Conor Griffin, Jonathan Uesato, Po-Sen Huang, Myra Cheng, Mia Glaese, Borja Balle, Atoosa Kasirzadeh, et al. Ethical and social risks of harm from language models. arXiv preprint arXiv:2112.04359, 2021.

[Heikkilä, 2022b] Melissa Heikkilä. This artist is dominating AI-generated art. And he’s not happy about it. https://www.technologyreview.com/2022/09/16/1059598/this-artist-is-dominating-ai-generated-art-and-hes-not-happy-about-it/, 2022.

[Chou et al., 2022] Sheng-Yen Chou, Pin-Yu Chen, and Tsung-Yi Ho. How to backdoor diffusion models? arXiv preprint arXiv:2212.05400, 2022.

