Thorn’s Safety by Design for Generative AI: 3-Month Progress Report on Civitai and Metaphysic
September 26, 2024
Three months ago, some of the world’s most influential AI leaders made a groundbreaking commitment to protect children from the misuse of generative AI technologies.
In collaboration with Thorn and All Tech Is Human, Amazon, Anthropic, Civitai, Google, Meta, Metaphysic, Microsoft, Mistral AI, OpenAI, and Stability AI pledged to adopt Safety by Design principles to guard against the creation and spread of AI-generated child sexual abuse material (AIG-CSAM) and other sexual harms against children.
As part of their commitment, these companies agreed to transparently publish and share documentation of their progress in implementing these principles. This transparency is a critical component of our three-pillar accountability strategy: 1) publishing progress reports with insights from the committed companies (to support public awareness and pressure where necessary), 2) collaborating with standard-setting institutions such as IEEE and NIST to scale the reach of these principles and mitigations (opening the door for third-party auditing), and 3) engaging with policymakers so that they understand what is technically feasible and impactful in this space, to inform necessary legislation. Today, we’re sharing the first three-month progress report, focusing on two companies: Civitai and Metaphysic.
Why now? The urgency of the moment
The need for this proactive response around generative AI safety has never been clearer. (In fact, our VP of Data Science, Dr. Rebecca Portnoff, discussed this with other leaders in the space on a panel at TrustCon this summer).
Generative AI technologies, while potentially helpful in many instances, also present profound risks to child safety when misused. Bad actors can now easily generate new abuse material, sexualize benign imagery of children, and scale grooming and sextortion efforts.
Our latest data shows that while the prevalence of AIG-CSAM in communities dedicated to child sexual abuse remains small, it is growing, and the material is increasingly photorealistic: 82% of sampled images now appear photorealistic, up from 66% in June 2023.
Further, 1 in 10 minors reported they knew of cases where their peers had generated nude imagery of other kids.
These trends continue to underscore the critical importance of the Safety by Design principles and the commitments made by AI industry leaders.
Now, let’s take a look at how Civitai and Metaphysic have progressed in implementing these principles over the past three months. We summarize that progress below – see the full report here – and note that all data reported below and in the full report was provided to Thorn by the respective companies, and was not independently verified by Thorn. For more information regarding data collection practices and use rights, please see the full report here.
Civitai: Three-Month Progress
Civitai, a platform for hosting third-party generative AI models, reports progress in safeguarding against abusive content and in hosting models responsibly.
For their cloud-hosted models, they implemented a multi-layered moderation approach that combines automated filters and human review to screen content generation requests and media inputs. This system uses keyword detection and AI models to flag potentially violating input prompts and images (surfacing prevention messaging where appropriate), with all flagged content undergoing human review. They also maintain an internal hash database of previously removed images to prevent re-upload.
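To make the re-upload blocking described above concrete, here is a minimal sketch of an exact-match hash blocklist. This is not Civitai’s implementation (the report does not describe one); the function names and the in-memory set are assumptions made for illustration.

```python
import hashlib

# Illustrative only: an exact-match blocklist keyed on SHA-256 digests of
# previously removed images. A production moderation pipeline would persist
# these digests and typically pair them with perceptual hashing so that
# resized or re-encoded copies are still caught.

# Hypothetical store of digests for images that moderators have removed.
removed_image_hashes: set[str] = set()

def register_removed_image(image_bytes: bytes) -> str:
    """Record the digest of an image that has been removed by moderation."""
    digest = hashlib.sha256(image_bytes).hexdigest()
    removed_image_hashes.add(digest)
    return digest

def is_blocked_upload(image_bytes: bytes) -> bool:
    """Return True if an incoming upload matches a previously removed image."""
    return hashlib.sha256(image_bytes).hexdigest() in removed_image_hashes
```

An exact cryptographic hash only matches byte-identical files, which is one reason the report also calls for matching against verified CSAM hash lists: those systems typically rely on perceptual hashing to detect altered copies as well.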
In addition, confirmed instances of child sexual abuse material are now reported to the National Center for Missing & Exploited Children (NCMEC), with the generative AI flag where relevant. Civitai extends a similar multi-layered approach to moderating all uploaded media hosted on their platform.
Civitai also established terms of service prohibiting exploitative material, employed new technologies like semi-permeable membranes to mitigate the generation of harmful content in their cloud-hosted models, and created pathways for users to report concerning content (both content generated by cloud-hosted models, and more generally any uploaded media hosted on their platform). They also established a system to report and remove third-party models that violate their child safety policies, adding these models to internal hashlists so that attempts to re-upload those models can be blocked.
There remain some areas that require more progress to meet their commitments.
Notably, Civitai will need to implement hashing and matching against verified CSAM lists across their interventions for more robust detection, and extend prevention messaging to their search functionality. They will also need to develop strategies for assessing the output content generated by their cloud-hosted models and for incorporating content provenance into that generated content. Newly uploaded models will need to be assessed for child safety violations before they are hosted, and currently hosted models will need systematic, retroactive assessments to meet their commitments. Finally, Civitai will need to add a child safety section to model cards on their platform, so that each model has associated information outlining the steps taken to prioritize child safety during its development.
Further, they will need to determine a strategy to prevent nudifying services and models, which can be used to sexualize benign depictions of children, from being uploaded to or used on their site.
For more detail on how Civitai has made progress on their commitments, and where there still remains work to be done, see the full report here.
Metaphysic: Three-Month Progress
Metaphysic, which develops first-party generative AI models to create photorealistic video content for film studios, also reports progress in safeguarding their AI development process and ensuring responsible model hosting.
The company sources data directly from film studios with contractual warranties against illegal material. They also require the studios to obtain consent from the individuals depicted in the data before sharing the data. This approach is intended to provide a legal and ethical foundation for ML/AI training, reducing the risk of inadvertently using exploitative content.
Metaphysic also employs human moderators to review all received data and generated media. They have also implemented ML/AI tools to detect and separate sexual content from depictions of children in training data, helping prevent inappropriate associations. Furthermore, Metaphysic has adopted the Coalition for Content Provenance and Authenticity (C2PA) standard across their data pipelines to aid in verifying the origin and authenticity of AI-generated content.
Metaphysic’s strategy for responsibly deploying their models focuses on controlling access to their generative models (limiting access to just Metaphysic employees). They also have processes in place to receive regular feedback from their customers, including any feedback related to content that may contain illegal or unethical material. Further, their internal processes have been updated such that all datasets and model cards now contain a child safety section detailing the steps taken during model development to prioritize child safety.
There remain some areas that require more progress to meet their commitments. Metaphysic will need to incorporate consistent red-teaming and model assessment for child safety violations in their model development process. This will involve systematic stress testing of their models to identify potential vulnerabilities that bad actors might exploit.
Additionally, while C2PA has built a strong technology foundation for companies to adopt, it was not built with adversarial misuse in mind. In order to meet this commitment, Metaphysic will need to engage with C2PA to better understand the ways in which C2PA is and is not robust to adversarial misuse, and – if necessary – support development and adoption of solutions that are sufficiently robust.
For more detail on how Metaphysic has made progress on their commitments, and where there still remains work to be done, see the full report here.