Thorn’s Safety by Design for Generative AI: Progress Reports
March 20, 2025
6 Minute Read
Safety By Design: Industry Commitments
As part of Thorn and All Tech Is Human’s Safety By Design initiative, some of the world’s leading AI companies have made a significant commitment to protect children from the misuse of generative AI technologies.
The organizations—including Amazon, Anthropic, Civitai, Google, Invoke, Meta, Metaphysic, Microsoft, Mistral AI, OpenAI and Stability AI—have all pledged to adopt the campaign principles, which aim to prevent the creation and spread of AI-generated child sexual abuse material (AIG-CSAM) and other sexual harms against children.
As part of their commitments, these companies will continue to transparently publish and share documentation of their progress in implementing these principles.
This is a critical component of our overall three-pillar strategy for accountability:
- Publishing progress reports with insights from the committed companies (to support public awareness and pressure where necessary)
- Collaborating with standard setting institutions to scale the reach of these principles and mitigations (opening the door for third party auditing)
- Engaging with policymakers so that they understand what is technically feasible and impactful in this space, to inform necessary legislation.
Three-Month Progress Reports
Some participating companies have committed to reporting their progress on a three-month cadence (Civitai, Invoke, and Metaphysic), while others will report annually. Below are the latest updates from the companies reporting quarterly. You can also download the latest three-month progress report in full here.
January 2025: Civitai
Civitai has introduced new enforcement measures at the output stage of content generation, using machine learning models to detect AI-generated images that may contain minors or explicit content. These updates expand on its prior input-level detection efforts (a simplified sketch of this two-stage pattern appears after the list below). Since joining the commitments, Civitai reports it has:
- Detected over 252,000 violative prompts at the input stage.
- Retroactively removed 183 models optimized for generating AIG-CSAM.
- Updated policies to explicitly prohibit nudifying AI workflows and incorporated manual moderation to enforce this policy.
- Banned 17,436 user accounts due to policy violations.
- Filed 178 reports with NCMEC for confirmed AIG-CSAM instances.
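For readers less familiar with how such layered checks fit together, the sketch below illustrates the general input-plus-output pattern described above. It is a hypothetical, simplified illustration rather than Civitai's actual implementation; the function names, blocked-term set, classifier labels, and scores are all assumptions made for the example.

```python
# Hypothetical sketch of a two-stage moderation check: screen the prompt before
# generation, then classify the generated image before it is published.
from dataclasses import dataclass


@dataclass
class ModerationResult:
    allowed: bool
    reason: str = ""


def screen_prompt(prompt: str, blocked_terms: set) -> ModerationResult:
    """Input stage: reject prompts that contain any blocked term."""
    hits = [term for term in blocked_terms if term in prompt.lower()]
    if hits:
        return ModerationResult(False, f"blocked terms found: {hits}")
    return ModerationResult(True)


def screen_output(classifier_scores: dict, thresholds: dict) -> ModerationResult:
    """Output stage: block images whose classifier scores exceed a threshold.
    In practice these scores would come from ML models run on the generated image."""
    for label, score in classifier_scores.items():
        if score >= thresholds.get(label, 1.0):
            return ModerationResult(False, f"{label} score {score:.2f} exceeds threshold")
    return ModerationResult(True)


# Usage: only generate (and then publish) if both stages pass; values are placeholders.
prompt_check = screen_prompt("a watercolor landscape at sunset", {"example_blocked_term"})
if prompt_check.allowed:
    # ...image generation would happen here...
    output_check = screen_output({"minor_likelihood": 0.02, "explicit_content": 0.01},
                                 {"minor_likelihood": 0.50, "explicit_content": 0.50})
    print(output_check)
```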
Areas requiring progress remain, including:
- Expanding moderation using hashing against verified CSAM lists and prevention messaging.
- Incorporating content provenance for cloud-hosted models.
- Implementing pre-hosting assessments for new models and retroactively assessing current models for child safety violations.
- Adding child safety information to model cards and developing strategies to prevent the use and distribution of nudifying services.
January 2025: Invoke
Invoke has transitioned from third-party monitoring tools to an internal prompt monitoring system for improved detection and enforcement, and has published guidance for customers on reporting abusive content. Since joining the commitments, Invoke reports it has:
- Detected and reported 2,822 instances of violative prompts to NCMEC.
- Published new customer guidance on reporting abusive content.
- Invested $224,000 in research and development for new protective tools.
- Enhanced detection mechanisms to prevent banned users from accessing the platform through secondary accounts.
Areas requiring progress remain, including:
- Implementing CSAM detection at inputs.
- Incorporating comprehensive output review.
- Expanding user reporting functionality for its open-source software (OSS) offering.
January 2025: Metaphysic
Metaphysic reports no additional progress beyond the measures outlined in its prior update. The company:
- Maintains 100% dataset auditing with no detected CSAM.
- Ensures all generative models incorporate content provenance.
- Conducted two red-teaming exercises in preparation for full implementation in 2025.
- Continues to limit model access to internal employees only.
Areas requiring progress remain consistent with October’s report, including the need to implement systematic model assessment and red teaming, and to engage in industry efforts to strengthen provenance measures against adversarial misuse.
October 2024: Civitai
Civitai reports no additional progress since their July 2024 report, citing other work priorities. Their metrics show continued moderation efforts:
- Detected over 120,000 violative prompts, with 100,000 indicating attempts to create AIG-CSAM
- Prevented over 400 attempts to upload models optimized for AIG-CSAM
- Removed approximately 5-10 problematic models per month
- Detected and reported 2 instances of CSAM and over 100 instances of AIG-CSAM to NCMEC
Areas requiring progress remain consistent with July’s report, including the need to retroactively assess third-party models currently hosted on their platform.
October 2024: Metaphysic
Metaphysic reports no additional progress since their July 2024 report, citing competing work priorities while in the middle of a funding process. Their metrics show continued maintenance of their existing safeguards:
- 100% of datasets audited and updated
- No CSAM detected in their datasets
- 100% of models include content provenance
- Monthly assessment of mitigations
- Continued use of human moderators for content review
Areas requiring progress remain consistent with July’s report, including the need to implement systematic model assessment and red teaming.
October 2024: Invoke
As a new participant since July 2024, Invoke reports initial progress:
- Implemented prompt monitoring using third-party tools (askvera.io)
- Detected 73 instances of violative prompts, all reported to NCMEC
- Invested $100,000 in R&D for protective tools
- Incorporated prevention messaging directing users to redirection programs
- Utilizes Thorn’s hashlist to block problematic models
Areas requiring progress include implementing CSAM detection at inputs, incorporating comprehensive output review, and expanding user reporting functionality for their OSS offering.
July 2024: Civitai
Civitai, a platform for hosting third-party generative AI models, reports progress in safeguarding against abusive content and in hosting models responsibly:
- Uses multi-layered moderation, combining automated filters and human review, for prompts, content, and media uploads. Maintains an internal hash database to prevent the re-upload of removed images and models that violate child safety policies (a simplified sketch of this hash-matching technique appears after this list).
- Reports confirmed child sexual abuse material (CSAM) to NCMEC, flagging material that is AI-generated.
- Established terms of service banning exploitative material and models, and created reporting pathways for users.
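The hash database mentioned above relies on a general technique: record a fingerprint of each removed item and reject any upload whose fingerprint matches. The sketch below is a minimal, hypothetical illustration of that idea only; the class and function names are invented for the example, and production systems (including matching against verified CSAM hash lists) typically rely on robust perceptual hashing and dedicated matching services rather than a simple exact-match set.

```python
# Minimal sketch of hash-based re-upload blocking (illustrative only).
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of raw content bytes."""
    return hashlib.sha256(data).hexdigest()


class HashBlocklist:
    """Tracks hashes of removed content so identical re-uploads can be rejected."""

    def __init__(self) -> None:
        self._blocked: set = set()

    def add_removed_content(self, data: bytes) -> None:
        # When content is removed for a violation, record its hash.
        self._blocked.add(sha256_of(data))

    def is_blocked(self, data: bytes) -> bool:
        # At upload time, reject any file whose hash matches removed content.
        return sha256_of(data) in self._blocked


# Usage: re-uploading byte-identical content is caught; unrelated content passes.
blocklist = HashBlocklist()
blocklist.add_removed_content(b"bytes of a previously removed file")
print(blocklist.is_blocked(b"bytes of a previously removed file"))  # True
print(blocklist.is_blocked(b"bytes of an unrelated upload"))        # False
```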
However, some areas still require more progress for Civitai to meet its commitments:
- Expand moderation using hashing against verified CSAM lists and prevention messaging.
- Assess output content and incorporate content provenance features.
- Implement pre-hosting assessments for new models and retroactively assess current models for child safety violations.
- Add child safety information to model cards and develop strategies to prevent the use of nudifying services.
July 2024: Metaphysic
Metaphysic reports that it:
- Sources data from film studios with legal warranties and required consent from depicted individuals.
- Employs human moderators and AI tools to review data and separate sexual content from depictions of children.
- Adopts C2PA standard to label AI-generated content.
- Limits model access to employees and has processes for customer feedback on content.
- Updates datasets and model cards to include sections detailing child safety measures during development.
However, some areas still require more progress for Metaphysic to meet its commitments:
- Incorporate systematic model assessment and red teaming of their generative AI models for child safety violations.
- Engage with C2PA to understand the ways in which C2PA is and is not robust to adversarial misuse, and – if necessary – support development and adoption of solutions that are sufficiently robust.
Annual Progress Reports
Several companies have committed to reporting on an annual cadence, with their first reports expected in April 2025 – one year after the Safety By Design commitments were launched. These companies include Amazon, Anthropic, Google, Meta, Microsoft, Mistral AI, OpenAI, and Stability AI. Their comprehensive reports will provide insights into how they have implemented and maintained the Safety By Design principles across their organizations and technologies over the first full year of commitment.