Thorn’s Safety by Design for Generative AI: 3-Month Progress Report on Civitai and Metaphysic
September 26, 2024
Three months ago, some of the world’s most influential AI leaders made a groundbreaking commitment to protect children from the misuse of generative AI technologies.
In collaboration with Thorn and All Tech Is Human, Amazon, Anthropic, Civitai, Google, Meta, Metaphysic, Microsoft, Mistral AI, OpenAI, and Stability AI pledged to adopt Safety by Design principles to guard against the creation and spread of AI-generated child sexual abuse material (AIG-CSAM) and other sexual harms against children.
As part of their commitment, these companies agreed to transparently publish and share documentation of their progress in implementing these principles. This transparency is a critical component of our three-pillar accountability strategy: 1) publishing progress reports with insights from the committed companies (to support public awareness and pressure where necessary), 2) collaborating with standard-setting institutions such as IEEE and NIST to scale the reach of these principles and mitigations (opening the door for third-party auditing), and 3) engaging with policymakers so that they understand what is technically feasible and impactful in this space, to inform necessary legislation. Today, we’re sharing the first three-month progress report, focusing on two companies: Civitai and Metaphysic.
Why now? The urgency of the moment
The need for this proactive response around generative AI safety has never been clearer. (In fact, our VP of Data Science, Dr. Rebecca Portnoff, discussed this with other leaders in the space on a panel at TrustCon this summer).
Generative AI technologies, while potentially helpful in many instances, also present profound risks to child safety when misused. Bad actors can now easily generate new abuse material, sexualize benign imagery of children, and scale grooming and sextortion efforts.
Our latest data shows that while the prevalence of AIG-CSAM in communities dedicated to child sexual abuse remains small, it is growing, and the material is increasingly photorealistic: 82% of sampled images now appear photorealistic, up from 66% in June 2023.
Further, 1 in 10 minors reported they knew of cases where their peers had generated nude imagery of other kids.
These trends continue to underscore the critical importance of the Safety by Design principles and the commitments made by AI industry leaders.
Now, let’s take a look at how Civitai and Metaphysic have progressed in implementing these principles over the past three months. We summarize that progress below – see the full report here – and note that all data reported below and in the full report was provided to Thorn by the respective companies, and was not independently verified by Thorn. For more information regarding data collection practices and use rights, please see the full report here.
Civitai: Three-Month Progress
Civitai, a platform for hosting third-party generative AI models, reports progress in safeguarding against abusive content and in hosting models responsibly.
For their cloud-hosted models, they implemented a multi-layered moderation approach that combines automated filters and human review to screen content generation requests and media inputs. This system uses keyword detection and AI models to flag potentially violating input prompts and images (surfacing prevention messaging where appropriate), with all flagged content undergoing human review. They also maintain an internal hash database of previously removed images to prevent re-upload.
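To make the re-upload blocking described above concrete, here is a minimal sketch of an exact-match hash blocklist. This is not Civitai’s implementation (the report does not describe one); the function names and the in-memory set are assumptions made for illustration.

```python
import hashlib

# Illustrative only: an exact-match blocklist keyed on SHA-256 digests of
# previously removed images. A production moderation pipeline would persist
# these digests and typically pair them with perceptual hashing so that
# resized or re-encoded copies are still caught.

# Hypothetical store of digests for images that moderators have removed.
removed_image_hashes: set[str] = set()

def register_removed_image(image_bytes: bytes) -> str:
    """Record the digest of an image that has been removed by moderation."""
    digest = hashlib.sha256(image_bytes).hexdigest()
    removed_image_hashes.add(digest)
    return digest

def is_blocked_upload(image_bytes: bytes) -> bool:
    """Return True if an incoming upload matches a previously removed image."""
    return hashlib.sha256(image_bytes).hexdigest() in removed_image_hashes
```

An exact cryptographic hash only matches byte-identical files, which is one reason the report also calls for matching against verified CSAM hash lists: those systems typically rely on perceptual hashing to detect altered copies as well.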
In addition, confirmed instances of child sexual abuse material are now reported to the National Center for Missing & Exploited Children (NCMEC), with the generative AI flag where relevant. Civitai extends a similar multi-layered approach to moderating all uploaded media hosted on their platform.
Civitai also established terms of service prohibiting exploitative material, employed new technologies like semi-permeable membranes to mitigate the generation of harmful content in their cloud-hosted models, and created pathways for users to report concerning content (both content generated by cloud-hosted models, and more generally any uploaded media hosted on their platform). They also established a system to report and remove third-party models that violate their child safety policies, adding these models to internal hashlists so that attempts to re-upload those models can be blocked.
There remain some areas that require more progress to meet their commitments.
Notably, Civitai will need to implement hashing and matching against verified CSAM lists across their interventions for more robust detection, and extend prevention messaging to their search functionality. They will also need to develop strategies for assessing the output content generated by their cloud-hosted models and for incorporating content provenance into that generated content. Newly uploaded models will need to be assessed for child safety violations before they are hosted, and currently hosted models will need systematic, retroactive assessments to meet their commitments. Finally, Civitai will need to add a child safety section to model cards on their platform, so that each model has associated information outlining the steps taken to prioritize child safety during its development.
Further, they will need to determine a strategy to prevent nudifying services and models, which can be used to sexualize benign depictions of children, from being uploaded to or used on their site.
For more detail on how Civitai has made progress on their commitments, and where there still remains work to be done, see the full report here.
Metaphysic: Three-Month Progress
Metaphysic, which develops first-party generative AI models to create photorealistic video content for film studios, also reports progress in safeguarding their AI development process and ensuring responsible model hosting.
The company sources data directly from film studios with contractual warranties against illegal material. They also require the studios to obtain consent from the individuals depicted in the data before sharing the data. This approach is intended to provide a legal and ethical foundation for ML/AI training, reducing the risk of inadvertently using exploitative content.
Metaphysic also employs human moderators to review all received data and generated media. They have also implemented ML/AI tools to detect and separate sexual content from depictions of children in training data, helping prevent inappropriate associations. Furthermore, Metaphysic has adopted the Coalition for Content Provenance and Authenticity (C2PA) standard across their data pipelines to aid in verifying the origin and authenticity of AI-generated content.
Metaphysic’s strategy for responsibly deploying their models focuses on controlling access to their generative models (limiting access to just Metaphysic employees). They also have processes in place to receive regular feedback from their customers, including any feedback related to content that may contain illegal or unethical material. Further, their internal processes have been updated such that all datasets and model cards now contain a child safety section detailing the steps taken during model development to prioritize child safety.
There remain some areas that require more progress to meet their commitments. Metaphysic will need to incorporate consistent red-teaming and model assessment for child safety violations in their model development process. This will involve systematic stress testing of their models to identify potential vulnerabilities that bad actors might exploit.
Additionally, while C2PA has built a strong technology foundation for companies to adopt, it was not built with adversarial misuse in mind. In order to meet this commitment, Metaphysic will need to engage with C2PA to better understand the ways in which C2PA is and is not robust to adversarial misuse, and – if necessary – support development and adoption of solutions that are sufficiently robust.
For more detail on how Metaphysic has made progress on their commitments, and where there still remains work to be done, see the full report here.