About Digital Harms Tracker

The Digital Harms Tracker is a comprehensive database documenting verified cases where individuals have been harmed by digital platforms and AI systems. We track incidents across categories such as misinformation and disinformation, child safety and exploitation (sextortion, grooming, dangerous challenges), self-harm and suicide, addiction and mental health harms, algorithmic discrimination, fraud and financial harm, privacy and surveillance violations, and autonomous systems harm.

Our mission is to provide NGOs, policymakers, and research institutions with structured, verifiable data about the real-world harms caused by technology platforms and systems. By making this evidence accessible, searchable, and exportable, we aim to strengthen the factual foundation for digital safety regulation, academic research, and public interest advocacy.

This project is not advocacy. We do not take positions on policy solutions, but rather document what has happened, where, and to whom, based on credible reporting and official records. The data is freely available for use in policy briefs, academic papers, litigation support, and public awareness campaigns.

We are publishing this database while it is still being built. That is a deliberate choice: we believe an imperfect public record is more useful than a perfect private one. But it means users should understand what is settled, what is in progress, and what is still being worked out.

Incident Definition Still Being Refined

We have a working definition of what counts as an incident, but we are still stress-testing it against edge cases. Some entries in the database were classified under earlier, looser criteria. As our definition firms up, we will retroactively review and reclassify where needed. The Methodology section below reflects our current working standard.

AI Pipelines Still Being Built Out

Our automated collection and classification pipeline is operational but not yet at full capacity. Some stages (particularly deduplication, entity resolution for new actors, and harm-type tagging) are still being tuned. Human review currently compensates for pipeline gaps, but coverage will improve as the automation matures.

Policy Data Still Being Backfilled

Live monitoring of federal and state legislation began in early 2026. Historical policy data (bills, hearings, and regulatory actions predating that point) is being added manually and is not yet comprehensive. Gaps in the policy record do not mean nothing happened; they may simply mean we have not reached that period yet.

Incident Backfill to 2020 In Progress

Our goal is to have systematic coverage of incidents back to 2020. We are currently working out the methodology for structured historical backfill: identifying the right sources, resolving entities against our existing actor and platform records, and maintaining date accuracy for events reported long after they occurred. This work is ongoing.

If you spot an error, a gap, or a misclassification, please use the tip line to let us know. Every correction improves the record for everyone who relies on it.

Working document v0.1 · Digital Harms Tracker

What We Mean by "Harm"

A digital harm is an adverse outcome experienced by an identifiable person, defined group, or institution, caused or materially enabled by the design, operation, or use of a digital platform, algorithmic system, or AI technology.

Drawing on established academic and regulatory frameworks, the Tracker recognizes harm across six dimensions of impact:

Impact Type | Description | Source Tradition
Physical | Bodily injury or death | Agrafiotis et al. 2018; Citron & Solove 2022
Psychological | Emotional distress, trauma, mental health deterioration | Scheuerman et al. 2021; Citron & Solove 2022
Economic | Financial loss, property damage, livelihood disruption | Agrafiotis et al. 2018; OECD 2024
Reputational | Damage to standing, dignity, or public perception | Solove 2006; Agrafiotis et al. 2018
Autonomy | Loss of agency, manipulation, coercion, or denial of self-determination | Citron & Solove 2022; EU DSA Art. 34
Discriminatory | Unequal treatment, denial of opportunity, or reinforcement of unjust hierarchies based on identity | Shelby et al. 2023; Citron & Solove 2022

A single incident may produce harm across multiple dimensions simultaneously.

What We Mean by "Incident"

An incident is a discrete, documented event in which a digital platform or AI system caused or materially enabled real-world harm to an identifiable harm recipient. It is the foundational unit of the Tracker's evidence base.

The Three-Part Test

Every incident must satisfy all three of the following criteria. If any one is absent, the record is not an incident.

1. A discrete real-world event

Something that happened: a specific occurrence with identifiable circumstances, not a trend, pattern, or ongoing condition described in the abstract. The event must be situated in time, even if the exact date requires estimation.

Qualifies: A 14-year-old in the UK died by suicide after prolonged exposure to self-harm content on Instagram.

Does not qualify: Teen suicide rates are rising due to social media.

2. An identifiable harm recipient

The person, organization, or group who experienced the harm must be identifiable through credible reporting. Three categories qualify:

  • Named individual: A specific person identified by name or sufficiently detailed description in credible reporting (e.g., Molly Russell; a Tennessee grandmother identified as Angela Lipps).
  • Named organization or institution: A specific entity that suffered documented harm (e.g., the Enschede municipality's welfare system; Horizon Healthcare Services).
  • Defined group constituted by the harm mechanism itself: A group whose membership is defined by the platform action or algorithmic process that caused the harm. The group must be bounded by the incident, not by pre-existing demographics alone.

Qualifies: Black applicants screened out by Workday's AI hiring tool between 2020 and 2023. The algorithm created the affected class through its discriminatory function; membership is documented through the litigation record.

Does not qualify: Teenage girls on Instagram. This is a demographic category, not a group constituted by a specific harmful platform action. It becomes valid only when tied to a specific mechanism, time period, and documented impact, for example, underage users served eating disorder content by Instagram's recommendation algorithm in the period documented by the 2021 Wall Street Journal investigation.

3. Platform causation

The digital platform or AI system must be a proximate cause or necessary enabler of the harm. The platform must be more than incidentally present; it must be a but-for cause, meaning the specific harm would not have occurred, or would not have occurred in this form or at this scale, without the platform's involvement.

Qualifies: Sextortion conducted via Instagram DMs targeting a specific minor. The platform's messaging infrastructure was the necessary vehicle.

Does not qualify: A person described their depression on Twitter. The platform was present but not causally implicated in the depression.
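
The three-part test can be pictured as a simple validation routine. This is an illustrative sketch, not the Tracker's production code; the record fields (`event_date`, `recipient`, `platform_causation`) are hypothetical names chosen to mirror the three criteria.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CandidateRecord:
    """Hypothetical candidate record awaiting incident classification."""
    event_date: Optional[str]   # criterion 1: discrete event situated in time (may be estimated)
    recipient: Optional[str]    # criterion 2: named individual, organization, or mechanism-defined group
    platform_causation: bool    # criterion 3: platform was a but-for cause or necessary enabler

def is_incident(record: CandidateRecord) -> bool:
    """All three criteria must hold; if any one is absent, the record is not an incident."""
    has_discrete_event = record.event_date is not None
    has_recipient = record.recipient is not None
    return has_discrete_event and has_recipient and record.platform_causation

# A trend story with no dated event and no identifiable victim fails the test.
trend = CandidateRecord(event_date=None, recipient=None, platform_causation=True)

# A documented event with a named recipient and platform causation passes.
case = CandidateRecord(event_date="2017-11", recipient="Molly Russell", platform_causation=True)
```

The conjunction matters: the test is deliberately all-or-nothing, so a record missing any single criterion is routed out of the incidents collection.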

What We Mean by "Platform Mechanism"

Following the World Economic Forum's Typology of Online Harms (2023), every incident is tagged with the mechanism through which harm occurred:

Mechanism | Description | Examples
Content | Harm from exposure to problematic material produced, distributed, or amplified by the platform | Algorithmic recommendation of self-harm content; AI-generated CSAM; deepfake disinformation
Contact | Harm from interactions with other users enabled by platform infrastructure | Grooming via DMs; sextortion; cyberbullying campaigns
Conduct | Harm from behaviors enabled or amplified by platform design and affordances | Coordinated harassment; platform-facilitated fraud; unauthorized surveillance via data collection

What Does Not Qualify as an Incident

The following are not incidents under this definition, regardless of their importance or newsworthiness. They belong elsewhere in the database or are outside its scope.

  • Policy and regulatory actions: Legislation, regulation, executive orders, court rulings → policies collection
  • Litigation events: Lawsuits, enforcement actions, AG investigations → litigation collection (the underlying harm they describe may qualify separately)
  • Company-knew accountability stories: Platform awareness of harm patterns without a specific documented victim
  • Research and audits: Academic studies, algorithmic audits, expert reviews (may inform incident records but are not incidents themselves)
  • Trend and pattern reporting: “Sextortion is rising” or “deepfake fraud is increasing,” without a specific victim event
  • Opinion, analysis, and commentary: Editorials, explainers, advocacy pieces
  • Societal and democratic harms without a discrete event: Erosion of trust, polarization, epistemic degradation (these are real and important harms recognized in the literature, but they resist the incident model; they emerge from the connections between incidents, policies, and litigation in the database)

A Note on Societal Harm

Academic literature and regulation recognize a category of social system and societal harms. Shelby et al. (2023) names it as a standalone harm category, with sub-themes including erosion of democracy, election interference, and information harms. The EU Digital Services Act (Recital 82, Article 34) requires platforms to assess systemic risks to democratic processes, civic discourse, and public security. Digital Action's taxonomy arrives at the same substantive concern through a different route: it treats societal and democratic damage as cumulative effects of its five core harm types — disinformation, hate speech, harassment, censorship, and privacy violations — rather than a standalone category.

Together, these frameworks confirm that harms to democratic processes, civic discourse, and information ecosystems are real, well-documented, and central to the policy landscape the Tracker serves. Where we refer to effects on institutional trust, we use the term as a recognized downstream consequence, not a formally named harm category in any of the three sources.

However, societal harms are diffuse by nature: they lack discrete events, identifiable victims, and clear start dates. The Tracker captures them not as incidents but as emergent patterns visible through the data: clusters of incidents linked to the same platform, the same harm type, and the same jurisdiction reveal systemic harm that no individual record could express alone. The junction tables connecting incidents to policies and litigation are where societal harm becomes visible.

This is a deliberate architectural choice: individual incidents stay evidentiarily tight, while systemic harm emerges from the structure of the database itself.

Key Sources

  • Agrafiotis, I. et al. (2018). "A Taxonomy of Cyber-Harms." Journal of Cybersecurity, 4(1). Oxford.
  • Citron, D.K. & Solove, D.J. (2022). "Privacy Harms." Boston University Law Review, 102(3), 793.
  • OECD (2024). "Defining AI Incidents and Related Terms." OECD Artificial Intelligence Papers, No. 16.
  • Scheuerman, M. et al. (2021). "A Framework of Severity for Harmful Content Online." arXiv:2108.04401.
  • Shelby, R. et al. (2023). "Sociotechnical Harms of Algorithmic Systems: Scoping a Taxonomy for Harm Reduction." AAAI/ACM AIES 2023.
  • Solove, D.J. (2006). "A Taxonomy of Privacy." University of Pennsylvania Law Review, 154(3), 477.
  • World Economic Forum (2023). "Typology of Online Harms." Global Coalition for Digital Safety.

Incidents vs. Articles: The database tracks incidents, distinct events where someone was harmed by a digital platform or AI system. A single incident may be covered by multiple news articles from different sources. Our system links related articles together under one incident record, so the database reflects unique events rather than volume of media coverage.
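
The article-to-incident relationship above amounts to grouping coverage under a shared incident key. A minimal sketch, assuming each article has already been matched to an `incident_id` (the matching logic itself is not shown, and the URLs are placeholders):

```python
from collections import defaultdict

# Hypothetical article rows: three stories, two underlying events.
articles = [
    {"url": "https://example.org/a", "incident_id": "inc-001"},
    {"url": "https://example.org/b", "incident_id": "inc-001"},
    {"url": "https://example.org/c", "incident_id": "inc-002"},
]

def group_by_incident(articles):
    """Collapse article-level rows into incident-level records, so counts
    reflect unique harm events rather than volume of media coverage."""
    incidents = defaultdict(list)
    for article in articles:
        incidents[article["incident_id"]].append(article["url"])
    return dict(incidents)

grouped = group_by_incident(articles)
```

Counting `grouped` rather than `articles` is what keeps a heavily covered story from appearing as multiple events in the database.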

Data Collection: Our data collection methodology has evolved to combine historical research with real-time monitoring:

  • Pre-2026 Historical Data: Incidents occurring before January 1, 2026, were identified and catalogued through a combination of AI-powered research agents and human coding with oversight. This systematic review of historical sources aims for comprehensive coverage of documented digital harms.
  • 2026 Forward (Live Monitoring): Starting in March 2026, incidents are collected automatically every hour from global RSS feeds covering major news outlets, technology publications, and investigative journalism sources. This real-time monitoring enables rapid identification of emerging incidents.

Data Sources: We monitor RSS feeds from major news organizations and investigative journalism outlets. Every incident in our database is linked to the original reporting, so users can verify the information directly.

Verification: We only include incidents that have been reported by credible sources with specific details about what happened, when, and to whom. We do not include rumors, unverified claims, or incidents lacking sufficient documentation. When a source article does not include a specific date for an event, we conduct additional research to establish the most accurate date possible. All AI-identified incidents undergo human review before publication.

Categorization: Each incident is classified into one of eight harm domains (Misinfo & Disinfo, Child Safety, Self-Harm & Suicide, Addiction & Mental Health, Algorithmic Discrimination, Fraud & Financial, Privacy & Surveillance, Autonomous Systems), tagged with specific harm types from a controlled taxonomy, and linked to the platforms and companies involved. This structured approach enables systematic analysis and pattern identification across time, platforms, and harm types.

Scope: Incidents are tracked globally. The policy tracker currently focuses on United States federal and state legislation, with plans to expand to international jurisdictions in the future.

Updates: The database is updated continuously as new incidents are reported. We also update existing entries when new information becomes available, such as legal outcomes or platform responses.

Harm Types We Track

Each incident is classified into one of eight harm domains and tagged with specific harm types from the taxonomy below. This controlled vocabulary ensures consistent categorization and enables filtering and analysis by harm type.

Misinformation & Disinformation

Misinformation, Disinformation, Synthetic Media, Algorithmic Amplification

Child Safety & Exploitation

Sextortion, CSAM, Grooming, Trafficking, Dangerous Challenge, Drug Facilitated Harm

Self-Harm & Suicide

Suicide, Self-Harm, Chatbot Harm

Addiction & Mental Health

Eating Disorder, Addiction, Cyberbullying, Harassment

Algorithmic Discrimination

Wrongful Arrest, Discrimination, Hiring Bias

Fraud & Financial Harm

Deepfake Fraud, Voice Cloning Fraud, AI-Powered Financial Fraud

Privacy & Surveillance

Deepfake NCII, Non-Consensual Imagery, Unauthorized Surveillance

Autonomous Systems Harm

Autonomous Vehicle, Medical AI Error
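
The controlled vocabulary above can be represented as a domain-to-harm-type mapping with a simple validity check. This is a sketch only; the keys mirror the taxonomy as listed on this page, not an official schema export.

```python
# Eight harm domains, each with its controlled set of harm-type tags.
HARM_TAXONOMY = {
    "Misinformation & Disinformation": {"Misinformation", "Disinformation", "Synthetic Media", "Algorithmic Amplification"},
    "Child Safety & Exploitation": {"Sextortion", "CSAM", "Grooming", "Trafficking", "Dangerous Challenge", "Drug Facilitated Harm"},
    "Self-Harm & Suicide": {"Suicide", "Self-Harm", "Chatbot Harm"},
    "Addiction & Mental Health": {"Eating Disorder", "Addiction", "Cyberbullying", "Harassment"},
    "Algorithmic Discrimination": {"Wrongful Arrest", "Discrimination", "Hiring Bias"},
    "Fraud & Financial Harm": {"Deepfake Fraud", "Voice Cloning Fraud", "AI-Powered Financial Fraud"},
    "Privacy & Surveillance": {"Deepfake NCII", "Non-Consensual Imagery", "Unauthorized Surveillance"},
    "Autonomous Systems Harm": {"Autonomous Vehicle", "Medical AI Error"},
}

def validate_tags(domain: str, tags: set) -> bool:
    """An incident's tags are valid only if every tag belongs to its domain's vocabulary."""
    return domain in HARM_TAXONOMY and tags <= HARM_TAXONOMY[domain]
```

Rejecting out-of-vocabulary tags at write time is what keeps filtering and cross-incident analysis consistent.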

Entity Resolution and Linking

The Digital Harms Tracker distinguishes between three levels of actor in the digital harm ecosystem: legal entities (organizations), products and services (platforms), and the incidents connecting them. To support profile pages, cross-incident aggregation, and structured filtering, all three levels are linked relationally rather than stored as free text.

Actor (Organization) Resolution

The tracker maintains a dedicated actors collection covering 105 organizations including technology companies, government agencies, NGOs, media organizations, and military bodies. Each record is anchored to a Wikidata QID, the stable, dereferenceable identifier used by Wikidata's knowledge graph, which serves as the canonical reference for entity disambiguation.

The Wikidata QID was chosen as the resolution anchor for several reasons: it provides stable cross-references to other authoritative sources (SEC EDGAR, GLEIF, OpenCorporates); it stores canonical English labels and aliases that resolve common naming variants (e.g., “Facebook,” “Meta,” “Meta Platforms, Inc.” all resolve to Q380); and it is freely queryable without licensing constraints.
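
Alias resolution of the kind described can be sketched as a normalized lookup. The Q380 mapping comes from the example above; the hardcoded table is illustrative, since a production resolver would query Wikidata's labels and aliases rather than maintain a static map.

```python
# Illustrative alias table; a real resolver would pull labels and
# aliases from Wikidata rather than hardcode them.
ALIAS_TO_QID = {
    "facebook": "Q380",
    "meta": "Q380",
    "meta platforms, inc.": "Q380",
}

def resolve_actor(name: str):
    """Resolve a raw actor mention to its canonical Wikidata QID, if known.
    Normalizing case and whitespace handles common naming variants."""
    return ALIAS_TO_QID.get(name.strip().lower())
```

Unresolved names return `None` rather than guessing, which is where the manually curated records without a QID come in.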

Platform Resolution

The platforms collection covers 80 distinct products and services. Each platform record is linked to its parent actor via a many-to-one relation, enabling bidirectional traversal: an actor profile can surface all associated platforms, and a platform record identifies its owning organization.

Incident Linking

Incidents are linked to actors and platforms through dedicated many-to-many junction tables. These junctions are the primary mechanism by which the tracker aggregates harm exposure per organization and per product.
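
The relational shape described in this section (actors, platforms with a many-to-one parent link, and many-to-many incident junctions) can be sketched in SQL. Table and column names here are illustrative, not the Tracker's actual schema; SQLite stands in for whatever database the tracker uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE actors (id INTEGER PRIMARY KEY, name TEXT, wikidata_qid TEXT);
-- Many-to-one: each platform points at its parent actor.
CREATE TABLE platforms (id INTEGER PRIMARY KEY, name TEXT,
                        actor_id INTEGER REFERENCES actors(id));
CREATE TABLE incidents (id INTEGER PRIMARY KEY, title TEXT);
-- Junction table: one incident can involve many platforms, and vice versa.
CREATE TABLE incident_platforms (
    incident_id INTEGER REFERENCES incidents(id),
    platform_id INTEGER REFERENCES platforms(id),
    PRIMARY KEY (incident_id, platform_id));
""")
conn.execute("INSERT INTO actors VALUES (1, 'Meta Platforms', 'Q380')")
conn.execute("INSERT INTO platforms VALUES (1, 'Instagram', 1)")
conn.execute("INSERT INTO incidents VALUES (1, 'Example incident')")
conn.execute("INSERT INTO incident_platforms VALUES (1, 1)")

# Aggregate harm exposure per actor by traversing platform -> parent actor.
row = conn.execute("""
    SELECT a.name, COUNT(*) AS incident_count
    FROM incident_platforms ip
    JOIN platforms p ON p.id = ip.platform_id
    JOIN actors a ON a.id = p.actor_id
    GROUP BY a.id
""").fetchone()
```

The final query is the bidirectional traversal the section describes: incidents roll up through platforms to their owning organizations.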

Limitations

Entity resolution coverage is strongest for large technology companies and well-documented platforms. Coverage thins for smaller actors, regional entities, and stalkerware vendors, which are frequently absent from Wikidata or have sparse records. Approximately 25% of actor records carry no Wikidata QID and are maintained through manual curation.

How We Use AI

Our system uses AI to process large volumes of news articles efficiently while maintaining high accuracy through a multi-stage pipeline with human oversight:

Stage 1
Content Collection

Every hour, our system automatically monitors global RSS feeds from major news outlets and technology publications. New articles are collected, deduplicated, and queued for analysis.

Stage 2
Relevance Classification

AI models read each article and classify its relevance to digital harms, determining whether the article documents a specific incident where people were harmed by a digital platform, as opposed to general tech news.

Stage 3
Entity Extraction & Incident Creation

For relevant articles, AI performs Named Entity Recognition to extract structured data: platforms involved, companies responsible, harm types, victim details, dates, locations, and financial losses. Multiple articles about the same event are linked to a single incident record.

Stage 4
Human Review

All AI-classified incidents are reviewed by human analysts before being published. This ensures accuracy, merges duplicate incident reports, and catches edge cases the model might misclassify.
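
The four stages can be pictured as a linear pipeline with a human gate at the end. The function names and article fields below are placeholders, not the production system; the classify, extract, and approval steps are injected as stand-ins.

```python
def run_pipeline(raw_articles, classify, extract, human_approves):
    """Sketch of the four-stage flow: collect -> classify -> extract -> human review."""
    # Stage 1: deduplicate collected articles by URL before queueing.
    seen, queue = set(), []
    for article in raw_articles:
        if article["url"] not in seen:
            seen.add(article["url"])
            queue.append(article)
    # Stage 2: keep only articles classified as documenting a specific incident.
    relevant = [a for a in queue if classify(a)]
    # Stage 3: extract a structured incident candidate from each relevant article.
    candidates = [extract(a) for a in relevant]
    # Stage 4: nothing is published without human sign-off.
    return [c for c in candidates if human_approves(c)]

# Toy run: a duplicate article, plus one that is general tech news.
articles = [
    {"url": "u1", "relevant": True},
    {"url": "u1", "relevant": True},   # duplicate, dropped at Stage 1
    {"url": "u2", "relevant": False},  # not incident-specific, dropped at Stage 2
]
published = run_pipeline(
    articles,
    classify=lambda a: a["relevant"],
    extract=lambda a: {"source": a["url"]},
    human_approves=lambda c: True,
)
```

Keeping the human gate as the last stage means model errors can over- or under-queue work, but cannot publish on their own.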

The Digital Harms Tracker is built and maintained by Gregory Maly, a social scientist by training and data practitioner by profession, with two decades applying technology to complex problems across conflict studies, global health, demography, and public policy. He currently serves as AI & Data Science Editor at The Outlaw Ocean Project. gamaly.github.io →

The tracker is not funded by any platform, company, or advocacy organization, and takes no positions on policy solutions. It is designed as shared infrastructure for parliamentary committees, university researchers, NGOs, and journalists who need structured, verifiable evidence about digital harms.

The tracker is a living project. Here is what we are actively working toward:

Litigation Tracker

Documenting civil and regulatory lawsuits tied to digital harms, case status, plaintiffs, claims, and outcomes linked to relevant incidents.

Lobby & Influence Data

Mapping corporate lobbying activity and political donations by the companies in our database, linked to the policy areas where their platforms appear in incidents.

Policy Voting Records

Tracking how senators, representatives, and state legislators vote on digital safety legislation, cross-referenced with the policy tracker.

Expanding Personal Privacy

Deepening coverage of data broker practices, surveillance-as-a-service, and consent violations, all of which are currently underrepresented relative to their real-world scale.

Additional Harm Deep-Dives

Structured analysis series on specific harm clusters, starting with election integrity and AI-generated disinformation, with dedicated filtering, timelines, and policy mapping.

Scope decisions are not value judgments; they reflect where we can maintain data quality and analytical focus. Our guiding principle is: we follow the harm, not the technology. If an incident occurs on a gaming platform or a crypto exchange and falls within one of our eight harm domains, we may include it. What we do not do is comprehensively catalog every incident in these adjacent spaces.

The exclusions below reflect areas where dedicated databases already exist, where the regulatory and actor landscape is sufficiently distinct, or where our current data pipeline has structural limitations.

Pure Cybersecurity Incidents

Data breaches caused by external attackers (malware, phishing, credential stuffing) are generally outside our scope. Established resources like the CVE program and the NIST National Vulnerability Database already catalog the vulnerabilities behind such intrusions. Where we do include breach-related incidents (such as the National Public Data bankruptcy or the Equifax settlement), it is because the harm resulted from a foreseeable failure of platform stewardship (a design or governance choice) rather than a purely technical intrusion. The line is not always clean, and we acknowledge the ambiguity.

Gaming Platforms (Comprehensively)

We do not attempt to catalog every harm arising from gaming, loot box mechanics, competitive disputes, or in-game monetization practices. However, when gaming platforms produce harms that fall squarely within our core domains (child safety, addiction, privacy), we do include them. The Fortnite addiction lawsuit and Epic Games COPPA settlement are examples of gaming incidents that belong in this database.

Cryptocurrency & Web3 (Comprehensively)

We do not comprehensively track fraud, rug pulls, or financial harms endemic to crypto exchanges and DeFi protocols. The actor set, regulatory framework, and harm dynamics are distinct enough to warrant their own resource. That said, an AI-powered crypto scam targeting consumers through a mainstream platform, or a crypto firm deploying manipulative design against vulnerable users, may fall within our fraud or mental health domains and would be considered on a case-by-case basis.

Workplace & Labor Platform Harms

Algorithmic management, gig economy exploitation, and workplace surveillance overlap with our algorithmic discrimination domain but represent a distinct research field with its own regulatory framework and advocacy community. We are not resourced to cover this thoroughly, and partial coverage risks creating a misleading picture. We are watching this space and may revisit.

State-Sponsored Information Operations

Government-backed disinformation campaigns, state censorship, and cyber warfare are geopolitical issues more than platform accountability issues. Our tracker focuses on harms caused by platforms and AI systems themselves: their design choices, governance failures, and product decisions. Where a platform knowingly enables or fails to moderate state-sponsored content that causes harm, the platform's response (or inaction) may be the incident. The state actor itself is not our primary subject.

Non-English Media Sources

Our automated RSS pipeline currently monitors English-language sources. This means incidents documented only in local-language reporting, particularly across Southeast Asia, Latin America, the Middle East, and sub-Saharan Africa, are likely underrepresented. We acknowledge this is a significant limitation for a database with global ambitions, and expanding language coverage is a longer-term goal.

These boundaries are not fixed. As the database matures, our pipeline expands, and the harm landscape evolves, we will continuously re-evaluate what belongs in scope. If you believe an area is being handled inconsistently or that a boundary should be reconsidered, we welcome that feedback.