About Digital Harms Tracker
The Digital Harms Tracker is a comprehensive database documenting verified cases where individuals have been harmed by digital platforms and AI systems. We track incidents across categories such as misinformation and disinformation, child safety and exploitation (sextortion, grooming, dangerous challenges), self-harm and suicide, addiction and mental health harms, algorithmic discrimination, fraud and financial harm, privacy and surveillance violations, and autonomous systems harm.
Our mission is to provide NGOs, policymakers, and research institutions with structured, verifiable data about the real-world harms caused by technology platforms and systems. By making this evidence accessible, searchable, and exportable, we aim to strengthen the factual foundation for digital safety regulation, academic research, and public interest advocacy.
This project is not advocacy. We do not take positions on policy solutions, but rather document what has happened, where, and to whom, based on credible reporting and official records. The data is freely available for use in policy briefs, academic papers, litigation support, and public awareness campaigns.
We are publishing this database while it is still being built. That is a deliberate choice: we believe an imperfect public record is more useful than a perfect private one. But it means users should understand what is settled, what is in progress, and what is still being worked out.
We have a working definition of what counts as an incident, but we are still stress-testing it against edge cases. Some entries in the database were classified under earlier, looser criteria. As our definition firms up, we will retroactively review and reclassify where needed. The Methodology section below reflects our current working standard.
Our automated collection and classification pipeline is operational but not yet at full capacity. Some stages (particularly deduplication, entity resolution for new actors, and harm-type tagging) are still being tuned. Human review currently compensates for pipeline gaps, but coverage will improve as the automation matures.
Live monitoring of federal and state legislation began in early 2026. Historical policy data (bills, hearings, and regulatory actions predating that point) is being added manually and is not yet comprehensive. Gaps in the policy record do not mean nothing happened; they may simply mean we have not reached that period yet.
Our goal is to have systematic coverage of incidents back to 2020. We are currently working out the methodology for structured historical backfill: identifying the right sources, resolving entities against our existing actor and platform records, and maintaining date accuracy for events reported long after they occurred. This work is ongoing.
If you spot an error, a gap, or a misclassification, please use the tip line to let us know. Every correction improves the record for everyone who relies on it.
Working document v0.1 · Digital Harms Tracker
What We Mean by "Harm"
A digital harm is an adverse outcome (experienced by an identifiable person, defined group, or institution) that is caused or materially enabled by the design, operation, or use of a digital platform, algorithmic system, or AI technology.
Drawing on established academic and regulatory frameworks, the Tracker recognizes harm across six dimensions of impact:
| Impact Type | Description | Source Tradition |
|---|---|---|
| Physical | Bodily injury or death | Agrafiotis et al. 2018; Citron & Solove 2022 |
| Psychological | Emotional distress, trauma, mental health deterioration | Scheuerman et al. 2021; Citron & Solove 2022 |
| Economic | Financial loss, property damage, livelihood disruption | Agrafiotis et al. 2018; OECD 2024 |
| Reputational | Damage to standing, dignity, or public perception | Solove 2006; Agrafiotis et al. 2018 |
| Autonomy | Loss of agency, manipulation, coercion, or denial of self-determination | Citron & Solove 2022; EU DSA Art. 34 |
| Discriminatory | Unequal treatment, denial of opportunity, or reinforcement of unjust hierarchies based on identity | Shelby et al. 2023; Citron & Solove 2022 |
A single incident may produce harm across multiple dimensions simultaneously.
What We Mean by "Incident"
An incident is a discrete, documented event in which a digital platform or AI system caused or materially enabled real-world harm to an identifiable harm recipient. It is the foundational unit of the Tracker's evidence base.
The Three-Part Test
Every incident must satisfy all three of the following criteria. If any one is absent, the record is not an incident.
1. A discrete real-world event
Something that happened: a specific occurrence with identifiable circumstances, not a trend, pattern, or ongoing condition described in the abstract. The event must be situated in time, even if the exact date requires estimation.
Qualifies: A 14-year-old in the UK died by suicide after prolonged exposure to self-harm content on Instagram.
Does not qualify: "Teen suicide rates are rising due to social media."
2. An identifiable harm recipient
The person, organization, or group who experienced the harm must be identifiable through credible reporting. Three categories qualify:
- Named individual: A specific person identified by name or sufficiently detailed description in credible reporting (e.g., Molly Russell; a Tennessee grandmother identified as Angela Lipps).
- Named organization or institution: A specific entity that suffered documented harm (e.g., the Enschede municipality's welfare system; Horizon Healthcare Services).
- Defined group constituted by the harm mechanism itself: A group whose membership is defined by the platform action or algorithmic process that caused the harm. The group must be bounded by the incident, not by pre-existing demographics alone.
Qualifies: Black applicants screened out by Workday's AI hiring tool from 2020 to 2023. The algorithm created the affected class through its discriminatory function; membership is documented through the litigation record.
Does not qualify: Teenage girls on Instagram. This is a demographic category, not a group constituted by a specific harmful platform action. It becomes valid only when tied to a specific mechanism, time period, and documented impact: for example, underage users served eating disorder content by Instagram's recommendation algorithm during the period documented by the 2021 Wall Street Journal investigation.
3. Platform causation
The digital platform or AI system must be a proximate cause or necessary enabler of the harm. The platform must be more than incidentally present: it must be a but-for cause, meaning the specific harm would not have occurred, or would not have occurred in this form or at this scale, without the platform's involvement.
Qualifies: Sextortion conducted via Instagram DMs targeting a specific minor. The platform's messaging infrastructure was the necessary vehicle.
Does not qualify: A person described their depression on Twitter. The platform was present but not causally implicated in the depression.
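The three-part test is a strict conjunction: failing any one criterion disqualifies the record. A minimal sketch in Python (the record fields, dates, and helper names are illustrative assumptions, not the Tracker's actual schema):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CandidateRecord:
    """Illustrative fields only; not the Tracker's actual schema."""
    event_date: Optional[str]        # ISO date, possibly estimated
    is_discrete_event: bool          # a specific occurrence, not a trend or pattern
    harm_recipient: Optional[str]    # named person, named org, or mechanism-bounded group
    platform_is_but_for_cause: bool  # harm would not have occurred (in this form/scale) without the platform

def passes_three_part_test(r: CandidateRecord) -> bool:
    """All three criteria must hold; if any one is absent, the record is not an incident."""
    has_discrete_event = r.is_discrete_event and r.event_date is not None
    has_recipient = bool(r.harm_recipient)
    has_causation = r.platform_is_but_for_cause
    return has_discrete_event and has_recipient and has_causation

# A trend report fails the first criterion (no discrete, dated event):
trend = CandidateRecord(None, False, None, False)
# A documented case with a dated event, identifiable victim, and but-for causation passes:
case = CandidateRecord("2023-05-01", True, "a specific minor identified in reporting", True)
```

The conjunction mirrors the editorial rule above; in practice each check is a human judgment backed by source documents, not a boolean field.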
What We Mean by "Platform Mechanism"
Following the World Economic Forum's Typology of Online Harms (2023), every incident is tagged with the mechanism through which harm occurred:
| Mechanism | Description | Examples |
|---|---|---|
| Content | Harm from exposure to problematic material produced, distributed, or amplified by the platform | Algorithmic recommendation of self-harm content; AI-generated CSAM; deepfake disinformation |
| Contact | Harm from interactions with other users enabled by platform infrastructure | Grooming via DMs; sextortion; cyberbullying campaigns |
| Conduct | Harm from behaviors enabled or amplified by platform design and affordances | Coordinated harassment; platform-facilitated fraud; unauthorized surveillance via data collection |
What Does Not Qualify as an Incident
The following are not incidents under this definition, regardless of their importance or newsworthiness. They belong elsewhere in the database or are outside its scope.
- Policy and regulatory actions: Legislation, regulation, executive orders, court rulings → policies collection
- Litigation events: Lawsuits, enforcement actions, AG investigations → litigation collection (the underlying harm they describe may qualify separately)
- "Company knew" accountability stories: Platform awareness of harm patterns without a specific documented victim
- Research and audits: Academic studies, algorithmic audits, expert reviews (may inform incident records but are not incidents themselves)
- Trend and pattern reporting: "Sextortion is rising" or "deepfake fraud is increasing," without a specific victim event
- Opinion, analysis, and commentary: Editorials, explainers, advocacy pieces
- Societal and democratic harms without a discrete event: Erosion of trust, polarization, epistemic degradation (these are real and important harms recognized in the literature, but they resist the incident model; they emerge from the connections between incidents, policies, and litigation in the database)
A Note on Societal Harm
Academic literature and regulation recognize a category of social system and societal harms. Shelby et al. (2023) names it as a standalone harm category, with sub-themes including erosion of democracy, election interference, and information harms. The EU Digital Services Act (Recital 82, Article 34) requires platforms to assess systemic risks to democratic processes, civic discourse, and public security. Digital Action's taxonomy arrives at the same substantive concern through a different route: it treats societal and democratic damage as cumulative effects of its five core harm types — disinformation, hate speech, harassment, censorship, and privacy violations — rather than a standalone category.
Together, these frameworks confirm that harms to democratic processes, civic discourse, and information ecosystems are real, well-documented, and central to the policy landscape the Tracker serves. Where we refer to effects on institutional trust, we use the term as a recognized downstream consequence, not a formally named harm category in any of the three sources.
However, societal harms are diffuse by nature: they lack discrete events, identifiable victims, and clear start dates. The Tracker captures them not as incidents but as emergent patterns visible through the data: clusters of incidents linked to the same platform, the same harm type, and the same jurisdiction reveal systemic harm that no individual record could express alone. The junction tables connecting incidents to policies and litigation are where societal harm becomes visible.
This is a deliberate architectural choice: individual incidents stay evidentiarily tight, while systemic harm emerges from the structure of the database itself.
Key Sources
- Agrafiotis, I. et al. (2018). "A Taxonomy of Cyber-Harms." Journal of Cybersecurity, 4(1). Oxford.
- Citron, D.K. & Solove, D.J. (2022). "Privacy Harms." Boston University Law Review, 102(3), 793.
- OECD (2024). "Defining AI Incidents and Related Terms." OECD Artificial Intelligence Papers, No. 16.
- Scheuerman, M. et al. (2021). "A Framework of Severity for Harmful Content Online." arXiv:2108.04401.
- Shelby, R. et al. (2023). "Sociotechnical Harms of Algorithmic Systems: Scoping a Taxonomy for Harm Reduction." AAAI/ACM AIES 2023.
- Solove, D.J. (2006). "A Taxonomy of Privacy." University of Pennsylvania Law Review, 154(3), 477.
- World Economic Forum (2023). "Typology of Online Harms." Global Coalition for Digital Safety.
Incidents vs. Articles: The database tracks incidents rather than articles. An incident is a distinct event in which someone was harmed by a digital platform or AI system; a single incident may be covered by multiple news articles from different sources. Our system links related articles together under one incident record, so the database reflects unique events rather than volume of media coverage.
Data Collection: Our data collection methodology has evolved to combine historical research with real-time monitoring:
- Pre-2026 Historical Data: Incidents occurring before January 1, 2026 were identified and catalogued through a combination of AI-powered research agents and human coding, with editorial oversight. This systematic review of historical sources is intended to provide comprehensive coverage of documented digital harms.
- 2026 Forward (Live Monitoring): Starting March 2026, incidents are collected automatically every hour from global RSS feeds covering major news outlets, technology publications, and investigative journalism sources. This real-time monitoring enables rapid identification of emerging incidents.
Data Sources: We monitor RSS feeds from major news organizations and investigative journalism outlets. Every incident in our database is linked to the original reporting, so users can verify the information directly.
Verification: We only include incidents that have been reported by credible sources with specific details about what happened, when, and to whom. We do not include rumors, unverified claims, or incidents lacking sufficient documentation. When a source article does not include a specific date for an event, we conduct additional research to establish the most accurate date possible. All AI-identified incidents undergo human review before publication.
Categorization: Each incident is classified into one of eight harm domains (Misinfo & Disinfo, Child Safety, Self-Harm & Suicide, Addiction & Mental Health, Algorithmic Discrimination, Fraud & Financial, Privacy & Surveillance, Autonomous Systems), tagged with specific harm types from a controlled taxonomy, and linked to the platforms and companies involved. This structured approach enables systematic analysis and pattern identification across time, platforms, and harm types.
Scope: Incidents are tracked globally. The policy tracker currently focuses on United States federal and state legislation, with plans to expand to international jurisdictions in the future.
Updates: The database is updated continuously as new incidents are reported. We also update existing entries when new information becomes available, such as legal outcomes or platform responses.
Harm Types We Track
Each incident is classified into one of eight harm domains and tagged with specific harm types from the taxonomy below. This controlled vocabulary ensures consistent categorization and enables filtering and analysis by harm type.
- Misinfo & Disinfo: Misinformation, Disinformation, Synthetic Media, Algorithmic Amplification
- Child Safety: Sextortion, CSAM, Grooming, Trafficking, Dangerous Challenge, Drug Facilitated Harm
- Self-Harm & Suicide: Suicide, Self-Harm, Chatbot Harm
- Addiction & Mental Health: Eating Disorder, Addiction, Cyberbullying, Harassment
- Algorithmic Discrimination: Wrongful Arrest, Discrimination, Hiring Bias
- Fraud & Financial: Deepfake Fraud, Voice Cloning Fraud, AI-Powered Financial Fraud
- Privacy & Surveillance: Deepfake NCII, Non-Consensual Imagery, Unauthorized Surveillance
- Autonomous Systems: Autonomous Vehicle, Medical AI Error
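A controlled vocabulary like this is straightforward to enforce in code. The sketch below models the taxonomy as a domain-to-tags mapping and flags tags outside it; the dictionary contents mirror the harm types and the eight domains named above, while the function name is an illustrative assumption:

```python
# Controlled taxonomy: eight harm domains, each with its allowed harm-type tags.
TAXONOMY = {
    "Misinfo & Disinfo": {"Misinformation", "Disinformation", "Synthetic Media", "Algorithmic Amplification"},
    "Child Safety": {"Sextortion", "CSAM", "Grooming", "Trafficking", "Dangerous Challenge", "Drug Facilitated Harm"},
    "Self-Harm & Suicide": {"Suicide", "Self-Harm", "Chatbot Harm"},
    "Addiction & Mental Health": {"Eating Disorder", "Addiction", "Cyberbullying", "Harassment"},
    "Algorithmic Discrimination": {"Wrongful Arrest", "Discrimination", "Hiring Bias"},
    "Fraud & Financial": {"Deepfake Fraud", "Voice Cloning Fraud", "AI-Powered Financial Fraud"},
    "Privacy & Surveillance": {"Deepfake NCII", "Non-Consensual Imagery", "Unauthorized Surveillance"},
    "Autonomous Systems": {"Autonomous Vehicle", "Medical AI Error"},
}

def invalid_tags(domain: str, tags: list[str]) -> list[str]:
    """Return any tags that fall outside the controlled vocabulary for the given domain."""
    allowed = TAXONOMY.get(domain, set())
    return [t for t in tags if t not in allowed]
```

Rejecting out-of-vocabulary tags at ingestion time is what keeps filtering and cross-incident analysis consistent.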
Entity Resolution and Linking
The Digital Harms Tracker distinguishes between three levels of actor in the digital harm ecosystem: legal entities (organizations), products and services (platforms), and the incidents connecting them. To support profile pages, cross-incident aggregation, and structured filtering, all three levels are linked relationally rather than stored as free text.
Actor (Organization) Resolution
The tracker maintains a dedicated actors collection covering 105 organizations including technology companies, government agencies, NGOs, media organizations, and military bodies. Each record is anchored to a Wikidata QID, the stable, dereferenceable identifier used by Wikidata's knowledge graph, which serves as the canonical reference for entity disambiguation.
The Wikidata QID was chosen as the resolution anchor for several reasons: it provides stable cross-references to other authoritative sources (SEC EDGAR, GLEIF, OpenCorporates); it stores canonical English labels and aliases that resolve common naming variants (e.g., “Facebook,” “Meta,” “Meta Platforms, Inc.” all resolve to Q380); and it is freely queryable without licensing constraints.
Platform Resolution
The platforms collection covers 80 distinct products and services. Each platform record is linked to its parent actor via a many-to-one relation, enabling bidirectional traversal: an actor profile can surface all associated platforms, and a platform record identifies its owning organization.
Incident Linking
Incidents are linked to actors and platforms through dedicated many-to-many junction tables. These junctions are the primary mechanism by which the tracker aggregates harm exposure per organization and per product.
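A minimal sketch of this relational layout, using SQLite for illustration (all table and column names are assumptions, not the Tracker's actual schema):

```python
import sqlite3

# Actors own platforms (many-to-one); incidents link to both via junction tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE actors    (id INTEGER PRIMARY KEY, name TEXT, wikidata_qid TEXT);
CREATE TABLE platforms (id INTEGER PRIMARY KEY, name TEXT, actor_id INTEGER REFERENCES actors(id));
CREATE TABLE incidents (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE incident_actors    (incident_id INTEGER, actor_id INTEGER, PRIMARY KEY (incident_id, actor_id));
CREATE TABLE incident_platforms (incident_id INTEGER, platform_id INTEGER, PRIMARY KEY (incident_id, platform_id));
""")
conn.executemany("INSERT INTO actors VALUES (?,?,?)", [(1, "Meta", "Q380")])
conn.executemany("INSERT INTO platforms VALUES (?,?,?)", [(1, "Instagram", 1), (2, "Facebook", 1)])
conn.executemany("INSERT INTO incidents VALUES (?,?)", [(1, "Incident A"), (2, "Incident B")])
conn.executemany("INSERT INTO incident_platforms VALUES (?,?)", [(1, 1), (2, 1), (2, 2)])

# Aggregate harm exposure per product by traversing the junction table:
rows = conn.execute("""
    SELECT p.name, COUNT(*) AS n_incidents
    FROM incident_platforms ip JOIN platforms p ON p.id = ip.platform_id
    GROUP BY p.name ORDER BY n_incidents DESC
""").fetchall()
```

The same join pattern, run through `platforms.actor_id`, rolls exposure up from products to their owning organizations.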
Limitations
Entity resolution coverage is strongest for large technology companies and well-documented platforms. Coverage thins for smaller actors, regional entities, and stalkerware vendors, which are frequently absent from Wikidata or have sparse records. Approximately 25% of actor records carry no Wikidata QID and are maintained through manual curation.
How We Use AI
Our system uses AI to process large volumes of news articles efficiently while maintaining high accuracy through a multi-stage pipeline with human oversight:
Every hour, our system automatically monitors global RSS feeds from major news outlets and technology publications. New articles are collected, deduplicated, and queued for analysis.
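The deduplication step can be sketched as URL normalization plus hashing, so the same story arriving from different feeds collapses to one queue entry. The normalization rules below are illustrative assumptions, not the pipeline's actual logic:

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit

def dedup_key(url: str) -> str:
    """Hash a normalized form of the URL so cosmetic variants share one key."""
    parts = urlsplit(url.strip())
    normalized = urlunsplit((
        "https",                                    # scheme differences don't make a new article
        parts.netloc.lower().removeprefix("www."),  # host case and www. prefix are cosmetic
        parts.path.rstrip("/"),                     # trailing slashes are cosmetic
        "",                                         # drop query strings (tracking parameters)
        "",                                         # drop fragments
    ))
    return hashlib.sha256(normalized.encode()).hexdigest()

def dedupe(urls: list[str]) -> list[str]:
    """Keep the first URL seen for each normalized key."""
    seen, unique = set(), []
    for u in urls:
        k = dedup_key(u)
        if k not in seen:
            seen.add(k)
            unique.append(u)
    return unique
```

URL-level deduplication only catches the same article syndicated at the same address; merging distinct articles about the same event is the separate incident-linking step described below.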
AI models read each article and classify its relevance to digital harms, determining whether the article documents a specific incident where people were harmed by a digital platform, as opposed to general tech news.
For relevant articles, AI performs Named Entity Recognition to extract structured data: platforms involved, companies responsible, harm types, victim details, dates, locations, and financial losses. Multiple articles about the same event are linked to a single incident record.
All AI-classified incidents are reviewed by human analysts before being published. This ensures accuracy, merges duplicate incident reports, and catches edge cases the model might misclassify.
The Digital Harms Tracker is built and maintained by Gregory Maly, a social scientist by training and data practitioner by profession, with two decades of experience applying technology to complex problems across conflict studies, global health, demography, and public policy. He currently serves as AI & Data Science Editor at The Outlaw Ocean Project. gamaly.github.io →
The tracker is not funded by any platform, company, or advocacy organization, and takes no positions on policy solutions. It is designed as shared infrastructure for parliamentary committees, university researchers, NGOs, and journalists who need structured, verifiable evidence about digital harms.
The tracker is a living project. Here is what we are actively working toward:
- Documenting civil and regulatory lawsuits tied to digital harms: case status, plaintiffs, claims, and outcomes linked to relevant incidents.
- Mapping corporate lobbying activity and political donations by the companies in our database, linked to the policy areas where their platforms appear in incidents.
- Tracking how senators, representatives, and state legislators vote on digital safety legislation, cross-referenced with the policy tracker.
- Deepening coverage of data broker practices, surveillance-as-a-service, and consent violations: areas currently underrepresented relative to their real-world scale.
- Structured analysis series on specific harm clusters, starting with election integrity and AI-generated disinformation, with dedicated filtering, timelines, and policy mapping.
Scope decisions are not value judgments; they reflect where we can maintain data quality and analytical focus. Our guiding principle: we follow the harm, not the technology. If an incident occurs on a gaming platform or a crypto exchange and falls within one of our eight harm domains, we may include it. What we do not do is comprehensively catalog every incident in these adjacent spaces.
The exclusions below reflect areas where dedicated databases already exist, where the regulatory and actor landscape is sufficiently distinct, or where our current data pipeline has structural limitations.
Data breaches caused by external attackers (malware, phishing, credential stuffing) are generally outside our scope. Established resources such as the NIST National Vulnerability Database and the CVE program already catalog the underlying vulnerabilities. Where we do include breach-related incidents (such as the National Public Data bankruptcy or the Equifax settlement), it is because the harm resulted from a foreseeable failure of platform stewardship (a design or governance choice) rather than a purely technical intrusion. The line is not always clean, and we acknowledge the ambiguity.
We do not attempt to catalog every harm arising from gaming, loot box mechanics, competitive disputes, or in-game monetization practices. However, when gaming platforms produce harms that fall squarely within our core domains (child safety, addiction, privacy), we do include them. The Fortnite addiction lawsuit and the Epic Games COPPA settlement are examples of gaming incidents that belong in this database.
We do not comprehensively track fraud, rug pulls, or financial harms endemic to crypto exchanges and DeFi protocols. The actor set, regulatory framework, and harm dynamics are distinct enough to warrant their own resource. That said, an AI-powered crypto scam targeting consumers through a mainstream platform, or a crypto firm deploying manipulative design against vulnerable users, may fall within our fraud or mental health domains and would be considered on a case-by-case basis.
Algorithmic management, gig economy exploitation, and workplace surveillance overlap with our algorithmic discrimination domain but represent a distinct research field with its own regulatory framework and advocacy community. We are not resourced to cover this thoroughly, and partial coverage risks creating a misleading picture. We are watching this space and may revisit.
Government-backed disinformation campaigns, state censorship, and cyber warfare are geopolitical issues more than platform accountability issues. Our tracker focuses on harms caused by platforms and AI systems, their design choices, governance failures, and product decisions. Where a platform knowingly enables or fails to moderate state-sponsored content that causes harm, the platform's response (or inaction) may be the incident. The state actor itself is not our primary subject.
Our automated RSS pipeline currently monitors English-language sources. This means incidents documented only in local-language reporting (particularly across Southeast Asia, Latin America, the Middle East, and sub-Saharan Africa) are likely underrepresented. We acknowledge this is a significant limitation for a database with global ambitions, and expanding language coverage is a longer-term goal.
These boundaries are not fixed. As the database matures, our pipeline expands, and the harm landscape evolves, we will continuously re-evaluate what belongs in scope. If you believe an area is being handled inconsistently or that a boundary should be reconsidered, we welcome that feedback.