Data Privacy, Memorization, & Legal Implications in Generative AI

A NeurIPS 2025 Tutorial at the Intersection of AI and Law.

Exhibit Hall F · Tue 2 Dec · 1:30 p.m. – 4:00 p.m. PST
San Diego Convention Center, San Diego, USA

Overview

Generative models are trained on vast datasets that often contain personal data and copyrighted content. As lawsuits, regulations, and standards emerge, practitioners increasingly need concrete, technically grounded guidance on how privacy and copyright law interact with the realities of modern model development.

This tutorial connects three themes:

  • Data privacy: how membership inference, data extraction, training-data attribution, and unlearning relate to formal privacy notions and real-world regulations.
  • Memorization: when models remember training data, what that means technically, and how it matters for sensitive data and copyrighted works.
  • Copyright: how courts and regulators are treating training data, memorization, and outputs, and what this implies for dataset design and model deployment.

We will alternate between technical material (attacks, defenses, measurement, and system design) and legal analysis (doctrines, active cases, and regulatory futures), with a focus on practical workflows that ML researchers, engineers, and policy teams can adopt today. As a small taste of the technical side, a toy membership-inference sketch follows.
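
The sketch below is purely illustrative and not part of the tutorial materials: it shows the classic loss-threshold form of membership inference, where unusually low model loss on a candidate text is taken as weak evidence that the text appeared in the training set. The `sequence_loss` function is a hypothetical stand-in for a real model's scoring call.

    # Loss-threshold membership inference: a minimal, self-contained sketch.
    # `sequence_loss` is a hypothetical stand-in; in practice it would be the
    # per-token cross-entropy your model assigns to the candidate text.
    import numpy as np

    rng = np.random.default_rng(0)

    def sequence_loss(text: str) -> float:
        # Toy scoring function: pretend texts seen in training get lower loss.
        base = 2.0 if text.startswith("member") else 3.0
        return base + rng.normal(scale=0.3)

    def infer_membership(texts, threshold=2.5):
        # Predict "member" when the model's loss on the text falls below the threshold.
        return {t: sequence_loss(t) < threshold for t in texts}

    if __name__ == "__main__":
        candidates = [f"member sample {i}" for i in range(3)]
        candidates += [f"non-member sample {i}" for i in range(3)]
        for text, is_member in infer_membership(candidates).items():
            print(f"{text!r}: predicted member = {is_member}")

Real attacks calibrate the threshold on known non-members or use reference models, but the decision rule keeps this same basic shape.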

Tutorial outline

Primer: law, AI, and privacy terms (20 minutes)

  • General education on law and AI so everyone shares the same baseline
  • Where copyright law intersects with generative modeling practice
  • Privacy foundations and what we mean by "extraction"

Status quo & life cycle of current cases (30 minutes)

  • What counts as copying: ideas vs. expression, substantial similarity, non-literal copying
  • When otherwise infringing copying is swept into fair use
  • Why verbatim copying collapses the hard questions (and what actually matters)
  • Fair use and transformative use, including a candid, informal definition
  • Strict liability, intent, and why fair use is always case by case
  • Why we still lack an across-the-board ruling, plus a primer on class actions

Why might copyright care about memorization? (30 minutes)

  • Probabilistic notions of data extraction
  • Extracting large pieces of copyrighted text from LLMs (a toy extraction check is sketched below)
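
As before, the following is an illustrative sketch only, not tutorial material. One common way to operationalize extraction is to prompt a model with a prefix of a protected work and measure the longest verbatim token span its continuation shares with the original; `generate_continuation` is a hypothetical stand-in for a real model call.

    # Verbatim-extraction check: a minimal, self-contained sketch.
    # `generate_continuation` is a hypothetical stand-in for sampling from an LLM.
    def generate_continuation(prefix: str) -> str:
        return "and the quick brown fox jumps over the lazy dog every morning"

    def longest_verbatim_overlap(generated: str, reference: str) -> int:
        # Longest contiguous run of whitespace tokens shared by the two texts
        # (longest common substring via O(n*m) dynamic programming).
        gen, ref = generated.split(), reference.split()
        best = 0
        prev = [0] * (len(ref) + 1)
        for g in gen:
            curr = [0] * (len(ref) + 1)
            for j, r in enumerate(ref, start=1):
                if g == r:
                    curr[j] = prev[j - 1] + 1
                    best = max(best, curr[j])
            prev = curr
        return best

    if __name__ == "__main__":
        reference = ("the quick brown fox jumps over the lazy dog every morning "
                     "before the sun rises")
        continuation = generate_continuation("the quick brown fox")
        print(longest_verbatim_overlap(continuation, reference), "overlapping tokens")

Thresholding this overlap length (work in this area often uses fixed-length verbatim spans, e.g. on the order of 50 tokens) gives a crude but auditable proxy for extractable memorization.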

Overview of future research possibilities (40 minutes)

  • Roadmap for technical and policy teams
  • Working towards a robust definition of memorization
  • Can we detect training data?
  • Can we unlearn memorized information?

Wrap-up before the panel (5 minutes)

Panel with Zack, David, Avi, Franziska, and Peter (30 minutes)

  • Firsthand perspectives from industry, startups, academia, and policy
  • How they triage "fair use vs. privacy" questions on real deployments
  • Audience Q&A to stress-test the guidance from earlier blocks

Tutorial materials

Core materials

  • Slides (all parts): coming soon
  • Lecture notes / reading guide: coming soon

Recordings & logistics

  • Tutorial recording (NeurIPS): link will be posted if available
  • NeurIPS program entry: view on neurips.cc

Contact & updates

This site will be updated with the finalized schedule, materials, and logistics as NeurIPS 2025 approaches.

  • Questions about the tutorial?
    Please contact the organizers (email addresses are available on their websites).
  • Conference logistics:
    See the official NeurIPS 2025 website for registration, venue, and schedule details.