Two Early 2026 AI Exposures: Lessons for the Future of AI and Data Governance
- CodeWall's autonomous offensive AI agent compromised McKinsey's internal GenAI platform Lilli in under two hours on 28 February 2026
- 46.5 million plaintext chat messages, 728,000 confidential files, and 57,000+ records exposed, covering M&A, strategy, and client engagement content
- Attack path: 22 unauthenticated API endpoints discovered by the offensive agent, followed by SQL injection into the production database
- No human operator in the loop on the attacker side. No credentials required. The entire intrusion ran at machine tempo.
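The SQL injection step in the attack path is the class of flaw that parameterized queries close by construction. A minimal sketch, using SQLite purely as a stand-in; the table and column names are hypothetical, not from the incident:

```python
import sqlite3

# Hypothetical schema standing in for the production database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO messages (body) VALUES ('confidential')")

def fetch_message(user_supplied_id: str) -> list:
    # Vulnerable pattern (do not use): string interpolation lets a payload
    # like "1 OR 1=1" rewrite the query and enumerate every row:
    #   conn.execute(f"SELECT body FROM messages WHERE id = {user_supplied_id}")
    # Parameterized pattern: the driver binds the input strictly as data.
    cur = conn.execute("SELECT body FROM messages WHERE id = ?",
                       (user_supplied_id,))
    return cur.fetchall()

# A classic injection payload matches no rows instead of all of them.
print(fetch_message("1 OR 1=1"))  # → []
```

With binding in place the payload is compared against the `id` column as a literal value, so the injection attempt returns nothing while legitimate lookups still work.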
01 Annotation
Wharton's April 2026 analysis of the McKinsey Lilli incident is the clearest published case study to date of the failure pattern we describe in our whitepaper The Agentic Blast Radius. Every one of the five control gaps is visible in the incident timeline. The attacker did not need a human insider. An autonomous offensive agent ran reconnaissance, discovered 22 unauthenticated API endpoints, identified the SQL injection path, and exfiltrated the database in under two hours. That is the Gap 5 scenario made concrete: a compromise that propagates faster than any conventional SOC escalation path can respond to it.

The 46.5 million chat messages were not primary data. They were the memory layer of a GenAI platform: the accumulated conversational residue of tens of thousands of consultants working on strategy, M&A, and client engagements. Most organisations classify their CRM, their document management system, and their email archive as sensitive. Very few classify their agent memory, their vector stores, or their conversation logs with the same rigour. That is the Gap 4 scenario: sanctioned leakage through an ungoverned memory layer.

The unauthenticated API endpoints are the Gap 2 scenario: a permission architecture scoped for the broadest possible task, not the narrowest current one. And the absence of a behavioural kill switch that could have terminated anomalous query patterns against the production database is Gap 5 in its operational form.

The lesson is not that McKinsey got it uniquely wrong. The lesson is that the failure pattern (over-privileged agents, ungoverned memory, no machine-speed response capability) is the default pattern in most enterprise agentic deployments right now.
Maps to: Gaps 2, 4 & 5: Excessive Agency, Sanctioned Leakage (memory layer), Non-Human Identity