What Happens When Your AI Vendor Gets Breached? A CISO Pre-Mortem
Pre-mortem analysis is a risk management technique in which you assume a future failure has already occurred and work backwards to understand its causes and consequences. It is more useful than post-mortem analysis because it happens before the damage, when something can still be done about it.
The scenario: your primary AI vendor — the platform your legal team uses to draft client communications, your support organisation uses to answer customer queries, your knowledge management team uses to maintain internal documentation — has suffered a significant security breach. The vendor's systems were compromised. Customer data was accessed. The breach is confirmed. What happens next?
Walking through this scenario in detail, before it happens, reveals the specific ways in which an AI vendor breach differs from a conventional SaaS vendor breach — and why the organisations that have not thought through this scenario are likely to be significantly underprepared when it occurs.
The Blast Radius: Mapping What the Vendor Has
The first step in understanding an AI vendor breach is understanding what the vendor actually holds. This is where AI vendor deployments differ substantially from conventional SaaS.
A conventional SaaS vendor stores the data you explicitly put into the system: records, files, user accounts, transaction histories. In a breach, the question is which of those records were accessed. The scope is bounded by what you knowingly stored with the vendor.
An AI vendor holds more — and holds it in forms that are not immediately obvious.
Indexed documents. Every document you uploaded for indexing is held by the vendor for as long as indexing and retrieval services are active. For organisations that have indexed large internal knowledge bases, this includes contracts, policy documents, customer communications, financial models, and strategic plans. These are the documents you know the vendor has.
Query logs. Every query sent to the AI system — every question asked by every user — is typically logged by the vendor. These logs are not simply metadata. The queries themselves often contain sensitive information: the customer name being researched, the contract being negotiated, the compliance question being investigated. A query log breach exposes a detailed record of what the organisation's employees were asking about, when, and in what context. For organisations in privileged or regulated relationships, this is potentially more damaging than the document breach.
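To make the point concrete, here is a minimal sketch of the kind of record a query log might contain, and of a client-side check that flags sensitive queries before they leave the organisation. The field names, the watchlist, and the example query are all illustrative assumptions, not any vendor's actual schema.

```python
import re

# Hypothetical shape of a vendor-side query log entry. Field names are
# illustrative, not any specific vendor's schema.
log_entry = {
    "timestamp": "2025-03-14T09:21:07Z",
    "user": "jdoe@example.com",
    "query": "Summarise the indemnity clause in the Acme Corp renewal contract",
}

# A minimal pre-flight check: flag queries that mention contracts or
# watchlisted client names before they are sent to an external service.
SENSITIVE_PATTERNS = [
    r"\bcontract\b",
    r"\bAcme Corp\b",  # stand-in for a client-name watchlist
]

def flags(query: str) -> list[str]:
    """Return the patterns that match this query."""
    return [p for p in SENSITIVE_PATTERNS if re.search(p, query, re.IGNORECASE)]

print(flags(log_entry["query"]))  # both patterns match this example query
```

Even this toy check illustrates the core issue: the query text itself, not just the metadata around it, is where the sensitive material lives.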
Embeddings. Documents indexed by an AI system are converted into vector embeddings — numerical representations of the document's semantic content. Embeddings are not the original text, but they are not opaque either. Research into embedding inversion has demonstrated that approximate reconstruction of original text from embeddings is feasible, particularly for shorter, structured documents. If the vendor's embedding store is compromised, the attacker has not only your documents' metadata but a representation from which significant content can be recovered.
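The inversion risk follows from what embeddings are: vectors positioned so that similar text lands nearby. The sketch below uses a deliberately crude character-count "embedding" — real systems use learned dense vectors, but the principle carries over — to show how an attacker holding a leaked vector can rank candidate reconstructions by similarity. All the text strings are invented examples.

```python
import math

def embed(text: str) -> list[float]:
    # Toy bag-of-letters "embedding", normalised to unit length.
    # Real embeddings are learned dense vectors, but share the key
    # property exploited here: similar text maps to nearby vectors.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

# An attacker with a leaked embedding and a list of candidate phrasings
# can rank the guesses by similarity — the leaked vector is not opaque.
leaked = embed("quarterly revenue forecast for the acquisition")
candidates = [
    "employee parking policy",
    "quarterly revenue forecast for the acquisition target",
    "cafeteria menu for next week",
]
best = max(candidates, key=lambda c: cosine(embed(c), leaked))
print(best)  # the near-match ranks highest
```

Published inversion attacks are far more capable than this ranking trick — they recover text directly rather than choosing among guesses — which is why a compromised embedding store should be treated as a content exposure, not a metadata exposure.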
Model fine-tuning data. If the vendor used any of your data to fine-tune their models — even if your agreement includes a training exclusion — the breach extends to whatever model weights contain that information. Model weights are not easily queried for specific training data, but attacks that extract memorised training data from language models are an active research area. The risk is not theoretical.
Why AI Vendor Breaches Are Worse Than Traditional SaaS Breaches
The conventional SaaS breach playbook involves: determine what data was accessed, notify affected parties, engage forensics, report to regulators, remediate the vulnerability. The scope of the breach is bounded by the data the vendor explicitly stored.
An AI vendor breach does not fit this playbook cleanly for several reasons.
The exposure is harder to scope. Determining which documents were accessed in a document store breach is straightforward: access logs show which files were retrieved. Determining what information is recoverable from a compromised embedding store requires technical analysis that most organisations are not equipped to perform quickly. Regulators may require breach notification before the scope is fully understood.
The harm may not be immediately visible. A traditional data breach produces identifiable harm: records are exposed, customer data is compromised, fraud follows. Embedding exposure produces potential harm that may not materialise immediately. The attacker now has representations of your documents that can be used for competitive intelligence, refined through AI-assisted reconstruction, or held for future leverage. The harm may appear months later in ways that are difficult to trace to the original breach.
The regulatory exposure is complex. Breach notification requirements under GDPR, HIPAA, and sector-specific frameworks have defined timelines and thresholds. But the classification of an embedding breach — does it constitute a personal data breach under GDPR? Does it trigger HIPAA notification requirements? — is not settled law. Regulatory counsel will need to make judgements in real time under significant time pressure, with material consequences for both the speed and the accuracy of the response.
The broader security evaluation framework for AI vendors, including what to ask before deployment, is covered in detail in enterprise AI security: beyond SOC 2. The breach scenario makes clear why those pre-deployment questions matter: the answers to them are exactly what will be needed when a breach notification arrives.
The Contractual Gap
Most enterprise AI vendor agreements were drafted before the industry developed a clear understanding of AI-specific breach risks. Standard data processing agreements describe obligations around data deletion, breach notification, and security controls in terms that were developed for conventional SaaS deployments. They typically do not address:
What constitutes a breach in the context of embedding stores.

Whether embedding extraction constitutes processing of personal data under applicable regulations.

What the vendor's obligations are if model training on customer data is discovered after the fact.

What happens to customer data in the event of a vendor acquisition or bankruptcy — situations where data handling governance may change rapidly.
These gaps mean that when a breach occurs, the organisation's legal team will be negotiating the vendor's obligations in real time, based on contractual language that does not clearly address the situation. The vendor's interests in this negotiation are not aligned with the customer's. The time pressure is significant. The outcome is uncertain.
CISOs who have not previously reviewed their AI vendor agreements specifically for these gaps are operating without a defined contractual framework for the breach scenario that is most likely to matter.
What CISOs Should Be Asking — and Aren't
Beyond standard vendor security review, the questions specific to AI vendor breach preparedness include:

How are customer embeddings stored, and are they isolated per customer or commingled?

What is the vendor's process for notifying customers if embeddings are compromised?

What technical controls prevent your data from being used in model training without your explicit consent?

What is the vendor's position on liability if model fine-tuning on customer data without authorisation is discovered?

How would the vendor scope a breach notification if an inference log were compromised — what constitutes a reportable event?
Most AI vendors do not have documented, contractually committed answers to these questions. The absence of answers is itself informative about the maturity of the vendor's breach preparedness for AI-specific scenarios.
The Architectural Answer: If the Vendor Has Nothing, There Is Nothing to Breach
The pre-mortem exercise has a clear architectural implication. The most effective mitigation for AI vendor breach risk is removing the vendor's access to the data that creates the exposure. Air-gap deployment — running the AI system entirely within the organisation's own infrastructure — means the vendor never holds your documents, your query logs, or your embeddings. A breach of the vendor's systems does not affect your data because your data was never there.
This is not a theoretical security benefit. It is a direct consequence of architectural choice: where inference runs determines what a vendor breach can expose. If inference runs on your hardware, within your network, the vendor's security posture is irrelevant to your data security. The attack surface for your AI-related data is your own infrastructure — which your security team controls, monitors, and can audit.
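The architectural claim — where inference runs determines what a breach can expose — can also be enforced mechanically. A minimal sketch of an egress guard that refuses to route AI traffic to any endpoint outside the organisation's own network; the internal address ranges here are illustrative assumptions, not a recommended configuration.

```python
import ipaddress

# Illustrative internal address ranges — an assumption for this sketch,
# not a recommendation for any specific network.
INTERNAL_NETS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def is_internal(resolved_ip: str) -> bool:
    """True if the resolved inference endpoint sits inside our own network."""
    addr = ipaddress.ip_address(resolved_ip)
    return any(addr in net for net in INTERNAL_NETS)

# Inference on your own hardware: documents never leave the network.
print(is_internal("10.12.4.7"))    # True
# A vendor-hosted endpoint: refused before anything is transmitted.
print(is_internal("203.0.113.9"))  # False
```

In an air-gapped deployment a check like this is redundant by construction — there is no external route to guard — which is precisely the point: the control becomes a property of the architecture rather than of any single gateway.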
For organisations in industries where a data incident carries regulatory, reputational, and contractual consequences — insurance, financial services, legal, defence — the architectural decision is worth making deliberately, as part of a pre-mortem analysis like this one, rather than revisiting it urgently after a vendor breach notification arrives.
To see how Scabera approaches vendor-independent AI deployment for regulated industries, book a demo.