White Paper 03 · Operational Decision Intelligence series

Shadow AI: Your Data Has Already Left the Building

Roughly three in four knowledge workers already use AI at work, and most bring their own tools to do it. Nearly half feed those tools non-public company data. Every paste is an uncontrolled transfer out of the perimeter — colliding with GDPR, the EU AI Act, NIS2 and Schrems II. Prohibition fails. The only durable remedy is a sovereign, governed substitute.

Abstract

Consumer generative AI has been adopted by the workforce faster than any enterprise technology in living memory, and largely without permission. Roughly 75% of knowledge workers now use AI at work, and about 78% of those users bring their own tools to do it.[1] Nearly half of employees admit to entering non-public company information into public GenAI tools.[2] Each such interaction is an uncontrolled transfer of corporate data to a third-party model provider, frequently outside the EU, with no log, no legal-basis assessment, and no audit trail. This is shadow AI: the successor to shadow IT, but with a decisive difference. Shadow IT mostly kept data where the organisation could eventually find and govern it, while shadow AI sends it out. We assemble the evidence on its scale, explain why the reflexive remedy, prohibition, reliably fails, map the regulatory collision (GDPR, Schrems II, the US CLOUD Act, the EU AI Act, NIS2), and cost the downside. We then argue that the only durable remedy is substitution, not prohibition: a governed, sovereign, role-scoped assistant that runs inside the perimeter, answers from live company knowledge, anonymises PII before any external model is ever touched, and records every query in an audit trail. The thesis in one line: the data has already left the building; the task is to bring the AI inside.

1. The scale of shadow AI

The adoption curve for consumer generative AI has no enterprise precedent. Microsoft and LinkedIn's 2024 Work Trend Index — a survey of 31,000 workers across 31 countries — found that 75% of knowledge workers already use AI at work, and, critically, 78% of them bring their own AI tools ("BYOAI") rather than wait for a sanctioned one.[1] Adoption did not follow governance; it outran it by a wide margin.

The consequence is data egress at scale. Cisco's 2024 Data Privacy Benchmark Study — 2,600 privacy and security professionals across 12 geographies — reports that 48% of employees admit to entering non-public company information into GenAI tools.[2] Independent telemetry sharpens the picture: Cyberhaven, analysing usage across roughly 1.6 million workers, found that 11% of the data employees paste into ChatGPT is confidential, with internal-only material, source code and client data among the most common categories, and the average company leaking confidential material hundreds of times per week.[3]

FIG 1 — The scale of shadow AI. Adoption (blue) outran governance; leakage (grey) followed. Sources: Microsoft & LinkedIn, Work Trend Index 2024 (use / BYOAI); Cisco 2024 (non-public data); Cyberhaven (confidential share of pastes).[1][2][3]

The label matters. This is not "shadow IT" — the decade-old problem of employees adopting unsanctioned SaaS. Shadow IT, for all its risk, mostly kept corporate data within tools the company could eventually discover, inventory and govern. Shadow AI inverts that: the data is copied out of the perimeter, into a model provider's infrastructure, the instant an employee hits enter. There is no shadow inventory to reconcile later, because the transfer is already complete and, in the general case, irreversible.

48% of employees admit to entering non-public company information into public GenAI tools, while over a quarter of organisations have already banned such tools outright.
Source: Cisco, 2024 Data Privacy Benchmark Study.

The reader should hold two facts together: the tools deliver real, felt productivity (which is why 78% of AI users bring their own unbidden), and every use of an ungoverned one is a small data-exfiltration event. That combination of genuine value and genuine leakage is precisely what makes the problem hard, and what rules out the obvious fix.

2. Why prohibition fails

The instinctive response is to ban it. More than one in four organisations have already done exactly that — Cisco found over 27% of organisations had banned the use of generative AI over privacy and security concerns.[2] The bans do not work, for two structural reasons.

2.1 The tool is useful, and BYOAI routes around the ban

When 78% of AI users already supply their own tools,[1] a policy prohibition does not remove the tool; it removes the sanctioned, observable version of it and pushes usage onto personal phones, personal accounts and personal devices, precisely the channels the security team cannot see. A ban converts a visible risk into an invisible one. The worker keeps the productivity advantage; the organisation loses the last shred of telemetry it had.

2.2 Prohibition is adversarial, and the productivity gap is real

Asking a workforce to give up a tool that measurably speeds their work, with nothing offered in its place, creates a standing incentive to circumvent the rule — and selects for the most motivated, often most senior, users to go furthest underground. This is the same lesson the industry already learned with shadow IT a decade ago: the answer to unsanctioned file sharing was never to ban file sharing — it was to provide a sanctioned equivalent good enough that no one needed the unsanctioned one. The winning move was substitution, not prohibition.

Dimension | Prohibition | Substitution (a governed alternative)
Effect on usage | drives it underground | ✓ pulls it onto a sanctioned tool
Data egress | ✗ continues, now invisible | ✓ stays inside the perimeter
Audit trail | ✗ none (event unobservable) | ✓ every query logged
Relationship with staff | adversarial | ✓ aligned (inside is the easy path)
Productivity | lost or captured off-platform | ✓ retained, on-platform
FIG 2 — You cannot govern what you cannot see, and a ban guarantees you cannot see it. The only durable path is to make the inside option the better one.

The conclusion is not "govern shadow AI better." You cannot govern what you cannot see, and a ban guarantees you cannot see it. The only durable path is to make the inside option the one employees reach for first.

3. The regulatory collision

For a European firm, an employee pasting company data into a US-hosted model is not merely a security lapse — it is a collision with several overlapping legal regimes at once, and the firm, not the employee, is accountable.

GDPR (Regulation (EU) 2016/679). If the pasted text contains any personal data — a customer name, an email, a supplier contact, an employee record — the act is processing, and sending it to a third-party model provider is a disclosure, in most configurations a transfer to a third country. It happens with no legal-basis assessment, no record of processing, and no data-protection impact assessment. The controller — the employer — carries the obligation regardless of whether it knew the transfer occurred.[5]

Schrems II (CJEU, Case C-311/18). The Court of Justice invalidated the EU–US Privacy Shield in 2020, holding that transfers of personal data to the United States require supplementary safeguards because US surveillance law does not offer EU-equivalent protection.[7] A successor adequacy framework, the EU–US Data Privacy Framework adopted in 2023, exists but remains politically contested and legally fragile. An employee's paste into a US LLM is exactly the trans-Atlantic transfer Schrems II governs, executed with none of the safeguards the ruling requires.

The US CLOUD Act (2018). US-headquartered providers can be compelled to produce data in their custody regardless of where in the world it is physically stored.[8] "The data is on EU servers" is therefore not, by itself, a sovereignty guarantee when the provider is subject to US jurisdiction — the crux of the residency problem for regulated European industry.

The EU AI Act (Regulation (EU) 2024/1689). The Union's AI regulation, whose obligations apply in phases, imposes transparency obligations on general-purpose AI and human-oversight and risk-management obligations on high-risk uses.[6] Ungoverned consumer AI embedded invisibly into a regulated process (a pharmaceutical batch record, a safety decision, a credit assessment) is unmanaged AI risk by definition, and it is the deploying organisation that must demonstrate oversight it cannot evidence.

NIS2 (Directive (EU) 2022/2555). For firms in scope, NIS2 broadens cybersecurity and supply-chain risk-management duties and raises accountability to management level.[9] Uncontrolled data egress to third-party AI services is a governance gap squarely within the risk-management obligations NIS2 makes non-delegable.

Regime | Core obligation | What a shadow-AI paste does
GDPR (2016/679) | Lawful basis + controlled transfer | ✗ unassessed third-country transfer
Schrems II (C-311/18) | Safeguards for US transfers | ✗ transfer with no safeguards
US CLOUD Act | (exposure) US-jurisdiction reach | ✗ data compellable, wherever stored
EU AI Act (2024/1689) | Human oversight of high-risk AI | ✗ ungoverned AI in a regulated process
NIS2 (2022/2555) | Risk management + accountability | ✗ uncontrolled, unlogged egress
FIG 3 — Five regimes, one common presumption: the organisation knows what data moves where and can prove it controls it. Shadow AI defeats that presumption at the root — because the defining feature of the event is that nobody logged it.

The common thread across all five: each regime presumes the organisation knows what data moves where, and can demonstrate control. Shadow AI defeats that presumption at the root, because the defining feature of the event is that nobody logged it.

4. The cost when it goes wrong

The financial downside is well quantified at the top end. IBM's Cost of a Data Breach Report 2024 puts the global average breach cost at USD 4.88 million — the highest on record.[4] No mid-market firm should expect that figure — it is a global, all-sizes average — but it calibrates the tail.

$4.88M: the global average cost of a data breach in 2024, the highest ever recorded. A mid-market firm won't see this number; it calibrates the tail the invisible event exposes it to.
Source: IBM, Cost of a Data Breach Report 2024.

The larger operational cost is often not a reportable breach but irreversible trade-secret loss. In a widely reported 2023 incident, Samsung restricted employee use of ChatGPT after engineers pasted internal source code and meeting notes into the tool to debug and summarise them.[10] Material of this kind, whether a formulation, a pricing model, source code or a proprietary process, does not come back once it has entered a third party's systems and, potentially, its training-eligible data. For a manufacturer whose entire competitive advantage is a process recipe, that is the asset itself walking out the door.

And the compounding problem underneath both is invisibility. You cannot quantify, remediate, or report an incident you never observed. Every regime in §3 requires the organisation to demonstrate control; a shadow-AI event produces no log to demonstrate anything with. The absence of evidence is not merely a reporting inconvenience — under GDPR and NIS2 the inability to show what happened is itself a compliance failure.

Honesty note

A conservative expected-cost view — the annual probability of a material shadow-AI-linked incident against a mid-market-scaled impact — still lands in the tens of thousands of euros per year for a representative €40M firm, before any trade-secret tail. That construction is Dimbo analysis, set out in full in the companion Value Model; the point here is directional, not precise. The uncounted competitive-leakage risk — pricing, formulations, source code — is larger still and the least quantifiable.
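The expected-cost view described above reduces to probability multiplied by impact, summed over incident classes. A minimal sketch of that arithmetic follows. The scenario names, probabilities and impacts are hypothetical placeholders chosen only to show the calculation; they are not the inputs of the companion Value Model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IncidentScenario:
    """One hypothetical incident class; every number is an illustrative placeholder."""
    name: str
    annual_probability: float  # chance of a material incident in a given year
    impact_eur: float          # mid-market-scaled cost if it happens


def expected_annual_cost(scenarios: list[IncidentScenario]) -> float:
    """Sum of probability x impact across scenarios (a plain expected-value model)."""
    for s in scenarios:
        if not 0.0 <= s.annual_probability <= 1.0:
            raise ValueError(f"probability out of range for {s.name}")
    return sum(s.annual_probability * s.impact_eur for s in scenarios)


# Hypothetical inputs, NOT taken from the Value Model.
scenarios = [
    IncidentScenario("reportable personal-data incident", 0.05, 400_000),
    IncidentScenario("contractual confidentiality claim", 0.02, 250_000),
]
print(f"Expected annual cost: EUR {expected_annual_cost(scenarios):,.0f}")
```

The trade-secret tail is deliberately left out of the sum, mirroring the text: it is the part that resists a defensible probability or impact estimate.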

5. The governed alternative

If prohibition fails and the tool is genuinely useful, the remedy is to make the governed, inside-the-perimeter option the one employees reach for first. Four properties make that substitute credible, and each is a real, present property of the system rather than a certification claim.

Sovereign by default. Dimbo runs fully on the customer's own infrastructure on a benchmarked local model: a gemma-class local LLM at reference-parity on a single workstation-class GPU, with local vision and local speech transcription. Air-gapped operation is the default, not a premium tier. Whenever an external model is used, PII is anonymised at a Presidio gateway before any text leaves the perimeter. This hard architectural constraint inverts the shadow-AI failure mode: where consumer AI copies data out, Dimbo keeps it in.

[Diagram labels: Employee query (on company knowledge); Gateway (PII anonymised); on-prem (local); EU-hosted cloud (masked); audit trail ✓]
FIG 4 — Substitution by construction. The default branch runs fully on-prem; only when an external tier is chosen does text pass the PII gate first — and every interaction lands in the audit trail. The assistant rests at propose; autonomy is earned per process, revocable instantly.
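The ordering in FIG 4 can be sketched in a few lines. This is a standard-library stand-in and not Presidio's API: the two regular expressions, the `anonymise` and `route` functions and the tier names are hypothetical, and far cruder than a real PII recogniser. The sketch shows only the property the architecture relies on, namely that a local request is passed through untouched while an external request is masked before it can leave.

```python
import re

# Hypothetical, deliberately simple patterns; a production gateway would use a
# dedicated PII recogniser rather than two regular expressions.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def anonymise(text: str) -> str:
    """Replace each matched span with a placeholder such as <EMAIL>."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text


def route(prompt: str, tier: str = "local") -> tuple[str, str]:
    """Return (tier, text the chosen tier is allowed to see)."""
    if tier == "local":
        return tier, prompt             # stays inside the perimeter, unmodified
    if tier == "external":
        return tier, anonymise(prompt)  # masked before any egress
    raise ValueError(f"unknown tier: {tier}")


prompt = "Email anna.k@example.com or call +385 1 2345 678 about batch 42."
print(route(prompt))
print(route(prompt, tier="external"))
```

Making `local` the default argument mirrors the claim that on-prem is the default branch: a caller has to opt in explicitly to reach the external tier, and there is no code path to it that skips `anonymise`.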

Role-scoped, from live company knowledge. The assistant answers "how do I do X" from the firm's own knowledge store, filtered at retrieval by role (RBAC-at-retrieval): an operator sees operator knowledge, never the cap table; a shift lead sees the line, not the board minutes. The employee gets the productivity they were seeking from the consumer chatbot — in their own language — without the data ever leaving, and every query and its evidence land in an audit trail, which is precisely the log shadow AI never produces.
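RBAC-at-retrieval means the role filter runs before ranking, so restricted text is never scored, let alone handed to the model. The sketch below illustrates that ordering under stated assumptions: the `Document` type, the role names and the keyword-overlap scoring are all hypothetical and stand in for whatever store and ranker a real deployment uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    title: str
    text: str
    allowed_roles: frozenset[str]


def retrieve(query: str, role: str, store: list[Document], k: int = 3) -> list[Document]:
    """Filter by role first, then rank what remains by keyword overlap."""
    visible = [d for d in store if role in d.allowed_roles]
    terms = set(query.lower().split())
    scored = [(len(terms & set(d.text.lower().split())), d) for d in visible]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [d for _, d in ranked[:k]]


# Hypothetical knowledge store with two differently scoped documents.
store = [
    Document("Line 3 changeover", "how to run the changeover on line 3",
             frozenset({"operator", "shift_lead"})),
    Document("Cap table", "cap table and changeover of share ownership",
             frozenset({"board"})),
]

print([d.title for d in retrieve("how do I run a changeover", "operator", store)])
print([d.title for d in retrieve("how do I run a changeover", "board", store)])
```

Filtering after ranking would return the same titles here, but it would let restricted documents influence scores and pass through intermediate steps; filtering first keeps them out of the pipeline entirely.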

Human-in-the-loop by construction. Every capability begins as a proposal a human approves; autonomy is earned per process, only by measured track record, and promotion is always the customer's decision, with a master kill-switch and instant downward demotion the moment a human disagrees. This is the autonomy ladder — and it means the EU AI Act's human-oversight obligations are satisfied by the product's core mechanism rather than a compliance appendix.
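The autonomy ladder can be read as a small state machine per process. The sketch below is one possible reading, not the product's implementation: the rung names above `PROPOSE`, the `min_approved_runs` threshold and the choice to drop straight back to `PROPOSE` on a disagreement are assumptions made for illustration.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical rungs; the text names only 'propose' as the resting level."""
    PROPOSE = 0
    ACT_WITH_REVIEW = 1
    ACT_AUTONOMOUSLY = 2


class ProcessAutonomy:
    def __init__(self, min_approved_runs: int = 20) -> None:
        self.level = Level.PROPOSE
        self.approved_runs = 0
        self.killed = False
        self.min_approved_runs = min_approved_runs

    def record_review(self, human_agreed: bool) -> None:
        if human_agreed:
            self.approved_runs += 1
        else:
            # A single disagreement demotes at once and resets the track record.
            self.level = Level.PROPOSE
            self.approved_runs = 0

    def promote(self, customer_approved: bool) -> bool:
        """Promotion needs a track record AND an explicit customer decision."""
        if self.killed or not customer_approved:
            return False
        if self.approved_runs < self.min_approved_runs:
            return False
        if self.level == Level.ACT_AUTONOMOUSLY:
            return False
        self.level = Level(self.level + 1)
        self.approved_runs = 0  # each rung must be earned separately
        return True

    def kill(self) -> None:
        """Master kill-switch: back to propose, and no further promotion."""
        self.killed = True
        self.level = Level.PROPOSE


process = ProcessAutonomy(min_approved_runs=2)
process.record_review(True)
process.record_review(True)
print(process.promote(customer_approved=True), process.level.name)
```

Note that nothing in the class promotes automatically: a sufficient track record only makes promotion possible, and the call still has to come from the customer.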

Honest about its limits. The properties above are real: GDPR-by-design, data sovereignty, on-prem deployment, PII anonymisation, a full audit trail, human oversight. Dimbo does not claim SOC 2 or ISO 27001 certification. The optional cross-company intelligence network is strictly opt-in and ships with an explicitly declared data boundary — no data crosses it by default. Selling sovereignty requires stating exactly where the perimeter is; that is what these guardrails do.

The data has already left the building. The task is not to police the exit — it is to bring the AI inside, where the work, the knowledge, and the audit trail already live. — The substitution thesis

A representative scenario. Adriatica Pharma Services, a fictional contract development and manufacturing organisation (CDMO), employs a process chemist who — to draft a deviation report quickly — pastes a client's proprietary batch record into a public chatbot. It is a GMP event and a GDPR transfer at once, and nobody logs it. With Dimbo's sovereign, role-scoped assistant, the same chemist produces the same draft inside the perimeter: the model runs locally, the client's data never leaves, and the query plus the evidence it drew on are written to the audit trail — available months later when the auditor asks how the document was produced. The productivity is identical; the exfiltration is gone; the control the regulator demands now exists.
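For the audit trail in this scenario to be useful months later, an auditor needs some assurance that entries were not edited after the fact. One common way to get that is a hash chain, sketched below. The source specifies only that the query and its evidence are logged; the hash chaining, the `AuditTrail` class and its field names are assumptions added for illustration.

```python
import hashlib
import json
from datetime import datetime, timezone


class AuditTrail:
    """Append-only, hash-chained log; each entry commits to the one before it."""

    def __init__(self) -> None:
        self.entries: list[dict] = []

    @staticmethod
    def _digest(body: dict) -> str:
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def record(self, user: str, role: str, query: str, evidence: list[str]) -> dict:
        body = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "role": role,
            "query": query,
            "evidence": evidence,  # identifiers of the documents the answer drew on
            "prev_hash": self.entries[-1]["hash"] if self.entries else "",
        }
        entry = {**body, "hash": self._digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash and check each link to its predecessor."""
        prev = ""
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev_hash"] != prev or entry["hash"] != self._digest(body):
                return False
            prev = entry["hash"]
        return True


trail = AuditTrail()
trail.record("chemist-07", "process_chemist", "draft deviation report", ["batch-record-A"])
trail.record("chemist-07", "process_chemist", "summarise root cause", ["sop-12"])
print(trail.verify())
```

Altering any stored field, or removing an entry from the middle, makes `verify()` return False, which is the property that lets the log serve as evidence rather than as a mere record.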

6. Conclusion — bring the AI inside

Shadow AI is not a future risk to be managed with a policy memo; it is a present, daily data-exfiltration event already running at industrial scale: 75% of knowledge workers using AI, 78% of those users supplying their own tools, 48% of employees feeding non-public data into GenAI.[1][2] Prohibition cannot close it, because it removes the sanctioned tool without removing the underlying demand, and so drives the leakage into channels no one can see. Meanwhile GDPR, Schrems II, the CLOUD Act, the EU AI Act and NIS2 all converge on a single requirement the shadow-AI event structurally defeats: know what data moves where, and be able to prove you control it.

The resolution is substitution. Give the workforce an assistant at least as useful as the consumer tool, running on the company's own infrastructure, drawing on the company's own knowledge, filtered to each role, anonymising anything that must ever leave, and logging every interaction. That is a sovereign, governed, auditable AI the firm owns — the opposite of a paste into someone else's model. The data has already left the building. The only durable answer is to build the version that never has to.

References
  1. Microsoft & LinkedIn — 2024 Work Trend Index: AI at Work Is Here. Now Comes the Hard Part (75% of knowledge workers use AI at work; 78% bring their own AI / "BYOAI"; 31,000 respondents across 31 countries). microsoft.com/worklab
  2. Cisco — 2024 Data Privacy Benchmark Study (48% of employees admit entering non-public company information into GenAI tools; over 27% of organisations banned GenAI use; 2,600 privacy/security professionals across 12 geographies). cisco.com
  3. Cyberhaven — 11% of data employees paste into ChatGPT is confidential (analysis across ~1.6M workers; source code, client data and internal-only material among top leaked categories). cyberhaven.com
  4. IBM — Cost of a Data Breach Report 2024 (global average breach cost USD 4.88M — highest on record). ibm.com
  5. Regulation (EU) 2016/679 — General Data Protection Regulation (GDPR) (lawful basis, records of processing, international transfers, controller accountability). eur-lex.europa.eu
  6. Regulation (EU) 2024/1689 — Artificial Intelligence Act (human-oversight and risk-management obligations for high-risk AI; GPAI transparency; phased entry into force). eur-lex.europa.eu
  7. Court of Justice of the EU — Case C-311/18 (Schrems II), judgment 16 July 2020 (invalidated the EU–US Privacy Shield; transfers to the US require supplementary safeguards). curia.europa.eu
  8. United States — Clarifying Lawful Overseas Use of Data (CLOUD) Act, 2018 (US providers compellable to produce data regardless of storage location). congress.gov
  9. Directive (EU) 2022/2555 — NIS2 Directive (expanded cybersecurity and supply-chain risk-management obligations; management-level accountability). eur-lex.europa.eu
  10. Bloomberg — Samsung Bans Staff's AI Use After Spotting ChatGPT Data Leak (May 2023; engineers reportedly pasted internal source code and meeting notes into ChatGPT). bloomberg.com

Figures flagged should be confirmed against the latest release before publication. The Adriatica Pharma Services scenario is a representative fictional illustration flagged as Dimbo analysis. No unheld certifications (SOC 2 / ISO 27001) are claimed anywhere in this paper; the cross-company intelligence network is described with its explicit, declared, opt-in data boundary.
