Exploring AI [8]

April 16, 2026

The Ins and Outs of Data Sovereignty

AI tools are often discussed in terms of features, capability and performance. In practice, the more important question for most organisations is how control over their data is actually delivered.

In our experience working with businesses across regulated and operational environments, AI adoption is rarely limited by what the technology can do. It is limited by how data is handled, where it is processed and what level of control is available at different price and deployment tiers.

That control is not always consistent. It is shaped by licensing, infrastructure choices and the commercial model behind each platform.

This is the lens we use throughout this overview.

⸻

What Is Data Sovereignty, and Why Does It Matter?

Data sovereignty goes beyond where your files are stored. It also covers who has legal jurisdiction over your data, who can access it and what happens to it when it is processed.

In practice, it comes down to a simple question:

If I send my business data to this AI tool, where does it go, who can see it and what laws apply to it?

For businesses operating under GDPR or similar data protection frameworks, this question carries particular weight. This is especially relevant in jurisdictions such as the Isle of Man, where local regulation sits alongside broader international obligations.

⸻

Microsoft 365 and Copilot

Microsoft has invested heavily in addressing data residency concerns. Within Microsoft 365, data at rest can be located in specific geographic regions depending on how your tenant is configured.

Copilot adds a layer of nuance.

Earlier concerns focused on how Copilot processed data alongside existing Microsoft 365 services. Microsoft has since aligned Copilot more closely with its core service boundary. In practice, Copilot operates within the Microsoft 365 compliance framework, using your organisation’s data through Microsoft Graph rather than treating it as a separate external system.

Microsoft has also confirmed that customer data is not used to train its foundation models.

At the same time, Microsoft continues to expand regional processing capabilities, particularly through initiatives such as its EU Data Boundary, with increasing localisation over time.

For organisations with stricter requirements, Advanced Data Residency can extend in-region storage to additional Microsoft 365 workloads. It is important to understand that this applies to specific services and does not automatically cover everything Copilot interacts with.

From a control perspective, Microsoft represents one of the more structured environments, but even here, sovereignty is not absolute. It is shaped by configuration, licensing and jurisdictional exposure.

Key questions to ask:

Where is your Microsoft 365 tenant provisioned?

Do you have Advanced Data Residency configured and what does it actually cover?

Have you reviewed how Copilot processes data within your environment?

⸻

ChatGPT (OpenAI)

OpenAI’s ChatGPT is widely used, but the data handling model varies depending on how it is deployed.

For individuals and teams using free or standard plans, data sovereignty controls are limited. Conversations may be used to improve models unless the relevant setting is changed, and processing typically takes place on US-based infrastructure.

More advanced options are available through business and enterprise offerings, including regional data residency in Europe and the UK. These allow organisations to keep data at rest within a chosen region and configure how long data is retained. In some cases, retention can be reduced to zero.

These capabilities are generally tied to higher tier plans, which can represent a noticeable step up in cost and are not always practical for smaller organisations.

Even with regional settings in place, some supporting data such as account metadata or integrations may still be processed outside the selected region. OpenAI also remains a US-based company and is subject to US legal jurisdiction.

From a practical standpoint, ChatGPT illustrates a broader pattern seen across the market. The level of control available is directly tied to commercial tier and deployment model rather than being consistent across all users.
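The link between tier and control can be written down as a simple lookup. The sketch below is illustrative only: the tier names, the `PlanControls` fields and the `missing_controls` helper are assumptions made for this example, and the values paraphrase the description above rather than any vendor's contract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanControls:
    """Controls available on one plan tier (illustrative, not a vendor spec)."""
    regional_residency: bool      # can data at rest be kept in a chosen region?
    configurable_retention: bool  # can the retention period be reduced?


# Hypothetical tier names; values summarise the text above.
CONTROLS_BY_TIER = {
    "standard": PlanControls(regional_residency=False, configurable_retention=False),
    "enterprise": PlanControls(regional_residency=True, configurable_retention=True),
}


def missing_controls(tier: str, required: set[str]) -> set[str]:
    """Return the required control names that the given tier does not provide."""
    controls = CONTROLS_BY_TIER[tier]
    return {name for name in required if not getattr(controls, name)}
```

Calling `missing_controls("standard", {"regional_residency"})` returns the set of gaps for that tier, which makes the cost of staying on a lower plan explicit before any data is sent.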

Key questions to ask:

Are you using a business or enterprise plan or a standard one?

Have you configured data residency and retention settings?

What third party integrations are connected to your environment?

⸻

Claude (Anthropic)

Anthropic’s Claude is often grouped alongside other large language models, but from a governance perspective it can be more complex to assess.

Claude is accessed both through its own interface and via APIs or third party platforms. This means the way data is handled can vary depending on how it is implemented.

Anthropic states that customer data is not used to train models when using its API or enterprise offerings. However, data residency controls are less prominent than on platforms such as Microsoft 365 or Google Workspace, particularly for non-enterprise users.

For organisations working to meet GDPR requirements, this makes due diligence more important. It is not always immediately clear where data is processed, how it is routed or what controls apply without reviewing the specific deployment model.

As with other providers, Anthropic is a US-based company and remains subject to US legal jurisdiction.

Claude is a powerful tool for reasoning and long form work, but it highlights an important point. Capability is rarely the issue. The real question is how clearly an organisation can define and control its data flow when using it.

Key questions to ask:

How are you accessing Claude, directly or through another platform?

How sensitive is the data being entered?

Do you have visibility over where data is processed and stored?

⸻

Google Gemini

Google has developed a more mature approach to data residency across its Workspace ecosystem, and this increasingly extends to Gemini.

Administrators can configure data regions for core Workspace data, and Google continues to expand how these controls apply to AI processing. For enterprise customers, additional features such as customer managed encryption keys and private routing provide further control.

At the higher end, Gemini can be deployed in highly controlled environments through Google Distributed Cloud, including scenarios that are effectively isolated from the public internet. These setups are typically limited to large enterprises or public sector organisations due to their complexity.

From a control perspective, Google offers strong tooling, but like others, it still operates within a layered model where capability depends on configuration and commercial tier.

The underlying consideration remains consistent. Data residency affects where data is stored and processed, but it does not change the legal jurisdiction that may apply.

Key questions to ask:

What Workspace licence tier are you on?

How are your data region settings configured?

Does your use of Gemini align with your compliance requirements?

⸻

Perplexity and AI Powered Search

Perplexity AI takes a different approach by combining language models with live web search to generate answers, often with cited sources.

From a data sovereignty perspective, this introduces a different pattern of risk.

Queries may involve real time retrieval from external websites, meaning user input can be combined with third party content during processing. This can improve transparency and usefulness, while also expanding the data flow beyond a single provider.

Compared to more established enterprise platforms, governance and residency controls are less central to the product design. This makes it well suited to research and general exploration, but less appropriate for sensitive or confidential data use cases.

Key questions to ask:

What type of information is being entered into search-based AI tools?

Are queries being combined with external sources?

Is this tool being used within a governed workflow or for convenience?

⸻

Meta Llama: The Open Weight Option

Meta’s Llama models sit in a different category.

As open weight models, they can be downloaded and run on your own infrastructure. When self hosted, prompts and outputs remain entirely within your environment, with no external API calls and no third party visibility.

For organisations handling sensitive or regulated data, this can provide a higher degree of control.

However, that control comes with operational responsibility. Running Llama effectively requires infrastructure, technical expertise and ongoing management.

It is also worth noting that if Llama is accessed through a third party provider rather than self hosting, many of these benefits no longer apply.
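One coarse guard is to check that the inference endpoint an application is configured to call sits on a loopback or private address. The sketch below assumes the endpoint is given as a URL; `is_local_endpoint` is a hypothetical helper written for this example, not part of any Llama tooling. It does no DNS lookup, so any hostname other than `localhost` is treated as external, and a private address alone does not prove that nothing leaves the network.

```python
import ipaddress
from urllib.parse import urlparse


def is_local_endpoint(url: str) -> bool:
    """Return True if the URL's host is "localhost" or a loopback/private IP.

    Hostnames that would need DNS resolution are treated as external,
    because this check cannot verify where they point.
    """
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return False  # a DNS name we cannot verify without resolving it
    return address.is_loopback or address.is_private
```

A check like this fits naturally at application start-up, so that a configuration change pointing at a hosted provider fails loudly rather than silently changing where prompts go.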

Key questions to ask:

Do you have the capability to self host securely?

How sensitive is the data you are working with?

Does the investment align with your risk profile?

⸻

Claude in Copilot and the Multi Model Future

Microsoft is evolving Copilot into a broader AI orchestration layer, incorporating models beyond its own ecosystem, including those from Anthropic.

This introduces both opportunity and complexity.

When multiple model providers are involved, it becomes important to understand not only where data is stored, but also which model processes it, which company operates that model and what legal jurisdiction applies at each stage.

Even when the interface sits within Microsoft 365, elements of processing may involve third party models with their own policies and legal exposure.

From a governance perspective, this increases the importance of visibility. The challenge is no longer just understanding a single provider, but understanding the full chain of processing across platforms, models and infrastructure.
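One way to make that chain visible is to record each stage explicitly. The sketch below is a minimal illustration: the `ProcessingStage` fields and both helper functions are assumptions made for this example, and the operators and regions in any real chain would need to come from the providers' own documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessingStage:
    name: str          # what this stage does, e.g. "interface" or "model"
    operator: str      # company that operates the stage
    jurisdiction: str  # legal jurisdiction the operator is subject to
    region: str        # where the stage stores or processes data


def jurisdictions_in_chain(chain: list[ProcessingStage]) -> set[str]:
    """Every legal jurisdiction that touches the data somewhere in the chain."""
    return {stage.jurisdiction for stage in chain}


def stages_outside(chain: list[ProcessingStage], allowed_regions: set[str]) -> list[str]:
    """Names of stages whose processing region is not in the allowed set."""
    return [stage.name for stage in chain if stage.region not in allowed_regions]


# Placeholder values for illustration only.
chain = [
    ProcessingStage("interface", "Provider A", jurisdiction="US", region="EU"),
    ProcessingStage("model", "Provider B", jurisdiction="US", region="US"),
]
```

With these placeholder values, `stages_outside(chain, {"EU"})` flags the model stage, while `jurisdictions_in_chain(chain)` shows that a US jurisdiction applies even to the stage that runs in the EU. Keeping region and jurisdiction as separate fields reflects the point that residency does not change legal exposure.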

⸻

So Is Data Safer Here?

For businesses operating under GDPR or similar data protection frameworks, keeping data local can offer genuine advantages. Local infrastructure can provide clearer regulatory alignment and greater visibility over who is responsible for systems.

At the same time, there is a practical reality that often gets overlooked.

For many smaller organisations, the strongest data controls within AI platforms sit behind higher tier licences. Features such as regional processing, reduced retention and enhanced governance are not always available at entry level pricing.

Enterprise grade AI tools often introduce another constraint. They may require a minimum number of users before access is granted, sometimes 20 or more. For very small organisations, that threshold alone can make adoption difficult, regardless of suitability from a compliance perspective.

As a result, many smaller businesses are left on consumer style plans that offer significantly less visibility and control over where data goes and who can access it.

This creates a structural gap in the market.

Larger organisations can often configure AI tools to align closely with their compliance requirements. Smaller businesses may find themselves balancing cost against control and deciding where AI can be used safely and where it should remain limited to low risk tasks.

In practice, this does not mean avoiding AI. It means using it deliberately based on risk.
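A risk-based rule of this kind can be expressed in a few lines. The sketch below is an assumption-laden example rather than a recommended policy: the sensitivity levels, the deployment names and the ceilings in `MAX_SENSITIVITY` are all hypothetical and would need to be set by each organisation.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2


# Hypothetical policy: the most sensitive data each deployment type may handle.
MAX_SENSITIVITY = {
    "consumer_plan": Sensitivity.PUBLIC,
    "business_plan": Sensitivity.INTERNAL,
    "self_hosted": Sensitivity.CONFIDENTIAL,
}


def is_permitted(deployment: str, sensitivity: Sensitivity) -> bool:
    """Allow use only if the deployment's ceiling covers the data's sensitivity.

    Deployments that are not listed in the policy are denied by default.
    """
    ceiling = MAX_SENSITIVITY.get(deployment)
    return ceiling is not None and sensitivity <= ceiling
```

Denying unknown deployments by default means a new tool has to be assessed and added to the policy before anyone can use it with business data.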

⸻

A Practical Starting Point

Before adopting any AI tool, it is worth asking a few straightforward questions:

• Where is my data stored and where is it processed?

• Is my data used to train or improve the model?

• What legal jurisdiction applies to the provider?

• What controls are available and what do they cost?

• What happens if I need to move away from this provider?
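These questions can also be kept as a lightweight record per tool, so that gaps are visible before adoption. The sketch below is one possible shape, assuming answers are stored as free text keyed by question; the `unanswered` helper is hypothetical.

```python
QUESTIONS = (
    "Where is my data stored and where is it processed?",
    "Is my data used to train or improve the model?",
    "What legal jurisdiction applies to the provider?",
    "What controls are available and what do they cost?",
    "What happens if I need to move away from this provider?",
)


def unanswered(answers: dict[str, str]) -> list[str]:
    """Return the questions with no answer or only a blank answer, in order."""
    return [q for q in QUESTIONS if not answers.get(q, "").strip()]
```

An assessment is complete when `unanswered(answers)` returns an empty list. The function only checks that each question has been addressed, not that the answer is acceptable, which remains a judgement for the organisation.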

There are no universally right answers. The right approach depends on your business, your data and your risk appetite.

The important thing is making those decisions deliberately rather than by default.
