You Can't Govern What You Haven't Labeled

Every organization I talk to right now wants to talk about AI. What tools should we adopt? How do we roll out Copilot? Should we let staff use ChatGPT? Can we build our own internal models?

Good questions, all of them. But they're the wrong first questions. Here's the one that actually matters: Do you know where your sensitive data lives?

If the answer is "sort of" or "we have a policy" or the classic long pause followed by a subject change, you're not ready for AI. Not because AI is dangerous, but because you can't build guardrails around something you can't see.

The Problem Nobody Wants to Admit

Most organizations have data scattered across SharePoint sites, file shares, email, Teams channels, and that one department's shared drive that IT pretends doesn't exist. Some of that data is public. Some of it is regulated. Some of it would make your legal team very uncomfortable if it showed up in a generative AI response.

The challenge isn't that people are careless. It's that without a classification system, there's no way for anyone, human or machine, to know which data needs protection and which doesn't. You end up with two bad outcomes: either you lock everything down and kill productivity, or you leave everything open and hope for the best. Neither of those is a strategy. They're just different flavors of giving up.

I was telling a colleague about this exact scenario the other day. Forty thousand files containing Social Security numbers on a SharePoint site with external sharing enabled and zero classification. I may have gotten a little more emphatic than usual. I stood up at some point. I'm not sure when. I may have been pointing at my whiteboard. She gave me a look. That look. The one people give me sometimes when I'm... passionate about compliance.

The point is: most organizations aren't malicious about their data. They're just unaware, and awareness is step one.

What Data Classification Actually Looks Like

At its core, classification means applying labels to your data that describe how sensitive it is and how it should be handled. Simple concept. The execution is where it gets interesting.

In a recent engagement with a client organization, we implemented a classification framework built on four sensitivity tiers:

| Tier | What It Covers | Risk If Exposed |
| --- | --- | --- |
| Public | Press releases, published reports, public meeting minutes | None. That's the point. |
| Internal | Org charts, internal memos, procedure documents | Low, but not zero. |
| Confidential | Employee records, financial data, legal correspondence, CJIS-adjacent information | Real liability. Regulatory exposure. |
| Restricted | PHI, Social Security numbers, criminal justice data, specific regulatory mandates | The data that keeps your compliance officer up at night. |

These tiers aren't revolutionary. What's revolutionary is making them operational: attached to every document, enforced by policy, and understood by every person who touches data.
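To make "operational" a little more concrete, here is a minimal Python sketch of the four tiers as an ordered type with handling rules attached. The tier names come from the table above. The `Handling` fields and the specific rules in `HANDLING` are illustrative assumptions, not the client's actual policy.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    """The four sensitivity tiers, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


@dataclass(frozen=True)
class Handling:
    encrypt: bool           # assumption: whether content must be encrypted
    external_sharing: bool  # assumption: whether content may leave the org


# Hypothetical handling rules. Align these to your own regulatory obligations.
HANDLING = {
    Tier.PUBLIC: Handling(encrypt=False, external_sharing=True),
    Tier.INTERNAL: Handling(encrypt=False, external_sharing=False),
    Tier.CONFIDENTIAL: Handling(encrypt=True, external_sharing=False),
    Tier.RESTRICTED: Handling(encrypt=True, external_sharing=False),
}


def may_share_externally(tier: Tier) -> bool:
    """Look up whether a tier permits external sharing."""
    return HANDLING[tier].external_sharing


def strictest(tiers) -> Tier:
    """A container takes the highest tier of anything inside it."""
    return max(tiers, default=Tier.PUBLIC)
```

Ordering the tiers is the useful part: once tiers compare like numbers, "this folder is as sensitive as the most sensitive file in it" becomes a one-line rule instead of a judgment call.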

Microsoft Purview and Sensitivity Labels: The Engine Room

If your organization runs on Microsoft 365 (and if you're in government or mid-market enterprise, you almost certainly do), Microsoft Purview Information Protection gives you the machinery to make classification real.

Sensitivity labels in Purview do more than tag a document. They carry enforcement with them:

Encryption. A Confidential label can automatically encrypt a document so only authorized users can open it, even if it gets forwarded to the wrong person.
Access control. Labels can restrict who can edit, print, copy, or forward content. That budget spreadsheet labeled Confidential? It stays in the hands that are supposed to have it.
Visual markings. Headers, footers, and watermarks that make the classification visible. When people see "CONFIDENTIAL" stamped on a document, they think twice before attaching it to an email.
DLP integration. Data Loss Prevention policies can key off sensitivity labels. A document labeled Restricted triggers different DLP rules than one labeled Internal. Your policies get smarter because they have context.
Retention and lifecycle. Labels can drive how long data is kept and when it's disposed of. Regulatory retention requirements stop being a spreadsheet someone maintains manually.

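The real power is that these labels persist. They travel with the document: for supported file types, the label is stored in the file's metadata, and any encryption stays attached to the file itself. Email it, copy it to a USB drive, upload it to a cloud service, and the label and its encryption follow. That's not a feature. That's the whole point.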
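The idea that policies "key off" labels can be sketched in a few lines of Python. This is not Purview's policy syntax or API. The `BLOCKED` table, the action names, and the choice to block unlabeled content are all assumptions made for illustration.

```python
# Hypothetical rule table: actions to block for each sensitivity label.
BLOCKED = {
    "Restricted": {"email_external", "copy_to_usb", "upload_to_cloud", "print"},
    "Confidential": {"email_external", "upload_to_cloud"},
    "Internal": {"email_external"},
    "Public": set(),
}


def evaluate(label: str, action: str) -> str:
    """Return 'block' or 'allow' for an action on a labeled document."""
    if label not in BLOCKED:
        # Assumption: unlabeled or unknown content gets the strictest
        # treatment, because the policy has no context to work with.
        return "block"
    return "block" if action in BLOCKED[label] else "allow"
```

The same action gets a different answer depending on the label: `evaluate("Restricted", "print")` blocks, while `evaluate("Internal", "print")` allows. Without a label, the only safe default left is to block.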

The Automation Layer: Where It Gets Serious

Manual labeling is where most organizations start, and it's a legitimate first step. Train your staff, make labeling part of the workflow, build the muscle memory. But manual labeling at scale has an obvious ceiling: people are inconsistent, busy, and (let's be honest) occasionally forgetful.

This is where automatic classification changes the game. And in our experience, the most effective approach combines purpose-built data security platforms with Microsoft's native labeling infrastructure.

Discovery and Classification: Varonis Data Security Platform

In our client engagement, we deployed the Varonis Data Security Platform as the classification and discovery engine. This is a deliberate architectural choice, and here's why.

Varonis does something that native M365 tooling doesn't do well on its own: it gives you a unified, cross-platform view of where your sensitive data actually lives, who has access to it, and how it's being used. Before you can label anything, you need to find it, and in most organizations sensitive data has migrated to places nobody expected.

Automated data discovery. Varonis scans file shares, SharePoint, OneDrive, Exchange, Teams, and on-premises storage to locate sensitive content wherever it's landed. It doesn't wait for someone to label a file. It finds the file and tells you what's in it.
Built-in classification policies. Hundreds of pre-built classification rules covering PII, PHI, PCI data, credentials, and regulatory patterns. Social Security numbers hiding in a spreadsheet on a shared drive? Varonis finds them before an auditor does.
Custom classification rules. State-specific regulatory terms, local ordinance references, domain-specific terminology, internal identifiers. You build classifiers that match your world, not just the generic federal patterns.
Risk-based prioritization. Not all sensitive data carries equal risk. Varonis maps classification results against permissions, access patterns, and exposure to surface the data that's both sensitive and overexposed. A Social Security number in a locked-down HR folder is a finding. The same number in a department-wide SharePoint site with external sharing enabled is a fire.
Behavioral analytics. Classification tells you what the data is. Varonis also watches how people interact with it. Unusual access patterns, bulk downloads, and lateral movement through sensitive repositories are behavioral signals that augment the classification data and feed into your incident detection.
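The "finding versus fire" distinction can be sketched in Python. This is not how Varonis works internally. It is a toy illustration of pattern-based discovery combined with exposure-weighted scoring, and the `FileRecord` fields, the tenfold weight for external sharing, and the sample data are all assumptions.

```python
import re
from dataclasses import dataclass

# Simplified SSN pattern: ddd-dd-dddd. Real classifiers add validation and
# surrounding-context checks to reduce false positives.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


@dataclass
class FileRecord:
    path: str
    text: str
    external_sharing: bool  # hypothetical exposure flag
    reader_count: int       # hypothetical: accounts able to open the file


def count_ssns(text: str) -> int:
    """Count SSN-shaped strings in the text."""
    return len(SSN_PATTERN.findall(text))


def risk_score(record: FileRecord) -> int:
    """Sensitive hits weighted by exposure; zero when nothing is found."""
    hits = count_ssns(record.text)
    if hits == 0:
        return 0
    exposure = record.reader_count * (10 if record.external_sharing else 1)
    return hits * exposure


def prioritize(records):
    """Return only files with findings, riskiest first."""
    flagged = [r for r in records if risk_score(r) > 0]
    return sorted(flagged, key=risk_score, reverse=True)


records = [
    FileRecord("hr/locked/payroll.csv", "123-45-6789", False, 3),
    FileRecord("sites/dept/roster.xlsx", "123-45-6789, 987-65-4321", True, 200),
    FileRecord("public/newsletter.txt", "No identifiers here.", True, 500),
]
for r in prioritize(records):
    print(r.path, risk_score(r))
# sites/dept/roster.xlsx 4000
# hr/locked/payroll.csv 3
```

The widely shared newsletter never appears because it contains nothing sensitive, and the locked-down payroll file ranks far below the overexposed roster. Sensitivity alone doesn't set the priority. Sensitivity multiplied by exposure does.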

The Labeling Bridge: Varonis to Microsoft Purview

Here's where the architecture comes together. Varonis classifies the data. Microsoft Purview enforces the labels. The two systems work in concert.

Varonis can apply Microsoft Information Protection (MIP) sensitivity labels directly to files based on its classification results. When Varonis identifies a document containing protected health information, it doesn't just flag it in a dashboard. It applies the matching sensitivity label (Restricted, under the tiers above), which triggers Purview's encryption, access control, and DLP policies.

Varonis discovers SSNs on a shared drive → classifies as PII → applies Restricted label → Purview encrypts and restricts sharing

Varonis finds CJIS-adjacent data in Teams → applies Confidential label → DLP prevents copying to unapproved locations
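The hand-off in those two flows (classification result in, sensitivity label out) can be sketched in Python. Nothing here is a Varonis or MIP SDK call. The category names, the `CATEGORY_TO_LABEL` mapping (which follows the tier table earlier in this post), and the `apply_label` callback are hypothetical stand-ins for whatever your tooling exposes.

```python
# Hypothetical mapping from detected data categories to sensitivity labels,
# following the tier table earlier in the post.
CATEGORY_TO_LABEL = {
    "ssn": "Restricted",
    "phi": "Restricted",
    "criminal_justice": "Restricted",
    "cjis_adjacent": "Confidential",
    "employee_record": "Confidential",
    "financial": "Confidential",
}
LABEL_RANK = ["Public", "Internal", "Confidential", "Restricted"]


def choose_label(categories):
    """Pick the most restrictive label implied by the detected categories.

    Returns None when no known sensitive category was detected, so the file
    is left for a human (or a default-label policy) to decide.
    """
    labels = [CATEGORY_TO_LABEL[c] for c in categories if c in CATEGORY_TO_LABEL]
    if not labels:
        return None
    return max(labels, key=LABEL_RANK.index)


def label_files(findings, apply_label):
    """Apply a label to every file with findings.

    findings: dict of path -> set of detected categories.
    apply_label: hypothetical callback that writes the label to the file.
    """
    applied = {}
    for path, categories in findings.items():
        label = choose_label(categories)
        if label is not None:
            apply_label(path, label)
            applied[path] = label
    return applied
```

A file that contains both financial data and Social Security numbers gets Restricted, not Confidential. When classifiers disagree, the most restrictive label wins.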

Custom Tooling and the MIP SDK

For organizations with needs that go beyond what Varonis and Purview cover natively, the Microsoft Information Protection (MIP) SDK provides the extension point:

API-driven label application. Line-of-business applications, batch processes, and data migration workflows can apply the same sensitivity labels programmatically.
Integration with records management. Mapping sensitivity labels to existing records retention schedules so classification decisions cascade into lifecycle management automatically.
Cross-platform consistency. The MIP SDK provides a unified API for reading and writing sensitivity labels across the Microsoft ecosystem and beyond it.

One labeling taxonomy. One enforcement model. Multiple classification engines feeding into it: Varonis for discovery and automated classification, Purview for native M365 content, custom tools for everything else.

The AI Connection: Classification as a Prerequisite

Now we come back to AI. Here's the connection that too many organizations skip:

If you don't know what data is sensitive, you can't keep it out of AI tools.

When a department wants to use a generative AI assistant, the first governance question is: what data will it have access to? If your data isn't classified, you have exactly two options: give the AI access to everything (dangerous) or give it access to nothing (useless).

With a mature classification and labeling program, you get a third option, the one that actually works: use the labels to define the boundaries.

Copilot for Microsoft 365 can respect sensitivity labels. Documents protected by a Restricted label don't surface in AI-generated responses for users the label doesn't authorize.
DLP policies tied to sensitivity labels can prevent classified data from being pasted into unapproved AI tools.
Custom AI applications built on your own data can filter their training and retrieval sets by sensitivity tier. Your RAG pipeline retrieves only Internal and Public data unless the user's role authorizes more.
Audit trails tied to labels let you prove (to auditors, to regulators, to your board) exactly what data touched an AI system and what didn't.
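For the custom-application case, label-aware retrieval can be sketched in Python. This is a minimal illustration, not a full RAG pipeline. The `Chunk` structure, the role names, and the `ROLE_CEILING` mapping are assumptions, and unlabeled chunks are deliberately withheld.

```python
from dataclasses import dataclass
from typing import Optional

TIER_RANK = {"Public": 0, "Internal": 1, "Confidential": 2, "Restricted": 3}

# Hypothetical mapping: the highest tier each role may retrieve.
ROLE_CEILING = {
    "staff": "Internal",
    "hr_analyst": "Confidential",
    "compliance_officer": "Restricted",
}


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    text: str
    label: Optional[str]  # None means the source document was never labeled


def filter_for_role(chunks, role):
    """Split candidate chunks into (allowed, withheld_doc_ids) for a role.

    Unknown roles fall back to Public only. Unlabeled chunks are always
    withheld, since there is no way to know what they contain.
    """
    ceiling = TIER_RANK[ROLE_CEILING.get(role, "Public")]
    allowed, withheld = [], []
    for chunk in chunks:
        if chunk.label in TIER_RANK and TIER_RANK[chunk.label] <= ceiling:
            allowed.append(chunk)
        else:
            withheld.append(chunk.doc_id)
    return allowed, withheld
```

Only the `allowed` list goes to the model. The `withheld` list is what you write to the audit log, so you can show what the AI system was never given as well as what it saw.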

I was explaining this architecture on a call last week and I may have raised my voice. Just slightly. The way you do when someone suggests connecting an LLM to an unclassified data lake and calling it "innovation." Unlabeled data will not stand. I don't know where that phrase came from. It just comes out sometimes, like a reflex.

AI governance without data classification is just a policy document that nobody can enforce.

You want to enable AI responsibly? Label your data first, and everything else builds on that.

Where to Start

If you're reading this and thinking "we should probably do this," here's the practical path:

Define your tiers. Four is usually right. Align them to your regulatory obligations, not someone else's template.
Start with manual labeling in one department. Pick a department that handles sensitive data and is willing to participate. Learn from their workflow before you scale.
Deploy Purview sensitivity labels in M365. Even without automation, having labels available in Word, Excel, Outlook, and SharePoint changes behavior immediately.
Layer in Varonis for automated classification. Find what humans missed. Surface the risk you didn't know you had.
Tie labels to DLP policies. This is where enforcement begins. A label without a policy behind it is a suggestion. A label with a policy is a guardrail.
Extend to AI governance. Once your data is labeled, you have the vocabulary to write AI use policies that are actually enforceable.

It's not a six-month project. It's an ongoing program. But the first step, defining your tiers and deploying labels, can be done in weeks, and the security posture improvement is immediate. Stop planning to plan. Start labeling.

The Bottom Line

Data classification and labeling is the least glamorous, most impactful thing you can do for your security program right now. It's the prerequisite for DLP, for compliance reporting, for records management, and critically, for responsible AI adoption.

You can't govern what you haven't labeled. In a world where AI is about to touch every corner of your data estate, governing your data isn't optional anymore. Start labeling. Everything else gets easier after that.

Need Help Getting Started?

Circle 6 Systems helps government agencies and enterprises build data classification programs that enable responsible AI adoption.

Duke Dillingham is a compliance officer and security awareness consultant at Circle 6 Systems. The "Dude Diligence" comic book training series is available at circle6systems.com/training. Duke and Dude Diligence have never been seen in the same room at the same time, which Duke attributes to scheduling conflicts.