Every organization I talk to right now wants to talk about AI. What tools should we adopt? How do we roll out Copilot? Should we let staff use ChatGPT? Can we build our own internal models?

Good questions, all of them. But they're the wrong first questions. Here's the one that actually matters: Do you know where your sensitive data lives?

If the answer is "sort of" or "we have a policy" or the classic long pause followed by a subject change, you're not ready for AI. Not because AI is dangerous, but because you can't build guardrails around something you can't see.

The Problem Nobody Wants to Admit

Most organizations have data scattered across SharePoint sites, file shares, email, Teams channels, and that one department's shared drive that IT pretends doesn't exist. Some of that data is public. Some of it is regulated. Some of it would make your legal team very uncomfortable if it showed up in a generative AI response.

The challenge isn't that people are careless. It's that without a classification system, there's no way for anyone, human or machine, to know which data needs protection and which doesn't. You end up with two bad outcomes: either you lock everything down and kill productivity, or you leave everything open and hope for the best. Neither of those is a strategy. They're just different flavors of giving up.

I was telling a colleague about this exact scenario the other day. Forty thousand files containing Social Security numbers on a SharePoint site with external sharing enabled and zero classification. I may have gotten a little more emphatic than usual. I stood up at some point. I'm not sure when. I may have been pointing at my whiteboard. She gave me a look. That look. The one people give me sometimes when I'm... passionate about compliance.

The point is: most organizations aren't malicious about their data. They're just unaware, and awareness is step one.

What Data Classification Actually Looks Like

At its core, classification means applying labels to your data that describe how sensitive it is and how it should be handled. Simple concept. The execution is where it gets interesting.

In a recent engagement with a client organization, we implemented a classification framework built on four sensitivity tiers:

Tier | What It Covers | Risk If Exposed
Public | Press releases, published reports, public meeting minutes | None. That's the point.
Internal | Org charts, internal memos, procedure documents | Low, but not zero.
Confidential | Employee records, financial data, legal correspondence, CJIS-adjacent information | Real liability. Regulatory exposure.
Restricted | PHI, Social Security numbers, criminal justice data, data covered by specific regulatory mandates | The data that keeps your compliance officer up at night.

These tiers aren't revolutionary. What's revolutionary is making them operational: attached to every document, enforced by policy, and understood by every person who touches data.
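Making the tiers operational in custom tooling can start with something as small as encoding them as an ordered type, so a policy can ask "is this at least Confidential?" instead of comparing strings. Here's a minimal Python sketch. The tier names come from the table. The numeric ordering and the `requires_protection` helper are hypothetical, for illustration only.

```python
from enum import IntEnum


class Tier(IntEnum):
    """The four sensitivity tiers, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


def requires_protection(tier: Tier, threshold: Tier = Tier.CONFIDENTIAL) -> bool:
    """Hypothetical helper: True when a document's tier meets or exceeds the threshold."""
    return tier >= threshold


# IntEnum members compare like integers, so "at least Confidential" is one comparison.
print(requires_protection(Tier.INTERNAL))    # False
print(requires_protection(Tier.RESTRICTED))  # True
```

Ordering is the design choice that matters here: once tiers compare, every later rule ("block anything above Internal") becomes a single comparison.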

Microsoft Purview and Sensitivity Labels: The Engine Room

If your organization runs on Microsoft 365 (and if you're in government or mid-market enterprise, you almost certainly do), Microsoft Purview Information Protection gives you the machinery to make classification real.

Sensitivity labels in Purview do more than tag a document. They carry enforcement with them: encryption, access control, and the DLP policies tied to each label.

The real power is that these labels persist. They travel with the document. Email it, copy it to a USB drive, upload it to a cloud service, and the label and its protections follow. That's not a feature. That's the whole point.

The Automation Layer: Where It Gets Serious

Manual labeling is where most organizations start, and it's a legitimate first step. Train your staff, make labeling part of the workflow, build the muscle memory. But manual labeling at scale has an obvious ceiling: people are inconsistent, busy, and (let's be honest) occasionally forgetful.

This is where automatic classification changes the game. And in our experience, the most effective approach combines purpose-built data security platforms with Microsoft's native labeling infrastructure.
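To show the basic idea behind automatic classification, here's a minimal sketch that detects U.S. Social Security numbers with a pattern plus a validation step. It assumes the dashed format only (123-45-6789). Commercial engines such as Varonis and Purview use far richer classifiers, and this is not how either product works internally.

```python
import re

# Dashed SSN format only; real classifiers also handle undashed and spaced variants.
SSN_PATTERN = re.compile(r"\b(\d{3})-(\d{2})-(\d{4})\b")


def is_plausible_ssn(area: str, group: str, serial: str) -> bool:
    """Reject ranges the SSA never issues: area 000, 666, or 9xx; group 00; serial 0000."""
    if area in ("000", "666") or area.startswith("9"):
        return False
    return group != "00" and serial != "0000"


def find_ssns(text: str) -> list:
    """Return every SSN-shaped string in the text that passes validation."""
    return [
        match.group(0)
        for match in SSN_PATTERN.finditer(text)
        if is_plausible_ssn(*match.groups())
    ]


sample = "Employee 123-45-6789, test value 000-12-3456, and 987-65-4320."
print(find_ssns(sample))  # ['123-45-6789']
```

The validation step is cheap and cuts false positives. Production classifiers go further with context, such as nearby keywords and match counts, before they label anything.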

Discovery and Classification: Varonis Data Security Platform

In our client engagement, we deployed the Varonis Data Security Platform as the classification and discovery engine. This is a deliberate architectural choice, and here's why.

Varonis does something that native M365 tooling doesn't do well on its own: it gives you a unified, cross-platform view of where your sensitive data actually lives, who has access to it, and how it's being used. Before you can label anything, you need to find it, and in most organizations sensitive data has migrated to places nobody expected.
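To make "find it first" concrete, here's a toy discovery pass. It walks a folder tree, looks for SSN-shaped strings, and reports which files contain them. It assumes plain-text files and a dashed-format pattern only. It illustrates the discovery step and is no stand-in for Varonis, which also maps who has access and how the data is used.

```python
import re
import tempfile
from pathlib import Path

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def discover(root: Path, suffixes=(".txt", ".csv")) -> dict:
    """Map each matching file (relative path) to its count of SSN-like strings."""
    findings = {}
    for path in root.rglob("*"):
        if path.is_file() and path.suffix.lower() in suffixes:
            text = path.read_text(errors="ignore")
            hits = len(SSN_PATTERN.findall(text))
            if hits:
                findings[path.relative_to(root).as_posix()] = hits
    return findings


# Build a throwaway folder tree so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "hr").mkdir()
    (root / "hr" / "roster.csv").write_text("Ana,123-45-6789\nBo,234-56-7890\n")
    (root / "press.txt").write_text("Quarterly report released today.")
    result = discover(root)
    print(result)  # {'hr/roster.csv': 2}
```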

The Labeling Bridge: Varonis to Microsoft Purview

Here's where the architecture comes together. Varonis classifies the data. Microsoft Purview enforces the labels. The two systems work in concert.

Varonis can apply Microsoft Information Protection (MIP) sensitivity labels directly to files based on its classification results. When Varonis identifies a document containing protected health information, it doesn't just flag it in a dashboard. It applies the Confidential or Restricted sensitivity label that triggers Purview's encryption, access control, and DLP policies.

Varonis discovers SSNs on a shared drive → classifies as PII → applies Restricted label → Purview encrypts and restricts sharing

Varonis finds CJIS-adjacent data in Teams → applies Confidential label → DLP prevents copying to unapproved locations
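Each of those flows boils down to a rule that maps what the classifier detected to a single label. Here's a minimal sketch that follows the tier table. The detector names are hypothetical. When a file trips several rules, the most sensitive tier wins. A file with no hits gets a default label, which is Internal here (also an assumption). In a real deployment this mapping lives in Varonis and Purview configuration, not in custom code.

```python
from enum import IntEnum


class Tier(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Hypothetical detector names mapped to tiers, following the tier table.
RULES = {
    "ssn": Tier.RESTRICTED,
    "phi": Tier.RESTRICTED,
    "criminal_justice": Tier.RESTRICTED,
    "cjis_adjacent": Tier.CONFIDENTIAL,
    "financial": Tier.CONFIDENTIAL,
    "employee_record": Tier.CONFIDENTIAL,
}


def choose_label(detected: set, default: Tier = Tier.INTERNAL) -> Tier:
    """Return the most sensitive tier among known detections, or default if none match."""
    return max((RULES[d] for d in detected if d in RULES), default=default)


print(choose_label({"ssn", "financial"}).name)  # RESTRICTED
print(choose_label({"cjis_adjacent"}).name)     # CONFIDENTIAL
print(choose_label(set()).name)                 # INTERNAL
```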

Custom Tooling and the MIP SDK

For organizations with needs that go beyond what Varonis and Purview cover natively, the MIP SDK provides the extension point, letting custom applications and line-of-business systems read, apply, and honor the same sensitivity labels.

One labeling taxonomy. One enforcement model. Multiple classification engines feeding into it: Varonis for discovery and automated classification, Purview for native M365 content, custom tools for everything else.
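The "many engines, one taxonomy" pattern can be sketched as independent classifiers that each return a tier (or no opinion) and a resolver that keeps the most sensitive answer. Both engines below are hypothetical keyword checks standing in for real classifiers, and none of this is the MIP SDK's API. The resolver also treats a label already on the file as a floor, so automation never silently downgrades a human's decision. That's a design choice worth making explicit in your own tooling.

```python
from enum import IntEnum
from typing import Callable, Iterable, Optional


class Tier(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# An engine inspects content and returns a tier, or None if it has no opinion.
Engine = Callable[[str], Optional[Tier]]


def finance_engine(text: str) -> Optional[Tier]:
    """Hypothetical engine: flags salary or payroll content as Confidential."""
    lowered = text.lower()
    return Tier.CONFIDENTIAL if "salary" in lowered or "payroll" in lowered else None


def health_engine(text: str) -> Optional[Tier]:
    """Hypothetical engine: a crude stand-in for PHI detection, flagged Restricted."""
    return Tier.RESTRICTED if "diagnosis" in text.lower() else None


def resolve(text: str, engines: Iterable[Engine], current: Optional[Tier] = None) -> Tier:
    """Return the most sensitive verdict; an existing label acts as a floor."""
    verdicts = [engine(text) for engine in engines]
    verdicts = [v for v in verdicts if v is not None]
    if current is not None:
        verdicts.append(current)
    return max(verdicts, default=Tier.INTERNAL)


ENGINES = [finance_engine, health_engine]
print(resolve("Salary bands for FY25", ENGINES).name)                           # CONFIDENTIAL
print(resolve("Salary bands for FY25", ENGINES, current=Tier.RESTRICTED).name)  # RESTRICTED
print(resolve("Cafeteria menu", ENGINES).name)                                  # INTERNAL
```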

The AI Connection: Classification as a Prerequisite

Now we come back to AI. Here's the connection that too many organizations skip:

If you don't know what data is sensitive, you can't keep it out of AI tools.

When a department wants to use a generative AI assistant, the first governance question is: what data will it have access to? If your data isn't classified, you have exactly two options: give the AI access to everything (dangerous) or give it access to nothing (useless).

With a mature classification and labeling program, you get a third option, the one that actually works: use the labels to define the boundaries.
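Using labels as the boundary can be as simple as a gate in front of whatever index or connector feeds the assistant. Here's a minimal sketch, assuming a hypothetical `Document` record and a ceiling of Internal. Anything labeled above the ceiling stays out. Anything never labeled stays out too, because failing closed is the only safe default for data you can't see.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional


class Tier(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


@dataclass
class Document:
    name: str
    label: Optional[Tier]  # None means the file was never classified


def ai_scope(docs: List[Document], ceiling: Tier = Tier.INTERNAL) -> List[Document]:
    """Keep only documents that are labeled and no more sensitive than the ceiling."""
    # Test "is not None" explicitly: Tier.PUBLIC has value 0, which is falsy.
    return [d for d in docs if d.label is not None and d.label <= ceiling]


corpus = [
    Document("press-release.docx", Tier.PUBLIC),
    Document("procedures.docx", Tier.INTERNAL),
    Document("payroll.xlsx", Tier.CONFIDENTIAL),
    Document("old-share-dump.csv", None),
]
print([d.name for d in ai_scope(corpus)])  # ['press-release.docx', 'procedures.docx']
```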

I was explaining this architecture on a call last week and I may have raised my voice. Just slightly. The way you do when someone suggests connecting an LLM to an unclassified data lake and calling it "innovation." Unlabeled data will not stand. I don't know where that phrase came from. It just comes out sometimes, like a reflex.

AI governance without data classification is just a policy document that nobody can enforce.

You want to enable AI responsibly? Label your data first, and everything else builds on that.

Where to Start

If you're reading this and thinking "we should probably do this," here's the practical path:

  1. Define your tiers. Four is usually right. Align them to your regulatory obligations, not someone else's template.
  2. Start with manual labeling in one department. Pick a department that handles sensitive data and is willing to participate. Learn from their workflow before you scale.
  3. Deploy Purview sensitivity labels in M365. Even without automation, having labels available in Word, Excel, Outlook, and SharePoint changes behavior immediately.
  4. Layer in Varonis for automated classification. Find what humans missed. Surface the risk you didn't know you had.
  5. Tie labels to DLP policies. This is where enforcement begins. A label without a policy behind it is a suggestion. A label with a policy is a guardrail.
  6. Extend to AI governance. Once your data is labeled, you have the vocabulary to write AI use policies that are actually enforceable.
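Step 5 is where a label becomes a guardrail. Conceptually, a DLP rule is a lookup from label and destination to an action. Here's a minimal sketch with hypothetical destinations and actions. In Purview the equivalent logic lives in DLP policy rules you configure, not in code like this. The one design choice to copy is that anything not explicitly allowed is blocked.

```python
from enum import Enum, IntEnum


class Tier(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


class Action(Enum):
    ALLOW = "allow"
    WARN = "warn"    # allow, but show a policy tip and require a justification
    BLOCK = "block"


# Hypothetical policy: per tier, the destinations it may reach and the action taken.
POLICY = {
    Tier.PUBLIC: {"internal": Action.ALLOW, "external": Action.ALLOW, "removable_media": Action.ALLOW},
    Tier.INTERNAL: {"internal": Action.ALLOW, "external": Action.WARN},
    Tier.CONFIDENTIAL: {"internal": Action.ALLOW},
    Tier.RESTRICTED: {"approved_repository": Action.ALLOW},
}


def evaluate(tier: Tier, destination: str) -> Action:
    """Look up the action; any destination not listed for the tier is blocked."""
    return POLICY[tier].get(destination, Action.BLOCK)


print(evaluate(Tier.INTERNAL, "external").value)      # warn
print(evaluate(Tier.CONFIDENTIAL, "external").value)  # block
print(evaluate(Tier.RESTRICTED, "internal").value)    # block
```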

It's not a six-month project. It's an ongoing program. But the first step, defining your tiers and deploying labels, can be done in weeks, and the security posture improvement is immediate. Stop planning to plan. Start labeling.

The Bottom Line

Data classification and labeling is the least glamorous, most impactful thing you can do for your security program right now. It's the prerequisite for DLP, for compliance reporting, for records management, and critically, for responsible AI adoption.

You can't govern what you haven't labeled. In a world where AI is about to touch every corner of your data estate, governing your data isn't optional anymore. Start labeling. Everything else gets easier after that.

Need Help Getting Started?

Circle 6 Systems helps government agencies and enterprises build data classification programs that enable responsible AI adoption.


Duke Dillingham is a compliance officer and security awareness consultant at Circle 6 Systems. The "Dude Diligence" comic book training series is available at circle6systems.com/training. Duke and Dude Diligence have never been seen in the same room at the same time, which Duke attributes to scheduling conflicts.