Use case #1: managing external data sharing

In a growing sign that enterprises trust Microsoft to protect their sensitive data (or perhaps that users are operating unaware of their organizations’ cloud policies), Skyhigh has found that employees upload a significant amount of sensitive data to Office 365. Analyzing the data loss prevention policies customers implement using Skyhigh, we found that, on average, 17.1% of files an enterprise stores in OneDrive and SharePoint Online are sensitive.

Depending on your organization’s compliance and security posture, your policies may allow this information to be stored in Office 365 provided it is not shared inappropriately. However, many companies have high-value or regulated data they want to keep out of the cloud entirely. And regardless of compliance requirements, some types of data are simply unfit to be stored in the cloud. For example, Skyhigh has found the average enterprise stores 204 files containing user passwords in OneDrive. These files often take the form of a Word or Excel document with usernames and passwords for all the applications and devices an employee uses.

Preventing regulated or high-value data from being stored in the cloud is a two-part problem: 1) detecting sensitive data and 2) enforcing controls to prevent this data from living within Office 365. Identifying sensitive data is not a trivial undertaking because it often requires going beyond simple keyword matching. Consider the following real-world examples of sensitive content that enterprises rely on a CASB to detect and prevent from being stored in Office 365 or shared inappropriately:

  • A lexicon containing hundreds or thousands of keywords that are common across several different corporate policies (e.g. prescription drug names, stock symbols)
  • Data classification tags applied by classification technologies that appear in the metadata of files (e.g. confidential, internal only)
  • Standard alphanumeric patterns that follow a set of defined rules such as length, prefix or suffix, or checksum (e.g. Social Security numbers, credit card numbers)
  • Custom alphanumeric patterns that are unique to the organization and follow a set of defined rules (e.g. part numbers, product SKUs)
  • All versions of a specific, sensitive document including the exact file or any derivative of the file (e.g. design document for production process, legal contract)
  • Any piece of content that refers to current or former customers (e.g. any field from a structured database with personal data on 300 million customers)
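To illustrate why pattern-based detection goes beyond keyword matching, here is a minimal sketch of the "standard alphanumeric pattern" case for credit card numbers. It pairs a regular expression with the Luhn checksum so that random 16-digit strings are not flagged. The regular expression and function names are illustrative assumptions, not a description of how any particular CASB implements its data identifiers.

```python
import re

# Assumed candidate shape: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return substrings that look like card numbers and pass the checksum."""
    return [m.group() for m in CARD_CANDIDATE.finditer(text) if luhn_valid(m.group())]
```

The checksum step is what reduces false positives: an order number that happens to be 16 digits long will usually fail it, while a genuine card number will pass.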

When deploying data loss prevention technology, enterprises want to simultaneously minimize the number of sensitive files missed by the system (false negatives) and minimize the number of non-sensitive files flagged by the system (false positives). A CASB uses a variety of technologies to match the above sensitive content types and enforce policies.

How Skyhigh helps

Skyhigh delivers a robust content-aware DLP engine with comprehensive remediation and reporting. Many organizations have standard data loss and compliance scenarios and Skyhigh includes dozens of off-the-shelf DLP templates for common use cases such as HIPAA compliance and M&A documents. These policies are customizable, or you can create your own unique DLP policies using a flexible policy framework that leverages Boolean logic to combine two or more rules and associated remediation actions.

DLP policies can contain rules leveraging document metadata and content including file attributes, keywords, keyword dictionaries, document classification tags, data identifiers, regular expressions, and fingerprinting of structured databases and unstructured files.

These rules can be combined in nested groups connected with AND and OR logic, and each rule set supports an associated severity. For example, if a document contains one credit card number, the violation severity can be set to “medium”, and if it contains 100 or more credit card numbers the severity can be set to “high”. Because remediation actions within a policy can be tiered based on severity, you can define a policy such as “quarantine files with high severity violations but only alert users for files with low severity violations”. Skyhigh also supports integration with on-premises DLP solutions from Symantec, EMC RSA, Intel McAfee, and Websense to leverage existing policies.
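The nested AND/OR groups and severity tiers described above can be sketched roughly as follows. This is a hypothetical model for illustration only: the `Policy` class, the rule helpers, and the simplified 16-digit pattern are all assumptions, not Skyhigh's policy framework or API.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

Predicate = Callable[[str], bool]


def contains_any(*keywords: str) -> Predicate:
    """Rule: the text contains at least one keyword (case-insensitive)."""
    return lambda text: any(k.lower() in text.lower() for k in keywords)


def matches(pattern: str) -> Predicate:
    """Rule: the text contains at least one match of a regular expression."""
    compiled = re.compile(pattern)
    return lambda text: compiled.search(text) is not None


def all_of(*rules: Predicate) -> Predicate:
    """AND group: every nested rule must match."""
    return lambda text: all(rule(text) for rule in rules)


def any_of(*rules: Predicate) -> Predicate:
    """OR group: at least one nested rule must match."""
    return lambda text: any(rule(text) for rule in rules)


@dataclass
class Policy:
    condition: Predicate   # nested AND/OR rule group
    counted_pattern: str   # matches of this pattern drive the severity
    tiers: list            # (minimum match count, severity) pairs
    actions: dict          # severity -> remediation action

    def evaluate(self, text: str) -> Optional[tuple]:
        """Return (severity, action) for a violation, or None if no violation."""
        if not self.condition(text):
            return None
        count = len(re.findall(self.counted_pattern, text))
        # Check the highest threshold first so the strictest tier wins.
        for minimum, severity in sorted(self.tiers, reverse=True):
            if count >= minimum:
                return severity, self.actions[severity]
        return None


CARD = r"\b\d{16}\b"  # simplified stand-in for a real card-number identifier
policy = Policy(
    condition=all_of(matches(CARD), any_of(contains_any("visa", "card"), contains_any("payment"))),
    counted_pattern=CARD,
    tiers=[(1, "medium"), (100, "high")],
    actions={"medium": "alert", "high": "quarantine"},
)
```

With this sketch, a file mentioning "card" alongside a single 16-digit number evaluates to a medium-severity alert, while the same file with 100 such numbers evaluates to a high-severity quarantine.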

Skyhigh can target DLP policies to specific user groups, business units, roles, or departments by pulling user information from directory services that support LDAP, such as Microsoft Active Directory. For example, you can target a DLP policy to a specific department, or to all users with a specific role. Policies can also exclude specific groups.
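As a rough illustration of group-based targeting with exclusions, the sketch below decides whether a policy applies to a user given that user's group memberships. In practice the memberships would come from a directory service over LDAP; here they are passed in directly, and the `PolicyScope` class and the group names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PolicyScope:
    # An empty include set is treated as "all users" (an assumption of this sketch).
    include_groups: set = field(default_factory=set)
    exclude_groups: set = field(default_factory=set)

    def applies_to(self, user_groups: set) -> bool:
        """Exclusions win over inclusions; otherwise any included group matches."""
        if self.exclude_groups & user_groups:
            return False
        return not self.include_groups or bool(self.include_groups & user_groups)


# Hypothetical scope: everyone in Finance except the auditors.
finance_scope = PolicyScope(include_groups={"Finance"}, exclude_groups={"Finance-Auditors"})
```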

Skyhigh supports numerous automated remediation actions in response to DLP policy violations. Depending on the deployment architecture, Skyhigh can enforce policies via blocking, quarantining, deleting, coaching, and notification.

Skyhigh’s review interface provides full context of the violation including the user, file name, and a highlighted excerpt showing the content that triggered the violation. Depending on the deployment mode, a compliance reviewer can also take manual action. For example, some enterprises choose to run DLP policies in a monitor-only mode. If during review the compliance user decides action is required, she can quarantine or delete the file from the review interface. Both automated and manual remediation can be rolled back if required to restore a file. For all deployment modes, incidents can be assigned a status and an owner for follow-up.

Skyhigh also integrates with SIEMs via syslog to provide a real-time feed of DLP violations so that enterprises can leverage pre-existing DLP incident workflows.
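For readers unfamiliar with syslog feeds, the sketch below shows what forwarding a violation event to a SIEM collector could look like using Python's standard `logging.handlers.SysLogHandler`. The JSON field names, the logger name, and the collector address are assumptions made for illustration; they do not reflect Skyhigh's actual message format.

```python
import json
import logging
import logging.handlers


def format_violation(user: str, file_name: str, policy: str, severity: str) -> str:
    """Serialize a violation as a JSON string (hypothetical field names)."""
    return json.dumps(
        {"user": user, "file": file_name, "policy": policy, "severity": severity},
        sort_keys=True,
    )


def build_syslog_logger(host: str = "localhost", port: int = 514) -> logging.Logger:
    """Create a logger that forwards records to a syslog collector over UDP."""
    logger = logging.getLogger("dlp.violations")
    logger.setLevel(logging.WARNING)
    logger.addHandler(logging.handlers.SysLogHandler(address=(host, port)))
    return logger


# Example usage (the collector address is an assumption):
# siem = build_syslog_logger("siem.example.com", 514)
# siem.warning(format_violation("alice", "passwords.xlsx", "Stored passwords", "high"))
```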

How it works: deployment architecture

Both API and inline proxy modes can scan new files as they are uploaded, but inline proxy modes cannot scan data that is already stored at rest because they sit inline and only have visibility into data as it is uploaded to Office 365. To scan pre-existing data, a CASB must integrate with Office 365 via API with permissions to scan existing content. Likewise, only the API mode supports scanning content created natively within cloud applications such as Word Online and Excel Online, because a proxy cannot enforce policies as content is typed character by character into the browser.
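The coverage trade-off described above can be summarized as a small lookup table. The enum and function names below are illustrative only; the capabilities themselves are taken directly from the paragraph above.

```python
from enum import Enum


class Mode(Enum):
    API = "API"
    INLINE_PROXY = "inline proxy"


class Content(Enum):
    NEW_UPLOAD = "file uploaded to Office 365"
    EXISTING_AT_REST = "file already stored in Office 365"
    CREATED_IN_APP = "content typed into Word Online or Excel Online"


# Which deployment mode can scan which kind of content.
SUPPORTED = {
    Mode.API: {Content.NEW_UPLOAD, Content.EXISTING_AT_REST, Content.CREATED_IN_APP},
    Mode.INLINE_PROXY: {Content.NEW_UPLOAD},
}


def modes_for(content: Content) -> list:
    """Return the deployment modes able to scan the given kind of content."""
    return [mode for mode in Mode if content in SUPPORTED[mode]]
```

For example, `modes_for(Content.EXISTING_AT_REST)` returns only the API mode, which is why scanning pre-existing data requires an API integration.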