What Is DLP and How Does It Work?

What is DLP?

Data Loss Prevention (DLP), per Gartner, may be defined as technologies which perform both content inspection and contextual analysis of data sent via messaging applications such as email and instant messaging, in motion over the network, in use on a managed endpoint device, and at rest in on-premises file servers or in cloud applications and cloud storage. These solutions execute responses based on policy and rules defined to address the risk of inadvertent or accidental leaks or exposure of sensitive data outside authorized channels.

DLP technologies are broadly divided into two categories – Enterprise DLP and Integrated DLP. While Enterprise DLP solutions are comprehensive and packaged in agent software for desktops and servers, physical and virtual appliances for monitoring networks and email traffic, or soft appliances for data discovery, Integrated DLP is limited to secure web gateways (SWGs), secure email gateways (SEGs), email encryption products, enterprise content management (ECM) platforms, data classification tools, data discovery tools, and cloud access security brokers (CASBs).

How does DLP work?

Understanding the difference between content awareness and contextual analysis is essential to understanding any DLP solution. A useful analogy: if content is a letter, context is the envelope. Content awareness involves capturing the envelope and opening it to analyze the letter inside. Context covers external factors such as the header, size, and format, that is, anything other than the content of the letter itself. The idea behind content awareness is that although we want to use the context to gain more intelligence about the content, we don’t want to be restricted to a single context.

Once the envelope is opened and the content processed, there are multiple content analysis techniques which can be used to trigger policy violations, including:

  1. Rule-Based/Regular Expressions: The most common analysis technique used in DLP involves an engine analyzing content for specific rules such as 16-digit credit card numbers, 9-digit U.S. social security numbers, etc. This technique is an excellent first-pass filter since the rules can be configured and processed quickly, although they can be prone to high false positive rates without checksum validation to identify valid patterns.
  2. Database Fingerprinting: Also known as Exact Data Matching, this mechanism looks at exact matches from a database dump or live database. Although database dumps or live database connections affect performance, this is an option for structured data from databases.
  3. Exact File Matching: File contents are not analyzed; instead, the hashes of files are matched against exact fingerprints. This approach produces few false positives, but it does not work for files that exist in multiple similar but not identical versions.
  4. Partial Document Matching: Looks for a complete or partial match on specific files, such as multiple versions of a form that have been filled out by different users.
  5. Conceptual/Lexicon: Using a combination of dictionaries, rules, etc., these policies can alert on completely unstructured ideas that defy simple categorization. It needs to be customized for the DLP solution provided.
  6. Statistical Analysis: Uses machine learning or other statistical methods such as Bayesian analysis to trigger policy violations in secure content. It requires a large volume of data to learn from (the bigger the better); otherwise it is prone to false positives and false negatives.
  7. Pre-built categories: Pre-built categories with rules and dictionaries for common types of sensitive data, such as credit card numbers/PCI protection, HIPAA, etc.
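As an illustration of techniques 1 and 3 above, the following is a minimal Python sketch, not any vendor's engine. It pairs a regular expression for 16-digit card candidates with a Luhn checksum (the kind of checksum validation that reduces false positives) and uses a SHA-256 digest as an exact-file fingerprint. The names `CARD_PATTERN`, `luhn_valid`, `find_card_numbers`, and `file_fingerprint` are hypothetical, and the pattern assumes digits grouped in fours with optional spaces or hyphens.

```python
import hashlib
import re
from typing import List

# Assumed candidate pattern: 16 digits in groups of four,
# optionally separated by a space or a hyphen.
CARD_PATTERN = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> List[str]:
    """Return 16-digit candidates in text that also pass the Luhn check."""
    hits = []
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


def file_fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest to use as an exact-match fingerprint."""
    return hashlib.sha256(data).hexdigest()
```

For the string "Card: 4111 1111 1111 1111 and 1234-5678-9012-3456." the regular expression alone matches both numbers, but only the first passes the checksum, so `find_card_numbers` returns `["4111111111111111"]`. The fingerprint also shows the limitation noted for exact file matching: changing a single byte of a file yields a different digest.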

There are myriad techniques in the market today that deliver different types of content inspection. One thing to consider is that while many DLP vendors have developed their own content engines, some employ third-party technology that is not designed for DLP. For example, rather than building pattern matching for credit card numbers, a DLP vendor may license technology from a search engine provider to pattern match credit card numbers. When evaluating DLP solutions, pay close attention to the types of patterns detected by each solution against a real corpus of sensitive data to confirm the accuracy of its content engine.
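One way to make such an evaluation concrete is to run each candidate engine over a labeled corpus and compare precision and recall. The sketch below assumes the engine can be wrapped as a function that returns True when it flags a text. `evaluate_detector` and `naive_keyword_detector` are hypothetical names, and the four-item corpus is made up purely for illustration.

```python
from typing import Callable, Iterable, Tuple


def evaluate_detector(
    detector: Callable[[str], bool],
    labeled_samples: Iterable[Tuple[str, bool]],
) -> Tuple[float, float]:
    """Return (precision, recall) over (text, is_sensitive) pairs."""
    tp = fp = fn = 0
    for text, is_sensitive in labeled_samples:
        flagged = detector(text)
        if flagged and is_sensitive:
            tp += 1  # correctly flagged
        elif flagged and not is_sensitive:
            fp += 1  # false positive
        elif not flagged and is_sensitive:
            fn += 1  # false negative (missed sensitive data)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def naive_keyword_detector(text: str) -> bool:
    """Toy stand-in for a content engine: flags any text mentioning 'ssn'."""
    return "ssn" in text.lower()


# Made-up corpus: (text, whether it really contains sensitive data).
corpus = [
    ("SSN: 123-45-6789", True),
    ("Please update the SSN field label", False),
    ("Social security number 123-45-6789", True),
    ("Lunch at noon", False),
]

precision, recall = evaluate_detector(naive_keyword_detector, corpus)
```

On this toy corpus the keyword detector produces one false positive and one miss, giving a precision and recall of 0.5 each. A real evaluation would use a much larger corpus drawn from the organization's own data.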

DLP best practices strengthen data security

Best practices in DLP combine technology, process controls, knowledgeable staff, and employee awareness. Below are recommended guidelines for developing an effective DLP program:

  1. Implement a single centralized DLP program - Many organizations rely on inconsistent, ad hoc DLP practices and technologies that individual departments and business units have adopted on their own. This inconsistency leads to a lack of visibility into data assets and weak data security. In addition, employees tend to ignore department DLP programs that the rest of the organization does not support.
  2. Evaluate internal resources - To create and execute a DLP plan, organizations need personnel with DLP expertise, including DLP risk analysis, data breach response and reporting, data protection laws, and DLP training and awareness. Some government regulations require organizations to either employ internal staff or retain external consultants with data protection knowledge. For instance, the GDPR includes provisions that affect organizations that sell goods or services to European Union (EU) consumers or monitor their behavior. In many cases, the GDPR requires a data protection officer (DPO) or staff that can assume DPO responsibilities, including conducting compliance audits, monitoring DLP performance, educating employees on compliance requirements, and serving as a liaison between the organization and compliance authorities.
  3. Conduct an inventory and assessment - An evaluation of the types of data and their value to the organization is an important early step in implementing a DLP program. This involves identifying relevant data, where the data is stored, and whether it is sensitive data (intellectual property, confidential information, or data that regulations address). Some DLP products can quickly identify information assets by scanning the metadata of files and cataloging the result or, if necessary, opening the files to analyze the content. The next step is to evaluate the risk that a leak of each type of data would pose. Additional considerations include data exit points and the likely cost to the organization if the data is lost. Losing information about employee benefits programs carries a different level of risk than the loss of 1,000 patient medical files or 100,000 bank account numbers and passwords.
  4. Implement in phases - DLP is a long-term process that is best implemented in stages. The most effective approach is to prioritize types of data and communication channels. Likewise, consider implementing DLP software components or modules as needed, based on the organization's priorities, rather than all at once. The risk analysis and data inventory help establish these priorities.
  5. Create a classification system -  Before an organization can create and execute DLP policies, it needs a data classification framework or taxonomy for both unstructured and structured data. Data security categories might include confidential, internal, public, personally identifiable information (PII), financial data, regulated data, intellectual property, and others. DLP products can scan data using a pre-configured taxonomy, which the organization may later customize, to help identify the key categories of data. While DLP software automates and speeds classification, humans select and customize the categories. Content owners can also visually evaluate certain types of content that cannot be identified using simple keywords or phrases.
  6. Establish data handling and remediation policies - After creating the classification framework, the next step is to create (or update) policies for handling different categories of data. Government regulations often dictate how sensitive data must be handled. DLP solutions typically apply pre-configured rules or policies based on various regulations, such as HIPAA or GDPR. DLP staff can then customize the policies to the needs of the organization. To administer the policies, DLP enforcement products monitor outgoing channels (like email and web chat), prevent unauthorized transfers, and provide options for handling potential security breaches. For instance, an employee about to send an email with a sensitive attachment might receive a pop-up that suggests encrypting the message, or the system might block it entirely or redirect it to a manager. The response is based on rules the organization establishes.
  7. Educate employees -  Employee awareness and acceptance of security policies and procedures is critical to DLP. Education and training efforts, such as classes, online training, periodic emails, videos and write-ups can improve employee understanding of the importance of data security and enhance their ability to follow recommended DLP best practices. Penalties for breaching data security may also improve compliance, especially if they are clearly defined. The SANS Institute provides a variety of data security training and awareness resources.
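The handling rules described in guideline 6 can be pictured as a small decision function that maps a message's classification and context to a response. The sketch below is only an illustration under assumed rules: the classification labels, the `OutboundMessage` fields, and the `decide_action` function are hypothetical, and real DLP products express such policies in their own configuration formats.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    SUGGEST_ENCRYPTION = "suggest_encryption"
    REDIRECT_TO_MANAGER = "redirect_to_manager"
    BLOCK = "block"


@dataclass(frozen=True)
class OutboundMessage:
    channel: str             # e.g. "email" or "web_chat"
    classification: str      # label from the organization's taxonomy
    encrypted: bool
    external_recipient: bool


def decide_action(msg: OutboundMessage) -> Action:
    """Map a message to a response using hypothetical example rules."""
    # Public data and purely internal traffic pass through.
    if msg.classification == "public" or not msg.external_recipient:
        return Action.ALLOW
    # Assumed rule: regulated data never leaves the organization.
    if msg.classification == "regulated":
        return Action.BLOCK
    # Assumed rule: confidential data may leave only if encrypted.
    if msg.classification == "confidential":
        return Action.ALLOW if msg.encrypted else Action.SUGGEST_ENCRYPTION
    # Anything else goes to a manager for review.
    return Action.REDIRECT_TO_MANAGER
```

Under these assumed rules, an unencrypted email classified as confidential and addressed to an external recipient produces `Action.SUGGEST_ENCRYPTION`, which corresponds to the pop-up described in guideline 6. Keeping the rules in one function mirrors the point that the response depends on rules the organization itself establishes.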
