DLP Systems: Models, Architecture and Algorithms

Liwei Ren, Ph.D, Sr. Architect

Data Security Research, Trend Micro™
May, 2013, UCSC, Santa Cruz, CA

Classification 8/2/2013 Copyright 2011 Trend Micro Inc. 1

Background:
• Liwei Ren, Data Security Research, Trend Micro™
– Research interests:
• DLP, differential compression, data de-duplication, file transfer protocols, database
security, and practical algorithms.
– Education:
• MS/BS in mathematics, Tsinghua University, Beijing
• Ph.D in mathematics, MS in information science, University of Pittsburgh
– Relevant works for this talk:
• Provilla, Inc : a startup focusing on endpoint based DLP products and solutions. It was
co-founded by Liwei and acquired by Trend Micro a few years ago.
• Patents --- Liwei holds 10+ patents for DLP, mostly, for DLP content inspection
techniques.

• Trend Micro™
– Global security software company with headquarters in Tokyo, and R&D centers in
Nanjing, Taipei and Silicon Valley.
– One of the top 3 anti-malware vendors
– Pioneer in cloud security
– DLP vendor via Provilla™ acquisition
Agenda
• What is Data Loss Prevention (DLP) ?
• Concepts, Models, Architecture
• Content Inspection Problems
• Practical Algorithms for DLP
• Summary
• References
• Q&A


What Is Data Loss Prevention?
• What is Data Loss Prevention?
– Data loss prevention (DLP) is a data security technology that detects
data breach incidents in a timely manner and prevents them by monitoring
data in-use (endpoints), in-motion (network traffic), and at-rest (data
storage) in an organization's network.
– Also known as Data Leak Prevention (DLP), Information Leak Prevention (ILP), or
Information Leak Detection and Prevention (ILDP).


What Is Data Loss Prevention?
• A Few Elements of a DLP system:
– WHAT data to protect?
– WHO leaks data?
– HOW the data is leaked?
– WHERE to protect data?
– WHAT actions to take?


Concepts, Models and Architecture
• WHAT data to protect?

• WHO causes data leaks?

External Hackers


Concepts, Models and Architecture
Three Data States:


Concepts, Models and Architecture
• Data-in-use:

• Data-in-motion:


Concepts, Models and Architecture
• Data-at-rest at risk:


Concepts, Models and Architecture
• DLP for data-in-use and data-in-motion:

• A conceptual view!


Concepts, Models and Architecture
• DLP for data-in-use and data-in-motion:

• A technical view!
Concepts, Models and Architecture
• DLP Model for data-in-use and data-in-motion:
– If DATA flows from SOURCE to DESTINATION via CHANNEL, the
system takes ACTIONs

– DATA specifies what the confidential data is

– SOURCE can be a user, an endpoint, an email
address, or a group of them
– DESTINATION can be an endpoint, an email address,
a group of them, or simply the external world
– CHANNEL indicates the data leak channel, such as
USB, email, or a network protocol
– ACTION is the action that the DLP system takes
when an incident occurs
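The model above can be read as a rule table. The following is a minimal sketch under stated assumptions: the names `Rule`, `Event` and `actions_for`, the glob matching of sources and destinations, and the sample rules are all hypothetical and are not taken from the talk or from any product.

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass(frozen=True)
class Rule:
    data: str          # name of a sensitive-data definition (hypothetical label)
    source: str        # glob over users, endpoints, or email addresses
    destination: str   # glob; "*" stands for the external world
    channel: str       # e.g. "usb", "email", "http"
    action: str        # e.g. "block", "log", "alert"


@dataclass(frozen=True)
class Event:
    data_labels: frozenset  # labels assigned by content inspection
    source: str
    destination: str
    channel: str


def actions_for(event, rules):
    """Return the ACTION of every rule that the data-flow event matches."""
    return [
        r.action for r in rules
        if r.data in event.data_labels
        and fnmatch(event.source, r.source)
        and fnmatch(event.destination, r.destination)
        and event.channel == r.channel
    ]


RULES = [
    Rule("customer_records", "*@corp.example", "*", "email", "block"),
    Rule("source_code", "*", "*", "usb", "log"),
]
EVENT = Event(frozenset({"customer_records"}), "alice@corp.example",
              "bob@partner.example", "email")
print(actions_for(EVENT, RULES))  # ['block']
```

The content inspection problems discussed later in the talk are what would produce `data_labels` in such a design.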


Concepts, Models and Architecture
• DLP for data-at-rest:


Concepts, Models and Architecture
• DLP Model for data-at-rest:
– If DATA resides at SOURCE, the system takes ACTIONs

– DATA specifies what the sensitive data (which has
potential for leakage) is
– SOURCE can be an endpoint, a storage server or a
group of them
– ACTION is the action that the DLP system takes
when confidential data is identified at rest.


Concepts, Models and Architecture
• Typical DLP systems:
– DLP Management Console
– DLP Endpoint Agent
– DLP Network Gateway
– Data Discovery Agent (or Appliance)


Concepts, Models and Architecture
• Typical DLP system architecture:


Agenda
• What is Data Loss Prevention (DLP) ?
• Concepts, Models, Architecture

• Content Inspection Problems

• Practical Algorithms for DLP
• Summary
• References
• Q&A


Content Inspection Problems
• Two fundamental problems for a DLP system:

• It is a pair of problems that always come together:

• One determines data sensitivity based on what has been
defined.


Content Inspection Problems
• Four typical approaches for <defining, determining>
sensitive data in a DLP system:

1. Document fingerprinting
2. Database record fingerprinting
3. Multiple Keyword matching
4. Regular expression matching


Content Inspection Problems
• Document fingerprinting:
• A technique for identifying modified versions of known documents

• Problem Definition (Model 1):

– Let S = {T1, T2, …, Tn} be a set of known texts
– Given a query text T, one needs to determine whether there exists at least one
document t ∈ S such that T and t share a significant amount of common textual
content, where multiple returned documents are ranked by how
much common content is shared.


Content Inspection Problems
• An alternative model (Model 2):
– Let S = {T1, T2, …, Tn} be a set of known texts
– Given a query text T and X%, one needs to determine whether there exists at
least one text t ∈ S such that SIM(T,t) ≥ X%, where SIM is a function to
measure the similarity between two texts.
• Multiple documents are ranked by their similarity percentages.


Content Inspection Problems
• Database record fingerprinting:
– A technique for identifying sensitive data records within a text.
– A.k.a., Exact Match in DLP field

• Use Case:
– We have several personal data records of <SSN, Phone#, address>
that are included in a text, we want to extract all records from the
text to determine the sensitivity of the file.


Content Inspection Problems

SSN          Phone #       Address
178-76-6754  412-876-6789  43 Atword Street, Pittsburgh, PA 15260
159-87-8965  408-780-8876  76 Parkview Ave, Sunnyvale, CA 94086
……           ……            ……

An example: a text contains a few data records:

Hhhhhdds ghghg 178-76-6754 ggkjkfddfdkkkk879-45-6785kjkjjk 43
Atword Street, Pittsburgh, PA 15260 kllkll 412-876-6789 kjkjjkj 76
Parkview Ave, Sunnyvale, CA 94086 hhsjskkdhjhjhj 408-780-8876
hjhjkjkjjj 159-87-8965 hjhjhjhjmnnmnxcbls w243 54y45 wefddew
dddw3n nn xxxxxxxxxx


Content Inspection Problems
• Problem Definition (Model 3) :
– Let S = {R1, R2, …, Rn} be a set of known data records from the same table.
– Given any text T, one needs to extract all records or sub-records from T,
where the record cells may appear in any order and position within the text.
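One way to picture Model 3 is an index from cell hashes to (row, column) positions, queried with sliding word windows. The sketch below is illustrative only and is not the method of the talk: the helper names, the SHA-256 hashing, and the whitespace tokenization are assumptions. In particular, it does not handle cells glued to surrounding characters, such as the 879-45-6785 fragment in the slide's example text.

```python
import hashlib
from collections import defaultdict


def _h(cell):
    """Hash of a whitespace-normalized, lower-cased cell value."""
    norm = " ".join(cell.lower().split())
    return hashlib.sha256(norm.encode("utf-8")).hexdigest()


def build_index(records):
    """Map cell hash -> set of (row, column); also return cell lengths in words."""
    index, lengths = defaultdict(set), set()
    for row, record in enumerate(records):
        for col, cell in enumerate(record):
            index[_h(cell)].add((row, col))
            lengths.add(len(cell.split()))
    return index, lengths


def extract(text, index, lengths):
    """Return {row: set of columns found in text} using sliding word windows."""
    words = text.split()
    found = defaultdict(set)
    for w in lengths:
        for i in range(len(words) - w + 1):
            for row, col in index.get(_h(" ".join(words[i:i + w])), ()):
                found[row].add(col)
    return dict(found)


RECORDS = [
    ("178-76-6754", "412-876-6789", "43 Atword Street, Pittsburgh, PA 15260"),
    ("159-87-8965", "408-780-8876", "76 Parkview Ave, Sunnyvale, CA 94086"),
]
TEXT = ("ghghg 178-76-6754 kjkjjk 43 Atword Street, Pittsburgh, PA 15260 "
        "kllkll 412-876-6789 hjhj 159-87-8965 xx")
INDEX, LENGTHS = build_index(RECORDS)
print(extract(TEXT, INDEX, LENGTHS))
```

Here the first record is recovered in full (columns 0, 1 and 2) and the second only as a sub-record (column 0), which mirrors the "records or sub-records" wording of the problem definition.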


Content Inspection Problems
• Problem Definition for Keyword Match:
– Let S= {K1,K2,…,Kn} be a dictionary of keywords.
– Given any text T, one needs to identify all keyword occurrences in T.
• Problem Definition for RegEx Match:
– Let S= {P1,P2,…,Pm} be a set of RegEx patterns.
– Given any text T, one needs to identify all pattern instances from T.

Easy problems?
– Not at all! For large n and m, performance
becomes an issue.
– That is the problem of scalability.
– Scalable algorithms must be provided.
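To make the RegEx problem statement concrete, here is a small sketch using Python's standard `re` module. The two patterns and the names `PATTERNS`, `COMBINED` and `find_instances` are assumptions chosen to fit the SSN and phone formats shown earlier. It illustrates the problem only, not a scalable solution: a combined alternation reports non-overlapping matches and is still evaluated by a backtracking engine.

```python
import re

# Hypothetical pattern set {P1, P2}; real DLP templates would be richer.
PATTERNS = {
    "ssn": r"\b\d{3}-\d{2}-\d{4}\b",
    "phone": r"\b\d{3}-\d{3}-\d{4}\b",
}

# One named group per pattern so a single pass over T reports which one matched.
COMBINED = re.compile(
    "|".join(f"(?P<{name}>{rx})" for name, rx in PATTERNS.items())
)


def find_instances(text):
    """Return (pattern name, matched text) for every non-overlapping instance."""
    return [(m.lastgroup, m.group()) for m in COMBINED.finditer(text)]


print(find_instances("SSN 178-76-6754, phone 412-876-6789"))
# [('ssn', '178-76-6754'), ('phone', '412-876-6789')]
```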


Agenda
• What is Data Loss Prevention (DLP) ?
• Concepts, Models, Architecture
• Content Inspection Problems

• Practical Algorithms for DLP

• Summary
• References
• Q&A


Practical Algorithms for DLP
• We investigate some algorithms for 2 problems:

1. Document fingerprinting
2. Multiple keyword matching

Assumption: without loss of generality, a text T is a sequence of UTF-8 characters.


Document Fingerprinting Algorithms
• Let's investigate algorithmic solutions for Model 2 (document
fingerprinting).
• Analysis for Solution:
1. We need to construct the function SIM(T,t). For example:
– SIM(T,t) = |T ∩ t| / Min(|T|,|t|), based on common sub-strings.

2. An Obvious Challenge:
– If n is large, say, on the scale of millions, we cannot compute SIM(T, Tk) one by one
to find the t that satisfies SIM(T,t) ≥ X%.
– We need an approach that can identify a possible candidate quickly.

3. General search engines like Google use keywords to index/identify
the documents. Should we? No: there are too many keywords, and
keywords are language dependent.

4. So, which features can we use for indexing/searching?

– One answer is document fingerprints.
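The slides do not define how |T ∩ t| is computed. As a rough stand-in, the sketch below approximates the common sub-string content by the overlap of k-character shingles after removing whitespace. The shingle length `k=8` and the function names are assumptions made for illustration.

```python
def shingles(text, k=8):
    """Set of all k-character substrings of text, with whitespace removed."""
    s = "".join(text.split())
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def sim(a, b, k=8):
    """Shingle-based stand-in for |a ∩ b| / Min(|a|, |b|); value in [0, 1]."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / min(len(sa), len(sb))


DOC = "The quick brown fox jumps over the lazy dog"
print(sim(DOC, "quick brown fox jumps"))                    # 1.0 (excerpt of DOC)
print(sim(DOC, "Lorem ipsum dolor sit amet consectetur"))   # 0.0 (unrelated)
```

Dividing by the smaller set means an excerpt of a known document scores high even when the document is much longer, which suits leak detection. Computing this against millions of texts one by one is exactly the cost that the fingerprint index is meant to avoid.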
Document Fingerprinting Algorithms
• What are document fingerprints?
– A fingerprint is a hash value
– One text has multiple fingerprints
– Unique to the text: two unrelated texts do not share any fingerprints.
– Robustness: it can survive moderate textual changes.


Document Fingerprinting Algorithms
• How to extract fingerprints from a text?
– Anchoring point:
• A point in the text that can endure moderate changes.
• Its neighborhood (of fixed size) is unique to the text
– We select a few anchoring points to generate fingerprints:
• Generate hash values from their neighborhoods.
• These hash values are the fingerprints

• Samples of anchoring points and their neighborhood:

Thereareabundantliteraturesonhowtogeneratedifferencebetween
twofilesBasicallytherearetwofundamentalapproachestoattackthisgenericp
roblemLCSmodelwhereLCSstandsforlargestcommonsubsequenceCalculate
thelargestcommonsubsequenceoftwostringFindasequenceofeditoperation
sbasedontheLCSsothatonecanapplytheeditoperationstothereferencefiletoc
onstructthetargetfileBlock movemodel

Document Fingerprinting Algorithms
• Conclusion : we have a solution that consists of two
algorithms and one search technology:
– An algorithm for computing SIM(T,t)
– An algorithm for fingerprint generator FPGEN(T)
– Fingerprint search engine

Document Fingerprinting Algorithms
• Fingerprint generation algorithm 1:
– INPUT: String T
• Select top M candidate characters based on a score function
– Character frequency n
– Character positions in the text T: P(1), …, P(n)
– SCORE(c) = SQRT(n) * [P(n)-P(1)] / SQRT(D)
» Where D = [P(2)-P(1)]^2 + [P(3)-P(2)]^2 + … + [P(n)-P(n-1)]^2
• For each selected character c
– Create a hash around the neighborhood at each occurrence
– Sort these hashes
– Select the top N hashes
– These N hashes are fingerprints
– OUTPUT: M*N fingerprints

Note 1: M and N are pre-defined.
Note 2: Two keys of this algorithm are (a) the score function; (b) sorting the hashes.
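The steps above can be sketched as follows. Several details are assumptions because the slides leave them open: the text normalization (lower-case alphanumerics only), the neighborhood radius, the BLAKE2b hash, and reading "top N" as the N smallest hashes. The parameters `top_chars` and `top_hashes` play the roles of M and N.

```python
import hashlib
import math
from collections import defaultdict


def score(positions):
    """SQRT(n) * [P(n) - P(1)] / SQRT(D) for one character's sorted positions."""
    n = len(positions)
    if n < 2:
        return 0.0
    d = sum((b - a) ** 2 for a, b in zip(positions, positions[1:]))
    return math.sqrt(n) * (positions[-1] - positions[0]) / math.sqrt(d)


def fpgen(text, top_chars=3, top_hashes=4, radius=8):
    """Return up to top_chars * top_hashes fingerprints (64-bit ints) of text."""
    s = "".join(ch for ch in text.lower() if ch.isalnum())  # assumed normalization
    positions = defaultdict(list)
    for i, ch in enumerate(s):
        positions[ch].append(i)
    # Top M characters by score; ties are broken by the character itself.
    chars = sorted(positions, key=lambda c: (-score(positions[c]), c))[:top_chars]
    fingerprints = set()
    for c in chars:
        hashes = set()
        for p in positions[c]:
            if p - radius < 0 or p + radius >= len(s):
                continue  # skip occurrences without a full neighborhood
            window = s[p - radius:p + radius + 1].encode("utf-8")
            digest = hashlib.blake2b(window, digest_size=8).digest()
            hashes.add(int.from_bytes(digest, "big"))
        fingerprints.update(sorted(hashes)[:top_hashes])  # keep the N smallest
    return fingerprints


DOC = ("There are abundant literatures on how to generate difference between "
       "two files. Basically there are two fundamental approaches to attack "
       "this generic problem.")
print(len(fpgen(DOC)))
```

Sorting the hashes and keeping a fixed number makes the selection depend on content rather than on position, so an unchanged neighborhood yields the same hash even if text is inserted elsewhere. Whether that hash is still among the selected ones after an edit is not guaranteed by this sketch.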
Document Fingerprinting Algorithms
• About the score function:
– Why SQRT(n)?
• Measurement of frequency for the given character
• The larger the value, the more stable the character is
– Why [P(n)-P(1)] / SQRT(D)?
• Measurement of distribution for the given character
• The larger the value, the more evenly distributed the character is, and the more
stable the character is
• WHY? Think about a constrained optimization problem:
– min f(X1, X2, …, Xm) = X1^2 + X2^2 + … + Xm^2
» subject to
» X1 + X2 + … + Xm = c AND
» Xk ≥ 0, k = 1,2,…,m
Note: The solution of the optimization problem is Xk = c/m, k = 1,2,…,m
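The link between the optimization problem and the score can be made explicit. The following is a short worked derivation, taking the gaps between consecutive occurrences as the variables (so m = n-1 and c = P(n) - P(1)); this identification is an interpretation of the slide, not something it states.

```latex
\[
X_k = P(k+1) - P(k), \quad k = 1, \dots, n-1, \qquad
\sum_{k=1}^{n-1} X_k = P(n) - P(1), \qquad
D = \sum_{k=1}^{n-1} X_k^2 .
\]
By the Cauchy-Schwarz inequality,
\[
\bigl(P(n) - P(1)\bigr)^2
  = \Bigl(\sum_{k=1}^{n-1} X_k\Bigr)^2
  \le (n-1) \sum_{k=1}^{n-1} X_k^2
  = (n-1)\, D ,
\]
so
\[
\frac{P(n) - P(1)}{\sqrt{D}} \le \sqrt{n-1},
\]
with equality exactly when
$X_1 = \dots = X_{n-1} = \dfrac{P(n) - P(1)}{n-1}$.
```

For a fixed span P(n) - P(1), D is smallest when the occurrences are evenly spaced, so the ratio is largest for evenly distributed characters.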

Document Fingerprinting Algorithms

There are alternative algorithms to construct a
fingerprint generation function.
We recently constructed algorithm 2:
– A novel approach based on rolling hash function
H(x);
– It selects anchoring points with first filter H(x) = 0
mod p;
– It further selects anchoring points with a heuristic
second filter.
– It also employs the asymmetric architecture of
fingerprint match;
Note 1: The anchoring points have better distribution across text.
Note 2: Two keys of this algorithm are (a) Rolling hash;
(b) Asymmetric use of two filters.
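Only the first filter is specified well enough in the slide to sketch. The code below selects positions whose window hash satisfies H(x) = 0 mod p, using a polynomial rolling hash. The window size, base, modulus, and the value of p are assumptions. The heuristic second filter and the asymmetric matching architecture are not described in the slides and are not implemented here.

```python
def anchor_points(text, window=8, p=16, base=257, mod=(1 << 61) - 1):
    """Byte offsets i where the rolling hash of data[i:i+window] is 0 mod p."""
    data = text.encode("utf-8")
    if len(data) < window:
        return []
    top = pow(base, window - 1, mod)  # weight of the byte leaving the window
    h = 0
    for byte in data[:window]:
        h = (h * base + byte) % mod
    anchors = [0] if h % p == 0 else []
    for i in range(1, len(data) - window + 1):
        # Drop data[i-1], shift, and bring in data[i+window-1].
        h = ((h - data[i - 1] * top) * base + data[i + window - 1]) % mod
        if h % p == 0:
            anchors.append(i)
    return anchors


TEXT = "There are abundant literatures on how to generate difference between two files"
print(anchor_points(TEXT))
print(anchor_points("XYZ" + TEXT))  # the same anchors, shifted by 3
```

Because each decision depends only on the bytes inside the window, inserting or deleting text elsewhere shifts the anchors without changing which neighborhoods are chosen. On average about one position in p passes the filter.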
Multiple Keyword Match

Essentially, it is a multi-pattern
string match problem.

Problem Definition:
– Let S={P1,P2,…,Pk} be multiple short strings as
patterns;
– Given any string T, one needs to identify all pattern
occurrences in T.

Multiple Keyword Match
Existing string match algorithms:
Algorithm                    Type
Naïve string match           One pattern
Knuth–Morris–Pratt           One pattern
Boyer-Moore                  One pattern
Boyer-Moore-Horspool         One pattern
Boyer-Moore-Horspool-Raita   One pattern
Rabin-Karp                   Multi-patterns
Aho-Corasick                 Multi-patterns
Wu-Manber                    Multi-patterns

Multiple Keyword Match
Boyer-Moore-Horspool (BMH) Algorithm
Key elements of the algorithm:
– Character comparison can be made from right to left, starting from the end of
the pattern.
– Ending Character Heuristics
• Consider that we are pointing to character R[i] and try to compare it with the
ending character P[m] of P
• Bad character
– If R[i] ≠ P[m] and R[i] is not included in P's alphabet, then it is safe for the pointer to skip
m positions arriving at R[i+m].
– If R[i] ≠ P[m], R[i] is included in P's alphabet, and R[i]'s last occurrence within P has
distance q from the end of P, then it is safe for the pointer to skip q positions arriving at
R[i+q].
• Good character
– If R[i] = P[m], P is not matched, and R[i] has no other occurrences within P, then it is safe
for the pointer to skip m positions arriving at R[i+m].
– If R[i] = P[m], P is not matched, and R[i]'s last occurrence other than P[m] has distance q
from the end of P, then it is safe for the pointer to skip q positions arriving at R[i+q].
• Matched instance
– If R[i] = P[m] and P is matched, then save the instance.
– It is almost safe to move the pointer to skip m positions arriving at R[i+m]; the skip
misses only occurrences that overlap the one just found.
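The heuristics above collapse into one shift table indexed by the text character under the end of the pattern. A minimal sketch follows; the function name `bmh_search` and the 0-based indexing are choices made here, whereas the slide uses 1-based R[i] and P[m].

```python
def bmh_search(text, pattern):
    """Return start indices of all (possibly overlapping) occurrences of pattern."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Distance from each character's last occurrence in pattern[:-1] to the end.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    hits, i = [], m - 1              # i points at the text character under P's end
    while i < n:
        j = 0                        # compare right to left
        while j < m and text[i - j] == pattern[m - 1 - j]:
            j += 1
        if j == m:
            hits.append(i - m + 1)
        i += shift.get(text[i], m)   # characters absent from the table skip m
    return hits


print(bmh_search("abracadabra", "abra"))  # [0, 7]
print(bmh_search("aaaa", "aa"))           # [0, 1, 2]
```

After a match this sketch applies the table shift rather than skipping m positions, so overlapping occurrences such as those in the second example are still reported.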

Multiple Keyword Match
• Rabin-Karp Algorithm
– Hash based string match
• Rabin-Karp hash function H(S):
– For a given string S = x_1 x_2 … x_m with length m, a hash function can be
constructed as:
• H(S) = (x_1 b^(m-1) + x_2 b^(m-2) + … + x_(m-1) b + x_m) mod q
• Where b is a base number (usually b = 256) and q is a big prime
number.
– For pattern P, H(P) = (p_1 b^(m-1) + p_2 b^(m-2) + … + p_(m-1) b + p_m) mod q
– If we denote R_k = R[k,k+m-1], we can derive H(R_(k+1)) from H(R_k) at
relatively small cost
– H(R_(k+1)) = ([H(R_k) – r_k b^(m-1)] b + r_(k+m)) mod q
– This is an iterative formula, which is a common practice for algorithm
optimization
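The iterative formula can be checked directly against the definition. In this sketch b = 256 follows the slide, while the prime q = 1,000,000,007 and the function names are assumptions. Windows are 0-based here, so window k covers `data[k:k+m]`.

```python
B, Q = 256, 1_000_000_007   # base from the slide; Q is an assumed large prime


def horner_hash(block):
    """H(S) computed from the definition, evaluated left to right."""
    h = 0
    for x in block:
        h = (h * B + x) % Q
    return h


def rolling_hashes(data, m):
    """Yield (k, H(R_k)) for every length-m window R_k = data[k:k+m]."""
    if m <= 0 or len(data) < m:
        return
    top = pow(B, m - 1, Q)      # b^(m-1) mod q, pre-calculated once
    h = horner_hash(data[:m])
    yield 0, h
    for k in range(len(data) - m):
        # H(R_(k+1)) = ([H(R_k) - r_k b^(m-1)] b + r_(k+m)) mod q
        h = ((h - data[k] * top) * B + data[k + m]) % Q
        yield k + 1, h


DATA = b"data loss prevention"
assert all(h == horner_hash(DATA[k:k + 4]) for k, h in rolling_hashes(DATA, 4))
```

Each window costs a constant number of arithmetic operations regardless of m, which is the point of the iterative formula.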

Multiple Keyword Match
• Rabin-Karp hash function:
– The quantity b^(m-1) mod q can be pre-calculated to save CPU time.
– For each iteration, we only need 5 arithmetic operations.
• It can be further reduced to 4
• One considers the number r_k b^(m-1)
– Horner's rule
• H(S) = ((…((x_1 b + x_2) b + x_3) b + … + x_(m-1)) b + x_m) mod q
• Yet another formula for performance tuning

Multiple Keyword Match
• Rabin-Karp algorithm for multiple patterns:
– Input:
• String R, multiple patterns {P_1,…,P_k},
• n = Length(R), m_j = Length(P_j), q, b
– Procedure:
• Step 0:
– Let m = Min(m_j)
– Calculate the number b^(m-1) mod q
– Calculate all H(P_j[1,…,m]) (j = 1,…,k) and H(R_1) by Horner's rule
• Step 1: Let i = 1
• Step 2:
If there exists j in {1,2,…,k} such that
H(P_j[1,…,m]) = H(R_i) and P_j = R[i,…,m_j+i-1],
it is a match and output the instance
• Step 3: i = i + 1
• Step 4: If i > n-m+1, stop
• Step 5: Calculate H(R_i) from H(R_(i-1)) using the iterative formula.
• Step 6: Go to step 2
– Output: All matched instances
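The procedure above can be sketched as follows. The hash table keyed by the hash of each pattern's first m bytes is an implementation choice made here so that Step 2 does not loop over all k patterns; the function name and the prime q are likewise assumptions. Offsets are 0-based byte offsets into the UTF-8 encoding.

```python
from collections import defaultdict


def rabin_karp_multi(text, patterns, b=256, q=1_000_000_007):
    """Return sorted (start, pattern) pairs for every occurrence in text."""
    r = text.encode("utf-8")
    pats = [p.encode("utf-8") for p in patterns if p]
    if not pats:
        return []
    m = min(len(p) for p in pats)           # Step 0: shortest pattern length

    def h(block):                           # Horner's rule
        v = 0
        for x in block:
            v = (v * b + x) % q
        return v

    table = defaultdict(list)               # hash of m-byte prefix -> patterns
    for p in pats:
        table[h(p[:m])].append(p)
    if len(r) < m:
        return []
    top = pow(b, m - 1, q)                  # b^(m-1) mod q
    hits, hv = [], h(r[:m])
    for i in range(len(r) - m + 1):
        for p in table.get(hv, ()):         # Step 2: verify to rule out collisions
            if r.startswith(p, i):
                hits.append((i, p.decode("utf-8")))
        if i + m < len(r):                  # Steps 3-5: slide the window
            hv = ((hv - r[i] * top) * b + r[i + m]) % q
    return sorted(hits)


print(rabin_karp_multi("the cat sat on the mat", ["cat", "the", "mat", "dog"]))
# [(0, 'the'), (4, 'cat'), (15, 'the'), (19, 'mat')]
```

Only the first m bytes of each pattern are hashed, so longer patterns are confirmed by the explicit comparison. This is also why a very small m hurts: many windows share a prefix hash with some pattern and trigger verification.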

Multiple Keyword Match
A practical hybrid method:
– BMH or Rabin-Karp
– If k < Magic-number,
• Use BMH k times,
• Otherwise, use Rabin-Karp
– Magic-number = 100 is the value from my experience with DLP products.

Rabin-Karp has its weakness:

• When Min({Length(P_i) | i = 1,2,…,k}) is
small, say, less than 4, we have trouble.
• We need to introduce an efficient multiple-pattern
match for short patterns.

Multiple Keyword Match
We have a complementary solution to the RK algorithm when
handling multiple short patterns
– This is the reverse-trie matching algorithm.

A reverse-trie represents a set of keywords;
it is especially good for CJK languages in
UTF-8 encoding:

root
├─ c ─ b ─ a            (abc)
└─ d ─ c ─┬─ b ─ a      (abcd)
          └─ a          (acd)

The keyword set: {abc,abcd,acd}
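The slide shows only the data structure, so the matching loop below is an assumed reading: keywords are stored reversed, and at every text position the trie is walked backwards from that position. The sketch works on Python characters rather than UTF-8 bytes, and the names are hypothetical.

```python
def build_reverse_trie(keywords):
    """Nested-dict trie over the reversed keywords; "" marks a keyword end."""
    root = {}
    for word in keywords:
        node = root
        for ch in reversed(word):
            node = node.setdefault(ch, {})
        node[""] = word
    return root


def reverse_trie_search(text, root):
    """Return sorted (start, keyword) pairs for every occurrence in text."""
    hits = []
    for end in range(len(text)):
        node, i = root, end
        while i >= 0 and text[i] in node:   # walk backwards from position end
            node = node[text[i]]
            i -= 1
            if "" in node:
                hits.append((i + 1, node[""]))
    return sorted(hits)


ROOT = build_reverse_trie(["abc", "abcd", "acd"])
print(sorted(ROOT))                            # ['c', 'd']
print(reverse_trie_search("xabcdacd", ROOT))
# [(1, 'abc'), (1, 'abcd'), (5, 'acd')]
```

The first level of the trie holds the last characters of the keywords, matching the diagram above. The walk at each position is bounded by the longest keyword, so short patterns cost little.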
Agenda
• What is Data Loss Prevention (DLP) ?
• Concepts, Models, Architecture
• Content Inspection Problems
• Practical Algorithms for DLP

• Summary
• References
• Q&A

Summary
• What DLP is.
• DLP Security Model
• Architecture of a DLP System
• Four Content Inspection Problems
• Two Algorithms for DLP Content Inspection
– Document Fingerprinting
– Multi-keyword matching

References

• Liwei Ren et al., Document fingerprinting with asymmetric selection of anchor
points, US patent 8359472.
• Liwei Ren et al., Two tiered architecture of named entity recognition engine, US
patent 8321434.
• Yingqiang Lin et al., Scalable document signature search engine, US patent
8266150.
• Liwei Ren et al., Fingerprint based entity extraction, US patent 7950062.
• Liwei Ren et al., Document match engine using asymmetric signature generation,
US patent 7860853.
• Liwei Ren et al., Match engine for querying relevant documents, US patent
7747642.
• Liwei Ren et al., Matching engine with signature generation, US patent 7516130.
Q&A

Any questions?

Thank You!

Innovation is not a part-time job, and it is not even
a full-time job. It's a lifestyle.
