U1 B CLSRM

The document provides an overview of Big Data, covering its types, history, architecture, and importance, along with the challenges of security, compliance, and privacy. It discusses the complexities of protecting Big Data, emphasizing the need for strategic security measures and the classification of data to enhance protection. Additionally, it addresses ethical considerations and legislative frameworks surrounding data privacy, highlighting the balance between innovation and regulatory compliance.


Lecture-4&5

Big Data (KCS-061)


Unit 1: Introduction to Big Data
• Types of digital data
• History of Big Data innovation
• Introduction to Big Data platform, drivers for Big Data
• Big Data architecture and characteristics
• 5 Vs of Big Data
• Big Data technology components
• Big Data importance and applications
• Big Data features – security, compliance, auditing and protection
• Big Data privacy and ethics
• Big Data Analytics
• Challenges of conventional systems
• Intelligent data analysis, nature of data, analytic processes and tools, analysis vs
reporting, modern data analytic tools
Big Data features – security, compliance, auditing and protection
• The sheer size of a Big Data repository brings with it a major security challenge, generating the
age-old question presented to IT: How can the data be protected?

• However, that is a trick question—the answer has many caveats, which dictate how security
must be imagined as well as deployed.

• Proper security entails more than just keeping the bad guys out; it also means backing up data
and protecting data from corruption.
1. The first caveat is access. Data can be easily protected, but only if you eliminate access to the
data. That’s not a pragmatic solution, to say the least. The key is to control access, but even then,
knowing the who, what, when, and where of data access is only a start.
2. The second caveat is availability: controlling where the data are stored and how the data are
distributed. The more control you have, the better you are positioned to protect the data.
3. The third caveat is performance. Higher levels of encryption, complex security methodologies,
and additional security layers can all improve security. However, these security techniques all
carry a processing burden that can severely affect performance.
4. The fourth caveat is liability. Accessible data carry with them liability, such as the sensitivity of
the data, the legal requirements connected to the data, privacy issues, and intellectual property
concerns.

• Adequate security in the Big Data realm becomes a strategic balancing act among these caveats
along with any additional issues the caveats create. Nonetheless, effective security is an
obtainable, if not perfect, goal. With planning, logic, and observation, security becomes
manageable and omnipresent, effectively protecting data while still offering access to authorized
users and systems.
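The first caveat mentions knowing the who, what, when, and where of data access. The following is a minimal sketch of such a record, not a production audit system; the class names, field names, and sample hosts are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AccessEvent:
    who: str        # authenticated user or service account
    what: str       # data set or object that was touched
    when: datetime  # timestamp of the access (UTC)
    where: str      # source host or network location


class AccessLog:
    """Append-only, in-memory record of data access events (illustrative only)."""

    def __init__(self):
        self._events = []

    def record(self, who, what, where, when=None):
        event = AccessEvent(who, what, when or datetime.now(timezone.utc), where)
        self._events.append(event)
        return event

    def by_user(self, who):
        return [e for e in self._events if e.who == who]

    def outside(self, allowed_hosts):
        """Events that came from a location not on the allow-list."""
        return [e for e in self._events if e.where not in allowed_hosts]


log = AccessLog()
log.record("alice", "sales/2015-q1", "10.0.0.5")
log.record("etl-job", "hr/payroll", "10.0.0.9")
log.record("alice", "hr/payroll", "203.0.113.7")

# Accesses from hosts that are not on the (assumed) internal allow-list.
suspicious = log.outside({"10.0.0.5", "10.0.0.9"})
```

As the text notes, such a log is only a start: it records access but does not by itself control it.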
PRAGMATIC STEPS TO SECURING BIG DATA
• A good starting point is to get rid of data that are no longer needed.

• Of course, there are situations in which information cannot legally be destroyed; in that case,
the information should be securely archived by an offline method.

• The real challenge may be determining whether the data are needed—a difficult task in the
world of Big Data, where value can be found in unexpected places.
• For example, getting rid of activity logs may be a smart move from a security standpoint. However, those
logs could be used to determine scale, use, and efficiency of large data systems, an analytical process that
falls right under the umbrella of Big Data analytics.
• There is no easy answer to the above dilemma, and it becomes a case of choosing the lesser of two
evils. If the data have intrinsic value for analytics, they must be kept, but that does not mean they
need to be kept on a system that is connected to the Internet or other systems. The data can be
archived, retrieved for processing, and then returned to the archive.
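The steps above can be sketched as a small decision rule. This is only an illustration of the reasoning in the text; the function name, the three boolean inputs, and the `Disposition` values are assumptions, and a real retention policy would involve legal review.

```python
from enum import Enum


class Disposition(Enum):
    DELETE = "delete"
    ARCHIVE_OFFLINE = "archive offline"
    KEEP_ONLINE = "keep online"


def disposition(still_needed, legal_hold, analytic_value):
    """Decide what to do with a data set, following the steps in the text.

    still_needed   -- the data are in active operational use
    legal_hold     -- the data cannot legally be destroyed
    analytic_value -- the data may still be worth analyzing later
    """
    if still_needed:
        return Disposition.KEEP_ONLINE
    if legal_hold or analytic_value:
        # Keep the data, but off any system connected to the Internet.
        return Disposition.ARCHIVE_OFFLINE
    return Disposition.DELETE


# Old activity logs: no longer in use, no legal hold, but useful for analytics.
old_activity_logs = disposition(still_needed=False, legal_hold=False, analytic_value=True)
```

In this sketch the old activity logs end up archived offline rather than deleted, which matches the compromise described above: retrieve them for processing, then return them to the archive.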
CLASSIFYING DATA
• Protecting data is much easier if the data are classified into categories; for example, internal email between colleagues calls for different handling than a financial report.
• A simple classification could be: financial, HR, sales, inventory, and communications.
• Once organizations better understand their data, they can take steps to segregate the information, which makes security measures such as encryption and monitoring more manageable.
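A classification like this can drive security controls directly. The sketch below uses the five categories from the text; the controls attached to each category, the default for unclassified data, and the function names are assumptions made for illustration.

```python
# Categories come from the text; the controls attached to each are assumptions.
POLICY = {
    "financial":      {"encrypt": True,  "monitor": True},
    "hr":             {"encrypt": True,  "monitor": True},
    "sales":          {"encrypt": True,  "monitor": False},
    "inventory":      {"encrypt": False, "monitor": False},
    "communications": {"encrypt": False, "monitor": True},
}

# Unclassified data get the strictest handling until someone classifies them.
DEFAULT = {"encrypt": True, "monitor": True}


def controls_for(category):
    """Look up the controls for a category, falling back to the strict default."""
    return POLICY.get(category.lower(), DEFAULT)


def segregate(datasets):
    """Group (name, category) pairs by category so each group can be secured separately."""
    groups = {}
    for name, category in datasets:
        groups.setdefault(category.lower(), []).append(name)
    return groups


groups = segregate([
    ("q1-report.xlsx", "Financial"),
    ("team-thread.mbox", "Communications"),
    ("q2-report.xlsx", "Financial"),
])
```

Treating unknown categories as the most sensitive is a design choice in this sketch, not something the text prescribes.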
PROTECTING BIG DATA ANALYTICS
• Data protection is often forgotten in the data center, an afterthought that falls behind current needs. The launch of Big Data initiatives is no exception.

• Big Data offers more of a challenge than most other data center technologies, making it the
perfect storm for a data protection disaster.

• A real concern with Big Data is that it contains all of the things you don’t want to see when you are trying to protect data. It can contain unique sample sets (for example, data from devices that monitor physical elements such as traffic, movement, soil pH, rain, and wind).
• Such uniqueness also means that you can’t leverage time-saving backup and security
technologies such as deduplication.

• Another significant issue is the large size and number of files involved in a Big Data analytics environment. The backup bandwidth and/or the backup appliance must be large, and the receiving devices must be able to ingest data at the rate at which the data are delivered.
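To see why unique sample sets defeat deduplication, consider a minimal sketch of fixed-size chunk deduplication. The function name and chunk size are assumptions, and random bytes stand in for unique sensor readings; real backup products use more elaborate chunking.

```python
import hashlib
import os


def dedup_ratio(data, chunk_size=4096):
    """Return total chunks / unique chunks for fixed-size chunking."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    if not chunks:
        return 1.0
    unique = {hashlib.sha256(c).digest() for c in chunks}
    return len(chunks) / len(unique)


# 100 copies of the same 4 KiB block, as in repeated documents or system images.
repetitive = (b"A" * 4096) * 100

# 100 blocks of random bytes, standing in for unique sensor readings.
sensor_like = os.urandom(4096 * 100)

print(dedup_ratio(repetitive))   # 100.0
print(dedup_ratio(sensor_like))  # 1.0 (random blocks practically never repeat)
```

A ratio of 1.0 means every chunk has to be stored and transferred, so the backup bandwidth and the ingest rate of the receiving device must be sized for the full volume of data.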
BIG DATA AND COMPLIANCE
• New data types and methodologies are still expected to meet the legislative requirements placed
on businesses by compliance laws. There will be no excuses accepted and no passes given if a
new data methodology breaks the law.

• Preventing compliance from becoming the next Big Data nightmare is going to be the job of
security professionals.

• Health care is a good example of the Big Data compliance challenge: it involves many different data types and a high rate of data arriving from many different devices.
• In the medical industry, the primary problem is that unsecured Big Data stores are filled with
content that is collected and analyzed in real time and is often extraordinarily sensitive:
intellectual property, personal identifying information, and other confidential information. The
disclosure of this type of data, by either attack or human error, can be devastating to a company
and its reputation.

• Unfortunately, most data stores in the NoSQL world (e.g., Hadoop, Cassandra, and MongoDB) do
not incorporate sufficient data security tools to provide what is needed.
• Lessons learned by the health care industry are the following:
• Control access by process, not job function.
• Server and network administrators, cloud administrators, and other employees often have
access to more information than their jobs require because the systems simply lack the
appropriate access controls. This is not ideal.
• Secure the data at rest.
• All Big Data, especially sensitive information, should remain encrypted, whether it is stored
on a disk, on a server, or in the cloud and regardless of whether the cloud is inside or outside
the walls of your organization.
• Protect the cryptographic keys and store them separately from the data.
• Cryptographic keys are the gateway to the encrypted data. If the keys are left unprotected,
the data are easily compromised. Organizations often cobble together their own encryption
and key management solution. Storing the cryptographic keys on a separate, hardened
server, either on the premises or in the cloud, is the best practice for keeping data safe and
an important step in regulatory compliance.
• Create trusted applications and stacks to protect data from rogue users.
• You may encrypt your data to control access, but what about the user who has access to the
configuration files that define the access controls to those data? Encrypting more than just
the data and hardening the security of your overall environment—including applications,
services, and configurations—gives you peace of mind that your sensitive information is
protected from malicious users and rogue employees.
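The first lesson, controlling access by process rather than by job function, can be sketched as a deny-by-default lookup. The process names, the grants, and the `may_read` helper are hypothetical; a real deployment would enforce this in the data platform itself.

```python
# Hypothetical grants: each named process may read only the categories it needs.
PROCESS_GRANTS = {
    "payroll-run": {"hr", "financial"},
    "sales-forecast": {"sales", "inventory"},
    "backup-agent": set(),  # assumption: moves encrypted blobs, never reads plaintext
}


def may_read(process, category):
    """Allow access only if the calling process was explicitly granted the category."""
    return category in PROCESS_GRANTS.get(process, set())


print(may_read("payroll-run", "hr"))        # True
print(may_read("sales-forecast", "hr"))     # False
print(may_read("unknown-process", "sales")) # False
```

Note that no job title such as "server administrator" appears anywhere in the grants: an administrator who runs the backup agent gains no read access to the data it moves.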
• Once you begin to map and understand the data, opportunities will be evident that will lead to
automating and monitoring compliance and security through data warehouse technologies.
• Of course, automation does not solve every problem for security, compliance, and backup. There
are still some very basic rules that should be used to enable security while not derailing the value
of Big Data:
• Ensure that security does not impede performance or availability.
• Security controls that limit any of the 5 Vs of Big Data are a nonstarter for organizations.
• Pick the right encryption scheme.
• Some data security solutions encrypt at the file level or lower. Those methodologies can be
cumbersome, especially for key management. Likewise, encryption at the operating system
level, but without advanced key management, can leave Big Data woefully insecure. A
transparent data encryption solution optimized for Big Data should be considered.
• Ensure that the security solution can evolve with your changing requirements.
• The flexibility to migrate between cloud providers and models based on changing business
needs is a requirement, and this is no different with Big Data technologies. When evaluating
security, you should consider a solution that is platform-agnostic and can work with any Big
Data file system or database.
THE INTELLECTUAL PROPERTY CHALLENGE
• One of the biggest issues around Big Data is the concept of intellectual property (IP).

• Basically, intellectual property refers to creations of the human mind, such as inventions, literary
and artistic works, and symbols, names, images, and designs used in commerce.

• With Big Data consolidating all sorts of private, public, corporate, and government data into a
large data store, there are bound to be pieces of IP in the mix.

• That information has to be properly protected, which may prove to be difficult, since Big Data
analytics is designed to find nuggets of information and report on them.
• Some basic rules are:
• Understand what IP is and know what you have to protect
• Prioritize protection
• Label (confidential information should be labeled)
• Educate employees
• Know your tools (tools that can be used to track IP stores)
• Use a holistic approach (includes internal risks as well as external ones)
• Use a counterintelligence mind-set (think as if you were spying on your own company and ask how
you would do it)

• The above guidelines can be applied to almost any information security paradigm that is geared
toward protecting IP.
Big Data privacy and ethics
• The Big Questions of Big Data Privacy are the following:
• What do companies do with the personal data they collect?
• How do we know that they are doing what they say?
• When exactly are our rights violated?
• Why should it matter to us?
• How can Big Data be made privacy-friendly?

• Debates have been going on for years and will continue as we work out the wrinkles between
“rights”—the individual’s rights to determine what personal information he or she is willing to
“barter” in exchange for free services versus the service provider’s rights to determine how to
continue to provide free services.
The Privacy Landscape
• There are four main constituents involved in the privacy landscape. The table below shows how
they are impacted.
• Most companies develop their own privacy policies as a matter of establishing a modicum of
“trust” with consumers.
• There are several variations of the seven principles outlined in the “EU-US Safe Harbor Principles”,
which most companies have ingrained into their self-regulation for data privacy:
• Seven Global Privacy Principles:
1. Notice (Transparency): Inform individuals about the purposes for which information is collected.
2. Choice: Offer individuals the opportunity to choose (or opt out) whether and how personal
information they provide is used or disclosed.
3. Consent: Only disclose personal information to third parties consistent with the principles of
notice and choice.
4. Security: Take responsible measures to protect personal information from loss, misuse, and
unauthorized access, disclosure, alteration, and destruction.
5. Data Integrity: Assure the reliability of personal information for its intended use, and take reasonable
precautions to ensure the information is accurate, complete, and current.
6. Access: Provide individuals with access to the personal information held about them.
7. Accountability: A firm must be accountable for following the principles and must include
mechanisms for assuring compliance.
• Attorney Andrew Reiskind, Managing Counsel, Privacy and Data Protection for MasterCard,
shares his perspectives on the topic of privacy by first clarifying what it means:
• Privacy is about how you use personal data. It’s the conscious choice about how that
information is used. We usually bucket it into the collection of data, the use of that data, and
the disclosure of that data—to whom are you giving that data?

• Personal information (PI) is a term that Reiskind notes has changed meaning over time.
• Historically, we’ve always thought of it as a full name—a first name and a last name—a postal
address and a phone number. Over time it’s now included an e-mail address, a facsimile
number, and a government-issued identification number (e.g., SSN).
• IP addresses, Cookie IDs, financial account numbers etc. are now considered personal
information.
• Personally Identifiable Information (PII): Personal data that are unique to an individual
are called personally identifiable information.

• In addition to personal information there’s concern for the use and collection of “sensitive
information”.
• Let’s explore the differences between personally identifiable information, sensitive information,
and other information as described in the following table:
Can Data Be Anonymized?
• The issue at hand is, can data gathered for a specific purpose then be used for other purposes
(referred to as, “secondary use”) by somehow being stripped of the classification of personally
identifiable information (PII) and allowing the individuals associated with that data to remain
anonymous?

• The simple answer is yes!

• The challenge is that the more you anonymize the data, the less utility the data have.

• Many believe that removing identifying information also, inevitably, removes
contextual information that has potential value to someone analyzing the data.

• On the flipside, others point out that it doesn’t take much to reconstitute an identity. So
anonymizing data might seem pointless.
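The trade-off described above can be made concrete with a small sketch. It replaces a direct identifier with a keyed hash and coarsens two quasi-identifiers, then counts how many rows can still be singled out. The records, field names, key, and helper functions are all invented for illustration; this is not a complete anonymization method.

```python
import hashlib
import hmac
from collections import Counter

SECRET_KEY = b"demo-key-store-separately"  # assumption: kept apart from the data


def pseudonymize(value):
    """Replace a direct identifier with a keyed hash (HMAC-SHA256)."""
    return hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()[:16]


def generalize(record):
    """Coarsen quasi-identifiers: 5-digit ZIP -> 3-digit prefix, age -> decade."""
    return {
        "id": pseudonymize(record["name"]),
        "zip": record["zip"][:3] + "**",
        "age": f"{record['age'] // 10 * 10}s",
        "purchase": record["purchase"],
    }


def unique_rows(rows, keys):
    """Count rows whose combination of `keys` appears only once (easy to single out)."""
    counts = Counter(tuple(r[k] for k in keys) for r in rows)
    return sum(1 for r in rows if counts[tuple(r[k] for k in keys)] == 1)


people = [
    {"name": "Ann Lee",  "zip": "10001", "age": 34, "purchase": "books"},
    {"name": "Bo Chan",  "zip": "10002", "age": 37, "purchase": "tools"},
    {"name": "Cy Diaz",  "zip": "10003", "age": 31, "purchase": "books"},
    {"name": "Di Evans", "zip": "20001", "age": 52, "purchase": "music"},
]

released = [generalize(p) for p in people]

print(unique_rows(people, ["zip", "age"]))    # 4: every raw row can be singled out
print(unique_rows(released, ["zip", "age"]))  # 1: only the 200**/50s row stands out
```

Both sides of the argument show up here. Generalizing loses detail (exact ages and ZIP codes are gone), yet one row is still unique on just two coarse fields, which is why reconstituting an identity may take little extra information.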
Privacy May Be the Wrong Focus
"Where it relates to businesses, I think it is so easy to violate privacy that perhaps a better framing
for the conversation is one of ethics. Data privacy is the thing you do to keep from getting sued,
data ethics is the thing you do to make your relationship with your customers positive."
— James Stogdill, O’Reilly Radar

• There are many examples of where the focus isn’t just about privacy; it’s about what James
Stogdill refers to as ethics and a positive customer relationship.
• For example, there probably isn’t a customer in the world who would be okay with an iPhone app that
secretly downloaded their entire address book without their consent. That would qualify as an intrusive tactic
that would certainly irritate even the most liberal social butterflies.

• It seems that there will always be a need to balance the reasonable against the unreasonable uses of
personal data. The boundaries for such will continue to be tested by new circumstances and
scenarios.
Some Legislation: Privacy or Innovation
• The concept of the “Right to be Forgotten” - discussed in the EU since 2006
• European Data Protection Regulation and Directive 95/46/EC - as of May 2014, Google has
removed 1,390,838 URLs
• Russian Personal Data Law, 01 September 2015 - new rules obliging all companies offering
Internet services to store Russian citizens’ personal data inside the country
• Federal Trade Commission - has ordered Google, Facebook, Twitter, and Apple to take care of the
privacy of the data they collect

✔ The regulatory approach has to be balanced so that it does not intervene in and stop innovation and
technology.
Thank You
