Discover the importance of auditing and securing big data systems. Learn about the challenges, benefits, and best practices for protecting sensitive data, monitoring real-time activities, and managing compliance. Don't miss out on the opportunity to leverage cloud computing for analyzing and storing data effectively.
Auditing Big Data Systems Leighton R. Johnson, III CISA, CISSP, CISM, CSSLP, CAP, CRISC, FITSP-A ISACA Accredited Instructor
Background • Leighton Johnson, the CTO of ISFMT (Information Security & Forensics Management Team), has presented computer security and forensics classes and seminars all across the United States and Europe. • He has over 40 years of experience in Computer Security, Cyber Security, Software Development and Communications Equipment Operations & Maintenance. Primary focus areas include computer security, information operations & assurance, forensics & incident response, software system development life cycle focused on evaluating systems, systems engineering and integration activities, database administration, and business process & data modeling.
Big Data in Use • Every day 2.5 exabytes of data are generated from new and traditional sources including climate sensors, social media sites, digital pictures & videos, purchase transaction records, cellphone GPS signals, and more. • Big data environments allow organizations to aggregate more and more data—much of which is financial, personal, intellectual property or other types of sensitive data. • The data is both unstructured and structured; huge amounts exist in systems and often it is real-time live data.
Big Data is what? • Big Data refers to the need to parallelize the data handling in data-intensive systems. The characteristics of Big Data that force new architectures are as follows: • Volume (i.e., the size of the dataset); • Velocity (i.e., rate of flow); • Variety (i.e., data from multiple repositories, domains, or types); and • Variability (i.e., the change in velocity or structure). • These characteristics—volume, variety, velocity, and variability—are known colloquially as the Vs of Big Data
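The idea of parallelized data handling can be sketched as a small map-and-reduce word count. This is only an illustration: the function names and sample lines are hypothetical, and the threads here stand in for the separate nodes of a real cluster (they do not give true CPU parallelism in Python).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_words(chunk):
    """Map step: count the words in one partition of the data."""
    return Counter(word.lower() for line in chunk for word in line.split())

def parallel_word_count(lines, partitions=4):
    """Split lines into partitions, count each concurrently, then merge."""
    chunks = [lines[i::partitions] for i in range(partitions)]
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        partial_counts = list(pool.map(count_words, chunks))
    total = Counter()
    for partial in partial_counts:  # reduce step: merge partial results
        total.update(partial)
    return total

# Hypothetical sample data
sample = ["big data volume", "big data velocity", "data variety"]
result = parallel_word_count(sample, partitions=2)
print(result.most_common(2))
```

The same split, process, and merge shape is what lets a system add workers as volume or velocity grows, rather than relying on one larger machine.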
Big Data Analytics is what? • The central benefit of Big Data analytics is the ability to process large amounts and various types of information. The need for greater performance or efficiency happens on a continual basis. • However, Big Data represents a fundamental shift to parallel scalability in the architecture needed to efficiently handle current datasets: • their size, • their speed, • their “change-ability.”
Audit Areas of Interest • The Big Data areas of concern to auditors and security professionals include: • Sensitive data discovery and classification • Data access and change controls. • Real-time data activity monitoring and auditing • Data protection • Data loss prevention • Vulnerability management • Compliance management
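As a rough illustration of the first area, sensitive data discovery and classification, the sketch below scans free text for a few pattern types. The patterns, labels, and function name are assumptions made for this example; real discovery tools add validation, context, and far broader coverage.

```python
import re

# Illustrative patterns only (assumptions for this sketch)
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card_number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def classify_record(text):
    """Return the sorted labels of sensitive-data patterns found in text."""
    return sorted(
        label for label, pattern in PATTERNS.items() if pattern.search(text)
    )

print(classify_record("Contact jane@example.com, SSN 123-45-6789"))
print(classify_record("nothing sensitive"))
```

An auditor could use output like this to check whether records tagged as sensitive actually sit behind the access and change controls listed above.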
Why Big Data Now • Big data is a result of technical transitions: • from mostly internal data to information from multiple sources • from transactional data alone to transactional plus analytical data • from structured data alone to structured plus unstructured data • from persistent data alone to persistent data plus data that is constantly on the move
2018 Numbers – 1 • Data is growing at a rapid pace. By 2020, the new information generated per second for every human being will amount to approximately 1.7 megabytes. • By 2020, the accumulated volume of big data will increase from 4.4 zettabytes to roughly 44 zettabytes, or 44 trillion GB. • Originally, data scientists maintained that the volume of data would double every two years, thus reaching the 40 ZB point by 2020. That number was later raised to 44 ZB when the impact of IoT was taken into consideration. • The rate at which data is created is increasing exponentially. For instance, 40,000 search queries are performed per second (on Google alone), which amounts to 3.46 billion searches per day and 1.2 trillion every year.
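A quick arithmetic check of the figures on this slide, using only the quoted per-second rate and decimal storage units. At 40,000 queries per second the daily total works out to about 3.46 billion and the yearly total to about 1.26 trillion. The variable names are chosen for this sketch.

```python
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 seconds
queries_per_second = 40_000      # rate quoted on the slide

queries_per_day = queries_per_second * SECONDS_PER_DAY
queries_per_year = queries_per_day * 365

# Decimal units: 1 zettabyte = 10**21 bytes, 1 gigabyte = 10**9 bytes
gb_per_zb = 10**21 // 10**9      # one trillion GB per ZB
total_gb_2020 = 44 * gb_per_zb

print(f"{queries_per_day:,} queries per day")
print(f"{queries_per_year:,} queries per year")
print(f"{total_gb_2020:,} GB in 44 ZB")
```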
2018 Numbers – 2 • Every minute Facebook users send roughly 31.25 million messages and watch 2.77 million videos. • The data gathered is no longer text-only. Exponential growth in videos and photos is equally prominent. On YouTube alone, 300 hours of video are uploaded every minute. • IDC estimates that by 2020, business transactions (including both B2B and B2C) via the internet will reach up to 450 billion per day. • Globally, the number of smartphone users will grow to 6.1 billion by 2020 (this will overtake the number of basic fixed phone subscriptions). • In just 5 years the number of smart connected devices in the world will be more than 50 billion, all of which will create data that can be shared, collected and analyzed.
Support Methods include Cloud • By 2020, at least one-third of all data will be stored and analyzed using cloud computing. • Shared computing (executing computing tasks over a network of processors in the cloud) is what makes big data analysis possible. Google uses this setup every day by leveraging about 1,000 computers to answer a single search query, all of which takes less than 0.2 seconds to complete. • According to a study by Deloitte, the key reasons to move to a cloud include faster payback times (30%) and improved agility (29%). • The market for Hadoop, an open source tool for distributed computing, is expected to grow at a compound annual growth rate of 58%, reaching $1 billion by 2020. • 76% of decision-makers surveyed foresee significant changes in the domain of storage systems because of the “Big Data” phenomenon.
2018 Views of Big Data Usage – 1 • Recent studies indicate that by improving the integration of big data, healthcare could save up to $300 billion a year; this boils down to reducing costs by $1,000 a year for each person who has access to the facility. • The White House has invested $200 million in big data projects. • A 10% increase in data accessibility can result in more than a $65 million increase in net income for a typical Fortune 1000 company.
2018 View of Big Data Usage – 2 • Retailers who choose to leverage the full potential of big data analytics can increase their operating margins by approximately 60%. • Of the 85% of companies that are trying to be data-driven, only 37% have been successful in their initiatives. This is a result of a lack of clarity among executives. Over time the remaining companies are expected to reach the same level. • At present, only 0.5% of all accessible data is analyzed and used. Imagine the potential here.
Big Data Components • 4 V’s • Velocity • Speed of data in and out • Volume • Increasing amount of data • Variety • Range of data types and sources • Variability • Constantly changing data
[Figure: Traditional BI versus Big Data, plotted on Volume and Velocity axes. Extremes of volume or velocity may be better handled by BI up to a point. As data variety and/or variability increase, Big Data becomes more attractive.]
Big Data V – Variety • Variety describes the organization of the data—whether the data is structured, semi-structured, or unstructured • Issues • Use of Encryption inhibits semantics • Inference • Aggregation
Big Data V – Volume • The volume of Big Data describes how much data is coming in; this typically ranges from gigabytes to exabytes and beyond. • Issues • Multi-tier storage needed • Distributed storage management • Advantages • Data breach analytics possible
Big Data V – Velocity • Velocity describes the speed at which data is processed. The data usually arrives in batches or is streamed continuously. • Issues • Distributed programming frameworks • Weakest Link issue with infrastructure
Big Data V – Veracity • Several Characteristics: • 1. Provenance • Original Source Information • Pedigree of Data • Metadata of the Data • Context of Data when Collected • Etc.
Big Data V – Veracity (cont.) • Several Characteristics: • 2. Curation • Governance & Quality Assurance • Correction of Data Errors • Data Collection Methodologies • Privacy Considerations • Etc.
Big Data V – Veracity (cont.) • Several Characteristics: • 3. Validity • Accuracy & Correctness • Data Quality • Aggregation Issues here • Disparate Data Sets brought together can lead to issues of: • Interpretation • Corruption of Data • Translation
Big Data V – Volatility • Data Management over Time • Big Data is transformational due to indefinitely persistent data • Criteria surrounding Data changes • Roles • Security • Privacy • Governance
NIST Definitions Big Data • Big Data consists of extensive datasets primarily in the characteristics of volume, variety, velocity, and/or variability that require a scalable architecture for efficient storage, manipulation, and analysis. • Variability refers to changes in dataset, whether data flow rate, format/structure, semantics, and/or quality that impact the analytics application.
NIST Definitions Big Data (cont.) • Variety refers to data from multiple repositories, domains, or types. • Velocity refers to the rate of data flow. • Veracity refers to the accuracy of the data. • Volatility refers to the tendency for data structures to change over time.
Big Data Sources • Structured Data • Pre-defined organization • RDBMS-based • Unstructured Data • No pre-defined organization • Real-time data feeds • Social Media sources • e.g., Twitter, Facebook, Reddit, etc.
Data Sources Used • Transaction Data • System Logs • Sensor/RFID/Machine Data • Documents/Texts • Social Media Data • Clickstream Data • Video/Images
Big Data Usage • Science and research • Government • Corporation/Private Sector • Retail • Wal-Mart – 1M transactions per hour • Amazon – 3 DBs online • 7.8 TB, 18.5 TB, and 24.7 TB • Healthcare • Energy/Smart Grid • Others
Major Big Data Uses • Marketing Trend Analysis • Retail Sales Analysis • Security Activity Correlation Analysis • Research Correlation Analysis using large Data Stores
Big Data Analytics Today • Analytic Speed needed for Value: • Development speed • Data processing speed • Deployment speed • Response speed • Foundation for BI and Data Science efforts
Big Data Impacts & Benefits • Governance • What Data should be Included • Planning • Process of collecting and organizing outcomes • Utilization • Becoming information “mavens” • Assurance • Data Quality • Privacy • Regional and National Laws
Big Data Security Uses • Big Data Security-based analysis can be used to: • Detect probable threats based on current vulnerabilities, • Provide analysis of identity and access activities, • Correlate events and alerts, and • Provide meaningful insights into the effectiveness of remediation of security incidents • Identify patterns of anomalies to normal behavioral performance, operations and configuration states, • Capacity planning and forecast of IT resource utilization
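One of the uses above, identifying anomalies against normal behavior, can be sketched with a simple standard-deviation test. The baseline counts, timestamps, threshold, and function name are all hypothetical; production analytics would use far richer models and data.

```python
from statistics import mean, stdev

def find_anomalies(baseline, observed, threshold=3.0):
    """Return (key, value) pairs whose value lies more than `threshold`
    sample standard deviations from the baseline mean."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A flat baseline: anything that differs from it is flagged
        return [(k, v) for k, v in observed.items() if v != mu]
    return [
        (k, v) for k, v in observed.items() if abs(v - mu) / sigma > threshold
    ]

# Hypothetical hourly failed-login counts
baseline_failed_logins = [3, 5, 4, 6, 5, 4, 5, 3]
observed = {"09:00": 5, "10:00": 4, "11:00": 42}

anomalies = find_anomalies(baseline_failed_logins, observed)
print(anomalies)
```

Flagged hours would then be correlated with other events and alerts, such as access changes or vulnerability scan results, before anyone treats them as an incident.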
Security of Big Data • Secure the data, collect it, and then aggregate it to evaluate the total • Obtain visibility of all data - Collection • Understand the context - Integration • Utilize the intelligence - Analytics
Big Data Security Correlation • Analysis capabilities • Incident Response capabilities • Data Breach capabilities • Data Recovery capabilities • Disaster Recovery capabilities • Forensics capabilities
Big Data Risks • Risks associated with big data include: • Poor data quality • Inadequate technology • Insufficient security • Immature data governance practices • All are focus areas for audit activities
Big Data Challenges to Security • Need for rapid response times • Inconsistency of data structures • Lack of audit and security tools
Compliance and Big Data • Issues • Volume • Complexity • Lack of consistent structure • Need to isolate the compliance-sensitive portions of data from total
Compliance Issue Example • Create multiple data sets and put them in the same location • This allows technology to cross-integrate that information • The result is potential new information that needs new controls • For instance, a list of clients is benign on its own • Add marketing information from a third party • Now there is a new data set • Then link in other information • Now it is possible to have PII, which requires compliance and protection • This is accomplished through dynamic queries of data sources
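The linkage scenario above can be sketched in a few lines. The records, field names, and the set of fields treated as identifying are all assumptions for this illustration, not a legal definition of PII.

```python
# Hypothetical client list: benign on its own in this sketch
clients = [
    {"client_id": 1, "name": "A. Example"},
    {"client_id": 2, "name": "B. Sample"},
]

# Hypothetical third-party marketing data keyed by client_id
marketing = {
    1: {"zip_code": "00000", "birth_year": 1980},
    2: {"zip_code": "11111", "birth_year": 1975},
}

# Combination of fields treated as identifying here (an assumption)
QUASI_IDENTIFIERS = {"name", "zip_code", "birth_year"}

def link(client_rows, extra_by_id):
    """Join each client row with any extra fields that share its client_id."""
    return [{**row, **extra_by_id.get(row["client_id"], {})} for row in client_rows]

def needs_pii_controls(record):
    """True when a record carries every field in QUASI_IDENTIFIERS."""
    return QUASI_IDENTIFIERS.issubset(record)

linked = link(clients, marketing)
print([needs_pii_controls(r) for r in clients])   # before linking
print([needs_pii_controls(r) for r in linked])    # after linking
```

Neither source triggers the check alone, but the joined records do, which is why controls have to be evaluated on query results as well as on stored data sets.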
Statutory & Regulatory Needs • Scope • Location • Transnational Considerations • Privacy Considerations • Downstream Liabilities
Big Data Regulatory Issues • Regulatory Considerations • HIPAA • SOX • GLBA • EU rules • FACTA • FERPA • PCI-DSS
Regulatory Issue – Example 1 • HIPAA/HITECH • Consequences of non-compliance are potentially severe, including both civil and criminal penalties • Two ways to keep data secure from release • Encryption • Destruction
Regulatory Issue – Example 2 • PCI-DSS • An industry-wide framework for protecting consumer credit card data. • Any company that stores, processes or transmits credit card data must comply with PCI-DSS by properly securing and protecting the data
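One small, concrete protection for card data is masking the number wherever it is displayed or logged. The sketch below keeps the first six and last four digits, a commonly used masking convention; the function name and defaults are assumptions for this example, and the current PCI-DSS text should be checked for the exact display limits.

```python
def mask_pan(pan, keep_first=6, keep_last=4):
    """Mask a card number, leaving only the leading and trailing digits visible."""
    digits = "".join(ch for ch in pan if ch.isdigit())
    if len(digits) < keep_first + keep_last:
        raise ValueError("value is too short to be a card number")
    hidden = len(digits) - keep_first - keep_last
    return digits[:keep_first] + "*" * hidden + digits[-keep_last:]

# A well-known test card number, not a real account
print(mask_pan("4111 1111 1111 1111"))
```

Masking covers display only. Stored and transmitted card data still needs protection such as encryption and access control.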
Big Data & Cloud Legal Issues • Where is the data? • Cloud server locations • Legal status varies from country to country. • “Despite the global feel of the cloud, some countries’ laws will be involved when it’s time to sue to get back data or to demonstrate compliance with privacy rules.” • Who is managing the data? • Identified or unidentified data processing activities
Recent Data-related Events • Facebook • Equifax • OPM
ISACA’s 5 Key Questions for Big Data Privacy and Security • Can the company trust its sources of Big Data? • What information is the company collecting without exposing the enterprise to legal and regulatory battles? • How will the company protect its sources, processes and decisions from theft and corruption? • What policies are in place to ensure that employees keep stakeholder information confidential during and after employment? • What actions is the company taking that create trends that can be exploited by its rivals?
Securing Big Data Focal Points • Security Architecture • Infrastructure Components • Hardware • Software • Computational Algorithms • “Real-Time” Analytics • Data Itself
Questions for Securing Big Data • Data Fits Into Organization – How? • Data is Classified – How? • Search Algorithms are Controlled – How? • Data is Accessed – How and By Whom? • Data is Reported Out – How and To Whom? • Data is Updated - How Often and By What Process?
What Data Needs Securing • Structured Data Sources • Unstructured Data Sources • “Real-time” Data Feeds • “Time-sensitive” Data • Meta-Data about the Data
Structured Data Sources • Local Databases • Spreadsheets and Office Documents • Data Warehouses • Partner Data • Data Brokers
Standard Database Security • Data • Schemas • Meta-data • Files, Folders, Interfaces • Transaction Logs • ISACA has many documents covering various RDBMS