
Auditing Big Data Systems

An overview of auditing and securing big data systems: the challenges, benefits, and best practices for protecting sensitive data, monitoring real-time activity, and managing compliance, including the role of cloud computing in storing and analyzing data effectively.


Presentation Transcript


  1. Auditing Big Data Systems Leighton R. Johnson, III CISA, CISSP, CISM, CSSLP, CAP, CRISC, FITSP-A ISACA Accredited Instructor

  2. Background • Leighton Johnson, CTO of ISFMT (Information Security & Forensics Management Team), has presented computer security and forensics classes and seminars across the United States and Europe. • He has over 40 years of experience in computer security, cyber security, software development, and communications equipment operations & maintenance. Primary focus areas include computer security, information operations & assurance, forensics & incident response, the software development life cycle (with emphasis on evaluating systems), systems engineering and integration, database administration, and business process & data modeling.

  3. Big Data in Use • Every day 2.5 exabytes of data are generated from new and traditional sources, including climate sensors, social media sites, digital pictures & videos, purchase transaction records, cellphone GPS signals, and more. • Big data environments allow organizations to aggregate more and more data, much of which is financial, personal, intellectual property, or other sensitive data. • The data is both structured and unstructured; huge amounts exist across systems, and much of it is live, real-time data.

  4. How Much is Big?

  5. Big Data is what? • Big Data refers to the need to parallelize the data handling in data-intensive systems. The characteristics of Big Data that force new architectures are as follows: • Volume (i.e., the size of the dataset); • Velocity (i.e., rate of flow); • Variety (i.e., data from multiple repositories, domains, or types); and • Variability (i.e., the change in velocity or structure). • These characteristics—volume, variety, velocity, and variability—are known colloquially as the Vs of Big Data

  6. Big Data Analytics is what? • The central benefit of Big Data analytics is the ability to process large amounts and varied types of information; the need for greater performance and efficiency is continual. • However, Big Data represents a fundamental shift to parallel scalability in the architecture needed to efficiently handle current datasets: • their size, • their speed, • their “change-ability.”

  7. Audit Areas of Interest • The Big Data areas of concern to auditors and security professionals include: • Sensitive data discovery and classification • Data access and change controls • Real-time data activity monitoring and auditing • Data protection • Data loss prevention • Vulnerability management • Compliance management
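The first area above, sensitive data discovery and classification, is often automated by scanning fields for known sensitive patterns. A minimal sketch in Python, assuming hypothetical regex patterns; a production scanner would use a vetted DLP tool or library rather than hand-written expressions:

```python
import re

# Hypothetical patterns for illustration only; real scanners use
# validated pattern sets and checksum tests (e.g., Luhn for card numbers).
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def classify_record(text):
    """Return the set of sensitive-data labels found in a text field."""
    return {label for label, rx in PATTERNS.items() if rx.search(text)}

print(classify_record("Contact jane@example.com, SSN 123-45-6789"))
```

A real deployment would run such classification at ingestion time so that downstream access controls can be applied per label.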

  8. Why Big Data Now • Big data is the result of several technical transitions: • from mostly internal data to information from multiple sources • from transactional data to transactional plus analytical data • from structured data to structured plus unstructured data • from persistent data to data that is constantly in motion

  9. 2018 Numbers – 1 • Data is growing at a rapid pace. By 2020, the new information generated per second for every human being will amount to approximately 1.7 megabytes. • By 2020, the accumulated volume of big data will increase from 4.4 zettabytes to roughly 44 zettabytes, or 44 trillion GB. • Originally, data scientists maintained that the volume of data would double every two years, reaching 40 ZB by 2020. That number was later bumped to 44 ZB when the impact of IoT was brought into consideration. • The rate at which data is created is increasing exponentially. For instance, 40,000 search queries are performed per second (on Google alone), which works out to 3.46 billion searches per day and 1.2 trillion every year.

  10. 2018 Numbers – 2 • Every minute Facebook users send roughly 31.25 million messages and watch 2.77 million videos. • The data gathered is no longer text-only; growth in videos and photos is equally prominent. On YouTube alone, 300 hours of video are uploaded every minute. • IDC estimates that by 2020, business transactions (both B2B and B2C) via the internet will reach up to 450 billion per day. • Globally, the number of smartphone users will grow to 6.1 billion by 2020, overtaking the number of basic fixed-phone subscriptions. • In just 5 years the number of smart connected devices in the world will exceed 50 billion, all of which will create data that can be shared, collected, and analyzed.

  11. Support Methods include Cloud • By 2020, at least one-third of all data will be stored and analyzed using cloud computing. • Distributed computing (executing computing tasks over a network of processors in the cloud) is what makes big data analysis possible. Google uses this setup every day, leveraging about 1,000 computers to answer a single search query in less than 0.2 seconds. • According to a study by Deloitte, the key reasons to move to the cloud include faster payback times (30%) and improved agility (29%). • Hadoop, an open-source tool for distributed computing, is expected to grow at a compound annual growth rate of 58%, reaching $1 billion by 2020. • 76% of decision-makers surveyed foresee significant changes in storage systems because of the “Big Data” phenomenon.
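The distributed-computing model mentioned above (many machines sharing one task) is the idea behind Hadoop's MapReduce. A toy, single-process sketch of the canonical map/shuffle/reduce word count; in a real cluster, the map and reduce phases run in parallel across nodes:

```python
from collections import defaultdict
from itertools import chain

def map_phase(doc):
    # Map: emit a (key, value) pair per word; runs per-node in a real cluster.
    return [(word, 1) for word in doc.split()]

def shuffle(pairs):
    # Shuffle: group all values by key across the mapped output.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: combine each key's values into a final result.
    return {key: sum(values) for key, values in groups.items()}

docs = ["big data big audit", "audit data"]
mapped = chain.from_iterable(map_phase(d) for d in docs)
print(reduce_phase(shuffle(mapped)))  # word counts across all documents
```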

  12. 2018 Views of Big Data Usage – 1 • Recent studies indicate that by improving the integration of big data, healthcare could save up to $300 billion a year; this boils down to reducing costs by $1,000 a year for each person served. • The White House has invested $200 million in big data projects. • A nominal 10% increase in data accessibility can result in more than $65 million of additional net income for a typical Fortune 1000 company.

  13. 2018 View of Big Data Usage – 2 • Retailers who leverage the full potential of big data analytics can improve their operating margins by approximately 60%. • Of the 85% of companies trying to be data-driven, only 37% have succeeded in their initiatives, largely due to a lack of clarity among executives. The remaining companies are expected to catch up over time. • At present, only 0.5% of all accessible data is analyzed and used. Imagine the potential here.

  14. Big Data Components • 4 V’s • Velocity • Speed of data in and out • Volume • Increasing amount of data • Variety • Range of data types and sources • Variability • Constantly changing data

  15. [Chart] Extremes of Volume or Velocity may be better handled by traditional BI up to a point; as data Variety and/or Variability increase, Big Data becomes more attractive.

  16. Big Data V – Variety • Variety describes the organization of the data—whether the data is structured, semi-structured, or unstructured • Issues • Use of Encryption inhibits semantics • Inference • Aggregation

  17. Big Data V – Volume • The volume of Big Data describes how much data is coming in; this typically ranges from gigabytes to exabytes and beyond. • Issues • Multi-tier storage needed • Distributed storage management • Advantages • Data breach analytics possible

  18. Big Data V – Velocity • Velocity describes the speed at which data is processed. The data usually arrives in batches or is streamed continuously. • Issues • Distributed programming frameworks • Weakest Link issue with infrastructure
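The batch-versus-streaming distinction above can be illustrated with micro-batching, a common middle ground that groups a continuous event stream into fixed-size batches, trading latency for throughput. A minimal sketch with illustrative event values:

```python
def micro_batches(stream, batch_size):
    """Group a continuous event stream into fixed-size micro-batches."""
    batch = []
    for event in stream:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch   # hand a full batch to downstream processing
            batch = []
    if batch:
        yield batch       # flush the partial final batch

events = range(7)  # stand-in for a live feed
print(list(micro_batches(events, 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
```

Frameworks such as Spark Streaming take essentially this approach, which is one reason auditors must verify that "real-time" monitoring actually meets the latency the control requires.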

  19. Big Data V – Veracity • Several Characteristics: • 1. Provenance • Original Source Information • Pedigree of Data • Metadata of the Data • Context of Data when Collected • Etc.

  20. Big Data V – Veracity (cont.) • Several Characteristics: • 2. Curation • Governance & Quality Assurance • Correction of Data Errors • Data Collection Methodologies • Privacy Considerations • Etc.

  21. Big Data V – Veracity (cont.) • Several Characteristics: • 3. Validity • Accuracy & Correctness • Data Quality • Aggregation Issues here • Disparate Data Sets brought together can lead to issues of: • Interpretation • Corruption of Data • Translation

  22. Big Data V – Volatility • Data Management over Time • Big Data is transformational due to indefinitely persistent data • Criteria surrounding Data changes • Roles • Security • Privacy • Governance

  23. NIST Definitions Big Data • Big Data consists of extensive datasets, primarily in the characteristics of volume, variety, velocity, and/or variability, that require a scalable architecture for efficient storage, manipulation, and analysis. • Variability refers to changes in a dataset, whether in data flow rate, format/structure, semantics, and/or quality, that impact the analytics application.

  24. NIST Definitions Big Data • Variety refers to data from multiple repositories, domains, or types. • Velocity refers to the rate of data flow. • Veracity refers to the accuracy of the data. • Volatility refers to the tendency for data structures to change over time.

  25. Big Data Sources • Structured Data • Pre-defined organization • RDBMS – based • Unstructured Data • No pre-defined organization • Real-time data feeds • Social Media sources • i.e. Twitter, Facebook, Reddit, etc.
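A small illustration of the structured/unstructured distinction above: structured records have a pre-defined schema and are addressable by field name, while the same fact expressed as free text must be extracted heuristically. The record contents here are hypothetical:

```python
import json
import re

structured = '{"user": "alice", "amount": 42.50}'    # pre-defined schema (e.g., RDBMS/JSON export)
unstructured = "alice paid $42.50 for two widgets"   # free text: no schema

# Structured: fields are addressable directly by name.
record = json.loads(structured)
print(record["amount"])

# Unstructured: the same value must be pulled out with a heuristic,
# which can silently fail when the text varies in form.
m = re.search(r"\$(\d+\.\d{2})", unstructured)
print(float(m.group(1)))
```

This is why unstructured and social-media sources carry higher data-quality and audit risk: extraction, not lookup, stands between the raw data and the analytic.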

  26. Data Sources Used • Transaction Data • System Logs • Sensor/RFID/Machine Data • Documents/Texts • Social Media Data • Clickstream Data • Video/Images

  27. Big Data Usage • Science and research • Government • Corporation/Private Sector • Retail • Wal-Mart – 1M transactions per hour • Amazon – 3 DBs online • 7.8 TB, 18.5 TB, and 24.7 TB • Healthcare • Energy/Smart Grid • Others

  28. Major Big Data Uses • Marketing Trend Analysis • Retail Sales Analysis • Security Activity Correlation Analysis • Research Correlation Analysis using large Data Stores

  29. Big Data Analytics Today • Analytic Speed needed for Value: • Development speed • Data processing speed • Deployment speed • Response speed • Foundation for BI and Data Science efforts

  30. Big Data Impacts & Benefits • Governance • What Data should be Included • Planning • Process of collecting and organizing outcomes • Utilization • Becoming information “mavens” • Assurance • Data Quality • Privacy • Regional and National Laws

  31. Big Data Security Uses • Big Data security-based analysis can be used to: • Detect probable threats based on current vulnerabilities, • Analyze identity and access activity, • Correlate events and alerts, • Provide meaningful insight into the effectiveness of security-incident remediation, • Identify anomalies relative to normal behavior, operations, and configuration states, and • Support capacity planning and forecasting of IT resource utilization.
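Event and alert correlation, one of the uses listed above, can be sketched as a simple rule evaluated over log events. The events and the brute-force rule below are illustrative only, not a production detection:

```python
from collections import Counter

# Hypothetical log events: (timestamp, source_ip, event_type)
events = [
    (1, "10.0.0.5", "login_failed"),
    (2, "10.0.0.5", "login_failed"),
    (3, "10.0.0.5", "login_failed"),
    (4, "10.0.0.9", "login_ok"),
    (5, "10.0.0.5", "login_ok"),
]

def correlate(events, threshold=3):
    """Toy correlation rule: flag IPs whose failed logins reach a
    threshold and that later log in successfully (possible brute force)."""
    fails = Counter(ip for _, ip, kind in events if kind == "login_failed")
    suspects = {ip for ip, n in fails.items() if n >= threshold}
    return [ip for _, ip, kind in events
            if kind == "login_ok" and ip in suspects]

print(correlate(events))  # ['10.0.0.5']
```

At big-data scale, the same logic runs as a distributed streaming job, but the audit question is identical: is the rule set complete, tuned, and actually firing?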

  32. Security of Big Data • Secure the data, collect it, and then aggregate it to evaluate the whole • Obtain visibility of all data - Collection • Understand the context - Integration • Utilize the intelligence - Analytics

  33. Security Key - Correlation

  34. Big Data Security Correlation • Analysis capabilities • Incident Response capabilities • Data Breach capabilities • Data Recovery capabilities • Disaster Recovery capabilities • Forensics capabilities

  35. Big Data Risks • Risks associated with big data include: • Poor data quality • Inadequate technology • Insufficient security • Immature data governance practices • All are focus areas for audit activities

  36. Big Data Challenges to Security • Need for rapid response times • Inconsistency of data structures • Lack of audit and security tools

  37. Compliance and Big Data • Issues • Volume • Complexity • Lack of consistent structure • Need to isolate the compliance-sensitive portions of the data from the rest

  38. Compliance Issue Example • Creating multiple data sets and putting them in the same location allows technology to cross-integrate that information, • potentially producing new information that needs new controls. • For instance, a list of clients on its own is benign. • Add marketing information from a third party, and you have a new data set. • Link in other information, and you may now have PII, which requires compliance and protection. • All of this can be accomplished through dynamic queries of the data sources.
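The scenario above can be shown in a few lines: two individually benign data sets, once joined on a shared key, yield a record that identifies a person and therefore falls under PII controls. All field names and values here are hypothetical:

```python
clients = [{"client_id": 17, "zip": "30309"}]                              # benign on its own
marketing = [{"client_id": 17, "name": "J. Smith", "dob": "1980-02-14"}]   # third-party list

# A dynamic join on the shared key cross-integrates the two sets; the
# combined record now identifies a person and needs new controls.
by_id = {r["client_id"]: r for r in marketing}
linked = [{**c, **by_id[c["client_id"]]} for c in clients if c["client_id"] in by_id]
print(linked[0])
```

The audit implication: classification must be re-evaluated after every join or enrichment step, not just at the original sources.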

  39. Statutory & Regulatory Needs • Scope • Location • Transnational Considerations • Privacy Considerations • Downstream Liabilities

  40. Big Data Regulatory Issues • Regulatory Considerations • HIPAA • SOX • GLBA • EU rules • FACTA • FERPA • PCI-DSS

  41. Regulatory Issue – Example 1 • HIPAA/HITECH • Consequences of non-compliance are potentially severe, including both civil and criminal penalties • Two ways to keep data secure from release: • Encryption • Destruction

  42. Regulatory Issue – Example 2 • PCI-DSS • An industry-wide framework for protecting consumer credit card data. • Any company that stores, processes or transmits credit card data must comply with PCI-DSS by properly securing and protecting the data

  43. Big Data & Cloud Legal Issues • Where is the data? • Cloud server locations • Legal status varies from country to country. • “Despite the global feel of the cloud, some countries’ laws will be involved when it’s time to sue to get back data or to demonstrate compliance with privacy rules.” • Who is managing the data? • Identified or unidentified data processing activities

  44. Recent Data-related Events • Facebook • Equifax • OPM

  45. ISACA’s 5 Key Questions for Big Data Privacy and Security • Can the company trust its sources of Big Data? • What information is the company collecting without exposing the enterprise to legal and regulatory battles? • How will the company protect its sources, processes, and decisions from theft and corruption? • What policies are in place to ensure that employees keep stakeholder information confidential during and after employment? • What actions is the company taking that create trends that can be exploited by its rivals?

  46. Securing Big Data Focal Points • Security Architecture • Infrastructure Components • Hardware • Software • Computational Algorithms • “Real-Time” Analytics • Data Itself

  47. Questions for Securing Big Data • Data Fits Into Organization – How? • Data is Classified – How? • Search Algorithms are Controlled – How? • Data is Accessed – How and By Whom? • Data is Reported Out – How and To Whom? • Data is Updated - How Often and By What Process?

  48. What Data Needs Securing • Structured Data Sources • Unstructured Data Sources • “Real-time” Data Feeds • “Time-sensitive” Data • Meta-Data about the Data

  49. Structured Data Sources • Local Databases • Spreadsheets and Office Documents • Data Warehouses • Partner Data • Data Brokers

  50. Standard Database Security • Data • Schemas • Meta-data • Files, Folders, Interfaces • Transaction Logs • ISACA has many documents covering various RDBMS
