Data security is a central component of cloud security. Cloud Service Providers (CSPs) often share the responsibility for security with the customer. Roles such as the Chief Information Security Officer (CISO), Chief Security Officer (CSO), Chief Technology Officer (CTO), Enterprise Security Architect, and Network Administrator may all play a part in providing elements of a security solution for the enterprise.
Figure 2-1: Cloud Data Security
The goal of cloud data security is to provide knowledge of the types of controls necessary to maintain appropriate levels of availability, confidentiality, and integrity for data in the cloud.
The most significant part of any system or application is the data contained within it; the data holds the most value for any organization or company. While several principles of protection and data security are the same within a cloud environment as in a traditional data center, there are some challenges and differences unique to the cloud environment.
The Data Security Life Cycle, as introduced in the Cloud Security Alliance (CSA) guidance, enables the organization to map the different phases in the data life cycle against the required controls that are important to each phase.
The life cycle contains the following steps:
- Map the different life cycle phases.
- Integrate the access types and different data locations.
- Map into actors, controls, and functions.
The data life cycle guidance provides a framework to map significant use cases for data access while assisting in the development of proper controls within each life cycle stage.
The life cycle model serves as a reference and framework to provide a standardized approach for data life cycle and data security. Not all implementations or situations will align fully or comprehensively.
Data in the cloud should, in the general case, be treated as having the same needs and properties as data in a legacy environment. The data life cycle still applies; only the implementation particulars change. Typical stages in the data life cycle are shown:
Figure 2-2: Cloud Data Life Cycle
Cloud customers generate, alter, or modify information in the cloud. Depending on the use case, data may be created locally and internally, or created externally on remote workstations and then uploaded to the cloud. The Create phase is especially important because it is the appropriate time to classify the sensitivity of the created information and its value to the organization. This classification is the key input for applying proper security controls.
Data Created Remotely
Data created by the user must be encrypted before being uploaded to the cloud. We need to protect against apparent vulnerabilities, including man-in-the-middle attacks and insider threats at the cloud data center. The cryptosystem used for this purpose should have a high work factor and be listed among the FIPS 140-2 approved crypto solutions. Good key management practices must be implemented as well.
The connection used to upload the data should also be secure, preferably with an IPSec VPN solution.
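As a sketch of securing the upload channel, a client can insist on modern TLS before any data leaves the workstation. This uses Python's standard `ssl` module; the actual endpoint and socket handling would come from the deployment:

```python
import ssl

# Build a strict TLS client context for uploading data to the cloud:
# certificate validation on, hostname checking on, legacy protocols refused.
ctx = ssl.create_default_context()
ctx.minimum_version = ssl.TLSVersion.TLSv1_2
ctx.check_hostname = True
ctx.verify_mode = ssl.CERT_REQUIRED

# The context would then wrap the upload socket, e.g.:
# with socket.create_connection((host, 443)) as sock:
#     with ctx.wrap_socket(sock, server_hostname=host) as tls:
#         tls.sendall(encrypted_payload)
```

An IPsec VPN performs the equivalent protection at the network layer rather than per connection.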
Data Created within the Cloud
Similarly, data created within the cloud through remote manipulation must be encrypted upon creation, to obviate unnecessary access or view by data center personnel.
After the data is created, it must be stored in a way that is usable by the application or system. Data can be stored in several ways, including data written to a database, remote object storage in a cloud, and files on a file system. All data storage should comply with the data classification determined during the Create phase.
The Store phase is the first point at which security controls can be implemented to protect data at rest. The cloud security professional should make sure that every storage method employs whatever technologies are necessary for its data classification level, including auditing, encryption, access controls, monitoring, and logging. Appropriate redundancy and backup methods also come into play immediately in the Store phase, protecting the data on top of the security controls.
In the Use phase, data is accessed, processed, or viewed, that is, involved in some activity other than modification or alteration. Data being processed is in its most vulnerable state: it is more exposed, often decrypted, and might be transported to an unsecure location, increasing the chance of leak or compromise. At this time, as data transitions from rest to motion, controls such as a Data Loss Prevention (DLP) policy, Information Rights Management (IRM), and database and file access monitoring must be employed to record and audit data access and prevent unauthorized access.
Using information or data is only possible when it is in a decrypted state. Because data is encrypted in the Store phase, as discussed earlier, exposing it unencrypted during the Use phase requires logging and auditing. Additionally, granting least privilege, such as read-only access, ensures that no further modification or alteration is possible.
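A minimal sketch of the read-only, audited access just described; the role names and in-memory audit log are illustrative:

```python
# Least-privilege access with auditing for the Use phase: read-only by
# default, and every access attempt is recorded for later review.
PERMISSIONS = {"analyst": {"read"}, "editor": {"read", "write"}}
audit_log = []

def access(role: str, operation: str, record_id: str) -> bool:
    allowed = operation in PERMISSIONS.get(role, set())
    audit_log.append((role, operation, record_id, allowed))  # audit every attempt
    return allowed

access("analyst", "read", "r-100")    # permitted: analysts may read
access("analyst", "write", "r-100")   # denied: analysts are read-only
```

In practice the decision and the audit record would come from the platform's access control and logging services rather than an in-process dictionary.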
In the Share phase, the data is made available to users, partners, and customers. Whether it is shared with everyone or only with selected parties depends on the nature of the data. Ensuring that proper protections remain in place once the data leaves the system is a significant challenge. Unlike the Use phase, the data may now be accessed and used by contractors, partners, customers, and other associated groups, and once it leaves the central system it is no longer under the original security control mechanisms. Technologies such as DLP (Data Loss Prevention) and various rights management packages can be used to detect unauthorized sharing attempts or to prevent modification. However, neither method is entirely secure.
In the Archive phase, data moves to long-term storage, removing it from active use within a system. The archiving process can range from moving data to a lower storage tier that is slower and less redundant but still accessible from the system, all the way up to removing it from the active system entirely and placing it on different media altogether. The data can still be recovered and read by the system, but doing so typically involves more time, effort, or cost. In many cases where the data is completely removed from the active system, it is also stored offsite for disaster recovery reasons, sometimes hundreds or thousands of miles away. Often-overlooked aspects of archiving are the abilities to recover and retrieve the data later.
The Destroy phase is the phase in which the data is removed by the cloud provider. It can be interpreted in different technical ways according to the data's content, usage, and the applications that used it. Data destruction can mean logically erasing pointers or permanently destroying the data using digital or physical means. Consideration must be given to regulation, the type of cloud in use (SaaS vs. IaaS), and the classification of the data.
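The life cycle phases and their characteristic controls can be summarized in a simple mapping; the control choices below are illustrative, drawn from the discussion above:

```python
# Map each data life cycle phase to representative controls.
LIFECYCLE_CONTROLS = {
    "Create":  ["classification", "encryption before upload"],
    "Store":   ["encryption at rest", "access controls", "backup", "logging"],
    "Use":     ["DLP", "IRM", "access monitoring", "least privilege"],
    "Share":   ["DLP", "rights management"],
    "Archive": ["long-term encryption", "retrieval testing"],
    "Destroy": ["crypto-shredding", "media sanitization"],
}

def controls_for(phase: str):
    """Look up the controls recommended for a given life cycle phase."""
    return LIFECYCLE_CONTROLS.get(phase, [])
```

An organization would extend such a table with its own classification levels and regulatory obligations per phase.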
Cloud Security Models (CSM)
Cloud security models are tools that help guide security decisions. They generally fall into the following types:
Conceptual Models or Frameworks
These include descriptions and visualizations used to explain cloud security concepts and principles, such as the Cloud Security Alliance (CSA) logical model.
Controls Models or Frameworks
These categorize and detail specific cloud security controls or categories of controls, such as the CSA Cloud Controls Matrix (CCM).
Reference Architectures
These are templates for implementing cloud security, typically generalized (e.g., an Infrastructure as a Service (IaaS) security reference architecture). They can be quite conceptual and abstract or quite detailed, down to specific functions and controls.
Design Patterns
These are reusable solutions to specific problems; in security, an example is IaaS log management. As with reference architectures, they can be more abstract or quite specific, such as common implementation patterns on particular cloud platforms.
Recommended CSA Security Models
CSA Enterprise Architecture
BUSINESS OPERATION SUPPORT SERVICES (BOSS)
INFORMATION TECHNOLOGY OPERATION & SUPPORT (ITOS)
SECURITY & RISK MANAGEMENT
Table 2-1: CSA Enterprise Architecture
CSA Cloud Controls Matrix (CCM)
- Provides fundamental security principles to guide cloud vendors and to assist prospective cloud customers in assessing the overall security risk of a cloud provider.
- The CSA CCM is aligned with the Cloud Security Alliance guidance across 13 domains.
- Maps to important industry security standards, guidelines, and controls frameworks such as ISACA COBIT, ISO 27001/27002, PCI DSS, NIST, Jericho Forum, and NERC CIP.
- CCM provides organizations with the needed structure, detail, and clarity relating to information security, tailored to the cloud industry.
- Provides operational risk management and standardized security, and seeks to normalize security expectations, Cloud taxonomy, and terminology.
- It has the following versions:
- Cloud Control Matrix v3.0.1
- Cloud Control Matrix v3
- Cloud Control Matrix v1.4
- Cloud Control Matrix v1.3
- Cloud Control Matrix v1.2
- Cloud Control Matrix v1.1
- Cloud Control Matrix v1.0
NIST SP 500-299
NIST SP 500-299 is the NIST Cloud Computing Security Reference Architecture, a framework that identifies the core security components that can be implemented in a cloud ecosystem.
This standard gives guidelines for information security controls applicable to the provision and use of cloud services by providing additional implementation guidance for relevant controls specified in ISO/IEC 27002. It also provides additional controls with implementation guidance that relate explicitly to cloud services.
This Recommendation and International Standard provides controls and implementation guidance for both Cloud service providers and Cloud service customers.
NIST defines three service models, which represent the foundational categories of cloud services:
- IaaS (Infrastructure as a Service)
- PaaS (Platform as a Service)
- SaaS (Software as a Service)
Infrastructure as a Service (IaaS)
IaaS provides access to a resource pool of basic computing infrastructure, such as compute, network, or storage. IaaS has the following storage options:
- Content Delivery Network (CDN): Also known as a content distribution network. Content is stored in object storage and distributed to various geographically dispersed nodes to increase Internet consumption speeds.
- Object Storage: Object storage is sometimes referred to as file storage. Instead of presenting a virtual hard drive, object storage behaves like a file share accessed through a web interface or Application Programming Interface (API).
- Raw Storage: This type of storage includes the physical media where the data is stored. It can be mapped for direct access in specific private cloud configurations.
- Volume storage: This type of storage includes volumes attached to IaaS instances, usually as a virtual hard drive. Volumes usually use data dispersion to support resiliency and security.
Platform as a Service (PaaS)
PaaS both provides and relies on an extensive range of storage options.
PaaS may provide:
- Application Storage: This includes storage options built into a PaaS application platform and consumable through APIs that do not fall into other storage categories.
- Big Data as a Service: Offered as a cloud platform, data is stored in object storage or another distributed file system. Data needs to be close to the processing environment and can be moved temporarily if needed for processing.
- Database as a Service (DBaaS): A multi-tenant database architecture that is directly consumable as a service. Users consume the database through direct SQL calls or APIs, depending on the offering. Each customer's data is separated and isolated from that of other tenants. Databases may be relational, flat, or any other common structure.
PaaS may consume:
- Databases: Content and information may be directly stored in the database (as text or binary objects), or as files referenced by the database. The database itself may be a collection of IaaS instances sharing common back-end storage.
- Object/File Storage: Files or other data are stored in object storage, but only accessed via the PaaS API.
- Volume Storage: Data may be stored in IaaS volumes attached to instances dedicated to providing the PaaS service.
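The tenant isolation that a DBaaS must enforce can be sketched with a tenant-scoped query pattern; the in-memory SQLite database and tenant names below are illustrative:

```python
import sqlite3

# Every query is scoped by tenant_id, so one customer's rows are never
# returned for another tenant; a real DBaaS enforces this server-side.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (tenant_id TEXT, name TEXT)")
conn.executemany("INSERT INTO docs VALUES (?, ?)",
                 [("acme", "invoice"), ("acme", "po"), ("globex", "contract")])

def docs_for(tenant_id: str):
    """Return only the documents belonging to one tenant."""
    rows = conn.execute("SELECT name FROM docs WHERE tenant_id = ?",
                        (tenant_id,))
    return sorted(name for (name,) in rows)
```

Production offerings add further isolation layers (separate schemas, encryption per tenant), but the scoping principle is the same.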
Software as a Service (SaaS)
As with PaaS, SaaS uses an extensive range of storage and consumption models. SaaS storage is accessed through a web-based user interface or a server/client application. If the storage is accessible through an API, it is generally considered PaaS. Numerous SaaS providers also offer PaaS APIs. The two most common storage types are:
- Information Storage and Management: Data is entered into the system through the web interface and stored within the SaaS application (usually with a back-end database).
- Content and File Storage: File-based content such as documents, reports, and image files is stored within the SaaS application and made accessible through a web-based user interface.
Examples of IaaS, PaaS, SaaS
IaaS: DigitalOcean, Linode, Rackspace, Amazon Web Services (AWS), Cisco Metapod, Microsoft Azure, Google Compute Engine (GCE)
PaaS: AWS Elastic Beanstalk, Windows Azure, Heroku, Force.com, Google App Engine, Apache Stratos, OpenShift
SaaS: Google Apps, Dropbox, Salesforce, Cisco WebEx, Concur, GoToMeeting
Table 2-2: Examples of IaaS, PaaS, SaaS
A few vendors provide cloud storage services tailored to the needs of data archiving. These include features such as data life cycle management, guaranteed immutability, and search.
HP Autonomy Digital Safe Archiving Service
This service uses an on-premises appliance, which connects to customers’ data stores via API and allows the user to search.
Digital Safe offers read-only Write Once Read Many (WORM), e-discovery, legal hold, and all the features associated with enterprise archiving. Its appliance carries out data deduplication before transmission to the data repository.
Ephemeral storage is relevant to IaaS instances and exists only as long as its instance is up. It is usually used for swap files and other temporary storage needs and is terminated with its instance.
Raw Device Mapping (RDM) is a method of disk virtualization in VMware that enables a storage Logical Unit Number (LUN) from the Storage Area Network (SAN) to be connected directly to a virtual machine (VM). On Microsoft's Hyper-V platform, this is accomplished using pass-through disks.
Swap-File: A file on a hard disk that is used to provide space for programs which have been transferred from the processor’s memory.
Data Deduplication: It is a technique for eliminating duplicate copies of repeating data.
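Deduplication as just defined can be sketched with content hashing: identical chunks map to the same digest, so only one copy is ever kept. The chunk contents below are illustrative:

```python
import hashlib

def dedup_store(chunks):
    """Keep one copy per unique chunk, keyed by SHA-256; return references."""
    store, refs = {}, []
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # first copy wins; duplicates are skipped
        refs.append(digest)
    return refs, store

refs, store = dedup_store([b"report-v1", b"report-v1", b"report-v2"])
# three references are kept, but only two unique chunks are stored
```

Real archiving products (such as the appliance described above) apply the same idea at block or file granularity before transmission.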
There are many threats to storage types, as defined by ISO/IEC 27040:
- Unauthorized Usage: In the cloud environment, data storage can be put to unauthorized use, for example through account hijacking or the uploading of illegal content. The multi-tenancy of cloud storage makes tracking unauthorized usage more challenging.
- Unauthorized Access: This can happen due to hijacking, improperly regulated permissions in a multi-tenant environment, or a malicious internal cloud provider employee.
- Liability due to Regulatory Non-Compliance: Certain controls, such as encryption, might be required by specific regulations, and not all cloud services enable all relevant data controls.
- Distributed Denial of Service (DDoS) and Denial of Service (DoS) attacks on storage: Availability is a definite concern for cloud storage. Without data, no instances can be launched.
- Corruption or Modification and Destruction of Data: A wide variety of sources can cause this: human errors, hardware or software failure, events such as fire or flood, or intentional hacks. It can also affect a particular portion of the storage or the entire array.
- Data Leakage or Data Breaches: Customers must always be aware that cloud data is exposed to data breaches, whether external or originating from a cloud provider employee with storage access. Data tends to be replicated and moved around in the cloud, which increases the likelihood of a leak.
- Theft/Accidental Loss of Media: This threat applies mostly to portable storage, but as cloud data centers grow and storage devices get smaller, there are increasingly more vectors for theft and similar threats as well.
- Malware Attack: The objective of almost all malware is ultimately to reach the data storage.
- Improper Treatment or Sanitization after End of Use: End of use is more challenging and complex in cloud computing because we most often cannot enforce physical destruction of media. However, the dynamic nature of data, which is kept in different storage locations alongside multiple tenants, mitigates the risk that digital remnants can be located.
A significant approach employed in cloud environments to protect data is Data Loss Prevention (DLP), also known as data leakage prevention.
Data Loss Prevention:
DLP is a set of controls and practices used to make sure that data is only accessible and exposed to those systems and users authorized to have it. The objective of a DLP strategy is to manage and minimize risk, maintain compliance with regulatory requirements, and show due diligence on the part of the application and data owner. However, it is vital for any organization to take a holistic view of DLP and not focus on individual systems or hosting environments. The DLP strategy must cover the entire enterprise, particularly with hybrid cloud environments or those mixing traditional and cloud data center installations.
DLP consists of three components:
- Discovery and Classification: This is the first component of DLP and a recurring process; the majority of cloud-based DLP technologies focus predominantly on it. The discovery process usually maps data in cloud storage services and databases and enables classification based on data categories such as classified data, credit card data, and public data.
- Monitoring: Data usage monitoring forms the key function of DLP. Effective DLP strategies monitor the usage of data across locations and platforms while enabling administrators to define one or more usage policies. Monitoring can be executed on gateways, servers, and storage, as well as workstations and endpoint devices. Recently, adoption of external "DLP as a service" offerings has increased, along with many cloud-based DLP solutions. The monitoring application must cover most sharing options available to users (email applications, portable media, and Internet browsing) and alert on policy violations.
- Enforcement: Many DLP tools provide the capability to interrogate data and compare its location, use, or transmission destination against a set of policies to prevent data loss. If a policy violation is detected, specified enforcement actions can be performed automatically. Enforcement options include alerting and logging, blocking the data transfer, re-routing it for additional validation, or encrypting the data before it leaves the organizational boundary.
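The discovery and enforcement steps can be sketched for one common policy, cardholder data: a candidate pattern match followed by a Luhn checksum to cut false positives. The regex, labels, and policy below are illustrative, not a production rule set:

```python
import re

# Candidate card numbers: 13-16 digits, optionally separated by spaces/hyphens.
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,16}\b")

def luhn_ok(candidate: str) -> bool:
    """Luhn checksum used by payment card numbers."""
    digits = [int(c) for c in candidate if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def classify(text: str) -> str:
    """Discovery step: label text that contains a plausible card number."""
    for match in CARD_RE.finditer(text):
        if luhn_ok(match.group()):
            return "cardholder-data"   # enforcement could block or encrypt here
    return "public"
```

A real DLP engine layers many such detectors (keywords, fingerprints, exact data matching) and ties each label to an enforcement action.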
DLP tool implementations typically conform to the following topologies:
- Data in Motion (DIM): This is referred to as gateway- or network-based DLP. In this topology, the monitoring engine is deployed near the organizational gateway to monitor outgoing protocols such as FTP, HTTP/HTTPS, and SMTP. The deployment can be a combination of proxy-based, bridge, network tapping, or SMTP relays. To scan encrypted HTTPS traffic, a proper SSL interception/broker mechanism must be integrated into the system architecture.
- Data at Rest (DAR): This is referred to as storage-based DLP. In this topology, the DLP engine is installed where the data is at rest, typically on one or more storage subsystems as well as file and application servers. This topology is very useful for data discovery and for tracking usage, but may require integration with network- or endpoint-based DLP for policy enforcement.
- Data in Use (DIU): This is referred to as client- or endpoint-based DLP. The DLP application is installed on users' workstations and endpoint devices. This topology offers insight into how data is used by users, with the ability to add protection that network DLP may not be capable of providing. The challenge with client-based DLP is the time, complexity, and resources required to implement it across all endpoint devices, often across multiple locations and a significant number of users.
Cloud-Based DLP Considerations
Some important considerations for cloud-based DLP include:
- Data in the cloud tends to move and replicate: Whether between locations, backups, data centers, or back and forth into the organization, movement and replication can present a challenge to any DLP deployment.
- Administrative access for enterprise data in the cloud can be tricky: Ensure that you understand how to perform discovery and classification within cloud-based storage.
- DLP technology can affect overall performance: Gateway or network DLP, which scans all traffic for pre-defined content, can affect network performance. Client-based DLP scans all workstation access to data and can have a performance impact on the workstation's operation. The overall impact should be considered during testing.
Many technologies and tool sets are commonly used as data security strategies:
As we know, encryption is the process of encoding a message or information in such a way that only authorized parties can decrypt it. Plaintext is information readable by users or applications, whereas encrypted data, also known as ciphertext, appears as random, meaningless code; a decryption key is required to recover the plaintext. Encryption across the enterprise architecture can reduce the risks associated with unauthorized data access and exposure, but may raise performance issues, since every piece of information is encrypted at the sender's end and decrypted at the destination.
It is the responsibility of the CSP to implement encryption within the enterprise in a way that provides the most security benefit, safeguarding mission-critical data while minimizing the system performance cost of the encryption.
Encryption can be implemented within different phases of the data life cycle:
- Data in Motion (DIM): Techniques for encrypting data in motion are mature and include IPsec VPNs, TLS/SSL, and similar protocols.
- Data at Rest (DAR): When the data is archived or stored, different encryption techniques must be used. The encryption mechanism itself may vary in the manner it is deployed, dependent on the timeframe or indeed the period for which the data will be stored, such as extended retention versus short-term storage, data located in a database versus a file system, and so on.
- Data in Use (DIU): Data that is being shared, processed, or viewed. This stage of the data life cycle is less mature than the others; encryption here typically focuses on IRM/DRM solutions.
Figure 2-3 – Encryption Implementation
Sample Use Cases for Encryption
The following are some use cases for encryption:
- For inbound and outbound traffic to the cloud (for archiving, sharing, or processing), use data-in-motion encryption methods such as VPN or SSL/TLS to avoid data leakage or information exposure while in motion.
- Protecting data at rest, such as application components, archives, backup applications, database information, and file storage.
- Protecting objects or files when stored, shared, or used in the cloud.
- Complying with regulations such as HIPAA and PCI DSS, which require proper protection of data traversing untrusted networks, along with the protection of specific data types.
- Protection from third-party access through a lawful interception.
- Creating enhanced or additional mechanisms for logical separation between different clients' data in the cloud.
- Logical destruction of data when physical destruction is not technically possible or feasible.
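The last use case, logical destruction, is often implemented as crypto-shredding: each object is encrypted under its own data key, and destroying the key renders every remaining ciphertext copy unreadable. The in-memory key vault below is illustrative:

```python
import secrets

# Per-object data keys held in a key vault; ciphertext may be replicated
# anywhere, but without its key an object is logically destroyed.
key_vault = {
    "obj-001": secrets.token_bytes(32),
    "obj-002": secrets.token_bytes(32),
}

def crypto_shred(object_id: str) -> None:
    """Logically destroy an object by discarding its encryption key."""
    key_vault.pop(object_id, None)

crypto_shred("obj-001")  # obj-001 ciphertext is now unrecoverable
```

In practice the vault would be an HSM or KMS, and key destruction would itself be audited.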
Payment Card Industry Data Security Standard (PCI DSS): PCI DSS is a global information security standard created by the PCI Security Standards Council. It was created so organizations could develop, enhance, and assess the security standards required for handling cardholder information and payment account security. The PCI Security Standards Council develops security standards for the payment card industry and provides the tools required to enforce them, such as training, certification, assessment, and scanning.
Founding members of this council are:
- American Express
- Discover Financial Services
- JCB International
- Visa Inc.
The PCI data security standard deals primarily with cardholder data security for debit, credit, prepaid, e-purse, POS, and ATM cards. A high-level overview of PCI DSS covers:
- Secure Network
- Strong Access Control
- Cardholder data security
- Regular Monitoring and Evaluation of Network
- Maintaining Vulnerability program
- Information security policy
Cloud Encryption Challenges
Many factors influence encryption considerations and their implementation in the enterprise. The use of encryption should always be tied directly to the business considerations, regulatory requirements, and any additional constraints the organization must address. Different techniques will be used depending on whether the data is at rest, in transit, or in use while in the cloud.
Different options could be applied when dealing with specific threats, such as protecting Personally Identifiable Information (PII) or legally regulated information, or when defending against unauthorized access and viewing from systems and platform administrators.
The following challenges are associated with encryption:
- The integrity of encryption is heavily dependent on the control and management of the relevant encryption keys, including how they are secured. If the cloud provider holds the keys, then not all data threats are mitigated against, as unauthorized actors may gain access to the data through the acquisition of the keys via a search warrant, legal ruling, or theft and misappropriation. Equally, if the customer is holding the encryption keys, this presents different challenges to make sure they are protected from unauthorized usage as well as compromise.
- Encryption can be challenging to implement effectively when a cloud provider is required to process the encrypted data. This is true even for simple tasks such as indexing, along with the gathering of metadata.
- Data in the cloud is highly portable. The data can be replicated, copied, and backed up extensively, making encryption and key management a challenge.
- Multi-tenant cloud environments and the shared use of physical hardware present challenges for the protection of keys in volatile memory such as RAM caches.
- Secure hardware for encrypting keys may not exist in cloud environments, and software-based key storage is often more vulnerable.
- Storage-level encryption is typically less complicated and can be most easily exploited (given sufficient time and resources). The higher you go toward the application level, the more complex encryption becomes to deploy and implement. However, encryption implemented at the application level is technically more effective in protecting the confidentiality of the relevant resources or assets.
- Encryption can negatively affect performance, especially high-performance data processing mechanisms such as data warehouses and data cubes.
- The nature of the cloud environment typically requires us to manage more keys, such as API keys, access keys, encryption keys, and shared keys.
Data Cube: “It is a multi-dimensional array of values.”
Object Storage Encryption
Most object storage services offer server-side, storage-level encryption. This kind of encryption offers limited effectiveness; the recommendation is to use external encryption mechanisms that encrypt the data before it enters the cloud environment. Potential external mechanisms include:
- Application-level Encryption: In this mechanism, the encryption engine resides in the application that is using the object storage. It could be integrated into the application component or by a proxy that is responsible for encrypting the data before moving to the cloud. The proxy can be implemented on the client gateway or as a service residing at the external provider.
- File-level Encryption: Such as Digital Rights Management (DRM) or Information Rights Management (IRM) solutions, both of which can be very efficient when used in conjunction with sharing services and file hosting that typically rely on object storage. The encryption engine is generally implemented at the client side and will preserve the format of the original file.
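A minimal sketch of the application-level (client-side) approach: the data is encrypted before it is written to object storage, so the provider only ever sees ciphertext. The cipher below is a TOY keystream construction for illustration only; a real implementation would use a vetted AEAD such as AES-GCM from a proper cryptography library:

```python
import hashlib
import secrets

def toy_keystream_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """TOY cipher (SHA-256 in counter mode), illustration ONLY.
    Production code must use a vetted AEAD (e.g., AES-GCM) instead."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(b ^ k for b, k in zip(data, stream))

# Client-side encryption before writing to object storage:
key, nonce = secrets.token_bytes(32), secrets.token_bytes(16)
object_store = {}  # stands in for the provider's object storage
object_store["doc-1"] = toy_keystream_cipher(key, nonce, b"quarterly figures")

# XOR keystream ciphers decrypt with the same operation; the key never
# leaves the client, so data center personnel see only ciphertext.
plaintext = toy_keystream_cipher(key, nonce, object_store["doc-1"])
```

A proxy deployment moves the same encrypt-on-write step into a gateway between the application and the cloud.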
The following database encryption options are available:
- Application-level Encryption: The encryption engine resides at the application that is using the database.
- File-level Encryption: Database servers reside on volume storage. For this deployment, we encrypt the volume or folder holding the database, with the encryption engine and keys residing on the instances attached to the volume. External file system encryption protects against lost backups, media theft, and external attack, but not against attackers with access to the application layer, the instance's OS, or the database itself.
- Transparent Encryption: Various Database Management Systems (DBMSs) contain the ability to encrypt the entire database or specific portions, such as tables. The encryption engine resides within the DB and is transparent to the application. Keys typically reside within the instance, although processing and managing them might be offloaded to an external Key Management System (KMS). This encryption can provide effective protection against backup-system intrusion, media theft, and certain database- and application-level attacks.
Key management is one of the most challenging components of any encryption implementation. Although new standards for protecting keys, such as the Key Management Interoperability Protocol (KMIP), are emerging, the appropriate management of keys remains the most difficult task you will need to engage in when planning cloud data security.
Common challenges with key management are:
- Access to the keys: Best practices coupled with regulatory requirements might set particular criteria for key access, along with restricting or not permitting access to keys by cloud Service Provider employees or personnel.
- Replication and Backup: The nature of the cloud results in data replication and backups across many different locations and formats. This can impact the ability to manage and maintain keys adequately over the long and short term.
- Key Storage: Secure storage for the keys is crucial to protect the data. In traditional environments, keys were able to be stored in secure dedicated hardware. This might not always be possible in cloud environments.
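One way to tame key sprawl is to derive purpose-specific keys from a single well-protected master key; a minimal HKDF (RFC 5869) built on the standard library's `hmac` illustrates the idea. The `info` labels and the hard-coded master key are illustrative only:

```python
import hashlib
import hmac

def hkdf_sha256(master_key: bytes, info: bytes, length: int = 32) -> bytes:
    """Minimal HKDF (RFC 5869) with an all-zero salt: extract, then expand."""
    prk = hmac.new(b"\x00" * 32, master_key, hashlib.sha256).digest()  # extract
    okm, block, counter = b"", b"", 1
    while len(okm) < length:                                           # expand
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

master = b"\x01" * 32  # in practice: generated and held in an HSM or KMS
backup_key = hkdf_sha256(master, b"backup-encryption")
volume_key = hkdf_sha256(master, b"volume-encryption")
```

Only the master key then needs hardware-grade storage; derived keys can be recomputed on demand and never need to be backed up themselves.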
Key Management Considerations
Considerations when planning key management include:
- Lack of access to the encryption keys will result in a lack of access to the data.
- Random number generation must be conducted as a trusted process.
- Throughout their life cycle, cryptographic keys should never be transmitted in the clear and should always remain in a "trusted" environment.
- Where possible, key management functions must be conducted separately from the cloud provider to enforce separation of duties and force collusion to occur if unauthorized data access is attempted.
- When considering key management or key escrow “as a service,” carefully plan to take into account all relevant regulations, laws, and jurisdictional requirements.
Key Storage in the Cloud
Key storage in the cloud is generally implemented using one or more of the following approaches:
- Externally Managed: In this approach, keys are maintained separately from the data and encryption engine. They can be on a similar cloud platform, internally within the organization, or on a different cloud platform. The actual storage can be a separate instance (hardened especially for this specific task) or on a Hardware Security Module (HSM). When implementing external key storage, consider how the key management system is integrated with the encryption engine and how the entire life cycle of key creation through to retirement is managed.
- Internally Managed: In this approach, the keys are stored on the application component or virtual machine that is also acting as the encryption engine. This type of key management is typically used in storage-level encryption, internal database encryption, or backup application encryption. This approach can be helpful for mitigating against the risks associated with lost media.
- Managed by a Third Party: This is when a trusted third party provides key escrow services. Key management providers use specifically developed secure infrastructure and integration services for key management. You should evaluate any third-party key storage services provider that might be contracted by the organization to make sure that the risks of allowing a third party to hold encryption keys are well understood and documented.
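The externally managed approach above can be sketched as envelope encryption: a data key protects the data, and only a wrapped (encrypted) copy of that key is stored alongside it, while the key-encryption key never leaves the external store. A minimal sketch, in which the `ExternalKMS` class and the XOR "cipher" are illustrative stand-ins only (a real system would use AES-GCM and an HSM-backed KMS):

```python
import secrets

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Toy stand-in for a real cipher; do not use XOR in production."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

class ExternalKMS:
    """Hypothetical externally managed key store: it holds the
    key-encryption key (KEK) and never releases it to the data platform."""
    def __init__(self):
        self._kek = secrets.token_bytes(32)

    def wrap(self, data_key: bytes) -> bytes:
        return xor_bytes(data_key, self._kek)

    def unwrap(self, wrapped_key: bytes) -> bytes:
        return xor_bytes(wrapped_key, self._kek)

# Encryption engine side: generate a data key, use it, then keep only
# the wrapped form next to the ciphertext.
kms = ExternalKMS()
data_key = secrets.token_bytes(32)
ciphertext = xor_bytes(b"cardholder record", data_key)
wrapped = kms.wrap(data_key)
del data_key  # the plaintext key is never persisted with the data

# Later: recover the data key from the external KMS to decrypt.
plaintext = xor_bytes(ciphertext, kms.unwrap(wrapped))
print(plaintext)  # b'cardholder record'
```

The separation of duties described earlier falls out of this design: an attacker who obtains the stored ciphertext and wrapped key still needs the external KMS to recover anything.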
Key Management in Software Environments
Cloud Service Providers (CSPs) often secure keys using software-based solutions to avoid the additional cost and overhead of hardware-based security modules.
Software-based key management solutions do not meet the physical security requirements specified in the National Institute of Standards and Technology (NIST) Federal Information Processing Standards Publication FIPS 140-2 or 140-3 specifications. Software is unlikely to be able to provide evidence of tampering. The lack of FIPS-certified encryption might be an issue for U.S. Federal Government agencies and other organizations.
Data Masking
Data masking is the process of replacing, hiding, or omitting sensitive information from a specific dataset. Data masking is also known as data obfuscation.
Data masking is usually used to secure specific datasets such as PII or commercially sensitive data or to comply with specific regulations such as HIPAA or PCI-DSS.
Data masking is also widely used for test platforms where suitable test data is not available. Masking techniques are usually applied when migrating, developing, or testing environments, or when protecting production environments from threats such as data exposure.
The primary methods of masking data are:
- Static Masking: A new copy of the data is created with the masked values. Static masking is usually effective when creating clean non-production environments.
- Dynamic Masking: Also referred to as “on-the-fly” masking, this adds a layer of masking between the application and the database. The masking layer is responsible for masking the information in the database “on the fly” when the presentation layer accesses it. This type of masking is effective when protecting production environments; it can hide the full credit card number from customer service representatives while the data remains available for processing.
Table 2-3: Methods of Masking Data
Conventional approaches to data masking include:
- Algorithmic Substitution: The value is replaced with an algorithm-generated value (this usually permits two-way substitution).
- Deletion: Uses a null value or deletes the data.
- Masking: Uses particular characters to hide certain parts of the data. Commonly applied to credit card data formats: XXXX XXXX XX65 5432
- Random Substitution: The value is replaced with a random value.
- Shuffle: Shuffles different values from the dataset, commonly from the same column.
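Two of the approaches above, character masking and shuffling, can be sketched in a few lines of Python (the function names and the sample card number are illustrative):

```python
import random

def character_mask(card_number: str, visible: int = 6) -> str:
    """Masking: replace all but the last `visible` digits with 'X',
    preserving the original grouping (e.g. 'XXXX XXXX XX65 5432')."""
    digits = [c for c in card_number if c.isdigit()]
    keep_from = len(digits) - visible
    out, seen = [], 0
    for c in card_number:
        if c.isdigit():
            out.append(c if seen >= keep_from else "X")
            seen += 1
        else:
            out.append(c)  # keep separators such as spaces
    return "".join(out)

def shuffle_column(values: list) -> list:
    """Shuffle: substitute each value with another one drawn from the
    same column, so the column keeps a realistic value distribution."""
    shuffled = values[:]
    random.shuffle(shuffled)
    return shuffled

print(character_mask("4024 0071 2765 5432"))  # XXXX XXXX XX65 5432
```

Note that shuffling preserves realistic-looking test data, while character masking is the form customers usually see on receipts and support screens.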
Tokenization
Tokenization was initially introduced by the Payment Card Industry (PCI) as a means to secure credit card information, but it is now used to protect all types of sensitive data.
Tokenization is the process of replacing sensitive data with a unique identification code, referred to as a token. The token is a collection of random values with the form and shape of the original data placeholder, and it is mapped back to the original data by the tokenization solution or application.
Tokenization is not encryption, and it presents different challenges and benefits. Encryption uses a key to obfuscate data, while tokenization removes the data entirely from the database, replacing it with a mechanism to identify and access the original data.
Tokenization is used to protect sensitive data in a protected, secure, or regulated environment.
Tokenization can be implemented internally, where there is a need to safeguard sensitive data in-house, or externally using a tokenization service.
Tokenization can assist with:
- Complying with laws or regulations.
- Reducing the risk of storing sensitive data and reducing attack vectors on that data
- Reducing the cost of compliance
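The tokenization flow can be sketched as a token vault: the application stores only a random, format-preserving token, and only the vault can map it back to the original value. A minimal sketch (the class and method names are illustrative; a production service would persist the vault securely and guarantee token uniqueness):

```python
import secrets

class TokenVault:
    """Hypothetical tokenization service: sensitive values live only in
    the vault; applications store and pass around tokens instead."""
    def __init__(self):
        self._vault = {}  # token -> original value

    def tokenize(self, pan: str) -> str:
        # The token keeps the form of the original value (here, a
        # 16-digit card number) but its digits are random.
        token = "".join(secrets.choice("0123456789") for _ in range(len(pan)))
        self._vault[token] = pan
        return token

    def detokenize(self, token: str) -> str:
        return self._vault[token]  # only the vault can map back

vault = TokenVault()
token = vault.tokenize("4024007127655432")
print(len(token), token.isdigit())  # 16 True
```

Because the token has the same shape as the original data, downstream systems need no schema changes, which is a large part of why tokenization reduces the cost of compliance.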
It often seems that the cloud and the technologies that make it possible are evolving in multiple directions, all at once. It is hard to keep up with all of the new and innovative technology solutions that are being implemented across the cloud landscape.
Bit Splitting
Bit splitting typically involves splitting up and storing encrypted information across different cloud storage services. Depending on how the bit-splitting system is implemented, some or all of the dataset must be available in order for it to be decrypted and read.
If a RAID 5 solution is used as part of the implementation, then the system can provide data redundancy as well as confidentiality protection, while ensuring that a single cloud provider does not have access to the entire dataset.
- Bit splitting between different jurisdictions/geographies may make it harder to gain access to the complete dataset through a subpoena or other legal processes.
- Improvements to data security relating to confidentiality.
- It can be scalable, might be incorporated into secured cloud storage API technologies, and might reduce the risk of vendor lock-in.
- The whole dataset may not be usable within the same geographies in which the cloud provider stores and processes the bits, leading to the need to ensure data security on the wire as part of the security architecture for the system.
- Processing and re-processing the information to encrypt and decrypt the bits is a CPU-intensive activity.
- Storage requirements and costs are usually higher with a bit splitting system. Depending on the implementation, bit splitting can generate availability risks, since all parts of the data may need to be available when decrypting the information.
Bit splitting can use different methods, many of which are based on “secret sharing” cryptographic algorithms:
- Secret Sharing Made Short (SSMS): Uses a three-phase process: encryption of the information; use of an Information Dispersal Algorithm (IDA), which is designed to efficiently split the data into fragments using erasure coding; and splitting of the encryption key itself using a secret sharing algorithm.
- All-or-Nothing Transform with Reed-Solomon (AONT-RS): Integrates the AONT with erasure coding. This method first encrypts and transforms the information and the encryption key into blocks in such a way that the information cannot be recovered without using all the blocks, and then it uses an IDA to split the blocks into multiple shares that are distributed to different cloud storage services (similar to SSMS).
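The “secret sharing” idea underlying these methods can be illustrated with a simple n-of-n XOR scheme: every share is required to rebuild the secret, and any incomplete set reveals nothing. This toy scheme offers no redundancy, unlike SSMS or AONT-RS, which use erasure coding so that a subset of shares suffices:

```python
import secrets

def split_secret(secret: bytes, n: int) -> list:
    """n-of-n XOR secret sharing: n-1 random shares, plus one share that
    XORs with them back to the secret."""
    shares = [secrets.token_bytes(len(secret)) for _ in range(n - 1)]
    last = secret
    for s in shares:
        last = bytes(a ^ b for a, b in zip(last, s))
    return shares + [last]

def combine_shares(shares: list) -> bytes:
    """XOR all shares together to recover the secret."""
    out = bytes(len(shares[0]))
    for s in shares:
        out = bytes(a ^ b for a, b in zip(out, s))
    return out

key = b"encryption-key-material"
shares = split_secret(key, 3)  # store each share with a different CSP
assert combine_shares(shares) == key
assert combine_shares(shares[:2]) != key  # an incomplete set is useless
```

Storing each share with a different provider gives the confidentiality benefit described above: no single provider, and no single subpoena, yields the complete dataset.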
Redundant Array of Independent Disks (RAID): RAID storage uses many disks to provide fault tolerance, to improve overall performance, and to increase storage capacity in a system. It has many standard levels; Level 5 is block-interleaved distributed parity.
Subpoena: It is a legally binding order for the delivery of evidence.
Homomorphic Encryption
Homomorphic encryption enables the processing of encrypted data without the need to decrypt it. It allows the cloud customer to upload data to a Cloud Service Provider (CSP) for processing without the requirement to decrypt the data first.
The advantages of homomorphic encryption are sizeable, with cloud-based services benefitting most, as it enables organizations to protect data in the cloud during processing while eliminating most confidentiality concerns.
Note that homomorphic encryption is a developing area and does not represent a mature offering for most use cases. Many of the current implementations represent “partial” implementations of homomorphic encryption; however, these are typically limited to particular use cases involving small amounts or volumes of data.
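The “partial” homomorphic property mentioned above can be illustrated with textbook (unpadded) RSA, which is homomorphic with respect to multiplication: the provider can multiply ciphertexts without ever decrypting them. This is a toy demonstration with tiny primes, not a secure construction:

```python
# Textbook RSA (no padding) over toy primes; real keys are 2048+ bits.
p, q = 61, 53
n, e = p * q, 17
d = pow(e, -1, (p - 1) * (q - 1))  # private exponent (Python 3.8+)

enc = lambda m: pow(m, e, n)
dec = lambda c: pow(c, d, n)

a, b = 7, 12
# The "server" multiplies two ciphertexts without seeing a or b ...
c = (enc(a) * enc(b)) % n
# ... and the result decrypts to the product of the plaintexts.
assert dec(c) == (a * b) % n  # 84
```

Fully homomorphic schemes extend this idea to arbitrary computations (both addition and multiplication), which is why current practical offerings remain limited to narrow use cases and small data volumes.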
Data Discovery
Data discovery is a departure from traditional business intelligence in that it emphasizes interactive, visual analytics rather than static reporting. The goal of data discovery is to enable people to use their intuition to find meaningful and essential information in data. This process usually consists of asking questions of the data, seeing results visually, and refining the questions.
Contrast this with the traditional approach, which is for information consumers to ask questions, which causes reports to be developed, which are then fed to the consumer, which may generate more questions, which will generate more reports.
Data Discovery Methodologies
Forward-looking companies consider data to be a strategic asset and understand its importance in driving innovation, differentiation, and growth. However, leveraging data and transforming it into real business value requires a holistic approach to business intelligence and analytics. This means going beyond the scope of various data visualization tools, and it is dramatically different from the Business Intelligence (BI) of recent years.
These trends are driving the continuing evolution of data discovery in organizations and the cloud:
Agile Analytics and Agile Business Intelligence:
Business intelligence teams and data scientists are adopting more agile, iterative methods of turning data into business value. They perform data discovery processes more often and in ways that are more diverse.
Examples include profiling new datasets for integration, seeking answers to new questions that emerge this week based on last week's analysis, or responding to alerts about emerging trends that may warrant new analysis work streams.
Big Data:
In big data projects, data discovery is both more critical and more challenging. Not only is the volume of data that must be efficiently processed for discovery larger, but the diversity of sources and formats presents challenges that cause many traditional methods of data discovery to fail. Big data initiatives often also involve rapid profiling of high-velocity data, which makes data profiling more complicated and less feasible using existing toolsets.
Real-Time Analytics:
The ongoing shift toward real-time analytics has created a new class of use cases for data discovery. These use cases are valuable but need data discovery tools that are faster, more adaptive, and more automated.
Different Data Discovery Techniques:
Data discovery tools differ by technique and data matching abilities. Assume you wanted to find credit card numbers. Data discovery tools for databases use a couple of methods to find and then identify information. Most use unique login credentials to scan internal database structures, itemize tables and columns, and then analyze what was found. Three basic analysis methods are employed:
Content Analysis: In this form of analysis, the data itself is examined by employing pattern matching, hashing, statistical, lexical, or other forms of probability analysis.
Metadata: This is data that describes data; all relational databases store metadata that describes tables and column attributes.
Labels: Data elements are grouped with a tag that describes the data. This can be done at the same time the data is created, or tags can be added over time to provide additional references and information to describe the data. In many ways, labeling is similar to metadata but slightly less formal.
Credit Card Example:
When we find a number that looks like a credit card number, a standard method is to perform a LUHN check on the number itself. This is a simple numeric checksum used by credit card companies to verify whether a credit card number is valid. If the number we discover passes the LUHN check, then there is a very high probability that we have discovered a credit card number. Content analysis is a growing trend and one that is being used successfully in Data Loss Prevention (DLP) and web content analysis products.
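The LUHN check described above can be sketched as follows (the test value is the commonly cited LUHN example number):

```python
def luhn_check(number: str) -> bool:
    """Return True if the digit string passes the LUHN mod-10 checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    # Double every second digit from the right; subtract 9 if it exceeds 9.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

print(luhn_check("79927398713"))  # True  (valid checksum)
print(luhn_check("79927398714"))  # False (corrupted final digit)
```

A discovery tool would typically combine a pattern match (a 15- or 16-digit run) with this checksum to keep false positives low.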
Credit Card Example:
We would examine column attributes to determine whether the name of the column, or its size and data type, looks like a credit card number. If a column is a 16-digit number, or its name is something like “Credit Card” or “CC#,” then there is a high probability of a match. The effectiveness of each product will vary depending on how well the analysis rules are implemented. This remains the most common analysis technique.
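Metadata analysis of this kind can be sketched against a database catalog. The sketch below uses an in-memory SQLite database with an illustrative schema, and flags columns whose names hint at card data:

```python
import re
import sqlite3

# Hypothetical customer database; a real discovery tool would scan the
# DBMS catalog using dedicated scan credentials.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER, name TEXT, cc_number TEXT)")

# Illustrative name-based rule; real products ship large rule libraries.
NAME_HINTS = re.compile(r"(credit|card|cc|pan)", re.IGNORECASE)

def find_card_columns(conn, table: str) -> list:
    """Metadata analysis: flag columns whose names suggest card data."""
    cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [name for _cid, name, _type, *_ in cols if NAME_HINTS.search(name)]

print(find_card_columns(db, "customers"))  # ['cc_number']
```

A fuller implementation would also inspect the declared type and length of each column, as described above, before sampling actual values for content analysis.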
Some relational database platforms provide mechanisms to create data labels. However, this method is more commonly used with flat files, becoming increasingly useful as more firms move to Indexed Sequential Access Method (ISAM) or quasi-relational data storage, such as Amazon's SimpleDB, to handle fast-growing datasets. This form of discovery is similar to a Google search: the greater the number of matching labels, the higher the probability of a match. Effectiveness is dependent on the consistent use of labels.
Table 2-4: Data Discovery Techniques
LUHN: The LUHN algorithm, also called the modulus 10 or mod 10 algorithm, is a simple checksum formula used to validate a variety of identification numbers, such as IMEI numbers, credit card numbers, and Canadian Social Insurance Numbers.
ISAM: A method for creating, maintaining, and manipulating indexes of key fields extracted from random data file records to achieve fast retrieval of required file records. IBM developed ISAM for mainframe computers.
Problems in Data Discovery:
The following problems can arise in data discovery:
Dashboards: When using dashboards, the following questions may arise:
- Is the data accurate on the dashboard?
- Is the analytical method accurate?
- Most importantly, can important business decisions be based on this information?
Users modify data and fields with no audit trail, and there is no way to recognize who changed what. This disconnect can lead to uneven insight and flawed decisions, increased administration costs, and, inevitably, numerous versions of the truth.
Security also poses issues with data discovery tools. IT staff usually have little or no control over these types of solutions, which means they cannot protect sensitive information. This can result in unencrypted data that is to be cached locally and viewed by or shared with unauthorized users.
Hidden Costs: A typical data discovery technique is to load all of the data into server RAM to take advantage of the fundamental input/output rate improvements over disk.
Poor Data Quality
Data visualization tools are only as good as the information that is fed into them. If organizations lack an enterprise-wide data governance policy, they might be relying on incomplete or inaccurate information to create their charts and dashboards.
Having a strong enterprise data governance policy will help to lower the risk of a data breach. This consists of defining rules and processes related to dashboard creation, distribution, ownership, and usage; creating restrictions on who can access what data; and making sure that employees follow their organization's data usage policies.
Data Discovery Challenges in the Cloud
There are several challenges with data discovery in the cloud:
Not all data stored in the cloud can be accessed easily. Sometimes customers do not have the necessary administrative rights to access their data on demand. Long-term data can be visible to the customer but not accessible to download in acceptable formats for offline use.
The lack of data access might require specific configurations for the data discovery process, which in turn might result in additional time and expense for the organization. Data access requirements and capabilities can also change during the data life cycle. Archiving, DR, and backup sets tend to offer less control and flexibility for the end user. Also, metadata such as indexes and labels might not be accessible.
The ability to have data available on-demand, across almost any platform and access mechanism, is an incredible advancement concerning end-user productivity and collaboration. However, at the same time, the security implications of this level of access confound both the enterprise and the CSP, challenging them to find ways to secure the data that users are accessing in real time, from many locations, across many platforms.
Not knowing with assurance where data is, where it is going, and where it will be at any given moment presents significant security concerns for enterprise data and for the confidentiality, integrity, and availability that the Cloud Security Professional is required to provide.
Maintenance and Preservation:
Ensure that preservation requirements are documented and supported by the Cloud provider as part of the Service Level Agreement (SLA).
If the time required for preservation exceeds what has been documented in the provider SLA, the data might be lost. Long-term preservation of data is possible and can be managed through an SLA with a provider as well. However, the issues of data granularity, access, and visibility all need to be considered when planning for data discovery against long-term stored datasets.
Data Classification
Data classification, as a part of the Information Life Cycle Management (ILM) process, can be defined as a tool for the categorization of data that enables the organization to find, use, and protect its data effectively.
Data classification is a process that is recommended for implementing data controls such as encryption and DLP. Data classification is also a requirement of specific regulations and standards, such as ISO 27001 and PCI-DSS.
Types of Data Classification
There are different reasons for implementing data classification, and therefore many different parameters and categories for classifying data. Some of the commonly used classification categories are:
- Business constraints or contractual
- Data type (structure, format)
- Jurisdiction (domicile, origin) and other legal constraints
- The obligation for retention and preservation
- Trust levels and source of origin
- Criticality, sensitivity, and value (to the organization or the third party)
The classification categories should match the data controls to be used.
When using encryption, data can be classified as “to encrypt” or “not to encrypt.” For DLP, other categories such as “limited sharing” and “internal use” might be required to classify the data correctly.
Classification and labeling are closely related. Data labeling commonly refers to tagging the data with additional information such as creator, department, and location. One of the labels can be the classification of the data according to a specific criterion, such as top secret, secret, or classified.
Therefore, classification is typically considered part of data labeling. Classification can be manual or automatic, based on policy rules.
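Automatic, policy-rule-based classification can be sketched as a set of rules evaluated against a data object's labels and metadata (the rule conditions and classification names here are illustrative, not drawn from any specific regulation):

```python
# Each rule pairs a predicate over the object's metadata with the
# classification label it implies; tag names are hypothetical.
RULES = [
    (lambda m: m.get("category") == "health", "sensitive"),
    (lambda m: m.get("jurisdiction") == "EU", "restricted-transfer"),
    (lambda m: m.get("department") == "finance", "internal-only"),
]

def classify(metadata: dict) -> list:
    """Return every classification whose policy rule matches; fall back
    to a default label when nothing applies."""
    labels = [label for rule, label in RULES if rule(metadata)]
    return labels or ["public"]

print(classify({"creator": "j.doe", "department": "finance"}))  # ['internal-only']
print(classify({"creator": "j.doe"}))                           # ['public']
```

The resulting labels can then drive downstream controls such as encryption, DLP policies, or retention rules.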
Challenges with the Cloud Data
Challenges in this area include:
- Classification Controls: This could be administrative (as guidelines for users who are creating the data), compensating, or preventive.
- Classification Data Transformation: Controls could be put in place to ensure that the relevant metadata or properties can survive data object format changes and cloud imports or exports.
- Data Creation: The CSP needs to make sure that proper security controls are in place so that whenever data is modified or created by anyone, they are forced to update or classify the data as part of the creation or modification process.
- Metadata: Classifications can be made based on the metadata that is attached to the file, such as location or owner. This metadata could be accessible to the classification process to make the proper decisions.
- Reclassification Consideration: Cloud applications should support a reclassification process based on the data life cycle. Sometimes the new classification of a data object means enabling new controls, such as encryption, retention, or disposal.
Privacy and Data Protection (P&DP) issues are often cited as a concern for cloud computing scenarios. The P&DP regulations affect not just those organizations whose personal data is processed in the cloud but also those organizations using cloud computing to process others' data and, indeed, those providing cloud services used to process that data.
The global economy is undergoing an information explosion; there has been enormous growth in the complexity and volume of global data services. Personal data is now a vital asset, and its privacy and protection have become essential to enabling the acceptance of cloud computing services.
The following sections describe the ways in which different countries and regions around the world are addressing the varied legal issues they face.
Global P&DP Laws in the United States
There is no single federal privacy law; privacy is recognized differently in individual states and specific circumstances. Notable U.S. privacy-related laws include:
- COPPA: Children’s Online Privacy Protection Act 1998
- ECPA and SCA: Electronic Communications Privacy Act and Stored Communications Act, both from 1986. These are older U.S. laws intended to limit wiretapping and government access to stored communications.
- FERPA: Family Educational Rights and Privacy Act, which protects student data
- HIPAA: Health Insurance Portability and Accountability Act, administered by the Department of Health and Human Services
- GLBA: Gramm-Leach-Bliley Act, a.k.a. the Financial Modernization Act of 1999, enforced by federal financial regulators including the Federal Deposit Insurance Corporation (FDIC)
- SOX: Sarbanes-Oxley Act, created in response to corporate accounting scandals such as Enron; applies to publicly traded companies
- Safe Harbor Program: Developed by the Department of Commerce and the EU; now discontinued and replaced by the EU-U.S. Privacy Shield
Global P&DP Laws in the European Union (EU)
Data Protection Directive 95/46/EC: This directive applies to both paper and electronic records but does not apply to purely household or personal activities, or to operations related to state security or public safety.
GDPR: The General Data Protection Regulation updates and replaces Directive 95/46/EC and includes:
- Access requests
- Establishment of the Data Protection Officer role
- Home state regulation
- Increased sanctions
- Transfers abroad
- The right to be forgotten
Global P&DP Laws in APEC
Asia-Pacific Economic Cooperation (APEC): The APEC Privacy Framework ensures the free flow of information and the open conduct of business within the region while protecting privacy.
Difference between Applicable Law and Jurisdiction:
For P&DP, it is particularly important to distinguish between the concepts of:
Applicable Law: This determines the legal regime applicable to a specific matter.
Jurisdiction: This typically determines the ability of a national court to decide a case or enforce a judgment or order.
Table 2-5: Applicable Law and Jurisdiction
Data discovery and classification solutions provide an adequate foundation for the effective application and governance of P&DP obligations.
Customer's Perspective:
The customer, in the role of data controller, has full responsibility for compliance with P&DP law obligations. The implementation of data discovery solutions together with data classification techniques therefore provides the customer with a sound basis for operatively specifying to the service provider the requirements to be fulfilled, for performing effective periodic audits according to the applicable P&DP laws, and for demonstrating due accountability to the competent privacy authorities.
Service Provider’s Perspective:
Service providers, in the role of data processor, have to implement, and be able to demonstrate that they have implemented, in a clear and objective way the rules and security measures to be applied when processing personal data on behalf of the controller. Data discovery solutions, jointly with data classification techniques, are thus an effective enabler of their ability to comply with the controller's P&DP instructions.
Additionally, the service provider will particularly benefit from this approach in the following cases:
- In its responsibility to operatively support the controller when a data subject exercises his or her rights, since the provider is required to give information about which data is processed or to implement actions on this data (e.g., correct or destroy the data)
- In its duty to detect, promptly report to the controller, and adequately manage personal data breaches, in accordance with the applicable P&DP obligations
- When the service provider has to support the controller in any of the P&DP obligations concerning the application of rules or prohibitions of personal data transfer through many countries.
- When the service provider involves sub-service providers, in order to clearly trace and operatively transfer to them the P&DP requirements applicable to the processing assigned by the service provider
Classification of data plays an essential part in compliance with the applicable Privacy and Data Protection (P&DP) laws and in the adequate control of the elements that feed the P&DP fulfillments. This means that not only the “nature” of the data should be traced with classification, but also its relationship to the P&DP context in which the data is processed.
The P&DP fulfillments, and especially the security measures required by these laws, can always be expressed in terms of at least the following set of primary entities:
The Purpose and Scope of Processing:
This represents the main element that affects the entire set of typical P&DP fulfillments.
Processing for accounting and administrative purposes requires fewer fulfillments compared with the processing of telephone or Internet traffic data for the purpose of mobile payment services, since the cluster of data processed (the subscriber's personal data, billing data, and the kinds of purchased items) assumes a more critical value for all the stakeholders involved. The P&DP laws consequently require more obligations and a higher level of protection.
Categories of Personal Data to be Processed:
Data category here means the type of data as identified by a P&DP law, which is usually entirely different from the “nature” of the data, that is, its intrinsic and objective value. In this sense, data categories include:
- Personal data
- Sensitive data (health, political belief, religious belief, etc.)
- Biometric data
- Telephone or Internet data
Categories of the Processing to be Performed:
From the perspective of the P&DP laws, processing means an operation or a set of combined operations that can be materially applied to data.
In the derivation of these, a secondary set of entities is relevant for P&DP fulfillments:
- Data location allowed: According to the applicable P&DP laws, there may be prohibitions or constraints to be observed, and these should be accurately reflected in the classification of data so that it can act as a driver in allowing or blocking the movement of data from one location to another.
- Categories of users allowed: The accessibility of data to specific categories of users is another essential feature of the P&DP laws.
For example, the role of backup operator should not be able to read any data in the system, even though that role needs to interact with all system data in order to back it up.
Data-Retention Period:
The majority of the categories of data processed for specific scopes and purposes must be retained for a determined period (and then erased or anonymized) according to the applicable P&DP laws.
For example, data-retention periods must be respected for access logs concerning the accesses made by the system administrator role, and retention periods must be respected for the details of profiles derived from the “online behavior” of Internet users for marketing purposes. Once the retention period has ended, the legal ground for retaining the data disappears, and therefore any additional processing or handling of the data becomes unlawful.
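Enforcement of retention periods can be sketched as a simple check against a retention schedule (the categories and periods shown are illustrative, not values mandated by any particular law):

```python
from datetime import date, timedelta

# Illustrative retention schedule; actual periods come from the
# applicable P&DP laws for each data category.
RETENTION_DAYS = {"admin_access_log": 180, "marketing_profile": 365}

def retention_expired(category: str, created: date, today: date) -> bool:
    """Return True once the retention period has ended; at that point
    the record must be erased or anonymized, since further processing
    becomes unlawful."""
    return today > created + timedelta(days=RETENTION_DAYS[category])

today = date(2024, 6, 1)
print(retention_expired("admin_access_log", date(2023, 1, 1), today))   # True
print(retention_expired("marketing_profile", date(2024, 3, 1), today))  # False
```

In practice such a check would run as a scheduled job that drives the erasure or anonymization workflow for each expired record.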
Security Measures to be Ensured
The type of security measures can vary widely depending on the purpose and data to be processed. Typically, they are expressed in terms of:
- Necessary security measures to ensure a minimum level of security regardless of the type of data, purpose, or processing
- Specific measures according to the type of data, purpose, or processing
- Measures identified as the output of a risk analysis process, to be operated by the controller or processor considering the risks of a specific context; these may be technical or operational and cannot be mitigated with the measures of the previous points
Accurate classification of the data with regard to security measures will provide the basis for any approach to control based on Data Leakage Prevention (DLP) and data-protection processes.
Data Breach Constraints
Several P&DP laws around the world already provide for specific obligations regarding data breaches. These obligations essentially require the following:
- Notify the competent Data Protection Authority (DPA) within tight time limits and, in specific cases set forth by law, also notify the data subjects
- Follow a specific process of Incident Management, including activation of measures aimed at limiting the damages to the concerned data subjects
- Maintain a secure archive concerning the data breaches that have occurred
Therefore, data classification that can take into account the operational requirements coming from the data breach constraints becomes essential, especially in the Cloud services context.
As a consequence of events such as a data breach, data might be left in a specific state that may require various necessary actions or a state where specific actions are prohibited. The precise identification of this status regarding data classification might be used to direct and oversee any further processing of the data according to the applicable laws.
These entities provide the primary inputs for data classification with regard to P&DP.
Table 2-6 – Main Input Entities for Data Classification for P&DP Purposes
With various requirements to be mapped out in privacy acts, a central role of the Cloud Security Professional is the mapping of those requirements to the actual security controls and processes in place within both the application and the cloud environment under the responsibility of the cloud provider. For large applications that span various jurisdictions and privacy acts, this will be of particular importance. The cloud customer and cloud provider will need to work together, through proper contractual or SLA requirements, to ensure compliance for both parties with the requirements of regulatory bodies or applicable privacy acts.
The efficient application of the defined controls for the protection of PII is generally affected by the cluster of providers or sub-providers involved in the operation of a specific cloud service; therefore, any attempt to provide guidelines for this can be made only at a general level.
Since the application of data-protection measures has the ultimate goal of fulfilling the P&DP laws applicable to the controller, any constraints arising from specific arrangements of a cloud service operation should be made clear by the service provider to avoid any consequences of unlawful personal data processing.
With servers located across numerous countries, it can be challenging to ensure the proper application of measures such as encryption of sensitive data on all systems.
Additionally, the service providers might benefit from making explicit reference to standardized frameworks of security controls expressly defined for Cloud services.
Cloud Controls Matrix (CCM)
The CCM is an essential and up-to-date security controls framework addressed to the Cloud community and its stakeholders. A fundamental strength of the CCM is its mapping and cross-referencing to widely accepted industry security standards, regulations, and controls frameworks such as ISO 27001/27002, ISACA's COBIT, and PCI DSS.
The CCM can be seen as an inventory of Cloud service security controls, arranged in the following separate security domains:
- Application and Interface Security
- Audit Assurance and Compliance
- Business Continuity Management and Operational Resilience
- Change Control and Configuration Management
- Data Security and Information Life Cycle Management
- Data Center Security
- Encryption and Key Management
- Governance and Risk Management
- Human Resources
- Identity and Access Management
- Infrastructure and Virtualization Security
- Interoperability and Portability
- Mobile Security
- Security Incident Management, E-Discovery, and Cloud Forensics
- Supply Chain Management, Transparency, and Accountability
- Threat and Vulnerability Management
Although all the CCM security controls can be considered applicable in a specific Cloud service context, from the privacy and data-protection perspective, some of them have greater relevance to P&DP fulfillment.
Therefore, the selection and implementation of controls for a specific Cloud service involving processing of personal data will be performed:
- Within the context of an information security management system: this requires at least the identification of legal requirements, risk analysis, design and implementation of security policies, and related assessments and reviews.
Data rights management is an extension of normal data protection, where additional controls and ACLs are placed on data sets, requiring additional permissions or conditions for access and use beyond simple, traditional security controls. This is encapsulated in the concept of Information Rights Management (IRM).
Information Rights Management (IRM) is not merely the use of standard encryption technologies to provide confidentiality for data; it also provides additional use cases and features:
IRM adds a layer of access controls on top of the document or data object. The Access Control List (ACL) determines who can open the document and what they can do with it, with granularity extending to copying, printing, saving, and similar options. Because the ACL is embedded into the actual file, IRM is agnostic to the location of the data, unlike other preventive controls that depend on file location. IRM protection travels with the file and provides continuous protection.
IRM is useful for securing sensitive organizational content such as financial documents. However, it is not limited to documents; IRM can be implemented to protect database columns, emails, web pages, and other data objects. It is also useful for establishing a baseline default information protection policy; for example, all documents created by a specific user, or at a specific location, receive a specific policy.
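The key idea above — that the access policy travels inside the protected object rather than living on a file server — can be sketched in a few lines. This is a minimal, product-agnostic illustration; the class name, field names, and permission strings are assumptions for the example, not any specific IRM vendor's format, and the real encryption layer is left opaque.

```python
# Minimal sketch of the IRM concept: the ACL is embedded in the protected
# object itself, so enforcement travels with the file wherever it is stored.
# Structure and names are illustrative, not a real IRM product's format.
from dataclasses import dataclass, field

@dataclass
class IRMDocument:
    ciphertext: bytes                         # encrypted payload (opaque here)
    acl: dict = field(default_factory=dict)   # user -> set of permitted actions

    def authorize(self, user: str, action: str) -> bool:
        """Check the embedded ACL before any decryption is attempted."""
        return action in self.acl.get(user, set())

doc = IRMDocument(
    ciphertext=b"\x8a\x17\x42",  # placeholder for encrypted content
    acl={"alice": {"view", "print"}, "bob": {"view"}},
)

assert doc.authorize("alice", "print")      # granular rights: alice may print
assert not doc.authorize("bob", "print")    # bob may view but not print
assert not doc.authorize("carol", "view")   # unknown users get nothing
```

Because the ACL is carried inside the object, copying the file to another location does not change the access decision — which is exactly the location-agnostic property described above.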
IRM Cloud Challenges
IRM requires that all users with data access have matching encryption keys. This means a robust identity infrastructure is necessary when implementing IRM, and that infrastructure must extend to partners, customers, and any other organizations with which data is shared.
- IRM requires that each resource be provisioned with an access policy, and that each client accessing the resource be provisioned with an account and keys. Provisioning must be done securely and efficiently for the implementation to be successful. Automating the provisioning of IRM resource access policies can help achieve that goal; automated policy provisioning can be based on file location, the origin of the document, or keywords.
- Access to resources can be granted on a per-user basis or according to user role using an RBAC model. Provisioning of users and roles must be integrated into IRM policies. Since in IRM most classification is the user's responsibility or is based on automated policy, implementing the right RBAC policy is essential.
- Identity infrastructure can be implemented by creating a single location where users are created and authenticated, or by creating federation and trust between different repositories of user identities in different systems. Organizations should carefully consider the most appropriate method based on the security requirements of the data.
- Many IRM implementations will force end users to install a local IRM agent, either for key storage or for authenticating and retrieving the IRM content. This requirement might limit certain implementations involving external users and must be considered as part of the architecture planning before deployment.
- When reading IRM-protected files, the reader software must be IRM-aware. Microsoft and Adobe products in their latest versions have high-quality IRM support, but other readers might encounter compatibility issues and should be tested before deployment.
- The challenges of IRM compatibility across operating systems and document readers increase when the data needs to be read on mobile devices. The combination of mobile platforms and IRM must also be tested carefully.
- IRM can integrate with other security controls, such as DLP and document discovery tools, adding extra benefits.
Note: Data rights objectives such as provisioning, users and roles, and role-based access are covered in the IRM Cloud Challenges section.
RBAC Model: Role-Based Access Control (RBAC) is an approach to restricting system access to authorized users.
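The RBAC approach mentioned above can be sketched with two simple mappings: users are assigned roles, and roles (not users) carry permissions. The role and permission names below are illustrative assumptions for the example.

```python
# Minimal RBAC sketch: permissions attach to roles, users are assigned roles.
# Role names and permission strings here are illustrative only.
ROLE_PERMISSIONS = {
    "viewer": {"document:read"},
    "editor": {"document:read", "document:write"},
    "admin":  {"document:read", "document:write", "irm:set_policy"},
}

USER_ROLES = {
    "alice": {"editor"},
    "bob":   {"viewer"},
}

def has_permission(user: str, permission: str) -> bool:
    """A user holds a permission if any of their roles grants it."""
    return any(
        permission in ROLE_PERMISSIONS.get(role, set())
        for role in USER_ROLES.get(user, set())
    )

assert has_permission("alice", "document:write")   # editors may write
assert not has_permission("bob", "document:write") # viewers may not
```

Centralizing permissions on roles is what makes provisioning manageable: adding a user means assigning roles, not enumerating individual rights.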
Instead of focusing on specific software technologies or implementations for IRM, we will focus on the common attributes and features of those tool sets and what they can provide for IRM and security:
- Auditing: IRM technologies allow for robust auditing of who has viewed information and provide proof of when and where they accessed the data.
- Expiration: IRM technologies allow for the expiration of access to data.
- Policy Control: IRM technologies allow an organization to have very granular and detailed control over how their data is accessed and used.
- Protection: With the implementation of IRM technologies and controls, any information under their protection is always protected.
- Support for Applications and Formats: Most IRM technologies support a range of data formats and integration with application packages commonly used within organizations, such as e-mail and various office suites.
Data protection policies should be active and ensure compliance with regulatory or corporate compliance requirements; here, the concepts of retention, deletion, and archiving are the most important. While other data policies and practices focus on the security and production usage of data, these three concepts typically fulfill regulatory obligations, which can come in the form of legal requirements or mandates from certification or industry oversight bodies.
A data-retention policy is established within an organization as a protocol for saving information for regulatory or operational compliance needs. The goals of a data-retention policy are to save the most crucial information for reference or future use, to organize information so it can be searched and accessed later, and to dispose of information that is no longer needed. The policy balances legal, regulatory, and business data archival requirements against data storage costs, complexity, and other considerations.
A good data-retention policy should define retention periods, data formats, data security, and data-retrieval procedures for the enterprise. A data-retention policy for Cloud services must contain the following components:
Data classification is based on business usage, compliance requirements, locations, and ownership — in other words, its "value." It is also used to decide the proper retention procedures for the enterprise.
Data mapping is the process of mapping all relevant data to understand data formats, data locations (databases, network drives, object or volume storage), data types (structured or unstructured), and file types.
For each data category, the data-retention procedures must be followed based on the proper data-retention policy that governs the data type. How long the data is to be saved, where (physical location and jurisdiction), and how (which technology and format) must all be spelled out in the policy and implemented through the procedure. The procedure must also include backup options, retrieval requirements, and restoration procedures, as required and necessary for the data types being managed.
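A retention procedure of the kind described — per-category retention period, jurisdiction, and format, with a decision on whether a record is still within its retention window — might be sketched as follows. The categories, periods, and jurisdictions are illustrative assumptions, not values mandated by any specific regulation.

```python
# Sketch of a per-category retention decision. Policy values are illustrative
# placeholders, not actual regulatory retention periods.
from datetime import date

RETENTION_POLICY = {
    # category: retention period (days), storage jurisdiction, storage format
    "financial_transactions": {"retain_days": 7 * 365, "jurisdiction": "EU", "format": "PDF/A"},
    "access_logs":            {"retain_days": 365,     "jurisdiction": "EU", "format": "JSON"},
}

def disposition(category: str, created: date, today: date) -> str:
    """Return 'retain' while within the retention window, else 'dispose'."""
    policy = RETENTION_POLICY[category]
    age_days = (today - created).days
    return "retain" if age_days < policy["retain_days"] else "dispose"

assert disposition("access_logs", date(2023, 1, 1), date(2023, 6, 1)) == "retain"
assert disposition("access_logs", date(2022, 1, 1), date(2023, 6, 1)) == "dispose"
```

Encoding the policy as data rather than code makes it auditable and easy to update when a regulation or business requirement changes.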
Monitoring and Maintenance
This is a procedure for ensuring that the entire process is working, including a periodic review of the policy and its requirements to capture any changes.
Legislation, Regulation, and Standard Requirements
Data-retention considerations are entirely dependent on the data type and the required compliance regimes associated with it.
According to the Basel II Accords for Financial Data:
“The retention period for financial transactions must be between three to seven years, while according to the PCI-DSS version 3.0 Requirement 10, complete access to network resources and cardholder data and credit card transaction data must be saved for at least a year with at least three months available online.”
PCI-DSS version 3.0 Requirement 10: Clarified the intent and scope of daily log reviews.
Basel II: Basel II is the second of the Basel Accords offering recommendations on banking laws and regulations issued by the Basel Committee on Banking Supervision.
The primary objective of data-disposal procedures is the safe disposal of data once it is no longer needed. Failure to do so may result in data breaches and compliance failures. Safe disposal procedures are designed to ensure that there are no files, pointers, or data remnants left behind in a system that could be used to restore the original data.
A policy is required for the following reasons:
1. Technical and Business Requirements:
Business policy may require the safe disposal of data. Processes such as encryption may require the safe disposal of the clear-text data after the encrypted copy is created.
2. Legislation or Regulation:
Certain regulations and laws require specific degrees of safe disposal of records.
In a Cloud environment, restoring deleted data is not an easy job for an attacker, because Cloud-based data is distributed and usually stored in different physical locations with unique pointers; achieving any level of physical access to the media is a challenge.
On the other hand, it is still a real attack vector that you must consider when evaluating the business requirements for data disposal.
To ensure the deletion of electronic records, the following options are available:
Degaussing: Using strong magnets to scramble data on magnetic media such as tapes or hard drives.
Overwriting: Writing random data over the actual data. The more times the overwriting process occurs, the more thorough the destruction of the data is considered to be.
Physical Destruction: Physically destroying the media by shredding, incineration or other means.
Encryption: Using an encryption method to rewrite the data in an encrypted format, making it unreadable without the encryption key.
Since the first three data-disposal options are not entirely suitable for Cloud computing, the only remaining suitable method is the encryption of data. The process of encrypting the data that is to be disposed of is known as crypto-shredding or digital shredding.
Crypto-shredding is the process of deliberately destroying the encryption keys that were used to encrypt the data initially. Since the data is encrypted with the keys, the result is that the data is made unreadable (at least until the encryption protocol used can be broken or brute-forced by an attacker).
To perform a proper crypto-shredding, consider the following points:
- The data must be completely encrypted without leaving any clear text remaining.
- The technique should ensure that the encryption keys are entirely unrecoverable. This can be hard to achieve if an external Cloud provider or other third party manages the keys.
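The crypto-shredding idea can be illustrated with a toy example: data is encrypted under a key, and "deletion" is performed by destroying every copy of that key. Note this is a deliberately simplified sketch — the hash-based keystream below is NOT production cryptography (an authenticated cipher such as AES-GCM would be used in practice); it only demonstrates that without the key the ciphertext alone is unreadable.

```python
# Toy illustration of crypto-shredding (NOT production crypto).
import hashlib
import secrets

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """XOR data with a keystream derived from the key (illustrative only;
    real systems use a vetted cipher such as AES-GCM)."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

key = secrets.token_bytes(32)
plaintext = b"cardholder record to be disposed of"
ciphertext = keystream_xor(key, plaintext)

# While the key exists, the data is fully recoverable:
assert keystream_xor(key, ciphertext) == plaintext

key = None  # crypto-shredding: destroy every copy of the key
# With no key remaining, the ciphertext alone is computationally unreadable,
# which is the effective "deletion" of the data.
```

The two bullet points above map directly onto this sketch: the plaintext must never persist unencrypted, and the `key = None` step must be guaranteed for every copy of the key, which is hard when a third party manages key material.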
Data archiving is the process of identifying and moving inactive data out of the existing Cloud environment and into specialized long-term archival storage systems. Moving inactive data out of the Cloud environment optimizes the performance of the resources required there. Specialized archival systems store information more efficiently and provide for retrieval when needed. An archiving policy for the Cloud must contain the following elements:
Ability to Perform eDiscovery and Granular Retrieval:
Archived data can be subject to retrieval according to specified parameters such as author, date, subject, etc. The archiving platform must provide the ability to perform eDiscovery on the data to determine which data must be retrieved.
Backup and Disaster Recovery Options:
All requirements for data backup and restoration must be specified and documented. It is essential to ensure that business continuity and disaster recovery plans are updated and aligned with whatever procedures are implemented.
Encryption:
Long-term data archiving with encryption can present a key-management challenge for the organization. The encryption policy should consider which media are used, the restoration options, and the threats that should be mitigated by the encryption. Poor key management could lead to the loss of the entire archive and therefore requires attention.
Data Format and Media Type:
The format of the data is a significant consideration because it might be stored for an extended period. Proprietary formats can change, leaving data in an unusable state, so selecting a suitable format is most important. The same consideration applies to media storage types.
Data Monitoring Procedure:
Data stored in the Cloud tends to be replicated and moved. To maintain data governance, all data access and movements must be tracked and logged to ensure that all security controls are appropriately applied throughout the data life cycle.
Data Restoration Procedure:
Data restoration testing must be performed periodically to ensure that the whole process works. Trial data restores should be made into an isolated environment to reduce risks, such as restoring an old virus or accidentally overwriting existing data.
Inactive data is data that does not "show up" in your file tree, also known as your data structure or directory. Although this data resides on your hard drive, it cannot be accessed by the operating system and therefore can no longer be maintained by it.
eDiscovery: Electronic discovery is the process by which electronic data is sought, located, secured, and searched with the intent of using it as evidence in a criminal or civil legal case.
Events can be defined as things that happen. Not all events are important, but many are, and being able to discern which events you need to pay attention to can be a challenge.
Event sources are monitored to provide the raw data on events that will be used to characterize the system being monitored. Event attributes identify the type of information and data associated with an event that you would want to capture for analysis. Depending on the number of events and attributes being tracked, a large volume of data will be produced. This data is stored and then analyzed to uncover patterns of activity that might identify vulnerabilities or threats existing in the system that have to be addressed.
Security Information and Event Management (SIEM)
SIEM systems are used to collect and analyze the data flow from several systems, allowing for the automation of this process.
The events that are essential and available for capture will vary depending on the particular Cloud service model employed: IaaS, PaaS, or SaaS.
IaaS Event Sources:
With an IaaS environment, the customer usually has the greatest control of, and access to, event and diagnostic data. Virtually all infrastructure-level logs are visible to the customer, along with detailed application logs. To maintain reasonable investigation capability, auditability, traceability, and accountability of data, it is suggested that you specify the required data access requirements in the Cloud Service Level Agreement (SLA) or contract with the CSP.
The following logs might be essential to examine at some point but might not be available by default:
- API access logs
- Billing records
- Hypervisor and host operating system logs
- Logs from DNS servers
- Management portal logs
- Network or Cloud provider perimeter network logs
- Packet captures
- Virtual machine monitor (VMM) logs
PaaS Event Sources:
With a PaaS environment, the user usually has control of, and access to, event and diagnostic data at the application level. Only a few infrastructure-level logs will be visible to the customer, along with detailed application logs. Because the applications being monitored are built and designed by the organization directly, the level of application data that can be extracted and monitored is up to the developers.
To maintain reasonable investigation auditability, traceability, and accountability of data, it is suggested that you work with the development team to understand the capabilities of the applications under development and to help design and implement monitoring regimes that will maximize the organization’s visibility into the applications and their data streams.
OWASP recommends that the following application events be logged:
- Authentication successes and failures
- Authorization (access control) failures
- Application errors and system events, such as syntax and runtime errors, etc.
- Application and related systems start-ups, logging initialization (starting, pausing or stopping) and shut-downs.
- Legal and other opt-ins, such as permissions for mobile phone capabilities, terms and conditions, etc.
- Output validation failures such as invalid data encoding and database recordset mismatch.
- Session management failures such as cookie session identification value modification.
- Use of higher-risk functionality such as network connections, addition or deletion of users, changes to privileges, assigning users to tokens, adding or deleting tokens, Data Event Logging, etc.
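One of the event types in the list above — authentication successes and failures — can be logged with Python's standard logging module. This is a hedged sketch: the logger name, message format, and field names are assumptions for illustration, not an OWASP-mandated format.

```python
# Sketch of logging one OWASP-recommended event type (authn success/failure)
# with the standard library. Logger name and message format are illustrative.
import logging

security_log = logging.getLogger("app.security")

def auth_event(username: str, source_ip: str, success: bool) -> str:
    """Build and log a single authentication event line; return it for reuse."""
    outcome = "auth_success" if success else "auth_failure"
    message = f"{outcome} user={username} src={source_ip}"
    # failures are logged at a higher severity so they stand out in review
    security_log.log(logging.INFO if success else logging.WARNING, message)
    return message

assert auth_event("alice", "203.0.113.7", False) == "auth_failure user=alice src=203.0.113.7"
```

Emitting both successes and failures (not just failures) is what lets later analysis spot patterns such as a success immediately following a burst of failures.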
SaaS Event Sources:
With a SaaS environment, the user usually has minimal control of, and access to, event and diagnostic data. Most of the infrastructure-level logs will not be visible to the customer, who will be limited to the high-level, application-generated logs located on a client endpoint. In order to maintain reasonable investigation capabilities, auditability, and traceability of data, it is suggested that you specify the required data access requirements in the Cloud SLA or contract with the CSP.
The following data sources play a significant role in event examination and documentation:
- Application server logs
- Billing records
- Database logs
- Guest operating system logs
- Host access logs
- Network captures
- SaaS portal logs
- Virtualization platform logs
- Web server logs
To perform efficient audits and examinations, the event log must contain as much of the relevant data for the processes being examined as possible. OWASP recommends that the following data event logging and event attributes be integrated into event data.
Who (Any user):
Log date and time (international format).
Application identifiers, such as name and version
Source address, including the user’s device/machine identifier, user’s IP address, mobile telephone number
Type of event
Event date and time. The event time stamp might differ from the time of logging.
Application address such as port number, workstation identity, and local device identifier
User identity (if authenticated), including the user database table primary key value, username, and license number
The severity of the event (0=emergency, 1=alert, …, 7=debug), (fatal, error, warning, info, debug, and trace)
Service name and protocol
Security-relevant event flag (if the logs contain non-security event data too)
Window/page/forms, such as entry point URL and HTTP method
Code location, including the script and module name
Table 2-7: Event Attributes
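A single event record carrying attributes like those listed above might be serialized as JSON for storage and later analysis. The field names and values here are illustrative assumptions; they follow the OWASP attribute list in spirit, not any mandated schema.

```python
# One event record with attributes drawn from the list above, as JSON.
# Field names and values are illustrative, not a standardized schema.
import json
from datetime import datetime, timezone

event = {
    "log_time":   datetime(2024, 5, 1, 12, 0, 0, tzinfo=timezone.utc).isoformat(),
    "event_time": datetime(2024, 5, 1, 11, 59, 58, tzinfo=timezone.utc).isoformat(),
    "application": {"name": "billing-portal", "version": "2.4.1"},
    "source": {"ip": "203.0.113.7", "device_id": "WKS-0042"},
    "event_type": "authn_fail",
    "user": {"id": 1001, "username": "alice"},
    "severity": 4,                       # 0=emergency ... 7=debug
    "service": {"name": "https", "port": 443},
    "security_relevant": True,           # flag for mixed security/non-security logs
    "entry_point": {"url": "/login", "http_method": "POST"},
    "code_location": "auth/views.py:login",
}

line = json.dumps(event, sort_keys=True)  # one line per event, machine-parseable
assert json.loads(line)["event_type"] == "authn_fail"
```

Note that the event time and the logging time are recorded separately, matching the point above that the event time stamp might differ from the time of logging.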
ISO 27037:2012 defines chain of custody as “a process to maintain and safeguard the integrity and original condition of the potential digital evidence.”
With the volume of logs created and collected for any application or system, there is a need for a method or technology to catalog those events and make them reportable or searchable. Without a system in place to synthesize and process the event data, the result could simply be a large amount of collected data that serves no meaningful or useful purpose and is not accessible for the fulfillment of regulatory and auditing requirements.
The most important technology used for this type of operation is a Security Information and Event Management (SIEM) system.
Security Information and Event Management (SIEM):
SIEM is a term for software products and services that merge Security Information Management (SIM) and Security Event Management (SEM). SIEM technology provides real-time analysis of security alerts generated by network hardware and applications.
Figure 2-4 – Security and Information Event Management
SIEM is sold as appliances, software, or managed services and is also used to log security data and generate reports for compliance purposes.
The security management segment that deals with real-time monitoring, correlation of events, notifications, and console views is called Security Event Management (SEM). The second part provides analysis, long-term storage, and reporting of log data; it is called Security Information Management (SIM).
SIEM systems will usually provide the following capabilities:
Alerting: The automated analysis of correlated events and production of alerts, to notify recipients of immediate issues. Alerting can be to a dashboard or sent through third-party channels such as email.
Correlation: Linking events with common attributes together into meaningful bundles. This technology provides the ability to perform a variety of correlation techniques to integrate different sources and turn data into useful information. Typically, correlation is a function of the Security Event Management portion of a full SIEM solution.
Compliance: Applications can be employed to automate the gathering of compliance data and produce reports that adapt to existing security, governance, and auditing processes.
Log aggregation: Log management aggregates data from numerous sources, including applications, databases, networks, security devices, and servers, providing the ability to consolidate monitored data and help avoid missing important events.
Dashboards: Tools that can take event data and turn it into informational charts to assist in seeing patterns or identifying activity that does not follow a standard pattern.
Forensic analysis: The ability to search across logs on different nodes and time periods based on specific criteria, without having to aggregate log information manually or search through thousands of logs by hand.
Retention: Employing long-term storage of historical data to facilitate the correlation of data over time and to provide the retention necessary for compliance requirements. Long-term log retention is essential in forensic investigations, since a network breach is unlikely to be discovered at the time it occurs.
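The correlation capability described above — linking events that share a common attribute and alerting when a pattern crosses a threshold — can be sketched in a few lines. The event shape, threshold, and rule (repeated authentication failures from one source) are illustrative assumptions, not any SIEM product's rule syntax.

```python
# Minimal sketch of SEM-style correlation: group events by a shared attribute
# (source IP) and flag sources whose failure count crosses a threshold.
# Event shape and threshold are illustrative.
from collections import Counter

events = [
    {"type": "auth_failure", "src": "203.0.113.7"},
    {"type": "auth_failure", "src": "203.0.113.7"},
    {"type": "auth_failure", "src": "203.0.113.7"},
    {"type": "auth_failure", "src": "198.51.100.9"},
    {"type": "auth_success", "src": "203.0.113.7"},
]

THRESHOLD = 3  # illustrative rule: 3+ failures from one source raises an alert

def correlate(events):
    """Return the list of sources whose failure count meets the threshold."""
    failures = Counter(e["src"] for e in events if e["type"] == "auth_failure")
    return [src for src, n in failures.items() if n >= THRESHOLD]

assert correlate(events) == ["203.0.113.7"]
```

A real SIEM applies many such rules across aggregated sources and time windows, but the core operation — turning individual event records into a meaningful bundle — is the same.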
To support continuous operations, the following principles have to be adopted as part of the security operations policies:
New Event Detection
The objective of auditing is to identify new information security events. Policies have to be created that define what a security event is and how to address it.
Adding New Rules
Rules are built to permit the detection of new events. Rules map expected values against log files to detect events. In a mode of continuous operation, rules have to be updated to address new risks.
Reduction of False Positives
The quality of continuous-operations audit logging depends on the ability to decrease the number of false positives over time in order to maintain operational effectiveness. This requires continuous improvement of the rule set in use.
Chain of custody is the protection and preservation of evidence from the time it is collected until the time it is presented in court. For evidence to be considered admissible in court, documentation should exist for its collection, condition, possession, transfer, and location, as well as for the access to it and the analysis performed on it, from acquisition through eventual final disposition. This concept is referred to as the "chain of custody" of evidence.
Creating a verifiable chain of custody for evidence within a Cloud-computing environment where there are several data centers spread across different jurisdictions can become challenging. Sometimes, the only way to provide for a chain of custody is to include this provision in the service contract and make sure that the Cloud provider will comply with requests relating to chain of custody issues.
Nonrepudiation is the ability to confirm the origin or authenticity of data to a high degree of certainty. This is usually accomplished through hashing and digital signatures, to ensure that data has not been changed from its original form. This concept directly complements chain of custody in ensuring the integrity and validity of data.
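The integrity half of that idea can be shown with a standard-library hash: any change to the data changes the digest, so a digest recorded at collection time detects later tampering. Note this sketch covers integrity only — true nonrepudiation additionally requires an asymmetric digital signature (e.g., RSA or Ed25519) over the digest, which needs a cryptography library and key management and is omitted here.

```python
# Integrity check underlying nonrepudiation: a digest recorded at collection
# time detects any later modification. A digital signature over this digest
# (not shown) is what binds it to a specific signer.
import hashlib

original = b"evidence item #17, collected 2024-05-01"
digest_at_collection = hashlib.sha256(original).hexdigest()

# Later, before presenting the evidence, recompute and compare:
assert hashlib.sha256(original).hexdigest() == digest_at_collection

# Any tampering, however small, produces a different digest:
tampered = original + b" (edited)"
assert hashlib.sha256(tampered).hexdigest() != digest_at_collection
```

Recording such digests at each custody transfer is one practical way to document the chain of custody described above.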
This chapter covered Cloud data security across a wide range of areas focused on the concepts, structures, principles, and standards used to monitor and secure assets, as well as the controls used to enforce various levels of availability, confidentiality, and integrity across IT services throughout the organization. Cloud Security Professionals need to use and apply standards to ensure that the systems under their protection are maintained and supported correctly.
Security specialists understand the different security frameworks and standards, adopt best practices leveraged by numerous methodologies, and understand how they may be used together to make systems stronger. Information security governance and risk management have enabled information technology to be used safely, responsibly, and securely in environments like never before. The ability to establish strong system protections based on standards and policy, and to assess the level and efficacy of that protection through auditing and monitoring, is vital to the success of Cloud computing security.
Figure 2-5: Mind Map of Cloud Data Security
a) Volume and block
b) Structured and object
c) Unstructured and ephemeral
d) Volume and object
b) Service Level Agreement(s)
d) Continuous monitoring
a) At the application using the database
b) On the instance(s) attached to the volume
c) In a key management system
d) Within the database
a) One who cannot be identified, directly or indirectly, in particular by reference to an identification number or other factors specific to his/her physical, physiological, mental, economic, cultural, or social identity
b) The natural or legal person, public authority, agency, or any other body which alone or jointly with others, determines the purposes and means of processing of personal data
c) A natural or legal person, public authority, agency, or any other body, which processes personal data on behalf of the customer.
d) None of the above
a) Persistent protection, dynamic policy control, automatic expiration, continuous audit trail, and support for existing authentication infrastructure
b) Persistent protection, static policy control, automatic expiration, continuous audit trail, and support for existing authentication infrastructure
c) Persistent protection, dynamic policy control, manual expiration, continuous audit trail, and support for existing authentication infrastructure
d) Persistent protection, dynamic policy control, automatic expiration, intermittent audit trail, and support for existing authentication infrastructure.
a) Physical destruction
a) Management, provisioning, and location
b) Function, location, and actors
c) Actors, policies, and procedures
d) Life cycle, function, and cost
a) Raw and block
b) Structured and unstructured
c) Unstructured and ephemeral
d) Tabular and object
a) On a user’s workstation
b) In the storage system
c) Near the gateway
d) On a VLAN
a) Metadata, labels, and content analysis
b) Metadata, structural analysis, and labels
c) Statistical analysis, labels, and content analysis
d) Bit splitting, labels, and content analysis
a) A set of regulatory requirements for cloud service providers
b) An inventory of cloud service security controls that are arranged into separate security domains
c) A set of software development life cycle requirements for cloud service providers
d) An inventory of cloud service security controls that are arranged into a hierarchy of security domains
a) Retention periods, data access methods, data security, and data retrieval procedures
b) Retention periods, data formats, data security, and data destruction procedures
c) Retention periods, data formats, data security, and data communication procedures
d) Retention periods, data formats, data security, and data retrieval procedures
a) Application logging, contract/authority maintenance, secure disposal, and business continuity preparation
b) Audit logging, contract/authority maintenance, secure usage, and incident response legal preparation
c) Audit logging, contract/authority maintenance, secure disposal, and incident response legal preparation
d) Transaction logging, contract/authority maintenance, secure disposal, and disaster recovery preparation
a) Billing records
b) Management plane logs
c) Network captures
d) Operating system logs
a) On the application server
b) On the user’s device
c) On the network boundary
d) Integrated with the database server
b) Policy control
c) Chain of custody
d) None of the above