
Wednesday, May 18, 2011

Cloud Forensics: Challenges and Possible Solutions

Introduction

The cloud computing paradigm consists of moving applications, services and data to virtual distributed infrastructures. It is further characterized by the provision of computing power and storage as a service, responding to varying client demands. In this paradigm, instead of deploying its own applications and infrastructure, a client relies on a cloud service provider to obtain such resources as services on demand. This dramatically improves the availability, reliability and scalability of applications and systems, while reducing the total cost of ownership. On the other hand, the client does not have direct access to the underlying information resources, having to implicitly trust the provider.

Digital information resources are commonly used to perpetrate illegal actions and crimes, which may leave behind relevant evidence. Digital forensics and investigation methods are employed to obtain such evidence and solve digital crimes. Forensic analysis of information resources may focus on different layers (e.g. file systems, network, volatile memory) to obtain the information necessary to elucidate a given malicious activity. For example, an investigator's needs may range from retrieving a single deleted file to reconstructing network activity and running processes. Several methods have been proposed to address different forensic purposes, all of them requiring physical access to the resources involved.

It is clearly impractical to track data, services and communications as they migrate between different underlying resources in a cloud computing environment. It may even be impossible to determine where a given piece of data is stored or which system originated a given network connection. Since it is practically impossible to physically access the virtual distributed data and communications of a cloud computing environment, regular forensics methods are not effective in such scenarios. Moreover, depending on implementation-specific characteristics, no adequate log generation mechanisms may be available, increasing the difficulty of analysing and auditing malicious activities in these systems.

Since it is impossible to obtain physical access to cloud computing resources, one viable approach is applying a digital forensics framework that acts as a middleware between cloud applications and the underlying computing and storage resources. It would capture relevant information on filesystem, network and operating system operations as they are requested by applications and processed by the cloud infrastructure. This information would be aggregated at a central repository for further analysis in case of future investigations. Being completely application agnostic and software based, this solution is compatible with current environments and may be easily deployed.

Digital Forensics

Digital forensics aims at unearthing evidence of the possibly malicious actions conducted on a digital resource and providing valuable data to the investigation. Notice that by digital evidence we mean any information stored, generated or transmitted by digital information resources, e.g. images, documents, network traffic and files. The different steps of a digital investigation process may be summarized as follows:

1) Preservation: The first step of any investigation process, digital or not, is to preserve evidence and subsequently collect it in conditions useful for further examination. Once an incident is identified, actions must be taken to ensure that the relevant affected digital resources remain unaltered, preserving the same state as immediately after the incident occurred. These actions may vary according to the type of digital sources and collection process.

2) Collection: This process usually involves copying the evidence to the system where it will actually be analysed. Observe that precautions must be taken not to modify the evidence data as it is copied, since that would affect the accuracy of the investigation or even render it ineffective. Collection is performed under two main scenarios, namely: live and post-mortem. The classical post-mortem collection process involves capturing evidence (mostly filesystem data) from a system that was previously taken offline (often after the incident occurred). In a live collection process, digital evidence is collected from a running system, which may possibly be actively serving client requests. This process aims at taking a snapshot of the system's current state, allowing the investigator to analyse items such as RAM contents and volatile operating system parameters.

3) Validation and Identification: The validation phase consists of ensuring that the acquired forensic evidence was not corrupted in the collection process, providing accurate data for the investigation. Validation is usually carried out using hashing techniques, which allow the investigator to efficiently identify modifications in the collected evidence by comparing its hashes with hashes of the original data. The same hashing techniques can be used for identification, a process that aims at assigning unique tags to the collected evidence, allowing the investigator to determine to which system or resource the evidence is related.
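To make the validation step concrete, here is a minimal sketch in Python (standard library only) of hash-based evidence validation. The file paths and the choice of SHA-256 are illustrative assumptions, not part of any particular tool.

import hashlib

def evidence_hash(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash an evidence file in chunks, so large disk images fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Validation: the collected copy is accepted only if its hash matches
# the hash taken from the original source (hypothetical paths).
original = evidence_hash("/mnt/source/disk.img")
collected = evidence_hash("/cases/0042/disk.img")
assert original == collected, "evidence was modified during collection"

The same digest doubles as an identification tag: since it uniquely identifies the evidence, it can also be used to tie a given copy back to its source system.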

Here we will focus on preservation, collection, validation and identification of evidence on cloud environments, since the other steps can be carried out with standard tools.

Network Forensics

Traditionally, digital forensics has been performed on filesystems and individual files. However, since most current information resources are connected to a network (be it local or the Internet) and many applications have moved to the web, much valuable evidence is lost when considering only locally stored or processed data. Network forensics methods have been developed to address this problem, providing means of analysing network traffic and identifying relevant evidence within it. It has also been observed that network forensics may help increase information security in networked systems. Network traffic may provide useful information on the activities conducted on a given networked information resource, help determine remote attackers' identities and techniques, and even contain information (such as documents and messages) that would not be retrieved from a local storage device.

Cloud Computing

The constant and rapid growth in the volume of data and communications in current networks and information systems raises the need for efficient, scalable data storage and processing techniques. In order to address this issue, a new paradigm commonly called cloud computing was introduced. In this section we present the main characteristics of current cloud computing models and widely adopted architectures.

The main characteristic of the cloud computing paradigm is the transparent decentralization of data processing and storage through clustered environments that seamlessly scale to fit the constantly increasing demand, offering high performance and achieving efficient response times. Cloud computing based applications and systems run on virtual environments that may be distributed across several different physical information resources, migrating over different resources in order to respond to client demand. Such virtual environments are ubiquitously accessible through networks and provide diverse applications as network services. This approach has several advantages over the traditional centralized client server architecture, since it eliminates single points of failure (improving availability and reliability) and provides seamless scalability for constantly growing demands.

Architectures and models for cloud computing address mainly two scenarios: massive data processing and providing services for end users. Frameworks such as MapReduce focus on storing and processing large volumes of data, acting as a back-end service for data intensive systems and providing information for end user applications. Such environments are not usually directly accessed by end users. Here we focus on frameworks such as Amazon EC2, which provide applications and full operating systems as services. These frameworks are commonly referred to as market-oriented cloud computing and leverage technologies such as virtual machines (VMs) and storage area networks (SANs) to provide end users with ubiquitous access to virtual systems, which may be physically hosted in dynamically changing locations.

A common architecture for market-oriented clouds is based mainly on VM and SAN technology and provides access to virtual systems hosted on dynamic physical resources. End users access a front-end interface that redirects them to their respective VMs, which are dynamically allocated to different physical systems depending on the Service Level Agreement (SLA). The VM data is stored in background SANs, which offer transparent access to data across different systems. A similar approach is taken to provide virtual cloud applications: a cloud service provider runs the applications it serves on VMs that dynamically migrate across its underlying infrastructure.


Challenges and Issues in Cloud Forensics

Although cloud computing has several benefits to end users, it poses new challenges to digital forensics as regular digital forensic techniques cannot be applied in such environments. In this section we point out the main issues in conducting digital forensics on cloud computing environments and the challenges that have yet to be addressed by current methods.

The collection process of a digital investigation requires physical access to the digital resource, since the compromised or potentially malicious operating system of a digital resource cannot be trusted to provide honest responses. However, in a cloud computing environment it is impossible to determine exactly where a VM (or application) was executed, as it dynamically migrates across physical systems. Notice that it is necessary to access implementation-specific metadata in order to determine where each VM (or application) is running and to correlate a VM (or application) to its user. Thus, an investigator cannot perform live collection and extract data such as operating system state or RAM contents.

Deeper issues are present in the storage back-end, which is usually composed of one or more storage area networks with distributed disk drives. Once again, it is impossible to determine where data pertaining to a specific application or VM is physically stored. Moreover, even if one can track down the exact disk array that stores such data, it is impractical to forensically reconstruct disk array data without accessing the controller and its potentially compromised metadata. This renders cloud filesystem forensics virtually impossible.

Network forensics is also severely affected by the inherently dynamic and massive nature of cloud computing environments. It is clearly infeasible to capture all traffic originated by and directed at a cloud environment. Furthermore, the dynamically changing allocation of VMs makes it impossible to track the exact origin of network traffic and activities inside the cloud environment.

A Framework for Cloud Forensics

Bearing in mind the various issues in cloud computing forensics, we describe a potential framework for digital forensics in cloud computing environments that addresses the main evidence collection requirements. The objective here is to establish some guidelines towards efficient cloud forensics analysis.

The main concept of this framework is to collect relevant evidence as it is generated by cloud based applications and aggregate it at a central system for further analysis. It addresses three aspects of digital forensics evidence collection: filesystem, network and operating system state. It is completely software based, being composed of an evidence broker middleware running on each physical resource and a central evidence repository. We consider that each evidence broker has previously registered a digital certificate with the evidence repository (which also acts as a CA), subsequently using this certificate to sign evidence data.

The evidence brokers run between the VMs or cloud applications and their underlying physical resources, intercepting any important filesystem, network or operating system operations. They can be implemented with minor modifications to the underlying infrastructure through techniques such as API hooking and loadable kernel modules. Once an operation considered to be relevant evidence is performed, the evidence broker captures data on this operation. It then computes a suitable hash (SHA-1 or the more recent SHA-2) of the captured data and creates an evidence ticket composed of the captured data, its hash, a time stamp and a description (specifying whether the evidence pertains to a filesystem, network or operating system event). The evidence ticket is then signed and sent to the evidence repository.
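The sketch below illustrates how a broker might assemble and sign such a ticket, in Python. For brevity, an HMAC with a per-broker key stands in for the certificate-based signature described above, and the field names are assumptions rather than a fixed wire format.

import hashlib, hmac, json, time

BROKER_KEY = b"per-broker secret provisioned at registration"  # placeholder

def make_ticket(captured_data: bytes, description: str) -> dict:
    """Build a signed evidence ticket: data, hash, time stamp, description."""
    ticket = {
        "data": captured_data.hex(),
        "hash": hashlib.sha256(captured_data).hexdigest(),
        "timestamp": time.time(),
        "description": description,  # "filesystem", "network" or "os"
    }
    payload = json.dumps(ticket, sort_keys=True).encode()
    ticket["signature"] = hmac.new(BROKER_KEY, payload, hashlib.sha256).hexdigest()
    return ticket

ticket = make_ticket(b"open('/etc/passwd', O_RDONLY) by pid 4242", "filesystem")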

Filesystem evidence is collected for each operation performed on a virtual filesystem hosted at the cloud environment. The captured data consists of a description of the operation and its parameters, which can be directly obtained from the filesystem system call. The actual file contents are not captured, since it would be impractical to transfer and store such a large volume of data.
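As an illustration of what such a filesystem record might look like, the following sketch serializes only the operation and its parameters, never the file contents. The interception itself (a hooked system call) is merely simulated here, and all names are hypothetical.

import json, time

def record_fs_operation(syscall: str, path: str, flags: str, pid: int) -> bytes:
    """Serialize a filesystem event; this becomes a ticket's captured data."""
    event = {
        "syscall": syscall,  # e.g. "open", "write", "unlink"
        "path": path,
        "flags": flags,
        "pid": pid,
        "time": time.time(),
    }
    return json.dumps(event, sort_keys=True).encode()

captured = record_fs_operation("open", "/var/www/upload.php", "O_WRONLY", 4242)
# 'captured' would then be wrapped in a signed ticket, as in make_ticket above.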

In order to collect network traffic, the evidence brokers use network intrusion detection techniques to identify attack signatures and patterns that may indicate relevant activities. The packets that correspond to a given signature or pattern are fully captured (including their payload) and sent in the evidence ticket.
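A toy illustration of the signature matching step is sketched below. A real broker would rely on a full NIDS engine with proper rule sets; the byte patterns here are made up for the example.

SIGNATURES = {
    b"' OR '1'='1": "sql-injection attempt",
    b"/etc/passwd": "path traversal probe",
}

def match_packet(payload: bytes):
    """Return the label of the first matching signature, or None."""
    for pattern, label in SIGNATURES.items():
        if pattern in payload:
            return label
    return None

packet = b"GET /index.php?id=' OR '1'='1 HTTP/1.1"
if match_packet(packet) is not None:
    # The full packet, payload included, would be wrapped in a network
    # evidence ticket and sent to the repository.
    pass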

Operating system information is constantly captured by the evidence brokers and sent to the evidence repository. Evidence brokers keep track of sensitive OS security alerts, processes, changes in user databases and predetermined log files. Any activity concerning these items is captured and sent to the evidence repository.

Upon receiving an evidence ticket, the evidence repository first verifies its authenticity, rejecting it if the verification fails. Otherwise, it verifies its integrity and stores the evidence in a central database if it was not altered. The hash is used as an index field that uniquely identifies the piece of evidence and allows fast searches. Moreover, the database record contains the time stamp and description of the event to which the evidence is related, allowing analysis mechanisms to construct timelines. The evidence repository must be hosted on infrastructure independent from the cloud environment in order to ensure the preservation and reliability of evidence.
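The repository side could be sketched as follows, mirroring the HMAC stand-in used in the broker sketch above. The SQLite schema and field names are assumptions chosen for the example.

import hashlib, hmac, json, sqlite3

BROKER_KEY = b"per-broker secret provisioned at registration"  # placeholder

db = sqlite3.connect("evidence.db")
db.execute("""CREATE TABLE IF NOT EXISTS evidence (
    hash TEXT PRIMARY KEY, data BLOB, timestamp REAL, description TEXT)""")

def store_ticket(ticket: dict) -> bool:
    """Verify authenticity and integrity, then store keyed by the hash."""
    signature = ticket.pop("signature")
    payload = json.dumps(ticket, sort_keys=True).encode()
    expected = hmac.new(BROKER_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return False  # authenticity check failed: reject the ticket
    data = bytes.fromhex(ticket["data"])
    if hashlib.sha256(data).hexdigest() != ticket["hash"]:
        return False  # integrity check failed: the evidence was altered
    db.execute("INSERT OR IGNORE INTO evidence VALUES (?, ?, ?, ?)",
               (ticket["hash"], data, ticket["timestamp"], ticket["description"]))
    db.commit()
    return True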

Conclusion

The growing adoption of cloud computing based applications and services has rendered current digital forensics methods ineffective. The virtual, decentralized nature of cloud computing resources makes it impossible to apply regular investigation and forensic techniques, raising the need for new methodologies adapted to this paradigm. The digital forensics and investigation framework for cloud environments described in this article aims at collecting and aggregating the necessary forensic data as it is generated by the applications and underlying infrastructure. This framework could be implemented entirely in software, with minor alterations to the underlying infrastructure and operating systems. Furthermore, being application agnostic, it may be readily used with current applications. However, it has not been implemented, and it is still waiting for the adventurous, talented programmers out there who would take on the challenge.
