Technology Final Report
Secure/Resilient Systems and
Data Dissemination/Provenance
September 2017
Prepared for The Northrop Grumman Cyber Research Consortium
As part of IS Sector Investment Program
Prepared by Bharat Bhargava, CERIAS, Purdue University
Table of Contents
1 Executive Summary
1.1 Statement of Problem
1.2 Current State of Technology
1.3 Proposed Solution
1.4 Technical Activities, Progress, Findings and Accomplishments
1.5 Distinctive Attributes, Advantages and Discriminators
1.6 Tangible Assets Created by Project
1.7 Outreach Activities and Conferences
1.8 Intellectual Property Accomplishments
2 General Comments and Suggestions for Next Year
List of Figures

Figure 1 High-level view of proposed resiliency framework
Figure 2 Service acceptance test
Figure 3 View of space and time of MTD-based resiliency solution
Figure 4 Moving target defense application example
Figure 5 High-level resiliency framework architecture
Figure 6 System states of the framework
Figure 7 Data d1 leakage from Service X to Service Y
Figure 8 Data Sensitivity Probability Functions
Figure 9 Encrypted search over database of active bundles (by Leon Li, NG "WAXEDPRUNE" project)
Figure 10 Experiment Setup for Moving Target Defense (MTD)
Figure 11 EHR dissemination in cloud (created by Dr. Leon Li, NGC)
Figure 12 AB performance overhead with browser's crypto capabilities on/off
Figure 13 Encrypted Search over Encrypted Database
List of Tables

Table 1 Executive Summary
Table 2 Operations supported by different crypto systems
Table 3 Moving Target Defense (MTD) Measurements
Table 4 Encrypted Database of Active Bundles, Table 'EHR_DB'
1 Executive Summary
Title: Secure/Resilient Systems and Data Dissemination/Provenance
Author(s): Bharat Bhargava
Principal Investigator: Bharat Bhargava
1.1 Statement of Problem

The volume of information and real-time requirements have increased due to the advent of multiple input points such as emails, texts, voice, and tweets. These all flow into government agencies such as the US State Department for dissemination to many stakeholders. The security of classified information (cyber data, user data, attack event data) must be ensured so that it can be identified as classified (secret) and disseminated, based on access privileges, to the right user in a specific location on a specific device. For forensics/provenance, the identity of all who have accessed, updated, or disseminated the sensitive cyber data, including attack event data, must be determined. There is a need to build systems capable of collecting, analyzing, and reacting to dynamic cyber events across all domains, while also ensuring that cyber threats do not propagate across security domain boundaries and compromise the operation of the system.
Solutions that develop a science of cyber security applicable to all systems, infrastructure, and applications are needed. Current resilience schemes based on replication increase the number of ways an attacker can exploit or penetrate the systems. It is critical to design a vertical resiliency solution from the application layer down to the physical infrastructure, in which protection against attacks is integrated across all layers of the system (i.e., application, runtime, network) at all times, allowing the system to start secure, stay secure, and return secure+ (i.e., return with greater security than before) [13] after performing its function.
1.2 Current State of Technology
Current industry-standard cloud systems such as Amazon EC2 provide coarse-grain monitoring capabilities (e.g., CloudWatch) for various performance parameters of services deployed in the cloud. Although such monitors are useful for handling issues such as load distribution and elasticity, they do not provide information regarding potentially malicious activity in the domain. Log management and analysis tools such as Splunk [1], Graylog [2], and Kibana [3] provide capabilities to store, search, and analyze big data gathered from various types of logs on enterprise systems, enabling organizations to detect security threats through examination by system administrators. Such tools mostly require human intelligence for the detection of threats, and need to be complemented with automated analysis and accurate threat detection capabilities to respond quickly to possibly malicious activity in the enterprise and to provide increased resiliency through automation of response actions. In addition, Splunk is expensive.
There are well-established moving target defense (MTD) solutions designed to combat specific threats, but they are limited when exploits go beyond their boundaries. For instance, application-level redundancy and replication schemes prevent exploits that target the application code base, but fail against code injection attacks that target runtime execution (e.g., buffer and heap overflows) and the control flow of the application.
Instruction set randomization [51], address space randomization [4], runtime randomization [5], and system call randomization [6] have been used to effectively combat system-level (i.e., return-oriented/code injection) attacks. System-level diversification and randomization are considered mature and are tightly integrated into some operating systems. Most of these defensive security mechanisms (i.e., instruction/memory address randomizations) are effective for their targets; however, modern sophisticated attacks require defensive solutions that are deeply integrated into the architecture, from the application level down to the infrastructure, simultaneously and at all times.
Several general approaches have been proposed for controlling access to shared data and protecting its privacy. DataSafe is a software-hardware architecture that supports data confidentiality throughout the data lifecycle [7]. It is based on additional hardware and uses a trusted hypervisor to enforce policies, track data flow, and prevent data leakage. Applications running on the host are not required to be aware of DataSafe and can operate unmodified and access data transparently. Hosts without DataSafe can only access encrypted data, and the system is unable to track data disclosed to non-DataSafe hosts. The use of a special architecture limits the solution to well-known hosts that already have the required setup. It is not practical to assume that all hosts will have the required hardware and software components in a cross-domain service environment.

A privacy-preserving information brokering (PPIB) system has been proposed for secure information access and sharing via an overlay network of brokers, coordinators, and a central authority (CA) [8]. The approach does not consider the heterogeneity of components, such as different security levels of clients' browsers, different user authentication schemes, and trust levels of services. The use of a trusted third party (TTP) creates a single point of trust and failure.
Other solutions address secure data dissemination in untrusted environments. Pearson et al. present a case study of the EnCoRe project, which uses sticky policies to manage the privacy of shared data across different domains [9]. In the EnCoRe project, the sticky policies are enforced by a TTP and allow tracking of data dissemination, which makes the approach prone to TTP-related issues. The sticky policies are also vulnerable to attacks from malicious recipients.
1.3 Proposed Solution
We propose an approach for enterprise system and data resiliency that is capable of dynamically adapting to attack and failure conditions through performance/cost-aware process and data replication, data provenance tracking, and automated software-based monitoring and reconfiguration of cloud processes (see Figure 1). The main components of the proposed solution and the challenges involved in their implementation are described below.
Figure 1 High-level view of proposed resiliency framework
1.3.1 Software-Defined Agility & Adaptability
Adaptability to adverse situations and restoration of services are significant for high performance and security in a distributed environment. Changes in both service context and user context can affect service compositions, requiring dynamic reconfiguration. Changes in user context can result in updated priorities, such as trading accuracy for shorter response time in an emergency, as well as updated constraints, such as requiring the trust levels of all services in a composition to be higher than a particular threshold in a critical mission; changes in service context can result in failures requiring the restart of a whole service composition. Advances in virtualization have enabled rapid provisioning of resources, tools, and techniques to build agile systems that provide adaptability to changing runtime conditions. In this project, we will build upon our previous work in adaptive network computing [10] and end-to-end security in SOA [11], and on advances in software-defined networking (SDN), to create a dynamically reconfigurable processing environment that can incorporate a variety of cyber defense tools and techniques. Our enterprise resiliency solution is based on two main industry-standard components: Nova, the OpenStack [12] cloud management component, which provides virtual machines on demand; and Neutron, the Software Defined Networks (SDN) solution, which provides networking as a service and runs on top of OpenStack.
The solution that we developed for monitoring cloud processes and dynamic reconfiguration of service compositions, described in [10], involved a distributed set of monitors in every service domain for tracking service/domain-level performance and security parameters, and a central monitor to keep track of the health of various cloud services. Even though the solution enables dynamic reconfiguration of entire service compositions in the cloud, it requires replication, registration, and tracking of services at multiple sites, which could have performance and cost implications for the enterprise. To overcome these challenges, the proposed framework utilizes live monitoring of cloud resources to dynamically detect deviations from normal service behavior and integrity violations, and self-heals by reconfiguring service compositions through software-defined networking of automatically migrated service instances. A component of this software-defined agility and adaptability solution is live monitoring of services, as described below.
1.3.1.1 Live Monitoring
Cyber-resiliency is the ability of a system to continue degraded operations, self-heal, or deal with the present situation when attacked [13]. We may need to shut down less critical computations and communications and allow for weaker consistency, as long as the mission requirements are satisfied. For this we need to measure the assurance level (integrity/accuracy/trust) of the system from Quality of Service (QoS) parameters such as response time, throughput, packet loss, delays, consistency, acceptance test success, etc.
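As a minimal sketch of deriving such an assurance level, the snippet below normalizes each measured QoS parameter against its SLA target and combines them with mission-specific weights. The function name, weights, and targets are illustrative assumptions, not part of the report's framework.

```python
# Illustrative sketch (names and numbers are ours, not from the report):
# derive a single 0..1 assurance score from measured QoS parameters by
# normalizing each "lower is better" metric against its SLA target and
# combining the ratios with mission-specific weights.

def assurance_level(metrics, targets, weights):
    """Return a 0..1 assurance score; 1.0 means every QoS target is met."""
    score = 0.0
    for name, weight in weights.items():
        observed, target = metrics[name], targets[name]
        # Full credit when observed <= target, degrading proportionally after.
        ratio = min(1.0, target / observed) if observed > 0 else 1.0
        score += weight * ratio
    return score / sum(weights.values())

metrics = {"response_time_ms": 250, "packet_loss_pct": 0.5}
targets = {"response_time_ms": 200, "packet_loss_pct": 1.0}
weights = {"response_time_ms": 0.7, "packet_loss_pct": 0.3}
level = assurance_level(metrics, targets, weights)  # 0.7*0.8 + 0.3*1.0 = 0.86
```

A real deployment would also fold in security observations (acceptance test success, trust levels) and treat "higher is better" metrics such as throughput separately.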
To ensure the enforcement of SLAs and provide high security assurance in enterprise cloud computing, a generic monitoring framework needs to be developed. The challenges involved in effective monitoring and analysis of service/domain behavior include the following:

 Identification of significant metrics, such as response time, CPU usage, memory usage, etc., for service performance and behavior evaluation
 Development of models for identifying deviations from performance goals (e.g., achieving a total response time below a specific threshold) and security goals (e.g., having service trust levels above a certain threshold)
 Design and development of adaptable service configurations and live migration solutions for increased resilience and availability
Development of effective models for detection of anomalies in a service domain relies on careful selection of the performance and security parameters to be integrated into the models. Model parameters should be easy to obtain and representative of the performance and security characteristics of various services running on different platforms. We plan to investigate and utilize the following monitoring tools, which provide integration with OpenStack, in order to gather system usage/resiliency parameters in real time [14]:
1. Ceilometer [15]: Provides a framework to meter and collect infrastructure metrics such as CPU, network, and storage utilization. Alarms can be set to fire when a metric crosses a predefined threshold, and alarm information can be sent to external servers.
2. Monasca [16]: Provides a large framework for various aspects of monitoring, including alarms, statistics, and measurements for all OpenStack components. Tenants can define what to measure, what statistics to collect, how to trigger alarms, and the notification method.
3. Heat [17]: Provides an orchestration engine to launch multiple composite cloud applications based on templates in the form of text files that can be treated like code, enabling actions such as autoscaling based on alarms received from Ceilometer.
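The threshold-alarm pattern these tools share can be sketched as follows. The class and callback names are illustrative and not the actual Ceilometer or Monasca API; the point is that an alarm fires once on a threshold crossing and can notify an external reconfiguration component.

```python
# Minimal sketch of the Ceilometer-style alarm pattern described above:
# a metric stream is compared against a predefined threshold, and crossing
# it fires a callback (e.g., notifying an external server). Names here are
# illustrative, not the actual Ceilometer/Monasca API.

class ThresholdAlarm:
    def __init__(self, metric, threshold, on_alarm):
        self.metric, self.threshold, self.on_alarm = metric, threshold, on_alarm
        self.active = False

    def observe(self, value):
        crossed = value > self.threshold
        if crossed and not self.active:      # fire only on the transition
            self.active = True
            self.on_alarm(self.metric, value)
        elif not crossed:
            self.active = False              # re-arm once metric recovers

events = []
alarm = ThresholdAlarm("cpu_util", 80.0, lambda m, v: events.append((m, v)))
for sample in (40.0, 85.0, 90.0, 60.0):
    alarm.observe(sample)
# the alarm fires once, at the 85.0 sample, not again at 90.0
```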
As a further improvement for dynamic service orchestration and self-healing, we plan to investigate models based on a graceful degradation approach to service composition, which replace services that do not pass acceptance tests (see Figure 2), based on user-specified or context-based policies, with ones that are more likely to pass the tests at the expense of decreased performance.
Figure 2 Service acceptance test
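The graceful-degradation selection above can be sketched as trying candidate services in order of decreasing performance and keeping the first whose output passes the acceptance test. All names below are illustrative, not part of the actual framework.

```python
# Sketch of graceful degradation via acceptance tests: candidates are
# ordered fastest-first; the first result that passes the acceptance test
# wins, trading performance for a higher chance of passing. Illustrative
# names, not the report's implementation.

def compose(candidates, acceptance_test, request):
    """candidates: list of callables ordered by decreasing performance."""
    for service in candidates:
        result = service(request)
        if acceptance_test(result):
            return result
    raise RuntimeError("no candidate passed the acceptance test")

fast_but_flaky = lambda req: None                 # fails the acceptance test
slow_but_reliable = lambda req: {"answer": req * 2}
result = compose([fast_but_flaky, slow_but_reliable],
                 lambda r: r is not None, 21)     # falls back to the slow one
```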
1.3.2 Moving Target Defense for Resiliency/Self-healing
The traditional defensive security strategy for distributed systems is to prevent attackers from gaining control of the system using known techniques such as firewalls, redundancy, replication, and encryption. However, given sufficient time and resources, all these methods can be defeated, especially when dealing with sophisticated attacks from advanced adversaries that leverage zero-day exploits. This highlights the need for more resilient, agile, and adaptable solutions to protect systems. MTD is a component of the NGC project Cyber Resilient System [13]. Sunil Lingayat of NGC has taken interest and connected us with other NGC researchers working in Dayton.
Our proposed Moving Target Defense (MTD) [18, 19] attack-resilient virtualization-based framework is a defensive strategy that aims to reduce the need to continuously fight attacks by worsening the gain-loss balance perceived by attackers. The framework narrows a node's window of exposure to such attacks, which increases the cost of attacks on a system and lowers both the likelihood of success and the perceived benefit of compromising it. The reduction in the vulnerability window of nodes is mainly achieved through three steps:

1. Partitioning the runtime execution of nodes into time intervals
2. Allowing nodes to run only for a predefined lifespan (as low as a minute) on heterogeneous platforms (i.e., different OSs)
3. Proactively monitoring their runtime below the OS
The main idea of this approach is to allow nodes to run on a given computing platform (i.e., hardware, hypervisor, and OS) for a controlled period of time, chosen in such a manner that successful ongoing attacks become ineffective, as suggested in [20, 21, 22, 23]. We accomplish such control by allowing nodes to run only for a short period of time to complete n client requests on a given underlying computing platform, then vanish and appear on a different platform with different characteristics (i.e., guest OS, host OS, hypervisor, hardware, etc.). We refer to this randomization and diversification technique of vanishing a node so that it appears on another platform as reincarnation.
The proposed framework introduces resiliency and adaptability to systems. Resilience has two main components: (1) continuing operation and (2) fighting through compromise [13]. The MTD framework takes both components into consideration, since it transforms systems so that they can adapt and self-heal when ongoing attacks are detected, which guarantees operational continuity. The initial target of the framework is to prevent successful attacks by establishing short lifespans for nodes/services, reducing the probability of attackers taking control. In case an attack occurs within the lifespan of a node, the proactive monitoring system triggers a reincarnation of the node.
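The lifespan-plus-introspection trigger can be sketched as a small scheduler: a node runs on one platform until its lifespan expires or introspection flags it, then reappears on a different platform. All names are ours; the actual framework sits on OpenStack and libvmi.

```python
# Illustrative reincarnation scheduler for the MTD scheme above: a node runs
# on a platform for a bounded lifespan (or until introspection marks it
# dirty), then "vanishes" and reappears on a different platform. Platform
# names, tick size, and the bounded demo loop are illustrative assumptions.

import random

PLATFORMS = ["kvm/linux", "xen/linux", "kvm/freebsd"]

def next_platform(current):
    """Diversify: never reincarnate onto the same platform."""
    return random.choice([p for p in PLATFORMS if p != current])

def run_node(lifespan_s, introspect, clock):
    platform = random.choice(PLATFORMS)
    started = clock()
    history = [platform]
    while len(history) < 4:                    # bounded demo loop
        now = clock()
        if now - started >= lifespan_s or introspect() == "dirty":
            platform = next_platform(platform) # vanish, reappear elsewhere
            started = clock()
            history.append(platform)
    return history

ticks = iter(range(1000))
clock = lambda: next(ticks) * 30               # one observation every 30 s
history = run_node(60, lambda: "clean", clock)
# with clean introspection, reincarnation happens purely on lifespan expiry
```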
The attack model considers an adversary taking control of a node undetected by traditional defensive mechanisms, a valid assumption in the face of novel attacks. The adversary gains high privileges on the system and is able to alter all aspects of the applications. Traditionally, the advantage of the adversary in this case is the unbounded time and space available to compromise and disrupt the reliability of the system, especially when it is replicated (i.e., colluding). The fundamental premise of the proposed framework is to eliminate the time and space advantage of adversaries and create agility to avoid attacks that can defeat system objectives by extending the cloud framework. We assume the cloud management software stack (i.e., the framework) and the virtual introspection libraries are secure.
1.3.2.1 Resiliency Framework Design
The criticality of diversity as a defensive strategy, in addition to replication/redundancy, was first proposed in [24]. Diversity and randomization allow the system defender to deceive adversaries by continuously shifting the attack surface of the system. We introduce a unified, generic MTD framework designed to move simultaneously in space (i.e., across platforms) and in time (i.e., in time intervals as low as a minute). Unlike the state-of-the-art singular MTD solution approaches [25, 26, 27, 28, 29, 30], we view the system space as a multidimensional space where we apply MTD on all layers of the space (application, OS, network) in short time intervals, while remaining aware of the status of the rest of the nodes in the system.

Figure 3 illustrates how the MTD framework works. The y-axis depicts the view of the space (e.g., application, OS, network) and the x-axis the runtime (i.e., elapsed time). The figure compares traditional replicated systems without any diversification and randomization technique, the state-of-the-art systems [25, 26, 27, 28, 29, 30] with diversification and randomization techniques applied to certain layers of the infrastructure (application, OS, or network), and the proposed solution, which applies MTD to all layers.
Figure 3 View of space and time of MTD-based resiliency solution
As illustrated in Figure 3.c, nodes/services that are not reincarnated in a particular time interval are marked with the result of an observation (e.g., introspection) of either Clean (C) or Dirty (D) (i.e., not compromised/compromised). To illustrate, in the third reincarnation round with replica n, we detect replica 1 to be clean (marked with C) and replica 2 to be dirty, as shown by the D in that time-interval entry. We reincarnate the node whose entry shows D ahead of the node scheduled for the next time interval.
Two important factors need to be considered in the design of this framework: the lifespan of nodes or virtual machines, and the migration technique used in the reincarnation. Figure 4 shows a possible scenario in which virtual machines running on a platform become IDLE when an attack occurs and is detected. When and how to reincarnate nodes are our main research questions.

Figure 4 Moving target defense application example
Long lifespans increase the probability of success of an ongoing attack, while overly short ones impact the performance of the system. Novel ways to determine when to vanish a node and run the replica in a new VM need to be developed. In [23], VMs are reincarnated at fixed periods chosen using round-robin or a randomized selection mechanism. We propose the implementation of a more adaptable solution, which uses Virtual Machine Introspection (VMI) to persistently monitor the communication between virtual requests and available physical resources, and switches the VM when anomalous behaviors are observed.
The other crucial factor in our design is the live migration technique used for virtual machine reincarnation. Migrating operating system instances across different platforms has traditionally been used to facilitate fault management, load balancing, and low-level system maintenance [31]. Several techniques have been proposed to carry out the majority of the migration while the OS continues to run, achieving acceptable performance with minimal service downtime. We propose to integrate some of these techniques [31, 32, 33] in a clustered environment into our MTD solution to guarantee adaptability and agility in our system. When virtual machines are running live services, it is important that the reincarnation occurs in such a manner that both downtime and total transfer time are minimal. The downtime refers to the time when no service is available during the transition. The total transfer time refers to the time it takes to complete the transition [31]. Our main idea is to continue running the service in the source VM until the destination VM is ready to offer the service independently. In this process, there will be some time during which part of the state (the least changeable state information) is copied to the destination VM while the source VM is still running. At some point, the source VM is stopped to copy the rest of the information (the most changeable state information) to the destination VM, which takes control after the information is copied. No service is available while the source VM is stopped and the copying process has not yet completed; this period of time defines the downtime.
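The downtime/transfer-time tradeoff of this two-phase copy can be sketched with a back-of-the-envelope model: bulk-copy memory while the source keeps running, then stop and copy only the pages dirtied in the meantime. The page counts and rates below are made-up numbers for illustration.

```python
# Sketch of one pre-copy round as described above: phase 1 copies all memory
# while the source VM still serves requests (no downtime); phase 2 stops the
# source and copies only the pages dirtied during phase 1 (this is the
# downtime). Rates are pages per second; all numbers are illustrative.

def precopy_migration(total_pages, dirty_rate, copy_rate):
    """Return (total_transfer_time, downtime) in seconds for one round."""
    phase1 = total_pages / copy_rate           # source still running
    dirtied = dirty_rate * phase1              # pages touched during phase 1
    downtime = dirtied / copy_rate             # stop-and-copy of dirty pages
    return phase1 + downtime, downtime

transfer, downtime = precopy_migration(
    total_pages=100_000, dirty_rate=500, copy_rate=10_000)
# downtime covers only the dirtied pages, a small fraction of the total
```

Real pre-copy migration iterates the dirty-page phase several times until the dirty set is small enough, then does the final stop-and-copy [31].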
1.3.2.2 Resiliency Framework Infrastructure
Our framework will be built on top of the OpenStack cloud framework [12], a widely adopted open-source cloud management software stack. Figure 5 shows the high-level architecture of our framework on the top right and the OpenStack cloud framework on the bottom left.
Figure 5 High-level resiliency framework architecture
In the cloud framework, starting from the infrastructure, the bottom layer is the hardware. Each hardware node has a host OS, a hypervisor (KVM/Xen) to virtualize the hardware for the guest VMs on top of it, and the cloud software stack framework, OpenStack [12] in our case. The vertical bars are some of the OpenStack framework implementation components: Nova, Neutron, Horizon, and Glance. In addition, the libvmi library for virtual introspection interfaces with the libvirt library, which is used by the hypervisor for virtualization. This library allows us to intercept resource-mapping requests (i.e., memory) from the VM to the physically available resources in order to detect anomalous behavior even in the event of VM/OS compromise.
We introduce two abstraction layers: a high-level System State (top) and the Application Runtime (bottom), dubbed the time-interval runtime. To illustrate the system state, we consider Desired as the desired system state at all times, and Undesired as the state we would like to avoid (i.e., the turbulence, compromised, failed, or exit system states). The driving engine of these two high-level states is the set of indirect outputs of the time-interval runtime, depicted as the dotted arrows.
The Application Runtime defines the time an application runs in a VM. Our framework transforms traditional services, designed to be protected for their entire runtime (as shown in the guest VMs of the cloud framework), into services that deal with attacks in time intervals, as depicted in Figure 5 (as Time Interval Runtime). Such a transformation is achieved simply by allowing the applications to run on heterogeneous OSs and variable underlying computing platforms (i.e., hardware and hypervisors), thereby creating mechanically generated system instance(s) diversified in time and space. The Application Runtime can vary depending on the detection of anomalous behaviors.

The System State and the Application Runtime are two abstraction layers that operate in synchrony. At the application layer, we refresh and map one or more Apps/VMs (App1 ... Appn) to different platforms (Hardware1 ... HWn) in pre-specified time intervals, referred to as the time-interval runtime. To gain a holistic view of the high-level system state, we continuously re-evaluate the system state (with libvmi, depicted by the horizontal blue arrow) at the end of each interval to determine the current state of the system in that specific time interval.
The system state is the state of the system at any given time. State changes are dictated by the application runtime status. For instance, if the application fails or crashes, then the system is in the Failed state. Similarly, the system is in the Compromised state when the attacker succeeds and remains undetected. These two states, Failed and Compromised, fall under the Undesired category. Figure 6 shows the possible states, where TIRE (Time Interval Runtime Execution) represents the observation of the current state at the end of the runtime, D the Desired state, C the Compromised state, F the Failed state, and E the Exit state.
Figure 6 System states of the framework
The key objective of the framework is to start the system in the Desired state and stay in that state as much as possible. This technique implements resiliency in the three phases of the system lifecycle: "Start Secure, Stay Secure, Return Secure" [13]. In the event that the system transitions into an Undesired state, a valid assumption in cyberspace, the system bounces back seamlessly into the Desired state, even in the event of an OS compromise. For this, applications run in a specific VM for a pre-specified time and are then moved to a new one.
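The state machine of Figure 6 can be modeled compactly: at the end of each TIRE, the observation maps the system into D, C, F, or E, and any Undesired state bounces back to Desired via reincarnation. The observation labels ("clean", "dirty", "crash", "shutdown") are our own shorthand for the figure's transitions.

```python
# Minimal model of the Figure 6 state machine: at the end of each Time
# Interval Runtime Execution (TIRE) the observation maps the system into
# Desired (D), Compromised (C), Failed (F), or Exit (E); a clean observation
# after an Undesired state models the bounce back to Desired through
# reincarnation. Observation labels are our shorthand, not from the report.

DESIRED, COMPROMISED, FAILED, EXIT = "D", "C", "F", "E"
UNDESIRED = {COMPROMISED, FAILED}

def step(state, tire_observation):
    """One transition per time interval, driven by the TIRE observation."""
    if state == EXIT:                          # Exit is terminal
        return EXIT
    if tire_observation == "shutdown":
        return EXIT
    if tire_observation == "crash":
        return FAILED
    if tire_observation == "dirty":
        return COMPROMISED
    return DESIRED                             # clean, or recovered Undesired

trace = [DESIRED]
for obs in ("clean", "dirty", "clean", "crash", "clean"):
    trace.append(step(trace[-1], obs))
# trace: D, D, C, D, F, D  (each Undesired state bounces back to Desired)
```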
1.3.2.3 Live Reincarnation of Virtual Machines
Reincarnation is a technique for enhancing the resiliency of a system by terminating a running node and starting a fresh one in its place on (possibly) a different platform/OS, which continues to perform the tasks of its predecessor from the point at which it was dropped off the network and reconnected. One key question is determining when to reincarnate a machine. One approach is setting a fixed period of time for each machine and reincarnating it after that lifespan; in this approach, machines to be reincarnated are selected either round-robin or randomly. However, attacks can occur within the lifespan of each machine, which makes live monitoring mechanisms a crucial element. Whether an attack is in progress at the beginning of the reincarnation determines how soon the source VM must be stopped to keep the system resilient. When no threats are present, both the source VM and the destination VM can participate in the reincarnation process: the source VM can continue running until the destination VM is ready to take over. On the contrary, if an attack is detected, the source VM should be stopped immediately, and the reincarnation must consist simply of copying the state of the source to the destination VM, which continues the tasks after the copy. The latter case presents a higher downtime than the former. Our target is to add adaptability to our system so that we can distinguish between these two cases.
In a clustered environment, the reincarnation of a machine needs to concentrate on physical resources such as disk, memory, and network. Disk resources are assumed to be shared through Network-Attached Storage (NAS), so not much needs to be done in this regard. Memory and network are the targets of our study. There is a tradeoff we need to consider in managing these resources: stopping the source virtual machine at once to copy its entire address space consumes a lot of network resources, which negatively impacts performance. The migration decision should involve consideration of various parameters, including the number of modified pages that need to be migrated, the available network bandwidth, the hardware configuration of the source and destination, the load on the source and destination, etc. We previously proposed a model based on mobile agent-based services [52] that migrate to different platforms in the cloud to optimize the running time of mobile applications under varying contexts. By utilizing the statefulness of mobile agents, the model enables resuming processes after migration to a different platform. We plan to build on our knowledge of adaptable systems and performance optimization to design a model for live migration of services and restoration on the destination platform that enables high performance and continuous availability. We will focus on the following aspects:
 Memory Management: We are interested in developing an adaptable solution that copies the memory state to the destination VM while still running the source VM when no attacks are detected. A daemon will be in charge of this task. Initially, the entire memory space is copied to the destination; later, just the dirty pages are copied when the destination VM is ready to take over. This guarantees a much smaller downtime when machines are reincarnated because their lifespan expires (no attacks detected).
 Network Management: The new virtual machine must be set up with the same IP address as its predecessor. All other machines in the cluster must be made aware of this change, which can be achieved by sending ARP replies to the rest of the machines.
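The ARP announcement mentioned above is typically a gratuitous ARP reply broadcast by the new VM so that peers rebind the preserved IP to the new MAC. The sketch below only builds such a frame with the standard library; actually sending it requires a raw socket and privileges. The MAC/IP values are made up.

```python
# Sketch of the ARP announcement above: after reincarnation, the new VM
# broadcasts a gratuitous ARP reply (opcode 2, sender IP == target IP) so
# cluster peers update the MAC bound to the preserved IP. We only build the
# 42-byte Ethernet+ARP frame here; sending needs a raw socket and root.

import struct
import socket

def gratuitous_arp(mac, ip):
    """Build an Ethernet frame carrying a gratuitous ARP reply."""
    mac_b = bytes.fromhex(mac.replace(":", ""))
    ip_b = socket.inet_aton(ip)
    broadcast = b"\xff" * 6
    eth = broadcast + mac_b + struct.pack("!H", 0x0806)   # EtherType = ARP
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 2)       # Ethernet/IPv4, op=2
    arp += mac_b + ip_b + broadcast + ip_b                # sender IP == target IP
    return eth + arp

frame = gratuitous_arp("02:00:00:00:00:01", "10.0.0.5")
```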
1.3.3 Data Provenance and Leakage Detection in Untrusted Cloud
Monitoring data provenance and adaptability in the data dissemination process is crucial for resilience against data leakage and privacy violations in untrusted cloud environments. Our solution ensures that each service can only access data for which it is authorized. In addition, our approach detects several classes of data leakage from authorized services to unauthorized ones. Context-sensitive evaporation of data in proportion to the distance from the data owner can make illegitimate disclosures less probable, as distant data guardians tend to be less trusted.
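One way to picture distance-proportional evaporation is that the further (less trusted) the guardian, the fewer fields of a record survive disclosure. The record, sensitivity scores, and cutoff rule below are illustrative assumptions, not the report's actual policy model.

```python
# Sketch of context-sensitive evaporation: fields of a record carry
# sensitivity scores, and a field is disclosed only if its sensitivity plus
# the guardian's distance from the data owner stays under a cutoff. The
# record, scores, and cutoff are made-up illustrations.

RECORD = {"name": ("Alice", 3), "diagnosis": ("flu", 2), "zip": ("47907", 1)}

def evaporate(record, distance, cutoff=4):
    """Disclose only fields whose sensitivity survives at this distance."""
    return {field: value
            for field, (value, sensitivity) in record.items()
            if sensitivity + distance <= cutoff}

near = evaporate(RECORD, 1)   # trusted, close guardian: everything survives
far = evaporate(RECORD, 3)    # distant guardian: only the least sensitive field
```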
We extended our "WaxedPrune" prototype, demonstrated at Northrop Grumman Tech Fest 2016, to support automatic derivation of data provenance through interaction of the Active Bundle (AB) engine [34] with the central cloud monitor responsible for automated monitoring and reconfiguration of cloud enterprise processes. The extension [50] supports detection of several types of data leakage, in addition to enforcing fine-grained security policies. Some of the ideas were proposed by Leon Li (NG) during our weekly meetings.
1.3.3.1 Data Leakage Detection
In the current implementation of the secure data dissemination mechanism, data is transferred among services in encrypted form by means of Active Bundles (AB) [40, 41]. This privacy-preserving mechanism protects against tampering, eavesdropping, spoofing, and man-in-the-middle attacks. Active Bundles ensure that a party can access only those portions of data it is authorized for. However, an authorized service may leak sensitive data to an unauthorized one. A leakage scenario is illustrated in Figure 7: Service X, which is authorized to read data d1, can leak this data item behind the scenes to an unauthorized service, e.g., Service Y. This leakage needs to be detected and reported to the data owner. Let us denote the data D = {d1, d2, ..., dn} and the set of policies P = {p1, p2, ..., pk}. Data leakage can occur in two forms:

1) Encrypted data is leaked, i.e., the whole AB (the whole Electronic Health Record)

Active Bundle data can only be extracted by the Active Bundle's kernel after authentication is passed and the access control policies are evaluated by the Policy Enforcement Engine. If the AB is sent by Service X to an unauthorized Service Y behind the scenes, then Service Y will not be able to decrypt data it is not authorized for. When Y tries to decrypt d1, the AB kernel will first query the CM in order to check whether d1 is supposed to be at Y's side. If not, a data leakage alert is raised.
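The CM gate before decryption can be sketched as follows. The CM is modeled as an in-memory authorization table and the "decryption" is a stand-in; service and item names are illustrative, following the Figure 7 scenario.

```python
# Sketch of the leakage check described above: before the AB kernel decrypts
# item d1 for a requesting service, it asks the Central Monitor (CM) whether
# that item is supposed to reside at that service. The CM is modeled as an
# in-memory table, and string reversal stands in for real decryption.

AUTHORIZED = {("service_X", "d1"), ("service_X", "d2"), ("service_Y", "d2")}
alerts = []

def cm_check(service, item):
    """CM lookup: is this data item supposed to be at this service?"""
    return (service, item) in AUTHORIZED

def ab_decrypt(service, item, ciphertext):
    """AB kernel gate: query the CM first, raise a leakage alert on mismatch."""
    if not cm_check(service, item):
        alerts.append(f"leakage alert: {item} found at {service}")
        return None
    return ciphertext[::-1]                    # stand-in for real decryption

ab_decrypt("service_X", "d1", "txetrehpic")    # authorized: decrypts
ab_decrypt("service_Y", "d1", "txetrehpic")    # leaked AB: alert raised
```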
In addition to the CM enforcing data obligations, there is one more embedded protection measure, which relies on digital watermarks embedded into Active Bundles that can be verified by web crawlers. If an attacker uploads the illegal content to publicly available web hosting that can be scanned by web crawlers, then a web crawler can verify the watermark and detect the copyright violation. However, this approach is limited to cases where the attacker uploads unauthorized content to a publicly available folder in the network.
Figure 7 Data d1 leakage from Service X to Service Y
2) Plaintext data is leaked
If Service X, which is authorized to see d1, takes a picture of a screen and sends it via email to Service Y, which is not authorized to see d1, then our solution relies on visual watermarks embedded into the data. A visual watermark that remains visible on a captured image can be used in court to prove data ownership.
The key challenge here is that it is hard to come up with an approach that covers all possible cases of data leakage. Embedded watermarks can help detect leakage of a document, but if the watermarks are removed, the protection is gone. For instance, a service may get access to a credit card number, write it down on a piece of paper in order to remember it, and then leak it via email to an unauthorized party. In this case, there are several ways to mitigate the problem:
 Layered approach: don't give all the data to the requester at once
o First give part of the data (incomplete, less sensitive)
o Watch how the data is used and monitor the trust level of the using service
o If the trust level is sufficient, give the next portion of data
 Raise the level of data classification to prevent repetition of the leakage
 Leak data intentionally to create uncertainty and lower the data's value
 Monitor network messages
o Check whether they contain, e.g., a credit card number that satisfies a specific pattern and can be validated using regular expressions
 After a leakage is detected, make the system stronger against similar attacks
o Separate the compromised role into two: e.g., suspicious_role and benign_role
o Send new certificates for the benign role to all benign users
o Create a new AB with new policies restricting access for suspicious_role (e.g., all doctors from the same hospital as the malicious one)
o Increase the sensitivity level of leaked data items, e.g., the diagnosis
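The network-message check above can be sketched by combining a card-number pattern with the Luhn checksum, which cuts false positives from the regex alone. The pattern below covers 13-16 digit numbers with optional separators; a real monitor would need broader patterns and surrounding context.

```python
# Sketch of the network-message monitoring bullet above: flag strings that
# both match a 13-16 digit card-number pattern (optional space/dash
# separators) and pass the Luhn checksum. Pattern and sample message are
# illustrative; the card number is a well-known test value, not real.

import re

CARD_RE = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")

def luhn_ok(digits):
    """Standard Luhn checksum over a string of digits."""
    total, parity = 0, len(digits) % 2
    for i, ch in enumerate(digits):
        d = int(ch)
        if i % 2 == parity:                    # double every second digit
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def find_card_numbers(message):
    hits = []
    for match in CARD_RE.finditer(message):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_ok(digits):                    # drop regex false positives
            hits.append(digits)
    return hits

leaks = find_card_numbers("order ref 1234-5678, card 4539 1488 0343 6467")
# only the Luhn-valid 16-digit number is flagged; the order ref is ignored
```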
Data Leakage Damage Assessment

After a data leakage is detected, the damage is assessed based on:
• To whom the data was leaked (a service with a low trust level vs. a service with a high trust level)
• The sensitivity (classification) of the leaked data (classified vs. unclassified)
• When the leaked data was received
• Whether other sensitive data can be derived from the leaked data (e.g., a diagnosis can be derived from leaked