Monday, June 3, 2019

A GDPR-compliant solution to protect user data in the cloud environment (Identity-as-a-Service demo)

What is Identity-as-a-Service?

Personally Identifiable Information (PII) is information about a person, such as their name, address, health records, tax number, etc. In recent years, users have increasingly stored their PII in the cloud so that cloud services can access and use it on demand. In Figure 1, users store their PII in Salesforce so that cloud services on Salesforce can work on it.


Figure 1: Users store their PII in Salesforce

Why Identity-as-a-Service?

Gartner predicts that, by 2020, 70% of all businesses will use user identities to control access to their services. For example, a user Bob can buy a DVD if he can prove that he is over 18 years old. In this case, Bob uses his "age" to access the DVD online service.

Identity-as-a-Service (IDaaS) is a trusted service provider that provides user identity to a cloud service on demand.

What is the main issue of Identity-as-a-Service?

The main issue is user privacy. Facebook is an example of a public Identity Provider that collects PII about users. According to the Facebook data scandal in early 2018 [1], an application was allowed to collect PII of 50 million users for “academic” use but passed the collected data on to a company, Cambridge Analytica, for “analysis” purposes. This example shows that users typically disclose their identities to a frontend service. However, the frontend service may consume other backend services in a business-to-business relationship. In general, even if cloud services specify their privacy policies, we cannot guarantee that they follow those policies and will not (accidentally) transfer PII to another party.

What is the solution?

In the following video, we show the result of our implementation at the University of Plymouth and how Identity-as-a-Service can protect user privacy.



In the first minute: 

We show a use case whereby PII is fully disclosed to a frontend service (e.g., a shopping service). In the backend, the shopping service calls a delivery service (to ship a product) and fully discloses the PII to the delivery service.

From minute 1 - 3:

  1. Users encrypt their data with specific "purposes" (e.g., purchase, delivery) and a "time" limit (e.g., 14 days). 
  2. IDaaS distributes the resulting ciphertext to all services that need it. 
  3. The shopping service can decrypt the user's "birthday" to process a "purchase" order. The delivery service can decrypt the user's "address" for the "delivery" purpose, but nothing more. In other words, a service can decrypt PII only if it has the correct "purpose" and accesses the data within the given "time" (see the sketch after this list).
  4. After the authentication token expires (i.e., the current business transaction is completed), the shopping service can no longer decrypt the ciphertext (even though it was authorised to decrypt it before).
  5. After 14 days, the delivery service can also no longer decrypt the user's "address" (i.e., the ciphertext has expired).
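
The video does not show the underlying cryptography, but the behaviour of steps 1 to 5 can be approximated with off-the-shelf tools. The following Python sketch is only an illustration of the idea, not our implementation: each PII field is encrypted under a key bound to a purpose, and a time-to-live enforces the expiry. The field names, purposes, and the 14-day limit come from the demo; the keys, the sample values, and the use of Fernet are assumptions made for the sake of the example.

    # Illustrative sketch only: purpose- and time-bound decryption of PII fields,
    # approximated with one symmetric key per purpose and a TTL check.
    # This is NOT the scheme used in the demo; it merely mimics its behaviour.
    from cryptography.fernet import Fernet, InvalidToken

    # One key per purpose; in the demo the IDaaS would manage these, here they are local.
    purpose_keys = {"purchase": Fernet.generate_key(), "delivery": Fernet.generate_key()}

    def encrypt_pii(field_value: str, purpose: str) -> bytes:
        """Encrypt a PII field so that only holders of the purpose key can read it."""
        return Fernet(purpose_keys[purpose]).encrypt(field_value.encode())

    def decrypt_pii(ciphertext: bytes, purpose_key: bytes, max_age_seconds: int) -> str:
        """Decrypt a PII field; fails if the key (purpose) is wrong or the ciphertext is too old."""
        return Fernet(purpose_key).decrypt(ciphertext, ttl=max_age_seconds).decode()

    # The user encrypts "birthday" for the "purchase" purpose and "address" for "delivery".
    birthday_ct = encrypt_pii("1990-01-01", "purchase")
    address_ct = encrypt_pii("1 Example Street, Plymouth", "delivery")

    # The delivery service holds only the "delivery" key and a 14-day limit.
    fourteen_days = 14 * 24 * 3600
    print(decrypt_pii(address_ct, purpose_keys["delivery"], fourteen_days))  # works

    try:
        decrypt_pii(birthday_ct, purpose_keys["delivery"], fourteen_days)    # wrong purpose
    except InvalidToken:
        print("the delivery service cannot read the birthday")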

Advantages

In comparison to existing work from the past 10 years, our solution is compliant with the General Data Protection Regulation and involves the least user interaction, which helps prevent identity theft via the human link. We protect the confidentiality of PII across both frontend and backend services, and against untrusted hosts. The implementation can easily be adapted to existing Identity Management systems, and it performs well.

Future work

In the Internet of Things and Machine Learning, machines talk to each other and process user data without user interaction. We think our solution is also useful in these areas.

Implementation details


Reference

[1] Cadwalladr, C.; Graham-Harrison, E. Revealed: 50 million Facebook Profiles Harvested for Cambridge Analytica in Major Data Breach. Available online: https://www.theguardian.com/news/2018/mar/17/cambridge-analytica-facebook-influence-us-election (accessed on 17 Mar 2018).

Sunday, March 25, 2018

A solution for Facebook Misused Applications

Recently, Cambridge Analytica, a data analytics company, collected data from more than 50 million Facebook accounts without user consent. When we read about this incident, we may ask ourselves whether it is safe to put our personal data on Facebook, or in general in the cloud. How can we control our data after we have uploaded them to iCloud? At the University of Darmstadt and the University of Plymouth, we have been investigating this kind of incident and researching a solution for three years. This post gives you some details.


Figure 1: Facebook misused applications

Whose fault was it?


The story began with an application developed by Aleksandr Kogan from the University of Cambridge. His application collected personal data from the participants in a personality test. The participants agreed to have their data collected for academic use. However, their "friend" connections on Facebook were also available to the application. As a result, the application had access to 50 million accounts. In the end, Kogan sold this data on to the company Cambridge Analytica (Figure 1).

To understand whose fault it was, we will first give you an overview of the EU Data Protection Directive:

EU Data Protection Directive


In short, when a service provider (or an application) collects personal data, it has to state clearly (to the users) for which purposes their data will be used. User data may then only be used for those purposes. After the purposes are fulfilled, the service has to delete the data. In Figure 2, a user Bob submits his "address" to "purchase" a product in Germany. A shopping application can use his data to deliver the package, but a "marketing" application is not allowed to. After delivering the product, the service must delete Bob's data.


Figure 2: An example of EU Data Protection Directive

The EU General Data Protection Regulation (GDPR)


The protection directive has been around since 1995. However, the EU Commission let each member state interpret and implement the directive differently. In May 2018, the directive will be updated for the first time. It will then be a regulation, no longer a directive. In the following, we will name some interesting differences:

  1. Previously, a service provider did not need to protect a phone number that had no associated name or address. Now, if a service provider stores a user's phone number, it must protect this information as well.
  2. A service provider must report data breaches within 72 hours to a supervisory authority and to the user. In the EU, we now have one supervisory authority that handles data breaches across the entire union.
  3. An organisation has to pay up to 4% of its global turnover or 20 million EUR if user data is breached.
  4. Any company that sells or receives data of EU citizens is affected.
  5. Any company that stores data of EU citizens outside of the EU or transfers it to another country is affected.

Are we ready?

The regulation will be effective in May this year. The question is whether the companies in the EU have any implementations that are compliant with the GDPR. Unfortunately not. How can they deal with this upcoming regulation?

In our Facebook example, Kogan gained access to the data in a legitimate way (i.e., for a "research" purpose) and through the proper channels (i.e., users accepted it). But Kogan forwarded it to a different company, which used the data for a different purpose (i.e., commercial "analysis"). Also, Kogan did not delete the data after the purpose was fulfilled, as required by the law.

In Figure 3, after we allow an application to access our data, we lose control. We can no longer ensure that our data will be used correctly or that the application will not forward our data to another company without our consent. In short, the traditional authorisation systems used so far (based on roles or on explicitly approving an application) are not enough.


Figure 3: We lose control of our data to an application on Facebook after we click "OKAY"

Solution


At the university, we have developed a trusted Identity Provider that helps mobile users encrypt their data with a disclosure policy. The encryption is based on "purpose", "time", "domain", and "country".

For example: A user Bob uses Kogan's research application (i.e., "research" is a purpose condition). Bob wants to make sure that an administrator of Kogan's application cannot read his data on the server. Also, Kogan cannot forward Bob's data to a partner company in China (i.e., "europe" is a country condition). If Kogan forwards Bob's data to Cambridge Analytica for commercial "analysis", Facebook will be aware of the new purpose. After 2 months, Kogan's application can no longer decrypt Bob's data (i.e., limited access time is a condition). As a result, Facebook does not need to tell Kogan: "please do delete user data!".

In this example, our service makes sure that Bob's data, which was collected for the "research" purpose and hosted within "europe", cannot be decrypted for the "analysis" purpose, outside the union, or after 2 months. In short, this is an implementation that is fully compliant with the EU Data Protection Directive. We may deliver a plugin that mobile users, companies, and governments can install to use this service on demand.
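
To make the conditions more tangible, a disclosure policy can be read as a small set of checks that must all hold before data may be decrypted. The following Python sketch only illustrates that policy evaluation in plain code; the real system enforces it cryptographically, and the concrete dates, names, and the DisclosurePolicy/AccessRequest structures are assumptions built around Bob's example.

    # Illustrative policy check only; the real system enforces these conditions
    # through encryption rather than an if-statement. Values follow Bob's example.
    from dataclasses import dataclass
    from datetime import datetime, timedelta

    @dataclass
    class DisclosurePolicy:
        purpose: str            # e.g. "research"
        country: str            # e.g. "europe"
        valid_until: datetime   # e.g. 2 months after upload

    @dataclass
    class AccessRequest:
        purpose: str
        country: str
        requested_at: datetime

    def may_disclose(policy: DisclosurePolicy, request: AccessRequest) -> bool:
        """All conditions of the disclosure policy must hold."""
        return (request.purpose == policy.purpose
                and request.country == policy.country
                and request.requested_at <= policy.valid_until)

    uploaded_at = datetime(2018, 3, 25)
    policy = DisclosurePolicy("research", "europe", uploaded_at + timedelta(days=60))

    # Kogan's research app in Europe, within the time window: allowed.
    print(may_disclose(policy, AccessRequest("research", "europe", datetime(2018, 4, 1))))   # True
    # Cambridge Analytica asking for commercial "analysis": denied.
    print(may_disclose(policy, AccessRequest("analysis", "europe", datetime(2018, 4, 1))))   # False
    # The same research app after the 2-month window: denied.
    print(may_disclose(policy, AccessRequest("research", "europe", datetime(2018, 8, 1))))   # False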

We have a working prototype and expect to complete it by the end of this year.

For more information, please read our publication in IEEE or contact me.
Reference: https://www.researchgate.net/publication/323869635_Privacy-preserving_user_identity_in_Identity-as-a-Service

Monday, April 21, 2014

Automated openStack deployment on Vagrant for development and reference test systems

When I joined Deutsche Telekom more than one year ago, I had to share a common reference test system with everyone in the room, including all operators and developers. This is quite troublesome when you have new ideas to test: you do not want to interfere with anyone, and you have to make sure that your experiments will not break things and make your colleagues angry.

Figure 1: a local integration test system for experimenting with new features
Like any development process, a local integration test system is required. It must support developers editing and debugging openStack on the fly, as well as operators or package maintainers testing openStack packages. It is also nice to be able to reset the test system after dirty changes and provision it again as fast as possible. This post introduces such a system, which is now available upstream [1].

1. Overview of the vagrant openStack project

Figure 2: deployment of openStack by vagrant
Vagrant is responsible for bringing the VMs up and setting up host-only networks within VirtualBox. From there, there are two ways to deploy openStack, depending on your needs. For development purposes, openStack is deployed by devstack. For testing packages, puppet is used. The two deployments are configurable in a global file.

In my personal use case, I always need to switch between the two deployments: puppet for testing packages and devstack for coding. Switching between the two is also supported, keeping the previous deployment safe, separate, and reusable.

1.1 Networking

Back then, I only found projects that deploy all openStack components in one VM. This does not satisfy our needs because an all-in-one deployment does not reflect the behaviour of the GRE data network between different openStack components. Figure 2 above shows that the control, compute, and neutron nodes, along with the three host-only networks for management, GRE data, and public traffic, are brought up automatically.

Figure 3: SNAT for testing floating IPs
In such a testing environment, you also need to test the floating IPs of the VMs over the public network. It would be rather boring if the VMs booted by nova could not connect to the Internet. For this reason, Figure 3 shows how packets from inside the neutron node go out and come back. Packets coming from br-tun and br-int go to br-ex on the neutron node, are forwarded to the NAT interface (vboxnet0), and are SNATed so that the replies can find their way back.
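
On the outgoing side, the SNAT step boils down to a single masquerading rule. The snippet below is only a sketch of that idea, written in Python for consistency with the other examples in this post; the subnet and the interface name are assumptions about a typical setup, not the exact values used by the project.

    # Hedged sketch: add an iptables MASQUERADE (SNAT) rule so packets leaving
    # towards the NAT interface are rewritten and replies can find their way back.
    # The subnet and interface name below are assumptions, not the project's values.
    import subprocess

    PUBLIC_SUBNET = "192.168.100.0/24"   # assumed public host-only network
    NAT_INTERFACE = "eth0"               # assumed NAT/uplink interface

    def enable_snat(subnet: str, out_iface: str) -> None:
        """Masquerade traffic from the public subnet leaving through the NAT interface."""
        subprocess.run(
            ["iptables", "-t", "nat", "-A", "POSTROUTING",
             "-s", subnet, "-o", out_iface, "-j", "MASQUERADE"],
            check=True,
        )

    if __name__ == "__main__":
        enable_snat(PUBLIC_SUBNET, NAT_INTERFACE)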

1.2 Storage

For a simple nova volume setup, iSCSI is used by default. The VBoxManage command is very useful in this case to create a VDI disk and attach it to the control node.

Of course, do not forget to format the storage and create a volume group named cinder-volumes for cinder [2].
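
Put together, the storage preparation is only a handful of commands. The following Python sketch strings them together purely for illustration; the disk size, VM name, controller name, and device path are assumptions, and the actual values depend on your setup.

    # Hedged sketch of the storage setup described above: create a VDI disk on the
    # host, attach it to the control VM, then (inside that VM) create the
    # cinder-volumes LVM volume group. Disk size, VM name, controller name, and
    # device path are assumptions for illustration only.
    import subprocess

    def run(cmd: list[str]) -> None:
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    def create_and_attach_disk() -> None:
        """Run on the host: create a 10 GB VDI and attach it to the control node."""
        run(["VBoxManage", "createhd", "--filename", "cinder.vdi", "--size", "10240"])
        run(["VBoxManage", "storageattach", "control",          # assumed VM name
             "--storagectl", "SATA Controller", "--port", "1", "--device", "0",
             "--type", "hdd", "--medium", "cinder.vdi"])

    def create_volume_group() -> None:
        """Run inside the control VM: turn the new disk into the LVM volume group for cinder."""
        run(["pvcreate", "/dev/sdb"])                            # assumed device path
        run(["vgcreate", "cinder-volumes", "/dev/sdb"])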

2. Deployment environments

2.1 puppet

A puppetmaster VM is brought up with puppetdb installed. It pulls manifests from a configurable git repository into the directory /opt/deploy inside the VM and uses these manifests to deploy openStack on the other VMs. By default, the manifests in [3] are provided as an example to try out the new Icehouse release with the ML2 plugin and l2 population. You can also provide your own manifests by configuring a puppet repository and specifying which site.pp to use for the node definitions.


2.2 devstack

I like a deployment whereby the provisioning script is provided directly inside the VM. For this reason, a puppet master is not necessary for the devstack deployment. Instead, devstack is cloned and set up directly inside all VMs. Devstack is also configured to use the pip repository of openStack [4]. Follow this article to use the remote debugging that is already prepared in this environment.

3. Performance boost

One issue is the long deployment time, especially if you have a slow connection or the connection drops in the middle of the deployment. So I tried out every small possibility to reduce the time consumed.

3.1 Caching


When a VM is destroyed and brought up again, it must download all packages from scratch. A simple caching solution is implemented which cuts the deployment time in half. It is even faster for a second deployment, since all packages and the glance image are cached for further use, so Internet access is not necessary.

Caching is supported for both environments: all .deb packages installed by puppet, as well as all pip packages installed by devstack, are cached and shared between VMs. The tables below give a rough idea of how much time we can save when bringing up the machines with the cache enabled (Internet download speed 4 Mbit/s, each VM with 1 CPU and 1024 MB RAM).

Puppet deployment (in seconds)

node       no cache    with cache
control    312         227
compute    110         83
neutron    109         62
total      532         ~230 (in parallel), saving ~5 minutes

Devstack deployment (in seconds)

node       no cache    with cache
control    766         655
compute    764         341
neutron    224         208
total      1754        ~660 (in parallel), saving ~18 minutes

To test a custom package, simply replace it under the cache folder and bring up new VMs.

3.2 Customizing your vagrant box

In addition, to reduce the vagrant up time, a vagrant box is customized with packages pre-installed. The box is based on precise64 with packages such as VirtualBox Guest Additions 4.3.8, puppet, dnsmasq, r10k, vim, git, rubygems, msgpack, and lvm2 pre-installed. The box also has all empty space zeroed out and all logs whited out to keep its size as small as possible (378 MB). This cuts down the time for each vm up by about 70 seconds (from 79 secs to 8 secs).
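
For readers who want to build a similar box, the shrinking step can be sketched as follows. This is only an illustration of the general technique (zero out free space, truncate logs, then repackage); the VM name, the output file name, and the exact cleanup commands are assumptions rather than the project's actual recipe.

    # Hedged sketch of shrinking a custom vagrant box: zero out free space and
    # truncate logs inside the VM, then package the box on the host.
    # VM name and output file name are assumptions for illustration only.
    import subprocess

    def zero_free_space() -> None:
        """Run inside the VM: fill free space with zeros so the packaged box compresses well."""
        # dd exits non-zero once the disk is full; that is expected here.
        subprocess.run(["sudo", "dd", "if=/dev/zero", "of=/EMPTY", "bs=1M"])
        subprocess.run(["sudo", "rm", "-f", "/EMPTY"], check=True)
        # Whiting out logs keeps the packaged box small.
        subprocess.run("sudo find /var/log -type f -exec truncate -s 0 {} +",
                       shell=True, check=True)

    def package_box(vm_name: str = "precise64-custom") -> None:
        """Run on the host: export the prepared VM as a reusable vagrant box."""
        subprocess.run(["vagrant", "package", "--base", vm_name,
                        "--output", "custom.box"], check=True)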

[1] vagrant openStack project