
Digital Marketplaces Unleashed


by Claudia Linnhoff-Popien


  Both with server‐side managed user profiles and during the process of creating and updating user profiles, user profile data might have to be shared among user devices and with servers. Such data has to be protected by encryption, but at the same time it has to be available to the participating processes. In the Ensafer project [18], the focus has been on efficient and user‐friendly secure storage and sharing of data. By combining end‐to‐end symmetric and public key encryption, smart encryption key management, cloud storage, and user‐controlled sharing, the user is put in charge of her own private information. The Ensafer approach can be used to provide fine‐grained, user‐controlled access to user‐profile data. The user can explicitly control which user‐profile data is available to a given service. In principle, this means that a copy of a subset of the user’s profile is encrypted with the service provider’s public key. This copy can only be accessed by this service provider and is used to personalize the service for the user.

  26.4 Privacy Preserving Processing

  Processing and mining of user data creates and updates the user profiles. User privacy concerns influence how and where this processing is performed: it can take place on the server side, on user devices, and/or in a distributed fashion.

  In a personalization system, user profiles are important since they identify interests, behavior and other characteristics of individual users. Data processing in a personalization system involves analysis of user profiles, content models (which model the content that users may be interested in) and also social or community‐based information describing preferences for groups of users.

  For personalization systems, we can distinguish between individualized, community‐based and aggregate‐level personalization [12, 13]. In individualized personalization, tailoring of content is based on information about each individual user as it is found in the user profile, and content filtering is performed by comparing the user profile to content models. In community‐based personalization, the relevance of a certain item is determined based on information about preferences among groups of similar users. Collaborative filtering is an example of this approach. In aggregate‐level personalization, tailoring of content is based on collective usage data and not on an individual user profile. This provides a general recommendation, based on for example the most popular films or books, and not a recommendation that is explicitly tailored to specific interests of the user.
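
  The difference between individualized and aggregate‐level personalization can be illustrated with a minimal sketch. It assumes that user profiles and content models are sparse term‐weight vectors compared by cosine similarity; the representation, the function names, and the toy data are all hypothetical and not taken from the chapter.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def individualized(profile, content_models):
    """Rank items by how well each content model matches one user's profile."""
    return sorted(content_models,
                  key=lambda item: cosine(profile, content_models[item]),
                  reverse=True)


def aggregate_level(usage_log):
    """Rank items by overall popularity, independent of any single user."""
    return [item for item, _ in Counter(usage_log).most_common()]


# Hypothetical toy data, for illustration only.
profile = {"privacy": 0.9, "crypto": 0.6, "cooking": 0.1}
content_models = {
    "smc_tutorial": {"privacy": 0.8, "crypto": 0.7},
    "pasta_recipes": {"cooking": 1.0},
    "gdpr_overview": {"privacy": 1.0, "law": 0.5},
}
usage_log = ["pasta_recipes", "pasta_recipes", "smc_tutorial",
             "pasta_recipes", "gdpr_overview", "smc_tutorial"]

print(individualized(profile, content_models))
print(aggregate_level(usage_log))
```

  With this data the individualized ranking puts the SMC tutorial first, while the aggregate‐level ranking puts the most frequently used item first, regardless of the user’s interests.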

  There are many approaches to reducing privacy risks in personalization systems, a number of which are surveyed in [19, 20]. In the following, we present some privacy preserving techniques and describe their relevance to the data processing performed in user profiling and personalization systems.

  26.4.1 Client‐Side Processing

  Client‐side personalization [21] is assumed to give the user better control over their personal data, since data is collected and stored on a client‐side device. The user can then determine how and when to use this data for personalization. This is closely related to scrutability, which emphasizes the user’s ability to understand and control what is included in the user profile, and what is made available to different personalization systems [19, 22].

  Another advantage of the client‐side approach is the ability to collect, on a single device, a wide range of user interest and behavior data from multiple applications. Client‐side personalization will, on the other hand, complicate community‐based personalization, such as collaborative filtering, since usage data from a group of users is more difficult to obtain.

  Applications on a client device, like a mobile phone, are typically executed in separate security domains or sandboxed. To achieve application‐independent user profiling with client‐side only processing, a way to share data between applications has to be available [15]. This might also include applications and data in secure elements on the device [23]. Which application or service (background process) maintains the user profile is also of interest. Is each application responsible for maintaining its own (potentially overlapping) part of the user profile in a shared database, or is a service running in the background collecting data from all registered applications? Access to a shared resource has to be controlled even on a personal mobile device. A user might install applications and services that she wants to use, but with which she shares only limited user profile data or none at all.

  Updating the client‐side user profile is based on application requests to access and update user profile data, and on the context. Context is a collection of information that describes a given situation. It can include sensor data (e.g. temperature, GPS location, and proximity), calendar data, registered events, and so on. As illustrated in Fig. 26.3, a process on the client that maintains the user profile, called the UP manager, collects and analyzes all this data to create and update the user profile. The UP manager can participate in a distributed computation involving other devices or servers to perform its task.

  Fig. 26.3 Client‐side user profile management

  26.4.2 Obfuscation and Anonymization

  Anonymization, or the use of pseudonyms, is a privacy preserving technique that shields the true identity of the user. A system can for example track a pseudonym across different sessions, and provide personalization without knowing the user identity. A system providing user anonymization typically uses some de‐identification solution, such as replacing the user identity with a random number.
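
  De‐identification by random pseudonyms can be sketched as follows. This is a minimal illustration, assuming that events are plain dictionaries with a `user_id` field and that the mapping table is kept secret by whoever performs the de‐identification; the class, function, and field names are hypothetical.

```python
import secrets


class Pseudonymizer:
    """Maps each real identity to a stable random pseudonym."""

    def __init__(self):
        # Real identity -> pseudonym. Anyone holding this table can
        # re-identify users, so it has to be protected or discarded.
        self._table = {}

    def pseudonym(self, user_id):
        if user_id not in self._table:
            self._table[user_id] = secrets.token_hex(16)  # 128 random bits
        return self._table[user_id]


def de_identify(event, pseudonymizer):
    """Return a copy of the event with the identity replaced by a pseudonym."""
    cleaned = dict(event)
    cleaned["user"] = pseudonymizer.pseudonym(cleaned.pop("user_id"))
    return cleaned


p = Pseudonymizer()
first = de_identify({"user_id": "alice@example.org", "item": "book-1"}, p)
second = de_identify({"user_id": "alice@example.org", "item": "book-2"}, p)
print(first["user"] == second["user"])  # same pseudonym across sessions
```

  Because the pseudonym is stable, behavior can still be tracked across sessions, which is what makes personalization possible but also what makes linking attacks on the remaining fields a concern.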

  Differential privacy is a method for preserving privacy, which ensures that the removal or addition of a data item does not (substantially) affect the outcome of any analysis [24]. It is a statistical approach to learn as much as possible about a group without exposing individual users. Differential privacy typically uses perturbation, where user data are systematically altered or randomized. A commonly used approach to differential privacy is the Laplace mechanism, where noise sampled from the Laplace distribution is added to the result of a computation [20].
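
  The Laplace mechanism can be sketched for a simple counting query. The sketch assumes a query with sensitivity 1 (adding or removing one user changes the count by at most 1) and a privacy parameter `epsilon`, so the noise scale is `1 / epsilon`; the function names and the data are hypothetical, and the seeded generator is used only to make the sketch repeatable.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Counting query (sensitivity 1) perturbed with Laplace noise."""
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical data: ages of eight users. A real deployment would not use
# a fixed seed and would need a carefully implemented noise source.
rng = random.Random(2024)
ages = [23, 35, 41, 29, 52, 38, 61, 33]
noisy = private_count(ages, lambda age: age >= 35, epsilon=0.5, rng=rng)
print(noisy)  # the true count is 5; the released value is 5 plus noise
```

  A smaller `epsilon` gives a larger noise scale and therefore stronger privacy, at the cost of a less accurate answer.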

  Non‐perturbative techniques are also used. In contrast to perturbative techniques, non‐perturbative techniques do not modify the data. They reduce the level of detail by generalization and suppression of data, for example by replacing numerical values with intervals.
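
  Generalization and suppression can be sketched on a single record. The field names, the ten‐year interval width, and the postal code truncation are hypothetical choices made for this illustration.

```python
def generalize_age(age, width=10):
    """Replace an exact age with the interval that contains it."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize_record(record, suppress=("name",)):
    """Suppress identifying fields and coarsen the remaining ones."""
    out = {key: value for key, value in record.items() if key not in suppress}
    out["age"] = generalize_age(out["age"])
    out["zip"] = out["zip"][:2] + "**"  # keep only the leading digits
    return out


print(generalize_record({"name": "Alice", "age": 37, "zip": "9019"}))
```

  The values that remain are still true, only less precise, which is the difference from the perturbative techniques described above.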

  All these obfuscation and anonymization techniques should be used with care. If not, the risk of exposing sensitive information about individuals could be high. Examples include combining several anonymized data sets to de‐anonymize the data, and gaining access to individual data when differential privacy is used by repeating the same data request (question).
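
  The risk of repeated requests can be made concrete with a small simulation. It assumes a service that answers the same query with fresh Laplace noise every time and does not limit the number of requests; averaging the answers then cancels the noise. All names and parameters are hypothetical.

```python
import random


def noisy_answer(true_value, epsilon, rng):
    """One differentially private answer: true value plus Laplace noise."""
    scale = 1.0 / epsilon
    return true_value + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def averaging_attack(true_value, epsilon, repeats, rng):
    """Ask the same question many times and average the noisy answers."""
    answers = [noisy_answer(true_value, epsilon, rng) for _ in range(repeats)]
    return sum(answers) / repeats


rng = random.Random(7)  # fixed seed only to make the sketch repeatable
estimate = averaging_attack(true_value=5, epsilon=0.5, repeats=10_000, rng=rng)
print(estimate)  # close to the protected value 5
```

  A common countermeasure is to treat `epsilon` as a budget: under sequential composition, k answers at `epsilon` each consume k times `epsilon`, so the service stops answering (or reuses a cached answer) once the budget is spent.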

  When analyzing and maintaining user profile data, a combination of these techniques is used. The obfuscation and anonymization techniques described should be considered a part of the toolbox available when performing privacy‐aware (distributed) processing of user‐sensitive data.

  26.4.3 Privacy Preserving Algorithms

  Secure Multi‐Party Computation (SMC) is an algorithmic approach to protecting privacy that makes it possible to gain new knowledge from user data in a distributed computation, while keeping the user data confidential. In principle, an SMC algorithm performs a distributed computation where each node involved accesses its local data as part of the overall computation, but does not expose any of its local data to the other nodes. In practice, this means that the local data at each node can be involved in and contribute to the computation without being exposed.

  In [25], the usage of SMC to process sensitive health data is discussed. The same approach can be used to process data for personalization. Correlation between two types of data in a distributed data set is a typical example of what we can compute using SMC. Detecting correlation between parameters in a data set is a common approach to improve and personalize a user application or service. For horizontally partitioned data, an SMC algorithm can be used to calculate the Pearson product‐moment correlation coefficient (Pearson’s r). Pearson’s r is used to measure the correlation (linear dependence) between n samples of two variables x and y, where $\bar{x}$ and $\bar{y}$ are the mean values of x and y, respectively:

  $$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

  In the case of m user profiles with $s_j$ samples of $x_{ji}$ and $y_{ji}$ at each user, r can be rewritten like this:

  $$r = \frac{\sum_{j=1}^{m}\sum_{i=1}^{s_j} (x_{ji} - \bar{x})(y_{ji} - \bar{y})}{\sqrt{\sum_{j=1}^{m}\sum_{i=1}^{s_j} (x_{ji} - \bar{x})^2}\,\sqrt{\sum_{j=1}^{m}\sum_{i=1}^{s_j} (y_{ji} - \bar{y})^2}}$$

  For each user j the following three intermediate results have to be calculated:

  $$u_j = \sum_{i=1}^{s_j} (x_{ji} - \bar{x})(y_{ji} - \bar{y}), \qquad v_j = \sum_{i=1}^{s_j} (x_{ji} - \bar{x})^2, \qquad w_j = \sum_{i=1}^{s_j} (y_{ji} - \bar{y})^2$$

  When all intermediate values are calculated at each user (node) and collected, Pearson’s r can be calculated:

  $$r = \frac{\sum_{j=1}^{m} u_j}{\sqrt{\sum_{j=1}^{m} v_j}\,\sqrt{\sum_{j=1}^{m} w_j}}$$

  The mean values $\bar{x}$ and $\bar{y}$ can be securely calculated using another SMC algorithm. We assume that all messages between the nodes are protected by encryption. The trick to make this calculation secure is how we calculate the three sums in the equation above. A coordinator generates three large random unique values $u_0$, $v_0$, and $w_0$. These values are the initial sum values and they are kept secret at the coordinator. The initial sum values are sent to the first node, which adds its values $u_1$, $v_1$, and $w_1$ to the sums. The first node sends the partial sums to the next node, which performs the same calculations. The algorithm continues until the last node performs its calculations and forwards the sums to the coordinator. The coordinator subtracts the initial sum values from the sums to get the real sum values, and these values are finally used to calculate Pearson’s r. Fig. 26.4 illustrates the messages and node calculations when calculating Pearson’s r with three nodes and a coordinator C. The final calculation performed at the coordinator C is the following:

  $$r = \frac{(u_0 + u_1 + u_2 + u_3) - u_0}{\sqrt{(v_0 + v_1 + v_2 + v_3) - v_0}\,\sqrt{(w_0 + w_1 + w_2 + w_3) - w_0}}$$

  Fig. 26.4 Three nodes and a coordinator C calculating Pearson’s r

  A more complete description of this approach applied to processing of health data is found in [25].
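
  The masked summation described above can be sketched in a single process, with each loop iteration standing in for one node. The sketch makes several assumptions that are not from the source: the three per‐node sums are named `u`, `v`, and `w`; the mean values are obtained with the same masked summation; values are converted to fixed‐point integers so that the mask can be added and removed exactly; and message encryption between nodes is left out. All function names are hypothetical.

```python
import math
import secrets

SCALE = 10**6     # fixed-point scaling so masking works on exact integers
MASK_BITS = 128   # the coordinator's initial value is far larger than any sum


def ring_sum(contributions):
    """Coordinator seeds a secret random value, every node adds its local
    contribution to the running total, and the coordinator removes the seed."""
    mask = secrets.randbits(MASK_BITS)         # kept secret at the coordinator
    running = mask                              # message sent to the first node
    for local_value in contributions:           # one iteration per node
        running += round(local_value * SCALE)   # a node only sees a masked total
    return (running - mask) / SCALE             # back at the coordinator


def local_terms(xs, ys, mean_x, mean_y):
    """The three intermediate results one node computes on its own samples."""
    u = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    v = sum((x - mean_x) ** 2 for x in xs)
    w = sum((y - mean_y) ** 2 for y in ys)
    return u, v, w


def secure_pearson(nodes):
    """nodes is a list of (xs, ys) pairs, one pair of sample lists per node."""
    n = ring_sum([len(xs) for xs, _ in nodes])
    mean_x = ring_sum([sum(xs) for xs, _ in nodes]) / n
    mean_y = ring_sum([sum(ys) for _, ys in nodes]) / n
    terms = [local_terms(xs, ys, mean_x, mean_y) for xs, ys in nodes]
    u = ring_sum([t[0] for t in terms])
    v = ring_sum([t[1] for t in terms])
    w = ring_sum([t[2] for t in terms])
    return u / math.sqrt(v * w)


# Hypothetical samples held by three nodes.
nodes = [([1.0, 2.0, 3.0], [2.0, 4.0, 5.0]),
         ([4.0, 5.0], [4.0, 7.0]),
         ([6.0, 7.0, 8.0], [8.0, 9.0, 12.0])]
print(secure_pearson(nodes))
```

  The result agrees with Pearson’s r computed on the pooled samples up to the fixed‐point rounding. Note that this simple ring only hides a node’s contribution as long as its two neighbors do not compare the partial sums they see.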

  Homomorphic encryption [26] allows us to perform computation on encrypted data. In principle, this could be a good match for personalization where privacy is a major concern. However, currently such computation is inefficient and does not scale very well.
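
  What computing on encrypted data means can be shown with a toy example. Note that the sketch below swaps in the Paillier cryptosystem, which is only additively homomorphic, and not the fully homomorphic scheme of [26]. The tiny primes are hypothetical and completely insecure; they are chosen so that the arithmetic stays readable.

```python
import math
import secrets


def keygen(p=293, q=433):
    """Toy Paillier key pair from two small primes (insecure, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)   # modular inverse; valid because the generator is n + 1
    return n, (lam, mu)


def encrypt(n, message):
    """Encrypt an integer 0 <= message < n under public key n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1   # random value in [1, n - 1]
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, message, n2) * pow(r, n, n2)) % n2


def decrypt(n, private_key, ciphertext):
    lam, mu = private_key
    ell = (pow(ciphertext, lam, n * n) - 1) // n
    return (ell * mu) % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts (modulo n)."""
    return (c1 * c2) % (n * n)


n, private_key = keygen()
total = add_encrypted(n, encrypt(n, 12), encrypt(n, 30))
print(decrypt(n, private_key, total))  # 42, computed without decrypting the inputs
```

  A server holding only the public key `n` could add up encrypted user values in this way without ever seeing them, which is the kind of processing that makes homomorphic encryption attractive for personalization.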

  26.5 Conclusion

  The creation and maintenance of user profiles involves private and potentially sensitive user data. In this chapter, we have discussed how to protect this data when providing personalized services. We have also illustrated how Secure Multi‐Party Computation (SMC) can be used to process user profile data from a number of users without compromising their privacy.

  References

  1.

  M. Gao, K. Liu and Z. Wu, “Personalisation in web computing and informatics: Theories, techniques, applications, and future research,” Information Systems Frontiers, vol. 12, no. 5, pp. 607–629, 2010.

  2.

  M. Wiesner and D. Pfeifer, “Health recommender systems: concepts, requirements, technical basics and challenges,” International journal of environmental research and public health, vol. 11, no. 3, pp. 2580–2607, 2014.

  3.

  B. Krulwich, “Lifestyle finder: Intelligent user profiling using large-scale demographic data,” AI magazine, vol. 18, no. 2, p. 37, 1997.

  4.

  S.-H. Min and I. Han, “Detection of the customer time-variant pattern for improving recommender systems,” Expert Systems with Applications, vol. 28, no. 2, pp. 189–199, 2005.

  5.

  A. S. Das, M. Datar, A. Garg and S. Rajaram, “Google news personalization: scalable online collaborative filtering,” in Proceedings of the 16th international conference on World Wide Web, Banff, AB, Canada, 2007.

  6.

  I. Guy, N. Zwerdling, I. Ronen, D. Carmel and E. Uziel, “Social media recommendation based on people and tags,” in Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval, Geneva, Switzerland, 2010.

  7.

  R. W. White, P. Bailey and L. Chen, “Predicting user interests from contextual information,” in Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, Boston, MA, USA, 2009.

  8.

  R. W. White and J. Huang, “Assessing the scenic route: measuring the value of search trails in web logs,” in Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval, Geneva, Switzerland, 2010.

  9.

  M. Harvey, F. Crestani and M. J. Carman, “Building user profiles from topic models for personalised search,” in Proceedings of the 22nd ACM international conference on Conference on information & knowledge management, San Francisco, CA, USA, 2013.

  10.

  P. N. Bennett, R. W. White, W. Chu, S. T. Dumais, P. Bailey, F. Borisyuk and X. Cui, “Modeling the Impact of Short- and Long-term Behavior on Search Personalization,” in Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval, Portland, Oregon, USA, 2012.

  11.

  M. Balabanovic and Y. Shoham, “Fab: content-based, collaborative recommendation,” Communications of the ACM, vol. 40, no. 3, pp. 66–72, 1997.

  12.

  M. Gao, K. Liu and Z. Wu, “Personalisation in web computing and informatics: Theories, techniques, applications, and future research,” Information Systems Frontiers, vol. 12, no. 5, pp. 607–629, 2010.

  13.

  M. R. Ghorab, D. Zhou, A. O’Connor and V. Wade, “Personalised information retrieval: survey and classification,” User Modeling and User-Adapted Interaction, vol. 23, no. 4, pp. 381–443, 2013.

  14.

  S. Gauch, M. Speretta, A. Chandramouli and A. Micarelli, “User profiles for personalized information access,” in The adaptive web, P. Brusilovsky, A. Kobsa and W. Nejdl, Eds., Berlin Heidelberg New York, Springer, 2007, pp. 54–89.

  15.

  A. Andersen and R. Karlsen, “User profiling through NFC interactions: Mining NFC-based user information from mobile devices and back-end systems,” in Proceedings of the 14th International Symposium on Mobility Management and Wireless Access, Malta, 2016.

  16.

  S. Panjwani, N. Shrivastava, S. Shukla and S. Jaiswal, “Understanding the privacy-personalization dilemma for web search: a user perspective,” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Paris, France, 2013.

  17.

  M. Alaggan, S. Gambs and A.-M. Kermarrec, “Heterogeneous differential privacy,” arXiv preprint arXiv:1504.06998, 2015.

  18.

  A. Andersen, T. Hardersen and N. Schirmer, “Privacy for Cloud Storage,” in ISSE 2014 Securing Electronic Business Processes: Highlights of the Information Security Solutions Europe 2014 Conference, Springer, 2014.

  19.

  E. Toch, Y. Wang and L. F. Cranor, “Personalization and privacy: a survey of privacy risks and remedies in personalization-based systems,” User Modeling and User-Adapted Interaction, vol. 22, no. 1–2, pp. 203–220, 2012.

  20.

  A. Friedman, B. P. Knijnenburg, K. Vanhecke, L. Martens and S. Berkovsky, “Privacy aspects of recommender systems,” in Recommender Systems Handbook, Springer, 2015, pp. 649–688.

  21.

  L. Cassel and U. Wolz, “Client Side Personalization,” in Proceedings of the joint DELOS-NSF workshop on personalization and recommender systems in digital libraries, Dublin, 2001.

  22.

  M. Asif and J. Krogstie, “Mobile client-side personalization,” in Proceedings of the 2013 International Conference on Privacy and Security in Mobile Systems (PRISMS), 2013.

  23.

  A. Munch-Ellingsen, A. Andersen, S. Akselsen and R. Karlsen, “Customer managed security domain on mobile network operators’ SIM cards: Opportunities to enable new business models,” in Marktplätze im Umbruch: Digitale Strategien und das Zusammenwachsen von Shop, Online-Business sowie Services im Mobilen Internet, Springer, 2015.

  24.

  C. Dwork, “Differential privacy: A survey of results,” in Proceedings of the International Conference on Theory and Applications of Models of Computation, 2008.

  25.

  A. Andersen, K. Y. Yigzaw and R. Karlsen, “Privacy preserving health data processing,” in Proceedings of the 16th International Conference on e-Health Networking, Applications and Services (Healthcom), 2014.

  26.

  C. Gentry, A fully homomorphic encryption scheme, Stanford University, 2009.

  Further Reading

  27.

  M. Gao, K. Liu and Z. Wu, “Personalisation in web computing and informatics: Theories, techniques, applications, and future research,” Information Systems Frontiers, vol. 12, no. 5, pp. 607–629, 2010.

 
