
Secure and Scalable Document Similarity on Distributed Databases: Differential Privacy to the Rescue

Authors: Schoppmann, P., Vogelsang, L., Gascón, A., & Balle, B.
Published in: Proceedings on Privacy Enhancing Technologies, 2020(2), 209–229
Year: 2020
Type: Academic articles
DOI: 10.2478/popets-2020-0024

Privacy-preserving collaborative data analysis enables richer models than what each party can learn with their own data. Secure Multi-Party Computation (MPC) offers a robust cryptographic approach to this problem, and in fact several protocols have been proposed for various data analysis and machine learning tasks. In this work, we focus on secure similarity computation between text documents, and the application to k-nearest neighbors (k-NN) classification. Due to its non-parametric nature, k-NN presents scalability challenges in the MPC setting. Previous work addresses these by introducing non-standard assumptions about the abilities of an attacker, for example by relying on non-colluding servers. In this work, we tackle the scalability challenge from a different angle, and instead introduce a secure preprocessing phase that reveals differentially private (DP) statistics about the data. This allows us to exploit the inherent sparsity of text data and significantly speed up all subsequent classifications.
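
To make the ingredients named in the abstract concrete, here is a minimal plaintext sketch in Python. It is not the paper's protocol: nothing here is secure or run under MPC, and every function name, the bag-of-words representation, the cosine measure, and the choice of a simple count as the noised statistic are assumptions made for illustration. It only shows (1) why sparse text vectors make similarity cheap, (2) k-NN classification by majority vote, and (3) the standard Laplace mechanism for releasing a count with differential privacy.

```python
import math
import random
from collections import Counter

def sparse_vector(text):
    """Bag-of-words counts stored as a dict, so absent terms cost nothing."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine similarity; the dot product only walks the smaller vector."""
    if len(a) > len(b):
        a, b = b, a
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def knn_classify(query, labelled_docs, k=3):
    """Majority vote among the k labelled documents most similar to the query."""
    q = sparse_vector(query)
    ranked = sorted(
        labelled_docs,
        key=lambda doc: cosine_similarity(q, sparse_vector(doc[0])),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def laplace_noisy_count(true_count, epsilon, sensitivity=1.0, rng=random):
    """Laplace mechanism: add Laplace noise with scale sensitivity / epsilon.

    Which statistic gets noised is an assumption of this sketch,
    e.g. a hypothetical count of non-zero entries in a document vector.
    """
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials is Laplace-distributed.
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return true_count + noise

# Hypothetical toy corpus, for illustration only.
docs = [
    ("cheap pills buy now", "spam"),
    ("buy cheap watches now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project meeting notes", "ham"),
]
label = knn_classify("cheap pills now", docs, k=3)
noisy_nonzeros = laplace_noisy_count(len(sparse_vector("cheap pills now")), epsilon=1.0)
```

In this sketch the cost of each similarity is driven by the number of non-zero terms rather than the vocabulary size, which is the kind of sparsity the abstract refers to; how the paper exploits it securely is described in the publication itself.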


Connected HIIG researchers

Phillipp Schoppmann

Former Associated Researcher: Data, actors, infrastructures


  • Open Access
  • Peer Reviewed
