Secure and Scalable Document Similarity on Distributed Databases: Differential Privacy to the Rescue
| Authors: | Schoppmann, P., Vogelsang, L., Gascón, A., & Balle, B. |
| Published in: | Proceedings on Privacy Enhancing Technologies, 2020(2), 209–229 |
| Year: | 2020 |
| Type: | Academic articles |
| DOI: | 10.2478/popets-2020-0024 |
Privacy-preserving collaborative data analysis enables richer models than what each party can learn with their own data. Secure Multi-Party Computation (MPC) offers a robust cryptographic approach to this problem, and in fact several protocols have been proposed for various data analysis and machine learning tasks. In this work, we focus on secure similarity computation between text documents, and the application to k-nearest neighbors (k-NN) classification. Due to its non-parametric nature, k-NN presents scalability challenges in the MPC setting. Previous work addresses these by introducing non-standard assumptions about the abilities of an attacker, for example by relying on non-colluding servers. In this work, we tackle the scalability challenge from a different angle, and instead introduce a secure preprocessing phase that reveals differentially private (DP) statistics about the data. This allows us to exploit the inherent sparsity of text data and significantly speed up all subsequent classifications.

Connected HIIG researchers
Phillipp Schoppmann
- Open Access
- Peer Reviewed
