Scalable Distributed Data Anonymization
This demo paper presents an approach for distributed anonymization of large collections of sensor data. The approach anonymizes large datasets (which might not fit in main memory) using an arbitrary number of workers within the Spark framework. We describe how to parallelize the anonymization process through a proper partitioning of the dataset. The experimental evaluation shows that the proposed approach is scalable and does not affect the quality of the anonymized dataset.
Authors:
Sabrina De Capitani di Vimercati, Dario Facchinetti, Sara Foresti, Gianluca Oldani, Stefano Paraboschi, Matthew Rossi, Pierangela Samarati
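
The partition-then-anonymize idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not use Spark: it assumes a single numeric quasi-identifier, a k-anonymity-style requirement (each generalized value shared by at least k records), quantile-based partitioning, and a local thread pool standing in for distributed workers. All function names (`partition_by_quantile`, `anonymize_fragment`, `anonymize`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_by_quantile(records, key, num_workers):
    """Sort on one quasi-identifier and cut into contiguous, near-equal fragments."""
    ordered = sorted(records, key=lambda r: r[key])
    size, extra = divmod(len(ordered), num_workers)
    fragments, start = [], 0
    for i in range(num_workers):
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty fragments when workers outnumber records
            fragments.append(ordered[start:end])
        start = end
    return fragments


def anonymize_fragment(fragment, key, k):
    """Replace the quasi-identifier with a (min, max) range shared by >= k records.

    Assumes the fragment is already sorted on `key` (as produced above).
    """
    if len(fragment) < k:
        raise ValueError("fragment holds fewer than k records")
    out = []
    n_groups = len(fragment) // k
    for g in range(n_groups):
        lo = g * k
        # The last group absorbs the remainder so no group falls below k.
        hi = len(fragment) if g == n_groups - 1 else lo + k
        block = fragment[lo:hi]
        label = (block[0][key], block[-1][key])
        out.extend({**r, key: label} for r in block)
    return out


def anonymize(records, key, k, num_workers):
    """Partition the dataset, anonymize each fragment independently, merge results."""
    fragments = partition_by_quantile(records, key, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = pool.map(lambda f: anonymize_fragment(f, key, k), fragments)
    return [r for part in results for r in part]
```

Because each fragment covers a contiguous range of the quasi-identifier, workers can generalize their fragment without coordinating with one another, which is the property that makes this kind of partitioning parallelizable. In this sketch, a call such as `anonymize(records, "age", k=3, num_workers=2)` returns the same records with `age` replaced by a range tuple.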