Abstract:
Machine learning (ML)-based classifiers that take a privacy policy as input and predict relevant concepts are useful in applications such as (semi-)automated compliance analysis against the requirements of a specific data protection law such as the EU GDPR. Although many researchers have studied ML-based privacy policy concept classifiers, we observed multiple research gaps, e.g., the lack of a more complete GDPR taxonomy and limited consideration of hierarchical information in privacy policies. The speaker will introduce recent work that fills these research gaps. The speaker and his collaborators produced a more complete GDPR-oriented privacy policy concept taxonomy, constructed the first privacy policy corpus with explicit hierarchical information at three levels, and conducted the most comprehensive performance evaluation study of GDPR concept classifiers for privacy policies, covering many aspects that had not previously been studied systematically. Their work led to multiple findings and insights, including the usefulness of considering hierarchical contextual features and different hierarchical structures, the observation that a “one size fits all” approach may not work, the reduced performance of such classifiers on their newly constructed corpus, especially beyond the first level, and the necessity of splitting the training and testing sets by document.
The talk is based on the following research paper:
Peng Tang, Xin Li, Yuxin Chen, Weidong Qiu, Haochen Mei, Allison Holmes, Fenghua Li and Shujun Li (2026) A Comprehensive Study on GDPR-Oriented Analysis of Privacy Policies: Taxonomy, Corpus and GDPR Concept Classifiers. Accepted to IEEE Transactions on Dependable and Secure Computing in March 2026, in press with IEEE. The preprint is available on arXiv.org (https://doi.org/10.48550/arXiv.2410.04754), and the artefacts are available on GitHub (https://github.com/tp-sh/GDPR_privacy_policies).
Bio:
Shujun Li is Professor of Cyber Security and Director of the Institute of Cyber Security for Society (iCSS) at the University of Kent, UK. His main research interests are interdisciplinary topics related to cyber security and privacy, practical applications of different AI techniques, multimedia computing, human factors, and a wide range of socio-technical aspects such as cybercrime, cyber law and governance, AI safety, cyber education, digital health, and creative approaches to cyber security. More about his research and other activities can be found on his personal website: https://www.hooklee.com/.
Please use the link below to join this event online:
https://teams.microsoft.com/meet/35984532878605?p=spk2jPjgPxCXxnImZ3
Meeting ID: 359 845 328 786 05
Passcode: sn9Fj3jQ