When using a public communication channel -- whether formal or informal, such as commenting or posting on social media -- end users have no expectation of privacy: they compose a message and broadcast it for the world to see. Even if an end user takes the utmost precautions to anonymize their online presence -- using an alias or pseudonym; masking their IP address; spoofing their geolocation; concealing their operating system and user agent; deploying encryption; registering with a disposable phone number or email address; disabling non-essential settings; revoking permissions; and blocking cookies and fingerprinting -- one obvious element still lingers: the message itself. Assuming they avoid lapses in judgment and accidental self-exposure, there should be little evidence left to establish their actual identity, right? Wrong. The content of their message -- necessarily open for public consumption -- exposes an attack vector: stylometric analysis, or author profiling. In this paper, we dissect the technique of stylometry, discuss its counter-strategy, adversarial stylometry, and devise enhancements to that counter-strategy through Unicode steganography.