Speaker: Stephen Clark
Date: May 10, 2012
Title: Linguistic Steganography: Information Hiding in Text

Linguistic steganography is concerned with hiding information in natural language text for the purpose of sending secret messages. A related area is natural language watermarking, in which information is added to a text in order to identify it, for example for copyright purposes. Linguistic steganography algorithms hide information by manipulating properties of the text, for example by replacing some words with their synonyms. Unlike image-based steganography, linguistic steganography is in its infancy, with little existing work. In this talk I will motivate the problem, in particular as an interesting application for natural language processing (NLP) and especially for natural language generation. Linguistic steganography is a difficult NLP problem because any change to the cover text must retain the meaning and style of the original in order to prevent detection by an adversary.
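As a rough illustration of the synonym-replacement idea mentioned above, the following minimal sketch hides one bit per substitutable word. It is not the method presented in the talk: the synonym pairs, the names `SYNONYMS`, `embed` and `extract`, and the one-bit-per-word encoding are all assumptions made for this example, and it ignores the hard part of the problem, namely checking that each substitution preserves meaning and style in context.

```python
# Hypothetical synonym pairs (an assumption for this sketch).
# A word's index within its pair is the bit it encodes: 0 or 1.
SYNONYMS = [
    ("big", "large"),
    ("quick", "fast"),
    ("begin", "start"),
]

# Map each word to (its pair, the bit it encodes).
LOOKUP = {word: (pair, bit) for pair in SYNONYMS for bit, word in enumerate(pair)}


def embed(cover_words, bits):
    """Rewrite substitutable words so that each one encodes the next secret bit."""
    remaining = list(bits)
    stego = []
    for word in cover_words:
        if word in LOOKUP and remaining:
            pair, _ = LOOKUP[word]
            stego.append(pair[remaining.pop(0)])
        else:
            stego.append(word)
    if remaining:
        raise ValueError("cover text has too few substitutable words")
    return stego


def extract(stego_words, n_bits):
    """Read the bits back from the substitutable words, in order."""
    bits = [LOOKUP[word][1] for word in stego_words if word in LOOKUP]
    return bits[:n_bits]


cover = "the big dog will begin a quick run".split()
stego = embed(cover, [1, 0, 1])
print(" ".join(stego))
print(extract(stego, 3))
```

The sender and receiver must share the synonym table and the message length in advance; in this sketch both are simply passed as arguments.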

I will describe a number of linguistic transformations that we have investigated, including synonym substitution and adjective deletion. For adjective deletion I will describe a novel secret sharing scheme in which many people receive a copy of the original text, but with different adjectives deleted; only when the various texts are combined can the secret message be revealed.
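The abstract does not specify how the secret sharing scheme works. Purely to make the idea concrete, the sketch below uses a generic XOR-based n-of-n sharing, which may differ from the scheme in the talk. The assumptions here are: each adjective in the text is a distinct word, deleting an adjective in a copy encodes a 1 bit and keeping it encodes a 0 bit, and the secret is the XOR of the bit patterns across all copies. The function names `make_shares` and `recover` are hypothetical.

```python
import secrets


def make_shares(words, adjectives, secret_bits, n_shares):
    """Produce n_shares copies of the text, each with different adjectives deleted.

    Assumes every adjective is a distinct word in `words` (a simplification).
    """
    if len(secret_bits) > len(adjectives):
        raise ValueError("not enough adjectives to carry the secret")
    k = len(secret_bits)
    # The first n-1 deletion patterns are random bits.
    masks = [[secrets.randbelow(2) for _ in range(k)] for _ in range(n_shares - 1)]
    # The last pattern is chosen so that the XOR of all patterns is the secret.
    last = list(secret_bits)
    for mask in masks:
        last = [a ^ b for a, b in zip(last, mask)]
    masks.append(last)
    shares = []
    for mask in masks:
        deleted = {adj for adj, bit in zip(adjectives, mask) if bit}
        shares.append([w for w in words if w not in deleted])
    return shares


def recover(shares, adjectives, n_bits):
    """XOR the deletion patterns of all copies to reveal the secret bits."""
    bits = [0] * n_bits
    for share in shares:
        present = set(share)
        for i, adj in enumerate(adjectives[:n_bits]):
            bits[i] ^= 0 if adj in present else 1
    return bits


words = "the old man bought a small red car".split()
adjectives = ["old", "small", "red"]
shares = make_shares(words, adjectives, [1, 0, 1], n_shares=3)
print(recover(shares, adjectives, 3))
```

With this construction, any proper subset of the copies yields deletion patterns that are uniformly random, so the secret is only determined once all copies are combined.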

Joint work with Ching-Yun (Frannie) Chang.


Stephen Clark is a Senior Lecturer at the University of Cambridge Computer Laboratory, where he is a member of the Natural Language and Information Processing Research Group. He has a PhD in Computer Science and Artificial Intelligence from the University of Sussex, and a Philosophy degree from Cambridge. Previously he was a University Lecturer in Computer Science at Oxford University and a postdoctoral researcher at the University of Edinburgh. He works on a wide range of topics in Natural Language Processing, but his main research interest is syntactic and semantic analysis, with a focus on type-driven approaches (in particular Categorial Grammar) and the combination of symbolic and data-driven methods.
