Shay Cohen's homepage

Shay Cohen's publications
Theorem Prover as a Judge for Synthetic Data Generation, Joshua Ong, Giwon Hong, Wenda Li and Shay B. Cohen, ACL 2025 PersonaLens: A Benchmark for Personalization Evaluation in Conversational AI Assistants, Zheng Zhao et al., Findings of ACL 2025 Eliciting In-context Retrieval and Reasoning for Long-context Language Models, Yifu Qiu et al., Findings of ACL 2025 Transferrable Surrogates in Expressive Neural Architecture Search Spaces, Shiwen Qin, Gabriela Kadlecova, Martin Pilat, Shay B. Cohen, Roman Neruda, Elliot J. Crowley, Jovita Lukasik and Linus Ericsson, AutoML 2025 Pre-training Time Series Models with Stock Data Customization, Mengyu Wang, Tiejun Ma and Shay B. Cohen, KDD 2025 (accepted) PoisonBench: Assessing Large Language Model Vulnerability to Poisoned Preference Data, Tingchen Fu, Mrinank Sharma, Philip Torr, Shay B. Cohen, David Krueger, Fazl Barez, ICML 2025 (accepted) DEPfold: RNA Secondary Structure Prediction as Dependency Parsing, Ke Wang and Shay B. Cohen, ICLR 2025 What can Large Language Models Capture about Code Functional Equivalence? Nickil Maveli, Antonio Vergari and Shay B. Cohen, Findings of NAACL 2025 People Attribute Purpose to Autonomous Vehicles When Explaining Their Behavior, Balint Gyevnar, Stephanie Droop, Tadeg Quillien, Shay B. Cohen, Neil R. Bramley, Christopher G. Lucas and Stefano V. Albrecht, CHI 2025 TSPRank: Bridging Pairwise and Listwise Methods with a Bilinear Travelling Salesman Model, Weixian Waylon Li, Yftah Ziser, Yifei Xie, Shay B. Cohen and Tiejun Ma, KDD 2025 (accepted) einspace: Searching for Neural Architectures from Fundamental Operations, Linus Ericsson et al., NeurIPS 2024 [project page] Spectral Editing of Activations for Large Language Model Alignment, Yifu Qiu, Zheng Zhao, Yftah Ziser, Anna Korhonen, Edoardo M. Ponti, Shay B. Cohen, NeurIPS 2024 Layer by Layer: Uncovering Where Multi-Task Learning Happens in Instruction-Tuned Large Language Models, Zheng Zhao, Yftah Ziser and Shay B. 
Cohen, EMNLP 2024 Interpreting Context Look-ups in Transformers: Investigating Attention-MLP Interactions, Clement Neo, Shay B. Cohen and Fazl Barez, EMNLP 2024 Modeling News Interactions and Influence for Financial Market Prediction, Mengyu Wang, Shay B. Cohen and Tiejun Ma, Findings of EMNLP 2024 Evaluating Automatic Metrics with Incremental Machine Translation Systems, Guojun Wu, Shay B. Cohen and Rico Sennrich, Findings of EMNLP 2024 (short) Can Large Language Model Summarizers Adapt to Diverse Scientific Communication Goals?, Marcio Fonseca and Shay B. Cohen, Findings of ACL 2024 Can Large Language Models Follow Concept Annotation Guidelines? A Case Study on Scientific and Financial Domains, Marcio Fonseca and Shay B. Cohen, Findings of ACL 2024 Large Language Models Relearn Removed Concepts, Michelle Lo, Shay B. Cohen, Fazl Barez, Findings of ACL 2024 On the Trade-off between Redundancy and Cohesiveness in Extractive Summarization, Ronald Cardenas, Matthias Galle, Shay B. Cohen, JAIR 2024 LeanReasoner: Boosting Complex Logical Reasoning with Lean, Dongwei Jiang, Marcio Fonseca, Shay B. Cohen, In NAACL 2024 Are Large Language Models Temporally Grounded?, Yifu Qiu, Zheng Zhao, Yftah Ziser, Anna Korhonen, Edoardo Ponti, Shay B. Cohen, In NAACL 2024 Think While You Write: Hypothesis Verification Promotes Faithful Knowledge-to-Text Generation, Yifu Qiu, Varun Embar, Shay B. Cohen, Benjamin Han, In Findings of NAACL 2024 Causal Explanations for Sequential Decision-Making in Multi-Agent Systems, Balint Gyevnar, Cheng Wang, Christopher G. Lucas, Shay B. Cohen, Stefano V. Albrecht, In AAMAS 2024 CivilSum: A Dataset for Abstractive Summarization of Indian Court Decisions, Manuj Malik, Zheng Zhao, Marcio Fonseca, Shrisha Rao and Shay B. Cohen, SIGIR 2024 Detecting and Mitigating Hallucinations in Multilingual Summarisation, Yifu Qiu, Yftah Ziser, Anna Korhonen, Edoardo Ponti, Shay B. 
Cohen, In EMNLP 2023 AMR Parsing is Far from Solved: GrAPES, the Granular AMR Parsing Evaluation Suite, Jonas Groschwitz, Shay B. Cohen, Lucia Donatelli, Meaghan Fowlie, In EMNLP 2023 A Joint Matrix Factorization Analysis of Multilingual Representations, Zheng Zhao, Yftah Ziser, Bonnie Webber, Shay B. Cohen, In Findings of EMNLP 2023 PMIndiaSum: Multilingual and Cross-lingual Headline Summarization for Languages in India, Ashok Urlana, Pinzhen Chen, Zheng Zhao, Shay B. Cohen, Manish Shrivastava, Barry Haddow, In Findings of EMNLP 2023 The Larger They Are, the Harder They Fail: Language Models do not Recognize Identifier Swaps in Python, Antonio Miceli, Fazl Barez, Ioannis Konstas and Shay B. Cohen, In ACL 2023 (short) Knowledge Base Question Answering for Space Debris Queries, Paul Darm, Antonio Miceli, Shay B. Cohen and Annalisa Ricardi, In ACL 2023 (industry track) Rant or Rave: Variation over Time in the Language of Online Reviews, Yftah Ziser, Bonnie Webber and Shay B. Cohen, In LRE 2023 Erasure of Unaligned Attributes from Neural Representations, Shun Shao, Yftah Ziser and Shay B. Cohen, In TACL 2023 [mini-errata] Gold Doesn't Always Glitter: Spectral Removal of Linear and Nonlinear Guarded Attribute Information, Shun Shao, Yftah Ziser and Shay B. Cohen, In EACL 2023 BERT is not The Count: Learning to Match Mathematical Statements with Proofs, Weixian Li, Yftah Ziser, Maximin Coavoux and Shay B. Cohen, In EACL 2023 Nonparametric Learning of Two-Layer ReLU Residual Units, Zhunxuan Wang, Linyun He, Chunchuan Lyu and Shay B. Cohen, In Transactions on Machine Learning Research 2022 Factorizing Content and Budget Decisions in Abstractive Summarization of Long Documents, Marcio Fonseca, Yftah Ziser and Shay B. Cohen, In EMNLP 2022 Sentence-Incremental Neural Coreference Resolution, Matt Grenander, Shay B. Cohen and Mark Steedman, In EMNLP 2022 Abstractive Summarization Guided by Latent Hierarchical Document Structure, Yifu Qiu and Shay B. 
Cohen, In EMNLP 2022 Understanding Domain Learning in Language Models Through Subpopulation Analysis, Zheng Zhao, Yftah Ziser and Shay B. Cohen, In BlackboxNLP 2022 Co-training an Unsupervised Constituency Parser with Weak Supervision, Nickil Maveli and Shay B. Cohen, In Findings of ACL 2022 A Root of a Problem: Optimizing Single-Root Dependency Parsing, Milos Stanojevic and Shay B. Cohen, In EMNLP 2021 Open-Domain Contextual Link Prediction and its Complementarity with Entailment Graphs, Javad Hosseini, Shay B. Cohen, Mark Johnson and Mark Steedman, In Findings of EMNLP 2021 A Differentiable Relaxation of Graph Segmentation and Alignment for AMR Parsing, Chunchuan Lyu, Shay B. Cohen and Ivan Titov, In EMNLP 2021 A Closer Look into the Robustness of Neural Dependency Parsers Using Better Adversarial Examples, Yuxuan Wang et al., In Findings of ACL 2021 Text Generation from Discourse Representation Structures, Jiangming Liu, Shay B. Cohen and Mirella Lapata, In NAACL 2021 Universal Discourse Representation Structure Parsing, Jiangming Liu, Shay B. Cohen, Mirella Lapata and Johan Bos, Computational Linguistics 2021 Learning to Match Mathematical Statements with Proofs, Maximin Coavoux and Shay B. Cohen, arXiv preprint 2021 [note: this manuscript was completed in 2018, and rejected from NAACL 2019 and later on. Eventually, it was accepted in a newer form to EACL 2023, see above.] Narration Generation for Cartoon Videos, Nikos Papasarantopoulos and Shay B. Cohen, arXiv preprint 2021 Bottom-Up Unranked Tree-to-Graph Transducers for Translation into Semantic Graphs, Johanna Björklund, Shay B. Cohen, Frank Drewes and Giorgio Satta, Theoretical Computer Science 2020 Multi-Step Inference for Reasoning Over Paragraphs, Jiangming Liu, Matt Gardner, Shay B. Cohen and Mirella Lapata, In EMNLP 2020 Lightweight, Dynamic Graph Convolutional Networks for AMR-to-Text Generation, Yan Zhang, Zhijiang Guo, Zhiyang Teng, Wei Lu, Shay B. 
Cohen, Zuozhu Liu and Lidong Bing, In EMNLP 2020 The Role of Reentrancies in Abstract Meaning Representation Parsing, Ida Szubert, Marco Damonte, Shay B. Cohen and Mark Steedman, In Findings of EMNLP 2020 Reducing Quantity Hallucinations in Abstractive Summarization, Zheng Zhao, Shay B. Cohen and Bonnie Webber, In Findings of EMNLP 2020 English-to-Chinese Transliteration with a Phonetic Auxiliary Task, Yuan He and Shay B. Cohen, In AACL 2020 Tensors over Semirings for Latent-Variable Weighted Logic Programs, Esma Balkir, Daniel Gildea and Shay B. Cohen, In IWPT 2020 Obfuscation for Privacy-preserving Syntactic Parsing, Zhifeng Hu, Serhii Havrylov, Ivan Titov and Shay B. Cohen, In IWPT 2020 Machine Reading of Historical Events, Or Honovich, Lucas Torroba Hennigen, Omri Abend and Shay B. Cohen, In ACL 2020 Learning Dialog Policies from Weak Demonstrations, Gabriel Gordon-Hall, Philip John Gorinski and Shay B. Cohen, In ACL 2020 DScorer: A Fast Evaluation Metric for Discourse Representation Structure Parsing, Jiangming Liu, Shay B. Cohen and Mirella Lapata, In ACL short 2020 Learning Latent Forests for Medical Relation Extraction, Zhijiang Guo, Guoshun Nan, Wei Lu, Shay B. Cohen, In IJCAI 2020 Compositional Languages Emerge in a Neural Iterated Learning Model, Yi Ren, Shangmin Guo, Matthieu Labeau, Shay B. Cohen, Simon Kirby, In ICLR 2020 Semantic Role Labeling with Iterative Structure Refinement, Chunchuan Lyu, Shay B. Cohen and Ivan Titov, In EMNLP 2019 Experimenting with Power Divergences for Language Modeling, Matthieu Labeau and Shay B. Cohen, In EMNLP 2019 Partners in Crime: Multi-view Sequential Inference for Movie Understanding, Nikos Papasarantopoulos, Lea Frermann, Mirella Lapata and Shay B. Cohen, In EMNLP 2019 Bottom-Up Unranked Tree-to-Graph Transducers for Translation into Semantic Graphs, Johanna Björklund, Shay B. 
Cohen, Frank Drewes and Giorgio Satta, In FSMNLP 2019 Discourse Representation Structure Parsing with Recurrent Neural Networks and the Transformer Model, Jiangming Liu, Shay B. Cohen and Mirella Lapata, In the Shared Task on Semantic Parsing of IWCS 2019 (winning system) What is this Article about? Extreme Summarization with Topic-aware Convolutional Neural Networks, Shashi Narayan, Shay B. Cohen and Mirella Lapata, In JAIR 2019 Wide-Coverage Neural A* Parsing for Minimalist Grammars, John Torr, Milos Stanojevic, Mark Steedman and Shay B. Cohen, In ACL 2019 [pdf] [abstract] [bibtex] Minimalist Grammars \cite{Stabler1997} are a computationally oriented, and rigorous formalisation of many aspects of Chomsky's (\citeyear{Chomsky1995}) Minimalist Program. This paper presents the first ever application of this formalism to the task of realistic wide-coverage parsing. The parser uses a linguistically expressive yet highly constrained grammar, together with an adaptation of the A* search algorithm currently used in CCG parsing \cite{Lewis2014b,Lewis2016}, with supertag probabilities provided by a bi-LSTM neural network supertagger trained on MGbank, a corpus of MG derivation trees. We report on some promising initial experimental results for overall dependency recovery as well as on the recovery of certain unbounded long distance dependencies. Finally, although like other MG parsers, ours has a high order polynomial worst case time complexity, we show that in practice its expected time complexity is $\mathcal{O}(n^3)$. The parser is publicly available.\footnote{\url{https://github.com/mgparsing/astar_mg_parser}} @inproceedings{torr-19, title={Wide-Coverage Neural A* Parsing for Minimalist Grammars}, author={John Torr and Milos Stanojevic and Mark Steedman and Shay B. Cohen}, booktitle={Proceedings of {ACL}}, year={2019} } Discourse Representation Parsing for Sentences and Documents, Jiangming Liu, Shay B. 
Cohen and Mirella Lapata, In ACL 2019 [pdf] [abstract] [bibtex] We introduce a novel semantic parsing task based on Discourse Representation Theory (DRT; \citealt{kamp1993discourse}). Our model operates over Discourse Representation Tree Structures which we formally define for sentences and documents. We present a general framework for parsing discourse structures of arbitrary length and granularity. We achieve this with a neural model equipped with a supervised hierarchical attention mechanism and a linguistically-motivated copy strategy. Experimental results on sentence- and document-level benchmarks show that our model outperforms competitive baselines by a wide margin. @inproceedings{liu-19, title={Discourse Representation Parsing for Sentences and Documents}, author={Jiangming Liu and Shay B. Cohen and Mirella Lapata}, booktitle={Proceedings of {ACL}}, year={2019} } Duality of Link Prediction and Entailment Graph Induction, Mohammad Javad Hosseini, Shay B. Cohen, Mark Johnson and Mark Steedman, In ACL 2019 [pdf] [abstract] [bibtex] Link prediction and entailment graph induction are often treated as different problems. In this paper, we show that these two problems are actually complementary. We train a link prediction model on a knowledge graph of assertions extracted from raw text. We propose an entailment score that exploits the new facts discovered by the link prediction model, and then form entailment graphs between relations. We further use the learned entailments to predict improved link prediction scores. Our results show that the two tasks can benefit from each other. 
The new entailment score outperforms prior state-of-the-art results on a standard entailment dataset and the new link prediction scores show improvements over the raw link prediction scores. @inproceedings{hosseini-19, title={Duality of Link Prediction and Entailment Graph Induction}, author={Mohammad Javad Hosseini and Shay B. Cohen and Mark Johnson and Mark Steedman}, booktitle={Proceedings of {ACL}}, year={2019} } PartCrafter: Find, Generate, and Analyze BioParts, Emily Scher, Shay B. Cohen and Guido Sanguinetti, In Synthetic Biology 2019 [website] [abstract] [bibtex] The field of Synthetic Biology is both practically and philosophically reliant on the idea of BioParts—concrete DNA sequences meant to represent discrete functionalities. While there are a number of software tools which allow users to design complex DNA sequences by stitching together BioParts or genetic features into genetic devices, there is a lack of tools assisting Synthetic Biologists in finding BioParts and in generating new ones. In practice, researchers often find BioParts in an ad hoc way. We present PartCrafter, a tool which extracts and aggregates genomic feature data in order to facilitate the search for new BioParts with specific functionalities. PartCrafter can also turn a genomic feature into a BioPart by packaging it according to any manufacturing standard, codon optimizing it for a new host, and removing forbidden sites. PartCrafter is available at partcrafter.com. @article{scher-19, title={PartCrafter: Find, Generate, and Analyze BioParts}, author={Emily Scher and Shay B. Cohen and Guido Sanguinetti}, journal={Synthetic Biology}, year={2019} } Structural Neural Encoders for AMR-to-text Generation, Marco Damonte and Shay B. Cohen, In NAACL 2019 [pdf] [abstract] [bibtex] AMR-to-text generation is a problem recently introduced to the NLP community, in which the goal is to generate sentences from Abstract Meaning Representation (AMR) graphs. 
Sequence-to-sequence models can be used to this end by converting the AMR graphs to strings. Approaching the problem while working directly with graphs requires the use of graph-to-sequence models that encode the AMR graph into a vector representation. Such encoding has been shown to be beneficial in the past, and unlike sequential encoding, it allows us to explicitly capture reentrant structures in the AMR graphs. We investigate the extent to which reentrancies (nodes with multiple parents) have an impact on AMR-to-text generation by comparing graph encoders to tree encoders, where reentrancies are not preserved. We show that improvements in the treatment of reentrancies and long-range dependencies contribute to higher overall scores for graph encoders. Our best model achieves 24.40 BLEU on LDC2015E86, outperforming the state of the art by 1.1 points and 24.54 BLEU on LDC2017T10, outperforming the state of the art by 1.24 points. @inproceedings{damonte-19, title={Structural Neural Encoders for AMR-to-text Generation}, author={Marco Damonte and Shay B. Cohen}, booktitle={Proceedings of {NAACL}}, year={2019} } Jointly Extracting and Compressing Documents with Summary State Representations, Afonso Mendes, Shashi Narayan, Sebastião Miranda, Zita Marinho, André F. T. Martins and Shay B. Cohen, In NAACL 2019 [pdf] [abstract] [bibtex] We present a new neural model for text summarization that first extracts sentences from a document and then compresses them. The proposed model offers a balance that sidesteps the difficulties in abstractive methods while generating more concise summaries than extractive methods. In addition, our model dynamically determines the length of the output summary based on the gold summaries it observes during training and does not require length constraints typical to extractive summarization. The model achieves state-of-the-art results on the CNN/DailyMail and Newsroom datasets, improving over current extractive and abstractive methods. 
Human evaluations demonstrate that our model generates concise and informative summaries. We also make available a new dataset of oracle compressive summaries derived automatically from the CNN/DailyMail reference summaries. @inproceedings{mendes-19, title={Jointly Extracting and Compressing Documents with Summary State Representations}, author={Afonso Mendes and Shashi Narayan and Sebasti{\~a}o Miranda and Zita Marinho and Andr{\'{e}} F. T. Martins and Shay B. Cohen}, booktitle={Proceedings of {NAACL}}, year={2019} } Discontinuous Constituency Parsing with a Stack-free Transition System and a Dynamic Oracle, Maximin Coavoux and Shay B. Cohen, In NAACL 2019 [pdf] [abstract] [bibtex] Lexicalized parsing models are based on the assumptions that (i) constituents are organized around a lexical head and (ii) bilexical statistics are crucial to solve ambiguities. In this paper, we introduce an unlexicalized transition-based parser for discontinuous constituency structures, based on a structure-label transition system and a bi-LSTM scoring system. We compare it to lexicalized parsing models in order to address the question of lexicalization in the context of discontinuous constituency parsing. Our experiments show that unlexicalized models systematically achieve higher results than lexicalized models, and provide additional empirical evidence that lexicalization is not necessary to achieve strong parsing results. Our best unlexicalized model sets a new state of the art on English and German discontinuous constituency treebanks. We further provide a per-phenomenon analysis of its errors on discontinuous constituents. @inproceedings{coavoux-19b, title={Discontinuous Constituency Parsing with a Stack-free Transition System and a Dynamic Oracle}, author={Maximin Coavoux and Shay B. Cohen}, booktitle={Proceedings of {NAACL}}, year={2019} } Unlexicalized Transition-based Discontinuous Constituency Parsing, Maximin Coavoux, Benoît Crabbé and Shay B. 
Cohen, In TACL 2019 [pdf] [abstract] [bibtex] Lexicalized parsing models are based on the assumptions that (i) constituents are organized around a lexical head and (ii) bilexical statistics are crucial to solve ambiguities. In this paper, we introduce an unlexicalized transition-based parser for discontinuous constituency structures, based on a structure-label transition system and a bi-LSTM scoring system. We compare it to lexicalized parsing models in order to address the question of lexicalization in the context of discontinuous constituency parsing. Our experiments show that unlexicalized models systematically achieve higher results than lexicalized models, and provide additional empirical evidence that lexicalization is not necessary to achieve strong parsing results. Our best unlexicalized model sets a new state of the art on English and German discontinuous constituency treebanks. We further provide a per-phenomenon analysis of its errors on discontinuous constituents. @article{coavoux-19, title={Unlexicalized Transition-based Discontinuous Constituency Parsing}, author={Maximin Coavoux and Beno{\^{\i}}t Crabb{\'{e}} and Shay B. Cohen}, journal = "Transactions of the Association for Computational Linguistics", year={2019} } Learning Typed Entailment Graphs with Global Soft Constraints, Mohammad Javad Hosseini, Nathanael Chambers, Siva Reddy, Xavier Holt, Shay B. Cohen, Mark Johnson, and Mark Steedman, In TACL 2018 [pdf] [abstract] [bibtex] This paper presents a new method for learning typed entailment graphs from text. We extract predicate-argument structures from multiple-source news corpora, and compute local distributional similarity scores to learn entailments between predicates with typed arguments (e.g.,\ {\em person} contracted {\em disease}). Previous work has used transitivity constraints to improve local decisions, but these constraints are intractable on large graphs. 
We instead propose a scalable method that learns globally consistent similarity scores based on new soft constraints that consider both the structures across typed entailment graphs and inside each graph. Learning takes only a few hours to run over $100$K predicates and our results show large improvements over local similarity scores on two entailment datasets. We further show improvements over paraphrases and entailments from the Paraphrase Database, and prior state-of-the-art entailment graphs. We show that the entailment graphs improve performance in a downstream task. @article{hosseini-18, title={Learning Typed Entailment Graphs with Global Soft Constraints}, author={Mohammad Javad Hosseini and Nathanael Chambers and Siva Reddy and Xavier Holt and Shay B. Cohen and Mark Johnson and Mark Steedman}, journal = "Transactions of the Association for Computational Linguistics", year={2018} } Privacy-preserving Neural Representations of Text, Maximin Coavoux, Shashi Narayan and Shay B. Cohen, In EMNLP 2018 [pdf] [abstract] [bibtex] [Maximin's slides] This article deals with adversarial attacks towards deep learning systems for Natural Language Processing (NLP), in the context of privacy protection. We study a specific type of attack: an attacker eavesdrops on the hidden representations of a neural text classifier and tries to recover information about the input text. Such a scenario may arise in situations when the computation of a neural network is shared across multiple devices, e.g.\ some hidden representation is computed by a user's device and sent to a cloud-based model. We measure the privacy of a hidden representation by the ability of an attacker to predict accurately specific private information from it and characterize the tradeoff between the privacy and the utility of neural representations. Finally, we propose several defense methods based on modified training objectives and show that they improve the privacy of neural representations. 
@inproceedings{coavoux-18, title={Privacy-preserving Neural Representations of Text}, author={Maximin Coavoux and Shashi Narayan and Shay B. Cohen}, booktitle={Proceedings of {EMNLP}}, year={2018} } Don't Give Me the Details, Just the Summary! Topic-aware Convolutional Neural Networks for Extreme Summarization, Shashi Narayan, Shay B. Cohen and Mirella Lapata, In EMNLP 2018 [pdf] [abstract] [bibtex] [Shashi's slides] We introduce \emph{extreme summarization}, a new single-document summarization task which does not favor extractive strategies and calls for an abstractive modeling approach. The idea is to create a short, one-sentence news summary answering the question ``What is the article about?''. We collect a real-world, large scale dataset for this task by harvesting online articles from the British Broadcasting Corporation (BBC). We propose a novel abstractive model which is conditioned on the article's topics and based entirely on convolutional neural networks. We demonstrate experimentally that this architecture captures long-range dependencies in a document and recognizes pertinent content, outperforming an oracle extractive system and state-of-the-art abstractive approaches when evaluated automatically and by humans.\footnote{Our dataset, code, and demo are available at: \url{https://github.com/shashiongithub/XSum}.} @inproceedings{narayan-18b, title={Don't Give Me the Details, Just the Summary! Topic-aware Convolutional Neural Networks for Extreme Summarization}, author={Shashi Narayan and Shay B. Cohen and Mirella Lapata}, booktitle={Proceedings of {EMNLP}}, year={2018} } Multilingual Clustering of Streaming News, Sebastião Miranda, Artūrs Znotiņš, Shay B. Cohen and Guntis Barzdins, In EMNLP 2018 [pdf] [abstract] [bibtex] Clustering news across languages enables efficient media monitoring by aggregating articles from multilingual sources into coherent stories. Doing so in an online setting allows scalable processing of massive news streams. 
To this end, we describe a novel method for clustering an incoming stream of multilingual documents into monolingual and crosslingual story clusters. Unlike typical clustering approaches that consider a small and known number of labels, we tackle the problem of discovering an ever growing number of cluster labels in an online fashion, using real news datasets in multiple languages. Our method is simple to implement, computationally efficient and produces state-of-the-art results on datasets in German, English and Spanish. @inproceedings{miranda-18, title={Multilingual Clustering of Streaming News}, author={Sebasti{\~a}o Miranda and Arturs Znotins and Shay B. Cohen and Guntis Barzdins}, booktitle={Proceedings of {EMNLP}}, year={2018} } Local String Transduction as Sequence Labeling, Joana Ribeiro, Shashi Narayan, Shay B. Cohen and Xavier Carreras, In COLING 2018 [pdf] [abstract] [bibtex] [errata] We show that the general problem of string transduction can be reduced to the problem of sequence labeling. While character deletions and insertions are allowed in string transduction, they do not exist in sequence labeling. We show how to overcome this difference. Our approach can be used with any sequence labeling algorithm and it works best for problems in which string transduction imposes a strong notion of locality (no long range dependencies). We experiment with spelling correction for social media, OCR correction, and morphological inflection, and we see that it behaves better than seq2seq models and yields state-of-the-art results in several cases. @inproceedings{ribeiro-18, title={Local String Transduction as Sequence Labeling}, author={Joana Ribeiro and Shashi Narayan and Shay B. Cohen and Xavier Carreras}, booktitle={Proceedings of {COLING}}, year={2018} } Discourse Representation Structure Parsing, Jiangming Liu, Shay B. Cohen and Mirella Lapata, In ACL 2018. 
[pdf] [abstract] [bibtex] [appendix] We introduce an open-domain neural semantic parser which generates formal meaning representations in the style of Discourse Representation Theory (DRT; \citealt{kamp1993discourse}). We propose a method which transforms Discourse Representation Structures (DRSs) to trees and develop a structure-aware model which decomposes the decoding process into three stages: basic DRS structure prediction, condition prediction (i.e.,~predicates and relations), and referent prediction (i.e.,~variables). Experimental results on the Groningen Meaning Bank (GMB) show that our model outperforms competitive baselines by a wide margin. @inproceedings{liu-18, title={Discourse Representation Structure Parsing}, author={Jiangming Liu and Shay B. Cohen and Mirella Lapata}, booktitle={Proceedings of {ACL}}, year={2018} } Stock Movement Prediction from Tweets and Historical Prices, Yumo Xu and Shay B. Cohen, In ACL 2018. [pdf] [abstract] [bibtex] Stock movement prediction is a challenging problem: the market is highly \textit{stochastic}, and we make \textit{temporally-dependent} predictions from \textit{chaotic} data. We treat these three complexities and present a novel deep generative model jointly exploiting text and price signals for this task. Unlike the case with discriminative or topic modeling, our model introduces recurrent, continuous latent variables for a better treatment of stochasticity, and uses neural variational inference to address the intractable posterior inference. We also provide a hybrid objective with temporal auxiliary to flexibly capture predictive dependencies. We demonstrate the state-of-the-art performance of our proposed model on a new stock movement prediction dataset which we collected.\footnote{\url{https://github.com/yumoxu/stocknet-dataset}} @inproceedings{xu-18, title={Stock Movement Prediction from Tweets and Historical Prices}, author={Yumo Xu and Shay B. 
Cohen}, booktitle={Proceedings of {ACL}}, year={2018} } Document Modeling with External Attention for Sentence Extraction, Shashi Narayan, Ronald Cardenas, Nikos Papasarantopoulos, Shay B. Cohen, Mirella Lapata, Jiangsheng Yu and Yi Chang, In ACL 2018. [pdf] [abstract] [bibtex] [appendix] [errata] Document modeling is essential to a variety of natural language understanding tasks. We propose to use external information to improve document modeling for problems that can be framed as sentence extraction. We develop a framework composed of a hierarchical document encoder and an attention-based extractor with attention over external information. We evaluate our model on extractive document summarization (where the external information is image captions and the title of the document) and answer selection (where the external information is a question). We show that our model consistently outperforms strong baselines, in terms of both informativeness and fluency (for CNN document summarization) and achieves state-of-the-art results for answer selection on WikiQA and NewsQA. @inproceedings{narayan-18b, title={Document Modeling with External Attention for Sentence Extraction}, author={Shashi Narayan and Ronald Cardenas and Nikos Papasarantopoulos and Shay B. Cohen and Mirella Lapata and Jiangsheng Yu and Yi Chang}, booktitle={Proceedings of {ACL}}, year={2018} } Cross-lingual Abstract Meaning Representation Parsing, Marco Damonte and Shay B. Cohen, In NAACL 2018 [pdf] [abstract] [bibtex] Abstract Meaning Representation (AMR) annotation efforts have mostly focused on English. In order to train parsers on other languages, we propose a method based on annotation projection, which involves exploiting annotations in a source language and a parallel corpus of the source language and a target language. Using English as the source language, we show promising results for Italian, Spanish, German and Chinese as target languages. 
Besides evaluating the target parsers on non-gold datasets, we further propose an evaluation method that exploits the English gold annotations and does not require access to gold annotations for the target languages. This is achieved by inverting the projection process: a new English parser is learned from the target language parser and evaluated on the existing English gold standard. @inproceedings{damonte-18, title={Cross-lingual Abstract Meaning Representation Parsing}, author={Marco Damonte and Shay B. Cohen}, booktitle={Proceedings of {NAACL}}, year={2018} } Ranking Sentences for Extractive Summarization with Reinforcement Learning, Shashi Narayan, Shay B. Cohen and Mirella Lapata, In NAACL 2018 [pdf] [abstract] [bibtex] Single document summarization is the task of producing a shorter version of a document while preserving its principal information content. In this paper we conceptualize extractive summarization as a sentence ranking task and propose a novel training algorithm which globally optimizes the ROUGE evaluation metric through a reinforcement learning objective. We use our algorithm to train a neural summarization model on the CNN and DailyMail datasets and demonstrate experimentally that it outperforms state-of-the-art extractive and abstractive systems when evaluated automatically and by humans. @inproceedings{narayan-18a, title={Ranking Sentences for Extractive Summarization with Reinforcement Learning}, author={Shashi Narayan and Shay B. Cohen and Mirella Lapata}, booktitle={Proceedings of {NAACL}}, year={2018} } Abstract Meaning Representation for Paraphrase Detection, Fuad Issa, Marco Damonte, Shay B. Cohen, Xiaohui Yan and Yi Chang, In NAACL 2018 [pdf] [abstract] [bibtex] Abstract Meaning Representation (AMR) parsing aims at abstracting away from the syntactic realization of a sentence, and denoting only its meaning in a canonical form. 
As such, it is ideal for paraphrase detection, a problem in which one is required to specify whether two sentences have the same meaning. We show that naive use of AMR in paraphrase detection is not necessarily useful, and turn to describe a technique based on latent semantic analysis in combination with AMR parsing that significantly advances state-of-the-art results in paraphrase detection for the Microsoft Research Paraphrase Corpus. Our best results in the transductive setting are 86.6\% for accuracy and 90.0\% for F_1 measure. @inproceedings{issa-18, title={Abstract Meaning Representation for Paraphrase Detection}, author={Fuad Issa and Marco Damonte and Shay B. Cohen and Xiaohui Yan and Yi Chang}, booktitle={Proceedings of {NAACL}}, year={2018} } Canonical Correlation Inference for Mapping Abstract Scenes to Text, Nikos Papasarantopoulos, Helen Jiang and Shay B. Cohen, In AAAI 2018 [pdf] [abstract] [bibtex] We describe a technique for structured prediction, based on canonical correlation analysis. Our learning algorithm finds two projections for the input and the output spaces that aim at projecting a given input and its correct output into points close to each other. We demonstrate our technique on a language-vision problem, namely the problem of giving a textual description to an "abstract scene". @inproceedings{papas-18, title={Canonical Correlation Inference for Mapping Abstract Scenes to Text}, author={Nikos Papasarantopoulos and Helen Jiang and Shay B. Cohen}, booktitle={Proceedings of {AAAI}}, year={2018} } Whodunnit? Crime Drama as a Case for Natural Language Understanding, Lea Frermann, Shay B. Cohen and Mirella Lapata, In TACL 2018 [pdf] [abstract] [bibtex] [Lea's slides] In this paper we argue that crime drama exemplified in television programs such as \emph{CSI: Crime Scene Investigation} is an ideal testbed for approximating real-world natural language understanding and the complex inferences associated with it. 
We propose to treat crime drama as a new inference task, capitalizing on the fact that each episode poses the same basic question (i.e.,~who committed the crime) and naturally provides the answer when the perpetrator is revealed. We develop a new dataset based on CSI episodes, formalize perpetrator identification as a sequence labeling problem, and develop an LSTM-based model which learns from multi-modal data. Experimental results show that an incremental inference strategy is key to making accurate guesses as well as learning from representations fusing textual, visual, and acoustic input. @article{frermann-18, author = "L. Frermann and S. B. Cohen and M. Lapata", title = "Whodunnit? Crime Drama as a Case for Natural Language Understanding", journal = "Transactions of the Association for Computational Linguistics", year = "2018" } Split and Rephrase, Shashi Narayan, Claire Gardent, Shay B. Cohen and Anastasia Shimorina, In EMNLP 2017 [pdf] [abstract] [bibtex] [Shashi's slides] We propose a new sentence simplification task (Split-and-Rephrase) where the aim is to split a complex sentence into a meaning preserving sequence of shorter sentences. Like sentence simplification, splitting-and-rephrasing has the potential of benefiting both natural language processing and societal applications. Because shorter sentences are generally better processed by NLP systems, it could be used as a preprocessing step which facilitates and improves the performance of parsers, semantic role labelers and machine translation systems. It should also be of use for people with reading disabilities because it allows the conversion of longer sentences into shorter ones. This paper makes two contributions towards this new task. First, we create and make available a benchmark consisting of 1,066,115 tuples mapping a single complex sentence to a sequence of sentences expressing the same meaning. (The task dataset is available here: https://github.com/shashiongithub/Split-and-Rephrase.) 
Second, we propose five models (vanilla sequence-to-sequence to semantically-motivated models) to understand the difficulty of the proposed task. @inproceedings{narayan-17, title={Split and Rephrase}, author={Shashi Narayan and Claire Gardent and Shay B. Cohen and Anastasia Shimorina}, booktitle={Proceedings of {EMNLP}}, year={2017} } Latent-Variable PCFGs: Background and Applications, Shay B. Cohen, In MOL 2017 [pdf] [abstract] [bibtex] [invited talk slides] Latent-variable probabilistic context-free grammars are latent-variable models that are based on context-free grammars. Nonterminals are associated with latent states that provide contextual information during the top-down rewriting process of the grammar. We survey a few of the techniques used to estimate such grammars and to parse text with them. We also give an overview of what the latent states represent for English Penn treebank parsing, and provide an overview of extensions and related models to these grammars. @inproceedings{cohen-17, title={Latent-Variable PCFGs: Background and Applications}, author={Shay B. Cohen}, booktitle={Proceedings of {MOL}}, year={2017} } The SUMMA Platform Prototype, Renars Liepins et al., In EACL 2017 (demo track) [pdf] An Incremental Parser for Abstract Meaning Representation, Marco Damonte, Shay B. Cohen and Giorgio Satta, In EACL 2017 [pdf] [arxiv] [abstract] [bibtex] [Marco's slides] Abstract Meaning Representation (AMR) is a semantic representation for natural language that embeds annotations related to traditional tasks such as named entity recognition, semantic role labeling, word sense disambiguation and co-reference resolution. We describe a transition-based parser for AMR that parses sentences left-to-right, in linear time.
We further propose a test-suite that assesses specific subtasks that are helpful in comparing AMR parsers, and show that our parser is competitive with the state of the art on the LDC2015E86 dataset and that it outperforms state-of-the-art parsers for recovering named entities and handling polarity. @inproceedings{damonte-17, title={An Incremental Parser for Abstract Meaning Representation}, author={Marco Damonte and Shay B. Cohen and Giorgio Satta}, booktitle={Proceedings of {EACL}}, year={2017} } Semi-Supervised Learning of Sequence Models with the Method of Moments, Zita Marinho, André F. T. Martins, Shay B. Cohen and Noah A. Smith, In EMNLP 2016 [pdf] [abstract] [bibtex] [Zita's slides] We propose a fast and scalable method for semi-supervised learning of sequence models, based on anchor words and moment matching. Our method can handle hidden Markov models with feature-based log-linear emissions. Unlike other semi-supervised methods, no decoding passes are necessary on the unlabeled data and no graph needs to be constructed---only one pass is necessary to collect moment statistics. The model parameters are estimated by solving a small quadratic program for each feature. Experiments on part-of-speech (POS) tagging for Twitter and for a low-resource language (Malagasy) show that our method can learn from very few annotated sentences. @inproceedings{marinho-16, title={Semi-Supervised Learning of Sequence Models with the Method of Moments}, author={Z. Marinho and A. F. T. Martins and S. B. Cohen and N. A. Smith}, booktitle={Proceedings of {EMNLP}}, year={2016} } Bayesian Analysis in Natural Language Processing, Shay B. Cohen, Synthesis Lectures on Human Language Technologies, Morgan and Claypool, 2016 [abstract] [bibtex] [website] [hardcopy] [amazon] Natural language processing (NLP) went through a profound transformation in the mid-1980s when it shifted to make heavy use of corpora and data-driven techniques to analyze language.
Since then, the use of statistical techniques in NLP has evolved in several ways. One such example of evolution took place in the late 1990s or early 2000s, when full-fledged Bayesian machinery was introduced to NLP. This Bayesian approach to NLP has come to accommodate for various shortcomings in the frequentist approach and to enrich it, especially in the unsupervised setting, where statistical learning is done without target prediction examples. We cover the methods and algorithms that are needed to fluently read Bayesian learning papers in NLP and to do research in the area. These methods and algorithms are partially borrowed from both machine learning and statistics and are partially developed "in-house" in NLP. We cover inference techniques such as Markov chain Monte Carlo sampling and variational inference, Bayesian estimation, and nonparametric modeling. We also cover fundamental concepts in Bayesian statistics such as prior distributions, conjugacy, and generative modeling. Finally, we cover some of the fundamental modeling techniques in NLP, such as grammar modeling, and their use with Bayesian analysis. Keywords: natural language processing, computational linguistics, Bayesian statistics, Bayesian NLP, statistical learning, inference in NLP, grammar modeling in NLP @book{cohen-16, title={Bayesian Analysis in Natural Language Processing}, author={Shay B. Cohen}, series = {Synthesis Lectures on Human Language Technologies}, publisher={Morgan and Claypool}, year={2016} } Encoding Prior Knowledge with Eigenword Embeddings, Dominique Osborne, Shashi Narayan and Shay B. Cohen, In TACL 2016 [pdf] [abstract] [bibtex] [Shashi's slides] Canonical correlation analysis (CCA) is a method for reducing the dimension of data represented using two views. It has been previously used to derive word embeddings, where one view indicates a word, and the other view indicates its context. 
We describe a way to incorporate prior knowledge into CCA, give a theoretical justification for it, and test it by deriving word embeddings and evaluating them on a myriad of datasets. @article{osborne-16, author = "D. Osborne and S. Narayan and S. B. Cohen", title = "Encoding Prior Knowledge with Eigenword Embeddings", journal = "Transactions of the Association for Computational Linguistics", year = "2016" } Optimizing Spectral Learning for Parsing, Shashi Narayan and Shay B. Cohen, In ACL 2016 [pdf] [abstract] [bibtex] [models] [Shashi's slides] We describe a search algorithm for optimizing the number of latent states when estimating latent-variable PCFGs with spectral methods. Our results show that contrary to the common belief that the number of latent states for each nonterminal in an L-PCFG can be decided in isolation with spectral methods, parsing results significantly improve if the number of latent states for each nonterminal is globally optimized, while taking into account interactions between the different nonterminals. In addition, we contribute an empirical analysis of spectral algorithms on eight morphologically rich languages: Basque, French, German, Hebrew, Hungarian, Korean, Polish and Swedish. Our results show that our estimation consistently performs better or close to coarse-to-fine expectation-maximization techniques for these languages. @inproceedings{narayan-16b, title={Optimizing Spectral Learning for Parsing}, author={Shashi Narayan and Shay B. Cohen}, booktitle={Proceedings of {ACL}}, year={2016} } Paraphrase Generation from Latent-Variable PCFGs for Semantic Parsing, Shashi Narayan, Siva Reddy and Shay B. Cohen, In INLG, 2016 [arxiv] [abstract] [bibtex] [Shashi's slides] One of the limitations of semantic parsing approaches to open-domain question answering is the lexicosyntactic gap between natural language questions and knowledge base entries -- there are many ways to ask a question, all with the same answer.
In this paper we propose to bridge this gap by generating paraphrases of the input question with the goal that at least one of them will be correctly mapped to a knowledge-base query. We introduce a novel grammar model for paraphrase generation that does not require any sentence-aligned paraphrase corpus. Our key idea is to leverage the flexibility and scalability of latent-variable probabilistic context-free grammars to sample paraphrases. We do an extrinsic evaluation of our paraphrases by plugging them into a semantic parser for Freebase. Our evaluation experiments on the WebQuestions benchmark dataset show that the performance of the semantic parser significantly improves over strong baselines. @inproceedings{narayan-16, title={Paraphrase Generation from Latent-Variable PCFGs for Semantic Parsing}, author={Shashi Narayan and Siva Reddy and Shay B. Cohen}, booktitle={Proceedings of {INLG}}, year={2016} } Parsing Linear Context-Free Rewriting Systems with Fast Matrix Multiplication, Shay B. Cohen and Daniel Gildea, In Computational Linguistics, 2016 [pdf] [abstract] [bibtex] [arxiv] We describe a matrix multiplication recognition algorithm for a subset of binary linear context-free rewriting systems (LCFRS) with running time O(n^{\omega d}) where M(m)=O(m^\omega) is the running time for m \times m matrix multiplication and d is the "contact rank" of the LCFRS -- the maximal number of combination and non-combination points that appear in the grammar rules. We also show that this algorithm can be used as a subroutine to get a recognition algorithm for general binary LCFRS with running time O(n^{\omega d+1}). The currently best known \omega is smaller than 2.38. Our result provides another proof for the best known result for parsing mildly context sensitive formalisms such as combinatory categorial grammars, head grammars, linear indexed grammars, and tree adjoining grammars, which can be parsed in time O(n^4.76).
It also shows that inversion transduction grammars can be parsed in time O(n^5.76). In addition, binary LCFRS subsumes many other formalisms and types of grammars, for some of which we also improve the asymptotic complexity of parsing. @article{cohen-15a, title={Parsing Linear Context-Free Rewriting Systems with Fast Matrix Multiplication}, author={Shay B. Cohen and Daniel Gildea}, journal={Computational Linguistics}, year={2016} } Low-Rank Approximation of Weighted Tree Automata, Guillaume Rabusseau, Borja Balle and Shay B. Cohen, In AISTATS 2016 [pdf] [abstract] [bibtex] [supplementary material] [arxiv] We describe a technique to minimize weighted tree automata (WTA), a powerful formalism that subsumes probabilistic context-free grammars (PCFGs) and latent-variable PCFGs. Our method relies on a singular value decomposition of the underlying Hankel matrix defined by the WTA. Our main theoretical result is an efficient algorithm for computing the SVD of an infinite Hankel matrix implicitly represented as a WTA. We provide an analysis of the approximation error induced by the minimization, and we evaluate our method on real-world data originating in newswire treebank. We show that the model achieves lower perplexity than previous methods for PCFG minimization, and also is much more stable due to the absence of local optima. @inproceedings{rabusseua-16, title={Low-Rank Approximation of Weighted Tree Automata}, author={Guillaume Rabusseau and Borja Balle and Shay B. Cohen}, booktitle={Proceedings of {AISTATS}}, year={2016} } Conversation Trees: A Grammar Model for Topic Structure in Forums, Annie Louis and Shay B. Cohen, In EMNLP 2015 [pdf] [abstract] [bibtex] [data] Online forum discussions proceed differently from face-to-face conversations and any single thread on an online forum contains posts on different subtopics. This work aims to characterize the content of a forum thread as a \emph{conversation tree} of topics.
We present models that jointly perform two tasks: segment a thread into subparts, and assign a topic to each part. Our core idea is a definition of topic structure using probabilistic grammars. By leveraging the flexibility of two grammar formalisms, Context-Free Grammars and Linear Context-Free Rewriting Systems, our models create desirable structures for forum threads: our topic segmentation is hierarchical, links non-adjacent segments on the same topic, and jointly labels the topic during segmentation. We show that our models outperform a number of tree generation baselines. @inproceedings{louis-15, title={Conversation Trees: A Grammar Model for Topic Structure in Forums}, author={Annie Louis and Shay B. Cohen}, booktitle={Proceedings of {EMNLP}}, year={2015} } Diversity in Spectral Learning for Natural Language Parsing, Shashi Narayan and Shay B. Cohen, In EMNLP 2015 [pdf] [abstract] [bibtex] We describe an approach to create a diverse set of predictions with spectral learning of latent-variable PCFGs (L-PCFGs). Our approach works by creating multiple spectral models where noise is added to the underlying features in the training set before the estimation of each model. We describe three ways to decode with multiple models. In addition, we describe a simple variant of the spectral algorithm for L-PCFGs that is fast and leads to compact models. Our experiments for natural language parsing, for English and German, show that we get a significant improvement over baselines comparable to state of the art. For English, we achieve the $F_1$ score of 90.18, and for German we achieve the $F_1$ score of 83.38. @inproceedings{narayan-15, title={Diversity in Spectral Learning for Natural Language Parsing}, author={Shashi Narayan and Shay B. Cohen}, booktitle={Proceedings of {EMNLP}}, year={2015} } A Coactive Learning View of Online Structured Prediction in Statistical Machine Translation, Artem Sokolov, Stefan Riezler and Shay B. 
Cohen, In CoNLL 2015 [pdf] [abstract] [bibtex] [Artem's slides] We present a theoretical analysis of online parameter tuning in statistical machine translation (SMT) from a coactive learning view. This perspective allows us to give regret and generalization bounds for latent perceptron algorithms that are common in SMT, but fall outside of the standard convex optimization scenario. Coactive learning also introduces the concept of weak feedback, which we apply in a proof-of-concept experiment to SMT, showing that learning from feedback that consists of slight improvements over predictions leads to convergence in regret and translation error rate. This suggests that coactive learning might be a viable framework for interactive machine translation. Furthermore, we find that surrogate translations replacing references that are unreachable in the decoder search space can be interpreted as weak feedback and lead to convergence in learning, if they admit an underlying linear model. @inproceedings{sokolv15, title={A Coactive Learning View of Online Structured Prediction in Statistical Machine Translation}, author={Artem Sokolov and Stefan Riezler and Shay B. Cohen}, booktitle={Proceedings of {CoNLL}}, year={2015} } Lexical Event Ordering with an Edge-Factored Model, Omri Abend, Shay B. Cohen and Mark Steedman, In NAACL 2015 [pdf] [abstract] [bibtex] [data] [slides] Extensive lexical knowledge is necessary for temporal analysis and planning tasks. We address in this paper a lexical setting that allows for the straightforward incorporation of rich features and structural constraints. We explore a lexical event ordering task, namely determining the likely temporal order of events based solely on the identity of their predicates and arguments. We propose an ``edge-factored'' model for the task that decomposes over the edges of the event graph. We learn it using the structured perceptron.
As lexical tasks require large amounts of text, we do not attempt manual annotation and instead use the textual order of events in a domain where this order is aligned with their temporal order, namely cooking recipes. @inproceedings{abend-15, author = "Omri Abend and S. B. Cohen and Mark Steedman", title = "Lexical Event Ordering with an Edge-Factored Model", booktitle = "Proceedings of {NAACL}", year = "2015" } Online Adaptor Grammars with Hybrid Inference, Ke Zhai, Jordan Boyd-Graber and Shay B. Cohen, In TACL 2014 [pdf] [supplementary material] [abstract] [bibtex] [Ke's code] Adaptor grammars are a flexible, powerful formalism for defining nonparametric, unsupervised models of grammar productions. This flexibility comes at the cost of expensive inference. We address the difficulty of inference through an online algorithm which uses a hybrid of Markov chain Monte Carlo and variational inference. We show that this inference strategy improves scalability without sacrificing performance on unsupervised word segmentation and topic modeling tasks. @article{zhai-14, author = "K. Zhai and J. Boyd-Graber and S. B. Cohen", title = "Online Adaptor Grammars with Hybrid Inference", journal = "Transactions of the Association for Computational Linguistics", year = "2014" } Latent-Variable Synchronous CFGs for Hierarchical Translation, Avneesh Saluja, Chris Dyer and Shay B. Cohen, In EMNLP 2014 [pdf] [abstract] [bibtex] [code] Data-driven refinement of non-terminal categories has been demonstrated to be a reliable technique for improving monolingual parsing with PCFGs. In this paper, we extend these techniques to learn latent refinements of single-category synchronous grammars, so as to improve translation performance. We compare two estimators for this latent-variable model: one based on EM and the other is a spectral algorithm based on the method of moments. We evaluate their performance on a Chinese-English translation task.
The results indicate that we can achieve significant gains over the baseline with both approaches, but in particular the moments-based estimator is both faster and performs better than EM. @inproceedings{saluja-14, author = "A. Saluja and C. Dyer and S. B. Cohen", title = "Latent-Variable Synchronous {CFGs} for Hierarchical Translation", booktitle = "Proceedings of {EMNLP}", year = "2014" } Spectral Learning of Latent-Variable PCFGs: Algorithms and Sample Complexity, Shay B. Cohen, Karl Stratos, Michael Collins, Dean P. Foster and Lyle Ungar, In JMLR 2014 [pdf] [abstract] [bibtex] We introduce a spectral learning algorithm for latent-variable PCFGs (Matsuzaki et al., 2005; Petrov et al., 2006). Under a separability (singular value) condition, we prove that the method provides statistically consistent parameter estimates. Our result rests on three theorems: the first gives a tensor form of the inside-outside algorithm for PCFGs; the second shows that the required tensors can be estimated directly from training examples where hidden-variable values are missing; the third gives a PAC-style convergence bound for the estimation method. @article{cohen-14c, author = "S. B. Cohen and K. Stratos and M. Collins and D. P. Foster and L. Ungar", title = "Spectral Learning of Latent-Variable {PCFGs}: Algorithms and Sample Complexity", journal = "Journal of Machine Learning Research", year = "2014" } A Provably Correct Learning Algorithm for Latent-Variable PCFGs, Shay B. Cohen and Michael Collins, In ACL 2014 [pdf] [abstract] [bibtex] [slides] We introduce a provably correct learning algorithm for latent-variable PCFGs. The algorithm relies on two steps: first, the use of a matrix-decomposition algorithm applied to a co-occurrence matrix estimated from the parse trees in a training sample; second, the use of EM applied to a convex objective derived from the training samples in combination with the output from the matrix decomposition. 
Experiments on parsing and a language modeling problem show that the algorithm is efficient and effective in practice. @inproceedings{cohen-14b, author = "S. B. Cohen and M. Collins", title = "A Provably Correct Learning Algorithm for Latent-Variable {PCFGs}", booktitle = "Proceedings of {ACL}", year = "2014" } Spectral Unsupervised Parsing with Additive Tree Metrics, Ankur P. Parikh, Shay B. Cohen and Eric Xing, In ACL 2014 [pdf] [abstract] [bibtex] [appendix] [Ankur's slides] We propose a spectral approach for unsupervised constituent parsing that comes with theoretical guarantees on latent structure recovery. Our approach is grammarless -- we directly learn the bracketing structure of a given sentence without using a grammar model. The main algorithm is based on lifting the concept of additive tree metrics for structure learning of latent trees in the phylogenetic and machine learning communities to the case where the tree structure varies across examples. Although finding the ``minimal'' latent tree is NP-hard in general, for the case of projective trees we find that it can be found using bilexical parsing algorithms. Empirically, our algorithm performs favorably compared to the constituent context model of Klein and Manning (2002) without the need for careful initialization. @inproceedings{parikh-14, author = "A. P. Parikh and S. B. Cohen and E. Xing", title = "Spectral Unsupervised Parsing with Additive Tree Metrics", booktitle = "Proceedings of {ACL}", year = "2014" } Lexical Inference over Multi-Word Predicates: A Distributional Approach, Omri Abend, Shay B. Cohen and Mark Steedman, In ACL 2014 [pdf] [abstract] [bibtex] [code for train/test split] Representing predicates in terms of their argument distribution is common practice in NLP. Multi-word predicates (MWPs) in this context are often either disregarded or considered as fixed expressions. 
The latter treatment is unsatisfactory in two ways: (1) identifying MWPs is notoriously difficult, (2) MWPs show varying degrees of compositionality and could benefit from taking into account the identity of their component parts. We propose a novel approach that integrates the distributional representation of multiple sub-sets of the MWP's words. We assume a latent distribution over sub-sets of the MWP, and estimate it relative to a downstream prediction task. Focusing on the supervised identification of lexical inference relations, we compare against state-of-the-art baselines that consider a single sub-set of an MWP, obtaining substantial improvements. To our knowledge, this is the first work to address lexical relations between MWPs of varying degrees of compositionality within distributional semantics. @inproceedings{abend-14, author = "O. Abend and S. B. Cohen and M. Steedman", title = "Lexical Inference over Multi-Word Predicates: A Distributional Approach", booktitle = "Proceedings of {ACL}", year = "2014" } Spectral Learning of Refinement HMMs, Karl Stratos, Alexander M. Rush, Shay B. Cohen and Michael Collins, In CoNLL 2013 [pdf] [abstract] [bibtex] [Karl's slides] We derive a spectral algorithm for learning the parameters of a refinement HMM. This method is simple, efficient, and can be applied to a wide range of supervised sequence labeling tasks. Like other spectral methods, it avoids the problem of local optima and provides a consistent estimate of the parameters. Our experiments on a phoneme recognition task show that when equipped with informative feature functions, it performs significantly better than a supervised HMM and competitively with EM. @inproceedings{stratos-13, author = "K. Stratos and A. M. Rush and S. B. Cohen and M. Collins", title = "Spectral Learning of Refinement {HMMs}", booktitle = "Proceedings of {CoNLL}", year = "2013" } The Effect of Non-tightness on Bayesian Estimation of PCFGs, Shay B. 
Cohen and Mark Johnson, In ACL 2013 [pdf] [mathematica output] [abstract] [bibtex] [Mark's slides] [blog post] Probabilistic context-free grammars have the unusual property of not always defining tight distributions (i.e., the sum of the ``probabilities'' of the trees the grammar generates can be less than one). This paper reviews how this non-tightness can arise and discusses its impact on Bayesian estimation of PCFGs. We begin by presenting the notion of ``almost everywhere tight grammars'' and show that linear CFGs follow it. We then propose three different ways of reinterpreting non-tight PCFGs to make them tight, show that the Bayesian estimators in Johnson et al. (2007) are correct under one of them, and provide MCMC samplers for the other two. We conclude with a discussion of the impact of tightness empirically. @inproceedings{cohen-13c, author = "S. B. Cohen and M. Johnson", title = "The Effect of Non-tightness on Bayesian Estimation of {PCFGs}", booktitle = "Proceedings of {ACL}", year = "2013" } Experiments with Spectral Learning of Latent-Variable PCFGs, Shay B. Cohen, Karl Stratos, Michael Collins, Dean P. Foster and Lyle Ungar, In NAACL 2013 [pdf] [abstract] [bibtex] [talk (video)] [talk (slides)] Latent-variable PCFGs (L-PCFGs) are a highly successful model for natural language parsing. Recent work (Cohen et al., 2012) has introduced a spectral algorithm for parameter estimation of L-PCFGs, which---unlike the EM algorithm---is guaranteed to give consistent parameter estimates (it has PAC-style guarantees of sample complexity). This paper describes experiments using the spectral algorithm. We show that the algorithm provides models with the same accuracy as EM, but is an order of magnitude more efficient. We describe a number of key steps used to obtain this level of performance; these should be relevant to other work on the application of spectral learning algorithms. 
We view our results as strong empirical evidence for the viability of spectral methods as an alternative to EM. @inproceedings{cohen-13b, author = "S. B. Cohen and K. Stratos and M. Collins and D. P. Foster and L. Ungar", title = "Experiments with Spectral Learning of Latent-Variable {PCFGs}", booktitle = "Proceedings of {NAACL}", year = "2013" } Approximate PCFG Parsing Using Tensor Decomposition, Shay B. Cohen, Giorgio Satta and Michael Collins, In NAACL 2013 [pdf] [abstract] [bibtex] We provide an approximation algorithm for PCFG parsing, which asymptotically improves time complexity with respect to the input grammar size, and prove upper bounds on the approximation quality. We test our algorithm on two treebanks, and get significant improvements in parsing speed. @inproceedings{cohen-13a, author = "S. B. Cohen and G. Satta and M. Collins", title = "Approximate PCFG Parsing Using Tensor Decomposition", booktitle = "Proceedings of {NAACL}", year = "2013" } Tensor Decomposition for Fast Parsing with Latent-Variable PCFGs, Shay B. Cohen and Michael Collins, In Advances in Neural Information Processing Systems 2012 [pdf] [abstract] [bibtex] We describe an approach to speed-up inference with latent-variable PCFGs, which have been shown to be highly effective for natural language parsing. Our approach is based on a tensor formulation recently introduced for spectral estimation of latent-variable PCFGs coupled with a tensor decomposition algorithm well-known in the multilinear algebra literature. We also describe an error bound for this approximation, which gives guarantees showing that if the underlying tensors are well approximated, then the probability distribution over trees will also be well approximated. Empirical evaluation on real-world natural language parsing data demonstrates a significant speed-up at minimal cost for parsing performance. @inproceedings{cohen-12c, author = "S. B. Cohen and M. 
Collins", title = "Tensor Decomposition for Fast Parsing with Latent-Variable {PCFGs}", booktitle = "Advances in Neural Information Processing Systems", year = "2012" } Elimination of Spurious Ambiguity in Transition-Based Dependency Parsing, Shay B. Cohen, Carlos Gómez-Rodríguez and Giorgio Satta, In arXiv (1206.6735), 2012 [pdf] [abstract] [bibtex] We present a novel technique to remove spurious ambiguity from transition systems for dependency parsing. Our technique chooses a canonical sequence of transition operations (computation) for a given dependency tree. Our technique can be applied to a large class of bottom-up transition systems, including for instance Nivre (2004) and Attardi (2006). @techreport{cohen-12b, author = "S. B. Cohen and C. G{\'o}mez-Rodr{\'\i}guez and G. Satta", title = "Elimination of Spurious Ambiguity in Transition-Based Dependency Parsing", year = "2012", eprint = "arXiv:1206.6735", url = "http://arxiv.org/pdf/1206.6735v1" } Spectral Learning of Latent-Variable PCFGs, Shay B. Cohen, Karl Stratos, Michael Collins, Dean P. Foster and Lyle Ungar, In ACL 2012 [pdf] [longer JMLR version, stronger model] [abstract] [bibtex] We introduce a spectral learning algorithm for latent-variable PCFGs (Petrov et al., 2006; Matsuzaki et al., 2005). Under a separability (singular value) condition, we prove that the method provides consistent parameter estimates. Our result rests on three theorems: the first gives a tensor form of the inside-outside algorithm for PCFGs; the second shows that the required tensors can be estimated directly from training examples where hidden-variable values are missing; the third gives a PAC-style convergence bound for the estimation method. @inproceedings{cohen-12a, author = "S. B. Cohen and K. Stratos and M. Collins and D. P. Foster and L. 
Ungar", title = "Spectral Learning of Latent-Variable {PCFGs}", booktitle = "Proceedings of ACL", year = "2012" } Empirical Risk Minimization for Probabilistic Grammars: Sample Complexity and Hardness of Learning, Shay B. Cohen and Noah A. Smith, Computational Linguistics (2012) [pdf] [abstract] [bibtex] Probabilistic grammars are generative statistical models that are useful for compositional and sequential structures. They are used ubiquitously in computational linguistics. We present a framework, reminiscent of structural risk minimization, for empirical risk minimization of probabilistic grammars using the log-loss. We derive sample complexity bounds in this framework that apply both to the supervised setting and the unsupervised setting. By making assumptions about the underlying distribution that are appropriate for natural language scenarios, we are able to derive distribution-dependent sample complexity bounds for probabilistic grammars. We also give simple algorithms for carrying out empirical risk minimization using this framework in both the supervised and unsupervised settings. In the unsupervised case, we show that the problem of minimizing empirical risk is NP-hard. We therefore suggest an approximate algorithm, similar to expectation-maximization, to minimize the empirical risk. @article{cohen-12c, author = "S. B. Cohen and N. A. Smith", title = "Empirical Risk Minimization for Probabilistic Grammars: Sample Complexity and Hardness of Learning", journal = "Computational Linguistics", volume = "38", number = "3", pages = "479--526", year = "2012" } Unsupervised Structure Prediction with Non-Parallel Multilingual Guidance, Shay B. Cohen, Dipanjan Das and Noah A. Smith, In EMNLP 2011 [pdf] [abstract] [bibtex] We describe a method for prediction of linguistic structure in a language for which only unlabeled data is available, using annotated data from a set of one or more helper languages. 
Our approach is based on a model that locally mixes between supervised models from the helper languages. Parallel data is not used, allowing the technique to be applied even in domains where human-translated texts are unavailable. We obtain state-of-the-art performance for two tasks of structure prediction: unsupervised part-of-speech tagging and unsupervised dependency parsing. @inproceedings{cohen-11b, author = "S. B. Cohen and D. Das and N. A. Smith", title = "Unsupervised Structure Prediction with Non-Parallel Multilingual Guidance", booktitle = "Proceedings of EMNLP", year = "2011" } Exact Inference for Generative Probabilistic Non-Projective Dependency Parsing, Shay B. Cohen, Carlos Gómez-Rodríguez and Giorgio Satta, In EMNLP 2011 [pdf] [abstract] [bibtex] We describe a generative model for non-projective dependency parsing based on a simplified version of a transition system that has recently appeared in the literature. We then develop a dynamic programming parsing algorithm for our model, and derive an inside-outside algorithm that can be used for unsupervised learning of non-projective dependency trees. @inproceedings{cohen-11b, author = "S. B. Cohen and C. G{\'o}mez-Rodr{\'\i}guez and G. Satta", title = "Exact Inference for Generative Probabilistic Non-Projective Dependency Parsing", booktitle = "Proceedings of EMNLP", year = "2011" } Products of Weighted Logic Programs, Shay B. Cohen, Robert J. Simmons and Noah A. Smith, In Theory and Practice of Logic Programming, 2011 [pdf] [abstract] [bibtex] Weighted logic programming, a generalization of bottom-up logic programming, is a well-suited framework for specifying dynamic programming algorithms. In this setting, proofs correspond to the algorithm's output space, such as a path through a graph or a grammatical derivation, and are given a real-valued score (often interpreted as a probability) that depends on the real weights of the base axioms used in the proof. 
The desired output is a function over all possible proofs, such as a sum of scores or an optimal score. We describe the PRODUCT transformation, which can merge two weighted logic programs into a new one. The resulting program optimizes a product of proof scores from the original programs, constituting a scoring function known in machine learning as a ``product of experts.'' Through the addition of intuitive constraining side conditions, we show that several important dynamic programming algorithms can be derived by applying PRODUCT to weighted logic programs corresponding to simpler weighted logic programs. @article{cohen-11a, author = "S. B. Cohen and R. J. Simmons and N. A. Smith", title = "Products of Weighted Logic Programs", journal = "Theory and Practice of Logic Programming", year = "2011" } Empirical Risk Minimization with Approximations of Probabilistic Grammars, Shay B. Cohen and Noah A. Smith, In Advances in Neural Information Processing Systems, 2010 [pdf] [appendix - pdf] [abstract] [bibtex] Probabilistic grammars are generative statistical models that are useful for compositional and sequential structures. We present a framework, reminiscent of structural risk minimization, for empirical risk minimization of the parameters of a fixed probabilistic grammar using the log-loss. We derive sample complexity bounds in this framework that apply both to the supervised setting and the unsupervised setting. @inproceedings{cohen-10e, author = "S. B. Cohen and N. A. Smith", title = "Empirical Risk Minimization with Approximations of Probabilistic Grammars", booktitle = "Advances in Neural Information Processing Systems", year = "2010" } Covariance in Unsupervised Learning of Probabilistic Grammars, Shay B. Cohen and Noah A. Smith, In JMLR, 2010 [pdf] [abstract] [bibtex] Probabilistic grammars offer great flexibility in modeling discrete sequential data like natural language text. 
Their symbolic component is amenable to inspection by humans, while their probabilistic component helps resolve ambiguity. They also permit the use of well-understood, general-purpose learning algorithms. There has been an increased interest in using probabilistic grammars in the Bayesian setting. To date, most of the literature has focused on using a Dirichlet prior. The Dirichlet prior has several limitations, including that it cannot directly model covariance between the probabilistic grammar's parameters. Yet, various grammar parameters are expected to be correlated because the elements in language they represent share linguistic properties. In this paper, we suggest an alternative to the Dirichlet prior, a family of logistic normal distributions. We derive an inference algorithm for this family of distributions and experiment with the task of dependency grammar induction, demonstrating performance improvements with our priors on a set of six treebanks in different natural languages. Our covariance framework permits soft parameter tying within grammars and across grammars for text in different languages, and we show empirical gains in a novel learning setting using bilingual, non-parallel data. @article{cohen-10d, author = "S. B. Cohen and N. A. Smith", title = "Covariance in Unsupervised Learning of Probabilistic Grammars", journal = "Journal of Machine Learning Research", volume = "11", pages = "3017--3051", year = "2010" } Viterbi Training for PCFGs: Hardness Results and Competitiveness of Uniform Initialization, Shay B. Cohen and Noah A. Smith, In ACL 2010 [pdf] [abstract] [bibtex] [slides] We consider the search for a maximum likelihood assignment of hidden derivations and grammar weights for a probabilistic context-free grammar, the problem approximately solved by ``Viterbi training.'' We show that solving and even approximating Viterbi training for PCFGs is NP-hard. 
We motivate the use of uniform-at-random initialization for Viterbi EM as an optimal initializer in absence of further information about the correct model parameters, providing an approximate bound on the log-likelihood. @inproceedings{cohen-10c, author = "S. B. Cohen and N. A. Smith", title = "Viterbi Training for {PCFGs}: Hardness Results and Competitiveness of Uniform Initialization", booktitle = "Proceedings of {ACL}", year = "2010" } Variational Inference for Adaptor Grammars, Shay B. Cohen, David M. Blei and Noah A. Smith, In NAACL 2010 [pdf] [abstract] [bibtex] [slides] Adaptor grammars extend probabilistic context-free grammars to define prior distributions over trees with ``rich get richer'' dynamics. Inference for adaptor grammars seeks to find parse trees for raw text. This paper describes a variational inference algorithm for adaptor grammars, providing an alternative to Markov chain Monte Carlo methods. To derive this method, we develop a stick-breaking representation of adaptor grammars, a representation that enables us to define adaptor grammars with recursion. We report experimental results on a word segmentation task, showing that variational inference performs comparably to MCMC. Further, we show a significant speed-up when parallelizing the algorithm. Finally, we report promising results for a new application for adaptor grammars, dependency grammar induction. @inproceedings{cohen-10b, author = "S. B. Cohen and D. M. Blei and N. A. Smith", title = "Variational Inference for Adaptor Grammars", booktitle = "Proceedings of {NAACL}", year = "2010" } Variational Inference for Grammar Induction with Prior Knowledge, Shay B. Cohen and Noah A. Smith, In ACL 2009 (short paper track) [pdf] [abstract] [bibtex] Variational EM has become a popular technique in probabilistic NLP with hidden variables. 
Commonly, for computational tractability, we make strong independence assumptions, such as the mean-field assumption, in approximating posterior distributions over hidden variables. We show how a looser restriction on the approximate posterior, requiring it to be a mixture, can help inject prior knowledge to exploit soft constraints during the variational E-step. @inproceedings{cohen-09b, author = "S. B. Cohen and N. A. Smith", title = "Variational Inference for Grammar Induction with Prior Knowledge", booktitle = "Proceedings of {ACL}", year = "2009" } Shared Logistic Normal Distributions for Soft Parameter Tying in Unsupervised Grammar Induction, Shay B. Cohen and Noah A. Smith, In NAACL 2009 [pdf] [abstract] [bibtex] We present a family of priors over probabilistic grammar weights, called the shared logistic normal distribution. This family extends the partitioned logistic normal distribution, enabling factored covariance between the probabilities of different derivation events in the probabilistic grammar, providing a new way to encode prior knowledge about an unknown grammar. We describe a variational EM algorithm for learning a probabilistic grammar based on this family of priors. We then experiment with unsupervised dependency grammar induction and show significant improvements using our model for both monolingual learning and bilingual learning with a non-parallel, multilingual corpus. @inproceedings{cohen-09a, author = "S. B. Cohen and N. A. Smith", title = "Shared Logistic Normal Distributions for Soft Parameter Tying in Unsupervised Grammar Induction", booktitle = "Proceedings of {NAACL}", year = "2009" } Logistic Normal Priors for Unsupervised Probabilistic Grammar Induction, Shay B. Cohen, Kevin Gimpel and Noah A. 
Smith, In Advances in Neural Information Processing Systems 2008 [pdf] [code] [abstract] [bibtex] We explore a new Bayesian model for probabilistic grammars, a family of distributions over discrete structures that includes hidden Markov models and probabilistic context-free grammars. Our model extends the correlated topic model framework to probabilistic grammars, exploiting the logistic normal distribution as a prior over the grammar parameters. We derive a variational EM algorithm for that model, and then experiment with the task of unsupervised grammar induction for natural language dependency parsing. We show that our model achieves superior results over previous models that use different priors. @inproceedings{cohen-08b, author = "S. B. Cohen and K. Gimpel and N. A. Smith", title = "Logistic Normal Priors for Unsupervised Probabilistic Grammar Induction", booktitle = "Advances in Neural Information Processing Systems", year = "2009" } Dynamic Programming Algorithms as Products of Weighted Logic Programs, Shay B. Cohen, Robert J. Simmons and Noah A. Smith, In ICLP 2008 (best student paper award) [springer] [journal-version] [abstract] [bibtex] Weighted logic programming, a generalization of bottom-up logic programming, is a successful framework for specifying dynamic programming algorithms. In this setting, proofs correspond to the algorithm's output space, such as a path through a graph or a grammatical derivation, and are given a weighted score, often interpreted as a probability, that depends on the score of the base axioms used in the proof. The desired output is a function over all possible proofs, such as a sum of scores or an optimal score. We describe the PRODUCT transformation, which can merge two weighted logic programs into a new one. 
The resulting program optimizes a product of proof scores from the original programs, constituting a scoring function known in machine learning as a ``product of experts.'' Through the addition of intuitive constraining side conditions, we show that several important dynamic programming algorithms can be derived by applying PRODUCT to weighted logic programs corresponding to simpler weighted logic programs. @inproceedings{cohen-08a, author = "S. B. Cohen and R. J. Simmons and N. A. Smith", title = "Dynamic Programming Algorithms as Products of Weighted Logic Programs", booktitle = "Proceedings of {ICLP}", year = "2008" } Joint Morphological and Syntactic Disambiguation, Shay B. Cohen and Noah A. Smith, In EMNLP 2007 [pdf] [abstract] [bibtex] In morphologically rich languages, should morphological and syntactic disambiguation be treated sequentially or as a single problem? We describe several efficient, probabilistically-interpretable ways to apply joint inference to morphological and syntactic disambiguation using lattice parsing. Joint inference is shown to compare favorably to pipeline parsing methods across a variety of component models. State-of-the-art performance on Hebrew Treebank parsing is demonstrated using the new method. The benefits of joint inference are modest with the current component models, but appear to increase as components themselves improve. @inproceedings{cohen-07b, author = "S. B. Cohen and N. A. Smith", title = "Joint Morphological and Syntactic Disambiguation", booktitle = "Proceedings of {EMNLP}", year = "2007" } Feature Selection Via Coalitional Game Theory, Shay B. Cohen, Gideon Dror and Eytan Ruppin, In Neural Computation 19:7, 2007 [pdf] [bibtex] @article{cohen-07a, author = "S. B. Cohen and G. Dror and E. Ruppin", title = "Feature Selection via Coalitional Game Theory", journal = "Neural Computation", year = "2007" } Feature Selection Based on the Shapley Value, Shay B. 
Cohen, Gideon Dror and Eytan Ruppin, In IJCAI 2005 [pdf] [bibtex] @inproceedings{cohen-05, author = "S. B. Cohen and G. Dror and E. Ruppin", title = "Feature Selection Based on the {Shapley} Value", booktitle = "Proceedings of {IJCAI}", year = "2005" }
Dissertation, technical reports and others
Computational Learning of Probabilistic Grammars in the Unsupervised Setting, Ph.D. dissertation, Carnegie Mellon University, 2011. Coactive Learning for Interactive Machine Translation, Artem Sokolov, Stefan Riezler and Shay B. Cohen, In Workshop on Machine Learning for Interactive Systems, 2015. Workshop Paper about Hybrid Online Inference with Adaptor Grammars, Ke Zhai, Jordan Boyd-Graber and Shay B. Cohen, 2014. See also TACL paper. Unsupervised Bilingual POS Tagging with Markov Random Fields, Desai Chen, Chris Dyer, Shay B. Cohen and Noah A. Smith, In EMNLP Workshop on Unsupervised Learning in NLP, 2011 [pdf] Social Links from Latent Topics in Microblogs, Kriti Puniyani, Jacob Eisenstein, Shay B. Cohen and Eric P. Xing, In NAACL Workshop on Social Media, 2010 The Shared Logistic Normal Distribution for Grammar Induction, Shay B. Cohen and Noah A. Smith, In Neural Information Processing Systems Workshop on Speech and Language: Unsupervised Latent-Variable Models, 2008 [pdf of NAACL paper] Products of Weighted Logic Programs, Shay B. Cohen, Robert J. Simmons and Noah A. Smith, Technical Report, CMU-LTI-08-009 [pdf of TPLP paper]