Publications, by date
2011
-
Distributed Inference and Query Processing for RFID Tracking and Monitoring. Zhao Cao, Charles Sutton, Yanlei Diao, Prashant Shenoy.
Proceedings of the VLDB Endowment (PVLDB) 4 (5).
2011.
[ .pdf
| bib
]
@Article{ cao11vldb,
number = {5},
title = {Distributed Inference and Query Processing for RFID Tracking and Monitoring},
month = February,
author = {Zhao Cao and Charles Sutton and Yanlei Diao and Prashant Shenoy},
journal = {Proceedings of the VLDB Endowment (PVLDB)},
volume = {4},
pages = {326-337},
year = {2011}
}
-
An Introduction to Conditional Random Fields. Charles Sutton, Andrew McCallum.
Foundations and Trends in Machine Learning.
To appear.
2011.
[ .pdf
| bib
| abstract
| arXiv
]
Often we wish to predict a large number of variables that depend on each other as well as on other observed variables. Structured prediction methods are essentially a combination of classification and graphical modeling, combining the ability of graphical models to compactly model multivariate data with the ability of classification methods to perform prediction using large sets of input features. This tutorial describes conditional random fields, a popular probabilistic method for structured prediction. CRFs have seen wide application in natural language processing, computer vision, and bioinformatics. We describe methods for inference and parameter estimation for CRFs, including practical issues for implementing large scale CRFs. We do not assume previous knowledge of graphical modeling, so this tutorial is intended to be useful to practitioners in a wide variety of fields.
@Article{ crftut:fnt,
title = {An Introduction to Conditional Random Fields},
author = {Charles Sutton and Andrew McCallum},
journal = {Foundations and Trends in Machine Learning},
year = {2011},
note = {To appear}
}
-
Bayesian Inference in Queueing Networks. Charles Sutton, Michael I. Jordan.
Annals of Applied Statistics 5 (1).
2011.
[ .pdf
| bib
| arXiv
]
@Article{ sutton10qnet,
number = {1},
title = {Bayesian Inference in Queueing Networks},
author = {Charles Sutton and Michael I. Jordan},
journal = {Annals of Applied Statistics},
volume = {5},
pages = {254--282},
year = {2011}
}
-
Quasi-Newton Markov chain Monte Carlo. Yichuan Zhang, Charles Sutton.
In Advances in Neural Information Processing Systems (NIPS). 2011.
[ .pdf
| bib
]
@InProceedings{ zhang11quasi,
title = {Quasi-Newton Markov chain Monte Carlo},
author = {Yichuan Zhang and Charles Sutton},
booktitle = {Advances in Neural Information Processing Systems (NIPS)},
year = {2011}
}
2010
-
Learning and Inference in Queueing Networks. Charles Sutton, Michael I. Jordan.
In Conference on Artificial Intelligence and Statistics (AISTATS). 2010.
(Conference version of the longer paper "Bayesian Inference in Queueing Networks".)
[ .pdf
| bib
]
@InProceedings{ sutton10aistats,
title = {Learning and Inference in Queueing Networks},
author = {Charles Sutton and Michael I. Jordan},
booktitle = {Conference on Artificial Intelligence and Statistics (AISTATS)},
year = {2010}
}
2009
-
Automatic Exploration of Datacenter Performance Regimes. Peter Bodik, Rean Griffith, Charles Sutton, Armando Fox, Michael I. Jordan, David A. Patterson.
In First Workshop on Automated Control for Datacenters and Clouds (ACDC '09). 2009.
[ .pdf
| bib
]
@InProceedings{ bodik09exploration,
title = {Automatic Exploration of Datacenter Performance Regimes},
author = {Peter Bodik and Rean Griffith and Charles Sutton and Armando Fox and Michael I. Jordan and David A. Patterson},
booktitle = {First Workshop on Automated Control for Datacenters and Clouds (ACDC '09)},
year = {2009}
}
-
Statistical Machine Learning Makes Automatic Control Practical for Internet Datacenters. Peter Bodik, Rean Griffith, Charles Sutton, Armando Fox, Michael I. Jordan, David A. Patterson.
In Workshop on Hot Topics in Cloud Computing (HotCloud '09). 2009.
[ .pdf
| bib
]
@InProceedings{ bodik09hotcloud,
title = {Statistical Machine Learning Makes Automatic Control Practical for Internet Datacenters},
author = {Peter Bodik and Rean Griffith and Charles Sutton and Armando Fox and Michael I. Jordan and David A. Patterson},
booktitle = {Workshop on Hot Topics in Cloud Computing (HotCloud '09)},
year = {2009}
}
-
Capturing Data Uncertainty in High-Volume Stream Processing. Yanlei Diao, Boduo Li, Anna Liu, Liping Peng, Charles Sutton, Thanh Tran, Michael Zink.
In Conference on Innovative Data Systems Research (CIDR). 2009.
[ .pdf
| bib
]
@InProceedings{ diao09cidr,
title = {Capturing Data Uncertainty in High-Volume Stream Processing},
author = {Yanlei Diao and Boduo Li and Anna Liu and Liping Peng and Charles Sutton and Thanh Tran and Michael Zink},
booktitle = {Conference on Innovative Data Systems Research (CIDR)},
year = {2009}
}
-
Misleading learners: Co-opting your spam filter. Blaine Nelson, Marco Barreno, Fuching Jack Chi, Anthony D. Joseph, Benjamin I. P. Rubinstein, Udam Saini, Charles Sutton, J. D. Tygar, Kai Xia.
In Jeffrey J. P. Tsai and Philip S. Yu, editors. Machine Learning in Cyber Trust: Security, Privacy, Reliability. Springer. 2009.
[ .pdf
| bib
]
@InCollection{ nelson09spam,
title = {Misleading learners: Co-opting your spam filter},
author = {Blaine Nelson and Marco Barreno and Fuching Jack Chi and Anthony D. Joseph and Benjamin I. P. Rubinstein and Udam Saini and Charles Sutton and J. D. Tygar and Kai Xia},
editor = {Jeffrey J. P. Tsai and Philip S. Yu},
publisher = {Springer},
booktitle = {Machine Learning in Cyber Trust: Security, Privacy, Reliability},
year = {2009}
}
-
Piecewise Training for Structured Prediction. Charles Sutton, Andrew McCallum.
Machine Learning 77 (2--3).
2009.
(Train undirected graphical model by splitting into overlapping parts that are trained independently. Connections to pseudolikelihood and Bethe free energy. Journal version of UAI and ICML papers below.)
[ .pdf
| bib
| abstract
]
A drawback of structured prediction methods is that parameter estimation requires repeated inference, which is intractable for general structures. In this paper, we present an approximate training algorithm called piecewise training that divides the factors into tractable subgraphs, which we call pieces, that are trained independently. Piecewise training can be interpreted as approximating the exact likelihood using belief propagation, and different ways of making this interpretation yield different insights into the method. We also present an extension to piecewise training, called piecewise pseudolikelihood, designed for when variables have large cardinality. On several real-world NLP data sets, piecewise training performs superior to Besag's pseudolikelihood and sometimes comparably to exact maximum likelihood. In addition, PWPL performs similarly to piecewise and superior to standard pseudolikelihood, but is five to ten times more computationally efficient than batch maximum likelihood training.
@Article{ sutton08piecewise,
number = {2--3},
title = {Piecewise Training for Structured Prediction},
author = {Charles Sutton and Andrew McCallum},
journal = {Machine Learning},
volume = {77},
pages = {165--194},
year = {2009}
}
-
Probabilistic Inference over RFID Streams in Mobile Environments. Thanh Tran, Charles Sutton, Richard Cocci, Yanming Nie, Yanlei Diao, Prashant Shenoy.
In International Conference on Data Engineering (ICDE). 2009.
[ .pdf
| bib
]
@InProceedings{ tran09rfid,
title = {Probabilistic Inference over RFID Streams in Mobile Environments},
author = {Thanh Tran and Charles Sutton and Richard Cocci and Yanming Nie and Yanlei Diao and Prashant Shenoy},
booktitle = {International Conference on Data Engineering (ICDE)},
year = {2009}
}
2008
-
Unsupervised Deduplication using Cross-field Dependencies. Robert Hall, Charles Sutton, Andrew McCallum.
In Conference on Knowledge Discovery and Data Mining (KDD). 2008.
(Hierarchical DP model that jointly clusters citation venue strings based on both string-edit distance and title information.)
[ .pdf
| bib
| abstract
]
Recent work in deduplication has shown that collective deduplication of different attribute types can improve performance. But although these techniques cluster the attributes collectively, they do not model them collectively. For example, in citations in the research literature, canonical venue strings and title strings are dependent---because venues tend to focus on a few research areas---but this dependence is not modeled by current unsupervised techniques. We call this dependence between fields in a record a cross-field dependence. In this paper, we present an unsupervised generative model for the deduplication problem that explicitly models cross-field dependence. Our model uses a single set of latent variables to control two disparate clustering models: a Dirichlet-multinomial model over titles, and a non-exchangeable string-edit model over venues. We show that modeling cross-field dependence yields a substantial improvement in performance---a 58% reduction in error over a standard Dirichlet process mixture.
@InProceedings{ hall08unsupervised,
title = {Unsupervised Deduplication using Cross-field Dependencies},
author = {Robert Hall and Charles Sutton and Andrew McCallum},
booktitle = {Conference on Knowledge Discovery and Data Mining (KDD)},
year = {2008}
}
-
Exploiting Machine Learning to Subvert your Spam Filter. Blaine Nelson, Marco Barreno, Fuching Jack Chi, Anthony~D. Joseph, Benjamin~I.~P. Rubinstein, Udam Saini, Charles Sutton, J.~D. Tygar, Kai Xia.
In Proceedings of the First USENIX Workshop on Large-Scale Exploits and Emergent Threats (LEET). 2008.
(Send crafted email to a spam filter to cause it to misclassify your normal email as spam. Initial experiments on defenses to this attack.)
[ .pdf
| bib
]
@InProceedings{ nelson08spam,
title = {Exploiting Machine Learning to Subvert your Spam Filter},
author = {Blaine Nelson and Marco Barreno and Fuching Jack Chi and Anthony~D. Joseph and Benjamin~I.~P. Rubinstein and Udam Saini and Charles Sutton and J.~D. Tygar and Kai Xia},
booktitle = {Proceedings of the First USENIX Workshop on Large-Scale Exploits and Emergent Threats (LEET)},
year = {2008}
}
-
Probabilistic Inference in Queueing Networks. Charles Sutton, Michael I. Jordan.
In Workshop on Tackling Computer Systems Problems with Machine Learning Techniques (SysML). 2008.
[ .pdf
| bib
]
@InProceedings{ sutton08mm1,
title = {Probabilistic Inference in Queueing Networks},
author = {Charles Sutton and Michael I. Jordan},
booktitle = {Workshop on Tackling Computer Systems Problems with Machine Learning Techniques (SysML)},
year = {2008}
}
-
Probabilistic inference in queueing networks. Charles Sutton, Michael I. Jordan.
In Workshop on Tackling Computer Systems Problems with Machine Learning Techniques (SYSML). 2008.
[ .pdf
| bib
]
@InProceedings{ sutton08qnet,
title = {Probabilistic inference in queueing networks},
author = {Charles Sutton and Michael I. Jordan},
booktitle = {Workshop on Tackling Computer Systems Problems with Machine Learning Techniques (SYSML)},
year = {2008}
}
-
Bayesian Modeling of Dependency Trees Using Hierarchical Pitman-Yor Priors. Hanna Wallach, Charles Sutton, Andrew McCallum.
In ICML Workshop on Prior Knowledge for Text and Language Processing. 2008.
(Two Bayesian dependency parsing models: 1. Model with Pitman-Yor prior that significantly improves Eisner's classic model; 2. Latent-variable model that learns "syntactic" topics.)
[ .pdf
| bib
]
@InProceedings{ wallach08bayesian,
title = {Bayesian Modeling of Dependency Trees Using Hierarchical Pitman-Yor Priors},
author = {Hanna Wallach and Charles Sutton and Andrew McCallum},
booktitle = {ICML Workshop on Prior Knowledge for Text and Language Processing},
year = {2008}
}
2007
-
Response-Time Modeling for Resource Allocation and Energy-Informed SLAs. Peter Bodik, Charles Sutton, Armando Fox, David Patterson, Michael I. Jordan.
In NIPS Workshop on Statistical Learning Techniques for Solving Systems Problems (MLSys 07). 2007.
(Quantile regression (both parametric and non-) for predicting the performance of a web service as a function of workload and power consumption. Much better for voltage control than built-in frequency scaling.)
[ .pdf
| bib
]
@InProceedings{ bodik07response-time,
title = {Response-Time Modeling for Resource Allocation and Energy-Informed SLAs},
author = {Peter Bodik and Charles Sutton and Armando Fox and David Patterson and Michael I. Jordan},
booktitle = {NIPS Workshop on Statistical Learning Techniques for Solving Systems Problems (MLSys 07)},
year = {2007}
}
-
Dynamic Conditional Random Fields: Factorized Probabilistic Models for Labeling and Segmenting Sequence Data. Charles Sutton, Andrew McCallum, Khashayar Rohanimanesh.
Journal of Machine Learning Research 8.
2007.
(Combination of dynamic Bayesian networks and conditional random fields. Also considers latent-variable model and cascaded training. Journal version of ICML and EMNLP papers below.)
[ .pdf
| bib
| abstract
]
In sequence modeling, we often wish to represent complex interaction between labels, such as when performing multiple, cascaded labeling tasks on the same sequence, or when long-range dependencies exist. We present dynamic conditional random fields (DCRFs), a generalization of linear-chain conditional random fields (CRFs) in which each time slice contains a set of state variables and edges---a distributed state representation as in dynamic Bayesian networks (DBNs)---and parameters are tied across slices. Since exact inference can be intractable in such models, we perform approximate inference using several schedules for belief propagation, including tree-based reparameterization (TRP). On a natural-language chunking task, we show that a DCRF performs better than a series of linear-chain CRFs, achieving comparable performance using only half the training data. In addition to maximum conditional likelihood, we present two alternative approaches for training DCRFs: marginal likelihood training, for when we are primarily interested in predicting only a subset of the variables, and cascaded training, for when we have a distinct data set for each state variable, as in transfer learning. We evaluate marginal training and cascaded training on both synthetic data and real-world text data, finding that marginal training can improve accuracy when uncertainty exists over the latent variables, and that for transfer learning, a DCRF trained in a cascaded fashion performs better than a linear-chain CRF that predicts the final task directly.
@Article{ sutton07dcrf,
title = {Dynamic Conditional Random Fields: Factorized Probabilistic Models for Labeling and Segmenting Sequence Data},
month = March,
author = {Charles Sutton and Andrew McCallum and Khashayar Rohanimanesh},
journal = {Journal of Machine Learning Research},
volume = {8},
pages = {693--723},
year = {2007}
}
-
An Introduction to Conditional Random Fields for Relational Learning. Charles Sutton, Andrew McCallum.
In Lise Getoor and Ben Taskar, editors. Introduction to Statistical Relational Learning. MIT Press. 2007.
(Detailed tutorial on conditional random fields. Includes motivation, background, mathematical foundations, linear-chain form, general-structure form, inference, parameter estimation, and tips and tricks. NOTE: In Equation (1.22), there is a small error. There should not be a summation over k in the final term, just lambda_k / sigma_2.)
[ .pdf
| bib
]
@InCollection{ sutton07introduction,
title = {An Introduction to Conditional Random Fields for Relational Learning},
author = {Charles Sutton and Andrew McCallum},
editor = {Lise Getoor and Ben Taskar},
publisher = {MIT Press},
booktitle = {Introduction to Statistical Relational Learning},
year = {2007}
}
-
Piecewise Pseudolikelihood for Efficient CRF Training. Charles Sutton, Andrew McCallum.
In International Conference on Machine Learning (ICML). 2007.
(Train a large CRF in five times faster by dividing it into separate pieces and reducing numbers of predicted variable combinations with pseudolikelihood. Analysis in terms of belief propagation and Bethe energy.)
[ .pdf
| bib
| abstract
]
Discriminative training of graphical models can be expensive if the variables have large cardinality, even if the graphical structure is tractable. In such cases, pseudolikelihood is an attractive alternative, because its running time is linear in the variable cardinality, but on some data its accuracy can be poor. Piecewise training (Sutton & McCallum, 2005) can have better accuracy but does not scale as well in the variable cardinality. In this paper, we introduce piecewise pseudolikelihood, which retains the computational efficiency of pseudolikelihood but can have much better accuracy. On several benchmark NLP data sets, piecewise pseudolikelihood has better accuracy than standard pseudolikelihood, and in many cases nearly equivalent to maximum likelihood, with five to ten times less training time than batch CRF training.
@InProceedings{ sutton07pwpl,
title = {Piecewise Pseudolikelihood for Efficient CRF Training},
author = {Charles Sutton and Andrew McCallum},
booktitle = {International Conference on Machine Learning (ICML)},
year = {2007}
}
-
Improved Dynamic Schedules for Belief Propagation. Charles Sutton, Andrew McCallum.
In Conference on Uncertainty in Artificial Intelligence (UAI). 2007.
(Significantly faster version of loopy BP by selecting which messages to send based on an approximation to their residual.)
[ .pdf
| bib
| abstract
]
Belief propagation and its variants are popular methods for approximate inference, but their running time and even their convergence depend greatly on the schedule used to send the messages. Recently, dynamic update schedules have been shown to converge much faster on hard networks than static schedules, namely the residual BP schedule of Elidan et al. [2006]. But that RBP algorithm wastes message updates: many messages are computed solely to determine their priority, and are never actually performed. In this paper, we show that estimating the residual, rather than calculating it directly, leads to significant decreases in the number of messages required for convergence, and in the total running time. The residual is estimated using an upper bound based on recent work on message errors in BP. On both synthetic and real-world networks, this dramatically decreases the running time of BP, in some cases by a factor of five, without affecting the quality of the solution.
@InProceedings{ sutton07rbp0,
title = {Improved Dynamic Schedules for Belief Propagation},
author = {Charles Sutton and Andrew McCallum},
booktitle = {Conference on Uncertainty in Artificial Intelligence (UAI)},
year = {2007}
}
2006
-
Sparse Forward-Backward using Minimum Divergence Beams for Fast Training of Conditional Random Fields. Chris Pal, Charles Sutton, Andrew McCallum.
In International Conference on Acoustics, Speech, and Signal Processing (ICASSP). 2006.
(New criterion for adaptive beam size within forward-backward, suggested by a variational perspective. Works well within CRF training.)
[ .pdf
| bib
| abstract
]
Hidden Markov models and linear-chain conditional random fields (CRFs) are applicable to many tasks in spoken language processing. In large state spaces, however, training can be expensive, because it often requires many iterations of forward-backward. Beam search is a standard heuristic for controlling complexity during Viterbi decoding, but during forward-backward, standard beam heuristics can be dangerous, as they can make training unstable. We introduce sparse forward-backward, a variational perspective on beam methods that uses an approximating mixture of Kronecker delta functions. This motivates a novel minimum-divergence beam criterion based on minimizing KL divergence between the respective marginal distributions. Our beam selection approach is not only more efficient for Viterbi decoding, but also more stable within sparse forward-backward training. For a standard text-to-speech problem, we reduce CRF training time fourfold---from over a day to six hours---with no loss in accuracy.
@InProceedings{ pal06sparse,
title = {Sparse Forward-Backward using Minimum Divergence Beams for Fast Training of Conditional Random Fields},
author = {Chris Pal and Charles Sutton and Andrew McCallum},
booktitle = {International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
year = {2006}
}
-
Reducing Weight Undertraining in Structured Discriminative Learning. Charles Sutton, Michael Sindelar, Andrew McCallum.
In Conference on Human Language Technology and North American Association for Computational Linguistics (HLT-NAACL). 2006.
(Trains multiple linear-chain CRFs with different subsets of features, in order to force dependent sets of features to be able to separately model the class label.)
(This is the published version. An early version had an error in Section 4, under Per-Sequence Mixtures.)
[ .pdf
| bib
| abstract
]
Discriminative probabilistic models are very popular in NLP because of the latitude they afford in designing features. But training involves complex trade-offs among weights, which can be dangerous: a few highly-indicative features can swamp the contribution of many individually weaker features, causing their weights to be undertrained. Such a model is less robust, for the highly-indicative features may be noisy or missing in the test data. To ameliorate this weight undertraining, we introduce several new feature bagging methods, in which separate models are trained on subsets of the original features, and combined using a mixture model or a product of experts. These methods include the logarithmic opinion pools used by Smith et al. (2005). We evaluate feature bagging on linear-chain conditional random fields for two natural-language tasks. On both tasks, the feature-bagged CRF performs better than simply training a single CRF on all the features.
@InProceedings{ sutton06reducing,
title = {Reducing Weight Undertraining in Structured Discriminative Learning},
author = {Charles Sutton and Michael Sindelar and Andrew McCallum},
booktitle = {Conference on Human Language Technology and North American Association for Computational Linguistics (HLT-NAACL)},
year = {2006}
}
2005
-
Joint Parsing and Semantic Role Labeling. Charles Sutton, Andrew McCallum.
In Conference on Natural Language Learning (CoNLL). 2005.
[ .pdf
| bib
]
@InProceedings{ sutton05joint,
title = {Joint Parsing and Semantic Role Labeling},
author = {Charles Sutton and Andrew McCallum},
pages = {225--228},
booktitle = {Conference on Natural Language Learning (CoNLL)},
year = {2005}
}
-
Piecewise Training of Undirected Models. Charles Sutton, Andrew McCallum.
In Conference on Uncertainty in Artificial Intelligence (UAI). 2005.
(Train large CRF by dividing into pieces and training independently. The explanation in this paper for why it works is somewhat unsatisfying. Consult journal version (2008) for a better story.)
[ .pdf
| bib
| abstract
]
For many large undirected models that arise in real-world applications, exact maximumlikelihood training is intractable, because it requires computing marginal distributions of the model. Conditional training is even more difficult, because the partition function depends not only on the parameters, but also on the observed input, requiring repeated inference over each training example. An appealing idea for such models is to independently train a local undirected classifier over each clique, afterwards combining the learned weights into a single global model. In this paper, we show that this piecewise method can be justified as minimizing a new family of upper bounds on the log partition function. Our bounds are derived from the tree-reweighted upper bounds of Wainwright, Jaakkola, and Willsky, where the component subgraphs are restricted to disjoint pieces of the model. The choice of disjoint subgraphs is especially suited to conditional training because it avoids the usual need to invoke a messagepassing algorithm many times during training. On three natural-language data sets, piecewise training is more accurate than pseudolikelihood, and often performs comparably to global training using belief propagation.
@InProceedings{ sutton05piecewise,
title = {Piecewise Training of Undirected Models},
author = {Charles Sutton and Andrew McCallum},
booktitle = {Conference on Uncertainty in Artificial Intelligence (UAI)},
year = {2005}
}
-
Composition of Conditional Random Fields for Transfer Learning. Charles Sutton, Andrew McCallum.
In Conference on Human Language Technology and Empirical Methods in Natural Language Processing (HLT-EMNLP). 2005.
[ .pdf
| bib
]
@InProceedings{ sutton05transfer,
title = {Composition of Conditional Random Fields for Transfer Learning},
author = {Charles Sutton and Andrew McCallum},
booktitle = {Conference on Human Language Technology and Empirical Methods in Natural Language Processing (HLT-EMNLP)},
year = {2005}
}
-
Learning in Markov Random Fields with Contrastive Free Energies. Max Welling, Charles Sutton.
In Conference on Artificial Intelligence and Statistics (AISTATS). 2005.
[ .pdf
| bib
| abstract
]
Learning Markov random field (MRF) models is notoriously hard due to the presence of a global normalization factor. In this paper we present a new framework for learning MRF models based on the contrastive free energy (CF) objective function. In this scheme the parameters are updated in an attempt to match the average statistics of the data distribution and a distribution which is (partially or approximately) "relaxed" to the equilibrium distribution. We show that maximum likelihood, mean field, contrastive divergence and pseudo-likelihood objectives can be understood in this paradigm. Moreover, we propose and study a new learning algorithm: the "kstep Kikuchi/Bethe approximation". This algorithm is then tested on a conditional random field model with "skip-chain" edges to model long range interactions in text data. It is demonstrated that with no loss in accuracy, the training time is brought down from 19 hours (BP based learning) to 83 minutes, an order of magnitude improvement.
@InProceedings{ welling05cf,
title = {Learning in Markov Random Fields with Contrastive Free Energies},
author = {Max Welling and Charles Sutton},
booktitle = {Conference on Artificial Intelligence and Statistics (AISTATS)},
year = {2005}
}
2004
-
Dynamic Conditional Random Fields: Factorized Probabilistic Models for Labeling and Segmenting Sequence Data. Charles Sutton, Khashayar Rohanimanesh, Andrew McCallum.
In International Conference on Machine Learning (ICML). 2004.
(Combination of dynamic Bayesian networks and conditional random fields, with experiments in noun-phrase chunking.)
[ .pdf
| bib
| abstract
]
In sequence modeling, we often wish to represent complex interaction between labels, such as when performing multiple, cascaded labeling tasks on the same sequence, or when long-range dependencies exist. We present dynamic conditional random fields (DCRFs), a generalization of linear-chain conditional random fields (CRFs) in which each time slice contains a set of state variables and edges---a distributed state representation as in dynamic Bayesian networks (DBNs)---and parameters are tied across slices. Since exact inference can be intractable in such models, we perform approximate inference using several schedules for belief propagation, including tree-based reparameterization (TRP). On a natural-language chunking task, we show that a DCRF performs better than a series of linear-chain CRFs, achieving comparable performance using only half the training data.
@InProceedings{ sutton04dcrf,
title = {Dynamic Conditional Random Fields: Factorized Probabilistic Models for Labeling and Segmenting Sequence Data},
author = {Charles Sutton and Khashayar Rohanimanesh and Andrew McCallum},
booktitle = {International Conference on Machine Learning (ICML)},
year = {2004}
}
-
Conditional probabilistic context-free grammars. Charles Sutton.
Synthesis project (Required for Ph.D. candidacy), University of Massachusetts, 2004.
[ .pdf
| bib
]
@MastersThesis{ sutton04masters,
school = {University of Massachusetts},
title = {Conditional probabilistic context-free grammars},
author = {Charles Sutton},
year = {2004}
}
2003
-
Information Theory and Representation in Associative Word Learning. Brendan Burns, Charles Sutton, Clayton Morrison, Paul R. Cohen.
In Third International Workshop on Epigenetic Robotics. 2003.
[ .pdf
| bib
]
@InProceedings{ burns03epirob,
title = {Information Theory and Representation in Associative Word Learning},
author = {Brendan Burns and Charles Sutton and Clayton Morrison and Paul R. Cohen},
publisher = {Lund University Cognitive Studies, Volume 101},
booktitle = {Third International Workshop on Epigenetic Robotics},
year = {2003}
}
-
Very Predictive N-grams for Space-Limited Probabilistic Models. Paul R. Cohen, Charles Sutton.
In International Symposium on Intelligent Data Analysis. 2003.
[ .pdf
| bib
| abstract
]
In sequential prediction tasks, one repeatedly tries to predict the next element in a sequence. A classical way to solve these problems is to fit an order-n Markov model to the data, but fixed-order models are often bigger than they need to be. In a fixed-order model, all predictors are of length n, even if a shorter predictor would work just as well. We present a greedy algorithm, vpr, for finding variable-length predictive rules. Although vpr is not optimal, we show that on English text, it performs similarly to fixed-order models but uses fewer parameters.
@InProceedings{ cohen03ngrams,
title = {Very Predictive N-grams for Space-Limited Probabilistic Models},
author = {Paul R. Cohen and Charles Sutton},
publisher = {Springer-Verlag},
booktitle = {International Symposium on Intelligent Data Analysis},
year = {2003}
}
-
Dynamic Conditional Random Fields for Jointly Labeling Multiple Sequences. Andrew McCallum, Khashayar Rohanimanesh, Charles Sutton.
In NIPS Workshop on Syntax, Semantics, and Statistics. 2003.
[ .pdf
| bib
| abstract
]
Conditional random fields (CRFs) for sequence modeling have several advantages over joint models such as HMMs, including the ability to relax strong independence assumptions made in those models, and the ability to incorporate arbitrary overlapping features. Previous work has focused on linear-chain CRFs, which correspond to finite-state machines, and have efficient exact inference algorithms. Often, however, we wish to label sequence data in multiple interacting ways---for example, performing part-of-speech tagging and noun phrase segmentation simultaneously, increasing joint accuracy by sharing information between them. We present dynamic conditional random fields (DCRFs), which are CRFs in which each time slice has a set of state variables and edges---a distributed state representation as in dynamic Bayesian networks---and parameters are tied across slices. (They could also be called conditionally trained Dynamic Markov Networks.) Since exact inference can be intractable in these models, we perform approximate inference using the tree-based reparameterization framework (TRP). We also present empirical results comparing DCRFs with linear-chain CRFs on natural-language data.
@InProceedings{ mccallum03dcrf,
title = {Dynamic Conditional Random Fields for Jointly Labeling Multiple Sequences},
month = December,
author = {Andrew McCallum and Khashayar Rohanimanesh and Charles Sutton},
booktitle = {NIPS Workshop on Syntax, Semantics, and Statistics},
year = {2003}
}
-
Guided Incremental Construction of Belief Networks. Charles Sutton, Brendan Burns, Clayton Morrison, Paul R. Cohen.
In International Symposium on Intelligent Data Analysis. 2003.
[ .pdf
| bib
| abstract
]
Because uncertain reasoning is often intractable, it is hard to reason with a large amount of knowledge. One solution to this problem is to specify a set of possible models, some simple and some complex, and choose which to use based on the problem. We present an architecture for interpreting temporal data, called AIID, that incrementally constructs belief networks based on data that arrives asynchronously. It synthesizes the opportunistic control of the blackboard architecture with recent work on constructing belief networks from fragments. We have implemented this architecture in the domain of military analysis.
@InProceedings{ sutton03aiid,
title = {Guided Incremental Construction of Belief Networks},
author = {Charles Sutton and Brendan Burns and Clayton Morrison and Paul R. Cohen},
publisher = {Springer-Verlag},
booktitle = {International Symposium on Intelligent Data Analysis},
year = {2003}
}
2002
-
Learning Effects of Robot Actions Using Temporal Associations. Paul R. Cohen, Charles Sutton, Brendan Burns.
In International Conference on Development and Learning (ICDL). 2002.
[ .pdf
| bib
| abstract
]
Agents need to know the effects of their actions. Strong associations between actions and effects can be found by counting how often they co-occur.We present an algorithm that learns temporal patterns expressed as emphfluents, propositions with temporal extent. The fluent-learning algorithm is hierarchical and unsupervised. It works by maintaining co-occurrence statistics on pairs of fluents. In experiments on a mobile robot, the fluent-learning algorithm found temporal associations that correspond to effects of the robot's actions.
@InProceedings{ cohen:fluents-action,
title = {Learning Effects of Robot Actions Using Temporal Associations},
author = {Paul R. Cohen and Charles Sutton and Brendan Burns},
booktitle = {International Conference on Development and Learning (ICDL)},
year = {2002}
}
-
Computers and Octi: Report from the 2001 Tournament. Charles Sutton.
ICGA Journal 25 (2).
2002.
[ .pdf
| bib
| abstract
]
Computers are strong players of many strategy games, but in some games, namely Go, success has been more elusive. Thus, it may be fruitful to explore games with intermediate complexity. Octi is such a game. I describe the game of Octi, report results from the 2001 Computer Octi Tournament, and explain why Octi may be a useful domain for research in game AI.
@Article{ sutton01octi,
number = {2},
title = {Computers and Octi: Report from the 2001 Tournament},
month = June,
author = {Charles Sutton},
journal = {ICGA Journal},
volume = {25},
pages = {105--112},
year = {2002}
}
Dissertation
-
Efficient Training Methods for Conditional Random Fields. Charles Sutton.
Ph.D. Dissertation, University of Massachusetts, 2008.
[ .pdf
| bib
]
@PhDThesis{ sutton:thesis,
school = {University of Massachusetts},
title = {Efficient Training Methods for Conditional Random Fields},
author = {Charles Sutton},
year = {2008}
}
Workshop Presentations
-
Piecewise Training with Parameter Independence Diagrams: Comparing Globally- and Locally-trained Linear-chain CRFs. Andrew McCallum, Charles Sutton.
In NIPS Workshop on Learning with Structured Outputs. 2004.
[ .pdf
| bib
| abstract
]
We present a diagrammatic formalism and practial methods for introducing additional independence assumptions into parameter estimation, enabling efficient training of undirected graphical models in locally-normalized pieces. On two real-world data sets we demonstrate our locally-trained linear-chain CRFs outperforming traditional CRFs, training in less than one-fifth the time, and providing a statisticallysignificant gain in accuracy.
@InProceedings{ mccallum04piecewise,
title = {Piecewise Training with Parameter Independence Diagrams: Comparing Globally- and Locally-trained Linear-chain CRFs},
author = {Andrew McCallum and Charles Sutton},
booktitle = {NIPS Workshop on Learning with Structured Outputs},
year = {2004}
}
-
Collective Segmentation and Labeling of Distant Entities in Information Extraction. Charles Sutton, Andrew McCallum.
In ICML Workshop on Statistical Relational Learning and Its Connections to Other Fields. 2004.
[ .pdf
| bib
]
@InProceedings{ sutton04skip,
title = {Collective Segmentation and Labeling of Distant Entities in Information Extraction},
author = {Charles Sutton and Andrew McCallum},
booktitle = {ICML Workshop on Statistical Relational Learning and Its Connections to Other Fields},
year = {2004}
}
Technical Reports
-
Local Training and Belief Propagation. Charles Sutton, Tom Minka.
Microsoft Research Technical Report, TR-2006-121, 2006.
[ .pdf
| bib
]
@TechReport{ sutton06local,
number = {TR-2006-121},
title = {Local Training and Belief Propagation},
institution = {Microsoft Research},
author = {Charles Sutton and Tom Minka},
year = {2006}
}
-
Fast, Piecewise Training for Discriminative Finite-state and Parsing Models. Charles Sutton, Andrew McCallum.
Center for Intelligent Information Retrieval Technical Report, IR-403, 2005.
[ .pdf
| bib
]
@TechReport{ sutton05fast,
number = {IR-403},
title = {Fast, Piecewise Training for Discriminative Finite-state and Parsing Models},
institution = {Center for Intelligent Information Retrieval},
author = {Charles Sutton and Andrew McCallum},
year = {2005}
}
This page was automatically generated from my BibTeX file using a Ruby script and ERb HTML template.
This process was far more pleasant than my previous one, which consisted of some XSLT, some Python
and OCaml programs from the net, duct tape, and spackle. The code to generate this page
is available
(requires bibtool).