Public datasets for symbolic pattern recognition

  1. TREC (Text REtrieval Conference) datasets, including Genomics, Ad hoc, QA, Web, Novelty, Legal, Spam, and Terabyte data
  2. KDD Challenge Cup data
    1. Particle Physics Task
    2. Protein Homology Prediction Task
    3. Gene/Protein Binding, Function and Localization Task
    4. The charitable donations dataset
    5. Network Intrusion dataset
  3. Physiological Data Modeling Contest
  4. The 4 Universities Data Set, WWW pages
  5. Reuters-21578, text categorization dataset
  6. The TPTP Problem Library for Automated Theorem Proving, Geoff Sutcliffe and Christian Suttner, University of Miami
  7. CIA World Fact Book in Prolog
  8. UCI Machine Learning Repository, a collection of benchmark datasets for regression and classification tasks
  9. CoIL Competition Data
    1. The Insurance Company (TIC) dataset
    2. River chemical concentrations and algae densities
  10. Colon cancer data

