Rafael-Michael Karampatsis

I am a PhD student in the ILCC, School of Informatics, University of Edinburgh, supervised by Dr. Charles Sutton. I am a member of the MAST (Machine learning for the Analysis of Source code Text) and CUP (Charles's Uncertain People) groups, both headed by Dr. Charles Sutton.

My research interests span a broad range of applications of probabilistic methods for machine learning, including software engineering, natural language processing, social media analysis, and music generation.

My current research focuses on automatic fault localization and program repair with deep learning. My PhD project tackles the vocabulary and scalability issues of neural language models for source code. It also addresses the lack of datasets of real-world SStuBs (Simple Stupid Bugs) in source code, which has resulted in the use of synthetic data for evaluating fault localization techniques. SStuBs are small semantic bugs that are syntactically correct and thus hard for a developer to spot manually. The project's main goal is to build an end-to-end system for automatically repairing SStuBs.

Previously, I worked on applications of machine learning in social media analysis.

Short Bio

Studies

MSc by Research in Data Science, University of Edinburgh.

MSc in Computer Science, Athens University of Economics and Business.

BSc in Informatics, Athens University of Economics and Business.

Professional

Teaching Assistant, University of Edinburgh

Teaching Assistant, Athens University of Economics and Business

Intern, IMC Technologies

During my internship I worked on semantic web and entity disambiguation.

IMC Technologies is the leading technology and consulting company in Greece in Knowledge Management and eParticipation.

cv

Publications

Journals/Conferences

Karampatsis, R.M., Babii, H., Robbes, R., Sutton, C., and Janes, A. (2020). Big Code != Big Vocabulary: Open-Vocabulary Models for Source Code.
Conference Link Preprint Code Models
Preprocessed GitHub Corpora:
Java Corpus C Corpus Python Corpus
Raw GitHub Corpora:
Java Corpus C Corpus Python Corpus

Karampatsis, R.M., and Sutton, C. (2019). SCELMo: Source Code Embeddings from Language Models.
Preprint Link

Karampatsis, R.M., and Sutton, C. (2019). Maybe Deep Neural Networks are the Best Choice for Modeling Source Code.
Link Code

Karampatsis, R.M., and Sutton, C. (2019). How Often Do Single-Statement Bugs Occur? The ManySStuBs4J Dataset.
Link Code Data

Karampatsis, R.M. (2015). CDTDS: Predicting paraphrases in Twitter via support vector regression. Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015).
Link

Karampatsis, R.M., Pavlopoulos, J., and Malakasiotis, P. (2014). AUEB: Two stage sentiment analysis of social network messages. Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014).
Link

Malakasiotis, P., Karampatsis, R.M., Makrynioti, K., and Pavlopoulos, J. (2013). AUEB: Two stage sentiment analysis. Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013).
Link

Theses & Projects

Karampatsis, R.M. (2015). Translating Natural Language to Code via Tree Transduction. MSc by Research Thesis, Centre for Doctoral Training in Data Science, University of Edinburgh.
pdf

Karampatsis, R.M. (2014). Social Media Sentiment Analysis. MSc in Computer Science Thesis. Athens University of Economics and Business.
pdf

Karampatsis, R.M. (2012). Named entity recognition in Greek texts of social media. BSc in Informatics Thesis. Athens University of Economics and Business.
pdf

Contact