
Benedek Rozemberczki

Currently I am enjoying the first year of my PhD at the Centre for Doctoral Training in Data Science, University of Edinburgh. I am supervised by Rik Sarkar.

I am mainly interested in graph representation learning, large-scale network science and network analytics. Previously I have worked on computational social science and combinatorial game theory. I studied Economic Policy at Central European University and Applied Economics at Corvinus University of Budapest.



PhD in Data Science, University of Edinburgh, 2017-Now.

MSc by Research in Data Science, University of Edinburgh, 2016-2017.
Grade 80/100, with Distinction.

MA in Economic Policy, Central European University, 2014-2016.
Grade 3.89/4.00, with Distinction. Won the Stanislav Vidovic MA Thesis Award.

BSc in Applied Economics, Corvinus University of Budapest, 2011-2014.
Grade 4.87/5.00, with Distinction.


Data science intern, GE Digital, 2016.
I mainly worked on network analytics, NLP and OCR. I was part of the Predix team.

Teaching assistant, University of Edinburgh, 2017.
I've tutored and marked the following course: Social & Technological Networks.

Teaching assistant, Corvinus University of Budapest, 2012-2014.
I've tutored and/or marked the following courses: Programming for Mathematical Economics; Probability Theory; Calculus; International Economics; Macroeconomics.

GEMSEC: Graph Embedding with Self Clustering


Modern graph embedding procedures can efficiently extract features of nodes from graphs with millions of nodes. The features are later used as inputs for downstream predictive tasks. In this paper we propose GEMSEC, a graph embedding algorithm which learns a clustering of the nodes simultaneously with computing their features. The procedure places nodes in an abstract feature space where the vertex features minimize the negative log likelihood of preserving sampled vertex neighborhoods, while the nodes are clustered into a fixed number of groups in this space. GEMSEC is a general extension of earlier work in the domain as it is an augmentation of the core optimization problem of sequence based graph embedding procedures and is agnostic of the neighborhood sampling strategy. We show that GEMSEC extracts high quality clusters on real world social networks and is competitive with other community detection algorithms. We demonstrate that the clustering constraint has a positive effect on representation quality and also that our procedure learns to embed and cluster graphs jointly in a robust and scalable manner.

For more details you can have a look at the paper. Code that generates the embeddings can be found here.
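As a rough illustration of the idea in the abstract, the sketch below evaluates a combined cost: a softmax negative log likelihood over sampled neighborhoods plus a weighted distance of each node to its nearest cluster center. This is not the released code. The function name `gemsec_style_loss`, the weight `gamma`, and the exact form of both terms are assumptions made for the example.

```python
import numpy as np


def gemsec_style_loss(features, centers, neighborhoods, gamma=0.1):
    """Return (total, embedding_cost, clustering_cost) for fixed parameters.

    features:      (n_nodes, dim) array of node embeddings
    centers:       (n_clusters, dim) array of cluster centers
    neighborhoods: dict mapping a node index to a list of sampled neighbors
    gamma:         hypothetical weight of the clustering term
    """
    # Pairwise inner products between all node embeddings.
    scores = features @ features.T

    # Numerically stable log of the softmax normaliser for every source node.
    row_max = scores.max(axis=1, keepdims=True)
    log_norm = row_max[:, 0] + np.log(np.exp(scores - row_max).sum(axis=1))

    # Negative log likelihood of observing each sampled neighbor.
    embedding_cost = 0.0
    for source, sampled in neighborhoods.items():
        for target in sampled:
            embedding_cost += log_norm[source] - scores[source, target]

    # Distance of every node to its closest cluster center.
    distances = np.linalg.norm(
        features[:, None, :] - centers[None, :, :], axis=2
    )
    clustering_cost = distances.min(axis=1).sum()

    total = embedding_cost + gamma * clustering_cost
    return total, embedding_cost, clustering_cost


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    toy_features = rng.normal(size=(5, 3))
    toy_centers = rng.normal(size=(2, 3))
    toy_neighborhoods = {0: [1, 2], 1: [0], 2: [3], 3: [4], 4: [0]}
    print(gemsec_style_loss(toy_features, toy_centers, toy_neighborhoods))
```

In a joint training loop both `features` and `centers` would be updated by gradient descent on the total cost; the sketch only evaluates it, which is enough to see how the clustering term pulls embeddings towards a fixed number of centers.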

Fast Sequence Based Embedding with Diffusion Graphs


A graph embedding is a representation of the vertices of a graph in a low dimensional space, which approximately preserves properties such as distances between nodes. Vertex sequence based embedding procedures use features extracted from linear sequences of vertices to create embeddings using a neural network. In this paper, we propose diffusion graphs as a method to rapidly generate vertex sequences for network embedding. Its computational efficiency is superior to previous methods due to simpler sequence generation, and it produces more accurate results. In experiments, we found that the performance relative to other methods improves with increasing edge density in the graph. In a community detection task, clustering nodes in the embedding space produces better results compared to other sequence based embedding methods.

For more details you can have a look at the paper. Code that generates the embeddings can be found here.
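The abstract does not spell out how a diffusion graph yields a vertex sequence, so the following is only one plausible sketch and not the released code. It assumes that a small tree is grown around a start node by repeatedly attaching a random neighbor that is not yet included, and that the sequence is obtained by walking every tree edge once in each direction. The names `diffusion_tree` and `euler_sequence` and the uniform choice over candidate edges are assumptions for illustration.

```python
import random


def diffusion_tree(adjacency, start, size, rng):
    """Grow a random tree with at most `size` nodes around `start`.

    adjacency: dict mapping each node to a list of its neighbors
    Returns a dict mapping each tree node to the list of its children.
    """
    tree = {start: []}
    while len(tree) < size:
        # Edges leading from the current tree to nodes not yet included.
        candidates = [
            (inside, outside)
            for inside in tree
            for outside in adjacency[inside]
            if outside not in tree
        ]
        if not candidates:
            break  # the connected component is exhausted
        parent, child = rng.choice(candidates)
        tree[parent].append(child)
        tree[child] = []
    return tree


def euler_sequence(tree, root):
    """Walk every tree edge down and back up, recording the visited nodes."""
    sequence = [root]

    def visit(node):
        for child in tree[node]:
            sequence.append(child)
            visit(child)
            sequence.append(node)

    visit(root)
    return sequence


if __name__ == "__main__":
    toy_graph = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1, 4], 4: [3]}
    toy_tree = diffusion_tree(toy_graph, start=0, size=4, rng=random.Random(42))
    print(euler_sequence(toy_tree, 0))
```

A tree with n nodes produces a sequence of 2n - 1 vertices in which consecutive entries are always adjacent in the original graph, so the output can be fed to a sequence based embedding model in place of random walks.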


B. Rozemberczki, R. Davies, R. Sarkar, C. Sutton. GEMSEC: Graph Embedding with Self Clustering.
pdf code

B. Rozemberczki, R. Sarkar. Fast Sequence Based Embedding with Diffusion Graphs.
pdf code


B. Rozemberczki. Diffusion to Vector - Representation Learning of Graphs. MSc by Research Thesis, Centre for Doctoral Training in Data Science, University of Edinburgh. 2017.
pdf code

B. Rozemberczki. Homophily Rearrangement Algorithms and Similarity Based Diffusion on Networks. MA Thesis, Department of Economics, Central European University. 2016.
pdf slides

B. Rozemberczki. Regression Games and Applications. BSc Thesis, Faculty of Economics, Corvinus University of Budapest. 2014.
pdf slides

ENS Challenges 2016-2017 - Societe Generale Oil Extraction Prediction Task - 1st/2428.
competition code

ENS Challenges 2016-2017 - Regaind Photo Quality Assessment Task - 1st/2428.
competition code

CDMC 2016 - International Data Mining Competition - 2nd/189.
competition code

Driven Data - Data Mining the Water Table - 3rd/1917.
competition code

Work Address
Room 3.50, Informatics Forum
University of Edinburgh
10 Crichton Street, EH8 9AB
Edinburgh, UK