Milos Nikolic

MILOS NIKOLIC

Associate Professor (Reader)

School of Informatics, University of Edinburgh

milos.nikolic@ed.ac.uk

My research interest is in databases and large-scale data management systems.

My recent work studies in-database learning, stream processing, incremental computation, and query compilation. I am a member of the Database Group and the Laboratory for Foundations of Computer Science.

I am looking for a PhD student to work on topics related to data management. Feel free to contact me for more information.

bio | Google Scholar

PROJECTS

Adaptive Query Processing based on Degree Information

investigating trade-offs in static and dynamic evaluation of queries

Incremental Maintenance of Complex Data Analytics

maintaining a wide range of analytics using novel factorisation techniques

Declarative Data Pipelines (industry supported)

expressing complex data workflows in a clean and declarative way

TEACHING

Instructor

Advanced Database Systems (Edinburgh, Spring 2025, 2024, 2023, 2022, 2021, w/ Andreas Pieris)
Advanced Databases (Edinburgh, Spring 2020, 2019)
Databases (Oxford, Michaelmas 2017, w/ Tim Furche)
Object Oriented Programming (Oxford, Michaelmas 2016, w/ Joe Pitt-Francis)

Teaching Assistant

Database Systems Implementation (Oxford, Hilary 2018, 2017)
Concurrent Programming (Oxford, Hilary 2018)
Compilers (Oxford, Hilary 2017)

PUBLICATIONS

Conjunctive Queries with Free Access Patterns under Updates, Ahmet Kara, Milos Nikolic, Dan Olteanu, Haozhe Zhang. Logical Methods in Computer Science (LMCS) 2025.
We study the problem of answering conjunctive queries with free access patterns (CQAPs) under updates. A free access pattern is a partition of the free variables of the query into input and output. The query returns tuples over the output variables given a tuple of values over the input variables. We introduce a fully dynamic evaluation approach that works for all CQAPs and is optimal for two classes of CQAPs. This approach recovers prior work on the dynamic evaluation of conjunctive queries without access patterns. We first give a syntactic characterisation of all CQAPs that admit constant time per single-tuple update and whose output tuples can be enumerated with constant delay given a tuple of values over the input variables. We further chart the complexity trade-off between the preprocessing time, update time and enumeration delay for a class of CQAPs. For some of these CQAPs, our approach achieves optimal, albeit non-constant, update time and delay. This optimality is predicated on the Online Matrix-Vector Multiplication conjecture. We finally adapt our approach to the dynamic evaluation of tractable CQAPs over probabilistic databases under updates.

@article{kara2025lmcs,
title={{Conjunctive Queries with Free Access Patterns under Updates}},
author={Kara, Ahmet and Nikolic, Milos and Olteanu, Dan and Zhang, Haozhe},
journal = {{Logical Methods in Computer Science}},
volume={21},
number={2},
pages={1--50},
year={2025},
}
Tractable Conjunctive Queries over Static and Dynamic Relations, Ahmet Kara, Zheng Luo, Milos Nikolic, Dan Olteanu, Haozhe Zhang. ICDT 2025.
We investigate the evaluation of conjunctive queries over static and dynamic relations. While static relations are given as input and do not change, dynamic relations are subject to inserts and deletes.
We characterise syntactically three classes of queries that admit constant update time and constant enumeration delay. We call such queries tractable. Depending on the class, the preprocessing time is linear, polynomial, or exponential (under data complexity, so the query size is constant).
To decide whether a query is tractable, it does not suffice to analyse separately the sub-queries over the static relations and over the dynamic relations, respectively. Instead, we need to take the interaction between the static and the dynamic relations into account. Even when the sub-query over the dynamic relations is not tractable, the overall query can become tractable if the dynamic relations are sufficiently constrained by the static ones.

@inproceedings{kara2025statdyn,
title={{Tractable Conjunctive Queries over Static and Dynamic Relations}},
author={Ahmet Kara and Zheng Luo and Milos Nikolic and Dan Olteanu and Haozhe Zhang},
booktitle = {28th International Conference on Database Theory (ICDT 2025)},
year={2025},
pages = {12:1--12:21},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}
}
In-Database Data Imputation, Massimo Perini, Milos Nikolic. SIGMOD 2024.
Missing data is a widespread problem in many domains, creating challenges in data analysis and decision making. Traditional techniques for dealing with missing data, such as excluding incomplete records or imputing simple estimates (e.g., mean), are computationally efficient but may introduce bias and disrupt variable relationships, leading to inaccurate analyses. Model-based imputation techniques offer a more robust solution that preserves the variability and relationships in the data, but they demand significantly more computation time, limiting their applicability to small datasets.
This work enables efficient, high-quality, and scalable data imputation within a database system using the widely used MICE method. We adapt this method to exploit computation sharing and a ring abstraction for faster model training. To impute both continuous and categorical values, we develop techniques for in-database learning of stochastic linear regression and Gaussian discriminant analysis models. Our MICE implementations in PostgreSQL and DuckDB outperform alternative MICE implementations and model-based imputation techniques by up to two orders of magnitude in terms of computation time, while maintaining high imputation quality.

@article{perini2024pacmmod,
title={{In-Database Data Imputation}},
author={Perini, Massimo and Nikolic, Milos},
journal={Proc. ACM Manag. Data (PACMMOD)},
volume={2},
doi={10.1145/3639326},
year={2024},
publisher={Association for Computing Machinery},
}
F-IVM: Analytics over Relational Databases under Updates, Ahmet Kara, Milos Nikolic, Dan Olteanu, Haozhe Zhang. The VLDB Journal, 2023.
This article describes F-IVM, a unified approach for maintaining analytics over changing relational data. We exemplify its versatility in four disciplines: processing queries with group-by aggregates and joins; learning linear regression models using the covariance matrix of the input features; building Chow-Liu trees using pairwise mutual information of the input features; and matrix chain multiplication.
F-IVM has three main ingredients: higher-order incremental view maintenance; factorized computation; and ring abstraction. F-IVM reduces the maintenance of a task to that of a hierarchy of simple views. Such views are functions mapping keys, which are tuples of input values, to payloads, which are elements from a ring. F-IVM also supports efficient factorized computation over keys, payloads, and updates. Finally, F-IVM treats uniformly seemingly disparate tasks. In the key space, all tasks require joins and variable marginalization. In the payload space, tasks differ in the definition of the sum and product ring operations.
We implemented F-IVM on top of DBToaster and show that it can outperform classical first-order and fully recursive higher-order incremental view maintenance by orders of magnitude while using less memory.

@article{kara2023vldbj,
title={{F-IVM: Analytics over Relational Databases under Updates}},
author={Kara, Ahmet and Nikolic, Milos and Olteanu, Dan and Zhang, Haozhe},
journal={The VLDB Journal},
doi={10.1007/s00778-023-00817-w},
year={2023},
}
Trade-offs in Static and Dynamic Evaluation of Hierarchical Queries, Ahmet Kara, Milos Nikolic, Dan Olteanu, Haozhe Zhang. Logical Methods in Computer Science (LMCS) 2023.
We investigate trade-offs in static and dynamic evaluation of hierarchical queries with arbitrary free variables. In the static setting, the trade-off is between the time to partially compute the query result and the delay needed to enumerate its tuples. In the dynamic setting, we additionally consider the time needed to update the query result under single-tuple inserts or deletes to the database.
Our approach observes the degree of values in the database and uses different computation and maintenance strategies for high-degree (heavy) and low-degree (light) values. For the latter it partially computes the result, while for the former it computes enough information to allow for on-the-fly enumeration.
We define the preprocessing time, the update time, and the enumeration delay as functions of the light/heavy threshold. By appropriately choosing this threshold, our approach recovers a number of prior results when restricted to hierarchical queries.
We show that for a restricted class of hierarchical queries, our approach achieves worst-case optimal update time and enumeration delay conditioned on the Online Matrix-Vector Multiplication Conjecture.

@article{kara2023lmcs,
title={{Trade-offs in Static and Dynamic Evaluation of Hierarchical Queries}},
author={Kara, Ahmet and Nikolic, Milos and Olteanu, Dan and Zhang, Haozhe},
journal = {{Logical Methods in Computer Science}},
volume={19},
number={3},
pages={1--50},
year={2023},
}
Conjunctive Queries with Free Access Patterns under Updates, Ahmet Kara, Milos Nikolic, Dan Olteanu, Haozhe Zhang. ICDT 2023.
We study the problem of answering conjunctive queries with free access patterns under updates. A free access pattern is a partition of the free variables of the query into input and output. The query returns tuples over the output variables given a tuple of values over the input variables.
We introduce a fully dynamic evaluation approach for such queries. We also give a syntactic characterisation of those queries that admit constant time per single-tuple update and whose output tuples can be enumerated with constant delay given an input tuple. Finally, we chart the complexity trade-off between the preprocessing time, update time and enumeration delay for such queries. For a class of queries, our approach achieves optimal, albeit non-constant, update time and delay. Their optimality is predicated on the Online Matrix-Vector Multiplication conjecture. Our results recover prior work on the dynamic evaluation of conjunctive queries without access patterns.

@inproceedings{kara2023icdt,
title={{Conjunctive Queries with Free Access Patterns under Updates}},
author={Kara, Ahmet and Nikolic, Milos and Olteanu, Dan and Zhang, Haozhe},
booktitle={26th International Conference on Database Theory (ICDT 2023)},
year={2023},
organization={Schloss Dagstuhl-Leibniz-Zentrum f{\"u}r Informatik}
}
Evaluation Trade-Offs for Acyclic Conjunctive Queries, Ahmet Kara, Milos Nikolic, Dan Olteanu, Haozhe Zhang. CSL 2023.
We consider the evaluation of acyclic conjunctive queries, where the evaluation time is decomposed into preprocessing time and enumeration delay. In a seminal paper at CSL'07, Bagan, Durand, and Grandjean showed that acyclic queries can be evaluated with linear preprocessing time and linear enumeration delay. If the query is free-connex, the enumeration delay becomes constant. Further prior work showed that constant enumeration delay can be achieved for arbitrary acyclic conjunctive queries at the expense of a preprocessing time that is characterised by the fractional hypertree width.
We introduce an approach that exposes a trade-off between preprocessing time and enumeration delay for acyclic conjunctive queries. The aforementioned prior works represent extremes in this trade-off space. Yet our approach also allows for the enumeration delay and the preprocessing time between these extremes, in particular the delay may lie between constant and linear time.
Our approach decomposes the given query into subqueries and achieves for each subquery a trade-off that depends on a parameter controlling the times for preprocessing and enumeration. The complexity of the query is given by the Pareto optimal points of a bi-objective optimisation program whose inputs are possible query decompositions and parameter values.

@inproceedings{kara2023csl,
title={{Evaluation Trade-Offs for Acyclic Conjunctive Queries}},
author={Kara, Ahmet and Nikolic, Milos and Olteanu, Dan and Zhang, Haozhe},
booktitle={31st EACSL Annual Conference on Computer Science Logic (CSL 2023)},
year={2023},
organization={Schloss Dagstuhl-Leibniz-Zentrum f{\"u}r Informatik}
}
Scalable Analysis of Multi-Modal Biomedical Data, , Jaclyn Smith, Yao Shi, Michael Benedikt, Milos Nikolic. GigaScience journal, 2021.
Targeted diagnosis and treatment options are dependent on insights drawn from multi-modal analysis of large-scale biomedical datasets. Advances in genomics sequencing, image processing, and medical data management have supported data collection and management within medical institutions. These efforts have produced large-scale datasets and have enabled integrative analyses that provide a more thorough look of the impact of a disease on the underlying system. The integration of large-scale biomedical data commonly involves several complex data transformation steps, such as combining datasets to build feature vectors for learning analysis. Thus, scalable data integration solutions play a key role in the future of targeted medicine. Though large-scale data processing frameworks have shown promising performance for many domains, they fail to support scalable processing of complex datatypes. To address these issues and achieve scalable processing of multi-modal biomedical data, we present TraNCE, a framework that automates the difficulties of designing distributed analyses with complex biomedical data types. We outline research and clinical applications for the platform, including data integration support for building feature sets for classification. We show that the system is capable of outperforming the common alternative, based on "flattening" complex data structures, and runs efficiently when alternative approaches are unable to perform at all.

@article{smith2021scalable,
title={Scalable Analysis of Multi-Modal Biomedical Data},
author={Smith, Jaclyn and Shi, Yao and Benedikt, Michael and Nikolic, Milos},
journal={GigaScience},
volume={10},
number={9},
year={2021}
}
Machine Learning over Static and Dynamic Relational Data, Ahmet Kara, Milos Nikolic, Dan Olteanu, Haozhe Zhang. DEBS 2021 tutorial.
This tutorial overviews principles behind recent works on training and maintaining machine learning models over relational data, with an emphasis on the exploitation of the relational data structure to improve the runtime performance of the learning task. The tutorial has the following parts:
- Database research for data science
- Three main ideas to achieve performance improvements:
  - Turn the ML problem into a DB problem
  - Exploit structure of the data and problem
  - Exploit engineering tools of a DB researcher
- Avenues for future research
@inproceedings{kara2021machine,
title={Machine learning over static and dynamic relational data},
author={Kara, Ahmet and Nikolic, Milos and Olteanu, Dan and Zhang, Haozhe},
booktitle={Proceedings of the 15th ACM International Conference on Distributed and Event-based Systems},
pages={160--163},
year={2021}
}
TraNCE: Transforming Nested Collections Efficiently, Jaclyn Smith, Michael Benedikt, Brandon Moore, Milos Nikolic. VLDB 2021 demo.
Nested relational query languages have long been seen as an attractive tool for scenarios involving large hierarchical datasets. In recent years, there has been a resurgence of interest in nested relational languages. One driver has been the affinity of these languages for large-scale processing platforms such as Spark and Flink.
This demonstration gives a tour of TraNCE, a new system for processing nested data on top of distributed processing systems. The core innovation of the system is a compiler that processes nested relational queries in a series of transformations; these include variants of two prior techniques, shredding and unnesting, as well as a materialization transformation that customizes the way levels of the nested output are generated. The TraNCE platform builds on these techniques by adding components for users to create and visualize queries, as well as data exploration and notebook execution targets to facilitate the construction of large-scale data science applications. The demonstration will both showcase the system from the viewpoint of usability by data scientists and illustrate the data management techniques employed.

@article{smith2021trance,
title={TraNCE: Transforming Nested Collections Efficiently},
author={Smith, Jaclyn and Benedikt, Michael and Moore, Brandon and Nikolic, Milos},
journal={Proceedings of the VLDB Endowment (PVLDB)},
volume={14},
number={12},
pages={2727--2730},
year={2021},
publisher={VLDB Endowment}
}
Scalable Querying of Nested Data, Jaclyn Smith, Michael Benedikt, Milos Nikolic, Amir Shaikhha. VLDB 2021.
While large-scale distributed data processing platforms have become an attractive target for query processing, these systems are problematic for applications that deal with nested collections. Programmers are forced either to perform nontrivial translations of collection programs or to employ automated flattening procedures, both of which lead to performance problems. These challenges only worsen for nested collections with skewed cardinalities, where both handcrafted rewriting and automated flattening are unable to enforce load balancing across partitions.
In this work, we propose a framework that translates a program manipulating nested collections into a set of semantically equivalent shredded queries that can be efficiently evaluated. The framework employs a combination of query compilation techniques, an efficient data representation for nested collections, and automated skew-handling. We provide an extensive experimental evaluation, demonstrating significant improvements provided by the framework in diverse scenarios for nested collection programs.

@article{smith2020scalable,
title={Scalable Querying of Nested Data},
author={Smith, Jaclyn and Benedikt, Michael and Nikolic, Milos and Shaikhha, Amir},
journal={Proceedings of the VLDB Endowment (PVLDB)},
volume={14},
number={3},
pages={445--457},
year={2020},
publisher={VLDB Endowment}
}
Maintaining Triangle Queries under Updates, Ahmet Kara, Hung Ngo, Milos Nikolic, Dan Olteanu, Haozhe Zhang. ACM Transactions on Database Systems (TODS), 2020. arXiv 2004.03716
We consider the problem of incrementally maintaining the triangle queries with arbitrary free variables under single-tuple updates to the input relations.
We introduce an approach called IVMε that exhibits a trade-off between the update time, the space, and the delay for the enumeration of the query result, such that the update time ranges from the square root to linear in the database size while the delay ranges from constant to linear time.
IVMε achieves Pareto worst-case optimality in the update-delay space conditioned on the Online Matrix-Vector Multiplication conjecture. It is strongly Pareto optimal for the triangle queries with no or three free variables and weakly Pareto optimal for the remaining triangle queries with one or two free variables.
IVMε recovers prior work such as the suboptimal classical view maintenance approach that uses delta query processing and the worst-case optimal approach that computes all triangles in a static database.

@article{kara2020maintaining,
title={Maintaining triangle queries under updates},
author={Kara, Ahmet and Ngo, Hung Q and Nikolic, Milos and Olteanu, Dan and Zhang, Haozhe},
journal={ACM Transactions on Database Systems (TODS)},
volume={45},
number={3},
pages={1--46},
year={2020},
publisher={ACM New York, NY, USA}
}
F-IVM: Learning over Fast-Evolving Relational Data, Milos Nikolic, Haozhe Zhang, Ahmet Kara, Dan Olteanu. SIGMOD 2020 (demo).
F-IVM is a system for real-time analytics such as machine learning applications over training datasets defined by queries over fast-evolving relational databases. We will demonstrate F-IVM for three such applications: model selection, Chow-Liu trees, and ridge linear regression.

@inproceedings{nikolic2020f,
title={F-IVM: learning over fast-evolving relational data},
author={Nikolic, Milos and Zhang, Haozhe and Kara, Ahmet and Olteanu, Dan},
booktitle={Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data},
pages={2773--2776},
year={2020}
}
Trade-offs in Static and Dynamic Evaluation of Hierarchical Queries, Ahmet Kara, Milos Nikolic, Dan Olteanu, Haozhe Zhang. PODS 2020.
We investigate trade-offs in static and dynamic evaluation of hierarchical queries with arbitrary free variables. In the static setting, the trade-off is between the time to partially compute the query result and the delay needed to enumerate its tuples. In the dynamic setting, we additionally consider the time needed to update the query result under single-tuple inserts or deletes to the database.
Our approach observes the degree of values in the database and uses different computation and maintenance strategies for high-degree (heavy) and low-degree (light) values. For the latter it partially computes the result, while for the former it computes enough information to allow for on-the-fly enumeration.
The main result of this work defines the preprocessing time, the update time, and the enumeration delay as functions of the light/heavy threshold. By conveniently choosing this threshold, our approach recovers a number of prior results when restricted to hierarchical queries.
For a restricted class of hierarchical queries, our approach can achieve worst-case optimal update time and enumeration delay conditioned on the Online Matrix-Vector Multiplication Conjecture.

@inproceedings{kara2020trade,
title={Trade-offs in static and dynamic evaluation of hierarchical queries},
author={Kara, Ahmet and Nikolic, Milos and Olteanu, Dan and Zhang, Haozhe},
booktitle={Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems},
pages={375--392},
year={2020}
}
Counting Triangles under Updates in Worst-Case Optimal Time, Ahmet Kara, Hung Ngo, Milos Nikolic, Dan Olteanu, Haozhe Zhang. ICDT 2019. (Best Paper Award)
extended version arXiv 1804.02780

We consider the problem of incrementally maintaining the triangle count query under single-tuple updates to the input relations. We introduce an approach that exhibits a space-time tradeoff such that the space-time product is quadratic in the size of the input database and the update time can be as low as the square root of this size. This lowest update time is worst-case optimal conditioned on the Online Matrix-Vector Multiplication conjecture. The classical and factorized incremental view maintenance approaches are recovered as special cases of our approach within the space-time tradeoff. In particular, they require linear-time update maintenance, which is suboptimal. Our approach also recovers the worst-case optimal time complexity for computing the triangle count in the non-incremental setting.

@inproceedings{kara2019counting,
title={Counting Triangles under Updates in Worst-Case Optimal Time},
author={Kara, Ahmet and Ngo, Hung Q and Nikolic, Milos and Olteanu, Dan and Zhang, Haozhe},
booktitle={22nd International Conference on Database Theory (ICDT 2019)},
year={2019},
organization={Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik}
}

Counting Triangles under Updates (short paper), Ahmet Kara, Hung Ngo, Milos Nikolic, Dan Olteanu, Haozhe Zhang. Alberto Mendelzon Workshop (AMW) 2018.
Incremental View Maintenance with Triple Lock Factorization Benefits, Milos Nikolic, Dan Olteanu. SIGMOD 2018. (code)
Distributed Incremental View Maintenance, Milos Nikolic. Encyclopedia of Big Data Technologies, Springer, 2018.

Enabling Signal Processing over Data Streams, Milos Nikolic, Badrish Chandramouli, Jonathan Goldstein. SIGMOD 2017. (slides)
Enabling Deep Analytics in Stream Processing Systems, Milos Nikolic, Badrish Chandramouli, Jonathan Goldstein. BICOD 2017.

Efficient Incremental Data Analysis, Milos Nikolic. PhD thesis, 2016.
How to Win a Hot Dog Eating Contest: Distributed Incremental View Maintenance with Batch Updates, Milos Nikolic, Mohammad Dashti, Christoph Koch. SIGMOD 2016. (slides)

LINVIEW: Incremental View Maintenance for Complex Analytical Queries, Milos Nikolic, Mohammed ElSeidy, Christoph Koch. SIGMOD 2014. (slides) (video)
DBToaster: Higher-order Delta Processing for Dynamic, Frequently Fresh Views, Christoph Koch, Yanif Ahmad, Oliver Kennedy, Milos Nikolic, Andres Nötzli, Daniel Lupei, Amir Shaikhha. The VLDB Journal 2014. (extended version)

DBToaster: Higher-order Delta Processing for Dynamic, Frequently Fresh Views, Yanif Ahmad, Oliver Kennedy, Christoph Koch, Milos Nikolic. VLDB 2012. (slides)
Invited to the "Best of VLDB 2012" issue of the VLDB Journal.

Entangled transactions, Nitin Gupta, Milos Nikolic, Sudip Roy, Gabriel Bender, Lucja Kot, Johannes Gehrke, Christoph Koch. VLDB 2011. (slides)