Postdoctoral positions and PhD studentships on the ERC project "Skye: A programming language bridging theory and practice for scientific data curation"

Project: Skye: A programming language bridging theory and practice for scientific data curation
Supervisor: James Cheney
Deadline: June 17, 2016 (postdoctoral position)
Additional positions will be advertised in the next academic year.

We seek strong candidates for a postdoctoral position on the project "Skye: A programming language bridging theory and practice for scientific data curation". The project will be supervised by Dr. James Cheney, Reader in the Laboratory for Foundations of Computer Science, School of Informatics, University of Edinburgh. Funding is provided by a five-year, €1.99M Consolidator Grant from the European Research Council.

Project description

Science is increasingly data-driven. Scientific research funders now routinely mandate open publication of publicly-funded research data. Safely reusing such data currently requires labour-intensive curation. Provenance, which records the history and derivation of the data, is critical to reaping the benefits and avoiding the pitfalls of data sharing. There are hundreds of curated scientific databases in biomedicine that need fine-grained provenance; one important example is the IUPHAR/BPS Guide to Pharmacology database (GtoPdb), a pharmacological database developed in Edinburgh.

Currently there are no reusable methodologies or practical tools that support provenance for curated databases, forcing each project to start from scratch. Research on provenance for scientific databases is still at an early stage, and prototypes have so far proven challenging to deploy or evaluate in the field. Also, most techniques to date focus on provenance within a single database, but this is only part of the problem: real solutions will have to integrate database provenance with the multiple tiers of web applications, and no-one has begun to address this challenge.

The Skye project will build support for curation into the programming language itself, building on recent research on the Links Web programming language, including advances in language-integrated query, and on provenance and data curation. Links is a strongly-typed language that provides state-of-the-art support for language-integrated query and Web programming. This project will build on Links and other recent language designs for heterogeneous meta-programming to develop a new language, called Skye, that can express modular, reusable curation and provenance techniques. To keep focus on the real needs of scientific databases, Skye will be evaluated in the context of GtoPdb and other scientific database projects. Bridging the gap between curation research and the practices of scientific database curators will catalyse a virtuous cycle that will increase the pace of breakthrough results from data-driven science.

Skye will draw on the best ideas developed in cutting-edge research on language-integrated query, Web programming, and heterogeneous meta-programming. Skye will provide dialects, or first-class client language definitions, along with translations that map programs written in one dialect to another, or (as a special case) that perform source-to-source translation on a single dialect, for optimisation or to add functionality such as provenance-tracking. These translations will be available as libraries that can change the behaviour of already-written applications by rewriting code, so scientific database developers using Skye will be able to reuse these features instead of having to reimplement them from scratch or make wholesale changes to existing applications.

The Skye project will support a group of PhD students and postdoctoral researchers under the leadership of Dr. Cheney to pursue research on programming language design for integrating Web programming and databases, in aid of scientific data management. Topics for research could include:

  • Language design: How can we program with first-class client languages (dialects) and translations flexibly and safely?
  • Expressing and optimising client languages: How can existing client languages be embedded as dialects and translated to efficient client language code?
  • Defining modular curation techniques: How can existing (or new) curation techniques be defined using type-safe translations among dialects?
  • Case studies: What are the benefits and costs of using Skye to develop curated scientific databases?

For additional information about the project background, please consult our recent publications on this subject here and here.

PhD studentships

Two 4-year PhD studentships will be supported by the project. One studentship will cover full fees and a stipend of approximately £14,000 per year for a student of any nationality and the other will cover fees and stipend for a student with UK or EU citizenship. Additional funding may be available for exceptional candidates.

The first PhD studentship (for a 2016 start) has been awarded. Another will be advertised during the 2016-2017 academic year for a start in 2017.

Postdoctoral positions

We are currently recruiting a Research Associate in Programming Languages on this project. The successful candidate will have expertise in foundations of programming languages and functional programming. The project will build on Links, a functional, typed, cross-tier Web programming language with strong support for database programming via language-integrated query. The project will involve practical systems development and evaluation informed by conceptual or foundational research, so an ideal candidate will have the ability to develop new foundational programming language concepts and carry them through to implementation.

The ultimate goal of this project is to design a new general-purpose language suitable for embedding a wide range of domain-specific languages, generalising metaprogramming capabilities found in existing systems to make it possible to define reusable data management techniques needed for the next generation of curated scientific databases. Familiarity with extensibility capabilities such as Template Haskell, Lightweight Modular Staging (Scala), "languages as libraries" (Racket), computation expressions (F#), or with dependently-typed programming (e.g. Agda or Idris) would be especially advantageous for this project.

The successful candidate will be expected to take a leading role in research on cross-tier Web programming, database programming, or data curation in the Skye project. In particular, the postholder will contribute to one or more of the following project tasks:

  • Designing a new programming language providing first-class, modular domain-specific sublanguages (dialects) and typed translations among them.
  • Defining existing language-integrated query, Web programming and other features as modules rather than ad hoc language extensions.
  • Using dialects and translations to implement previously-investigated and new curation techniques for scientific data as reusable modules.
  • Developing and evaluating case studies in the context of existing or new scientific database projects.

The position will involve a mix of prototype development, support of other researchers using the prototype implementation, and independent ideas-led research. You will be expected to work effectively with other researchers to produce prototypes and high quality publications and demonstrations, and contribute to dissemination activities for the project, e.g. participating in project meetings and publishing papers in top conferences and journals.

Background required

Applicants for the postdoctoral position should have, at a minimum, a PhD degree (or be close to completion) in computer science, with a track record of high quality publications, and preferably a strong background in foundations of programming languages or databases. Ideal candidates will have both solid theoretical grounding and experience applying principles in practical systems. Previous research experience concerning provenance or related topics (such as information flow security, program slicing, generative programming or metaprogramming) would be desirable but is not required.

About the position

The postdoctoral position is available for 24 months, starting on or as soon as possible after September 1, 2016.

Prospective applicants are encouraged to contact James Cheney (jcheney@inf.ed.ac.uk) before applying to discuss the position.

Application process and deadlines

A complete application consists of a CV and a 1-3 page research statement summarizing your background, previous research experience, and how they relate to this position.

Applications must be submitted by 5pm GMT on June 17, 2016, through the University of Edinburgh recruitment site:

https://www.vacancies.ed.ac.uk

Reference number: 036203

or directly by following this link:

direct link to the application site

Applicants must apply using the University jobs website above. This requires creating an account, and it is suggested that applicants complete this process well before the deadline. Applications submitted after the 5pm deadline may not be considered.

Interviews will likely be held (either in person or via Skype) in late June.

The Team

These positions will be under the supervision of the Skye project PI, Dr. James Cheney, whose group currently includes two postdoctoral researchers (Dr. Wilmer Ricciotti and Dr. Roly Perera) and three PhD students, all working on topics involving provenance, programming languages, security, and databases. The current team is supported by funding from AFOSR, Microsoft Research, Google, and the European Union. The Skye project will also benefit from strong collaborative links with other world-leading experts in LFCS, particularly in the Edinburgh Database Group and the Programming Languages and Foundations group.

Environment

The University of Edinburgh School of Informatics brings together world-class research groups in theoretical computer science, artificial intelligence and cognitive science. The School led the UK 2014 REF rankings in volume of internationally recognized or internationally excellent research. In 2013, the School of Informatics received an Athena Swan Silver Award, in recognition of its commitment to advancing the careers of women in science, technology, engineering, mathematics and medicine (STEMM) employment in higher education and research. Overall the University of Edinburgh has achieved a Silver Award.

The Laboratory for Foundations of Computer Science (LFCS), established by Burstall, Milner and Plotkin in 1986, is recognized worldwide for groundbreaking research on topics in programming languages, semantics, type theory, proof theory, algorithms and complexity, databases, security, and systems biology. Formal aspects of databases, XML and provenance (Libkin, Fan, Buneman), language-based security (Aspinall, Stark, Gordon), and Web programming languages (Wadler) are active areas of investigation in LFCS complementary to this project.

The Edinburgh Database Group is part of the Laboratory for Foundations of Computer Science and includes six faculty members, five postdoctoral researchers, and six PhD students. Interests of the group span all aspects of database systems and theory. Topics of current interest include graph databases, XML, data integration, novel approaches to query processing and storage, data provenance, archiving and annotation. Many of these topics are relevant to scientific data management, an area in which Edinburgh has unique strengths.

Programming Languages and Foundations is one of the largest research activities in LFCS, including 15 academic staff, 8 postdoctoral researchers, and 10 current PhD students, working on functional programming, types, verification, semantics, software engineering, language-based security and new programming models. We participate in a thriving PL research community across Scotland, with Scottish Programming Languages Seminars hosted every 3-4 months by PL groups at Glasgow, Strathclyde, Heriot-Watt, St. Andrews, Dundee and Edinburgh.

For more information about study in Edinburgh and the School of Informatics, see these pages:


Last modified: Mon May 9 18:11:36 BST 2016