pocker: A docker container to integrate R and Python in CI/CD frameworks

Genesis

I started to use continuous integration with gitlab a few weeks ago and up to a few days was really happy with rocker image (basically docker + R).

I became ambitious and started to write a markdown that was comparing R and Python speed on simple operations. It was working fine on my laptop (anaconda is installed). However, because anaconda is not available in rocker image, markdown compilation naturally failed. I thus started the project to create a docker image that would do the job, i.e. that would integrate Python and R together. The container I propose is not well-suited for Python only repository, its goal is to ease the pass-through between Python and R

Since I am a beginner in docker ecosystem, it has not been an easy path. When I was thinking the solution would be trivial to implement I was planning to make the repository private. However, I think now that the solution produced can help people. I decided to make it public. To make the project as reproducible as possible, I ended up with that complex workflow:

  • github connected to dockerhub to build image base from DockerFile
  • gitlab with continuous integration using /gitlabCI/.simple_configuration.yml example file as a reproducible workflow
  • dockerhub that builds automatically from github repository the docker image

Complex workflow, simple image

This is not the most natural workflow. If you go into project history, you might see that I did not adopt initially that workflow. I adopted it after merging branches from two separated project that were pursuing the same goal. This complex set up presents an advantage for reproducibility: each time project updates are pushed, the code used to build pocker image and the example of use from continuous integration is updated.

I should warn people used to create docker image that I might not have created the most parsimonious image necessary to run R and Python together. I would welcome pull request to improve pocker repository

Some explanations

DockerFile is used to build the image. The main steps are the following:

  • Start from rocker/verse container that avoids re-installing tidyverse each time a CI/CD job is ran.
  • Install python 3 and anaconda
  • Add conda binary directory in path
  • Install reticulate package

In gitlabCI directory, you will find scripts useful for continuous integration related to docker project:

  • complete_configuration.yml: the gitlab CI/CD configuration file I was using before building my own docker image. It starts from rocker/verse and follows the same steps that the Dockerfile that has been presented
  • simple_configuration.yml: gitlab CI/CD configuration I use now that pocker container is built

The other scripts build.R, scripts/* are here to propose tests for the configuration obtained from gitlab CI/CD.

Avatar
Lino Galiana
Data Scientist

I am data scientist in the Department of Economic Studies at the French national statistical institute, Insee. I study Big Data and computational methods related to microeconometric and data science fields.

Related