TL;DR - Can we really get the bots to help us in an online school, or are we just going to create more work for ourselves?
We are facing a small dilemma in the development of a short, technology-heavy course which we are going to run online for the Sci-GaIA project winter school, starting on the 1st of April. The dilemma goes to the heart of why there is so much inertia in our methods - whether those be training, writing documentation, or actually designing new services and tools - whilst we, on the other hand, continually strive to stay at the cutting edge of e-Science. It's almost easy to stay up to date with the products out there - many of the great tools that we use daily are designed to be easy to use and adopt - but it's worth remembering that we are also in the game of developing tools and services which we want research communities to adopt.

The case in point is the Science Gateway framework: a framework for developing web portals that let researchers run their workflows, exposing user interfaces to their respective applications on top of a properly integrated, federated computing backend. But how appealing, how useful, how functional, how relevant is this approach really, in the real world? Are research groups really able to build their environments around the Science Gateway concept? It's always good to be skeptical about the greatness of your own ideas and the usefulness of your own tools, and I think the only way we can really be unbiased is by putting our philosophy into action and testing it in the crucible of independent adoption.
A lab in a school
I hope to be able to use this winter school as a testing ground for some ideas that I have about how we should be teaching research software engineers to use tools. What exactly we teach them to use - which specific tool is the right one for a particular job - is determined by the scope of the course; I'm referring rather to how we actually use the tools we're talking about.
I would like to focus on two things which I consider fundamental disruptions to the status quo: expressing everything - including the development environment - as code, and checking that code automatically and consistently. These are at the heart of the philosophy and practice of Continuous Integration, which will, for the first time at least in my experience, take centre stage in a course.
I touched briefly, and perhaps prematurely, on the benefits of adopting continuous integration tools during the CHAIN-REDS/RECAS Summer School in 2014. That was the last time we actually ran a training event on developing science gateways, and I hope a lot has changed in the intervening two years…
Collaboration and Code from the start
A fundamental difference between the "old" way of doing things and the "new" way is the expression of everything as code. The idea that the development environment can itself be expressed as code, and can be created by executing that code, reinforces the principle:
If something is worth doing, it’s worth keeping
"Keeping" in this case means putting your work into a change-controlled repository and working with it methodically, using the version and change control tools at your disposal. In my opinion, this means "Use Git, duh.", which has the corollary "Put your repo on GitHub". This automatically provides you with an environment conducive to collaboration. Who knows - chances are that nobody but you, the author, will ever look at - much less use! - the code that you write, but working from the start in a manner which makes future re-use likely is a very good insurance policy.
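As a minimal sketch of this "keep it" workflow (the file names and commit message are purely illustrative), starting a new portlet project looks something like:

```shell
set -e
# Work in a throwaway directory for this sketch (path is illustrative)
repo=$(mktemp -d)
cd "$repo"

# Start a change-controlled repository from the very first file
git init -q .
git config user.email "student@example.org"  # identity needed in a fresh environment
git config user.name  "Student"

# If something is worth doing, it's worth keeping: commit early
echo "# my-portlet" > README.md
git add README.md
git commit -q -m "Initial commit: project skeleton"

# From here, 'git remote add origin <your GitHub URL>' and 'git push'
# put the repository where collaborators can find it.
git log --oneline
```

The point is not the specific commands but the habit: the repository exists before any "real" work does, so everything worth doing ends up kept.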
Cruise Missiles for Miggies?
It may seem like a lot of overhead to get started - perhaps a whole day may need to be spent setting up the various tools necessary for working on a portlet. However, I reject this point of view. Getting to work on a project like this without an environment that is conducive to, and actively supports, future use is only going to create future headaches. It's better, in my opinion, to spend time on a supportive environment up front than to introduce Technical Debt at a later stage.
This is the first time I will be running a course entirely online, using the EdX open courseware platform. We will not be able to work side-by-side with the students and see what they are seeing, as was the case in the previous face-to-face schools. How are we going to stay "on the same page" as the students in this case? We could be on call over videoconference for the entire duration of the school, falling back to screen-sharing and real-time information exchange… Guess how that's going to go!
Or… we could provide, as part of the school, a tool which conducts the same kinds of checks that we human "teachers" would do - except automatically, consistently, and every time the student requests them. It would also allow both sets of humans (the teachers and the students) to look at the same code and resulting artifacts (or errors), even asynchronously. Asynchronicity is important here, since time is at a premium and needs to be dedicated when it's available - which is probably not going to be at the same moment for the student and the teacher.
Indeed, this is what we will do with Jenkins: a dedicated Jenkins instance will be used to run tests and compilation checks on the portlets developed by the students.
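As a sketch of what such an automated check might look like - the image name, build commands, and report path are assumptions for illustration, not the school's actual job configuration - a Jenkins Pipeline definition could read:

```groovy
// Hypothetical Jenkins Pipeline for a student portlet.
pipeline {
    agent {
        // Run the checks inside the same container image the students use
        docker { image 'sci-gaia/portlet-dev:latest' }
    }
    stages {
        stage('Compile') {
            steps { sh 'mvn -B compile' }
        }
        stage('Test') {
            steps { sh 'mvn -B test' }
        }
    }
    post {
        // Keep test reports so teachers and students can inspect
        // the same results asynchronously
        always { junit 'target/surefire-reports/*.xml' }
    }
}
```

Because the job definition itself is code, the checks are run identically on every request, with no teacher in the loop.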
Another piece of shiny which we'll be bringing to the table during the development of the course is Docker. Docker will be used to provide the students with preconfigured - and tested! - development environments, the mission previously fulfilled by virtual machine images. This in itself will not make a substantial difference to the students, I think. However, the AUFS filesystem and overlay capabilities of Docker will provide us with a means to make atomic changes to the environment which can themselves be expressed as code. We will also be using Docker to reproduce various stages of the course, and to compare student work with the reference material more easily. I think that using containers to express differences in the environment atomically, instead of providing the students with a perfectly configured, pre-prepared virtual machine, will not only increase their confidence (since they will have to do some of the work themselves, as tutorials, in getting to these states), but also make the lessons far more transparent and expose holes in their understanding of the various components of the framework. This will help not only them to learn, but also us to teach better.
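To illustrate the idea of atomic, code-expressed changes to the environment - where the base image, package names, and paths are hypothetical placeholders rather than the actual course images - each stage of the course can be a small layer on top of the previous one:

```dockerfile
# Hypothetical course image: names and versions are illustrative only.
# Each instruction adds one atomic layer via the overlay filesystem,
# so the environment at any stage of the course is reproducible.
FROM ubuntu:14.04

# Stage 1: the basic toolchain the students install in the first tutorial
RUN apt-get update && apt-get install -y openjdk-7-jdk maven git

# Stage 2: portlet development configuration
COPY settings.xml /root/.m2/settings.xml

# Stage 3: the material for this module of the course
COPY module-1/ /opt/course/module-1/
WORKDIR /opt/course/module-1
```

The difference between any two stages of the course is then just the difference between two short, reviewable instructions, rather than two opaque virtual machine images.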
The Jenkins instance will also be used to deploy the new portlets into a testing environment, so that any negative impact on a production environment can be estimated before they are eventually deployed. It's important that all of this extra "infrastructure" is itself reproducible, should we or someone else want to reproduce the course. For this reason, I've been working on the Ansible role for the Jenkins installation and configuration. Ansible is very capable when it comes to orchestrating services on new infrastructure, and we're going to use it to set up and maintain the server and build slaves which will be used by the student projects.
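A sketch of how such a playbook might be laid out - the host groups, role name, plugin list, and variables here are assumptions for illustration, not the actual Sci-GaIA configuration - could be:

```yaml
# Hypothetical playbook: hosts, role, and variables are illustrative.
- hosts: ci
  become: yes
  roles:
    - role: jenkins
      jenkins_http_port: 8080
      jenkins_plugins:
        - git        # pull student repositories from GitHub
        - junit      # publish test reports for asynchronous review

- hosts: build-slaves
  become: yes
  tasks:
    - name: Ensure Docker is present so builds run in the course containers
      package:
        name: docker.io
        state: present
```

Since the playbook is kept in version control like everything else, rebuilding the school's CI infrastructure from scratch is a single, repeatable command rather than a day of manual setup.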
In terms of running the tests, the idea is to run them in the same containers that have been provided for the school. After the initial material has been covered, and the practical fundamentals have been taught, the development/hacking phase can begin. Running continuous integration on the students' code, in an environment as similar as possible to the real-world portals where these portlets will be deployed, will allow bugs and errors to be caught early.
This will be a learning curve for all involved. Hopefully, we can stick to our philosophy of doing things right, even when the temptation is strongest to just get a dirty hack out of the way.
Discussion and critical thinking will be very important during the course of the school. Either way, it's going to be fun!