Sunday, March 30, 2014

cron as a file system

I read The Styx Architecture for Distributed Systems over a decade ago. The central idea of the paper is that "representing a computing resource as a form of file system, [makes] many of the difficulties of making that resource available across the network disappear". By resource they mean any resource. For example, in 8½, the Plan9 window system, windows and even the mouse are implemented as files; in Inferno and Plan9 the interface to the TCP/IP network is presented as a file system hierarchy. The idea is elegant and practical and the paper is a must read. Unfortunately, for a very long time the only way to actually experiment with its ideas was to install Plan9 or Inferno.

Happily, things have changed over the last few years. Several former Bell Labs and/or Plan9 folk implemented 9p as a Linux kernel module, v9fs, and got it integrated into the 3.x Linux kernels. As a result 9p is available out of the box in modern Linux distros - at least it's there on Ubuntu 12.04 and later and the latest Fedora releases. Having it in the kernel is fine, but to actually make use of it you need user space libraries. Intrepid developers have been busy implementing the necessary libraries in languages from C to Haskell. My personal favorite is go9p. In what follows I'll use it to implement a simple cron service as a 9p file system.

When designing a system service or application using 9p we begin by designing a suitable name space. Often there are two parts to such a name space, the static part created when the application or service starts and the dynamic part that gets filled in as users interact with the system. In our cron service the static part of the name space will consist of a clone file that is used to create new jobs and a jobs directory that individual jobs will live under:

.../clone
.../jobs/

To create a new job the user opens the clone file and writes a job definition string to it. A job definition has the form:

<job name>':'<cron expr>':'<command>

so a "hello world" job definition would be:

hello:0 0/1 * * * ? *:echo hello world

If we write that job definition to the clone file it will create a job named 'hello' that prints "hello world" every minute.  How is the new 'hello' job represented in the name space? Like so:

.../clone
.../jobs/
         hello/
               ctl
               log
               cmd
               schedule

Jobs are the dynamic part of the name space. Each one is represented by a directory corresponding to the name of the job. Under each job directory is a collection of four files that allow users to control and monitor the job.
  • writing the text string 'start', upper or lower case, to the ctl file (e.g., echo start > ctl) starts the job, and writing 'stop' to the ctl file stops it
  • reading from the log file yields the results of the last N executions of the job, where N is configurable at job daemon start up
  • reading the cmd file returns the command associated with the job, in case you forgot
  • finally, reading the schedule file returns the cron expression that determines the execution schedule of the job and, if the job is started, the next time it will execute
So that's the design of the name space of our cron service. I've also specified the behavior of the system in terms of the results of reading and writing the various files in the name space. Note that none of the files or directories in the name space actually exist on disk; they are synthetic files, not unlike the files in the /proc file system of a Linux system. The name space is presented to the system by a daemon, jobd, that responds appropriately to the various messages of the 9p protocol. The jobd daemon listens on a TCP/IP port; while applications that understand 9p could connect directly to jobd, it's much simpler to just mount it and use the standard Linux file system interface to interact with it.

We'll look at the actual code in a minute but for now assume that we've built the jobd binary and started it up either from the command line or via a service manager like systemd or supervisord. Then we can mount it via the mount command:

$ mount -t 9p -o trans=tcp,port=5640 192.168.0.42 /mnt/jobs.d

and we can create and manage jobs using standard file system calls from any programming language, or straight from the shell:


$ ls /mnt/jobs.d
clone jobs
$ echo -n 'hello:0 0/1 * * * ? *:echo hello world' > /mnt/jobs.d/clone
$ ls /mnt/jobs.d/jobs
hello
$ cd /mnt/jobs.d/jobs/hello
$ echo -n start > ctl
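
And since it's just a file system, the same thing works from any language. Here's a minimal Go sketch that drives the mounted jobd name space with nothing but standard library calls, using the mount point and job definition from the session above:

package main

import (
    "fmt"
    "io/ioutil"
    "log"
    "os"
    "path/filepath"
)

const mnt = "/mnt/jobs.d" // wherever jobd was mounted

// writeTo opens an existing synthetic file and writes s to it.
func writeTo(path, s string) error {
    f, err := os.OpenFile(path, os.O_WRONLY, 0)
    if err != nil {
        return err
    }
    defer f.Close()
    _, err = f.WriteString(s)
    return err
}

func main() {
    // Create the job by writing its definition to the clone file.
    if err := writeTo(filepath.Join(mnt, "clone"), "hello:0 0/1 * * * ? *:echo hello world"); err != nil {
        log.Fatal(err)
    }

    // Start it by writing 'start' to its ctl file.
    if err := writeTo(filepath.Join(mnt, "jobs", "hello", "ctl"), "start"); err != nil {
        log.Fatal(err)
    }

    // Read back its schedule and the results of recent runs.
    for _, name := range []string{"schedule", "log"} {
        b, err := ioutil.ReadFile(filepath.Join(mnt, "jobs", "hello", name))
        if err != nil {
            log.Fatal(err)
        }
        fmt.Printf("%s: %s\n", name, b)
    }
}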

The jobd source lives in the wkharold/jobd GitHub repo. It's relatively small and, once you have a basic grasp of the go9p machinery, fairly simple. One quick meta comment on the repo: there are a couple of approaches to dealing with package dependencies in Go: vendoring and, for lack of a more concise term, what I'll call tool-based version control. I've chosen the former because the packages jobd depends on (go9p, Rob Pike's glog, and Raymond Hill's cronexpr) are stable, and I find the vendoring approach simpler to understand and manage - your mileage may vary.

Jobd consists of three components:
  1. the network server
  2. the clone file that creates jobs
  3. the per job collection of files that control and provide information about jobs
These components are all part of jobd's main package which is composed of four source files: jobd.go, clone.go, jobs.go, and job.go.


Let's look at the network server first. The points of interest here are the creation of the static portion of the jobd name space and the firing up of the network listener. The gist below shows the mkjobfs function which creates the jobd clone file and jobs directory.
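
Here's a simplified sketch of that code - error handling is trimmed, and the go9p import paths, the permission bits, and the jobsroot variable are my assumptions rather than the repo's exact code:

// jobd.go (sketch); assumes the go9p packages are imported as
//   p   "code.google.com/p/go9p/p"     (import paths may differ in the vendored repo)
//   srv "code.google.com/p/go9p/p/srv"
// plus the standard "os" package.

type jobsdir struct {
    srv.File // the jobs directory; job subtrees are added beneath it
}

var jobsroot *jobsdir // assumed package-level handle to the jobs directory

// mkjobfs builds the static part of the name space: /, /clone, and /jobs.
func mkjobfs() (*srv.File, error) {
    user := p.OsUsers.Uid2User(os.Geteuid())

    root := new(srv.File)
    if err := root.Add(nil, "/", user, nil, p.DMDIR|0555, nil); err != nil {
        return nil, err
    }
    if err := mkJobsDir(root); err != nil {
        return nil, err
    }
    if err := mkCloneFile(root); err != nil {
        return nil, err
    }
    return root, nil
}

// mkJobsDir instantiates a jobsdir and makes it a child of dir.
func mkJobsDir(dir *srv.File) error {
    jd := new(jobsdir)
    if err := jd.Add(dir, "jobs", p.OsUsers.Uid2User(os.Geteuid()), nil, p.DMDIR|0775, jd); err != nil {
        return err
    }
    jobsroot = jd // remembered so clone writes can hang new job subtrees off of it
    return nil
}

// mkCloneFile instantiates a clonefile (defined in clone.go) and makes it a
// child of dir; note that its mode has no p.DMDIR bits - it's a plain file.
func mkCloneFile(dir *srv.File) error {
    cf := new(clonefile)
    return cf.Add(dir, "clone", p.OsUsers.Uid2User(os.Geteuid()), nil, 0666, cf)
}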

The go9p srv package exposes a File type. As might be expected this is one of the key components of a 9p based system. At the root of the jobd name space is a directory named / which mkjobfs creates first. Once the root is created the jobs directory is added via mkJobsDir: a jobsdir struct is instantiated and then made a child of the jobd root, which was passed in as dir. Similarly, mkCloneFile instantiates a clonefile struct and makes it a child of the jobd root. Note, however, that since it's just a regular file the p.DMDIR bits aren't a part of its permissions mask (the fifth parameter of the Add invocation).

Starting up the network listener is pretty simple: jobd instantiates a struct that holds all the fields necessary for handling the 9p protocol, initializes it and starts up the goroutines it uses, and then starts listening for incoming connections.
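
In sketch form, using go9p's Fsrv file server (the listen address is hard coded here for brevity; assumes the standard "log" package):

// jobd.go (sketch): root is the *srv.File returned by mkjobfs.
s := srv.NewFileSrv(root) // the struct holding the 9p protocol state
s.Start(s)                // initialize it and start its goroutines
if err := s.StartNetListener("tcp", ":5640"); err != nil {
    log.Fatal(err)
}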

OK, on to the clone file. To add behavior to a file in a 9p file system you implement one or more of the standard file operations: Create, Open, Read, Write, Remove, etc. In the case of the clone file the only supported operation is Write - writing a job definition string to the clone file creates the corresponding job subtree in the jobs directory. Here's the pertinent code.
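
Roughly, with error handling trimmed - jobdef, parseJobDef, and mkJob are stand-ins for whatever the repo actually names these pieces:

// clone.go (sketch); assumes "fmt", "strings", and the go9p srv package.
type clonefile struct {
    srv.File
}

// jobdef is the parsed form of '<job name>':'<cron expr>':'<command>'.
type jobdef struct {
    name     string
    schedule string
    cmd      string
}

func parseJobDef(s string) (jobdef, error) {
    parts := strings.SplitN(s, ":", 3)
    if len(parts) != 3 {
        return jobdef{}, fmt.Errorf("malformed job definition: %q", s)
    }
    return jobdef{name: parts[0], schedule: parts[1], cmd: parts[2]}, nil
}

// Write is invoked when a job definition string is written to the clone file.
func (cf *clonefile) Write(fid *srv.FFid, data []byte, offset uint64) (int, error) {
    def, err := parseJobDef(strings.TrimSpace(string(data)))
    if err != nil {
        return 0, err
    }
    // mkJob (see jobs.go below) builds the jobs/<name>/{ctl,cmd,log,schedule} subtree.
    if err := mkJob(def); err != nil {
        return 0, err
    }
    return len(data), nil
}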

A Write method is defined for the clonefile type. The mkCloneFile function mentioned above instantiated a clonefile and made it a child of the jobd root, so when the jobd file system is mounted by the OS, write operations on the clone file end up invoking our Write method. It receives the job definition string in the data parameter as a slice of bytes; those bytes get turned into a jobdef, which is then used to create the job subtree in the jobs directory.

Finally, let's look at the creation of the job subtree. A job is represented by a directory containing four files: ctl, cmd, log, schedule. The following gist contains the code used to create those four files.
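
In sketch form - the control and lastResults helpers and the exact field layouts are stand-ins of mine, not the repo's actual code:

// jobs.go (sketch); jobsroot is the jobsdir created by mkJobsDir above.
type jobfile struct {
    srv.File
    reader func() []byte           // supplies the file's contents on Read
    writer func(data []byte) error // handles data written to the file
}

type job struct {
    srv.File
    def jobdef
}

// mkJob builds the jobs/<name> subtree for a newly defined job.
func mkJob(def jobdef) error {
    user := p.OsUsers.Uid2User(os.Geteuid())

    j := &job{def: def}
    if err := j.Add(&jobsroot.File, def.name, user, nil, p.DMDIR|0775, j); err != nil {
        return err
    }

    // ctl: writing 'start' or 'stop' controls the job (control is a hypothetical helper).
    ctl := &jobfile{writer: func(data []byte) error { return j.control(string(data)) }}
    if err := ctl.Add(&j.File, "ctl", user, nil, 0664, ctl); err != nil {
        return err
    }

    // schedule: reading returns the job's cron expression.
    sched := &jobfile{reader: func() []byte { return []byte(def.schedule) }}
    if err := sched.Add(&j.File, "schedule", user, nil, 0444, sched); err != nil {
        return err
    }

    // cmd: reading returns the command associated with the job.
    cmd := &jobfile{reader: func() []byte { return []byte(def.cmd) }}
    if err := cmd.Add(&j.File, "cmd", user, nil, 0444, cmd); err != nil {
        return err
    }

    // log: reading returns the results of recent runs (lastResults is a hypothetical helper).
    logf := &jobfile{reader: func() []byte { return j.lastResults() }}
    return logf.Add(&j.File, "log", user, nil, 0444, logf)
}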

Each of the job files is an instance of the jobfile type. A jobfile embeds the go9p srv.File type and adds reader and writer fields which are used by the jobfile.Read and jobfile.Write methods. The directory that holds them all is an instance of the job type. The calls that Add the ctl, schedule, cmd, and log files to the job directory should be familiar by now. The Read/Write behavior of each file is defined by the function literals assigned to each file's reader/writer field.

To see how the jobfile reader and writer fields are used, check out the code from job.go below.
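
A minimal sketch of those two methods (assumes the "errors" package):

// job.go (sketch)
func (jf *jobfile) Read(fid *srv.FFid, buf []byte, offset uint64) (int, error) {
    if jf.reader == nil {
        return 0, errors.New("permission denied")
    }
    content := jf.reader()
    if offset >= uint64(len(content)) {
        return 0, nil // EOF
    }
    return copy(buf, content[offset:]), nil
}

func (jf *jobfile) Write(fid *srv.FFid, data []byte, offset uint64) (int, error) {
    if jf.writer == nil {
        return 0, errors.New("permission denied")
    }
    if err := jf.writer(data); err != nil {
        return 0, err
    }
    return len(data), nil
}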

The Read and Write methods contain the boilerplate and the reader/writer fields supply the file-specific behavior.

Oh, and last but not least there's the function which actually runs the jobs.
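
In sketch form, with the expr, done, and logResult names standing in for whatever the repo actually uses (assumes "time", "os/exec", and Raymond Hill's cronexpr package):

// job.go (sketch): the loop behind 'start'. j.expr is the job's parsed
// cronexpr.Expression and j.done is closed when 'stop' is written to ctl.
func (j *job) run() {
    for {
        next := j.expr.Next(time.Now())
        timer := time.NewTimer(next.Sub(time.Now()))
        select {
        case <-j.done: // the job was stopped
            timer.Stop()
            return
        case <-timer.C:
            out, err := exec.Command("/bin/sh", "-c", j.def.cmd).CombinedOutput()
            j.logResult(out, err) // record the result for the log file
        }
    }
}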

It's a pretty straightforward Go timer loop. You might be wondering if all the code I've wrapped around it is worth the bother. Obviously I think it is: once a machine has mounted the jobd file system, anything that can read and write files can schedule jobs on the jobd host.

1 comment:

  1. Possibly interesting: this blog article claims that *not* representing something as a filesystem makes things easier, at least in the case of distributed locking:

    http://hackingdistributed.com/2013/12/26/introducing-replicant/
