Sometimes, $DAYJOB can get kind of technical. For reasons I won’t go into here because NDA, the following axioms are true for this puzzle:
- we have to work in JRuby
- we are in a plugin within a larger framework providing a service
- we have to restart the entire service
- we don’t have a programmatic way to do so
- we don’t want to rely on external artifacts and cron
Now, this isn’t the initial set of axioms, you understand; it’s what we’re facing after a few weeks of trying everything else first.
So, obvious solution: system('/etc/init.d/ourService restart').
Except that JRuby doesn’t do system(). Or fork(), exec(), daemon(), or indeed any kind of process duplication I could find. Oh-kay, so we can write to a file, have a cronjob watch for the file, and have it restart the service and delete the file if it finds it. Except that, for Reasons (again, NDA), that’s not possible, because we can’t rely on having access to cron on all platforms.
Okay. Can we cheat?
Well, yes… allegedly. We can use the Foreign Function Interface to bind to libc and access the functions behind JRuby’s back.
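```ruby
require 'ffi'

module Exec
  extend FFI::Library
  # Tell FFI which library to search; attach_function needs this.
  ffi_lib FFI::Library::LIBC

  # Bound as my_exec so it doesn't collide with Kernel#exec.
  # execl(path, arg0, ..., NULL): the argument list must end in a NULL pointer.
  attach_function :my_exec, :execl, [:string, :string, :varargs], :int
  attach_function :fork, [], :int
end

vim1 = '/usr/bin/vim'
vim2 = 'vim'

# fork() returns 0 in the child and the child's pid in the parent.
if Exec.fork == 0
  # Varargs are passed as type/value pairs; this one is the terminating NULL.
  Exec.my_exec vim1, vim2, :pointer, nil
end

Process.waitall
```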
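For contrast, here is the same fork-then-exec pattern on an interpreter that does implement Process.fork (MRI on Linux, say), where no FFI is needed at all. It is a sketch that will not run under JRuby: fork_exec_wait is a made-up name, and the setsid() call and the 127 exit status on a failed exec are assumptions about what a detached restart helper would want, not part of the snippet above.

```ruby
# Fork-then-exec in plain Ruby, for an interpreter that implements
# Process.fork (e.g. MRI on Linux). JRuby does not, hence the FFI route.
# fork_exec_wait is a hypothetical helper name.
def fork_exec_wait(*argv)
  pid = Process.fork do
    begin
      # Assumption: start a new session so the child is detached from the
      # parent's process group, as a restart helper would likely want.
      Process.setsid
      # Replace the child's process image; on success this never returns.
      exec(*argv)
    rescue SystemCallError
      # setsid or exec failed (e.g. ENOENT); leave without at_exit handlers.
      exit!(127)
    end
  end
  # Parent: reap the child and report its exit status.
  _, status = Process.wait2(pid)
  status.exitstatus
end

puts fork_exec_wait('true')               # prints 0
puts fork_exec_wait('sh', '-c', 'exit 3') # prints 3
```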
Of course, I’m intending to kill the thing that fires this off, so a little more care is needed. For a start, it’s not vim I’m playing with. So…
```ruby
require 'ffi'

module LibC
  extend FFI::Library
  ffi_lib FFI::Library::LIBC

  # Timespec struct datatype
  class Timespec < FFI::Struct
    layout :tv_sec,  :time_t,
           :tv_nsec, :long
  end

  # stat struct datatype
  # (see /usr/include/sys/stat.h and /usr/include/bits/stat.h)
  # This field order is the x86_64 Linux one; other architectures differ.
  # The trailing __unused fields over-allocate slightly, which is harmless.
  class Stat < FFI::Struct
    layout :st_dev,       :dev_t,
           :st_ino,       :ino_t,
           :st_nlink,     :nlink_t,
           :st_mode,      :mode_t,
           :st_uid,       :uid_t,
           :st_gid,       :gid_t,
           :__pad0,       :int,
           :st_rdev,      :dev_t,
           :st_size,      :off_t,
           :st_blksize,   :long,
           :st_blocks,    :long,
           :st_atimespec, LibC::Timespec,
           :st_mtimespec, LibC::Timespec,
           :st_ctimespec, LibC::Timespec,
           :__unused0,    :long,
           :__unused1,    :long,
           :__unused2,    :long,
           :__unused3,    :long,
           :__unused4,    :long
  end

  # Filetype mask
  S_IFMT   = 0o170000

  # File types
  S_IFIFO  = 0o010000
  S_IFCHR  = 0o020000
  S_IFDIR  = 0o040000
  S_IFBLK  = 0o060000
  S_IFREG  = 0o100000
  S_IFLNK  = 0o120000
  S_IFSOCK = 0o140000

  attach_function :getpid, [], :pid_t
  attach_function :setsid, [], :pid_t
  attach_function :fork,   [], :int
  # execl(path, arg0, arg1, ..., NULL)
  attach_function :execl,  [:string, :string, :string, :varargs], :int
  attach_function :chdir,  [:string], :int
  attach_function :close,  [:int], :int
  # fstat() isn't exported by libc.so here, so bind glibc's
  # __fxstat(stat_version, fd, struct stat *) under the name fstat.
  attach_function :fstat, :__fxstat, [:int, :int, :pointer], :int
end
```
So that’s bound a bunch of libc functions for use in JRuby. But why __fxstat() instead of fstat()? Interesting detail: the stat() family of functions isn’t exported from the shared libc, at least not on most Linux platforms of the day (glibc 2.33 and later do export them directly). They live in a small static library (libc_nonshared.a on CentOS), and libc.so is normally a linker script that pulls that archive in transparently. Here we’re acting behind the linker’s back, so we don’t get that nicety, and we call the underlying __fxstat() (one of the __xstat() family that those static wrappers forward to) directly instead.
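For comparison, on an interpreter whose IO layer can wrap an inherited descriptor (MRI, for example), Ruby’s own IO#stat makes the fstat() call and none of this linker archaeology is needed. A minimal sketch under that assumption; socket_fd? is a hypothetical name, and it has not been tried under JRuby, whose IO objects are exactly what wouldn’t cooperate here.

```ruby
require 'socket'

# Where Ruby's IO layer can wrap an inherited descriptor (MRI, for one),
# IO#stat performs the fstat() call for us; no __fxstat binding needed.
# socket_fd? is a hypothetical helper name used only for this sketch.
def socket_fd?(fd)
  # Without a mode argument, IO.for_fd adopts the descriptor's own access
  # mode; autoclose: false stops the wrapper from closing fd when collected.
  io = IO.for_fd(fd, autoclose: false)
  io.stat.socket?
rescue SystemCallError, ArgumentError
  # e.g. Errno::EBADF when fd is not an open descriptor
  false
end

a, b = UNIXSocket.pair
r, w = IO.pipe
puts socket_fd?(a.fileno) # prints true
puts socket_fd?(r.fileno) # prints false
```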
I need to close some network sockets first, or the restart hangs: the child process inherits the sockets’ file descriptors, because someone didn’t set them to close on exec(). A small helper function is useful here:
```ruby
# Helper function to check if a file descriptor is a socket or not
def socket?(fd)
  # data structure to hold the struct stat data
  stat = LibC::Stat.new

  # JRuby's IO object types can't seem to get a grip on fds inherited from
  # another process correctly in a forked child process, so we have
  # to FFI out to libc. The leading 0 is __fxstat's stat-version argument.
  rc = LibC.fstat(0, fd, stat.pointer)
  if rc == -1
    errno = FFI::LastError.error # available for logging
    false
  else
    # Now we do some bit twiddling. In octal, no less.
    filetype = stat[:st_mode] & LibC::S_IFMT
    filetype == LibC::S_IFSOCK
  end
rescue => e
  false
end
```
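The bit twiddling in socket? can be checked in plain Ruby without any FFI, because File::Stat#mode exposes the raw st_mode, file-type bits included. The sketch below copies the octal constants from the LibC module as top-level constants (an illustration-only choice), and file_type is a hypothetical helper. It also shows why the helper masks with S_IFMT and compares for equality: S_IFSOCK (0o140000) is S_IFREG | S_IFDIR, so a naive `(mode & S_IFSOCK) != 0` test would also fire for regular files and directories.

```ruby
# The same octal values as in the LibC module (from <sys/stat.h> on Linux).
S_IFMT   = 0o170000 # mask selecting the file-type bits of st_mode
S_IFSOCK = 0o140000
S_IFLNK  = 0o120000
S_IFREG  = 0o100000
S_IFDIR  = 0o040000
S_IFIFO  = 0o010000

# Decode the file-type bits of an st_mode value (hypothetical helper).
def file_type(st_mode)
  case st_mode & S_IFMT
  when S_IFSOCK then :socket
  when S_IFLNK  then :symlink
  when S_IFREG  then :regular
  when S_IFDIR  then :directory
  when S_IFIFO  then :fifo
  else :other
  end
end

# A socket with rwxrwxrwx permissions: 0o140777 & 0o170000 == 0o140000
puts file_type(0o140777)            # prints socket
# File::Stat#mode returns the raw st_mode, type bits included
puts file_type(File.stat('/').mode) # prints directory
```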
And now the actual restart function itself:
```ruby
def restart
  pid = LibC.getpid

  rc = LibC.chdir('/')
  if rc == -1
    errno = FFI::LastError.error
    return errno
  end

  # Close any open network sockets so the restart doesn't hang.
  fds = Dir.entries("/proc/#{pid}/fd")
  fds.each do |fd|
    # Skip . and .., which we pick up because of the /proc approach to
    # getting the list of file descriptors. (Both map to 0 via to_i, so
    # this also skips fd 0, stdin.)
    next if fd.to_i.zero?

    # Skip any non-socket file descriptors, as they're not going to cause
    # us any issues and leaving them lets us log a little longer.
    next unless socket?(fd.to_i)

    # JRuby's IO objects can't get a handle on these fds for some reason,
    # possibly because we're in a child process, so we use libc's close().
    rc = LibC.close(fd.to_i)
    next if rc.zero?

    errno = FFI::LastError.error
    return errno
  end

  # We're now ready to fork and restart the service.
  rc = LibC.fork
  if rc == -1
    # If fork() failed, we're probably in a world of hurt.
    errno = FFI::LastError.error
    return errno
  elsif rc.zero?
    # We are now the child process. We can't hang about (thanks to JRuby's
    # un-thread-safe nature: fork() copies only the calling thread) so we
    # immediately swap out our process image with that of the service
    # restart script. If execl() succeeds there is no return; if it
    # returns at all, it failed.
    LibC.execl '/etc/init.d/ourService', 'ourService', 'restart', :pointer, nil
  end
rescue => e
  # Handle errors here (removed for clarity)
end
```
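The descriptor scan at the top of restart() can also be done without FFI on Linux, by letting stat() follow the /proc/self/fd symlinks. This is a sketch for MRI with a hypothetical helper name (open_socket_fds), and it has not been checked under JRuby. One deliberate difference: it filters the directory listing for numeric names instead of using fd.to_i.zero?, so descriptor 0 is examined rather than skipped along with . and ..

```ruby
require 'socket'

# Linux-only: every entry in /proc/self/fd is a symlink named after an open
# descriptor, and stat() follows it to the object behind it, so a socket
# reports S_IFSOCK. open_socket_fds is a hypothetical helper for this sketch.
def open_socket_fds
  Dir.children('/proc/self/fd')                 # excludes "." and ".."
     .select { |name| name.match?(/\A\d+\z/) } # descriptor numbers only
     .map(&:to_i)                               # keeps fd 0, unlike to_i.zero?
     .select { |fd| File.socket?("/proc/self/fd/#{fd}") } # false if vanished
     .sort
end

a, b = UNIXSocket.pair
puts open_socket_fds.include?(a.fileno) # prints true
```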
An interesting problem to solve, this one. And by “interesting” I mean “similar to learning how to pull teeth while only able to access the mouth via the nose”. But in case it’s of use to someone…