Two shebang papercuts
2019 Nov 08
You’ve written a Ruby script on your Mac. The script demands great
performance, so you add --enable=jit to its shebang:
#!/usr/bin/env ruby --enable=jit
print "Hello, world!"
You test on your Mac and the program greets everybody in record time. Ship it!
Some time later, a user reports a problem. When they run the script, it seems rather slow; in fact, they’ve waited for hours and no one has been greeted.
The script still works (and it’s fast!) on your machine, so you ask the user
about their environment. One difference stands out: they’re using Linux. So
you spin up a Linux box and try the script. Sure enough, it seems stuck. You
reach for strace, and see the same spew looping ad nauseam:
you@yourbox:~$ sudo strace ./hello
execve("./hello", ["./hello"], 0x7ffe721b38b0 /* 52 vars */) = 0
brk(NULL) = 0x55b53fddf000
arch_prctl(0x3001 /* ARCH_??? */, 0x7fff6a434040) = -1 EINVAL (Invalid argument)
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=211342, ...}) = 0
mmap(NULL, 211342, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f4088423000
close(3) = 0
openat(AT_FDCWD, "/usr/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0`r\2\0\0\0\0\0"..., 832) = 832
lseek(3, 64, SEEK_SET) = 64
read(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 784) = 784
lseek(3, 848, SEEK_SET) = 848
read(3, "\4\0\0\0\20\0\0\0\5\0\0\0GNU\0\2\0\0\300\4\0\0\0\3\0\0\0\0\0\0\0", 32) = 32
lseek(3, 880, SEEK_SET) = 880
read(3, "\4\0\0\0\24\0\0\0\3\0\0\0GNU\0003\321\363P\3617(e\35t\335*V\272\321\344"..., 68) = 68
fstat(3, {st_mode=S_IFREG|0755, st_size=2149496, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4088421000
lseek(3, 64, SEEK_SET) = 64
read(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 784) = 784
lseek(3, 848, SEEK_SET) = 848
read(3, "\4\0\0\0\20\0\0\0\5\0\0\0GNU\0\2\0\0\300\4\0\0\0\3\0\0\0\0\0\0\0", 32) = 32
lseek(3, 880, SEEK_SET) = 880
read(3, "\4\0\0\0\24\0\0\0\3\0\0\0GNU\0003\321\363P\3617(e\35t\335*V\272\321\344"..., 68) = 68
mmap(NULL, 1860536, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f408825a000
mprotect(0x7f408827f000, 1671168, PROT_NONE) = 0
mmap(0x7f408827f000, 1363968, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x25000) = 0x7f408827f000
mmap(0x7f40883cc000, 303104, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x172000) = 0x7f40883cc000
mmap(0x7f4088417000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1bc000) = 0x7f4088417000
mmap(0x7f408841d000, 13240, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f408841d000
close(3) = 0
arch_prctl(ARCH_SET_FS, 0x7f4088422580) = 0
mprotect(0x7f4088417000, 12288, PROT_READ) = 0
mprotect(0x55b53f654000, 4096, PROT_READ) = 0
mprotect(0x7f4088481000, 4096, PROT_READ) = 0
munmap(0x7f4088423000, 211342) = 0
brk(NULL) = 0x55b53fddf000
brk(0x55b53fe00000) = 0x55b53fe00000
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=3035216, ...}) = 0
mmap(NULL, 3035216, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f4087f74000
close(3) = 0
execve("./hello", ["./hello"], 0x7ffe721b38b0 /* 52 vars */) = 0
brk(NULL) = 0x55b53fddf000
arch_prctl(0x3001 /* ARCH_??? */, 0x7fff6a434040) = -1 EINVAL (Invalid argument)
...
What’s going on? The script is just exec-ing itself in a seemingly infinite
loop, and not really doing anything else. Why?
It turns out this comes from combining two papercuts: one that applies to
shebangs in general, and another that applies to any shebang that invokes
/usr/bin/env.
the multiple argument shebang papercut
You can think of the shebang #!/usr/bin/env ruby --enable=jit as having two parts:
- The “interpreter name”, /usr/bin/env, which is a path to the script’s interpreter
- The remainder, ruby --enable=jit
Among other differences in shebangs, Linux and macOS treat the remainder
differently. Linux parses the remainder into exactly one argument. In our
./hello example, Linux parses the shebang’s remainder to "ruby --enable=jit",
combines it with the user’s arguments {"./hello"}, and executes
"/usr/bin/env" with arguments {"/usr/bin/env", "ruby --enable=jit", "./hello"}.
macOS splits the remainder into several arguments if there are spaces. For
./hello, macOS parses the remainder to {"ruby", "--enable=jit"} and executes
"/usr/bin/env" with arguments {"env", "ruby", "--enable=jit", "./hello"}.
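To make the difference concrete, here is a minimal Ruby sketch of the two splitting rules described above. The helper name shebang_argv and its os: keyword are made up for illustration; real kernels apply further rules (length limits, for one) that this does not model.

```ruby
# Simplified model of how each kernel builds argv from a shebang line.
# shebang_argv is a hypothetical helper, not a real API.
def shebang_argv(shebang_line, script_path, os:)
  # Drop the leading "#!" and any surrounding whitespace.
  body = shebang_line.sub(/\A#!/, "").strip
  # The interpreter is everything up to the first run of whitespace.
  interpreter, remainder = body.split(/\s+/, 2)

  case os
  when :linux
    # Linux passes the whole remainder as one argument (or none if it is empty).
    [interpreter, remainder, script_path].compact
  when :macos
    # macOS splits the remainder on whitespace. The argv[0] shown in the
    # text above is "env", so the basename is used here.
    [File.basename(interpreter), *remainder.to_s.split, script_path]
  else
    raise ArgumentError, "unknown os: #{os.inspect}"
  end
end

line = "#!/usr/bin/env ruby --enable=jit"
p shebang_argv(line, "./hello", os: :linux)
p shebang_argv(line, "./hello", os: :macos)
```

The Linux result keeps "ruby --enable=jit" as a single string, which is the detail the rest of this post hinges on.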
It seems like the script would fail on Linux with this shebang, because it
asks env to run a program literally named "ruby --enable=jit" rather than to
find and run ruby. But instead of reporting something like
program not found: ruby --enable=jit, it hangs in that loop. Something
else is afoot.
the #!/usr/bin/env shebang papercut
The first papercut doesn’t fully explain this loopy behavior. Another
paper-sharp shebang, which also tries to pass two arguments to /usr/bin/env,
fails differently on Linux:
#!/usr/bin/env ruby --verbose
puts "Goodbye, world"
you@yourbox:~$ ./goodbye
/usr/bin/env: ‘ruby --verbose’: No such file or directory
The difference is in the = character. Besides finding the first interpreter
on a user’s PATH, env has another use: to “run a program in a modified
environment”. One way it does this is to take arguments of the form
VARIABLE=VALUE before the program and its arguments, and set those
environment variables in the executed program’s process. For instance:
you@yourbox:~$ env DISPLAY=:0 xeyes -fg dodgerblue
executes {"xeyes", "-fg", "dodgerblue"} with the shell’s environment, plus
the environment variable "DISPLAY" set to ":0".
Combining these papercuts, when Linux comes to execute ./hello:
1. An exec-family function reads "./hello"’s contents and parses the shebang into {"/usr/bin/env", "ruby --enable=jit"}.
2. The parsed shebang is combined with the argv passed to exec*() to ultimately execute {"/usr/bin/env", "ruby --enable=jit", "./hello"}.
3. env parses its arguments, sets the environment variable "ruby --enable" to the value "jit", and executes the program "./hello".
4. GOTO 1
./goodbye fails differently because env interprets the argument ruby --verbose
as the program to run, since there’s no = character in it.
a solution: -S
Fortunately, modern versions of env have a solution. macOS and FreeBSD
support the -S option, which, to quote the macOS manual, does the following:
-S string
Split apart the given string into multiple strings, and process each of the
resulting strings as separate arguments to the env utility. The -S option
recognizes some special character escape sequences and also supports
environment-variable substitution, as described below.
On Linux, GNU’s coreutils added the same flag to env in version 8.30,
released July 2018.
So you can write a fancy, optionful shebang like:
#!/usr/bin/env -S ruby --enable=jit --verbose
puts "Hello, world!"
And ./hello will execute what you wanted.
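As a rough illustration of why this works on Linux, the sketch below models the effect of -S on the argument list env receives. It is an approximation only: expand_dash_s is a hypothetical helper, and Ruby’s Shellwords.split stands in for env’s own splitting rules, which have their own escape sequences and variable substitution as the manual excerpt notes.

```ruby
require "shellwords"

# Approximate model of what -S does to env's argument list.
# expand_dash_s is a hypothetical helper; Shellwords.split is only a stand-in
# for env's real splitting rules.
def expand_dash_s(argv)
  program, first, *rest = argv
  return argv unless first&.start_with?("-S")
  # Strip the "-S" prefix, split what follows, and splice the pieces in place.
  [program, *Shellwords.split(first.delete_prefix("-S")), *rest]
end

# On Linux the shebang's remainder still arrives as one argument:
argv = ["/usr/bin/env", "-S ruby --enable=jit --verbose", "./hello"]
p expand_dash_s(argv)
```

After the split, the first argument env sees is ruby, which contains no = character, so it is taken as the program to run and --enable=jit is passed through to it untouched.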
references
Andries E. Brouwer posted a good writeup of shebang behavior on many unix-ish platforms.