Issue
I'm working with the githooks pre-commit.sample that ships with my version of Git on OSX (git version 2.24.3 (Apple Git-128)). There is something peculiar in the code, namely a seemingly spurious exec.
The pre-commit sample contains the following code (irrelevant lines/blocks removed):
#!/bin/sh
against=HEAD
# Redirect output to stderr.
exec 1>&2
# If there are whitespace errors, print the offending file names and fail.
exec git diff-index --check --cached $against --
If I attempt to modify this code by appending validation after the last exec call, it never runs. Per a relevant AskUbuntu post, I understand what it is about exec that makes this happen.
However, what I don't understand is why the exec needs to happen in the first place. This line makes the hook fail if there's trailing whitespace, but it appears to behave the same if I remove the exec and just call git diff-index ... directly.
In other words, this:
git diff-index --check --cached $against --
...appears to behave like this:
exec git diff-index --check --cached $against --
...except the latter seems more restrictive. I can't find a difference between this file with or without the exec, except that the exec makes it so that the whitespace checking has to happen last.
Why would the sample creators choose the exec option, when it appears to behave the same as the ostensibly less restrictive direct call?
Solution
This could be a (perhaps misguided) attempt at being more efficient.
In general, in shell scripts, the return value from the script is that of the last command run, as Ronald noted in a comment. So:
#! /bin/sh
cmd1
cmd2
cmd3
exit $?
is just a long-winded / explicit way of doing:
#! /bin/sh
cmd1
cmd2
cmd3
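A quick way to see this, as a minimal sketch: true and false stand in for cmd1 through cmd3, and the variable names are only illustrative. Each sh -c runs a small script in a child shell so that its exit status can be inspected from outside.

```shell
# Script whose last command fails, with no explicit exit at the end.
implicit=0
sh -c 'true; true; false' || implicit=$?

# The same script with the long-winded "exit $?" appended.
explicit=0
sh -c 'true; true; false; exit $?' || explicit=$?

# Both child shells report the status of their last command.
echo "implicit=$implicit explicit=$explicit"
```

Both variables should end up holding the same non-zero status, since the trailing exit $? only restates what the shell does anyway.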
The general rule here is that the shell takes each "pipeline" (a series of commands joined by | symbols, each piping its output to the next) and runs that pipeline within a fork-then-exec of the main shell process. So:
cmd1 | cmd2
cmd3
causes the main shell to fork once to run cmd1 | cmd2 (which, internally, requires another fork for each of the two commands), then fork again to run cmd3. Then, having run out of commands, the shell would exit with $? (the last pipeline's exit status) as its own status.
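The "last pipeline's exit status" part can be checked with a small sketch. It assumes a plain POSIX sh without any pipefail-style option, in which a pipeline's status is that of its final command; true and false are stand-ins for real commands.

```shell
# The last command of the last pipeline fails: the script fails.
last_fails=0
sh -c 'true | false' || last_fails=$?

# An earlier command in the pipeline fails, but the last one succeeds:
# the script succeeds.
first_fails=0
sh -c 'false | true' || first_fails=$?

echo "last_fails=$last_fails first_fails=$first_fails"
```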
Adding redirections, such as:
cmd1 | cmd2 > file
"means" that the shell should fork, then run the pipeline cmd1 | cmd2 with its output redirected to that file. Of course cmd1's output is already redirected to cmd2's input, so only cmd2's output is affected here. A following command, such as cmd3 in the earlier example, would not have its output redirected, so clearly the redirection did not happen at the shell level, but rather within the sub-shell it forked to run the pipeline. [1]
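As a sketch of that scoping, the following uses echo and tr as stand-ins for cmd1, cmd2 and cmd3, and a temporary file created with mktemp (both choices are illustrative assumptions, not part of the original hook).

```shell
tmp=$(mktemp)

# Only the pipeline's last command writes to the file; the following
# echo still writes to the script's normal stdout, captured in $out.
out=$(sh -c 'echo one | tr a-z A-Z > "$1"; echo two' sh "$tmp")

file_contents=$(cat "$tmp")
rm -f "$tmp"

echo "stdout: $out"
echo "file:   $file_contents"
```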
What the exec keyword does is, in effect, prevent the fork. That is:
exec cmd > out
has the redirection take place in the top level shell, which then runs the given command with an exec system call without first calling fork. This replaces the shell with the command that is run: the command keeps the shell's process ID and all open file descriptors, and because the shell itself is gone, nothing that follows the exec line in the script ever runs.
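Both effects can be observed with a small sketch. The commands and variable names here are illustrative; the second part relies on $$ expanding to the process ID of the shell that evaluates it.

```shell
# Nothing after an exec'd command runs: the shell has been replaced.
out=$(sh -c 'exec echo replaced; echo "never printed"')

# The replacement process keeps the process ID: the outer shell prints
# its PID, then execs a new sh, which prints its own PID.
pids=$(sh -c 'echo $$; exec sh -c "echo \$\$"')
pid_before=$(printf '%s\n' "$pids" | sed -n 1p)
pid_after=$(printf '%s\n' "$pids" | sed -n 2p)

echo "output: $out"
echo "PID before exec: $pid_before, after exec: $pid_after"
```

This is the same mechanism that keeps validation appended after the hook's final exec line from ever running.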
If we leave out the command itself, we get:
exec >out
which means that no command gets run, but the redirection takes place in the shell itself rather than in some sub-shell. So now every subsequent command, which does get a fork-and-exec, has its output sent to the file out.
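A minimal sketch of this, assuming a temporary file from mktemp in place of out:

```shell
tmp=$(mktemp)

# exec with only a redirection changes the shell's own stdout, so
# every later command in that script writes to the file.
sh -c 'exec > "$1"; echo first; echo second' sh "$tmp"

contents=$(cat "$tmp")
rm -f "$tmp"

printf '%s\n' "$contents"
```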
We see something like that in your own script:
exec 1>&2
which forces all subsequent commands' stdout to go to the same file descriptor as stderr.
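To see where the output lands after exec 1>&2, one rough check is to capture the child shell's stdout and stderr separately. The message text and variable names are illustrative.

```shell
# Capture stdout only (stderr discarded): nothing arrives, because
# echo's stdout now points at stderr.
captured_stdout=$(sh -c 'exec 1>&2; echo "to stderr"' 2>/dev/null)

# Capture stderr only (stdout discarded): the message shows up here.
captured_stderr=$(sh -c 'exec 1>&2; echo "to stderr"' 2>&1 >/dev/null)

echo "stdout got: [$captured_stdout]"
echo "stderr got: [$captured_stderr]"
```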
Oddly, there's then only one subsequent command, which means that if the goal was efficiency, they could have used:
exec git diff-index --check --cached $against -- 1>&2
to put everything on a single line.
[1] In practice, shells actually do the file opening early, and have to do a whole lot of fancy footwork to shuffle file descriptors around between the fork and exec calls. With POSIX-style job control, it's even worse: the shell has to do a lot of signal-directing work, making process groups, and so on. Writing a shell is hard, and as the V8 Unix and Plan 9 guys saw it, this meant that the overall OS design needed some reworking.
Exit status in general
As you noted in a reply:
Hence, if I have validation after a non-execed command, I'd need to make sure check for a non-0 result from the git diff-index.
Yes. Note that shells in general (and /bin/sh in particular) have interesting flags that you can set from the command line or the #! line, or with the set command. One of these flags is the e flag, which makes the shell exit if a command has a non-zero exit code: [2]
#! /bin/sh -e
cmd1
cmd2
cmd3
is roughly equivalent to:
#! /bin/sh
cmd1 || exit
cmd2 || exit
cmd3
(we don't need the || exit on the last one, although we could use it harmlessly). The -e flag is often a good idea.
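The rough equivalence can be sketched by running both forms in child shells. Here echo and false stand in for cmd1 through cmd3, and the variable names are illustrative.

```shell
# With -e: the shell stops at the first failing command.
status_e=0
out_e=$(sh -e -c 'echo "check 1 ok"; false; echo "never reached"') || status_e=$?

# Without -e: the same effect spelled out with "|| exit".
status_explicit=0
out_explicit=$(sh -c 'echo "check 1 ok" || exit; false || exit; echo "never reached"') || status_explicit=$?

echo "-e:       status=$status_e output=[$out_e]"
echo "explicit: status=$status_explicit output=[$out_explicit]"
```

Applied to the hook in the question, this suggests one option: run git diff-index --check --cached $against -- without exec, either under -e or followed by || exit, so that a whitespace error still fails the commit while a clean check lets any later validation run.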
[2] Note that tested commands do not make the shell exit immediately, so that we can write:
if grep ...; then
    thing to run when regexp is found
else
    thing to run when regexp is not found
fi
There was a bug in some early versions of /bin/sh where this didn't work right: I remember fixing it, then discovering I'd either over-fixed or under-fixed it for cases like a && b || c and having to re-fix it.
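The exemption for tested commands can be checked with a short sketch. It uses false in place of the grep call, and the echoed strings are only illustrative. Under -e, a failing command in an if condition or on the left-hand side of || does not terminate the shell.

```shell
out=$(sh -e -c '
if false; then
    echo "then branch"
else
    echo "else branch"
fi
false || echo "recovered"
echo "still running"
')

printf '%s\n' "$out"
```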
Answered By - torek Answer Checked By - Mildred Charles (PHPFixing Admin)