Issue

I'm working with the githooks pre-commit.sample that ships with my version on OSX (git version 2.24.3 (Apple Git-128)). There is something peculiar in the code, namely to do with a seemingly spurious exec.

The pre-commit sample contains the following code (irrelevant lines/blocks removed):

#!/bin/sh

against=HEAD

# Redirect output to stderr.
exec 1>&2

# If there are whitespace errors, print the offending file names and fail.
exec git diff-index --check --cached $against --

If I attempt to modify this code by appending validation after the last exec call, it never runs. Per a relevant AskUbuntu post, I understand what it is about exec that makes this happen.

However, what I don't understand is why the exec needs to happen in the first place. This line has the hook fail if there's trailing whitespace, but it appears to behave the same if I remove the exec and just directly call git diff-index ....

In other words, this:

git diff-index --check --cached $against --

...appears to behave like this:

exec git diff-index --check --cached $against --

...except the latter seems more restrictive. I can't find a difference between this file with or without the exec, except that the exec makes it so that the whitespace checking has to happen last.

Why would the sample creators choose the exec option, when it appears to behave the same as the ostensibly less restrictive direct call?

Solution

This could be a (perhaps misguided) attempt at being more efficient.

In general, in shell scripts, the return value from the script is that of the last command run, as Ronald noted in a comment. So:

#! /bin/sh
cmd1
cmd2
cmd3
exit $?

is just a long-winded / explicit way of doing:

#! /bin/sh
cmd1
cmd2
cmd3
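A quick way to check this, as a minimal sketch: run the same two-command script in a child sh with and without the explicit exit, and compare what the parent sees. The variable names are only for illustration.

```shell
# Script whose last command fails; no explicit exit.
implicit_status=0
sh -c 'true; false' || implicit_status=$?

# Same script with the long-winded explicit exit.
explicit_status=0
sh -c 'true; false; exit $?' || explicit_status=$?

# Both child shells report the status of `false`, their last command.
echo "implicit=$implicit_status explicit=$explicit_status"
```

Both statuses should be the same non-zero value.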

The general rule here is that the shell takes each "pipeline"—a pipeline being defined as a series of commands with | symbols, which pipe to each other—and runs that pipeline within a fork-then-exec of the main shell process. So:

cmd1 | cmd2
cmd3

causes the main shell to fork once to run cmd1 | cmd2 (which, internally, requires another fork for each of the two commands), then fork again to run cmd3. Then, having run out of commands, the shell would exit with $?—the last pipeline's exit status—as its own status.

Adding redirections, such as:

cmd1 | cmd2 > file

"means" that the shell should fork, then run the pipeline cmd1 | cmd2 with its output redirected to that file. Of course cmd1's output is already piped to cmd2's input, so only cmd2's output is affected here. A later command, such as cmd3 in the earlier example, would not have its output redirected, so clearly the redirection did not happen at the shell level, but rather within the sub-shell it forked to run the pipeline.¹

What the exec keyword does is, in effect, prevent the fork. That is:

exec cmd > out

has the redirection take place in the top-level shell, which then runs the given command with an exec system call without first calling fork. The command replaces the shell, taking over its process ID and all open file descriptors, so once it finishes there is no shell left to run anything that came after the exec line.
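The replacement can be observed from the outside. In this small sketch (the variable names are made up for the demonstration), a child sh prints its process ID, execs a second sh that prints its own, and then has one more echo that should never run:

```shell
# The child sh prints its PID, then execs a second sh that prints its own.
# The final echo is never reached: the shell that would run it is gone.
out=$(sh -c 'echo "$$"; exec sh -c "echo \$\$"; echo "never printed"')

first_pid=$(printf '%s\n' "$out" | sed -n 1p)
second_pid=$(printf '%s\n' "$out" | sed -n 2p)
third_line=$(printf '%s\n' "$out" | sed -n 3p)

# Expect the same PID twice and nothing after it.
printf '%s\n' "$out"
```

The two printed process IDs should match, which is the "no fork" part, and the missing third line is why validation appended after an exec in the hook never runs.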

If we leave out the command itself, we get:

exec >out

which means that no command gets run, but the redirection takes place in the shell itself rather than in some sub-shell. So now every subsequent command, which does get a fork-and-exec, has its output sent to file out.
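As a small sketch of that behavior, the following runs the bare exec inside a parenthesized sub-shell so that the calling shell's own stdout is left alone. The scratch file from mktemp is a stand-in for out:

```shell
# Hypothetical scratch file standing in for "out".
tmp=$(mktemp)

(
    exec >"$tmp"     # no command given: only this sub-shell's stdout moves
    echo "first"     # both echoes now write to the file
    echo "second"
)

contents=$(cat "$tmp")
rm -f "$tmp"
printf '%s\n' "$contents"
```

Neither echo carries its own redirection, yet both lines end up in the file.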

We see something like that in your own script:

exec 1>&2

which forces all subsequent commands' stdout to go to the same file descriptor as stderr.
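One way to convince yourself, sketched with throwaway variable names: capture only the stdout of a child shell, discard its stderr, and compare the result with and without the exec 1>&2 line.

```shell
# Capture only stdout of each child shell; stderr is thrown away.
without_exec=$(sh -c 'echo "message"' 2>/dev/null)
with_exec=$(sh -c 'exec 1>&2; echo "message"' 2>/dev/null)

echo "without exec 1>&2: [$without_exec]"
echo "with exec 1>&2:    [$with_exec]"
```

In the second case nothing is captured, because the echo's output has already been sent to stderr.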

Oddly, there's then only one subsequent command, which means that if the goal was efficiency, they could have used:

exec git diff-index --check --cached $against -- 1>&2

to put everything on a single line.
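Going the other way, if the aim is to add validation after the whitespace check, as in the question, the exec has to go and the status has to be checked by hand. This is only a sketch: extra_validation is a hypothetical placeholder, and the body is wrapped in a function (pre_commit, also a made-up name) so it can be tried outside a repository. In a real hook the same lines would sit at the top level and use exit rather than return.

```shell
# Hypothetical placeholder for whatever additional check the hook should run.
extra_validation() {
    return 0
}

# Hook body as a function; in a real hook, use exit instead of return.
pre_commit() {
    against=HEAD
    # No exec here: the shell survives, so stop explicitly if the check fails.
    git diff-index --check --cached "$against" -- 1>&2 || return
    # Only reached when the whitespace check passed.
    extra_validation 1>&2
}
```

The function's status is that of the whitespace check if it fails, otherwise that of the extra validation.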

¹In practice, shells actually do the file opening early, and have to do a whole lot of fancy footwork to shuffle file descriptors around between the fork and exec calls. With POSIX style job control, it's even worse: the shell has to do a lot of signal-directing work, making process groups, and so on. Writing a shell is hard, and as the V8 Unix and Plan 9 guys saw it, this meant that the overall OS design needed some reworking.

Exit status in general

As you noted in a reply:

Hence, if I have validation after a non-execed command, I'd need to make sure to check for a non-0 result from the git diff-index.

Yes. Note that shells in general (and /bin/sh in particular) have interesting flags that you can set from the command or #! line, or with the set command. One of these flags is the e flag, which makes the shell exit if a command has a non-zero exit code:²

#! /bin/sh -e
cmd1
cmd2
cmd3

is roughly equivalent to:

#! /bin/sh
cmd1 || exit
cmd2 || exit
cmd3

(we don't need the || exit on the last one, although we could use it harmlessly). The -e flag is often a good idea.
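A minimal sketch of the difference, using false as a stand-in for a failing command and passing the flag on the command line rather than the #! line (the variable names are illustrative):

```shell
# Without -e: the failing command is ignored and the script carries on.
plain_status=0
plain_out=$(sh -c 'false; echo "still running"') || plain_status=$?

# With -e: the shell exits at the failing command, so echo never runs.
strict_status=0
strict_out=$(sh -e -c 'false; echo "still running"') || strict_status=$?

echo "plain:  status=$plain_status output=[$plain_out]"
echo "strict: status=$strict_status output=[$strict_out]"
```

The plain run finishes with status 0 and prints its message, while the -e run stops early with a non-zero status and prints nothing.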

²Note that tested commands do not make the shell exit immediately, so that we can write:

if grep ...; then
    thing to run when regexp is found
else
    thing to run when regexp is not found
fi

There was a bug in some early versions of /bin/sh where this didn't work right: I remember fixing it, then discovering I'd either over-fixed or under-fixed it for cases like a && b || c and having to re-fix it.
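A small sketch of the tested-command rule on a current shell, with false standing in for a command that fails. Under -e, neither the if condition nor the left-hand side of the a && b || c chain should end the script:

```shell
status=0
out=$(sh -e -c '
    if false; then
        echo "condition true"
    else
        echo "condition false"
    fi
    false && echo "b ran" || echo "c ran"
    echo "reached the end"
') || status=$?

printf '%s\n' "$out"
echo "status=$status"
```

The script should reach its last line and exit with status 0, having taken the else branch and then the c branch.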

Answered By - torek

Answer Checked By - Mildred Charles (PHPFixing Admin)

Wednesday, September 14, 2022

[FIXED] Why is exec used (seemingly unnecessarily) at the end of this git hook sample?
