Discussion:
why generate output to tmp file and mv to actual file?
John Calcote
2018-06-01 18:45:28 UTC
I recently ran across some sample code in section 19.4 of the Autoconf
manual (modified slightly to reduce example):

$(TESTSUITE): $(srcdir)/testsuite.at $(srcdir)/package.m4
	autom4te --language=autotest -I '$(srcdir)' -o $@.tmp $@.at
	mv $@.tmp $@

This question isn't about autotest, but rather about the two commands in
this rule - why generate the output into $@.tmp and then mv $@.tmp into $@?
Is there some power mv has over autom4te that allows it better access to
the target under some conditions?

Thanks in advance,
John
Bob Friesenhahn
2018-06-01 18:55:05 UTC
A reason to do such a thing is to avoid leaving a
partial/corrupt output file in place when the command writing it
encounters an error. The assumption is that the 'mv' command will
not be executed if the previous command fails, which holds by default
in a make recipe because make stops at the first failing command.
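
As a rough illustration of that first reason, here is a minimal shell sketch. The function flaky_generator is a hypothetical stand-in for any tool that writes part of its output and then fails (it is not autom4te), and the file names are made up. The && mimics the way make skips the mv line once the first recipe line has failed.

```shell
#!/bin/sh
# Hypothetical stand-in for a generator that fails partway through.
workdir=$(mktemp -d)
target="$workdir/testsuite"

# Pretend a previous, complete build left a good target behind.
echo "complete old script" > "$target"

flaky_generator() {
    echo "partial new scr"   # some output gets written...
    return 1                 # ...before the tool fails
}

# Same shape as the two recipe lines: the mv runs only if the
# generator succeeded.
if flaky_generator > "$target.tmp" && mv "$target.tmp" "$target"; then
    status=replaced
else
    status=kept
    rm -f "$target.tmp"      # discard the partial output
fi

echo "target was $status: $(cat "$target")"
```

The failed run only ever touched the .tmp file, so the old target is still intact and complete afterwards.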

Another reason would be in a multi-processing environment (e.g.
parallel compile) where the file content (or its timestamp) might be
consumed before it is ready. The 'mv' is an atomic operation (as long
as the temporary file is on the same file system as the target), so
once the file is in its final location, it is assured to be completely
written.

Bob
--
Bob Friesenhahn
***@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Eric Blake
2018-06-01 19:00:50 UTC
Atomicity. autom4te generates its output piecemeal (while it is
running, you can see an incomplete version of $@.tmp); such an
incomplete file will probably fail miserably but in an unpredictable
manner when run as a shell script. As an example of the damage possible
with an incomplete script, suppose you are generating a shell script
that states:

rm -rf *.tmp

but due to buffering constraints on stdio, the kernel happens to have
flushed "rm -rf *" to disk (perhaps because it hit a 64k boundary, or
so), but still has ".tmp" buffered up for a later write. In the common
case, executing an incomplete script will hopefully cause a syntax
error; but in cases like I just described, it can cause data loss.

Thus, we generate into a temporary file (which no one will read in an
intermediate state), then use mv to the final location (which has atomic
semantics - anyone open()ing the file will see either the old inode
which was presumably complete, or the new inode that we just completed),
so that the target file is always a valid and complete file for reading
or execution.
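
The "old inode or new inode" behaviour can be observed with a small shell sketch. The file names and contents below are made up for illustration, and the sketch assumes the temporary file and the target live in the same directory, so that mv is a plain rename.

```shell
#!/bin/sh
# Sketch: a reader that opened the old file keeps seeing the old, complete
# contents even after mv installs the new file under the same name.
workdir=$(mktemp -d)
target="$workdir/script"

printf 'old line 1\nold line 2\n' > "$target"

# Open the current target on file descriptor 3 (the "old inode").
exec 3< "$target"

# Generate the replacement next to the target, then rename it into place.
printf 'new line 1\nnew line 2\n' > "$target.tmp"
mv "$target.tmp" "$target"

# The already-open descriptor still reads the old file in full...
seen_by_old_reader=$(cat <&3)
exec 3<&-

# ...while anyone opening the name now gets the new file in full.
seen_by_new_reader=$(cat "$target")

printf 'old reader saw:\n%s\n' "$seen_by_old_reader"
printf 'new reader saw:\n%s\n' "$seen_by_new_reader"
```

Neither reader sees a mixture of the two versions, which is the property the two-step rule relies on.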

In fact, generating into a temporary file and then moving it into place
is a rather common idiom.
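
Outside of make, the idiom is often wrapped in a small helper. The sketch below is one possible shape, not something taken from Autoconf: the name atomic_generate is hypothetical, and it assumes a mktemp command that accepts a template path (as the GNU and BSD versions do). The temporary file is created in the target's own directory so that the final mv stays within one file system.

```shell
#!/bin/sh
# atomic_generate is a hypothetical helper name, not part of Autoconf.
# Usage: atomic_generate TARGET COMMAND [ARGS...]
atomic_generate() {
    target=$1
    shift
    # Create the temporary file next to the target so the final mv is a
    # rename within one file system.
    tmp=$(mktemp "$(dirname "$target")/.tmp.XXXXXX") || return 1
    # Run the command with stdout going to the temporary file; install
    # the result only if the command succeeded.
    if "$@" > "$tmp" && mv "$tmp" "$target"; then
        return 0
    fi
    rm -f "$tmp"             # failed: drop the partial output
    return 1
}

workdir=$(mktemp -d)
atomic_generate "$workdir/out.txt" printf 'hello\n'
# A failing command leaves the existing out.txt untouched.
atomic_generate "$workdir/out.txt" false || echo "generation failed; out.txt left as-is"
cat "$workdir/out.txt"
```

One caveat with this sketch: mktemp creates the file with restrictive permissions, and mv preserves them, so a real script may want to chmod the temporary file before moving it.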
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
John Calcote
2018-06-01 20:13:10 UTC
Thanks everyone - great responses! Answers all my questions.