Discussion:
Mangled argument vector choking on spaces?
Kip Warner
2014-12-30 00:27:51 UTC
Hey list,

My configure.ac updates CXXFLAGS periodically during execution, such as
via pkg-config. Some of the include paths returned from pkg-config's
--cflags contain spaces which I require. Its paths are properly
escaped, so this is not a problem thus far.

$ pkg-config --cflags libfoo
-I/some/path\ space

The problem arises when CXXFLAGS is amended with what is returned,
which breaks further conftests. This is fine:

CXXFLAGS="$CXXFLAGS -I/some/path/no_spaces"

This breaks:

CXXFLAGS="$CXXFLAGS -I/some/path\ space"

All succeeding conftests appear to bail. The config.log is revealing:

...
configure:9444: g++ -c -g3 -std=c++11 ... -I/some/path\ spaces conftest.cpp >&5
g++: error: spaces: No such file or directory
...

If I run what appears to be what configure executed (g++ ...), it
executes fine. However, by the looks of it, the argument vector was
broken up unintentionally.
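
A minimal sketch of what seems to be going on, assuming configure expands $CXXFLAGS unquoted when it runs the compiler. The helper count_args is hypothetical and only reports how many arguments it receives. A backslash stored inside a variable is ordinary data: the shell splits the expanded value on whitespace and does not re-read the backslash as an escape. Pasting the logged line into a terminal works because there the backslash is parsed as shell syntax.

```shell
# Hypothetical stand-in for the compiler: print the number of arguments.
count_args() { echo "$#"; }

flags='-I/some/path\ space'

# Unquoted expansion: the value is split on the space, and the backslash
# stays a literal character at the end of the first word.
n_unquoted=$(count_args $flags)

# eval re-parses the text as shell source, so the backslash escape is honored.
n_eval=$(eval "count_args $flags")

echo "unquoted: $n_unquoted, eval: $n_eval"
```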

I'm assuming I am not doing something correctly. Any help appreciated.

Respectfully,
--
Kip Warner -- Senior Software Engineer
OpenPGP encrypted/signed mail preferred
http://www.thevertigo.com
Kip Warner
2014-12-30 00:35:11 UTC
Post by Bob Friesenhahn
The solution is to ensure that all paths used do not include spaces.
If you use Cygwin or MSYS where paths with spaces often occur, then
fix the paths via mounts (/etc/fstab). On Unix type systems, symbolic
links should work but these will fail if a script needs to visit the
directory and does `pwd` to collect information on the current
directory.
Hey Bob,

Unfortunately I need spaces on GNU systems and possibly elsewhere. I'm
sure there is a way to do this and it's simply a matter of coalescing
parts of the argument vector. From what I can see configure has the
correct arguments, but some were accidentally split. Remember that the
arguments are already correctly escaped and if I run what was logged in
config.log, it runs fine with the spaces.
--
Kip Warner -- Senior Software Engineer
OpenPGP encrypted/signed mail preferred
http://www.thevertigo.com
Bob Friesenhahn
2014-12-30 14:53:01 UTC
Post by Kip Warner
Unfortunately I need spaces on GNU systems and possibly elsewhere. I'm
sure there is a way to do this and it's simply a matter of coalescing
parts of the argument vector. From what I can see configure has the
correct arguments, but some were accidentally split. Remember that the
arguments are already correctly escaped and if I run what was logged in
config.log, it runs fine with the spaces.
The problem is that configure is a huge shell script and any arguments
passed to subordinate utilities need to be escaped properly to avoid
accidental splitting. There is also the problem that 'make'
automatically splits on white space and there is no portable way to
make it not do that.

Bob
--
Bob Friesenhahn
***@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Kip Warner
2014-12-30 15:15:44 UTC
Post by Bob Friesenhahn
Post by Kip Warner
Unfortunately I need spaces on GNU systems and possibly elsewhere. I'm
sure there is a way to do this and it's simply a matter of coalescing
parts of the argument vector. From what I can see configure has the
correct arguments, but some were accidentally split. Remember that the
arguments are already correctly escaped and if I run what was logged in
config.log, it runs fine with the spaces.
The problem is that configure is a huge shell script and any arguments
passed to subordinate utilities need to be escaped properly to avoid
accidental splitting. There is also the problem that 'make'
automatically splits on white space and there is no portable way to
make it not do that.
Hey Bob,

After getting tired of banging my head on the desk, I came to the same
conclusion. Users will have to build from within a path without spaces.

Since inevitably someone might try, do you recommend a recipe for
checking at configure time so I can issue an AC_MSG_ERROR?
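
One possible recipe, sketched as plain shell so it can be tried outside configure. The function name path_has_whitespace is made up for this example. Inside configure.ac the failing branch would call AC_MSG_ERROR instead of echo, and the paths to test would be "$srcdir" and the output of pwd. The case patterns deliberately avoid square brackets, which would need extra quoting in M4.

```shell
# Hypothetical helper: succeed (return 0) if the path contains a space or a tab.
path_has_whitespace() {
  tab=$(printf '\t')
  case $1 in
    *' '* | *"$tab"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Sample paths taken from the examples earlier in this thread.
for dir in "/some/path/no_spaces" "/some/path space"; do
  if path_has_whitespace "$dir"; then
    echo "unsupported (whitespace): $dir"
  else
    echo "ok: $dir"
  fi
done
```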
--
Kip Warner -- Senior Software Engineer
OpenPGP encrypted/signed mail preferred
http://www.thevertigo.com
David A. Wheeler
2014-12-31 02:09:38 UTC
Post by Kip Warner
My configure.ac updates CXXFLAGS periodically during execution, such as
via pkg-config. Some of the include paths returned from pkg-config's
--cflags contain spaces which I require.
...
Post by Kip Warner
The solution is to assure that all paths used do not include spaces.
Unfortunately I need spaces on GNU systems and possibly elsewhere.
I think "don't have spaces in paths" is increasingly impractical advice.
Mac OS X and Windows systems (msys, cygwin) *normally* have
directory names with spaces, and these names *are* the correct probe results.

Autoconf is intended to be "a tool for producing shell scripts that automatically
configure software source code packages to adapt to many kinds of Posix-like systems...
The configuration scripts produced by Autoconf require no manual user intervention when run".
But this is *NOT* true for the large number of systems where spaces occur in directory names,
and users cannot control this; they expect a tool like autoconf to handle it.

The fundamental problem is that space has *TWO* conflicting meanings in autoconf:
1. Part of a path name.
2. Separator between arguments (including options)

It's reasonable to tell developers "do not put spaces in your source code filenames", but users
generally can't prevent spaces from being in the directory names in their environment.

I propose developing and documenting a standard convention for spaces in pathnames that
focuses on making things easier for *users* - the builders of the software.
This convention could be enabled by some statement in configure.ac, and perhaps
it could be enabled by default some time in the future.
If autoconf defined a simple convention for differentiating spaces in pathnames
from spaces in argument separators, it might "just work". It'd be tricky to find such a convention,
but if one can be found, autoconf would be able to adapt to many common systems.

I propose the following convention, as a first cut:
1. Unquoted spaces continue to mean "argument separator". This is by far the most common
use of spaces today, so making that the default meaning seems sensible.
2. Pathnames with spaces must use double-quotes in a way that encloses the spaces
(if they're passed in as configuration values or provided as probe results).

Any automatic search for a program (e.g., AC_CHECK_PROG and AC_CHECK_PROGS)
that found a pathname with spaces would surround the pathname
using double-quotes, but it would *not* surround the first few characters - instead, it could insert a double-quote
after the path separator that first *began* a component with spaces.
Since an automatically-generated pathname won't begin with double-quotes,
current scripts that look for "absolute pathnames"
by looking for "/" or "C:" at the beginning would keep working.
It'd be possible to use "\" in front of each space instead, but since variables go through
multiple processing steps through different tools, and "\" is a pathname separator
on Windows, I expect that backslash would be less robust than double-quotes.

The idea would be that generated Makefiles (for example) could look like this:
SBCL = /cygdrive/c/"Program Files/Steel Bank Common Lisp/1.2.6/sbcl"
EGREP = /usr/bin/grep -E
...
something.dest: something.src
$(EGREP) ...
$(SBCL) ...
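
A rough way to try the idea without make: the sketch below builds a throwaway tool in a directory with a space in its name and runs it through sh -c, which stands in for the step where make hands a recipe line to the shell. The directory layout and the name "tool" are invented for the test, and it assumes mktemp -d returns a path without spaces.

```shell
# Build a throwaway tool inside a directory whose name contains a space.
work=$(mktemp -d)
mkdir -p "$work/Program Files"
printf '%s\n' '#!/bin/sh' 'echo "tool got $# args"' > "$work/Program Files/tool"
chmod +x "$work/Program Files/tool"

# The value as the proposed convention would write it into a Makefile:
# the double quote opens after the leading directory, not at the start.
TOOL="$work/\"Program Files/tool\""

# The shell that runs the recipe parses the quotes and sees one command word.
out=$(sh -c "$TOOL a b")
echo "$out"

rm -rf "$work"
```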

I think this would require a relatively minor change in AC_CHECK_PROG,
and the result would be much better automation on many systems.

This is a slightly tricky problem; comments welcome. It'd be nice to see
a simple and relatively clean solution to the problem. I think "use double-quotes when
pathnames contain spaces" is a plausible answer.

--- David A. Wheeler
Bob Friesenhahn
2014-12-31 02:16:26 UTC
Post by David A. Wheeler
This is a slightly tricky problem; comments welcome. It'd be nice to see
a simple and relatively clean solution to the problem. I think "use double-quotes when
pathnames contain spaces" is a plausible answer.
The POSIX-style shell tosses the double quotes at first point of use.
Arguments must be escaped varying amounts of times to survive a given
data path, and the data path may not be a fixed one.

I agree that Windows is even more difficult, particularly when there is
a POSIX to Windows translation layer.

Bob
--
Bob Friesenhahn
***@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Paul Eggert
2014-12-31 07:38:20 UTC
SBCL =/cygdrive/c/"Program Files/Steel Bank Common Lisp/1.2.6/sbcl"
EGREP = /usr/bin/grep -E
That wouldn't work for commands that do stuff like this:

$(MAKE) SBCL="$(SBCL)"

which is a common idiom.

There is no simple and clean solution to this problem, I'm afraid. The best
place to start fixing it would be with GNU Make, and work is proceeding in this
area but as far as I know it's not ready for prime time yet. In the meantime,
people will just have to live with not having white space in their development
sources' directory names.
David A. Wheeler
2014-12-31 17:02:19 UTC
Post by Paul Eggert
SBCL =/cygdrive/c/"Program Files/Steel Bank Common Lisp/1.2.6/sbcl"
$(MAKE) SBCL="$(SBCL)"
Sure, but this minor variation WOULD work in a makefile, and it's an easy fix:
$(MAKE) SBCL='$(SBCL)'
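
To see why the double-quoted form fails and the single-quoted one survives, one can reproduce the line the shell receives after make has substituted $(SBCL). This is only a simulation: sh -c stands in for make running the recipe, and set -- is used to count words instead of invoking a real sub-make.

```shell
value='/cygdrive/c/"Program Files/Steel Bank Common Lisp/1.2.6/sbcl"'

# Recipe text after substitution for:  $(MAKE) SBCL="$(SBCL)"
# The embedded quotes close the outer ones early, so the words fall apart.
n_double=$(sh -c "set -- SBCL=\"$value\"; echo \$#")

# Recipe text after substitution for:  $(MAKE) SBCL='$(SBCL)'
# Single quotes keep the embedded double quotes as literal characters.
n_single=$(sh -c "set -- SBCL='$value'; echo \$#")

echo "double-quoted: $n_double words, single-quoted: $n_single word"
```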

You can also embed double-quotes when setting environment variables and invoking make directly:
ENV_SBCL='/cygdrive/c/"Program Files/Steel Bank Common Lisp/1.2.6/sbcl"' make demo3

Another possible option (noted below) is to use single quotes instead of double quotes.
Maybe that would be the "easiest way forward".
Post by Paul Eggert
There is no simple and clean solution to this problem, I'm afraid.
That may be true, but maybe we can find "simple and clean enough".
There are really only 3 main options that I see to disambiguate
"spaces used as argument separators" and "spaces in pathnames" in input:
1. backslash prefixing (but this is heavily overloaded because this data goes through a varying number of unquoting processes),
2. double quotes
3. single quotes
Post by Paul Eggert
The best place to start fixing it would be with GNU Make...
But that only works with GNU make. GNU make is great, but it's not the only make.
Autoconf (and automake) are supposed to work with other makes, so any
solution needs to *NOT* depend on GNU make.
Post by Paul Eggert
In the meantime, people will just have to live with not having
white space in their development sources' directory names.
The problem is that that's not the problem :-).
I agree that developers should avoid having whitespace in their source dirnames;
typically this is easy to do.

The problem is that in many *USER* (builder) environments,
tools and configuration information have pathnames with spaces.
This is NOT the developer's environment at all, and thus the
developer CANNOT control this. Users can't control it either, actually; it is
a current fact of their environment, especially if they use Mac OS X or Windows,
and not something they can change.
They can do workarounds with symlinks, etc., to try to *hide* the reality, but
since autoconf is supposed to do things *automatically*, that means that
autoconf fails to do the one job it's supposed to do: automatic configuration.

--- David A. Wheeler
Paul Eggert
2014-12-31 19:31:58 UTC
Post by David A. Wheeler
$(MAKE) SBCL='$(SBCL)'
That won't work if SBCL contains single quotes, another common practice.
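
For values that may themselves contain single quotes, the usual shell-level idiom is to close the quote, emit an escaped quote, and reopen it. The function name sh_quote and the sample path are invented here. This is a sketch of the general idiom, not something Autoconf or make provides, and it only addresses the shell side of the problem.

```shell
# Hypothetical helper: print $1 quoted so that sh re-parses it as one word.
# Each embedded ' becomes '\'' (close quote, escaped quote, reopen quote).
sh_quote() {
  printf "'%s'\n" "$(printf '%s\n' "$1" | sed "s/'/'\\\\''/g")"
}

value="/opt/it's here/sbcl"
quoted=$(sh_quote "$value")

# Re-parse the quoted text the way a later shell invocation would.
eval "set -- $quoted"
n_words=$#
first_word=$1

echo "quoted form: $quoted"
echo "words after re-parsing: $n_words"
```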
Post by David A. Wheeler
Post by Paul Eggert
The best place to start fixing it would be with GNU Make...
But that only works with GNU make.
GNU make would be just the first step. The next step would be to get the fix
into POSIX and into other 'make' implementations. This sort of thing has been
done before.
Post by David A. Wheeler
They can do workarounds with symlinks
How about fixing Autoconf to create a symlink from /tmp to srcdir if srcdir
contains a space, and using the symlink instead? That should work too. Please
feel free to propose a patch along those lines.
Kip Warner
2015-01-01 21:57:14 UTC
Post by David A. Wheeler
They can do workarounds with symlinks, etc., to try to *hide* the reality, but
since autoconf is supposed to do things *automatically*, that means that
autoconf fails to do the one job it's supposed to do: automatic configuration.
Agreed. The whole purpose of autoconf is to help write portable scripts.
Paths with spaces in them, whether we like them or not, are often
uncontrollable and a fact of reality.
--
Kip Warner -- Senior Software Engineer
OpenPGP encrypted/signed mail preferred
http://www.thevertigo.com
Eric Blake
2015-01-02 23:06:41 UTC
Post by Kip Warner
Post by David A. Wheeler
They can do workarounds with symlinks, etc., to try to *hide* the reality, but
since autoconf is supposed to do things *automatically*, that means that
autoconf fails to do the one job it's supposed to do: automatic configuration.
Agreed. The whole purpose of autoconf is to help write portable scripts.
Paths with spaces in them, whether we like them or not, are often
uncontrollable and a fact of reality.
While paths with spaces in them may be a fact of reality, they are
non-portable according to POSIX. It is going to be an uphill battle if
you want to submit patches to make autoconf automatically handle file
names with spaces, as there is nothing portable about them.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
Kip Warner
2015-01-02 23:28:06 UTC
Post by Eric Blake
While paths with spaces in them may be a fact of reality, they are
non-portable according to POSIX. It is going to be an uphill battle if
you want to submit patches to make autoconf automatically handle file
names with spaces, as there is nothing portable about them.
We all agree they're not portable. We also all agree autoconf is
intended to generate portable configure scripts.

If this were ever patched, it would probably involve some clever way of
distinguishing spaces in paths from argument separators in the argument
vector. The only way I can see this ever happening is if argument
vectors were prepared through higher level functions, such as M4 macros,
with different functions to distinguish the two semantics.
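
In plain shell, the closest existing analogue to that idea is probably to keep the argument vector in the positional parameters instead of in a flat string, so that adding a path and adding separate options are different operations. A sketch follows; count_args is a hypothetical stand-in for the compiler.

```shell
# Hypothetical stand-in for the compiler: print the number of arguments.
count_args() { echo "$#"; }

set --                              # start with an empty argument vector
set -- "$@" -g3 -std=c++11          # two separate options
set -- "$@" "-I/some/path space"    # one path argument, space preserved

n=$(count_args "$@")                # "$@" expands to exactly the stored words
last=$3

echo "arguments: $n, last: $last"
```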
--
Kip Warner -- Senior Software Engineer
OpenPGP encrypted/signed mail preferred
http://www.thevertigo.com
Bob Friesenhahn
2015-01-03 00:50:04 UTC
Post by Eric Blake
While paths with spaces in them may be a fact of reality, they are
non-portable according to POSIX. It is going to be an uphill battle if
you want to submit patches to make autoconf automatically handle file
names with spaces, as there is nothing portable about them.
Even if Automake, Autoconf, libtool, intrinsics can be proven to work
with spaces, many significant build environments would still fail.
This is because autotools provide just part of the build
configuration. The project developer adds the rest.

Bob
--
Bob Friesenhahn
***@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
David A. Wheeler
2015-01-03 10:04:41 UTC
Post by Eric Blake
While paths with spaces in them may be a fact of reality, they are
non-portable according to POSIX. It is going to be an uphill battle if
you want to submit patches to make autoconf automatically handle file
names with spaces, as there is nothing portable about them.
POSIX does not forbid the use of spaces, either, and they are a widespread fact of reality (as you noted). There ARE standard ways in POSIX to include such filenames in shell scripts: single quotes, double quotes, and backslash.

It is true that there is no standard way to create dependencies on them in makefiles. But I do not know of a user who needs that.

All I need is a way to invoke programs that include spaces in their name. That is a narrower need. There are already ways to do that in the standard. Perhaps autoconf could learn how to do just that.

--- David A.Wheeler
David A. Wheeler
2015-01-03 10:28:32 UTC
Post by Eric Blake
While paths with spaces in them may be a fact of reality, they are
non-portable according to POSIX. It is going to be an uphill battle if
you want to submit patches to make autoconf automatically handle file
names with spaces, as there is nothing portable about them.
I just thought of an alternative. What if program searching did NOT insert the directory path in front? At least if the path includes a space and the directory is part of the PATH? Seems to me that AC_CHECK_PROG could be modified to allow something like this as an option.

E.g. when using AC_CHECK_PROG to search for "sbcl", the generated makefile would say:
SBCL = sbcl

If this was done, then the fact that the directory path includes a space is no longer mentioned.

--- David A.Wheeler
Bob Friesenhahn
2015-01-03 16:14:26 UTC
Post by David A. Wheeler
I just thought of an alternative. What if program searching did NOT insert the directory path in front? At least if the path includes a space and the directory is part of the PATH? Seems to me that AC_CHECK_PROG could be modified to allow something like this as an option.
SBCL = sbcl
This introduces security/reliability issues and also hazards if one of
the build products was called 'sbcl' or if PATH is updated as part of
the build/test process.

Do you know what the output of AC_CHECK_PROG is used for? If you
don't know the answer, then the output should not be changed.

I do know that this would cause reliability/security problems for the
package I maintain.

Bob
--
Bob Friesenhahn
***@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Paul Eggert
2015-01-03 23:37:14 UTC
Post by David A. Wheeler
SBCL = sbcl
That's easy to do without changing Autoconf. Put this in configure.ac:

AC_CHECK_PROG([SBCL], [sbcl], [sbcl], [false])

and this in Makefile:

SBCL = @SBCL@

Unfortunately, one often needs the absolute file name of the command, and that's
where the problem lies.
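
The difference between recording a bare program name and recording an absolute file name can be imitated in plain shell. This is only an analogy: command -v is used as a rough stand-in for the PATH search, and sh is used as the program because it should exist on any POSIX-like system.

```shell
prog=sh

# Bare name, in the spirit of AC_CHECK_PROG([SBCL], [sbcl], [sbcl], [false]):
# no directory is recorded, so spaces in the directory never show up.
bare=$prog

# Absolute file name, in the spirit of AC_PATH_PROG: the directory is
# recorded, including any spaces it may contain.
abs=$(command -v "$prog")

echo "bare: $bare"
echo "absolute: $abs"
```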
David A. Wheeler
2014-12-31 20:39:43 UTC
Post by Paul Eggert
Post by David A. Wheeler
$(MAKE) SBCL='$(SBCL)'
That won't work if SBCL contains single quotes, another common practice.
That's pretty rare in my experience. In any case,
that is just the symptom of the problem. Because there is no standard
way to distinguish spaces as argument separators vs. spaces in pathnames,
people feel free to do anything. By defining a convention, people can use the
sequence in the way intended, and avoid it for other purposes.
Post by Paul Eggert
GNU make would be just the first step. The next step would be to get the fix
into POSIX and into other 'make' implementations. This sort of thing has been
done before.
That certainly sounds promising. What's the plan? I know there was some
discussion about how to handle spaces in GNU make, but I haven't been following it closely
for a while.
Post by Paul Eggert
How about fixing Autoconf to create a symlink from /tmp to srcdir if srcdir
contains a space, and using the symlink instead? That should work too. Please
feel free to propose a patch along those lines.
That is irrelevant for my use case, if I understand you correctly.
In all the cases I'm concerned about, spaces are *NOT* in any path
inside a source directory srcdir. So "fixing the distributed filenames"
or "creating a symlink to srcdir" does NOTHING useful for my users.

In all cases I'm concerned about, the spaces are the correct results
of system probing by "./configure"; these values are NOT in any sense within srcdir.
E.g., when installing an application that requires sbcl, the autoconf-generated
"configure" needs to find where sbcl is installed. On Windows systems
with Cygwin that value will normally be this (note the spaces):
/cygdrive/c/Program Files/Steel Bank Common Lisp/1.2.6/sbcl
Thus, autoconf+automake might put this in the Makefile:
SBCL = /cygdrive/c/Program Files/Steel Bank Common Lisp/1.2.6/sbcl
Similarly, autoconf+automake might decide to put this in the Makefile for EGREP:
EGREP = /usr/bin/grep -E
The cause is that the space character has two *incompatible* meanings when autoconf
returns probe results.
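
The conflict can be reproduced with two shell variables holding the same values. In this sketch count_args is a hypothetical stand-in for whatever consumes the value. Unquoted expansion is right for EGREP and wrong for SBCL; quoted expansion is right for SBCL and wrong for EGREP. Neither treatment works for both.

```shell
# Hypothetical consumer: print the number of arguments received.
count_args() { echo "$#"; }

EGREP='/usr/bin/grep -E'
SBCL='/cygdrive/c/Program Files/Steel Bank Common Lisp/1.2.6/sbcl'

egrep_unquoted=$(count_args $EGREP)    # command plus option, as intended
egrep_quoted=$(count_args "$EGREP")    # one word: a file named "grep -E"
sbcl_unquoted=$(count_args $SBCL)      # path broken at every space
sbcl_quoted=$(count_args "$SBCL")      # one word, as intended

echo "EGREP unquoted=$egrep_unquoted quoted=$egrep_quoted"
echo "SBCL  unquoted=$sbcl_unquoted quoted=$sbcl_quoted"
```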
These makefile values need to be used during build, and often end up getting
stored in generated executables that are then shared across the system.
Creating a symlink from /tmp to srcdir does nothing useful in these cases.

--- David A. Wheeler
Bob Friesenhahn
2014-12-31 22:28:46 UTC
Post by David A. Wheeler
Post by Paul Eggert
How about fixing Autoconf to create a symlink from /tmp to srcdir if srcdir
contains a space, and using the symlink instead? That should work too. Please
feel free to propose a patch along those lines.
That is irrelevant for my use case, if I understand you correctly.
In all the cases I'm concerned about, spaces are *NOT* in any path
inside a source directory srcdir. So "fixing the distributed filenames"
or "creating a symlink to srcdir" does NOTHING useful for my users.
It is a common problem for Windows users that their default directory
(equivalent to a home directory) has spaces in it.

Bob
--
Bob Friesenhahn
***@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Paul Eggert
2015-01-01 02:03:09 UTC
Post by David A. Wheeler
Post by Paul Eggert
That won't work if SBCL contains single quotes, another common practice.
That's pretty rare in my experience.
Shrug. I do it all the time. I prefer single-quotes, anyway, as they're safer
for shell quoting.
Post by David A. Wheeler
I know there was some
discussion about how to handle spaces in GNU make, but I haven't been following it closely
for a while.
Likewise. But that's the plan, anyway.
Post by David A. Wheeler
E.g., when installing an application that requires sbcl, the autoconf-generated
"configure" needs to find where sbcl is installed. On Windows systems
/cygdrive/c/Program Files/Steel Bank Common Lisp/1.2.6/sbcl
Sure. So create a symlink to that, e.g.,

ln -s '/cygdrive/c/Program Files/Steel Bank Common Lisp/1.2.6/sbcl' /tmp/xyz

Then configure with the equivalent of SBCL=/tmp/xyz.

Obviously the idea needs some elaboration (e.g., if the file name is needed at
runtime) but it would work. All it needs is Somebody to write and test and
document the patch.
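
A small experiment along those lines, using a private directory from mktemp instead of a fixed /tmp/xyz. The directory layout and the names "tool" and "tool-link" are invented for the test, and it assumes mktemp -d returns a path without spaces.

```shell
# Create a stand-in program inside a directory whose name contains a space.
work=$(mktemp -d)
mkdir -p "$work/Program Files"
printf '%s\n' '#!/bin/sh' 'echo "real tool reached"' > "$work/Program Files/tool"
chmod +x "$work/Program Files/tool"

# Space-free alias for the real location.
ln -s "$work/Program Files/tool" "$work/tool-link"

# Deliberately unquoted, as a Makefile or configure script might use it.
TOOL=$work/tool-link
out=$($TOOL)
echo "$out"

rm -rf "$work"
```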
David A. Wheeler
2015-01-01 05:55:36 UTC
Post by Paul Eggert
Sure. So create a symlink to that, e.g.,
ln -s '/cygdrive/c/Program Files/Steel Bank Common Lisp/1.2.6/sbcl' /tmp/xyz
Then configure with the equivalent of SBCL=/tmp/xyz.
Obviously the idea needs some elaboration (e.g., if the file name is needed at
runtime) but it would work. All it needs is Somebody to write and test and
document the patch.

That will not work well in many cases. /tmp often doesn't stick around, nor is there any standard safe system-wide place anything could write to. How do you handle containers where the software is preinstalled and provided on readonly mounts and with new blank /tmp? This approach is also nasty to package. The whole approach is a weird kludge.

I do grant you points for being clever, but isn't there a better way?

--- David A.Wheeler
Fotis Georgatos
2015-01-05 10:41:56 UTC
Hi David,

(this is a belated reply :)
Basically, ` ` becomes `_space_` and so on, for many more potentially tricky characters.
The aim was to have freedom in relation to packages/filenames. It seems to work OK! (*)
That is an interesting idea. I do worry that it too easily collides with normal names. Perhaps a double underscore before and after the key word would counter that problem.
Double underscores indeed would reduce the probability of human misunderstanding;

However, the transformation is unambiguous already once it is *known* if a string
is in its expanded form or not, since all underscores themselves are also expressed; see:
https://github.com/hpcugent/easybuild-framework/blob/master/easybuild/tools/filetools.py#L87
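
As a rough illustration of that kind of encoding (not a copy of the linked function, whose token names and character set may differ), the sketch below rewrites underscores and spaces into word tokens. The function name encode_name and the token "_underscore_" are assumptions made for this example. A matching decoder would have to scan tokens from left to right; a naive search-and-replace in the reverse direction could misread an encoded underscore followed by the letters "space".

```shell
# Hypothetical encoder: token names are illustrative only.
encode_name() {
  # Underscores are rewritten first, so that underscores introduced by
  # later tokens are never re-encoded.
  printf '%s\n' "$1" | sed -e 's/_/_underscore_/g' -e 's/ /_space_/g'
}

encoded=$(encode_name "Program Files_x86")
echo "$encoded"
```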
You still need to generate the correct values in makefiles and elsewhere. But clearly the first step is to have an unambiguous representation.
Exactly, that is the point. The whole aim of that function was to permit
a representation that has the best chance of success across different situations
(e.g., the name ` Ήλεκτρον ` would be transformed sufficiently to make it nicely usable).

<debate zone>
With all their great benefits, Unix shells have forced upon us a view of the world
whereby arbitrary limitations of build processes are always faced as after-thoughts.
Maybe one day we will be able to run complex builds without worrying about directory names...
</debate zone>

best,
Fotis
--
echo "sysadmin know better bash than english" | sed s/min/mins/ \
| sed 's/better bash/bash better/' # signal detected in a CERN forum
David A. Wheeler
2015-01-05 22:17:07 UTC
Post by Paul Eggert
Post by David A. Wheeler
SBCL = sbcl
AC_CHECK_PROG([SBCL], [sbcl], [sbcl], [false])
Fair point. I've reflexively been using AC_PATH_PROG, but in my case AC_CHECK_PROG
is the better choice.

I think the autoconf documentation should encourage its users to *NOT* embed full pathnames
if they can avoid them, e.g., they should be encouraged to use AC_CHECK_PROG instead of
AC_PATH_PROG. Then spaces-in-directories would bite users (builders) less often.

It'd still be good to have a solution for spaces-in-directories.

--- David A. Wheeler
Kip Warner
2015-01-05 22:31:41 UTC
Post by David A. Wheeler
It'd still be good to have a solution for spaces-in-directories.
I think the best way I can think of is via an M4 implemented API to
provide all path related functions for both creating and modifying as
well as "rendering" out into an argument vector.
--
Kip Warner -- Senior Software Engineer
OpenPGP encrypted/signed mail preferred
http://www.thevertigo.com