Discussion:
parallelized configure
Alexander Holler
2014-01-14 12:56:38 UTC
Permalink
Hello,

I've seen there was some discussion about parallelizing configure in
2011, which seems to have come up through a talk at FOSDEM 2011 I
missed: http://lists.gnu.org/archive/html/autoconf/2011-02/msg00036.html

Because in most of today's build processes configure is one of the few
parts which still doesn't take advantage of multiple processors, I
want to ask whether there has been any progress on that topic.

Regards,

Alexander Holler
Eric Blake
2014-01-14 15:44:38 UTC
Permalink
Post by Alexander Holler
Hello,
I've seen there was some discussion about parallelizing configure in
2011, which seems to have come up through a talk at FOSDEM 2011 I
missed: http://lists.gnu.org/archive/html/autoconf/2011-02/msg00036.html
Because in most of today's build processes configure is one of the few
parts which still doesn't take advantage of multiple processors, I
want to ask whether there has been any progress on that topic.
No one has come forth with patches, and the problem is a lot trickier
than you would think. I'm not holding my breath for this to happen any
time soon.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
Alexander Holler
2014-01-14 19:13:33 UTC
Permalink
Post by Eric Blake
Post by Alexander Holler
Hello,
I've seen there was some discussion about parallelizing configure in
2011, which seems to have come up through a talk at FOSDEM 2011 I
missed: http://lists.gnu.org/archive/html/autoconf/2011-02/msg00036.html
Because in most of today's build processes configure is one of the few
parts which still doesn't take advantage of multiple processors, I
want to ask whether there has been any progress on that topic.
No one has come forth with patches, and the problem is a lot trickier
than you would think. I'm not holding my breath for this to happen any
time soon.
How do you know what I would think?
Eric Blake
2014-01-14 19:24:25 UTC
Permalink
Post by Alexander Holler
Post by Eric Blake
Post by Alexander Holler
Hello,
I've seen there was some discussion about parallelizing configure in
2011, which seems to have come up through a talk at FOSDEM 2011 I
missed: http://lists.gnu.org/archive/html/autoconf/2011-02/msg00036.html
Because in most of today's build processes configure is one of the few
parts which still doesn't take advantage of multiple processors, I
want to ask whether there has been any progress on that topic.
No one has come forth with patches, and the problem is a lot trickier
than you would think. I'm not holding my breath for this to happen any
time soon.
How do you know what I would think?
Until you post patches for us to pick apart, we don't. But the fact
that you are posting a request, rather than a patch, is reasonably
good evidence that you are aware it is tough enough that you aren't
volunteering for the job yourself.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
Alexander Holler
2014-01-14 19:42:01 UTC
Permalink
Post by Eric Blake
Post by Alexander Holler
Post by Eric Blake
Post by Alexander Holler
Hello,
I've seen there was some discussion about parallelizing configure in
2011, which seems to have come up through a talk at FOSDEM 2011 I
missed: http://lists.gnu.org/archive/html/autoconf/2011-02/msg00036.html
Because in most of today's build processes configure is one of the few
parts which still doesn't take advantage of multiple processors, I
want to ask whether there has been any progress on that topic.
No one has come forth with patches, and the problem is a lot trickier
than you would think. I'm not holding my breath for this to happen any
time soon.
How do you know what I would think?
Until you post patches for us to pick apart, we don't. But the fact
that you are posting a request, rather than a patch, is reasonably
good evidence that you are aware it is tough enough that you aren't
volunteering for the job yourself.
Hmm, I'm not a native English speaker, but that shouldn't have sounded
like a request. I'm quite aware of how open/free software development
works and wouldn't demand anything from any free project.

I was just curious whether there has been any progress on that topic
besides what Ralf Wildenhues seems to have tried out.

But you're right with your assumption that I won't volunteer, though I
assume it's still OK to ask nevertheless.

Regards,

Alexander Holler
Bob Friesenhahn
2014-01-15 00:20:56 UTC
Permalink
Post by Alexander Holler
I was just curious whether there has been any progress on that topic
besides what Ralf Wildenhues seems to have tried out.
The most challenging aspect is that configure scripts have a huge
number of dependencies (e.g. shell variable definitions) which only
work due to the sequential nature of the script. In order to
parallelize configure, one would need to somehow ensure that results
are available in the correct order. Such logic is normal in makefiles
but not in shell scripts.
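
As a rough hand-written illustration (not actual Autoconf output, but
the same shape), each step below reads shell variables set by the steps
before it, so nothing can safely run out of order:

  # The compiler is discovered first; everything below depends on it.
  test -n "$CC" || CC=gcc
  # This probe silently relies on the $CC chosen above ...
  printf '#include <stdlib.h>\nint main (void) { return 0; }\n' > conftest.c
  if $CC -c conftest.c >/dev/null 2>&1; then
    ac_cv_header_stdlib_h=yes
  else
    ac_cv_header_stdlib_h=no
  fi
  rm -f conftest.c conftest.o
  # ... and later logic reads that result, so ordering matters throughout.
  if test "$ac_cv_header_stdlib_h" = yes; then
    CPPFLAGS="$CPPFLAGS -DHAVE_STDLIB_H"
  fi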

Part of a configure script is written by the developers of a package
and not by Autoconf, and Autoconf has no way to predict the behavior
of code which is outside of its control.

Regardless, configure scripts are not actually the main problem with
package build times on modern hardware. The main problem is that most
free software packages are not structured to take advantage of
parallel builds.

Bob
--
Bob Friesenhahn
***@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Mike Frysinger
2014-01-15 00:32:56 UTC
Permalink
Post by Bob Friesenhahn
Post by Alexander Holler
I was just curious whether there has been any progress on that topic
besides what Ralf Wildenhues seems to have tried out.
The most challenging aspect is that configure scripts have a huge
number of dependencies (e.g. shell variable definitions) which only
work due to the sequential nature of the script. In order to
parallelize configure, one would need to somehow ensure that results
are available in the correct order. Such logic is normal in makefiles
but not in shell scripts.
Part of a configure script is written by the developers of a package
and not by Autoconf, and Autoconf has no way to predict the behavior
of code which is outside of its control.
there's semi-precedent though with introducing new macros when there's no
confidence in safely converting existing ones. consider:
AC_CHECK_FUNC begat
AC_CHECK_FUNCS begat
AC_CHECK_FUNCS_ONCE
same for HEADERS and DECLS. maybe it's time to beget an
AC_CHECK_FUNCS_ONCE_PARALLEL ? :) or maybe enshrine the ONCE behavior and
call it AC_CHECK_FUNCS_PARA. that'd cover a decent amount of ground (albeit
not as much as would truly be possible from an interlocked pipeline) without
too much pain.
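
as a rough sketch of how that might look in configure.ac
(AC_CHECK_FUNCS_PARALLEL is made up here, not an existing macro):

  dnl today: each function gets its own serial compile/link probe.
  AC_CHECK_FUNCS([mmap strdup strndup])
  dnl hypothetical: same results and same cache variables, but the
  dnl generated shell would be free to run the probes concurrently.
  AC_CHECK_FUNCS_PARALLEL([mmap strdup strndup])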
-mike
Bob Friesenhahn
2014-01-15 01:11:34 UTC
Permalink
Post by Mike Frysinger
there's semi-precedent though with introducing new macros when there's no
confidence in safely converting existing ones. consider:
AC_CHECK_FUNC begat
AC_CHECK_FUNCS begat
AC_CHECK_FUNCS_ONCE
same for HEADERS and DECLS. maybe it's time to beget an
AC_CHECK_FUNCS_ONCE_PARALLEL ? :) or maybe enshrine the ONCE behavior and
call it AC_CHECK_FUNCS_PARA. that'd cover a decent amount of ground (albeit
not as much as would truly be possible from an interlocked pipeline) without
too much pain.
Unfortunately, that is only part of the problem. The main problem is
the cascading shell variable definitions which appear, are modified,
or even disappear as configure script statements are executed. Some
of these variable modifications are done as actions of the macros and
others are done by code added by the package developer.

Configure could be sped up by using a shared caching system for common
tests (e.g. standard header file existence) or perhaps even by
creating a new shell which is a closer fit to what configure needs.
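
The existing config.site mechanism already hints at what shared caching
could look like; a minimal sketch (the path and the pre-seeded entries
are just examples):

  # /usr/local/share/config.site -- sourced by configure when CONFIG_SITE
  # points at it; pre-seeding cache variables skips the common tests.
  ac_cv_header_stdlib_h=${ac_cv_header_stdlib_h=yes}
  ac_cv_func_mmap_fixed_mapped=${ac_cv_func_mmap_fixed_mapped=yes}

A package would then be configured with something like
CONFIG_SITE=/usr/local/share/config.site ./configure.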

Bob
--
Bob Friesenhahn
***@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Mike Frysinger
2014-01-15 03:12:44 UTC
Permalink
Post by Bob Friesenhahn
Post by Mike Frysinger
there's semi-precedent though with introducing new macros when there's no
confidence in safely converting existing ones. consider:
AC_CHECK_FUNC begat
AC_CHECK_FUNCS begat
AC_CHECK_FUNCS_ONCE
same for HEADERS and DECLS. maybe it's time to beget an
AC_CHECK_FUNCS_ONCE_PARALLEL ? :) or maybe enshrine the ONCE behavior and
call it AC_CHECK_FUNCS_PARA. that'd cover a decent amount of ground (albeit
not as much as would truly be possible from an interlocked pipeline) without
too much pain.
Unfortunately, that is only part of the problem. The main problem is
the cascading shell variable definitions which appear, are modified,
or even disappear as configure script statements are executed. Some
of these variable modifications are done as actions of the macros and
others are done by code added by the package developer.
right. my point is that the new macros come with new semantics that say "hey,
you can't do X or rely on Y". that way existing semantics stay the same.
Post by Bob Friesenhahn
Configure could be sped up by using a shared caching system for common
tests (e.g. standard header file existence) or perhaps even by
creating a new shell which is a closer fit to what configure needs.
config.site deployment runs into the same problem. some packages'
configure scripts like to test headers with varying defines and test
the resulting behavior. we prototyped a scaled/large deployment in
Gentoo ... worked great most of the time, but these edge cases kept us
from deploying further :(.
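
e.g. something like this (contrived, but it's the shape of the failures
we hit):

  # package A probes a header's behavior with _GNU_SOURCE defined ...
  CPPFLAGS=-D_GNU_SOURCE ./configure
  # ... package B probes the same header without it; a shared cache
  # would hand B the answer that was only valid under A's defines.
  ./configure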
-mike
Alexander Holler
2014-01-15 04:12:02 UTC
Permalink
Post by Bob Friesenhahn
Post by Alexander Holler
I was just curious whether there has been any progress on that topic
besides what Ralf Wildenhues seems to have tried out.
Btw, I haven't found a recording of his talk at FOSDEM 2011, but I
assume it was similar to that one:



with these slides:

http://www.gnu.org/ghm/2010/denhaag/slides-autotools-ghm-2010-talk.pdf
Post by Bob Friesenhahn
The most challenging aspect is that configure scripts have a huge
number of dependencies (e.g. shell variable definitions) which only
work due to the sequential nature of the script. In order to
parallelize configure, one would need to somehow ensure that results
are available in the correct order. Such logic is normal in makefiles
but not in shell scripts.
Part of a configure script is written by the developers of a package
and not by Autoconf, and Autoconf has no way to predict the behavior
of code which is outside of its control.
Sure, people do all kinds of stuff in build systems. But a lot of the
tests which fly by when configure is called look like they were taken
from a standard repository. And many of those tests are pretty simple
and have no dependencies.
Post by Bob Friesenhahn
Regardless, configure scripts are not actually the main problem with
package build times on modern hardware. The main problem is that most
That depends on what you call modern hardware. If you build e.g. Squid
(the proxy) on some ARM board using an SD card, it isn't really fast.
Of course, it's questionable whether such a system would benefit a lot
from parallelized configure tests, but I would still assume so.
Post by Bob Friesenhahn
free software packages are not structured to take advantage of
parallel builds.
Hmm, I've been using the parallel feature of GNU make very successfully
since ext4 with support for nanosecond timestamps appeared (so for many
years now). Without a filesystem with high-resolution timestamps I had
a lot of problems, but with high-resolution timestamps it works like a
charm most of the time.

Regards,

Alexander Holler
Bob Friesenhahn
2014-01-15 15:06:18 UTC
Permalink
Post by Alexander Holler
Sure, people do all kinds of stuff in build systems. But a lot of the
tests which fly by when configure is called look like they were taken
from a standard repository. And many of those tests are pretty simple
and have no dependencies.
It seems that all tests have dependencies. Which tests do you think
don't have any? If I specify CC on the configure command line, that may
influence the outcome, which demonstrates that there are dependencies.
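
For example:

  ./configure CC=gcc     # later checks compile their probes with gcc
  ./configure CC=clang   # the very same checks may cache different answers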
Post by Alexander Holler
Hmm, I've been using the parallel feature of GNU make very successfully
since ext4 with support for nanosecond timestamps appeared (so for many
years now). Without a filesystem with high-resolution timestamps I had
a lot of problems, but with high-resolution timestamps it works like a
charm most of the time.
This is perhaps off-topic for the Autoconf list. Packages with a lot
of recursion and fewer than 64 target object files per directory, or
with some targets which take substantially more time to compile than
others, do not perform as well as they should for parallel builds on
modern CPUs (with 4-64 cores available). If the object files compile
very quickly, then there is also less gain from parallel builds, since
the build framework may take more time than the actual compiles.
Lastly, if there are many linking steps, the build time increases
dramatically, since linking is sequential in nature, and if there is
needless re-linking (a common problem with recursive builds) the
problem is multiplied.
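
For what it's worth, a non-recursive layout exposes the whole dependency
graph to a single make invocation; a minimal Automake-style sketch (the
file names are invented):

  # Top-level Makefile.am: no SUBDIRS recursion, so 'make -jN' can
  # schedule every object file in the package at once.
  AUTOMAKE_OPTIONS = subdir-objects
  bin_PROGRAMS = prog
  prog_SOURCES = src/main.c src/util.c lib/helper.c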

Bob
--
Bob Friesenhahn
***@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Alexander Holler
2014-01-15 19:54:59 UTC
Permalink
Post by Bob Friesenhahn
Post by Alexander Holler
Sure, people do all kinds of stuff in build systems. But a lot of the
tests which fly by when configure is called look like they were taken
from a standard repository. And many of those tests are pretty simple
and have no dependencies.
It seems that all tests have dependencies. What tests do you think
don't have any dependencies? If I specify CC on the configure command
line, that may influence the outcome and demonstrates that there are
dependencies.
I don't think it makes much sense to talk about the basic stuff. Of
course, tests which need the configured compiler options (like testing
for includes) should be run after the tests which discover the compiler
options. So you will need some basic blocks which are serialized. But
after configure has figured out the compiler options (which can be
tested in parallel too), many, many tests could be parallelized. So,
yes, you are right, there are dependencies I should have mentioned.
(Maybe the tests should just be organized by make to have an easy way
to declare dependencies ;) )
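
Something like this toy makefile is what I have in mind (the probe
scripts are invented; recipe lines start with a tab):

# checks.mk -- each probe is a target, so "make -j4 -f checks.mk" runs
# independent probes in parallel once the serialized compiler
# discovery has finished.
all: check-stdlib_h check-mmap

cc.vars:
	./discover-compiler.sh > $@   # the serialized "basic block"

check-stdlib_h check-mmap: cc.vars
	. ./cc.vars && ./run-probe.sh $@ > $@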

Anyway, I didn't want to talk about how to do it; I was interested in
the state of a feature I had seen some comments about two years ago.
Post by Bob Friesenhahn
Post by Alexander Holler
Hmm, I've been using the parallel feature of GNU make very successfully
since ext4 with support for nanosecond timestamps appeared (so for many
years now). Without a filesystem with high-resolution timestamps I had
a lot of problems, but with high-resolution timestamps it works like a
charm most of the time.
This is perhaps off-topic for the Autoconf list. Packages with a lot
of recursion and fewer than 64 target object files per directory, or
with some targets which take substantially more time to compile than
others, do not perform as well as they should for parallel builds on
modern CPUs (with 4-64 cores available). If the object files compile
very quickly, then there is also less gain from parallel builds, since
the build framework may take more time than the actual compiles.
Lastly, if there are many linking steps, the build time increases
dramatically, since linking is sequential in nature, and if there is
needless re-linking (a common problem with recursive builds) the
problem is multiplied.
I would assume it is off-topic. And talking about that doesn't make
much sense. Most people don't care about build time when they organize
the files of their projects, and for good reason. And it's a given that
make can't invoke 64 compile jobs if there are only 4 source files. ;)

But parallel builds basically work just fine with make nowadays.

To come back on-topic, I have the feeling that many packages already
spend more time in configure (if they use the autotools) than they need
to actually build. That's why I ended up here, asking about the state
of parallelized configure.

So for me everything is answered; thanks for the answers.

Regards,

Alexander Holler
