I took a close look at how Debian builds the Python interpreter,
because I noticed it ran substantially faster than the one in nixpkgs
and I was curious why.
One thing that I found made a material difference in performance was
this pair of linker flags (passed to the compiler):
-Wl,-O1 -Wl,-Bsymbolic-functions
In other words, effectively the linker gets passed the flags:
-O1 -Bsymbolic-functions
Doing the same thing in nixpkgs turns out to make the interpreter
run about 6% faster, which is quite a big win for such an easy
change. So, let's apply it.
---
I had not known there was a `-O1` flag for the *linker*!
But indeed there is.
These flags are unrelated to "link-time optimization" (LTO), despite
the latter's name. LTO means doing classic compiler optimizations
on the actual code, at the linking step when it becomes possible to
do them with cross-object-file information. These two flags, by
contrast, cause the linker to make certain optimizations within the
scope of its job as the linker.
Documentation is here, though sparse:
https://sourceware.org/binutils/docs-2.31/ld/Options.html
The meaning of -O1 was explained in more detail in this LWN article:
https://lwn.net/Articles/192624/
Apparently it makes the resulting symbol table use a bigger hash
table, so the load factor is smaller and lookups are faster. Cool.
As for -Bsymbolic-functions, the documentation indicates that it's a
way of saving lookups through the symbol table entirely. There can
apparently be situations where it changes the behavior of a program,
specifically if the program relies on linker tricks to provide
customization features:
https://bugs.launchpad.net/ubuntu/+source/xfe/+bug/644645https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=637184#35
But I'm pretty sure CPython doesn't permit that kind of trick: you
don't load a shared object that tries to redefine some symbol found
in the interpreter core.
The stronger reason I'm confident using -Bsymbolic-functions is
safe, though, is empirical. Both Debian and Ubuntu have been
shipping a Python built this way since forever -- it was introduced
for the Python 2.4 and 2.5 in Ubuntu "hardy", and Debian "lenny",
released in 2008 and 2009. In those 12 years they haven't seen a
need to drop this flag; and I've been unable to locate any reports
of trouble related to it, either on the Web in general or on the
Debian bug tracker. (There are reports of a handful of other
programs breaking with it, but not Python/CPython.) So that seems
like about as thorough testing as one could hope for.
---
As for the performance impact: I ran CPython upstream's preferred
benchmark suite, "pyperformance", in the same way as described in
the previous commit. On top of that commit's change, the results
across the 60 benchmarks in the suite are:
The median is 6% faster.
The middle half (aka interquartile range) is from 4% to 8% faster.
Out of 60 benchmarks, 3 come out slower, by 1-4%. At the other end,
5 are at least 10% faster, and one is 17% faster.
So, that's quite a material speedup! I don't know how big the
effect of these flags is for other software; but certainly CPython
tends to do plenty of dynamic linking, as that's how it loads
extension modules, which are ubiquitous in the stdlib as well as
popular third-party libraries. So perhaps that helps explain why
optimizing the dynamic linker has such an impact.
Without this flag, the configure script prints a warning at the end,
like this (reformatted):
If you want a release build with all stable optimizations active
(PGO, etc), please run ./configure --enable-optimizations
We're doing a build to distribute to people for day-to-day use,
doing things other than developing the Python interpreter. So
that's certainly a release build -- we're the target audience for
this recommendation.
---
And, trying it out, upstream isn't kidding! I ran the standard
benchmark suite that the CPython developers use for performance
work, "pyperformance". Following its usage instructions:
https://pyperformance.readthedocs.io/usage.html
I ran the whole suite, like so:
$ nix-shell -p ./result."$variant" --run '
cd $(mktemp -d); python -m venv venv; . venv/bin/activate
pip install pyperformance
pyperformance run -o ~/tmp/result.'"$variant"'.json
'
and then examined the results with commands like:
$ python -m pyperf compare_to --table -G \
~/tmp/result.{$before,$after}.json
Across all the benchmarks in the suite, the median speedup was 16%.
(Meaning 1.16x faster; 14% less time).
The middle half of them ranged from a 13% to a 22% speedup.
Each of the 60 benchmarks in the suite got faster, by speedups
ranging from 3% to 53%.
---
One reason this isn't just the default to begin with is that, until
recently, it made the build a lot slower. What it does is turn on
profile-guided optimization, which means first build for profiling,
then run some task to get a profile, then build again using the
profile. And, short of further customization, the task it would use
would be nearly the full test suite, which includes a lot of
expensive and slow tests, and can easily take half an hour to run.
Happily, in 2019 an upstream developer did the work to carefully
select a more appropriate set of tests to use for the profile:
https://github.com/python/cpython/commit/4e16a4a31https://bugs.python.org/issue36044
This suite takes just 2 minutes to run. And the resulting final
build is actually slightly faster than with the much longer suite,
at least as measured by those standard "pyperformance" benchmarks.
That work went into the 3.8 release, but the same list works great
if used on older releases too.
So, start passing that --enable-optimizations flag; and backport
that good-for-PGO set of tests, so that we use it on all releases.
The ./configure script prints a warning when passed this flag,
starting with 3.7:
configure: WARNING: unrecognized options: --with-threads
The reason is that there's no longer such a thing as a build
without threads.
Eliminate the warning, by only passing the flag on the older releases
that accept it.
Upstream change and discussion:
https://github.com/python/cpython/commit/a6a4dc816https://bugs.python.org/issue31370
This reverts commit 5e8545e72341887bb371407a71a723bc0e9c7844.
It breaks eval:
attribute 'rev' missing, at /var/lib/ofborg/checkout/repo/38dca4e3aa6bca43ea96d2fcc04e8229/mr-est/eval-0-gleber.ewr1.nix.ci/pkgs/top-level/make-tarball.nix:106:39
New features
core: add variable "old_full_name" in buffer, set during buffer renaming (issue weechat/weechat#1428)
core: add debug option "-d" in command /eval (issue weechat/weechat#1434)
api: add functions crypto_hash and crypto_hash_pbkdf2
api: add info "auto_connect" (issue weechat/weechat#1453)
api: add info "weechat_headless" (issue weechat/weechat#1433)
buflist: add pointer "window" in bar item evaluation
irc: add support of fake servers (no I/O, for testing purposes)
relay: accept hash of password in init command of weechat protocol with option "password_hash" (PBKDF2, SHA256, SHA512)
relay: reject client with weechat protocol if password or totp is received in init command but not set in WeeChat (issue weechat/weechat#1435)
Bug fixes
core: fix memory leak in completion
core: flush stdout/stderr before forking in hook_process function (issue weechat/weechat#1441)
core: fix evaluation of condition with nested "if" (issue weechat/weechat#1434)
irc: split AUTHENTICATE message in 400-byte chunks (issue weechat/weechat#1459)
irc: copy temporary server flag in command /server copy
irc: add nick changes in the hotlist (except self nick change)
irc: case-insensitive comparison on incoming CTCP command, force upper case on CTCP replies (issue weechat/weechat#1439)
irc: fix memory leak when the channel topic is changed
logger: fix crash when logging is disabled on a buffer and the log file was deleted in the meanwhile, when option logger.file.info_lines is on (issue weechat/weechat#1444)
php: fix crash when loading script with PHP 7.4 (issue weechat/weechat#1452)
relay: update buffers synchronization when buffers are renamed (issue weechat/weechat#1428)
script: fix memory leak in read of script repository file if it has invalid content
script: fix unexpected display of scripts list in buffer with command /script list -i
xfer: send signal "xfer_ended" after the received file has been renamed (issue weechat/weechat#1438)
Tests
scripts: fix generation of test scripts with Python 3.8
unit: add tests on IRC protocol functions and callbacks
unit: add tests on function secure_derive_key
unit: add tests on functions util_get_time_diff and util_file_get_content
Build
core: fix Cygwin build
guile: add detection of Guile 3.0.0 (issue weechat/weechat#1442)
irc: fix build with GnuTLS < 3.1.0 (issue weechat/weechat#1431)
php: add detection of PHP 7.4
ruby: add detection of Ruby 2.7 (issue weechat/weechat#1455)