From 480c8d199166b2f8cd20e6e245d8a019329ec466 Mon Sep 17 00:00:00 2001
From: Greg Price
Date: Sun, 29 Mar 2020 15:46:12 -0700
Subject: [PATCH] cpython: Optimize dynamic symbol tables, for a 6% speedup.

I took a close look at how Debian builds the Python interpreter,
because I noticed it ran substantially faster than the one in nixpkgs
and I was curious why.

One thing that I found made a material difference in performance was
this pair of linker flags (passed to the compiler):

    -Wl,-O1 -Wl,-Bsymbolic-functions

In other words, effectively the linker gets passed the flags:

    -O1 -Bsymbolic-functions

Doing the same thing in nixpkgs turns out to make the interpreter
run about 6% faster, which is quite a big win for such an easy
change.  So, let's apply it.

---

I had not known there was a `-O1` flag for the *linker*!
But indeed there is.

These flags are unrelated to "link-time optimization" (LTO), despite
the latter's name.  LTO means doing classic compiler optimizations
on the actual code, at the linking step when it becomes possible to
do them with cross-object-file information.  These two flags, by
contrast, cause the linker to make certain optimizations within the
scope of its job as the linker.

Documentation is here, though sparse:
  https://sourceware.org/binutils/docs-2.31/ld/Options.html

The meaning of -O1 was explained in more detail in this LWN article:
  https://lwn.net/Articles/192624/
Apparently it makes the resulting symbol table use a bigger hash
table, so the load factor is smaller and lookups are faster.  Cool.

As for -Bsymbolic-functions, the documentation indicates that it's a
way of saving lookups through the symbol table entirely.
There can apparently be situations where it changes the behavior of a
program, specifically if the program relies on linker tricks to
provide customization features:
  https://bugs.launchpad.net/ubuntu/+source/xfe/+bug/644645
  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=637184#35
But I'm pretty sure CPython doesn't permit that kind of trick: you
don't load a shared object that tries to redefine some symbol found
in the interpreter core.

The stronger reason I'm confident using -Bsymbolic-functions is
safe, though, is empirical.  Both Debian and Ubuntu have been
shipping a Python built this way since forever -- it was introduced
for Python 2.4 and 2.5 in Ubuntu "hardy" and Debian "lenny",
released in 2008 and 2009.  In those 12 years they haven't seen a
need to drop this flag; and I've been unable to locate any reports
of trouble related to it, either on the Web in general or on the
Debian bug tracker.  (There are reports of a handful of other
programs breaking with it, but not Python/CPython.)  So that seems
like about as thorough testing as one could hope for.

---

As for the performance impact: I ran CPython upstream's preferred
benchmark suite, "pyperformance", in the same way as described in
the previous commit.  On top of that commit's change, the results
across the 60 benchmarks in the suite are:

The median is 6% faster.

The middle half (aka interquartile range) is from 4% to 8% faster.

Out of 60 benchmarks, 3 come out slower, by 1-4%.  At the other end,
5 are at least 10% faster, and one is 17% faster.

So, that's quite a material speedup!  I don't know how big the
effect of these flags is for other software; but certainly CPython
tends to do plenty of dynamic linking, as that's how it loads
extension modules, which are ubiquitous in the stdlib as well as
popular third-party libraries.  So perhaps that helps explain why
optimizing the dynamic linker has such an impact.
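
To illustrate the -O1 point made earlier in this message (a bigger
hash table means a smaller load factor and faster lookups), here is a
toy model, not a description of what ld actually does.  The symbol
names, the two bucket counts, and the use of `zlib.crc32` as the hash
function are all assumptions chosen for illustration; real ELF symbol
tables use their own hash functions and sizing rules.

```python
import zlib


def average_probes(symbols, bucket_count):
    """Average comparisons for a successful lookup in a chained hash table."""
    chain_lengths = [0] * bucket_count
    total_comparisons = 0
    for name in symbols:
        # crc32 is only a stand-in hash (assumption), picked because it is
        # deterministic across runs, unlike Python's built-in str hash.
        index = zlib.crc32(name.encode()) % bucket_count
        chain_lengths[index] += 1
        # A symbol stored at position k of its chain takes k comparisons
        # to find when the chain is walked from the front.
        total_comparisons += chain_lengths[index]
    return total_comparisons / len(symbols)


# Hypothetical symbol names; a real interpreter exports different ones.
symbols = [f"PyExample_Symbol{i}" for i in range(2000)]

small = average_probes(symbols, 251)    # few buckets: load factor near 8
large = average_probes(symbols, 4093)   # many buckets: load factor near 0.5

print(f"251 buckets:  {small:.2f} comparisons per lookup")
print(f"4093 buckets: {large:.2f} comparisons per lookup")
```

The table with more buckets has shorter chains, so each lookup walks
fewer entries; that is the general effect the LWN article attributes
to -O1, shown here only on synthetic data.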
---
 .../python/cpython/2.7/default.nix            |  7 ++++++
 .../interpreters/python/cpython/default.nix   | 25 +++++++++++++++++++
 2 files changed, 32 insertions(+)

diff --git a/pkgs/development/interpreters/python/cpython/2.7/default.nix b/pkgs/development/interpreters/python/cpython/2.7/default.nix
index e9d6c97cfd4..ca4fae51269 100644
--- a/pkgs/development/interpreters/python/cpython/2.7/default.nix
+++ b/pkgs/development/interpreters/python/cpython/2.7/default.nix
@@ -100,6 +100,13 @@ let
       # libuuid, slowing down program startup a lot).
       ./no-ldconfig.patch
 
+      # Optimize symbol tables for the sake of dynamic linking.
+      # Significant for Python because of extension modules.
+      (fetchpatch {
+        url = "https://salsa.debian.org/cpython-team/python3/-/raw/27103a32e/debian/patches/link-opt.diff";
+        sha256 = "0vp36276ndbrwr7882vg7vjd61c8mv7bqgal6bbh2fimp6zlkdhv";
+      })
+
     ] ++ optionals stdenv.hostPlatform.isCygwin [
       ./2.5.2-ctypes-util-find_library.patch
       ./2.5.2-tkinter-x11.patch
diff --git a/pkgs/development/interpreters/python/cpython/default.nix b/pkgs/development/interpreters/python/cpython/default.nix
index b860e357b04..dc3997481be 100644
--- a/pkgs/development/interpreters/python/cpython/default.nix
+++ b/pkgs/development/interpreters/python/cpython/default.nix
@@ -97,6 +97,31 @@ in with passthru; stdenv.mkDerivation {
     # (since it will do a futile invocation of gcc (!) to find
     # libuuid, slowing down program startup a lot).
     (./. + "/${sourceVersion.major}.${sourceVersion.minor}/no-ldconfig.patch")
+  ] ++ optionals stdenv.isLinux [
+    # Optimize symbol tables for the sake of dynamic linking.
+    # Significant for Python because of extension modules.
+    (
+      if pythonAtLeast "3.8" then
+        fetchpatch {
+          url = "https://salsa.debian.org/cpython-team/python3/-/raw/3.8.3rc1-1/debian/patches/link-opt.diff";
+          sha256 = "0va85318nahnqgydwjs7723h8gx41inbdawdy6v4hiykzgc8s7vs";
+        }
+      else if isPy37 then
+        fetchurl {
+          url = "https://salsa.debian.org/cpython-team/python3/-/raw/3.7.6-1/debian/patches/link-opt.diff";
+          sha256 = "1aqvsc0p3sxnfsi8jz7537wl6v95v26ba4nflwvmn5lxlc3y3g13";
+        }
+      else if isPy36 then
+        fetchpatch {
+          url = "https://salsa.debian.org/cpython-team/python3/-/raw/3.6.8-1/debian/patches/link-opt.diff";
+          sha256 = "1nhdrgla75ily9gk7xx0crxa7ynqzks0djxk36sa3lgg5w8vjvyr";
+        }
+      else
+        fetchpatch {
+          url = "https://salsa.debian.org/cpython-team/python3/-/raw/27103a32e/debian/patches/link-opt.diff";
+          sha256 = "0vp36276ndbrwr7882vg7vjd61c8mv7bqgal6bbh2fimp6zlkdhv";
+        }
+    )
   ] ++ optionals (isPy35 || isPy36) [
     # Determinism: Write null timestamps when compiling python files.
     ./3.5/force_bytecode_determinism.patch