Fixing fine-grained power management for NVIDIA laptops running Wayland

2024-03-02

Are you tired of your laptop's dedicated GPU drawing too much power when idle? As of writing, the Arch wiki says: "Too bad, downgrade your GPU drivers", but I couldn't take that for an answer. SPOILER: it didn't work. Allow me to share a summary of my 4-day journey troubleshooting and researching this issue, so that you can finally have proper battery life.

Hardware

I have a Late 2020 model Razer Blade Stealth 13. It has an Intel Core i7-1165G7 and an NVIDIA GeForce GTX 1650 Ti. Not very high end, but it suits my needs, and runs Hyprland, my compositor of choice, without a struggle. However, if you are in the market for a Linux laptop, do not buy a Razer laptop. Why? There is no support for changing Secure Boot signing keys, BIOS upgrades are managed via Razer Synapse on Windows, which requires a registered Razer account to use, and there are many other reasons.

On the bright side, this machine's GPU supports PCI-Express Runtime D3. This means it can enter a low-power state known as "D3cold", where the GPU will fully power down until it is ready to be used again. We can easily check our GPU's current power state, but first we must determine the GPU's PCI bus ID with this command: lspci -nn | grep NVIDIA

Image of terminal displaying the output of the lspci command

Now we can check the power state with this command: cat /sys/bus/pci/devices/0000:<your_gpu_bus_id>/power_state
(Tab completion may escape the colons with backslashes; in most shells the path works either way.)

Image of terminal displaying the GPU currently in the D0 state

For me it returns D0, meaning the GPU is awake. If it returns D3cold, it is powered down. It most likely says D0 for you right now; otherwise, why are you reading this? Just kidding.
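The two manual steps above can be combined into one pipeline. What follows is a minimal sketch, not a definitive tool: the function names are my own, and SYSFS_PCI_ROOT is a hypothetical override that exists only so the function can be pointed at a fake directory for testing. It relies on lspci -D, which always prints the PCI domain, so the addresses match the directory names under /sys/bus/pci/devices.

```shell
# Extract the full PCI addresses (domain:bus:device.function) of NVIDIA
# devices from `lspci -D -nn` style output read on stdin.
nvidia_pci_addresses() {
    grep -i 'nvidia' | awk '{ print $1 }'
}

# For each PCI address read on stdin, print "<address> <power state>".
# SYSFS_PCI_ROOT is a hypothetical override used for testing only.
print_power_states() {
    local root="${SYSFS_PCI_ROOT:-/sys/bus/pci/devices}"
    local addr
    while read -r addr; do
        if [ -r "$root/$addr/power_state" ]; then
            printf '%s %s\n' "$addr" "$(cat "$root/$addr/power_state")"
        else
            printf '%s unknown\n' "$addr"
        fi
    done
}

# Usage on a real machine:
#   lspci -D -nn | nvidia_pci_addresses | print_power_states
```

Keep in mind that tools which query the GPU directly can wake it up, so a D0 reading right after poking at the card is not necessarily meaningful. Reading the sysfs file is a passive check.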

The problem

As noted previously, the Arch Linux wiki states that NVIDIA's 535xx drivers have a bug that prevents the dGPU from shutting down. However, I have experienced this bug on versions before 535xx.

The problem lies within NVIDIA's EGL library. Wayland compositors automatically load EGL libraries via ICD loader files listed in the __EGL_VENDOR_LIBRARY_FILENAMES environment variable. These ICD loader files point to the location of their corresponding shared object libraries. NVIDIA's EGL library is loaded regardless of whether the NVIDIA GPU is the primary GPU, which keeps the GPU in an active state and lets it continuously draw power (10-12W on my machine). On a typical distro with this hybrid graphics configuration, the effective value of this environment variable would be: /usr/share/glvnd/egl_vendor.d/50_mesa.json:/usr/share/glvnd/egl_vendor.d/10_nvidia.json

The solution

The funny thing is, we don't actually need NVIDIA's EGL library, so we can just exclude it from the environment variable. However, if you're a NixOS user like me, it's a bit different.
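On a distro with conventional paths, excluding the NVIDIA entry could look something like the sketch below. The helper name strip_nvidia_icds is my own, and where you export the result (a shell profile, a wrapper script that starts your compositor) is an assumption that depends on your setup. I have only tested the NixOS approach described next.

```shell
# Remove every entry that mentions "nvidia" from a colon-separated list of
# ICD loader files and print the remaining entries, still colon-separated.
strip_nvidia_icds() {
    printf '%s' "$1" | tr ':' '\n' | grep -vi 'nvidia' | paste -sd ':' -
}

# Usage, before the compositor is started:
#   icds='/usr/share/glvnd/egl_vendor.d/50_mesa.json:/usr/share/glvnd/egl_vendor.d/10_nvidia.json'
#   export __EGL_VENDOR_LIBRARY_FILENAMES="$(strip_nvidia_icds "$icds")"
```

The variable has to be set in the environment the compositor starts from. Exporting it in a terminal that is already running inside the session will not change what the compositor has loaded.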

On NixOS, the paths for both the loader files and the shared objects are different because they are contained in the Nix store, and as a result, they will change every time the package is upgraded. With the command find /nix/store -type f -name "50_mesa.json", we can find where 50_mesa.json is located.

Image of terminal displaying the locations of the ICD loader files

On my machine, this file exists in two different locations. If we list the contents of either of these files, we will see that the shared objects they point to are also located in the Nix store.

Image of my terminal displaying the contents of the ICD loader files
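If you would rather not read the JSON by eye, the shared object path can be pulled out of a loader file with a small helper. This is a rough sketch: the function name icd_library_path is my own, and it assumes the file uses the usual glvnd layout with a single "library_path" key on one line. It uses sed so that it works without jq installed.

```shell
# Print the value of the "library_path" key from an EGL ICD loader file.
# Assumes the key and its value sit on a single line, as in the stock files.
icd_library_path() {
    sed -n 's/.*"library_path"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}

# Usage: print the library path for every copy of 50_mesa.json in the store.
#   find /nix/store -type f -name "50_mesa.json" | while read -r f; do
#       printf '%s -> %s\n' "$f" "$(icd_library_path "$f")"
#   done
```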

So how do we set the environment variable declaratively? Make the following changes to /etc/nixos/configuration.nix:

environment.sessionVariables = {
  # List only Mesa's ICD loader file so that NVIDIA's EGL library is not loaded.
  "__EGL_VENDOR_LIBRARY_FILENAMES" = "${pkgs.mesa.drivers}/share/glvnd/egl_vendor.d/50_mesa.json";
};

Since the format of the Nix store path is <hash>-mesa-<version_number>-drivers, the package output we want here is drivers. Package outputs are better explained at this section of the NixOS manual. Once we've set this, we need to rebuild our configuration with sudo nixos-rebuild switch. If your system configuration is a flake, make sure to pass it with the --flake parameter. After the rebuild, reboot your system to boot the new generation with the changes.

Now let's check the environment variable. You will notice that the entry containing 10_nvidia.json is no longer included, meaning our changes were applied.

Image of terminal displaying the changes made to the environment variables
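A terminal only shows the variable as your shell sees it. To confirm that the compositor itself was started with the new value, you can read its environment from /proc. This is a minimal sketch: proc_env_var is a name I made up, PROC_ROOT is a hypothetical override used only for testing, and the usage line assumes the compositor's process is named Hyprland.

```shell
# Print the value of environment variable $2 as seen by process $1.
# /proc/<pid>/environ holds NUL-separated NAME=value pairs.
# PROC_ROOT is a hypothetical override used for testing only.
proc_env_var() {
    local pid="$1" name="$2" root="${PROC_ROOT:-/proc}"
    tr '\0' '\n' < "$root/$pid/environ" | sed -n "s/^$name=//p"
}

# Usage (assumes the process name is "Hyprland"):
#   proc_env_var "$(pgrep -x Hyprland | head -n 1)" __EGL_VENDOR_LIBRARY_FILENAMES
```

If the output still contains 10_nvidia.json, the compositor was probably started before the new generation took effect, which is why the reboot matters.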

Now let's check our GPU's power state again.

Image of terminal displaying the GPU currently in the D3cold state

We have achieved the coveted D3cold state. Now we can enjoy the benefits of hybrid graphics while maintaining a longer-lasting battery.