🛠️ Fixing a broken Fedora 38 to 40 upgrade

I recently upgraded my gaming computer from Fedora 38 to 40 and it didn't go as planned. Fixing the partially completed upgrade took a fair amount of pain, but luckily I was able to finish it. These are my notes on the process I used to rescue my system.

More regular programming will resume eventually. I have another Fedora-related bugbear to write about (related to Asahi Linux)... but that's for another day.

Disclaimer

This is a very specific set of steps that I took to fix my system. I can't guarantee that it will work for you, but it might be worth a shot if you're in a similar situation. Your mileage may vary. If you're in doubt, I would ask for help on the Fedora forums.

Waiting for an upgrade to complete

First, I booted the system and ran dnf upgrade to see if there were any packages that needed to be updated. There were (the machine hadn't been powered on since February), and I installed those and rebooted. So far, so good.

Next, I began the actual upgrade using the GNOME Software application, as recommended in the Fedora documentation. Everything downloaded successfully and I was able to reboot the machine. The upgrade started but did not progress much beyond 52% completion. I let the upgrade sit there for 45 minutes. After about 10 more minutes of waiting, I decided to power off the machine and reboot, since the upgrade appeared stuck with no output or progress.

Big mistake.

The reboot

After rebooting, the system was in a bad state. It booted with a Fedora 38 kernel, and I could not log into the desktop. When I switched to a shell with Ctrl+Alt+F2, the system claimed to be on Fedora Linux 40. I was able to log in as my user on the shell successfully, which told me that there was a chance to recover the system, or at the very least, back up my data before a reinstall.
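A quick way to see this kind of split is to compare the release tag in the running kernel's version string with the VERSION_ID that userspace reports. The following is only a minimal sketch: the helper names are made up for illustration, and the kernel string in the comment is just an example of Fedora's naming scheme, not the exact kernel from this machine.

```shell
# Extract the Fedora release number from a kernel release string,
# e.g. "6.8.9-100.fc38.x86_64" (illustrative) -> "38".
kernel_fedora_release() {
    printf '%s\n' "$1" | sed -n 's/.*\.fc\([0-9][0-9]*\)\..*/\1/p'
}

# Print VERSION_ID from an os-release style file (default: /etc/os-release).
# The file is sourced in a subshell so its variables do not leak out.
os_release_version() {
    ( . "${1:-/etc/os-release}" && printf '%s\n' "$VERSION_ID" )
}

# Example:
#   echo "kernel: $(kernel_fedora_release "$(uname -r)"), userspace: $(os_release_version)"
```

If the two numbers differ, the kernel and the rest of the system come from different releases, which is what I was looking at here.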

So we have a system in a partially upgraded state: some chunk of the system is from Fedora 38, some other chunk is from Fedora 40. Let's try to fix it.

Fixing the partially upgraded system

My first step was to try the obvious thing, and run dnf distro-sync --releasever=40. This command should sync the system to the Fedora 40 release. Unfortunately, this command failed with a deceptively simple error:

Traceback (most recent call last):
  File "/usr/bin/dnf", line 57, in <module>
    from dnf.cli import main
ModuleNotFoundError: No module named 'dnf'

Oh dear. Our package manager is broken. Are we screwed?

A Python scavenger hunt

I guessed that in the intervening period, the system Python version had been upgraded. (It turns out that Fedora 39 upgraded to Python 3.12, up from Python 3.11 in Fedora 38. But it was late into the night and looking over changelogs was the last thing I wanted to do.) I opened a Python shell and confirmed that the system Python version was indeed 3.12.

Next, I needed to make sure the dnf module still existed somewhere; otherwise I would have had to manually reinstall dnf via rpm before making any progress. While there's a more elegant, Pythonic way of doing it, I examined /usr/lib/python3.12/site-packages and confirmed that the dnf module was absent, then tried /usr/lib/python3.11/site-packages and discovered that the module existed there. I didn't know this at the time, but this means dnf had not yet been upgraded when I rebooted the system. From there, I made my first leap of faith: I patched the interpreter line of /usr/bin/dnf to invoke python3.11 instead of the system default Python.
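For anyone retracing these two steps, here is a rough sketch of how they could be scripted. The function names are hypothetical, the sed call assumes a sed that supports -i with a backup suffix (GNU sed does), and the second helper keeps a .bak copy so the change is easy to undo.

```shell
# Print every python3.X site-packages directory under a prefix
# (default: /usr/lib) that contains a "dnf" package directory.
find_dnf_module() {
    for dir in "${1:-/usr/lib}"/python3.*/site-packages/dnf; do
        [ -d "$dir" ] && printf '%s\n' "$dir"
    done
    return 0
}

# Rewrite the "#!" line of a script to point at a specific interpreter.
# The original file is kept next to it with a .bak suffix.
pin_interpreter() {
    script=$1
    python=$2
    sed -i.bak "1s|^#!.*|#!$python|" "$script"
}

# Example (as root):
#   find_dnf_module
#   pin_interpreter /usr/bin/dnf /usr/bin/python3.11
```

Once the upgrade is finished and dnf itself has been upgraded, the package should replace the patched file with a fresh one, but it is worth checking the first line of /usr/bin/dnf afterwards to be sure.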

Now we can run dnf again. This time, the command runs successfully. Our system is far from fixed, but we now have a working package manager, which we will need to fix the rest of the system.

Fooling the system into thinking it's Fedora 40

Now that dnf works, we can attempt to fix the system. This is where I found a really helpful forum post from a user experiencing an issue rather similar to mine, except I had the extra dnf hoop to jump through.

First, we need to reinstall the fedora-release package using the "correct" version. This package contains the release information for the system, and since what is installed is probably stale, fixing it is a prerequisite for everything that follows. We can do this with the following command:

dnf --releasever=40 reinstall fedora-release-\*

Now, dnf should believe that we are "supposed" to be on Fedora 40. I'm still using --releasever=40 to ensure that we're installing the Fedora 40 packages, but it should not be necessary once you run this command.

Note that you will need internet access for this to work. Luckily, the system was connected via Ethernet, so I didn't have to worry about configuring that myself, presumably as NetworkManager was working correctly. (If you are on Wi-Fi, you might have to configure the network manually.)

The strategy

The system has two kinds of problem packages: duplicates (Fedora 38 packages that were upgraded to Fedora 40 but whose old versions dnf never got to clean up), and packages that simply haven't been upgraded to Fedora 40 yet. Our goal is to remove all the duplicate Fedora 38 packages, finish the upgrade using dnf distro-sync, and then reboot into a hopefully-working Fedora 40 system.

Removing duplicate packages

First, let's remove all the duplicate packages. We can do this with the following command:

dnf remove --duplicates --releasever=40

In theory, this command should remove all the Fedora 38 packages and install the Fedora 40 equivalents. However, it didn't work for me, initially due to multi-arch conflicts. At this point, I wanted to prioritize safety over speed, so I opted to resolve each conflict by hand to make sure I knew what I was doing, instead of specifying any dnf options to go "nuclear" (like permitting broken dependencies).
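When working through conflicts by hand, it helps to see which name/architecture pairs are installed more than once. The sketch below is one way to get that list from rpm; the helper name is made up, and the filter for kernel packages and GPG keys is a rough assumption (several versions of those are normally installed side by side, so they are not real duplicates).

```shell
# Read "name.arch" lines on stdin and print each pair that occurs more
# than once. Lines starting with "kernel" or "gpg-pubkey" are skipped,
# since multiple versions of those are expected.
list_duplicates() {
    grep -v -e '^kernel' -e '^gpg-pubkey' | sort | uniq -d
}

# Example:
#   rpm -qa --queryformat '%{NAME}.%{ARCH}\n' | list_duplicates
```

Because the architecture is part of the key, a package installed once as x86_64 and once as i686 is not reported, which keeps legitimate multi-arch installs out of the list.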

After all these steps, I was able to run the dnf remove --duplicates --releasever=40 command successfully. This command removed all the duplicate Fedora 38 packages from the system, and installed the Fedora 40 equivalents.

The distro-sync and solving the mystery of my long upgrade

Now that we have removed all the duplicate packages, we can run the dnf distro-sync --releasever=40 command. This command will upgrade all the packages that have yet to be upgraded to Fedora 40. It did eventually run successfully for me.

But it was taking quite a while for the upgrade to complete. In particular, it was stuck on running a script for smartmontools-selinux. After a few minutes, I opened up another shell and ran htop.

It turned out that 100% of a CPU core was pegged, running restorecon. That meant the system was relabeling the entire filesystem. Luckily, patience was key, and after another 30 minutes, the relabeling process completed and the rest of the upgrade proceeded smoothly. I did wonder whether this was why the original upgrade had taken so long, since there was essentially no feedback for a lengthy period of time.

Once the command completed, I rebooted the system, hoping for a working system.

Almost there

The reboot was successful; however, the dracut initramfs showed that the system was still not fully upgraded, warning me that Fedora 38 was end-of-life. I had a fully working system at this point, but I still wanted to fix this issue.

I had to find the latest kernel-core version (ref) and reinstall it with dnf reinstall kernel-core. After this, I rebooted the system and saw that the new kernel was present and ready to be booted into.
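To check whether the running kernel is actually the newest one installed, something like the following sketch can help. The helper name is hypothetical, and it relies on GNU sort -V, which approximates RPM's version ordering but is not guaranteed to match it in every case.

```shell
# Read kernel release strings on stdin, one per line, and print the newest
# according to GNU sort's version ordering.
newest_kernel() {
    sort -V | tail -n 1
}

# Example, on an RPM-based system:
#   newest=$(rpm -q kernel-core --queryformat '%{VERSION}-%{RELEASE}.%{ARCH}\n' | newest_kernel)
#   [ "$(uname -r)" = "$newest" ] || echo "running $(uname -r), newest installed is $newest"
```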

My nightmare is over. I have a fully-upgraded Fedora 40 system.

So, what did I learn?