File set-numa-node.patch of Package irqbalance.13274

References: bsc#1119461
Author: Long Li <longli@microsoft.com>

Each NVMe device has information indicating which NUMA node it is attached to.

e.g. /sys/block/nvmeXn1/device/device/numa_node

On Azure, the "numa_node" is set to 0 for all NVMe devices.
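
For reference, a minimal sketch of reading that sysfs attribute from C. The helper name read_numa_node is hypothetical and is not part of irqbalance; the caller supplies the full path, since the nvmeXn1 component above is a placeholder.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read a NUMA node id from a sysfs-style file such as
 * /sys/block/nvmeXn1/device/device/numa_node (path supplied by the caller).
 * Returns 0 and stores the id in *node on success, -1 on any failure.
 * Hypothetical helper for illustration; not part of irqbalance.
 */
static int read_numa_node(const char *path, int *node)
{
	FILE *f;
	int value;
	int ok;

	assert(path != NULL && node != NULL);

	f = fopen(path, "r");
	if (f == NULL)
		return -1;

	/* The attribute holds a single decimal integer. */
	ok = (fscanf(f, "%d", &value) == 1);
	fclose(f);
	if (!ok)
		return -1;

	*node = value;
	return 0;
}
```

The status is returned separately from the node id so that a failed read cannot be mistaken for a negative id stored in the file.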

This is correct, as they are attached to the first NUMA node, but irqbalance uses this information to try to assign all IRQs for the device within that NUMA node. This is not correct because the kernel has already assigned the IRQ affinity and hints.

I'm working on the following patch to fix irqbalance. It can't go upstream because it depends on the reverted commit to irqbalance. It does not affect other devices where affinity_hint is not set. Please take a look and comment on whether I'm on the right track. This patch needs more testing.


Patch:

For an MSI interrupt with an affinity hint, do not use the device's NUMA node, as the kernel has already allocated its affinity mask and hint.
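
The rule above can be sketched in isolation as follows. This is a simplified model, not irqbalance code: struct irq_model, effective_numa_node and NODE_UNSPECIFIED are hypothetical names, the affinity hint is reduced to a 64-bit CPU mask, and -1 is assumed to mean "no specific node", mirroring the -1 the patch passes to get_numa_node().

```c
#include <stdint.h>

/* Assumed marker for "not tied to a specific NUMA node". */
#define NODE_UNSPECIFIED (-1)

/* Simplified stand-in for the per-IRQ data the patch touches. */
struct irq_model {
	int device_node;        /* node reported by the device's numa_node */
	uint64_t affinity_hint; /* bit i set: CPU i is in the kernel's hint */
};

/*
 * Pick the node used for balancing: when the kernel supplied a
 * non-empty affinity hint, ignore the device's node; otherwise keep it.
 */
static int effective_numa_node(const struct irq_model *irq)
{
	if (irq->affinity_hint != 0)
		return NODE_UNSPECIFIED;
	return irq->device_node;
}
```

With this model, an IRQ on node 0 with a hint of 0x4 resolves to NODE_UNSPECIFIED, while the same IRQ with an empty hint stays on node 0.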

--- a/classify.c
+++ b/classify.c
@@ -669,6 +669,13 @@ static void build_one_dev_entry(const ch
 				if (!new)
 					continue;
 				new->type = IRQ_TYPE_MSIX;
+				/*
+				 * for MSI interrupt with affinity hint set, do
+				 * not use device's numa node, use affinity hint
+				 * instead
+				 */
+				if (!cpus_empty(new->affinity_hint))
+					new->numa_node = get_numa_node(-1);
 			}
 		} while (entry != NULL);
 		closedir(msidir);