File set-numa-node.patch of Package irqbalance.13274
Author: Long Li <email@example.com>
Each NVMe device exposes information indicating which NUMA node it is attached to.
On Azure, the "numa_node" is set to 0 for all NVMe devices.
This is correct, as they are attached to the first NUMA node, but irqbalance uses this information to try to assign all IRQs for the device within that NUMA node. This is not correct, because the kernel has already assigned the IRQ affinity and hints.
I'm working on the following patch to fix irqbalance. It can't go upstream because it depends on the reverted commit to irqbalance. It does not affect devices for which affinity_hint is not set. Please take a look and comment on whether I'm on the right track. This patch needs more testing.
For MSI interrupts with an affinity hint, do not use the device NUMA node, as the kernel has already allocated their affinity mask and hint
@@ -669,6 +669,13 @@ static void build_one_dev_entry(const ch
new->type = IRQ_TYPE_MSIX;
+ /*
+ * for MSI interrupt with affinity hint set, do
+ * not use device's numa node, use affinity hint
+ * instead
+ */
+ if (!cpus_empty(new->affinity_hint))
+ new->numa_node = get_numa_node(-1);
} while (entry != NULL);
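
The rule added by the hunk above can be illustrated in isolation. This is a minimal sketch and not irqbalance code: `struct irq_sketch`, `effective_numa_node()` and `NODE_UNSPECIFIED` are hypothetical names, the CPU mask is simplified to a 64-bit bitmap, and it assumes that `get_numa_node(-1)` in the patch stands for "no specific NUMA node".

```c
#include <stdint.h>

/* Hypothetical marker for "no specific NUMA node" (assumed to mirror
 * what get_numa_node(-1) stands for in the patch). */
#define NODE_UNSPECIFIED (-1)

/* Hypothetical, simplified stand-in for irqbalance's per-IRQ data. */
struct irq_sketch {
	int device_numa_node;   /* value of the device's "numa_node" attribute */
	uint64_t affinity_hint; /* one bit per CPU; 0 means no hint is set */
};

/*
 * Pick the NUMA node used for placing the IRQ. When the kernel has
 * already published an affinity hint, ignore the device's node so the
 * hint can drive placement; otherwise keep the device's node.
 */
static int effective_numa_node(const struct irq_sketch *irq)
{
	if (irq->affinity_hint != 0)
		return NODE_UNSPECIFIED;
	return irq->device_numa_node;
}
```

With this sketch, an IRQ whose device reports node 0 and whose hint has any bit set resolves to `NODE_UNSPECIFIED`, while an IRQ without a hint keeps node 0, which matches the statement that devices without affinity_hint are unaffected.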