Just to make it clear why memory bandwidth is so important:
While AI / LLM token generation (tg) (the generation of text) depends on memory bandwidth, prompt processing (pp) (the processing of input text) depends on the GPU's compute (basically, the GPU's 3DMark score / FPS performance), which in turn also depends on memory bandwidth (as you can see in the quote above).
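To put a rough number on the token generation side, here is a minimal sketch. It assumes a dense model whose weights are all read from memory once per generated token, so memory bandwidth divided by model size gives a ceiling on tokens per second. That is a common rule of thumb, not a benchmark: it ignores KV cache reads, MoE models and any overhead, and the 40 GB model size is a made-up example.

```python
def max_tokens_per_second(bandwidth_gbs: float, model_size_gb: float) -> float:
    """Rough upper bound on token generation speed.

    Assumption: a dense model, so all weights (model_size_gb) are read
    from memory once per generated token. Real speeds will be lower.
    """
    return bandwidth_gbs / model_size_gb


if __name__ == "__main__":
    model_size_gb = 40  # hypothetical model size, for illustration only
    for label, bandwidth_gbs in [
        ("273 GB/s (LPDDR5X APU)", 273),
        ("1792 GB/s (512-bit GDDR7)", 1792),
    ]:
        ceiling = max_tokens_per_second(bandwidth_gbs, model_size_gb)
        print(f"{label}: at most ~{ceiling:.1f} tokens/s")
```

The point is only the ratio: with the same model, about 6.5 times the bandwidth gives about 6.5 times the ceiling.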
That doesn't mean that a GPU can't have a lot of memory: a workstation GPU like the RTX PRO 6000 (96 GB VRAM) has 3 times the VRAM of a 5090 (32 GB VRAM) on the same GB202 chip, but it also has 1.8 TB/s of memory bandwidth (thanks to its 512-bit memory bus and the much faster per pin GDDR7 VRAM chips).
If this N1X or Strix Halo had a bigger variant, using the same 512-bit memory bus width, its memory bandwidth would only be:
546 GB/s = 512-bit * 8533 MT/s / 8 / 1000.
(again, an RTX 5090, using the same 512-bit wide memory bus, has 1.8 TB/s)
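The calculation above can be sketched as a small helper. The function name is made up for illustration, and the 28000 MT/s figure for the 5090's GDDR7 is the 28 Gbps per pin data rate, which is not stated in this comment.

```python
def memory_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: float) -> float:
    """Peak theoretical memory bandwidth in GB/s.

    bus_width_bits / 8 gives bytes moved per transfer, times mega-transfers
    per second gives MB/s, and dividing by 1000 gives GB/s.
    """
    return bus_width_bits / 8 * transfer_rate_mts / 1000


if __name__ == "__main__":
    configs = [
        ("256-bit LPDDR5X-8533", 256, 8533),
        ("256-bit LPDDR5X-8000 (Strix Halo)", 256, 8000),
        ("hypothetical 512-bit LPDDR5X-8533", 512, 8533),
        ("512-bit GDDR7 at 28000 MT/s (RTX 5090)", 512, 28000),
    ]
    for label, bus_width_bits, transfer_rate_mts in configs:
        gbs = memory_bandwidth_gbs(bus_width_bits, transfer_rate_mts)
        print(f"{label}: {gbs:.0f} GB/s")
```

Even with the bus width doubled, LPDDR5X ends up at less than a third of what the same bus width delivers with GDDR7.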
This APU, on the other hand, uses LPDDR5X memory, so while it can have up to 128 GB of RAM (not VRAM, which uses GDDR6 or GDDR7), its system memory bandwidth is still rather (very) slow at 273 GB/s (256-bit * 8533 MT/s / 8 / 1000).
As such, depending on how big your input text is, 273 GB/s (or the roughly 4060 Laptop level of performance that this memory bandwidth allows for) may take a while to process that amount of text.
On reddit.com/r/localllama many users don't seem very happy with their Strix Halo's (= 256 GB/s memory bandwidth) prompt processing speed in e.g. agentic workloads.
What you may want to build for AI / LLMs instead is a simple desktop gaming PC with as fast a GPU and as much VRAM as you can afford. If AI / LLMs are a priority, look at the workstation cards (RTX PRO 4000, PRO 4500, PRO 5000 and the mentioned PRO 6000, tho they are expensive), because they have double or more the VRAM of the mainstream desktop gaming GPUs that use the same GPU chips (see the given example of a 5090 (32 GB VRAM) vs the PRO 6000 (96 GB VRAM), both using the same GB202 chip; see en.wikipedia.org/wiki/Blackwell_(microarchitecture)).