I see a gulf between "traditional musicians" and AI specialists working on AI music. It seems to me that the AI types haven't got their heads round what makes music what it is. I'll give a simple example: Avid (for some reason) introduced AI chord recognition in a recent version of Sibelius. It looks 60 bars back for context. Why 60? 64 would make more musical sense, since phrase structure runs overwhelmingly in powers of two: 4-, 8-, 16-, 32-bar units.
What's needed to achieve what CyberGene suggested is appropriate convolution/feature engineering that's aligned to musical (say, jazz) features: phrases, ii-V-I progressions, tritone subs, bracketing/enclosures, voicings, choice of extensions, repetition a semitone away, etc. An AI should be able to learn that a dominant chord built on a degree other than V often takes a #11, but it has to be presented with data in a form that makes those features visible (a toy sketch of what I mean follows).
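To make that concrete, here's a minimal sketch in Python. The chord encoding, function names, and the demo progression are all my own invention for illustration; nothing here is anyone's actual pipeline:

```python
# Toy feature extraction over a chord-symbol sequence. Everything here,
# including the (root, quality) encoding, is invented for illustration.
from typing import List, Tuple

Chord = Tuple[int, str]  # (root pitch class 0-11, quality: "m7", "7", "maj7", ...)

def degree(root_pc: int, key_pc: int) -> int:
    """Chord root's distance above the tonic, in semitones (0-11)."""
    return (root_pc - key_pc) % 12

def find_two_five_ones(chords: List[Chord], key_pc: int) -> List[int]:
    """Indices where a ii-V-I starts: m7 on degree 2, dominant on degree 7,
    resolving to a tonic maj7."""
    hits = []
    for i in range(len(chords) - 2):
        (r1, q1), (r2, q2), (r3, q3) = chords[i], chords[i + 1], chords[i + 2]
        if (degree(r1, key_pc), q1) == (2, "m7") \
                and (degree(r2, key_pc), q2) == (7, "7") \
                and (degree(r3, key_pc), q3) == (0, "maj7"):
            hits.append(i)
    return hits

def non_v_dominants(chords: List[Chord], key_pc: int) -> List[int]:
    """Dominant chords whose root is NOT the fifth degree: exactly the ones
    that, in practice, often want the #11 (Lydian dominant) colour."""
    return [i for i, (r, q) in enumerate(chords)
            if q == "7" and degree(r, key_pc) != 7]

if __name__ == "__main__":
    # In C major: Dm7  G7  Cmaj7  Bb7  Cmaj7  (Bb7 is a non-V dominant)
    chords = [(2, "m7"), (7, "7"), (0, "maj7"), (10, "7"), (0, "maj7")]
    print(find_two_five_ones(chords, key_pc=0))  # -> [0]
    print(non_v_dominants(chords, key_pc=0))     # -> [3]
```

Feed an AI columns like these, where each chord row says "I'm the V of a ii-V-I" or "I'm a dominant on a non-V degree", and learning the #11 rule becomes a shallow statistical observation instead of something it has to dig out of raw notes.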
The next step after that is for the AI to learn these features itself (deep learning), given just the raw data in, say, MIDI format. This would be equivalent, in a very abstract way, to some of the convolutional "pre-processing" that's used in front of neural networks.
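Very roughly, and only to show the shape of the idea, here's a numpy-only sketch (again entirely my own toy, not any real system): slide a chord-shaped kernel along the pitch axis of a piano roll, and the same detector fires at every transposition, just as an image convolution finds the same edge at every position:

```python
# Toy demonstration of pitch-axis convolution over a piano roll.
import numpy as np

N_PITCH, N_FRAMES = 128, 8
roll = np.zeros((N_PITCH, N_FRAMES))

# Frames 0-3: G7 (G B D F = MIDI 55 59 62 65); frames 4-7: Bb7 (58 62 65 68).
for f in range(4):
    roll[[55, 59, 62, 65], f] = 1.0
for f in range(4, 8):
    roll[[58, 62, 65, 68], f] = 1.0

# A hand-built "dominant seventh" kernel: root, major 3rd, 5th, flat 7th.
kernel = np.zeros(11)
kernel[[0, 4, 7, 10]] = 1.0

# Convolve along the pitch axis only (valid mode), frame by frame.
response = np.array([
    [np.dot(roll[p:p + 11, f], kernel) for p in range(N_PITCH - 10)]
    for f in range(N_FRAMES)
])

# The response peaks at MIDI 55 for the first frames and 58 for the rest:
# one feature detector fires for both chords, regardless of transposition.
for f in (0, 4):
    print(f"frame {f}: strongest root = MIDI {int(response[f].argmax())}")
```

A deep net given raw piano rolls would have to discover kernels like that for itself; the point is that the architecture at least makes them expressible.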
The recent AI "Jazz" example had none of this "feature awareness". I would bet three digits of currency that the AI was trained on 100 ms snippets of audio and is just shuffling those around, like an LLM (ScatGPT?).
Cheers, Mike.