This could be what I've been looking for. I got a response within an hour of emailing, and they said that as long as all participants had solid/fast Internet speed, it is possible to have a producer, artist, engineer and 4-6 musicians, all in their own spaces, TRACKING!!!
"What's the point of living longer if you have to give up everything that makes life worth living to do so?" Einstein (supposedly)
Yes. Even assuming a hypothetically perfect connection--e.g., gigabit speed, no packet loss, tiny buffers at both endpoints and every switch in between--there is the matter of travel time from end to end. Again under perfect theoretical conditions (no latency introduced by buffers and switches, and a physical connection moving data at the speed of light), a packet would take about 13 ms to cross the country. Figure in the round trip for real-time collaboration, and I'm hearing your response to my playing no less than 26 ms later than I would were we next to one another.* Real-world factors like buffers and switches add considerably to that figure. As a starting point for understanding those real-world conditions, most popular benchmarking tools (Speedtest et al.) report a ping result in milliseconds, which provides a baseline for round-trip data travel time from your computer to the Internet. Ping runs pretty close to the metal compared to TCP packets, and the additional buffering introduced at the music application level will add more time still.
(*This comparison handwaves a bit. Even in the same room on acoustic instruments there is latency introduced by the time it takes sound to travel from your instrument to my ears, which is about 1 foot per millisecond.)
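If you want to sanity-check those 13/26 ms figures yourself, the back-of-the-envelope math is simple. This little sketch (the ~4,000 km coast-to-coast distance is my assumption, not from the post above) computes the physical lower bound, ignoring all real-world buffering:

```python
# Back-of-the-envelope latency floor: signal propagation at the speed of
# light over a rough US coast-to-coast distance. Real networks are slower
# (fiber propagation is ~2/3 c, plus switching and buffering).
SPEED_OF_LIGHT_KM_S = 299_792  # km/s in a vacuum
DISTANCE_KM = 4_000            # assumed coast-to-coast path length

one_way_ms = DISTANCE_KM / SPEED_OF_LIGHT_KM_S * 1000
round_trip_ms = 2 * one_way_ms

print(f"one-way:    {one_way_ms:.1f} ms")    # ~13.3 ms
print(f"round trip: {round_trip_ms:.1f} ms") # ~26.7 ms
```

Note that light in optical fiber travels at roughly two-thirds of this speed, so even the "perfect fiber" floor is higher than the vacuum figure.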
A few years ago I worked on a project where we tested how various latencies impacted musicians' experience playing live together. It wasn't all that scientific, but for me it shed some light on the point at which the experience starts to feel compromised. We chose music where we thought timing was likely to matter most (e.g., keyboardist and bass player vamping on a funk pattern more so than, say, a rubato chamber music passage). We isolated the two musicians, gave them headphone feeds of one another, and used hardware delays (TC Electronic stompboxes) with millisecond precision to introduce latency. We randomly chose different values between 0 and 50 ms in each direction (rather than, say, walking up from 0 ms), and after 8 bars or so asked the musicians to evaluate the experience. Full disclosure: in several of the tests I was one of the test subjects.
While this subjective approach and a limited number of test subjects obviously wouldn't lead to any airtight conclusions, we found we were happiest under 10 ms, and found the latency annoying above 20 ms or so. "Annoying" is the operative word here. Much like muscling through rough stage situations with a bad monitor mix, with concentration we felt like we were able to stay in the pocket at higher latencies. Even 25 ms wasn't a deal-killer for recording a serviceable groove once we got used to the lag, but one has to wonder whether this would bring out the best in musicians. Speaking for myself, I felt much happier and less inhibited when the latency was 10 ms or less, but I think in a pinch it's certainly possible to make music at higher latencies. Maybe just not as much fun.
I pitched this to my original band mostly for mixdown.
We all live in wildly different places geographically, ranging from Godfrey, IL to Washington, MO (Google Map it--roughly 80 miles), but meet centrally for rehearsal and writing. When it comes down to finalizing tracks, mixdown, etc., we've gotten pretty good at file sharing and working individually. I regularly put the latest rough mix on a track, write my own parts--either new ones or ones intended to replace what's in the mix--then isolate, export, and send them off to be imported into the official project. We're at a stage now where we can't get together, and everybody has opinions on the 99.9%-done mixes of our first couple tracks. It was suggested we just all get together for a mix session rather than keep going back and forth, but something like this would be perfect.
While we get together as needed throughout the process, often it's the singer and bass player meeting up for mixing/editing. Even just for that, the two of them could probably benefit from this.
Acoustic/Electric stringed instruments ranging from 4 to 230 strings, hammered, picked, fingered, slapped, and plucked. Analog and Digital Electronic instruments, reeds, and throat/mouth.
Yes, that's very interesting! Is that published anywhere?
Originally Posted by Irena
Ping runs pretty close to the metal compared to TCP packets, and the additional buffering introduced at the music application level will add more time still.
I thought real-time audio/video protocols were all built on top of UDP. And we know that audio interfaces can manage pretty low latencies. At least for, say, people in the same metro area, 20ms seems like it could be doable if everything was carefully set up?
Come to think of it, you're right, bfields. Thanks for clarifying that. Right--UDP is a lean-and-mean, connectionless transport layer which leaves any needed reliability logic to whatever sits on top of it. I think one general takeaway, though, is that any "reliable" transport feature set--whether it's general-purpose TCP or the sequencing and jitter handling layered on top of UDP by something like RTP--is likely to introduce additional latency beyond what a benchmarking tool like ping measures with its lightweight ICMP echoes. Avoiding starved buffers, sequencing arriving datagrams, etc. all end up costing a little bit of time.
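To make that sequencing cost concrete, here's a toy sketch (my own illustration, not any real audio stack): a receiver that only releases datagrams in strict sequence order. When packet 2 arrives before packet 1, it has to sit in the buffer until the gap fills--that waiting is exactly where a "reliable" layer spends extra milliseconds compared to a fire-and-forget ping:

```python
# Toy in-order delivery buffer. Out-of-order datagrams are held back
# until every earlier sequence number has arrived -- the source of the
# extra latency that reliability/sequencing layers add on top of UDP.
class SequencingBuffer:
    def __init__(self):
        self.next_seq = 0
        self.pending = {}  # seq -> payload, held until deliverable

    def receive(self, seq, payload):
        """Accept one datagram; return all payloads now deliverable in order."""
        self.pending[seq] = payload
        out = []
        while self.next_seq in self.pending:
            out.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return out

buf = SequencingBuffer()
print(buf.receive(1, "B"))  # [] -- packet 0 hasn't arrived, so B waits
print(buf.receive(0, "A"))  # ['A', 'B'] -- both released once the gap fills
```

A real-time audio protocol gets to make a different trade-off: rather than wait indefinitely, it can give up on a late packet and play silence or interpolated audio instead, capping latency at the cost of fidelity.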
Maybe 20 ms is possible if people are geographically close, though I can't say I'm optimistic based on my own experience. For the project I was working on, building an audio stack with the lowest latency possible was in fact the primary objective. In that case the two principal audio engineers chose UDP and spent several months on a bespoke ring-buffer design (NB: I am at the outer envelope of my technical understanding here!) to see how quickly they could move 16-bit uncompressed audio from end to end. The stack was tested and refined over a period of years, and used with participants (spoken conversations more so than making music) geographically close to one another, halfway around the world, and many scenarios in between. I can't say the engineers did everything that could possibly be done to reduce latency, but I will say they were specialists in this area and that they plugged away at it for a long stretch.
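For anyone curious what a ring buffer even is in this context: it's a fixed-size circular queue that lets an audio thread write and read samples in constant time with no memory allocation in the hot path. The sketch below is my own generic illustration of the idea (the project's actual design wasn't published, so none of this is their code):

```python
# Minimal fixed-size ring (circular) buffer sketch for audio samples.
# Writes and reads are O(1) and allocation-free once constructed, which
# is why real-time audio code favors this structure.
class RingBuffer:
    def __init__(self, capacity):
        self.data = [0] * capacity
        self.capacity = capacity
        self.write_pos = 0
        self.read_pos = 0
        self.count = 0

    def write(self, sample):
        if self.count == self.capacity:
            return False  # full: caller must drop or overwrite -- a design choice
        self.data[self.write_pos] = sample
        self.write_pos = (self.write_pos + 1) % self.capacity
        self.count += 1
        return True

    def read(self):
        if self.count == 0:
            return None  # starved buffer: the audible-dropout case
        sample = self.data[self.read_pos]
        self.read_pos = (self.read_pos + 1) % self.capacity
        self.count -= 1
        return sample

rb = RingBuffer(4)
for s in (10, 20, 30):
    rb.write(s)
print(rb.read(), rb.read())  # 10 20
```

The tuning problem the engineers faced lives in that capacity number: a bigger buffer tolerates more network jitter before starving, but every queued sample is added latency.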
Whenever we would test that protocol, we stacked the deck as much as possible in favor of speed (e.g., no Bluetooth headsets, and even Wi-Fi was verboten; all participants were required to be hardwired), and I can recall instances where we were excited to see end-to-end latency drop under 40 ms without noticeable dropout from starved buffers. That's still an eternity for a musician, but compared to the 200-400 ms latency of a cell phone call, it's a magical experience for conversation. In those <40 ms cases, the participants were in the same metro area and showed <15 ms ping times on both ends.
It's possible that this protocol stack left a few milliseconds on the table that some other smart engineers can address, but to me it still seems like a stretch to get things fast enough to make real-time playing with other musicians fun and natural. But back to the spirit of this thread: under these unprecedented circumstances I can easily imagine putting up with longer latency times if rehearsals are important, and it's great to see the industry stepping up its efforts right now to give us viable options.