Recording Virtual Choirs / Performances

JD

With most of the entertainment industry in shutdown, I have found myself doing a lot of video and audio recording for virtual groups and choirs. Although I have some background in video, most of my experience is in the live entertainment world. After a short search, I did not see a thread for what is becoming a (hopefully temporary) new area in our world. So, here is the thread. Maybe we can share tricks, such as how to keep everyone in sync, as well as other technical issues.
So far, my setup is very basic: a few cameras, a laptop, Pinnacle software, the usual microphones, and a lot of effort and time. Some tricks are simple (have everyone clap at the same point in time for sync). Others, such as dealing with people who vary their pace, are a bit more difficult.
Let's share!
 
I've jumped into video with both feet during COVID-19, for church services. The other day, someone hinted at doing a virtual choir. Normally, I'm fairly accommodating to their needs and wishes, but I responded that I would not take that project on. I'm sure it takes a considerable amount of time, and I'm already spending my entire Saturdays shooting and editing video.
 
Trying to get the editing time below 30 hours! I figure there are a lot of shortcuts I don't know about, but so far what I am doing is exporting the audio, trying to fix that in an audio editor while trying to do something with around 30 video tracks. I break it up into a number of sub-projects, but I don't see any good shortcuts as there is just a lot of manual labor.
 
There is no trick to editing; it is tedious and time-consuming. The only trick is that the more you do it, the faster you get with the mouse clicks.
 
I know of one thing that might help. I've been using Volume Normalizer Master, from A4Video.com. It adjusts the audio embedded in video files to the proper LUFS level (loudness units relative to full scale), using the EBU R128 standard. This is normalization of true, perceived loudness, not peak normalization. LUFS is the standard for broadcast audio now, especially since mechanical VU and PPM meters have largely disappeared in most studios. I use it as a batch converter to do a bunch of files before starting an editing project.

Some video editors do peak normalization. Peak normalization is nearly useless because two files with the same peak level can sound totally different in terms of loudness, especially when compression or peak limiting is applied. If there's video editing software that can normalize with LUFS, I'd love to know about it.

For doing a choir, I'd normalize all the files to -20 LUFS, and the singers should blend very well. VNM works fastest when converting the video file output to AVI. Some output file types don't seem to work at all. The software has some rough edges, but it's the best I've found for audio in video files, and the price is reasonable.
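If you'd rather script this yourself, ffmpeg's loudnorm filter implements the same EBU R128 normalization and can be batched before you start editing. A minimal Python sketch; the file names, folder layout, and true-peak/LRA values are just placeholders, not anything VNM specifically does:

```python
import subprocess
from pathlib import Path

TARGET_LUFS = -20  # integrated loudness target suggested above

def loudnorm_cmd(src: Path, dst: Path, target: int = TARGET_LUFS) -> list[str]:
    """Build an ffmpeg command that normalizes the embedded audio to `target`
    LUFS (EBU R128 via the loudnorm filter) and copies the video stream."""
    return [
        "ffmpeg", "-y", "-i", str(src),
        "-af", f"loudnorm=I={target}:TP=-1.5:LRA=11",
        "-c:v", "copy",
        str(dst),
    ]

def batch_normalize(folder: str) -> None:
    # Hypothetical layout: normalized copies land next to the originals.
    for src in Path(folder).glob("*.mp4"):
        subprocess.run(loudnorm_cmd(src, src.with_name(src.stem + "_norm.mp4")),
                       check=True)
```

Copying the video stream keeps the batch fast, since only the audio gets re-encoded.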
 

Late to the game on this thread, but I'll pop in to say you're right. I don't believe there is a platform that natively normalizes to LUFS, but there are loudness-metering plugins on a number of scales for just about any DAW. Adobe Audition has a native meter that's not terrible, and there are plenty available for Pro Tools. Your solution from A4Video is probably the best quick-and-dirty option short of doing it in the box during the mix. The biggest mistake I see with greenhorn engineers in the studio is confusing headphone volume, live monitoring volume, and output volume. Always keep a watchful eye on your output meter, and when mixing, switch monitoring sources between your input, your mix, and your output so you understand the differences. That's part of the mastering process; if one of those is out of balance, you'll be fighting uphill on a slippery slope trying to fix it.

I'm right there with you on the front lines trying to synchronize VoIP sessions for remote theater-style production, and without breaking the bank it's not easy... all the while my 4K studio and sound stage lies dormant and hibernating while keeping everyone socially distant...
 
I did about five of these in the last couple of months. Getting better at it, but each project is a grind.

Here are a few of my tips:

The singers should NOT record the music into their own track. They wear headphones and sing into another device. For my high school kids, that meant listening on a laptop with earbuds or headphones and singing into their phones. We also asked them to record in the bathroom to get a similar room sound. Get them away from windows!

Prepping the source music track is a very important part. Instead of hand clapping, have the accompanist tap out a measure in time on a hard surface (1, 2, 3, 4), pause for a measure, and then start playing. Have all the singers use that same click file to sing to. They should wear headphones so that the accompaniment audio is not in their singing file, EXCEPT for the clicks, which they can record by holding their headphone (most likely an earbud) up to the microphone (most likely their phone). This way you get a file that is MUCH easier to line up with the music track. The clicks are easier to see and find than just sliding the start of their singing to where it probably should start.
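If you want to automate finding those clicks, cross-correlating the count-off region of two takes will estimate their offset. A rough NumPy sketch, assuming mono float arrays at the same sample rate (the toy impulse signals are just for illustration):

```python
import numpy as np

def estimate_offset(reference: np.ndarray, take: np.ndarray) -> int:
    """Return how many samples `take` lags `reference` (negative = leads),
    by locating the peak of the full cross-correlation."""
    corr = np.correlate(take, reference, mode="full")
    # Re-center: index len(reference) - 1 corresponds to zero offset.
    return int(np.argmax(corr)) - (len(reference) - 1)

# Toy check: a click (impulse) delayed by 300 samples
ref = np.zeros(2000)
ref[100] = 1.0
late = np.zeros(2000)
late[400] = 1.0
print(estimate_offset(ref, late))  # 300
```

In practice you'd slice out just a second or two around the count-off before correlating; full-length files make this slow and the music itself can confuse the peak.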

We also tried adding a click track over the entire source music. Helped with timing but kids did not like it.

We also experimented with having the source music include the conductor as a video. We had quite a few problems with the singers getting off time, particularly after a music solo, and watching the conductor helped that.

We also tried having each section leader sing their part first, and had the rest of the section use that as the source audio. This worked pretty well, particularly for getting the students to sustain notes in unison.

I found that there were variations in time with many of the files. This may be compression, or just a variance in how the students were playing or recording the file. In any case, having the clicks helped register the files into the editing software.

It is still a grind though because then you still have to make it sound good.
 
We had quite a few problems with the singers getting off time, particularly after a music solo, and watching the conductor helped that.
Yes, that is what I have been doing. Start with a video master of the conductor and accompaniment on piano. That track then goes to the section leaders, who supply the SATB parts. Those then become submasters for the singers to follow. The biggest pain is singers who lag/lead/lag/lead as the song goes on! If they lag the whole time, it can be fixed by nudging the whole track, but this variance is a pain. The best I can do is take the audio track into an audio editor and see what I can do during the breaths! It sure is a learning experience! Working on #6 here.
 
Start with a video master of the conductor

Haven't made one, but the ones I've observed in my habitat had the most success when there was a conductor video they could watch as well as listen to. Non-verbal/aural cues are a big part of making music with other people.
 
We did a family one (singing is in our DNA, and one uncle decided we could take on Pentatonix's "Sound of Silence"). We found the trick was a complete reference track: we sourced a score, fed it into Sibelius, and then exported reference tracks from it, one per vocal line, with that line louder than the rest. It was easier to sing along with than a click, and it also meant the pitch was good.

My sister took part in a massive, massive one for ANZAC Day (similar to Memorial Day I think) with 134 participants. Apparently it took 36 hours to render.
 
Render time and previewing definitely gets messy with that many different source videos. One trick I stumbled on was to standardize all the individual videos first. I kind of arbitrarily decided everything would be cropped to a portrait 3:4 aspect ratio, so I did that, added the person's name, and rendered that as a new file. Then in the main project I could focus on the grid layouts & transitions, and so forth. I originally tried it just because it was an easier approach to make the titles work, but it ended up making a huge difference in rendering time. The project had... I think 15-17 singers and an initial draft was around 4hrs rendering. Pre-processing all the videos took 1-2 hours total, but the main project render time dropped down to a little over an hour. Dealing with all the different aspect ratios, resolutions, formats, codecs, and random other crap that everyone's phone happens to spit out adds a lot more overhead than I realized going into it.
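The standardizing pass itself is easy to script with ffmpeg. A sketch of building one such command, where the 3:4 crop, 720-pixel height, frame rate, and name styling are just the choices described above (drawtext's font rendering depends on how your ffmpeg was built):

```python
def standardize_cmd(src: str, dst: str, name: str, height: int = 720) -> list[str]:
    """Build an ffmpeg command that scales a clip to a common height,
    center-crops it to portrait 3:4, and burns in the singer's name."""
    width = height * 3 // 4  # 3:4 portrait, e.g. 540x720
    vf = (
        f"scale=-2:{height},"        # keep aspect ratio, even width
        f"crop={width}:{height},"    # crop defaults to the center
        f"drawtext=text='{name}':fontcolor=white:fontsize=24:"
        f"x=(w-text_w)/2:y=h-40"     # name near the bottom, centered
    )
    return ["ffmpeg", "-y", "-i", src, "-vf", vf, "-r", "30", dst]
```

Forcing a common frame rate at this stage also helps, since mixed phone footage rarely agrees on one.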

As far as process goes, we did something that combines some of the ideas already mentioned here. The choir teacher created a video that the students would play on their laptop or chromebook (with headphones) while recording with their phones. The video included the score and a small view of the conductor in one corner. It would start with a "One, two, three, *clap* " followed by a pause, and then the lead in to the music--which included a metronome, piano, and one person singing each voice part. We had one person pause their phone between the clap and the start of the music, but for the most part it was pretty easy to get everything lined up, and they mostly stayed in time with each other.

I used Vegas Pro for the video editing--which has some good and bad characteristics. I felt like the process of deciding, "Okay, I've got this many people, so I need to make them all this size and do x rows of y" was pretty clumsy. Instead I actually used Inkscape to just draw out some squares to get the size and spacing correct. There was a little math involved because Vegas counts from the center of everything and Inkscape uses the bottom left corner, but once I had that figured out it was pretty quick and easy to create position presets and assign the source videos to them.
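The corner-vs-center conversion is only a couple of lines, but the flipped y-axis is easy to fumble. A sketch, with made-up tile and frame sizes:

```python
def corner_to_center(x: float, y: float, w: float, h: float,
                     frame_h: float) -> tuple[float, float]:
    """Convert a box anchored at its bottom-left corner in a y-up space
    (Inkscape-style) to its center point in a y-down frame (Vegas-style)."""
    cx = x + w / 2
    cy = frame_h - (y + h / 2)  # flip the vertical axis
    return cx, cy

# A 480x640 tile with its bottom-left corner at (100, 100) in a 1080-high frame
print(corner_to_center(100, 100, 480, 640, 1080))  # (340.0, 660.0)
```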

The choir teacher did the audio editing separately while I worked on the video. It was easier for me to time-align everything in Vegas first, export WAV files to send to him from that, and then add the mixed version as a new stereo track when he was done. I can't speak much to his process other than that we both acknowledged one of the hardest parts is deciding what "good enough" means under the circumstances. It's a bunch of mediocre phone mics in terrible acoustical environments, so it's never going to sound as good as you'd like. At some point you have to accept what's realistic with what you have and not waste endless hours trying to make it just a little bit better.
 
it's never going to sound as good as you'd like. At some point you have to accept what's realistic with what you have and not waste endless hours trying to make it just a little bit better.

100% agree.
 
It was easier for me to time-align everything in Vegas first, export WAV files to send to him
It is much easier to export audio and do that in an audio editor when you have a lot of tracks, as long as you have time markers so it can be knitted together again. Audio files are a lot smaller so on any given processor they will be easier to handle.

One trick I have learned is to process the raw files and drop the resolution if they will not be used full-frame. Grab the audio first, then export them at the appropriate resolution. For example, a source file may be a 1.6 gig MOV, but trimmed and dropped to 720p it will save as a 250 meg mp4.
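Both steps can be done with stock ffmpeg. A sketch of the two commands; the codec and quality settings are just reasonable defaults, not what was used above:

```python
def extract_wav_cmd(src: str, wav: str) -> list[str]:
    """Pull the audio out as 16-bit PCM WAV for the audio editor."""
    return ["ffmpeg", "-y", "-i", src, "-vn", "-c:a", "pcm_s16le", wav]

def proxy_cmd(src: str, dst: str) -> list[str]:
    """Re-encode a trimmed clip down to 720p, dropping the audio since it is
    handled separately; the file size drops dramatically."""
    return ["ffmpeg", "-y", "-i", src, "-vf", "scale=-2:720",
            "-c:v", "libx264", "-crf", "23", "-an", dst]
```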

The second trick has to do with the grid. It's an old look, but let's face it, there need to be some sections where you are showing everyone. I do a two-step process. I break the screen into four sections, do four pre-renderings, then bring those together, render the full grid, and save it as a file. I then bring this in as one track in the background layer and add all the effects, inserts, overlays, whatever, into the final edit. The final rendering is pretty easy.
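If you'd rather pre-render the grid outside the editor entirely, ffmpeg's xstack filter can do the quadrant assembly. A sketch, assuming four equally sized quadrant files:

```python
def quad_cmd(quadrants: list[str], dst: str) -> list[str]:
    """Assemble four pre-rendered quadrant files into one 2x2 grid
    (ordered top-left, top-right, bottom-left, bottom-right)."""
    assert len(quadrants) == 4
    cmd = ["ffmpeg", "-y"]
    for f in quadrants:
        cmd += ["-i", f]
    # xstack layout: x_y positions expressed in terms of input 0's size
    layout = "0_0|w0_0|0_h0|w0_h0"
    cmd += ["-filter_complex", f"xstack=inputs=4:layout={layout}", dst]
    return cmd
```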
 
It is much easier to export audio and do that in an audio editor when you have a lot of tracks, as long as you have time markers so it can be knitted together again.

I think this probably ends up being a matter of preference (or possibly software limitations) as much as anything. One aspect is a slight shift in emphasis: it was easier for me to do it in Vegas instead of the choir teacher doing it. Also, the video has to be aligned either way, so if you're starting by doing both together, then there isn't the separate step of later making all the video match up. I was still using the audio tracks to do the actual aligning and the interface for that in Vegas is pretty similar to typical DAW software. That's one of the things I really like about it. The only minor difference is that it limits you to moving in 1-frame increments, and there were a couple cases where my internal perfectionist was a little annoyed by that.

But yes to the general approach of rendering it in pieces and assembling those pieces into the final project. That was the main thing I learned from this process; my assumption going in was that rendering the little pieces individually and then rendering the final project would take the same total amount of time as doing it all in one big render. It turns out that doing it all at once took significantly more processing time than the sum of doing the pieces individually.
 