Conversion of Audio Samples to Video Frames

Brooks Harris
May 8, 1998

Converting between digital audio sampling rates and video frames requires calculations accurate enough to maintain the audio/video sync relationship. Computations of this nature are best performed on computers using integer math wherever possible. The ratios between the NTSC frame rate and the audio sampling frequencies present the most demanding calculation because of the intentionally odd relationships among the NTSC frequencies. PAL to audio sampling calculations can follow the form of the NTSC examples.

NTSC Frame Rate

The NTSC standard for color transmission was originally designed to be compatible with existing B&W equipment. In particular, a Color Subcarrier frequency needed to be selected which A) did not interfere with the 4.5 MHz audio subcarrier, B) would minimize visible artifacts from the color subcarrier, C) would result in vertical and horizontal scanning rates very near the existing B&W rates, and D) would conform to the constraints of existing transmission channels.

The original RS-170A publication showed the results of these calculations rounded to an approximation deemed appropriate for the tolerances achievable by calculators and hardware of that era. Here, the frequencies are shown to 15 points of precision.

In the formulas below, F_line is the Horizontal Scan (line) Frequency and F_sc is the Color Subcarrier frequency. The choices used to derive the NTSC frequencies are given by:

F_line = (4.5 x 10^6) / 286 = 15734.265734265734266 Hz    (4.5 x 10^6 = Audio Subcarrier)

F_sc = F_line x (13 x 7 x 5) / 2    (13 x 7 x 5 = Selected sub-multiples)

or:

F_sc = F_line x 455 / 2 = 3579545.454545454545455 Hz

The locked relationship between Color Subcarrier and Horizontal Scan Frequency is illustrated by:

F_line = (2 x F_sc) / 455

We are particularly concerned with the Vertical Frame Frequency, which is 1/2 the Vertical Field Frequency. This can be calculated by reducing the appropriate terms in the formulas above, demonstrating how to use integer math to perform these calculations.
From the formulas above we have:

Vertical Field Frequency = (4500000 x 2) / (286 x 525) Hz    (4500000 = Audio subcarrier Hz)

The 2 can be eliminated because we are interested in frames, not fields, and this can be reduced to:

Vertical Frame Frequency = 4500000 / (286 x 525) = 4500000 / 150150 = 30000 / 1001

Maximum audio samples and video frames

To evaluate the requirements for performing these calculations on a computer we may ask "What is the largest audio sample number we must handle?" Put another way, "What is the number of the last audio sample in 24 hours?"

The NTSC frame rate is 29.970029970029970 FPS (30000 / 1001), slightly slower than the nominal 30 FPS. When each frame is labeled incrementally, as with Non-drop Frame, the video frame labeled 24:00:00:00 (23:59:59:29 + 1) will occur later than a true 24 hours.

The NTSC Non-drop Frame labeling scheme labels every video frame to 23:59:59:29 + 1:

24 x 60 x 60 x 30 = 2,592,000 frames

The true elapsed seconds of 2,592,000 NTSC frames is:

2,592,000 x 1001 / 30000 = 86486.4 seconds

The corresponding sample counts are:

48K NTSC samples: 86486.4 x 48000 = 4,151,347,200
44.1K NTSC samples: 86486.4 x 44100 = 3,814,050,240
48K PAL samples (24 hours, 2,160,000 frames at 25 FPS): 86400 x 48000 = 4,147,200,000
44.1K PAL samples (24 hours): 86400 x 44100 = 3,810,240,000

These are the numbers we must handle. The largest of them (4,151,347,200 - 48K NTSC samples) will fit in a 32 bit unsigned long integer variable (ULONG_MAX = 4,294,967,295), so that type will accommodate the storage of these numbers. (Note 32 bits will NOT handle the storage of 96K sampling: 86486.4 x 96000 = 8,302,694,400.)

Summary - 2,592,000 NTSC video frames in Non-drop Frame "24:00:00:00"

48000 Sample Rate to NTSC frames

Deriving our formulas from basic principles, we begin with the integer values contained in the NTSC formulas and factor to reduce the equation to the simplest possible form. Beginning with the formulas above and introducing the terms for 48K sampling, we have:

(4500000 / (286 x 525)) x (1 / 48000)

This can be reduced to:

4500 / (150150 x 48) = 4500 / 7207200 = 45 / 72072 = 5 / 8008

This then represents the ratio between 48K sampling and the NTSC frame rate: 5 frames for every 8008 samples. As stated above, the last 48K sample in 24 hours of Non-drop Frame NTSC is 4,151,347,200. To convert this sample number to the NTSC video frame we have:

4151347200 x 5 / 8008 = 20,756,736,000 / 8008 = 2,592,000

The product of the first multiplication is a number greater than 32 bits.
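The overflow can be made concrete with a minimal sketch. It assumes a C99 compiler with <stdint.h>, a 48K to NTSC ratio of 5 frames per 8008 samples (48000 x 1001 / 30000 = 1601.6 samples per frame), and truncating division; the function names are hypothetical (C identifiers cannot begin with a digit), and the rounding policy is an assumption rather than something the article specifies.

```c
#include <assert.h>
#include <stdint.h>

/* NTSC frames per 48 kHz sample = 5 / 8008 (assumed from 30000/1001 FPS). */
#define NTSC_48K_NUM 5u
#define NTSC_48K_DEN 8008u

/* Hypothetical name: 48 kHz sample number -> NTSC frame number (truncating). */
uint32_t samples48k_to_ntsc_frames(uint32_t sample)
{
    /* Widen to 64 bits BEFORE multiplying so sample * 5 cannot wrap. */
    uint64_t product = (uint64_t)sample * NTSC_48K_NUM;
    return (uint32_t)(product / NTSC_48K_DEN);
}

/* Hypothetical name: NTSC frame number -> 48 kHz sample number (truncating). */
uint32_t ntsc_frames_to_samples48k(uint32_t frames)
{
    uint64_t product = (uint64_t)frames * NTSC_48K_DEN;
    uint64_t samples = product / NTSC_48K_NUM;
    assert(samples <= UINT32_MAX); /* result must still fit in 32 bits */
    return (uint32_t)samples;
}

/* For contrast only: the same calculation with a 32-bit intermediate. */
uint32_t samples48k_to_ntsc_frames_32bit(uint32_t sample)
{
    /* The product wraps modulo 2^32 for large sample numbers. */
    uint32_t product = (uint32_t)(sample * NTSC_48K_NUM);
    return product / NTSC_48K_DEN;
}
```

Under these assumptions, samples48k_to_ntsc_frames(4151347200u) yields 2,592,000, matching the 24-hour Non-drop Frame count, while the 32-bit variant returns a different, incorrect frame number because the product 20,756,736,000 wraps. Because both directions truncate, a round trip is exact only at whole multiples of 5 frames (8008 samples).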
This can be performed with integer math if a 64 bit integer type is available on the platform. The Microsoft C/C++ compiler supports __int64. On this platform the calculation can be written:

unsigned long 48KToNTSC(unsigned long 48K_Sample_Input)
unsigned long NTSCTo48K(unsigned long NTSC_Frames)

44100 Sample Rate to NTSC frames

Similarly for 44100 sampling, we have:

(1 / 44100) x (4500000 / (286 x 525)) = 45000 / 66216150 = 4500 / 6621615 = 100 / 147147

The last 44100 sample in 24 hours of Non-drop Frame NTSC is 3,814,050,240.

3814050240 x 100 / 147147 = 381,405,024,000 / 147147 = 2,592,000

In Microsoft C/C++:

unsigned long 441ToNTSC(unsigned long 441_Sample_Input)
unsigned long NTSCTo441(unsigned long NTSC_Frames)

48000 Sample Rate to PAL frames

The same computational approach can be applied to converting audio samples to PAL (25 FPS) frames. The ratio of 48000 sampling to PAL 25 FPS is:

25 / 48000 = 1 / 1920

From above, the number of 48000 samples in 24 hours is 4,147,200,000.

4,147,200,000 / 1920 = 2,160,000

This is a simpler and less demanding calculation than for NTSC and can be accomplished within 32 bit integers, because the sample number only needs to be divided, not multiplied first.

unsigned long 48KToPAL(unsigned long 48K_Sample_Input)
unsigned long PALTo48K(unsigned long PAL_Frames)

44100 Sample Rate to PAL frames

Similarly for 44100 sampling to PAL video frames:

25 / 44100 = 1 / 1764

The last 44100 sample in 24 hours is 3,810,240,000:

3,810,240,000 / 1764 = 2,160,000

unsigned long 441ToPAL(unsigned long 441_Sample_Input)
unsigned long PALTo441(unsigned long PAL_Frames)

References

1. K. B. Benson and J. Whitaker, Television Engineering Handbook, McGraw-Hill, 1992.

About the Author

Brooks Harris is President of Brooks Harris Film & Tape, Inc. (BHFT), a software development company in NYC specializing in edit data exchange. BHFT markets EDLMAX, an EDL and OMF management application, and provides consulting and custom implementations for OEM clients. Harris is a member of SMPTE and AES and a contributor to the SMPTE P18.27 Working Group on Editing Procedures and AES SC-06-01 Audio File Interchange. Harris is also a contributor to the EBU/SMPTE Task Force on the Harmonization of Data Interchange.
BHFT is an OMF Champion and an AAF Adopter.