Pitch Trace
While most of ARLO assumes you are working with audio spectra, some facilities are provided for working with pitch specifically. This page describes the various PitchTrace features and how to generate data with them.
PitchTrace Algorithms
MaxEnergy
The “Maximum Energy” PitchTrace is the simplest way of generating pitch data. Starting with the audio spectra (as one might see in a visualization), at each time step (a pixel on the horizontal axis) the frequency band with the most energy is taken as the pitch at that moment.
This extreme simplicity comes with a trade-off: the method can be unreliable in many situations, because background noise or stronger components of the main signal (such as harmonics) can outweigh the pitch of interest.
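As an illustration, the per-time-step selection can be sketched as below. This is a minimal sketch, not ARLO's implementation: it assumes a hypothetical data layout in which the spectra are a list of time frames, each a list of per-band energies, plus a parallel list of band frequencies. The function name `max_energy_trace` is also hypothetical.

```python
from typing import List, Sequence


def max_energy_trace(spectra: Sequence[Sequence[float]],
                     band_freqs: Sequence[float]) -> List[float]:
    """Return one pitch estimate (Hz) per time frame.

    Hypothetical data layout: spectra[t][b] is the energy of frequency
    band b at time frame t, and band_freqs[b] is that band's frequency.
    """
    trace = []
    for frame in spectra:
        # Pick the band with the most energy in this frame (first one on ties).
        best_band = max(range(len(frame)), key=lambda b: frame[b])
        trace.append(band_freqs[best_band])
    return trace


# Toy input: 3 bands, 2 time frames. In the second frame the 200 Hz
# component is stronger than the 100 Hz one, so MaxEnergy reports 200 Hz.
bands = [100.0, 200.0, 300.0]
frames = [[5.0, 2.0, 1.0],
          [3.0, 4.0, 1.0]]
print(max_energy_trace(frames, bands))  # [100.0, 200.0]
```

The second frame shows the weakness described above: a stronger component (here a harmonic at twice the frequency) is reported instead of the lower tone.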
Because this method works directly on the generated audio spectra, the usual spectrogram considerations apply. For example, a larger number of frequency bands gives finer frequency resolution, and dampingRatio can be adjusted to trade time accuracy against frequency accuracy. The only parameters are those of the spectra generation itself.
Parameter Name | Type / Range |
---|---|
Spectra Details | |
numFrequencyBands | Integer |
numTimeFramesPerSecond | Float |
dampingRatio | Float |
minFrequency | Float |
maxFrequency | Float |
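To make the resolution point concrete, the sketch below computes the spacing between adjacent bands and the duration of one time frame from the parameters in the table above. It ASSUMES the bands are spaced linearly between minFrequency and maxFrequency, which this page does not state, and it does not model dampingRatio at all. Both helper names are hypothetical.

```python
from typing import List


def band_center_frequencies(min_frequency: float, max_frequency: float,
                            num_frequency_bands: int) -> List[float]:
    """Hypothetical helper: band frequencies, ASSUMING linear spacing."""
    if num_frequency_bands < 2:
        return [min_frequency]
    step = (max_frequency - min_frequency) / (num_frequency_bands - 1)
    return [min_frequency + i * step for i in range(num_frequency_bands)]


def spacing_and_frame_duration(min_frequency: float, max_frequency: float,
                               num_frequency_bands: int,
                               num_time_frames_per_second: float):
    """Return (Hz between adjacent bands, seconds per time frame)."""
    bands = band_center_frequencies(min_frequency, max_frequency,
                                    num_frequency_bands)
    band_spacing = bands[1] - bands[0] if len(bands) > 1 else 0.0
    return band_spacing, 1.0 / num_time_frames_per_second


# More bands over the same range -> finer frequency spacing.
print(spacing_and_frame_duration(50.0, 5050.0, 101, 100.0))  # (50.0, 0.01)
print(spacing_and_frame_duration(50.0, 5050.0, 501, 100.0))  # (10.0, 0.01)
```

Under this assumption, a MaxEnergy trace can only report values on this band grid, so the band spacing bounds how precisely it can locate a pitch.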
Fundamental
The lowest frequency of a sound (97 Hz here), that is, the rate of vibration in cycles per second of a string (e.g. a guitar) or of the vocal folds (e.g. a human voice), is the fundamental frequency. Each successive line is a ‘partial’, or harmonic (Hn), of the 97 Hz fundamental tone ‘G’; f1 denotes the first “formant”. The harmonic progression of a sound is an arithmetic progression in which the difference between successive values (fundamental frequency, 2nd harmonic, 3rd harmonic, etc.) equals the first value in the progression (the fundamental frequency).
Here $F_0$ is the fundamental (the tone produced by the sound source, e.g. the vocal folds or an instrument) and $H_n$ is a harmonic in the progression: $H_n = F_0 + n F_0$.
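For the 97 Hz example, the progression can be computed directly from $H_n = F_0 + n F_0$. In this short sketch, starting at $n = 0$ is our own choice so that the list begins with the fundamental itself. The helper name `harmonic_series` is hypothetical.

```python
from typing import List


def harmonic_series(f0: float, count: int) -> List[float]:
    """Return H_n = F0 + n * F0 for n = 0 .. count - 1 (H_0 is F0 itself)."""
    return [f0 + n * f0 for n in range(count)]


series = harmonic_series(97.0, 5)
print(series)  # [97.0, 194.0, 291.0, 388.0, 485.0]
# Successive values differ by the fundamental, as in an arithmetic progression.
print([b - a for a, b in zip(series, series[1:])])  # [97.0, 97.0, 97.0, 97.0]
```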
See also https://sites.google.com/site/nehhipstas/documentation/fundamental-pitch-tracking
A common issue with the MaxEnergy approach, especially with voice signals, is that harmonics in the signal may be considerably stronger than the fundamental, so the band with the most energy belongs to a harmonic rather than to the fundamental. The Fundamental mode applies further processing in an attempt to identify the true fundamental frequency of the signal.
Fundamental pitch tracking works by analyzing the spectra created by ARLO. It generates and tests a set of candidate fundamental frequencies (the count is given by Number of Sample Points) between Pitch Trace Start Frequency and Pitch Trace End Frequency.
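The candidate set can be sketched as below. This page does not say how the candidates are spaced, so the sketch ASSUMES a linear grid between the start and end frequencies; a logarithmic grid would be an equally plausible alternative. The function name is hypothetical.

```python
from typing import List


def candidate_fundamentals(start_hz: float, end_hz: float,
                           num_sample_points: int) -> List[float]:
    """Candidate F0 values, ASSUMING linear spacing from start_hz to end_hz."""
    if num_sample_points == 1:
        return [start_hz]
    step = (end_hz - start_hz) / (num_sample_points - 1)
    return [start_hz + i * step for i in range(num_sample_points)]


# Defaults from the parameter table: 1000 points between 80 Hz and 2000 Hz.
candidates = candidate_fundamentals(80.0, 2000.0, 1000)
print(len(candidates), candidates[0], round(candidates[-1], 6))  # 1000 80.0 2000.0
```

With the default values, adjacent candidates in this linear grid are about 1.92 Hz apart.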
For each candidate frequency, ARLO determines which spectral frequency bands to consider. Frequency bands that are “close to” or “far from” the candidate's harmonics are used to measure match quality. A match template, consisting of one value per frequency band, is created with the values -1, 0, and +1. Template values of -1 mark frequencies that should not have energy in the spectrum, values of +1 mark frequencies that are close to a harmonic, and values of 0 mark frequencies that are neither close to nor far from a harmonic.
The fractional Tolerance parameter (range 0.0 to 0.25) defines how tight the match frequency tolerances are. A Tolerance of 0.25 means all spectral frequency bands are used.
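One way to build such a template is sketched below. The exact “close” and “far” rule is not given on this page, so the rule here is an ASSUMPTION chosen to be consistent with the description: a band's distance to the nearest harmonic is measured as a fraction of the candidate (0.0 to 0.5), bands within Tolerance are +1, bands at least 0.5 minus Tolerance away are -1, and the rest are 0. Under this rule a Tolerance of 0.25 leaves no band at 0, so every band is used. Treating bands below half the candidate as “far” is a further assumption, and `harmonic_template` is a hypothetical name.

```python
from typing import List, Sequence


def harmonic_template(f0: float, band_freqs: Sequence[float],
                      tolerance: float) -> List[int]:
    """Build a -1/0/+1 match template for candidate fundamental f0.

    ASSUMED rule (not confirmed as ARLO's): d is the band's distance to the
    nearest harmonic k * f0, as a fraction of f0, so 0.0 <= d <= 0.5.
      d <= tolerance        -> +1 (close to a harmonic)
      d >= 0.5 - tolerance  -> -1 (far from every harmonic)
      otherwise             ->  0 (neither)
    Bands whose nearest multiple is k = 0 (below f0 / 2) are marked -1.
    """
    template = []
    for f in band_freqs:
        ratio = f / f0
        k = round(ratio)  # index of the nearest harmonic
        if k < 1:
            template.append(-1)
            continue
        d = abs(ratio - k)
        if d <= tolerance:
            template.append(1)
        elif d >= 0.5 - tolerance:
            template.append(-1)
        else:
            template.append(0)
    return template


bands = [25.0, 50.0, 75.0, 100.0, 125.0, 150.0, 175.0, 200.0]
print(harmonic_template(100.0, bands, 0.15))  # [-1, -1, 0, 1, 0, -1, 0, 1]
print(harmonic_template(100.0, bands, 0.25))  # [-1, -1, 1, 1, 1, -1, 1, 1]
```

Lowering Tolerance shrinks both the +1 and -1 zones, leaving more bands at 0.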
Each candidate harmonic template is compared with the measured spectrum using correlation, which yields a value from -1.0 to 1.0.
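The page does not name the correlation measure. Pearson correlation is a natural fit because it ranges from -1.0 to 1.0, so the sketch below ASSUMES it. The function name and the sample spectra are hypothetical.

```python
import math
from typing import Sequence


def pearson_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (-1.0 .. 1.0)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # undefined for a constant input; treat as "no match"
    return cov / math.sqrt(var_x * var_y)


# Template for a 100 Hz candidate over bands 25, 50, ..., 200 Hz.
template = [-1, -1, 0, 1, 0, -1, 0, 1]
matching = [0.0, 0.0, 1.0, 9.0, 1.0, 0.0, 1.0, 8.0]    # energy at harmonics
mismatched = [9.0, 9.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0]  # energy off-harmonic
print(pearson_correlation(template, matching) > 0)    # True
print(pearson_correlation(template, mismatched) < 0)  # True
```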
Frequency detections occur when the correlation is greater than Minimum Correlation.
Detected frequencies can be filtered further by placing constraints on the overall volume (Minimum Energy Threshold) and on the general level of pattern in the signal (Entropy Threshold).
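The detection and filtering steps can be sketched as a single test per frame. Several details here are ASSUMPTIONS, not taken from ARLO: energy is the sum of band energies, “level of pattern” is measured as the Shannon entropy of the normalized spectrum (low entropy means energy concentrated in few bands), and a frame passes when its entropy is at or below Entropy Threshold. That last choice is consistent with the very large default effectively disabling the filter. The function names are hypothetical.

```python
import math
from typing import Sequence


def spectral_entropy(frame: Sequence[float]) -> float:
    """Shannon entropy (bits) of a frame's energies normalized to sum to 1.

    Low entropy = energy concentrated in few bands (strong pattern);
    high entropy = energy spread evenly (noise-like).
    """
    total = sum(frame)
    if total <= 0:
        return 0.0
    entropy = 0.0
    for energy in frame:
        if energy > 0:
            p = energy / total
            entropy -= p * math.log2(p)
    return entropy


def passes_filters(correlation: float, frame: Sequence[float],
                   min_correlation: float = 0.0,
                   min_energy: float = 0.0,
                   entropy_threshold: float = 999999.0) -> bool:
    """Hypothetical detection test; defaults mirror the parameter table."""
    return (correlation > min_correlation
            and sum(frame) >= min_energy
            and spectral_entropy(frame) <= entropy_threshold)


peaked = [8.0, 0.0, 0.0, 0.0]   # entropy 0 bits
flat = [1.0, 1.0, 1.0, 1.0]     # entropy 2 bits
print(passes_filters(0.6, peaked, entropy_threshold=1.0))  # True
print(passes_filters(0.6, flat, entropy_threshold=1.0))    # False
print(passes_filters(0.6, flat, min_energy=10.0))          # False (total energy 4)
```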
This detection algorithm tends to match higher frequencies, which results in octave shifts. To avoid this problem, a penalty function degrades the match quality based on frequency, with higher frequencies penalized more. The penalty weight is 1.0 / (CandidateFundamentalFrequency ^ Inverse Frequency Weight). When Inverse Frequency Weight is set to 0.0, the weight is 1.0 for every candidate, so there is no penalty for high-frequency matches.
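The sketch below applies the penalty weight from the formula above to pick among candidates. How ARLO combines the weight with the correlation is not spelled out here, so multiplying the two and taking the maximum is an ASSUMPTION, as are the function names.

```python
from typing import Dict


def frequency_penalty_weight(candidate_hz: float,
                             inverse_frequency_weight: float) -> float:
    """Penalty weight 1.0 / (candidate_hz ** inverse_frequency_weight)."""
    return 1.0 / (candidate_hz ** inverse_frequency_weight)


def best_candidate(correlations: Dict[float, float],
                   inverse_frequency_weight: float) -> float:
    """Hypothetical: pick the candidate whose weighted correlation is largest."""
    return max(correlations,
               key=lambda f: correlations[f]
               * frequency_penalty_weight(f, inverse_frequency_weight))


# A fundamental at 100 Hz whose octave (200 Hz) correlates slightly better.
scores = {100.0: 0.80, 200.0: 0.85}
print(best_candidate(scores, 0.0))  # 200.0 (no penalty: octave error)
print(best_candidate(scores, 0.5))  # 100.0 (penalty favors the lower candidate)
```

One caveat of this multiplicative form: a negative correlation multiplied by a weight below 1.0 moves toward zero, so in practice such candidates would typically be discarded first (for example by Minimum Correlation).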
Parameter Name | Type / Range | Default | Description |
---|---|---|---|
Spectra Details | |||
numFrequencyBands | Integer | ||
numTimeFramesPerSecond | Float | ||
dampingRatio | Float | ||
minFrequency | Float | ||
maxFrequency | Float | ||
Fundamental Details | |||
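Number of Sample Points | Integer | 1000 | Number of candidate fundamental frequencies tested between Pitch Trace Start Frequency and Pitch Trace End Frequency. |
Pitch Trace Start Frequency | Float | 80 | Lower bound of the frequency range considered when choosing the fundamental. Note that this range is separate from the spectra generation frequencies, because the algorithm takes into account signal components several harmonics above the fundamental. That is, this range will typically be much smaller than the spectra range. |
Pitch Trace End Frequency | Float | 2000 | Upper bound of the frequency range considered when choosing the fundamental. |
Tolerance | Float (0.0 to 0.25) | 0.15 | How tight the match frequency tolerances are; 0.25 uses all spectral frequency bands. |
Inverse Frequency Weight | Float | 0 | Exponent of the high-frequency penalty 1.0 / (CandidateFundamentalFrequency ^ Inverse Frequency Weight); 0 disables the penalty. |
Maximum Path Length Per Transition | Float | 4 | |
Window Size | Integer | 13 | |
Extend Range Factor | Float | 2 | |
Thresholds | |||
Minimum Energy Threshold | Float | 0 | Constraint on the overall volume of the signal. |
Minimum Correlation | Float (-1.0 to 1.0) | 0 | A frequency is detected only when its template correlation is greater than this value. |
Entropy Threshold | Float | 999999 | Constraint on the general level of pattern in the signal. |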