$$\DeclareMathOperator{\sinc}{sinc} \DeclareMathOperator{\sgn}{sgn}$$

This page explains some relatively quick methods for estimating characteristics (such as the frequency) of periodic components of a set of measurements.

## 1. Introduction

Fig. 1: Measurements Diagram
In astronomy many things have periodical components. Every celestial object with a closed (elliptical) orbit regularly returns to about the same position in its orbit. Some stars get periodically brighter and dimmer again. The magnetical activity of the Sun increases and decreases again. Suppose that you have a set of measurements of some characteristic of which you believe it may contains such periods. Take, for example, the measurements displayed in Figure 1. During 1024 seconds, one measurement was taken each second, and the results are shown in the diagram (with time along the horizontal axis and the measured values along the vertical axis). How can you find the periods that contribute to this series of measurements?

Fig. 2: Meetreeksgrafiek met proefreeks
One method is to guess at the period, and to compare the measurements to what they should be if your guess is correct. Figure 2 shows the same data again, but also some test data for a guessed harmonic component with a frequency of 10 (i.e., with exactly 10 full periods in the data set) and an amplitude (greatest deviation from the average) of 1. The test data is reasonably close, but not good enough. It is not tall enough, and it still runs slowly out of step with the measurements. It is a lot of work to keep adjusting your guess and try again.

## 2. Use a Fourier Spectrum

A fast method to get some insight is to determine the Fourier spectrum of the data. One can use an FFT algorithm for this, if the time between successive measurements is constant. (FFT stands for Fast Fourier Transform.) If a programming language has a function for this, then it is usually called something like POWER or FFT. The Fourier spectrum shows how much "energy" there is in the data for each whole frequency (i.e., for sines and cosines that exactly fit a whole number of times in the data). That energy is equal to the square of the amplitude. Mathematics tells us that it is always possible to decompose an arbitrary data set into harmonic components in this way. It is much better for the calculations if the data set has an average value equal to 0, so if this is not yet the case, then we calculate the average value and subtract it from the data set first.

Fig. 3: Frequency Spectrum
Figure 3 shows the frequency spectrum of our data set. The horizontal axis shows frequency $$f$$, which says how many times the harmonic component fits in the observations. The vertical axis shows the energy $$P$$, which is the square of the amplitude.

It is clear that the greatest energy is at a frequency of 10, but there is also activity at other frequencies. There is some sort of a peak and a valley near frequency 80, too. We saw before that a frequency of 10 does not fit well enough. A Fourier spectrum only looks at harmonic components that exactly fit the data set a whole number of times, and it would be quite a coincidence if the data set consists of just one harmonic component (a combination of a sine and a cosine with the same frequency) that just happens to exactly fit the data set a whole number of times, so even if there really is only a single harmonic component in the data set, chances are that a Fourier spectrum of that data set will show activity at every frequency.

The advantage of a Fourier spectrum is that it can be calculated relatively quickly. The disadvantages are that a harmonic component is only confined to a single frequency in a Fourier spectrum if that component happens to fit the data set exactly a whole number of times, and that different harmonic components usually distort each other in such a Fourier spectrum, like the peak near frequency 80 that looks rather ugly with a dip in it, because around that frequency there is also a significant contribution from the peak at frequency 10.

## 3. Simple Estimates

### 3.1. Simplest Estimate: Method 1

The simplest estimate of the frequency of a harmonic component from a Fourier spectrum is to locate all peaks in it (where there is greater energy than at neighboring frequencies) and to measure the frequency and amplitude (the square root of the energy) of those peaks.

For our data set, this yields

#
$${f_0}$$$${A}$$
1 10 0.915
2 32 0.0065
3 78 0.0024

so there is also a peak at frequency 32. This peak is far more difficult to see in the diagram than the peak at frequency 78, yet the estimate for the amplitude at frequency 32 is a lot higher than that at frequency 78. That is because of the "slope" from the peak at frequency 10.

### 3.2. Better Estimate: Method 2

If you calculate the Fourier spectrum of a single harmonic component with positive frequency $$f_0$$ that need not fit the data set a whole number of times (i.e., also if $$f_0$$ is not an integer number), then you find that the energy $$P_f$$ at frequency $$f$$ is equal to about

$$P_f = A^2 (\sinc^2_{π}(f - f_0) + \sinc^2_{π}(f + f_0) + 2\sinc_{π}(f - f_0)\sinc_{π}(f + f_0)\cos(2β))$$

where $$\sinc_{π}(x)$$ is a function equal to $$\frac{\sin(π x)}{π x}$$ if $$x$$ is unequal to 0, and 1 if $$x$$ is equal to 0, and $$β$$ is the phase of the harmonic component at the beginning of the data set. The sinc function has its greatest value when its argument is equal to 0, so the terms with $$f - f_0$$ in it are greatest near $$f = f_0$$ and the terms with $$f + f_0$$ in it are greatest near $$f = -f_0$$. So, there are two equally high peaks, symmetrically around frequency 0: one peak at positive frequency and one peak at negative frequency. The terms from the negative peak (the peak at negative frequency) are smaller at positive frequencies than the terms from the positive peak are. If we ignore the terms from the negative peak (with $$f + f_0$$ in it, which have little influence for $$f \gg 0$$), then the previous equation leads to

\begin{align} P_f & = A^2 \sinc^2_{π}(f - f_0) \\ P_{f+1} & = P_f \frac{f^2}{(f + 1)^2} \\ P_{f-1} & = P_f \frac{f^2}{(f - 1)^2} \end{align}

This no longer depends on $$β$$. If $$f_x$$ is the integer frequency at which the spectrum has its highest peak, and $$f_0 = f_x + k$$, then $$k$$ is between −1 and +1. With

$$q = \frac{P_{f_x}^2}{P_{f_{x+1}}P_{f_{x-1}}}$$

the preceding two numbered equations lead to

\begin{align} k & = \frac{\sgn(P_{f_{x+1}} - P_{f_{x-1}})}{\sqrt{1 - \sqrt{q}}} \label{eq:k1} \\ A = \frac{\sqrt{P_{f_x}}}{\sinc_{π}(k)} \label{eq:A1} \end{align}

With this one can find the frequency and amplitude of all peaks from the frequency spectrum. The procedure is as follows:

1. Find the integer frequency $$f_0$$ where the frequency spectrum shows greater energy than for neighboring frequencies
2. Use equation \ref{eq:k1} to estimate the frequency difference $$k$$. The frequency $$f$$ of the component can then be estimated at $$f_0 + k$$. (If $$|k|$$ is greater than 1, then the assumptions upon which the current method is based do not hold for the current data set. See below.)
3. Use equation \ref{eq:A1} to estimate the amplitude of the component.

If we apply this to the data set then we find

#
$${f_0}$$$${A}$$
1 10.30033 1.066
2 32.66 0.016
3 78.26 0.0027

The amplitude of the largest peak is now estimated to be quite a bit higher, and also the amplitude of the peak at frequency 33.

### 3.3. Even Better Estimate: Method 3

The strange shape (in the preceding diagram) of the peak at frequency 78 and the near-invisibility of the peak at frequency 33 derive from the broad "slopes" that come from the tallest peak, which disturb the lower peaks. So, we must try to make the slopes of the tallest peak more steep so that they don't distort the other peaks so much. We do that by multiplying the data set with a mask or apodization function that goes to zero at the beginning and end of the data set. The apodization function we'll use here is a cosine that has values between 0 and 2, has exactly one period in the data set, and goes to zero at the edges. (This is a form of the Hanning mask or Hanning function.) When such a mask is applied, it is important that the average of the data set be equal to zero.

Fig. 4: Frequentiespectrum
The frequency spectrum of the apodized data set is shown in Figure 4. For convenience I again show the previous frequency spectrum (without the data apodization), too, but now as a dashed line.

Note that the vertical axis of this diagram has a much greater range than the previous diagram. In the previous spectrum, the energy ratio between the top and bottom of the diagram was about a thousand million (109), but in this diagram it is ten thousand million times greater still (or about 1019 in total). The peaks are slightly fatter, but the slopes are far steeper, so the smaller peaks stand out much more here than in the previous diagram.

In the same way as before one can calculate the Fourier spectrum of a single harmonic component of frequency $$f_0$$ and amplitude $$A$$. Just like before, we ignore the contributions from the negative peak. We then find

$$P_f = A^2 \left( \frac{\sinc_{π}(f - f_0)}{(f - f_0)^2 - 1} \right)^2$$

This leads (in a similar way as before) to the estimate

\begin{align} q & = \frac{P_{f_x}}{\sqrt{P_{f_{x+1}} P_{f_{x-1}}}} \\ k & = \sqrt{\frac{q - 4}{q - 1}} \sgn(P_{f_{x+1}} - P_{f_{x-1}}) \label{eq:k3} \\ A & = (1 - k^2) \frac{\sqrt{P_{f_x}}}{\sinc_{π}(k)} \end{align}

(If $$|k|$$ is greater than 1, then the assumptions upon which the current method is based do not hold for the current data set. See below.)

Applied to the data set, this yields

#
$${f_0}$$$${A}$$
1 10.29955 1.04978
2 77.6024 0.0019970
3 31.60 0.0012

Now the peak at frequency 78 is estimated to be higher than the peak at frequency 32.

### 3.4. Method 4: General Quadratic

The two preceding methods assume that the components in the Fourier spectrum are pure haromics, which means that they are assumed to have a fixed frequency and amplitude, but this is quite rare in practice. Usually the frequency and amplitude of all components vary a bit with time, which makes the peaks in the Fourier spectrum wider. If the variation gets too large, then those methods might not give any estimate for how much the "real" frequency differs from the Fourier peak. In such cases you can make the simpler assumption that the relationship between energy and frequency near the peaks in the Fourier spectrum is quadratic. Then we find (in a similar way as before) that

\begin{align} k & = -\frac{1}{2} \frac{P_{f_{x+1}} - P_{f_{x-1}}}{P_{f_{x+1}} + P_{f_{x-1}} - 2P_{f_x}} \\ A & = \sqrt{P_{f_x} + \frac{1}{4} (P_{f_{x+1}} - P_{f_{x-1}}) k} \end{align}

You can apply a mask here, too, to make the slopes of the peaks more steep. Applied to our data set this method yields

#
$${f_0}$$$${A}$$
1 10.18609 1.001697
2 77.6994 0.0018485
3 31.25 0.00095

### 3.5. Remarks

1. The resolving power of a frequency spectrum is limited. If the measurements last for a time $$T$$, then the frequencies are multiples of $$1/T$$. Frequency differences smaller than $$1/T$$ cannot be measured from such a frequency spectrum. Two harmonic components of which the frequencies differ from each other by less than that amount cannot be told apart in the frequency spectrum.
2. Fig. 5: Frequency Spectrum Showing Aliasing
A harmonic component with a frequency that is greater than the greatest frequency from the spectrum does show up in the spectrum, but at a frequency that is smaller than its own frequency. If the data set lasts a time $$T$$ and contains $$N$$ data values, then the greatest frequency in the spectrum is about equal to $$\frac{N}{2T}$$. If the frequency $$f$$ of a harmonic component is greater than that greatest frequency, then that component ends up in the spectrum at a frequency equal to $$\left| f - N \left[ \frac{f}{N} \right] \right|$$. This is called aliasing. An example is shown in Figure 5, which shows three different harmonic components that all intersect in the measurement points (indicated by small squares). One cannot tell from measurements at those point which of the components it is.
3. If the data set also contains a component that is not periodic (or perhaps is periodic at a period that is far greater than the length of the data set), then the energy of that component shows up at the lowest frequencies, where it distorts any "true" peaks and boosts them to undeservedly greater amplitudes. To suppress this effect, one can subtract a running average from the data set, but that suppresses not just non-periodic components but also periodic components with periods that are longer than the width of the running average.
4. Methods 2 and 3 ignore the influence from negative peaks, which is strongest at the smallest frequencies (still smaller than the influence of the positive peaks, but not a whole lot smaller). Because of this, the results will tend to be less accurate for the smallest frequencies than for the greater frequencies.
5. The steepening of the slopes of the peaks that method 3 yields has a price, which is that the peaks get fatter at their top than they were for method 2. If two peaks are very close together, then method 3 can have more trouble to separate them than method 2 has. It seems that method 2 is sometimes more accurate than method 3 at finding the characteristics of the greatest component, but method 3 is almost always better at finding the smaller components.

## 4. The Accuracy

### 4.1. Test with the Data Set

The harmonic components that compose the data set are as follows:

#
$${f_0}$$$${A}$$$${β}$$
1 10.3 1.05 36°
2 77.6 0.002
3 31.4 0.001

Now we can determine how accurate the various estimates were. The next table shows the difference between the estimate and the true value for the frequency ($$∆f_0$$) and the relative difference for the amplitude ($$\frac{∆A}{A}$$).

$${∆f_0}$$$${\frac{∆A}{A}}$$
#
Method 1 Method 2 Method 3 Method 4 Method 1 Method 2 Method 3 Method 4
1 −0.3 +0.00033 −0.00045 −0.11 −0.12 +0.015 −0.00021 −0.046
2 +0.4 +0.66 +0.0024 +0.099 +2.3 +0.36 −0.0015 −0.076
3 +0.6 +1.26 +0.20 −0.15 +1.4 +14.7 +0.18 −0.050

Method 3 seems best in general (if the data set contains only pure harmonic components). This is confirmed by a large number of tests with various randomly constructed data sets (with an exponential distribution of amplitudes). The accuracy of the frequency appears to be roughly constant for method 1, and, for methods 2 and 3, to increase with about the 2.7-th power of the average spacing (in the spectrum) between the harmonic peaks, but with an advantage of about a factor of 10 for method 3 compared with method 2. Method 4 is much better than method 1, but often worse than methods 2 and 3 if the data set contains only pure harmonic components. If the components are not pure harmonics, then methods 2 and 3 might not work at all, and then method 4 is best.

The relative accuracy of the amplitudes seems roughly constant for method 1, to increase with about the square of the spacing (in the spectrum) between the harmonic peaks for method 2, and with about the third power of the average spacing for method 3.

### 4.2. Testing with the VSOP Model

Below, we test the methods on the VSOP model of the motion of the planets. One can calculate the positions of the planets from this model for a few thousand years into the future and into the past. The model consists of a large number of harmonic components, so we can test how well we can find those components again in the positions of the planets.

Below is a table that shows the ten largest harmonic components in the ecliptic heliocentric longitude $$λ$$ of the Earth relative to the equinox and ecliptic of the date according to the VSOP model ($$f_\text{vsop}$$ is the frequency in radians per Julian millennium of 365250 days, $$A_\text{vsop}$$ is the amplitude in 10−8 radians), as given by [Meeus], and also the ten largest harmonic components calculated from $$λ$$ (calculated for each 10 days during a period of 10,000 years around the year 2000) using methods 1-4 (the other $$f$$ and $$A$$, in the same units as for the VSOP model). To suppress non-periodic components at small frequencies I removed a running average over 68 years from the data sets before trying to detect harmonic components in them.

$${λ}$$
 #
$${f_\text{vsop}}$$$${A_\text{vsop}}$$$${f_1}$$$${A_1}$$$${f_2}$$$${A_2}$$$${f_3}$$$${A_3}$$$${f_4}$$$${A_4}$$
1 6283.0757 3341656 6283.1853 2957169 6283.0191 3326893 6283.0366 3304973 6283.0874 3213528
2 12566.1572 34894 12565.7423 23781.5 12566.0368 35193.11 12566.0147 34050.8 12566.0137 31518.94
3 5753.3848 3497 5753.5128 3159.722 6069.1218 6556.893 5753.3852 3496.726 5753.4425 3419.802
4 3.5231 3418 77713.5775 2666.237 5753.2840 3970.837 77713.7709 3134.605 77713.6994 2983.824
5 77713.7734 3136 7860.2648 2597.325 5880.2491 3630.401 7860.4192 2676.357 7860.3537 2591.483
6 7860.4194 2676 6069.5570 2477.555 5855.4863 3434.766 3930.2098 2342.756 3930.1724 2323.368
7 3930.2097 2343 3930.1324 2328.372 77713.7717 3136.410 11506.7698 1324.200 529.6594 1270.505
8 11506.7695 1324 5885.4597 1470.287 6681.9815 3059.166 529.6725 1270.298 11506.8305 1221.383
9 529.6910 1273 5223.8403 1312.458 7860.1163 2852.389 1577.4341 1196.691 1577.4788 1095.007
10 1577.3435 1199 5879.8048 1300.653 5885.8305 2839.734 5884.9271 990.4211 5884.8816 977.9457

Among the 10 greatest components found through method 1 there are 6 of the 10 greatest components from the VSOP list. For method 2 this number is 5, and for methods 3 and 4 it is 9. The next two tables display the absolute (for the frequency $$f$$) or relative (for the amplitude $$A$$) error for the first three VSOP components ($$∆_1$$, $$∆_2$$, $$∆_3$$), and also the minimum ($$\min$$), average ($$μ$$), maximum ($$\max$$), and standard error ($$σ$$) of that same kind of error for the first ten VSOP components, for methods 1-4 (M1 - M4). To determine such an error for each of the VSOP components, I looked for the estimated component (from methods 1-4) that was closest to it.

$${f}$$
$${∆_1}$$$${∆_2}$$$${∆_3}$$$${\min}$$$${μ}$$$${\max}$$$${σ}$$
M1 0.1096 −0.4149 0.1280 −0.4149 −0.0385 0.2560 0.2237
M2 −0.0565 −0.1204 −0.1008 −0.3031 0.0073 0.6496 0.2492
M3 −0.0391 −0.1425 0.0004 −0.1433 −0.0255 0.0906 0.0703
M4 0.0117 −0.1436 0.0577 −0.1695 −0.0256 0.1353 0.0946

$${A}$$
$${∆_1}$$$${∆_2}$$$${∆_3}$$$${\min}$$$${μ}$$$${\max}$$$${σ}$$
M1 −0.1151 −0.3185 −0.0964 −0.3185 −0.0557 0.0379 0.1058
M2 −0.0044 0.0086 −0.0178 −0.0178 0.0087 0.0612 0.0230
M3 −0.0110 −0.0242 −0.0001 −0.0242 −0.0016 0.0230 0.0116
M4 −0.0383 −0.0967 −0.0221 −0.0967 −0.0269 0.0187 0.0325

For example, method 2 underestimates the frequency of the greatest VSOP component by 0.0565 units, and underestimates the amplitude by a fraction of 0.0044 (i.e., by 0.44%). Method 3 is usually the best one (with the smallest $$σ$$ that is yet much greater than the $$μ$$) for both the frequency and the amplitude.

Next is a similar table for the ecliptic heliocentric latitude of the Earth. Meeus provides only 5 components for this case, but I did, when calculating the positions, also take the smaller components into account that Meeus does not mention.

$${β}$$
 #
$${f_\text{vsop}}$$$${A_\text{vsop}}$$$${f_1}$$$${A_1}$$$${f_2}$$$${A_2}$$$${f_3}$$$${A_3}$$$${f_4}$$$${A_4}$$
1 84334.6641 280 84334.7981 258.1 84334.6603 279.8 84334.6646 279.2 84334.7219 272.5
2 5507.5532 102 5507.2119 76.91 5507.4655 102.2 5507.4691 102.2 5507.4022 94.04
3 5223.6899 80 5223.8403 66.76 5223.6270 81.32 5223.6410 79.94 5223.7014 76.04
4 2352.8701 44 2353.0529 42.58 2352.9514 44.46 2352.9227 44.55 2353.0000 43.44
5 1577.3400 32 1577.0795 28.1 1577.2574 32.18 1577.2653 32.21 1577.1857 30.72
6 1048.0353 15.54 1047.7396 23.08 1047.7791 22.03 1047.7835 20.58
7 5856.5570 15.01 6283.4136 17.84 5856.3454 17.06 6283.2864 16.03
8 6283.1853 14.21 5856.3852 17.03 6283.1853 15.90 5856.4586 15.98
9 10213.3177 13.88 9437.6669 15.25 9437.6385 15.20 10213.3463 14.14
10 14143.4501 10.36 10213.3738 14.06 10213.3177 14.13 9437.7010 13.21

All four methods find the five greatest VSOP components, and in the correct order, too.

$${f}$$
$${∆_1}$$$${∆_2}$$$${∆_3}$$$${\min}$$$${μ}$$$${\max}$$$${σ}$$
M1 0.1341 −0.3413 0.1503 −0.3413 −0.0269 0.1828 0.2523
M2 −0.0038 −0.0877 −0.0629 −0.0877 −0.0311 0.0813 0.0711
M3 0.0005 −0.0841 −0.0489 −0.0841 −0.0309 0.0526 0.0571
M4 0.0579 −0.1510 0.0114 −0.1543 −0.0212 0.1299 0.1272

$${A}$$
$${∆_1}$$$${∆_2}$$$${∆_3}$$$${\min}$$$${μ}$$$${\max}$$$${σ}$$
M1 −0.0781 −0.2460 −0.0387 −0.2460 −0.1034 −0.0323 0.0874
M2 −0.0008 0.0015 0.0165 −0.0008 0.0066 0.0165 0.0070
M3 −0.0030 0.0019 −0.0007 −0.0030 0.0034 0.0125 0.0062
M4 −0.0267 −0.0780 −0.0494 −0.0780 −0.0413 −0.0127 0.0248

Here, too, method 3 is usually best, and method 2 is almost as good. Methods 1 and 4 systematically underestimate the amplitude.

Next is a similar table for the distance $$r$$ between the Earth and the Sun, measured in units of 10−8 AU.

$${r}$$
 #
$${f_\text{vsop}}$$$${A_\text{vsop}}$$$${f_1}$$$${A_1}$$$${f_2}$$$${A_2}$$$${f_3}$$$${A_3}$$$${f_4}$$$${A_4}$$
1 6283.0757 1670700 6283.1853 1478578. 6283.0191 1663435. 6283.0366 1652461. 6283.0874 1606753.
2 12566.1514 13956 12565.7423 9518.163 12566.0367 14080.45 12566.0130 13582.83 12566.0136 12592.51
3 77713.7734 3084 77713.5775 2621.098 6069.1215 3268.973 77713.7704 3080.909 77713.6995 2933.586
4 5753.3848 1628 7860.2648 1515.749 77713.7717 3083.366 5753.3858 1628.202 5753.4425 1592.821
5 7860.4194 1576 5753.5128 1475.316 5959.7856 2244.096 7860.4191 1575.710 7860.3537 1525.868
6 11506.7695 925 6069.5570 1233.091 5956.6438 2226.028 11506.7709 923.9234 11506.8306 852.9342
7 3930.2100 542 5960.2296 805.2115 5954.3899 2199.643 3930.2094 542.3906 3930.1726 537.9689
8 5884.9268 472 5957.0880 797.9427 5952.2461 2195.110 5884.9287 472.2971 5884.8817 466.1149
9 5507.5532 346 5953.9464 791.5154 6609.7268 1926.811 5507.6453 344.5373 5507.7109 328.0761
10 5223.6938 329 5952.6898 789.1468 6611.6113 1914.480 5223.7757 328.2695 5223.8040 326.4411

Among the ten greatest components found by method 1 there are 5 of the ten greatest VSOP components. For method 2 the number is 3, and for methods 3 and 4 it is 10 (and all in the correct order, too).

$${f}$$
$${∆_1}$$$${∆_2}$$$${∆_3}$$$${\min}$$$${μ}$$$${\max}$$$${σ}$$
M1 0.1096 −0.4091 −0.1960 −0.4091 0.0623 0.5329 0.2746
M2 −0.0565 −0.1146 −0.0017 −0.3045 0.1103 0.9054 0.3990
M3 −0.0391 −0.1384 −0.0030 −0.1384 −0.0003 0.0921 0.0631
M4 0.0117 −0.1378 −0.0740 −0.1378 0.0038 0.1577 0.0920

$${A}$$
$${∆_1}$$$${∆_2}$$$${∆_3}$$$${\min}$$$${μ}$$$${\max}$$$${σ}$$
M1 −0.1150 −0.3180 −0.1501 −0.3180 −0.0863 0.0140 0.0994
M2 −0.0043 0.0089 −0.0002 −0.0240 0.0004 0.0243 0.0118
M3 −0.0109 −0.0267 −0.0010 −0.0267 −0.0045 0.0007 0.0086
M4 −0.0383 −0.0977 −0.0488 −0.0977 −0.0348 0.0107 0.0347

Here, too, method 3 is usually best. For the frequency, method 4 is second best, but for the amplitude it is method 2. For the frequency, method 2 is usually even worse than method 1. Methods 1 and 4 systematically underestimate the amplitude.

languages: [en] [nl]

//aa.quae.nl/en/reken/frequentie.html;
Last updated: 2016-02-07