Filtering – A practical guide

Finding good information on how filters work, what the different types of filters mean, and how you should filter your data is hard. Lots of explanations only make sense if you have a year or two of electrical engineering education, and most of the rest are just a list of rules of thumb. I want to try to get you to a place where you can test your own filter settings, and show you the importance of the rules of thumb without going into the relatively complex math that is often used to explain filters. Warning: a lot of the code I’m going to use requires the Matlab Signal Processing Toolbox. If you don’t have it, you wont be able to execute the code yourself, but hopefully you’ll still be able to follow along with the logic.

What is a filter?

On the left is a recording from a neuron shown in the "time domain". The two graphs on the right show the same recording in the "frequency domain"

Fig 1. On the left is a recording from a neuron shown in the “time domain”. The two graphs on the right show the same recording in the “frequency domain”

In the context of signal processing, a filter is anything the removes unwanted things from your signal. However, in the context of physiological recordings a filter is nearly always referring to a process which removes a certain frequency range from the signal. Because of that, we need to get comfortable looking at data in both the “time domain” and the “frequency domain”. The “time domain” should be very easy to understand, this is just the normal way of looking at data, where time is on the x axis, and the amplitude of the signal is on the y axis. The “frequency domain” isn’t that hard to understand either, it just tells you what frequencies make up your original signal, so you have frequency on the x axis, and “magnitude” on the y axis. Why is it “magnitude” and not amplitude on the y axis? Because there is another feature we need to consider, which is “phase”. Which, roughly speaking is how much the waves that make up the signal are shifted in time. More concretely, a graph of y = sin(2π x) and a graph of y = sin(2πx + π/2) are similar, but very different, one is the other shifted along the x axis (Fig 2). Or another way of saying it is: they have the exact same frequency components (i.e. they are both 1 Hz signals), but different phases. So if frequency and magnitude were all we had, those two graphs would appear to be indistinguishable. So in order to get a complete picture of the signal, we need to keep track of the phase of each frequency component. “Magnitude” reminds us that we need to remember “phase” as well.
Fig 2. Different phases

Fig 2. A graph of Y = sin(2π X) and a graph of Y = sin(2πX + π/2) have the same frequency components, but different phases

To see this in action, have a look at figure 1. On the left is a typical recording shown in the time domain. On the right the two graphs are the exact same signal shown in the frequency domain. The top left graph shows the magnitude of all the frequency components of the signal (showing that most of the signal is made up of low frequency components, and generally less and less as the frequency goes up). The bottom left graph doesn’t look like much, but we need this information to completely describe the time domain signal. And importantly, thinking about phase is super important when it comes to discussing filters.

There are a huge variety of filters, but one of the simplest ways of dividing them up is to consider what signals they don’t affect, i.e. what signals they let pass through them. So a low-pass filter lets through low frequency signals and a high-pass filter lets through high frequency data. There are also band-pass filters which only let through signals in a particular frequency range and to break convention, band-stop filters, which stop signals inside a particular frequency range. The frequency at which the signal begins to be filtered is called the “corner frequency” or the “cut-off frequency”. When we describe the actions of a filter, we are often show it’s frequency response, which is shows you what the filter does to various frequency components of your data. This can be estimated by applying the filter to your data, viewing it in the frequency domain and then dividing that by your original signal in the frequency domain. However, as your data doesn’t usually contain all frequency components, it better to create white noise, which is a signal that has the same amount of power in all frequency components. We can convert data to the frequency domain by using the Fourier transform (which I’ve talked about before here). In code, that looks like this.

signal = randn(1000,1); %Create white noise
dt = 0.5; %dt is the time between samples
time = 1:dt:1000 ; %create time base

%% Filter the data
%Make a simple running average filter
%Take the mean of the original data in a window
%of the data tempC(i:i+window)
%I know there are faster ways of doing this.

tempC_filt = zeros(size(signal));

window = 5;
for i = 1:length(signal)
end_idx = i+window;
if end_idx > length(signal)
    end_idx = length(signal);
end
tempC_filt(i) = mean(signal(i:end_idx));
end

%% Now convert data to frequency domain

nfft = 2^nextpow2(length(signal)); %FFTs are faster if you perform them
                                  %on power of 2 lengths vectors

f = (1/dt)/2*linspace(0,1,nfft/2+1)'; %Create Frequency vector'

dft      = fft(signal, nfft)/(length(signal)/2); % take the FFT of original
dft_filt = fft(tempC_filt, nfft)/(length(tempC_filt)/2); %take FFT of filters
mag = abs(dft(1:nfft/2+1));                     %Calculate magnitude
mag_filt = abs(dft_filt(1:nfft/2+1));           %Calculate magnitude

loglog(f, mag_filt./mag)
xlabel('Frequency (Hz)')
ylabel('Signal Out / Signal In')

freqresp_of_movingaverage

Fig 3. The frequency response of a running average filter.

The above code produces the graph in figure 3, which shows the the frequency components of the filtered signal divided by the frequency component of the original signal. What it shows us is that the running average filter doesn’t affect signals below 10^-1 Hz (0.1 Hz), but attenuates them above that. It also shows us that a running average filter doesn’t uniformly attenuation signals above 0.1 Hz, but instead filters out signals more strongly at about 0.4 Hz and 0.8 Hz than it does at 0.6 Hz and 1 Hz. This problem is refereed to as “ripples in the stop band”, where the “stop band” is the region of frequencies where the signal is attenuated, and “ripples” are the changing amount of attenuation of your signal with frequency. Compare this the “perfect” filter, where the signal is not attenuated at all up until some frequency, and then the signal is completely obliterated beyond that (I know it’s shown as only being attenuated by 10^-3, but you can’t show 0 on a log axis). This “perfect” filter is what most filters want to be. They want to not alter the signal in frequency ranges of the “pass band”, and completely remove the signal in the stop band. But filters like that are impossible to create in real life.

Now I’ve said a perfect filter doesn’t want to alter signals in the pass-band, and by looking at the graph above, it might appear that so long as you are below about 0.1 Hz you wont affect the signal, but as I alluded to before, we need to consider what the filter does to the phase. We can achieve that by adding only a few more lines to our above code.

signal = randn(1000,1); %Create random time series data
dt = 0.5; %dt is the time between samples
time = 1:dt:1000 ; %create time base

%% Filter the data
%Make a simple running average filter
%Take the mean of the original data in a window
%of the data tempC(i:i+window)
%I know there are faster ways of doing this.

tempC_filt = zeros(size(signal));

window = 5;
for i = 1:length(signal)
end_idx = i+window;
if end_idx > length(signal)
    end_idx = length(signal);
end
tempC_filt(i) = mean(signal(i:end_idx));
end

%% Now convert data to frequency domain

nfft = 2^nextpow2(length(signal)); %FFTs are faster if you perform them
                                  %on power of 2 lengths vectors

f = (1/dt)/2*linspace(0,1,nfft/2+1)'; %Create Frequency vector'

dft      = fft(signal, nfft)/(length(signal)/2); % take the FFT of original
dft_filt = fft(tempC_filt, nfft)/(length(tempC_filt)/2); %take FFT of filters
mag = abs(dft(1:nfft/2+1));                     %Calculate magnitude
mag_filt = abs(dft_filt(1:nfft/2+1));           %Calculate magnitude

phase = angle(dft(1:nfft/2+1));
phase_filt = angle(dft_filt(1:nfft/2+1));

%% Create an ideal filter frequency response
mag_perf = ones(size(mag_filt));
mag_perf(100:end) = 10^-3;

subplot(1,2,1)
loglog(f, mag_filt./mag, f, mag_perf)
xlabel('Frequency (Hz)')
ylabel('Signal Out / Signal In')
legend('Running Average Filter', 'Perfect Filter')

subplot(1,2,2)
semilogx(f, unwrap(phase-phase_filt));
xlabel('Frequency (Hz)')
ylabel('Phase Shift (Rad)')

Fig 4. Magnitude and Phase response of a running average filter

Here we can clearly see that the phase of the signal is being affected at frequencies as low as 0.05 Hz. So while the amplitude of these frequency components isn’t being changed, they are being shifted later in time. So this is not a particularly good filter. Let’s move on in our quest to both understand, and create a better filter.

Filter Poles

Fig 5.  How the frequency response of a filter varies with the number of poles.

Fig 5. How the frequency response of a filter varies with the number of poles.

(From here on in, I’m only going to be talking about low-pass filters, as everyone uses them. But everything I say is applicable to high-pass filters too)

Another way of describing a filter is by the number of “poles” it has. I’m not going to explain to you what a pole is, because it comes from the mathematical analysis of filters, and ultimately, it is irrelevant for understanding what the consequence of the filter is. Simply put a filter with more poles attenuates frequencies more heavily for each Hz you go up. Here a figure will paint a thousand words, so see fig 5. We can create that figure with the below code.

%All our filters will have a corner
%frequence (fc) of 1000 Hz
fc = 1000; % Corner Frequency of 

[b2pole,a2pole] = butter(2,2*pi*fc,'s'); %Create a 2 pole butterworth filter
[h2pole,freq2] = freqs(b2pole,a2pole,4096);     %Calculate Frequency Response

[b6pole,a6pole] = butter(6,2*pi*fc,'s'); %Create a 2 pole butterworth filter
[h6pole,freq6] = freqs(b6pole,a6pole,4096);     %Calculate Frequency Response

%Plot frequency response
loglog(freq2/(2*pi),abs(h2pole));
hold on
loglog(freq6/(2*pi),abs(h6pole));
xlabel('Frequency (Hz)');
ylabel('Magnitude');
legend('2 Pole Butterworth', '6 Pole Butterworth')

As you can see, as we increase the number of poles in our filter the closer it gets to our perfect filter. Or in electronic engineering terms, it has a steeper “roll off”. You might ask “well then why don’t we use 1 million pole filters then?”. Well digitally, we can create very high order filters. However, when it comes to real circuits there are a bunch of reasons why this is difficult, one of which is that it becomes very difficult to find resistors and capacitors of the exact correct value needed. Another reason is that by the time you get up to 8 poles, that is usually good enough. For instance, with an 8 pole Butterworth filter, you attenuate a signal at twice the corner frequency to about 4% of it’s original value. At 10x the corner frequency, the signal is attenuated by about 1 billion fold. But the important thing to note is that even with a filter with a large number of poles it does not remove ALL of the signal above the corner frequency.

Butterworth?

Fig 6. Frequency Response of filters of various types.

Fig 6. Frequency Response of filters of various types.

You may have noticed previously I was talking about “Butterworth” filters. You may have been wondering what that means. There are a litany of other filter types, most notably Bessel and Chebyshev. To give a perfectly accurate definition of the difference between these is beyond me, but the practical difference is a trade off between the steepness of the roll off (for a give number of poles) and the behaviour in the time domain. A Chebyshev filter has the steepest roll off, but the worst behaviour in the time domain, while a Bessel filter has the shallowest roll off, but the best behaviour in the time domain. The Butterworth filter is between the two. Again, let’s look at the frequency response of these filters to get a better idea (Fig 6). From looking at that, you can see why Chebychev filters aren’t used for physiological data, it attenuates signals that you might think you weren’t touching (i.e. ones well below the corner frequency). But it is great for some applications because of it’s very steep roll off. Code Below:

%All our filters will have a corner
%frequence (fc) of 1000 Hz
fc = 1000; % Corner Frequency of 
n_poles = 8;

[zbess, pbess, kbess] = besself(n_poles, 2*pi*fc); %Create 8 pole Bessel
[bbess,abess] = zp2tf(zbess,pbess,kbess);          %Convert to transfer function
[hbess,freqbess] = freqs(bbess,abess,4096);        %Calculate Frequency Response

[bbut,abut] = butter(n_poles,2*pi*fc,'s'); %Create a 8 butterworth filter
[hbut,freqbut] = freqs(bbut,abut,4096);         %Calculate Frequency Response

[bcheb,acheb] = cheby1(n_poles,3,2*pi*fc,'s'); %Create 8 pole Chebychev   cheby1(n,3,2*pi*f,'s');
                                                    % with 3dB of ripple
[hcheb,freqcheb] = freqs(bcheb,acheb,4096);         %Calculate Frequency Response

%Plot frequency response
figure;
loglog(freqbess/(2*pi), abs(hbess))
hold on
loglog(freqbut/(2*pi),abs(hbut));
loglog(freqcheb/(2*pi),abs(hcheb));
xlabel('Frequency (Hz)');
ylabel('Magnitude');
legend('Bessel', 'Butterworth', 'Chebychev')
xlim([10 10000])
ylim([10^-4 2])

Fig 7. Different types of filters in the time domain.

Fig 7. Different types of filters in the time domain.

However, when comparing the Bessel and the Butterworth, it may seem like the Butterworth is clearly superior: The Bessel filter attenuates signals below the corner frequency just like the Chebyshev! But when you look in the time domain, the reason why physiologists nearly always use a Bessel filter becomes obvious. Other filters producing “ringing” after encountering a step-like response. This is because the other filters delay waves of different frequencies by different times (remember, a step like response is created by a sum of a large number of waves). However, the Bessel filter doesn’t do this. The technical way to describe this is that a Bessel filter has linear phase response. This doesn’t mean it affects the phase of all the frequencies in the pass band equally, but that the way if shifts the phase means that the various frequency components aren’t shifted in time by different amounts (think about the fact that one cycle of a slow wave lasts for longer than a faster wave, so if both waves were shifted by half a wave, then slow wave would be shifted by more time). It turns out the Bessel filter is the best possible way to achieve this.

Digitizing – The hidden filter

Fig 8. The affect of digitizing inappropriately

Fig 8. The affect of digitizing a signal with a 32 Hz, 64 Hz and 256 Hz component, inappropriately

So what have we learnt so far? That filters remove unwanted frequency components in your data. That for a given filter type, a filter with a high number of poles has a steeper roll off. Finally, we’ve learnt that different types of filters have different behaviour in the time and frequency domain, and that a Bessel filter has the best behaviour in the time domain. But there is one type of filter we haven’t covered: your digitizer AKA your analog to digital converter AKA your ADC. No, I’m not saying there is a filter built into your ADC, I am saying that the mere act of digitizing your signal filters your data, and if you’re not careful, it can do some very strange things. Don’t believe me, look at this code, and the figure it produces (Fig 8).

%% Original, high sample rate signal
% Let us imagine this is like our analog signal

nfft = 4096;

dt = 0.001; % inter sample time = 0.001s = 1kHz sampling
t = (1:nfft)*0.001; % time vector

% Create signal vector that is the sum of 32 Hz, 64 Hz, and 256 Hz

signal = sin(32*2*pi*t) + sin(2*pi*64*t) + sin(2*pi*256*t);

f = (1/dt)/2*linspace(0,1,nfft/2+1)'; %Create Frequency vector
dft  = fft(signal, nfft)/(length(signal)/2); % take the FFT of original
mag = abs(dft(1:nfft/2+1));                  %Calculate magnitude

semilogx(f, mag, 'LineWidth', 2 );

%% Now lets digitize our faux-analog signal
% Essentially we are down sampling,
% by sampling it every 0.005s = 200 Hz = every 5th sample.

signal_dig = signal(1:5:end);
nfft_dig = 2^nextpow2(length(signal_dig)); %FFTs are faster if you perform them
                                           %on power of 2 lengths vectors


f_dig = (1/(dt*5))/2*linspace(0,1,nfft_dig/2+1)'; %Create Frequency vector
dft_dig = fft(signal_dig, nfft_dig)/(length(signal_dig)/2);
mag_dig = abs(dft_dig(1:nfft_dig/2+1));
hold on
semilogx(f_dig, mag_dig, 'LineWidth', 2);
xlim([10 1000])
xlabel('Frequency (Hz)');
ylabel('Magnitude');
legend('Original Signal', 'Digitized Signal');

Fig 9. By filtering before we digitize, we avoid aliasing our data.

Fig 9. By filtering before we digitize, we avoid aliasing our data.

In the code, we create a signal that is a sum of 32, 64 and 256 Hz sine waves. We’ll think of this signal as being our analog signal. We then “digitize” it as 200 Hz. We then look at the two signals in the frequency domain. In the original signal we see peaks at 32, 64 and 256 Hz, and after we digitize it at 200 Hz, we see a loss of 256 Hz signal, which perhaps isn’t too surprising when you think about it. But something very disturbing has happened, we now have a new signal with a peak at 56 Hz. This signal is a complete phantom and is called “Aliasing”, and it happened because we need to filter before we digitize (See Fig 9B). This is a correlate of the Nyquist-Shannon sampling theorem, which says that any signal which has no frequency component higher than F, can be perfectly reconstructed if it is sampled at 2F. So, inversely, if we digitize a signal at rate 2F, but it has frequency components higher than F, then the digitized signal we end up with is not a faithful representation of the original. Now let’s try to follow the rules. If we add in these lines of code, then we get Fig 9A.

%% Let's filter the signal before we digitize it

[bbut,abut] = butter(8,100/(1000/2));      %Create a 100Hz, 8 pole butterworth filter
signal_filt = filter(bbut, abut, signal);  %Filter the data
signal_filt_dig = signal_filt(1:5:end);
dft_filt_dig = fft(signal_filt_dig, nfft_dig)/(length(signal_filt_dig)/2);
mag_filt_dig = abs(dft_filt_dig(1:nfft_dig/2+1));
semilogx(f_dig, mag_filt_dig, 'LineWidth', 2);


And this is one of the two reasons why we must ALWAYS low-pass filter data before we digitize it: if we don’t, we will get aliases, i.e. spurious frequency components. This is also what causes strange patterns in some photographs, the original data had higher frequencies (i.e. patterns changing in space) than the camera CCD could sample. The other reason we filter BEFORE was digitize is that high frequency noise can sometimes be larger than the signal we want to record, and hence filtering before digitization allows us to amplify the signal as much as possible without saturating the digitizer.

Choosing a corner and sampling frequency

Fig 10. Trying to choose the corner frequency for a Bessel filter isn't obvious.

Fig 10. Trying to choose the corner frequency for a Bessel filter isn’t obvious.

So now we know we need to digitize at least twice as fast as we filter (and in reality probably a bit faster than that, because filters have a roll off, so there is still some signals at X Hz if we filter at X Hz, so I would say we should probably digitize at 3 to 10 times faster than you filter). But how do we choose what frequency we filter at? Just see what frequency components are in the signal we want to record, and set the corner frequency to just above what we want to keep, right? Well let’s give that a go. I’m going to make a faux action potential with a Gaussian with a full width at half maximum of 1 ms, then look at the signal in the frequency domain. Unsurprisingly, the signal has basically no power above 1 kHz, so lets apply an 1kHz 8-pole Bessel filter to the data. And what do we see? Our “action potential” has been reduced by 10% and significantly shifted in time (Fig 10). How the hell did that happen? Well go up and have a close look at figure 6, even though the corner frequency is 1 kHz, you can see that the Bessel filter is attenuating signals down to ~200 Hz. And that is your rule of thumb for an 8-pole Bessel filter: you need to set the corner frequency to 5 times higher than the highest frequency you want to record, for a 4-pole filter you need to be 4 times higher. So in our case, where the highest frequency is about 1kHz, we need to set our 8-pole Bessel filter to about 5 kHz, which means we need to digitize at at least 15 kHz. (I might add that Bessel filters are odd about this. For most filters, the more poles you get the closer a signal can get to the corner frequency without being filtered. However, this is the opposite with a Bessel filter.)

dt = 1/20000; % 20kHz sampling rate
t = 0:dt:0.02; % time vector 4ms long
signal = exp(-(t-0.01).^2/(0.001/sqrt(2*log(2)))^2); %Gaussian with 1ms FWHM
nfft = 2^nextpow2(length(signal)); %FFTs are faster if you perform them
                                   %on power of 2 lengths vectors

f = (1/dt)/2*linspace(0,1,nfft/2+1)'; %Create Frequency vector
dft = fft(signal, nfft)/(length(signal)/2);      %take the FFT of original
mag = abs(dft(1:nfft/2+1));                      %Calculate magnitude

subplot(1,2,1);
semilogx(f, mag, 'LineWidth', 2);
title('Freq Components of Original')
xlabel('Frequency (Hz)');
ylabel('Magnitude');
xlim([30 10000]);
xticks([100, 1000, 10000]);

[bbess, abess] = besself(8, 2*pi*1000); %Create 8 pole 1kHz filter
[num,den] = bilinear(bbess,abess,1/dt);
signalbess1k = filter(num, den, signal); %Apply Filter

[bbess, abess] = besself(8, 2*pi*5000); %Create 8-pole 5kHz filter
[num,den] = bilinear(bbess,abess,1/dt);
signalbess5k = filter(num, den, signal); %Apply Filter

subplot(1,2,2);
plot(t*1000,signal, t*1000, signalbess1k, t*1000, signalbess5k, 'LineWidth', 2);
xlabel('Time (ms)')
ylabel('Amplitude')
legend('Original Signal', 'Filtered Signal (1kHz)', 'Filtered Signal (5kHz)', 'Location', 'south');

I should probably add that these rules of thumb are not absolute (apart from digitizing at greater than twice your filter rate: always do that!). If your filter slightly deforms the shape of your action potential, or slightly delays it in time, this may not matter for your particular experiments. But you should still be aware of the fact that if you’re filtering too close to your signal of interest, you are not recording the “true” signal.

Conclusion/Rules of Thumb

  • A filter removes unwanted frequeny components
  • A low-pass filter removes high frequency signals
  • A filter with more poles has a steeper roll off
  • A Bessel filter has the best behaviour in the time domain
  • For a 4-pole Bessel, you must filter at at least 4 times the frequency of the fastest wanted component of your signal
  • For an 8-pole Bessel, you must filter at at least 5 times the frequency of the fastest wanted component of your signal
  • You must digitize at more than 2-3 times the frequency of your low-pass filter

Leave a Reply

Your email address will not be published. Required fields are marked *