So I want to preface this post by saying that I am not a trained statistician. However, this post relies on a fact that I’m pretty comfortable stating: If the null hypothesis is true (e.g. there is no difference between your populations), then P should only be less than 0.05 5% of the time. That’s half the point of null-hypothesis statistical testing! Lets write some quick python code to demonstrate this.

import numpy as np import matplotlib.pyplot as plt from scipy import stats np.random.seed(1) #set random seed so you get same data n_reps = 10000 #perform 10000 times ps = np.zeros(n_reps) for n in range(n_reps): #make two normally distributed samples, with mean=0, std dev = 1 and n = 10 sample1 = np.random.normal(loc = 0, scale = 1, size = 10) sample2 = np.random.normal(loc = 0, scale = 1, size = 10) #perform t test (t, ps[n]) = stats.ttest_ind(sample1, sample2) weights = np.ones_like(ps)/float(len(ps)) plt.hist(ps, bins = 20, weights = weights) plt.xlabel('P Value') plt.ylabel('Probability') plt.title('P < 0.05 ' + str(np.sum(ps<0.05)/float(len(ps))*100) + '% of the time')So what we do is create two normally distributed random variables, which have the same mean, the same standard deviation and n = 10, we subject them to a t-test, and we repeat this 10,000 times. Because the two samples are drawn from the same distribution, we would expect no difference between them, and the P values we get support that idea: getting P values less than 0.05 are relatively rare. In fact, the P value is less than 0.05 about 5% of the time, which is what leads people to say (incorrectly) that "if we reject the null hypothesis when P < 0.05, then we will only be wrong 5% of the time". The important thing is that when testing samples where you know the null hypothesis is true, the P value should be uniformly distributed. If that is not the case, something is wrong. So what about when the null hypothesis isn't true? Let's change the code a little and see.

np.random.seed(1) #set random seed so you get same data n_reps = 10000 #perform 10000 times ps = np.zeros(n_reps) for n in range(n_reps): #make two normally distributed samples, with mean=0 or 1, std dev = 1 and n = 10 sample1 = np.random.normal(loc = 0, scale = 1, size = 10) sample2 = np.random.normal(loc = 1, scale = 1, size = 10) #perform t test (t, ps[n]) = stats.ttest_ind(sample1, sample2) fig, ax = plt.subplots(1, 2, figsize=(10,4)) weights = np.ones(10000)/float(10000) ax[0].hist(np.random.normal(loc = 0, scale = 1, size = 10000), bins = 50, weights = weights, alpha = 0.5) ax[0].hist(np.random.normal(loc = 1, scale = 1, size = 10000), bins = 50, weights = weights, alpha = 0.5) ax[0].set_xlabel('Sample value') ax[0].set_ylabel('Probability') ax[0].set_title('Sample Distributions') weights = np.ones_like(ps)/float(len(ps)) ax[1].hist(ps, bins = 20, weights = weights) ax[1].set_xlabel('P Value') ax[1].set_ylabel('Probability') ax[1].set_title('P < 0.05 ' + str(round(np.sum(ps<0.05)/float(len(ps))*100,2)) + '% of the time')

So now we repeatedly perform a t-test on two samples drawn from two different populations. What we see is that now the distribution of P values is skewed, and P is less than 0.05 about 56% of the time, meaning that we have a power of 56% to detect the difference between the two populations given our sample size.

The point I want to get across is that the distribution of P values is uniform when the null hypothesis is true, and it's skewed when the null hypothesis is false.

So now what happens if our data isn't drawn from a normal distribution? Well let's change our code again and see.

import numpy as np import matplotlib.pyplot as plt from scipy import stats np.random.seed(1) #set random seed so you get same data n_reps = 10000 #perform 10000 times ps = np.zeros(n_reps) for n in range(n_reps): #make two log normally distributed samples from the same population. sample1 = np.random.lognormal(mean = 2, sigma = 1.5, size = 10) sample2 = np.random.lognormal(mean = 2, sigma = 1.5, size = 10) #perform t test (t, ps[n]) = stats.ttest_ind(sample1, sample2) fig, ax = plt.subplots(1, 2, figsize=(10,4)) #plot the distribution out data was sampled from ax[0].hist(np.random.lognormal(mean = 2, sigma = 1.5, size = 1000), bins = 50) ax[0].set_xlabel('Sample value') ax[0].set_ylabel('Frequency') ax[0].set_title('Sample Distribution') weights = np.ones_like(ps)/float(len(ps)) ax[1].hist(ps, bins = 20, weights = weights) plt.xlabel('P Value') plt.ylabel('Probability') plt.title('P < 0.05 ' + str(np.sum(ps<0.05)/float(len(ps))*100) + '% of the time')

So now our samples are drawn from a lognormal distribution with a heavy positive skew. When we perform 10,000 t-tests we see that our distribution of P values is no longer uniform. This tells us that something isn't right. But does it means we're more likely to make type I error (rejecting the null hypothesis when we shouldn't/false positive) or a type II error (accepting the null hypothesis when we shouldn't/false positive). Well our simulation tested a case where the null hypothesis was true (i.e. there was no difference) and we would reject the null hypothesis when P is less than 0.05. Now P is less than 0.05 about 2.5% of the time, so our type I error rate has actually gone down. But what about the type II error rate? Well we need to run another simulation to see.

np.random.seed(1) #set random seed so you get same data n_reps = 10000 #perform 10000 times norm_ps = np.zeros(n_reps) log_ps = np.zeros(n_reps) mean1 = 1 mean2 = 2 variance = 1 for n in range(n_reps): #make two normally distributed samples, with mean=0, std dev = 1 and n = 10 norm_sample1 = np.random.normal(loc = mean1, scale = np.sqrt(variance), size = 10) norm_sample2 = np.random.normal(loc = mean2, scale = np.sqrt(variance), size = 10) log_sample1 = np.random.lognormal(mean = np.log(mean1/np.sqrt(1+variance/mean1**2)), sigma = np.sqrt(np.log(1+variance/mean1**2)), size = 10) log_sample2 = np.random.lognormal(mean = np.log(mean2/np.sqrt(1+variance/mean2**2)), sigma = np.sqrt(np.log(1+variance/mean2**2)), size = 10) #perform t test (t, norm_ps[n]) = stats.ttest_ind(norm_sample1, norm_sample2) (t, log_ps[n]) = stats.ttest_ind(log_sample1, log_sample2) fig, ax = plt.subplots(2, 2, figsize=(10,10)) weights = np.ones(10000)/float(10000) ax[0,0].hist(np.random.normal(loc = 0, scale = 1, size = 10000), bins = 50, weights = weights, alpha = 0.5, label="Sample1") ax[0,0].hist(np.random.normal(loc = 1, scale = 1, size = 10000), bins = 50, weights = weights, alpha = 0.5, label="Sample2") ax[0,0].legend(prop={'size': 10}) ax[0,0].set_xlabel('Sample value') ax[0,0].set_ylabel('Probability') ax[0,0].set_title('Normal Population Distributions') weights = np.ones_like(norm_ps)/float(len(norm_ps)) ax[0,1].hist(norm_ps, bins = 20, weights = weights) ax[0,1].set_xlabel('P Value') ax[0,1].set_ylabel('Probability') ax[0,1].set_title('P < 0.05 ' + str(round(np.sum(norm_ps<0.05)/float(len(norm_ps))*100,2)) + '% of the time') weights = np.ones(10000)/float(10000) ax[1,0].hist(np.random.lognormal(mean = np.log(mean1/np.sqrt(1+variance/mean1**2)), sigma = np.sqrt(np.log(1+variance/mean1**2)), size = 10000), weights = weights, alpha = 0.5, bins = np.linspace(0,20,100), label="Sample1") ax[1,0].hist(np.random.lognormal(mean = np.log(mean2/np.sqrt(1+variance/mean2**2)), sigma = np.sqrt(np.log(1+variance/mean2**2)), size = 10000), weights = weights, alpha = 0.5, bins = np.linspace(0,20,100), label="Sample2") ax[1,0].legend(prop={'size': 10}) ax[1,0].set_xlabel('Sample value') ax[1,0].set_ylabel('Probability') ax[1,0].set_title('Lognormal Population Distributions') weights = np.ones_like(log_ps)/float(len(log_ps)) ax[1,1].hist(log_ps, bins = 20, weights = weights) ax[1,1].set_xlabel('P Value') ax[1,1].set_ylabel('Probability') ax[1,1].set_title('P < 0.05 ' + str(round(np.sum(log_ps<0.05)/float(len(log_ps))*100,2)) + '% of the time')

So what we've done is run 10,000 simulations of two experiments. In one we take samples from two normal populations, both with a standard deviation of 1, and means of 1 and 2 respectively. In the other, we take samples from two lognormal populations, again, with a standard deviation of 1, and means of 1 and 2 respectively. As before, when we sample from a normal distribution, our t-test has a power of about 55%. However, in the case of our lognormal distribution we end up with a power of 65%. So does that mean people have been lying to you, and that doing statistics on non-normal data is actually better?. That would be the wrong conclusion, because if we transform the data, we can get even higher power. Specifically, if we log our lognormal data, and make it normally distributed, our power goes up significantly. Check it out:

np.random.seed(1) #set random seed so you get same data n_reps = 1000 #perform 10000 times norm_ps = np.zeros(n_reps) log_ps = np.zeros(n_reps) mean1 = 1 mean2 = 2 variance = 1 for n in range(n_reps): log_sample1 = np.log(np.random.lognormal(mean = np.log(mean1/np.sqrt(1+variance/mean1**2)), sigma = np.sqrt(np.log(1+variance/mean1**2)), size = 10)) log_sample2 = np.log(np.random.lognormal(mean = np.log(mean2/np.sqrt(1+variance/mean2**2)), sigma = np.sqrt(np.log(1+variance/mean2**2)), size = 10)) #perform t test (t, log_ps[n]) = stats.ttest_ind(log_sample1, log_sample2, equal_var=False) fig, ax = plt.subplots(1, 2, figsize=(10,4)) weights = np.ones(10000)/float(10000) ax[0].hist(np.log(np.random.lognormal(mean = np.log(mean1/np.sqrt(1+variance/mean1**2)), sigma = np.sqrt(np.log(1+variance/mean1**2)), size = 10000)), weights = weights, alpha = 0.5, bins = np.linspace(-4,4,100), label="Sample1") ax[0].hist(np.log(np.random.lognormal(mean = np.log(mean2/np.sqrt(1+variance/mean2**2)), sigma = np.sqrt(np.log(1+variance/mean2**2)), size = 10000)), weights = weights, alpha = 0.5, bins = np.linspace(-4,4,100), label="Sample2") ax[0].legend(prop={'size': 10}) ax[0].set_xlabel('Sample value') ax[0].set_ylabel('Probability') ax[0].set_title('Log(Lognormal Population Distributions)') weights = np.ones_like(log_ps)/float(len(log_ps)) ax[1].hist(log_ps, bins = 20, weights = weights) ax[1].set_xlabel('P Value') ax[1].set_ylabel('Probability') ax[1].set_title('P < 0.05 ' + str(round(np.sum(log_ps<0.05)/float(len(log_ps))*100,2)) + '% of the time')

So transforming the data to make it normal has increased our power. So basically, working with lognormal data decreases our type I error rate, but INCREASES our type II error rate, relative to normal data.

Okay, so now we know how to see what kind of effect sampling simulated data from perfect distributions has on our statistics. But can we look at our own data and get an idea of how much of an effect it's non-normality will have on our results? Normality tests like the Anderson-Darling test aren't much use here. If your sample size is small, normality tests will nearly always tell you you data is normal, and if your sample size is large, even negligible deviations from normality will cause the test to tell you that your data is not normal. In reality, we don't want a test to tell you whether your data is exactly, perfectly normal. You want to know whether its deviations from normality matter. So what we can do instead is generate a simulated population that matches our data, then draw random samples from it. Thankfully this is actually pretty easy to in python, though it does require a bit of care.

import numpy as np import matplotlib.pyplot as plt from sklearn.neighbors import KernelDensity data = np.array([0.9390851, 0.9666265, 0.7483506, 0.9564947, 0.243594, 0.9384128, 0.3984024, 0.5806361, 0.9077033, 0.9272811, 0.9013306, 0.9253298, 0.9048479, 0.9450448, 0.8598598, 0.8911161, 0.8828563,0.9820026,0.9097131,0.9449918,0.9246619,0.9933545,0.8525506,0.791416,0.8806227,0.8839348,0.8120107,0.6639317,0.9337438,0.9624416,0.9370267,0.9568143,0.9711389,0.9332489,0.9356305,0.9160964,0.6688652,0.9664063,0.4913768]).reshape(-1,1) x = np.linspace(-1, 2, 100).reshape(-1,1) fig, ax = plt.subplots(1,3, figsize=(10,4), sharey=True) bw = [0.25, 0.05, 0.01] ax[0].set_ylabel('Normalized PDF') for n in range(ax.size): #Create our estimated distribution kde = KernelDensity(kernel='gaussian', bandwidth=bw[n]).fit(data) #Plot our estimated distribution plt.sca(ax[n]) ax[n].fill(x, np.exp(kde.score_samples(x))/np.max(np.exp(kde.score_samples(x))), '#1f77b4') ax[n].plot(data, np.random.normal(loc = -0.05, scale = 0.01, size= data.size), '+k') ax[n].set_xlim(-0.5, 1.5) plt.xticks([0, 1]) ax[n].set_ylim(-0.1, 1.1) ax[n].set_title('Bandwidth = '+str(bw[n])) ax[n].set_xlabel('Sample value')

What we've done is use the `KernelDensity`

function from the scikitlearn package, which generates a probability density function by summing up lots of little functions AKA a kernel, where each data point is. The choice of an appropriate kernel is key to generating a good probability density function. In this case we've chosen a Gaussian kernel, but we can still alter it's "bandwidth" (basically it's standard deviation). I have some data that is the orientation selectivity index of some axon terminals (the data points along the bottoms of the graphs). This measure can only take on values between 0 and 1. We can see that using a bandwidth of 0.25 has done a bad job of modelling our population, as it's saying values above 1 are quite likely, and that the distribution is basically symmetrical. Using a bandwidth of 0.01 has also done a bit of a bad job, as you can see it's made a multi-modal distribution, which I'm pretty sure isn't accurate. Using a bandwidth of 0.05 seems to have done a pretty good job, producing a relatively smooth curve which seems to model out data well. So with that decided, let's use that probability density function to sample some data, and do some t-tests when the null hypothesis is true

import numpy as np import matplotlib.pyplot as plt from scipy import stats from sklearn.neighbors import KernelDensity np.random.seed(1) #set random seed so you get same data n_reps = 10000 #perform 10000 times ps = np.zeros(n_reps) data = np.array([0.9390851, 0.9666265, 0.7483506, 0.9564947, 0.243594, 0.9384128, 0.3984024, 0.5806361, 0.9077033, 0.9272811, 0.9013306, 0.9253298, 0.9048479, 0.9450448, 0.8598598, 0.8911161, 0.8828563,0.9820026,0.9097131,0.9449918,0.9246619,0.9933545,0.8525506,0.791416,0.8806227,0.8839348,0.8120107,0.6639317,0.9337438,0.9624416,0.9370267,0.9568143,0.9711389,0.9332489,0.9356305,0.9160964,0.6688652,0.9664063,0.4913768]).reshape(-1,1) x = np.linspace(-1, 2, 100).reshape(-1,1) kde = KernelDensity(kernel='gaussian', bandwidth=0.05).fit(data) for n in range(n_reps): sample1 = kde.sample(10) sample2 = kde.sample(10) (t, ps[n]) = stats.ttest_ind(sample1, sample2) fig, ax = plt.subplots(1, 2, figsize=(10,4)) #plot the distribution out data was sampled from weights = np.ones(10000)/float(10000) ax[0].hist(kde.sample(10000), bins = 50, weights = weights) ax[0].set_xlabel('Sample value') ax[0].set_ylabel('Probability') ax[0].set_title('Population Distribution') weights = np.ones_like(ps)/float(len(ps)) ax[1].hist(ps, bins = 20, weights = weights) plt.xlabel('P Value') plt.ylabel('Probability') plt.title('P < 0.05 ' + str(np.sum(ps<0.05)/float(len(ps))*100) + '% of the time')

So like when we were dealing with non-normal data before, our type I error rate has gone down. So you can bet our type II error rate is higher than it needs to be. But let's prove it. First, I think that transforming our data by taking a large number, say 100, and raising it to the power of our data will make the data more normal. If that's true, out P value distribution in the null case should be more uniform, and the power in the non-null case should be higher. I'll spare you the code, though it's available here. But the result is what I predicted, as you can see below.

So by applying our transform we've seen how having non-normal data decreases our type I error rate, but increases our type II error rate. For what it's worth, with the kind of data I had, trying to transform it to normality was a bad idea, as it will never become normal, and I'd be better off using a different model altogether, but the underlying concept remains: you can use `KernelDensity`

to allow you to draw data from your own distribution.

Finally, there is one more thing I want to cover. Some statisticians like to point out that when performing good old fashioned t-tests and ANOVAs, there is **no** assumption of normality: the data can follow any distribution and hypothesis testing will be legitimate. In certain cases, this is true. Specifically, if the sample size is big enough, even wildly non-normal data will behave well in parametric tests. However, even with relatively tame non-normal data, the number of samples require before the t-test is well behaved can be very large, well outside the range that most physiologists will be able to achieve. The figure below shows that. I've done what I've done before and performed 10,000 t-tests on data from the same population, with varying sample sizes. Even with a sample size of 100, the distribution of p values isn't uniform.

I should point out that there are a bunch of caveats here. For instance, if you're the first person to measure something, and you have n = 5, then you probably can't usefully use KernelDensity to construct a population distribution, as you have no idea what it looks like. For much the same reason, you can't use the techniques I've demonstrated to measure the power of your hypothesis testing based on your samples: power analysis relies on know *population* variables, not *sample* variables... There are probably other caveats, I'll let you know once people on twitter tell me.

So in conclusion, if hypothesis testing is working properly, when the null hypothesis is true, P values should be uniformly distribution. If the null hypothesis is not true the distribution of P values is skewed. For P values to mean what you think they mean, when you perform standard t-tests and ANOVAs your data needs to be normally distributed unless you have very large sample size. With a bit of care you can use `KernelDensity`

to simulate your population of interest. Then you can perform t-tests on the null-hypothesis to simply look at your type I error rate. And so long as you have a way of transforming your data to make it normal, you can then see what effect doing analysis on your untransformed data has on your type II error rate. Often you don't know how to transform your data to make it normal and that is why simulating the null hypothesis is so useful.

So Nonnegative Matrix Factorization (NMF) can be explained in a variety of ways, but I think the simplest way is that NMF reveals your signal’s component parts. That is to say, if your signal is the sum of a variety of other signals, then NMF will reveal those underlying signals. To me, this naturally makes me think of signals that vary over time, but NMF can be used on a variety of other signals, like images, or text. However, in order to get to the examples, we need to understand how NMF works. So lets break NMF down, word by word. (If you’re fine with the concepts of matrix multiplication and factorization, skip down to “Example 1 – Time series“).

**Nonnegative**

This is the easiest bit to explain. NMF only works on signals that are always positive. I’ve said that NMF splits a signal up into the individual elements that are summed up to make the signal. Because the signal must always be positive, the individual elements must also be positive. However, you can take a negative signal, add some positive value to it and it becomes positive. So then you can use NMF on it, right? Well not always. What is more important to think about is the individual elements. To make your signal, are the individual elements always added together, or is one element subtracted from another? If it is the former, then NMF may work, even if the signal is always negative (once you add a constant to the data to make it positive). If it is the later, NMF will never work. For example, lets say we were interested in the amount of a contaminant in lake. Every day various factories dumped discharge into the lake. Perhaps we could use NMF to figure out how each factory affected the amount of contaminant in the lake (because the total contaminant is a sum of all the discharges from each factory). However, if the lake was sometimes cleaned of the contaminant, NMF probably isn’t appropriate, as we now have a subtractive signal mixed in with our additive signal.

**Matrix**

Even the word “matrix” may intimidate some people. There is nothing magical about matrices. They are just a collection of numbers arranged in a square/rectangle. And just like there are rules for how you multiply numbers, there are rules for how you multiply matrices, rules that you need to understand in order to understand NMF. While these rules may seem a little weird if you’re only used to multiplying numbers, thankfully, they’re not very difficult to remember. It’s easiest to just see the rules in action. Concretely, if we have two matrices, X and Y where:

Then we calculate X × Y by doing the following

So if we look at element *a* in the product matrix X × Y (i.e. the number in first column, first row), we see it is somehow related to the first row of matrix X, and the first column of matrix Y. We see that the first element in the first row of matrix X is multiplied by the first element in the first column of matrix Y, and this is added to the product of the second element in the first row of matrix X and the second element in the first column of matrix Y. It’s worth noting at this point that this means that X × Y produces a matrix that is NOT the same as Y × X.

If you’ve never come across matrix multiplication before, this may all seem very strange, very pointless and very annoying. But let me show *one* way of thinking of matrix multiplication that hopefully will make it make some sense, or at least give it some utility. Let us say that matrix X is some data, we can then see matrix Y and set of weights, where X × Y is now a *sort of* weighted average of X. How does this work? Well, if the first row of X is a set of obervations, then the first column of Y is our weights. As we move across the elements in that row of X (our data), we look to the corresponding element in the column of Y (our weights), and multiply the two together. If the element in Y is large, then that element of X is amplified. If the element in Y is zero, then that element of X is ignored. But lets look at a concrete example. We have matrices X and Y,

and we multiply them together:

Because the first element of the first column of Y is 1, we then apply this weighting to the first element of the first row of X. However, the second element of the first column of Y is zero. This means we apply a zero weight to second element of the first row of X. We sum these two products (3 × 1 + 1 × 0 = 3) and we put the result in the first row, first column of X × Y. When we go down to the second row of X, we again apply the weights of the first column of Y, and the element in the second row, first column of X × Y is 2 × 1 + 5 × 0 = 2. Notice, because the first column of Y was [1 0] now the first column of X × Y is [3 2], i.e. we applied a weighting of 1 to the first column of X, and a weighting of zero to the second column, and so we just reproduced first column of X. In order to calculate what will be in the second column of X × Y, we apply the weights see in the second column of Y, to the data in X.

So to put this really succinctly, the element in the ith row, jth column of the matrix X × Y is the data in the ith row of X, weighted by the weights in the jth column of Y. If any element, say element n, in the jth column of Y is much much larger than all the other elements in that column, then then jth column of X × Y will essentially be equal to the nth column of X, multiplied by the value of that nth element in Y. I’m going restate that idea again, because it is super important later on: *As we travel down a column in Y, if any element is much much larger than all the others, then that same column of X × Y will be essentially equal to the equivalent column of X, multiplied by a constant*.

This may all be a lot to take in, so as hopefully a simple exercise that will help you see how when you calculate X × Y, Y acts like a set of weights. What is the product of these two matrices:

**Factorization**

Hopefully you haven’t completely forgotten high school math, but if you have, factorization is solving this problem: if we have a number *a*, what are all the values of *x* and *y* such that *x × y = a*. So, the factors of the number 4 are 1, 4 and 2, as 1 × 4 = 4, and 2 × 2 = 4. These are strictly speaking, the positive integer factors of 4. There are an infinite number of non-integer factors, e.g. 0.25 × 16 = 4, -0.1 × -40 = 4 etc… We can factor matrices just the same. The factors of matrix *Z* are all the matrices such that *X* and *Y* such that *X × Y = Z*. And just like factorizing numbers, there can be a large number of factors for a given matrix, even an infinite number. But if you’re new to matrices there is a reason that there can be a large number factors for a given matrix that might not be immediately obvious: size.

If our matrix Z is 2 x 2 matrix, what sizes can the factors of Z be? Well any size, so long as X has 2 rows, and Y has 2 columns, and the “inner dimension” of X and Y match (“inner dimension” is the number of columns X has and the number of rows Y has) there may be factors of any size, see:

Matrix factorization (sometimes called “decomposition”) is a complex field, there are tonnes of approaches to factor matrices, with different goals and applications, but the different ways of factorizing a matrix all do fundamentally the same thing, you give it a matrix, and it gives you two (or more) matrices, that when multiplied together give you your original matrix *or close approximation to it*.

Approximation? How can you say two matrices are a factor of another matrix if it is only an approximation? Well it turns out a) factoring matrices is often very computationally complex b) some matrices don’t even have factors, especially given constraints like nonnegativity and c) an approximation is often good enough.

Alright, enough background, let’s get into how this works.

**Example 1 – Time series**

So earlier I gave a made up example: There are 3 factories that dump contaminants into a lake. Whether the factory dumps each day is random (so not all factories are active every day, in fact, if all factories were active every day, we couldn’t tease apart their signals). Each factory dumps a different amount of the contaminant with a different time course. We measure the concentration of the contaminant in the lake. We collect data for 1000 days. Can we figure out what each factories outflow looks like? With NMF we can.

First lets simulate our data:

rng(0); %The output of each factory factory1 = [0; 0; 9; 5; 3; 2; 1; 0; 0; 0; 0; 0]; factory2 = [0; 0; 0; 0; 0; 3; 2; 1; 1; 0; 0; 0]; factory3 = [0; 5; 5; 6; 6; 7; 4; 2; 1; 0.5; 0; 0]; %a matrix to store the all, for ease later on. allfactories = horzcat(factory1, factory2, factory3); num_days = 1000; % we collect data for 100 days data = zeros(12,num_days); % preallocate matrix to store our data h = figure(); set(h, 'MenuBar', 'none'); set(h, 'ToolBar', 'none'); subplot(2,1,1); plot(allfactories); xlabel('Time (hours)'); ylabel('Factory output'); legend('Factory 1', 'Factory 2', 'Factory 3'); title('Individual Factories') for d = 1:num_days which_factories = boolean(randi([0 1], 1, 3)); %Randomly decide which factory discharges today output_for_day = sum(allfactories(:, which_factories), 2); %sum the output of active factories data(:,d) = output_for_day; end subplot(2,1,2); plot(output_for_day); hold on; plot(allfactories(:, which_factories), '--', 'Color', [0.5 0.5 0.5]); hold off xlabel('Time (hours)'); ylabel('Amount in lake'); legend('Total Output', 'Individual Output'); title('Data from day 1000')

Which should make a figure like this, where on the top we have the output of each factory, and on the bottom we have the concentration of contaminant in the lake on an arbitrary day, and how each factory contributed (only two factories dumped that day).

Now we’re going to do our NMF. It’s VERY important to think about this carefully. We need to lay our data out right, or the output of your NMF will be garbage. Specifically, each column of our data matrix needs to be a single observation, while each row of our matrix is the same data from a different observation. So in our example, each column is data from a different day, and each row is a different hour.

Why this is so crucial is that we can only understand the matrix factorization if we understand the data. Specifically, if we organize the data in this way, then when we factorize our data matrix such that *W* × *H* = Data, then the columns of *W* make up our signals that sum to make our data, and *H* make up our weights (in NMF, the factors are always called *W* and *H*, and I don’t know why). Have a look at this figure, and maybe re-read the section on matrix multiplication if you don’t see why.

Running NMF is simply a one line command in MATLAB.

[W, H] = nnmf(data, 3, 'replicates', 100);

Let’s just go over the one line in detail, we are saying we want to perform NMF on the matrix *data*, and return the factors *W* and *H*. The second argument *3* is stating how many individual signals we are trying to break the data down into. In our example, we *know* there are three factories. So we know 3 is a good number. Sometimes in real life, you wont know what value to use here, and it’s beyond the scope of this article to talk about how to choose values. The second and third argument is a parameter-value pair, where I am telling the algorithm to perform the factorization 100 times, and then choose the best one. This can be important. NMF is solved numerically, with the computer first having a random guess at the solution, and then iteratively improving it. Depending on that first guess, the final solution can vary significantly. Hence it is important to have many initial guesses and only choose the best solution.

So we run that line and what do we expect? W×H should equal our data, H should be a matrix filled with 0s and 1s, and the matrix W should be identical to our matrix *allfactories*. Why? Well W×H should equal our data because that is the whole point of NMF: finding two matrices that when multiplied together make our data. We expect H to be full of 0s and 1s, because those are the weights which say on a given day (column) which factories were dumping. Finally, we expect W to be equal to *allfactories*, as *allfactories* is a matrix containing the possible outputs of each factory, i.e. the individual signals that make up our data.

So what do we see? Well it certainly seems like W×H very closely approximates our data:

But matrix H is not full of 0s and 1s. It seems to be full of numbers close to 0 and numbers around 0.045. This is because there are many ways to factor a number, i.e. 0.5 × 8 = 4, just like 2 × 2 does. And in this case, the NMF algorithm had no idea that we expected matrix H to be 0s and 1s, so it did the best it could. For a similar reason, the values in matrix W seem too large. However, this is easy to fix: we simply scale matrix H by dividing it by its maximum value, and scale matrix W by multiplying it by the same value. And lo and behold, the columns of W now contain the estimate of the output of each factory and columns of H tells us on which days each factory dumped.

max(H(:)) ans = 0.0507 % This isn't good scale_factor = max(H(:)); H = H./scale_factor; W = W.*scale_factor; h3 = figure(); set(h3, 'MenuBar', 'none'); set(h3, 'ToolBar', 'none'); plot(W, 'Color', [0.4 0.4 0.7], 'LineWidth', 2) hold on plot(allfactories, 'Color', [0.7 0.4 0.4], 'LineWidth', 2) xlabel('Time (hour)') ylabel('Contaminant amount') legend('Original Factory Ouput','','', 'NMF Approximation','','');

You can see that the approximation isn’t perfect, we could improve that possibly by running the nnmf() function with more replicates, by allowing it more iterations before it terminates and certainly by collecting more data. But nonetheless, I hope you see the importance of what has just happened, with nothing more than the raw data, an assumption that the data was a sum of positive signals, and the knowledge of how many signals there are, NMF has almost perfectly pulled apart those signals.

**Example 2 – Calcium Imaging**

I wont bore you with the code, but I made a model of some calcium imaging data, where I have 5 cells glowing, and their brightness changes over time. Like this:

Conceptually, you should be able to see how each frame of this video is basically a sum of the activity of these five cells. So we should be able to use NMF to factorize this data into a W matrix that contains a picture of each cell, and an H matrix that contains information related to how bright each cell is in time. However, there is a complication. Like any normal person would, I have stored this movie as a [height] x [width] x [number of frames] matrix, i.e. a 3D matrix. NMF only works on 2D matrices. So what do we do? Well have have to reshape our movie so that each column is one frame, and each row is a particular pixel, i.e. the first column is the first frame, and the first row is the top left pixel, like this:

Once we’ve done that, NMF should be able to factorize the matrix like this:

This may seem a bit hard to grasp at first, but go back to the factory example: there, each column of W was the output of each factory, and H told us if the factory was active or not. This is very similar, W is the “shape” of each cell, and H tells us how active that cell is. After we run NMF, we simply reshape each column of W back into a height x width matrix to see each cell, and we plot each row of H to see how active it is.

So when we run the NMF on that movie above, we can extract the following data: the images on the left are the reshaped columns of matrix W, and the graphs on the right are the rows of matrix H (again, I had to apply some scaling factor to W and H).

Now while I’m not suggesting that everyone drops their hand drawn ROIs just yet, you’ve got to admit that one line of matlab code achieving that is impressive. If you want to have a go yourself, the script is available here.

**Example 3 – Faces**

So now we’re going to do something weird. We’re going to try to factorize faces. You can *kinda* see how a face can be seen as a sum of different parts, e.g. eyes, noses, mouth. We have a dataset here called the “Frey Faces” which are 2000 images of Prof. Brendan Frey making faces at a camera.

After some thought, you should be able to see that NMF is unlikely to pull out images of eyes, noses etc, because every image contains eyes, noses etc, i.e. these aren’t the individual signals that are summed to make each photo. What changes are the expressions. So what we should pull out are the signals that vary with each expression. It’s also worth pointing out that, unlike the 3 factory or 5 cell example, in this data set, we don’t know the number of individual signals. But have a quick look at the dataset, I’m going to say there are 8 primary facial expressions. Like always, we need to organize our data, in this case so that each column represents one face, and each row represents a given pixel. Running the NMF, and then reshaping the columns of W back to the right size shows us the supposed individual signals that make up the total.

Some of these individual signals seem believable, others a little strange, but lets roughly gauge how well we did by comparing the original faces to those reconstructed when we multiply H by W.

I believe it is fair to say that these are convincing, though not perfect, estimates of the original.

Obviously, there is no scientific breakthrough here. But I just wanted to show how varied applications of NMF can be.

**Summary**

So NMF is a technique that splits your data up into a set of individual signals and weights to apply to those signals to recreate your original data. It works best when your data is truly a sum of individual signals (e.g. the factory and calcium imaging examples) but it will still work to some degree when the data isn’t quite like that (the faces). There are some caveats when it comes to applying NMF that I have mentioned (e.g. the data needs to be positive, and laid out in an appropriate matrix) but a variety I haven’t mentioned (what is the relationship between the number of observations/columns in the data matrix and the number of individual signals you can decompose the data into?). But this wasn’t meant to be a comprehensive discussion of NMF, just something to get you started.

If you want to see some published application of NMF and to make it clear that I don’t think I’m the first person to invent the idea of using NMF to probe calcium imaging data or faces, have a look at this paper “Simultaneous Denoising, Deconvolution, and Demixing of Calcium Imaging Data” from the Paninski lab or “Learning the parts of objects by non-negative matrix factorization” from Lee & Seung.

]]>**What is a filter?**

To see this in action, have a look at figure 1. On the left is a typical recording shown in the time domain. On the right the two graphs are the exact same signal shown in the frequency domain. The top left graph shows the magnitude of all the frequency components of the signal (showing that most of the signal is made up of low frequency components, and generally less and less as the frequency goes up). The bottom left graph doesn’t look like much, but we need this information to completely describe the time domain signal. And importantly, thinking about phase is super important when it comes to discussing filters.

There are a huge variety of filters, but one of the simplest ways of dividing them up is to consider what signals they don’t affect, i.e. what signals they let *pass* through them. So a low-pass filter lets through low frequency signals and a high-pass filter lets through high frequency data. There are also band-pass filters which only let through signals in a particular frequency range and to break convention, band-stop filters, which stop signals inside a particular frequency range. The frequency at which the signal begins to be filtered is called the “corner frequency” or the “cut-off frequency”. When we describe the actions of a filter, we are often show it’s frequency response, which is shows you what the filter does to various frequency components of your data. This can be estimated by applying the filter to your data, viewing it in the frequency domain and then dividing that by your original signal in the frequency domain. However, as your data doesn’t usually contain all frequency components, it better to create white noise, which is a signal that has the same amount of power in all frequency components. We can convert data to the frequency domain by using the Fourier transform (which I’ve talked about before here). In code, that looks like this.

signal = randn(1000,1); %Create white noise dt = 0.5; %dt is the time between samples time = 1:dt:1000 ; %create time base %% Filter the data %Make a simple running average filter %Take the mean of the original data in a window %of the data tempC(i:i+window) %I know there are faster ways of doing this. tempC_filt = zeros(size(signal)); window = 5; for i = 1:length(signal) end_idx = i+window; if end_idx > length(signal) end_idx = length(signal); end tempC_filt(i) = mean(signal(i:end_idx)); end %% Now convert data to frequency domain nfft = 2^nextpow2(length(signal)); %FFTs are faster if you perform them %on power of 2 lengths vectors f = (1/dt)/2*linspace(0,1,nfft/2+1)'; %Create Frequency vector' dft = fft(signal, nfft)/(length(signal)/2); % take the FFT of original dft_filt = fft(tempC_filt, nfft)/(length(tempC_filt)/2); %take FFT of filters mag = abs(dft(1:nfft/2+1)); %Calculate magnitude mag_filt = abs(dft_filt(1:nfft/2+1)); %Calculate magnitude loglog(f, mag_filt./mag) xlabel('Frequency (Hz)') ylabel('Signal Out / Signal In')The above code produces the graph in figure 3, which shows the the frequency components of the filtered signal divided by the frequency component of the original signal. What it shows us is that the running average filter doesn’t affect signals below 10^-1 Hz (0.1 Hz), but attenuates them above that. It also shows us that a running average filter doesn’t uniformly attenuation signals above 0.1 Hz, but instead filters out signals more strongly at about 0.4 Hz and 0.8 Hz than it does at 0.6 Hz and 1 Hz. This problem is refereed to as “ripples in the stop band”, where the “stop band” is the region of frequencies where the signal is attenuated, and “ripples” are the changing amount of attenuation of your signal with frequency. Compare this the “perfect” filter, where the signal is not attenuated at all up until some frequency, and then the signal is completely obliterated beyond that (I know it’s shown as only being attenuated by 10^-3, but you can’t show 0 on a log axis). This “perfect” filter is what most filters want to be. They want to not alter the signal in frequency ranges of the “pass band”, and completely remove the signal in the stop band. But filters like that are impossible to create in real life.

Now I’ve said a perfect filter doesn’t want to alter signals in the pass-band, and by looking at the graph above, it might appear that so long as you are below about 0.1 Hz you wont affect the signal, but as I alluded to before, we need to consider what the filter does to the phase. We can achieve that by adding only a few more lines to our above code.

signal = randn(1000,1); %Create random time series data dt = 0.5; %dt is the time between samples time = 1:dt:1000 ; %create time base %% Filter the data %Make a simple running average filter %Take the mean of the original data in a window %of the data tempC(i:i+window) %I know there are faster ways of doing this. tempC_filt = zeros(size(signal)); window = 5; for i = 1:length(signal) end_idx = i+window; if end_idx > length(signal) end_idx = length(signal); end tempC_filt(i) = mean(signal(i:end_idx)); end %% Now convert data to frequency domain nfft = 2^nextpow2(length(signal)); %FFTs are faster if you perform them %on power of 2 lengths vectors f = (1/dt)/2*linspace(0,1,nfft/2+1)'; %Create Frequency vector' dft = fft(signal, nfft)/(length(signal)/2); % take the FFT of original dft_filt = fft(tempC_filt, nfft)/(length(tempC_filt)/2); %take FFT of filters mag = abs(dft(1:nfft/2+1)); %Calculate magnitude mag_filt = abs(dft_filt(1:nfft/2+1)); %Calculate magnitude phase = angle(dft(1:nfft/2+1)); phase_filt = angle(dft_filt(1:nfft/2+1)); %% Create an ideal filter frequency response mag_perf = ones(size(mag_filt)); mag_perf(100:end) = 10^-3; subplot(1,2,1) loglog(f, mag_filt./mag, f, mag_perf) xlabel('Frequency (Hz)') ylabel('Signal Out / Signal In') legend('Running Average Filter', 'Perfect Filter') subplot(1,2,2) semilogx(f, unwrap(phase-phase_filt)); xlabel('Frequency (Hz)') ylabel('Phase Shift (Rad)')

Here we can clearly see that the phase of the signal is being affected at frequencies as low as 0.05 Hz. So while the amplitude of these frequency components isn’t being changed, they are being shifted later in time. So this is not a particularly good filter. Let’s move on in our quest to both understand, and create a better filter.

**Filter Poles**

Another way of describing a filter is by the number of “poles” it has. I’m not going to explain to you what a pole is, because it comes from the mathematical analysis of filters, and ultimately, it is irrelevant for understanding what the consequence of the filter is. Simply put a filter with more poles attenuates frequencies more heavily for each Hz you go up. Here a figure will paint a thousand words, so see fig 5. We can create that figure with the below code.

%All our filters will have a corner %frequence (fc) of 1000 Hz fc = 1000; % Corner Frequency of [b2pole,a2pole] = butter(2,2*pi*fc,'s'); %Create a 2 pole butterworth filter [h2pole,freq2] = freqs(b2pole,a2pole,4096); %Calculate Frequency Response [b6pole,a6pole] = butter(6,2*pi*fc,'s'); %Create a 2 pole butterworth filter [h6pole,freq6] = freqs(b6pole,a6pole,4096); %Calculate Frequency Response %Plot frequency response loglog(freq2/(2*pi),abs(h2pole)); hold on loglog(freq6/(2*pi),abs(h6pole)); xlabel('Frequency (Hz)'); ylabel('Magnitude'); legend('2 Pole Butterworth', '6 Pole Butterworth')

As you can see, as we increase the number of poles in our filter the closer it gets to our perfect filter. Or in electronic engineering terms, it has a steeper “roll off”. You might ask “well then why don’t we use 1 million pole filters then?”. Well digitally, we can create very high order filters. However, when it comes to real circuits there are a bunch of reasons why this is difficult, one of which is that it becomes very difficult to find resistors and capacitors of the exact correct value needed. Another reason is that by the time you get up to 8 poles, that is usually good enough. For instance, with an 8 pole Butterworth filter, you attenuate a signal at twice the corner frequency to about 4% of it’s original value. At 10x the corner frequency, the signal is attenuated by about 1 billion fold. But the important thing to note is that even with a filter with a large number of poles it does not remove ALL of the signal above the corner frequency.

**Butterworth?**

%All our filters will have a corner %frequence (fc) of 1000 Hz fc = 1000; % Corner Frequency of n_poles = 8; [zbess, pbess, kbess] = besself(n_poles, 2*pi*fc); %Create 8 pole Bessel [bbess,abess] = zp2tf(zbess,pbess,kbess); %Convert to transfer function [hbess,freqbess] = freqs(bbess,abess,4096); %Calculate Frequency Response [bbut,abut] = butter(n_poles,2*pi*fc,'s'); %Create a 8 butterworth filter [hbut,freqbut] = freqs(bbut,abut,4096); %Calculate Frequency Response [bcheb,acheb] = cheby1(n_poles,3,2*pi*fc,'s'); %Create 8 pole Chebychev cheby1(n,3,2*pi*f,'s'); % with 3dB of ripple [hcheb,freqcheb] = freqs(bcheb,acheb,4096); %Calculate Frequency Response %Plot frequency response figure; loglog(freqbess/(2*pi), abs(hbess)) hold on loglog(freqbut/(2*pi),abs(hbut)); loglog(freqcheb/(2*pi),abs(hcheb)); xlabel('Frequency (Hz)'); ylabel('Magnitude'); legend('Bessel', 'Butterworth', 'Chebychev') xlim([10 10000]) ylim([10^-4 2])However, when comparing the Bessel and the Butterworth, it may seem like the Butterworth is clearly superior: The Bessel filter attenuates signals below the corner frequency just like the Chebyshev! But when you look in the time domain, the reason why physiologists nearly always use a Bessel filter becomes obvious. Other filters producing “ringing” after encountering a step-like response. This is because the other filters delay waves of different frequencies by different times (remember, a step like response is created by a sum of a large number of waves). However, the Bessel filter doesn’t do this. The technical way to describe this is that a Bessel filter has linear phase response. This doesn’t mean it affects the phase of all the frequencies in the pass band equally, but that the way if shifts the phase means that the various frequency components aren’t shifted in time by different amounts (think about the fact that one cycle of a slow wave lasts for longer than a faster wave, so if both waves were shifted by half a wave, then slow wave would be shifted by more time). It turns out the Bessel filter is the best possible way to achieve this.

**Digitizing – The hidden filter**

%% Original, high sample rate signal % Let us imagine this is like our analog signal nfft = 4096; dt = 0.001; % inter sample time = 0.001s = 1kHz sampling t = (1:nfft)*0.001; % time vector % Create signal vector that is the sum of 32 Hz, 64 Hz, and 256 Hz signal = sin(32*2*pi*t) + sin(2*pi*64*t) + sin(2*pi*256*t); f = (1/dt)/2*linspace(0,1,nfft/2+1)'; %Create Frequency vector dft = fft(signal, nfft)/(length(signal)/2); % take the FFT of original mag = abs(dft(1:nfft/2+1)); %Calculate magnitude semilogx(f, mag, 'LineWidth', 2 ); %% Now lets digitize our faux-analog signal % Essentially we are down sampling, % by sampling it every 0.005s = 200 Hz = every 5th sample. signal_dig = signal(1:5:end); nfft_dig = 2^nextpow2(length(signal_dig)); %FFTs are faster if you perform them %on power of 2 lengths vectors f_dig = (1/(dt*5))/2*linspace(0,1,nfft_dig/2+1)'; %Create Frequency vector dft_dig = fft(signal_dig, nfft_dig)/(length(signal_dig)/2); mag_dig = abs(dft_dig(1:nfft_dig/2+1)); hold on semilogx(f_dig, mag_dig, 'LineWidth', 2); xlim([10 1000]) xlabel('Frequency (Hz)'); ylabel('Magnitude'); legend('Original Signal', 'Digitized Signal');In the code, we create a signal that is a sum of 32, 64 and 256 Hz sine waves. We’ll think of this signal as being our analog signal. We then “digitize” it as 200 Hz. We then look at the two signals in the frequency domain. In the original signal we see peaks at 32, 64 and 256 Hz, and after we digitize it at 200 Hz, we see a loss of 256 Hz signal, which perhaps isn’t too surprising when you think about it. But something very disturbing has happened, we now have a new signal with a peak at 56 Hz. This signal is a complete phantom and is called “Aliasing”, and it happened because we need to filter before we digitize (See Fig 9B). This is a correlate of the Nyquist-Shannon sampling theorem, which says that any signal which has no frequency component higher than F, can be perfectly reconstructed if it is sampled at 2F. So, inversely, if we digitize a signal at rate 2F, but it has frequency components higher than F, then the digitized signal we end up with is not a faithful representation of the original. Now let’s try to follow the rules. If we add in these lines of code, then we get Fig 9A.

%% Let's filter the signal before we digitize it [bbut,abut] = butter(8,100/(1000/2)); %Create a 100Hz, 8 pole butterworth filter signal_filt = filter(bbut, abut, signal); %Filter the data signal_filt_dig = signal_filt(1:5:end); dft_filt_dig = fft(signal_filt_dig, nfft_dig)/(length(signal_filt_dig)/2); mag_filt_dig = abs(dft_filt_dig(1:nfft_dig/2+1)); semilogx(f_dig, mag_filt_dig, 'LineWidth', 2);

And this is one of the two reasons why we must ALWAYS low-pass filter data before we digitize it: if we don’t, we will get aliases, i.e. spurious frequency components. This is also what causes strange patterns in some photographs, the original data had higher frequencies (i.e. patterns changing in space) than the camera CCD could sample. The other reason we filter BEFORE was digitize is that high frequency noise can sometimes be larger than the signal we want to record, and hence filtering before digitization allows us to amplify the signal as much as possible without saturating the digitizer.

**Choosing a corner and sampling frequency**

dt = 1/20000; % 20kHz sampling rate t = 0:dt:0.02; % time vector 4ms long signal = exp(-(t-0.01).^2/(0.001/sqrt(2*log(2)))^2); %Gaussian with 1ms FWHM nfft = 2^nextpow2(length(signal)); %FFTs are faster if you perform them %on power of 2 lengths vectors f = (1/dt)/2*linspace(0,1,nfft/2+1)'; %Create Frequency vector dft = fft(signal, nfft)/(length(signal)/2); %take the FFT of original mag = abs(dft(1:nfft/2+1)); %Calculate magnitude subplot(1,2,1); semilogx(f, mag, 'LineWidth', 2); title('Freq Components of Original') xlabel('Frequency (Hz)'); ylabel('Magnitude'); xlim([30 10000]); xticks([100, 1000, 10000]); [bbess, abess] = besself(8, 2*pi*1000); %Create 8 pole 1kHz filter [num,den] = bilinear(bbess,abess,1/dt); signalbess1k = filter(num, den, signal); %Apply Filter [bbess, abess] = besself(8, 2*pi*5000); %Create 8-pole 5kHz filter [num,den] = bilinear(bbess,abess,1/dt); signalbess5k = filter(num, den, signal); %Apply Filter subplot(1,2,2); plot(t*1000,signal, t*1000, signalbess1k, t*1000, signalbess5k, 'LineWidth', 2); xlabel('Time (ms)') ylabel('Amplitude') legend('Original Signal', 'Filtered Signal (1kHz)', 'Filtered Signal (5kHz)', 'Location', 'south');

I should probably add that these rules of thumb are not absolute (apart from digitizing at greater than twice your filter rate: always do that!). If your filter slightly deforms the shape of your action potential, or slightly delays it in time, this may not matter for your particular experiments. But you should still be aware of the fact that if you’re filtering too close to your signal of interest, you are not recording the “true” signal.

**Conclusion/Rules of Thumb**

- A filter removes unwanted frequeny components
- A low-pass filter removes high frequency signals
- A filter with more poles has a steeper roll off
- A Bessel filter has the best behaviour in the time domain
- For a 4-pole Bessel, you must filter at at least 4 times the frequency of the fastest wanted component of your signal
- For an 8-pole Bessel, you must filter at at least 5 times the frequency of the fastest wanted component of your signal
- You must digitize at more than 2-3 times the frequency of your low-pass filter

Stimulus Isolators can be split up it two ways: the first way is battery powered versus mains powered, and the second way is constant current versus constant voltage. For the discussion today, it makes no difference. In both constant voltage and constant current stimulators, a voltage drives current between the two output terminals of the stimulator. And mains powered stimulus isolators try their very best to approximate battery powered stimulus isolators by running the AC voltages from the mains through some kind of transformer, which effectively “isolates” the stimulus isolator from earth. What do I mean what I say “isolates”? I mean that if a power source is isolated from earth, then the output of that power source cannot be measured relative to earth, as no current wants to flow from the power output to earth. Still confused? Let’s look at Fig. 1.

Here we have a 9 volt battery, which truly has 9 volts between it’s positive and negative terminals. We get a multimeter and put it’s red lead on the positive terminal, and connect the black lead to earth. What voltage do we see on the multimeter? We see zero volts. We can explain this in lots of ways, but the simplest is to remember that current must flow in a circuit, in order for the multimeter to sense the voltage on the battery, electrons must flow from the battery to the earth, which if it did, it would not be able to get back to the negative terminal of the battery. Another way of thinking about this is that the multimeter is asking a question something like “What is the a height difference between the peak of Mount Everest (on Earth) and the peak of Mount Olympus (on Mars)?”. Yes, you could measure the height of each mountain from sea level (equivalent to measuring the voltage difference between ground and ground (0V) and the voltage difference between then positive and negative terminals of the battery (9V)) and then compare that measurement, but that isn’t what the question was. Directly asking what the the height difference between the peaks doesn’t even really make sense as a question because they share no common reference point. Similarly, asking what voltage exists between a battery terminal and the earth makes no sense unless the battery connected to earth, like it is in Fig 2. Now current can happily flow from the positive terminal of the battery, into the multimeter, into ground and then back into the battery. Or to put it another way, now the battery and the multimeter have a common reference point: the ground.So what does this have to do with stimulus isolators? Well if we replace the battery with a stimulus isolator, and the multimeter with our electrode connected to a preamplifier, we have a typical electrophysiology set up. If our stimulator is not isolated from ground, then the voltage it creates will be directly sensed by our preamplifier. How does this work? Well have a look at Fig 3.

Fig 3. shows us a non-isolated stimulator. I’ve drawn it’s power supply as a battery, but it could as well be a voltage output from a DAC or any other non-isolated voltage. Our recording electrode is sampling the voltage at V_{Record}, which is the same place as the two electrodes of our bipolar stimulating electrode (yes, there will be some resistance between all of these, but they don’t really matter in order to get a conceptual understanding).

Now you may be saying “Hang on there, I’ve used a stimulus isolator before, and I still see a stimulus artifact”, and of course, you do. This is because while a good stimulus isolator is essentially completely isolated from ground, resistively speaking, it still has some capacitive coupling to ground, usually on the order of 1-10’s of pF. So this is like putting a capacitor between the battery and ground in Fig 3. What this means is that rapidly changing voltages caused by your stimulator still allow some current to flow through R_{Ground}, creating a small stimulus artifact, but the smaller this capacitive coupling is, this smaller and briefer the artifact will be.

So what does this mean? It means don’t ground either of the output pins of your stimulator or you will undo all the engineering that has gone into the design of your stimulator! It means don’t let the leads which run to your stimulator run all over your rig (which increases the capacitive coupling to ground). While I’ve found wrapping your stimulator leads in a grounded shield (or using coaxial cables with a grounded shield) is a good way to remove high frequency noise introduced by the stimulator (or more the leads attached to the stimulator acting like antennas), it also can increase the capacitive coupling to ground, so you get a bigger artifact.

There is another benefit of using stimulus isolators: safety. Specifically, if you were to hold onto the positive output of a non-isolated 100V stimulator, it wouldn’t be fun, as current could flow through you, into ground, and from ground, back to the stimulator. However, if this stimulator was isolated this couldn’t happen, as there is no return path from ground to the stimulator. It’s basically the inverse of why birds can sit on overhead cables carrying mains electricity: the bird is isolated from ground, while the power isn’t, in our case however, it is the power which is isolated from ground, while you aren’t. In practice, if your are an in vitro scientist, this is a pretty theoretically benefit, but if you are using stimulators in a clinical setting to excite nerves or muscles, this is very important, as the last thing you want is current to flow through your patients heart to ground: you only want the current to flow between the two poles of your stimulating electrodes and into the muscle/nerve of choice.

]]>The main script is available here, and it requires distinguishable_colors.m (which in turn requires the image processing toolbox I believe).

The code is relatively well documented/commented, and there is even a ‘Help!’ button. If anyone has any problems with it, please let me know.

]]>

The above says that we have ion channels, and they change to ion channels with the rate constant β. The proportion of open ion channels is n, and so the proportion of closed channels must be 1-n (because all ion channels are either closed or open, and the total proportions must add up to 1). Now this reaction scheme above can be translated into the following equation:

If we remember from the last post, this means “The rate at which n changes over time is equal to -β times n”. So the more n there is, or the larger β is, the faster n decays. Why does that reaction scheme translate to that equation? Because that is what that reaction scheme means: they are just two ways of writing the same thing. Now what if we had the opposite reaction scheme, like this:

Now we are saying that the ion channels change to ion channels, with the rate constant α. Similarly, this translates to the following equation:

Which means that the rate of change of n over time (the rate at which the ion channels open) is equal to α times 1-n, or the proportion of closed channels. Hopefully you can see the pattern of how the reaction scheme turns into an equation. And hopefully, it is also making sense: in the first equation, ion channels could only go from to , so what was on the right hand side of the equation was always negative, as the amount of ion channels (i.e. n) could only decrease (because reaction rates are always positive). Likewise, in the second reaction scheme, the ion channels could only go from to , so what was on the right hand side of the equation was always positive, meaning the amount of channels was always increasing.

The last reaction scheme we need to consider is when the reaction is bidirectional, that is, ion channels can go from to and also from to . That is going to look like this:

This is basically the previous two reaction schemes added together, and so the equation that explains it basically is the previous two equations added together:

If you’re still with me, we’ve done the the hard conceptual work. We just have one more problem to overcome: our rate constants α and β are not actually constant, they vary with voltage. If you’ve ever looked at a current generated during a voltage clamp family, this should make sense to you. The rate at which the current turns on depends on the voltage. Thankfully, the equations for α and β are pretty simple. For the HH potassium channel, α and β are given by

So with all that together, we can calculate what n is. But how do we go from n to a ionic current? Well I said earlier that n was the proportion of open channels. This was a lie. You should think of n as probability that a “particle” in potassium channel is in the open position. In order for the channel to conduct, you need all four of these particles to be in the open position. And just like how if the probability of getting heads in a coin flip is 0.5, then the probability of getting two heads in a row is 0.5 * 0.5, then the probability of having all four n particles in the open position is n * n * n * n or n^{4}. This then tells us the proportion of potassium channels that are open, so we multiply n^{4} by the maximum possible potassium conductance, usually referred to as or “g bar”, which gives us the actual potassium conductance.

So with all of that knowledge under our belt, let’s implement this (just in python this time).

import matplotlib.pyplot as plt import numpy as np import math #Functions to calculate the alpha and beta rates #Voltage converted to mV #Reaction rates converted to per second (Rather than per millisecond) def alpha_n(v): v = v*1000 return 0.01 * (-v-55)/( math.exp((-v-55)/10.0) -1) * 1000 def beta_n(v): v = v*1000 return 0.125*math.exp((-v-65)/80.0) * 1000 # Basic simulation Properties dt = 10E-6; # 10 us timestep # Basic Cell Properties Cm = 100E-12; # Membrane Capacitance = 100 pF v_init = -70E-3; # Initial membrane potential -70 mV #HH Potassium Properties n = alpha_n(v_init)/(alpha_n(v_init)+beta_n(v_init)) # Initial condition for n Gbar_k = 1E-6 # Maximum potassium conductance Gleak = 5E-9 # 5 nS conductance Ek = -80E-3 # Reversal for HH potassium current Eleak = -70E-3 # Reversal potential of -70 mV # Injected Current step current_magnitude = 200E-12; # 200 pA #Injected current, 0.2 seconds of 0 current, 0.3 seconds of some current, #and 0.5 seconds of no current i_inj = np.concatenate( (np.zeros([round(0.2/dt),1]), current_magnitude*np.ones([round(0.3/dt), 1]), np.zeros([round(0.5/dt), 1])) ) #Preallocate the voltage output v_out = np.zeros(np.size(i_inj)) #The real computational meat for t in range(np.size(v_out)): if t == 1: v_out[t] = v_init; #At the first time step, set voltage to the initial condition else: #Calculate how the n particle changes dn = (alpha_n(v_out[t-1]) * (1 - n) - beta_n(v_out[t-1]) * n) * dt n = n + dn #Calculate the current potassium channel conductance Gk = Gbar_k*n*n*n*n; #Calculate what the current through the HH potassium channel is i_k = Gk * (v_out[t-1] - Ek) #Calculate the current through ion channels i_leak = Gleak * (v_out[t-1] - Eleak) i_cap = i_inj[t] - i_leak - i_k #Calculate what i is dv = (i_cap/Cm) * dt #Calculate dv, using our favourite equation v_out[t] = v_out[t-1] + dv #add dv on to our last known voltage #Make the graph t_vec = np.linspace(0, dt*np.size(v_out), np.size(v_out)) plt.plot(t_vec, v_out) plt.xlabel('Time (s)') plt.ylabel('Membrane Potential (V)') plt.show()

Now that doesn’t look very exciting, which shouldn’t be a surprise, as we haven’t inserted any sodium channels yet. However, there are a couple things that we should go over first. On lines 8 through 14 I have two functions that calculate the alpha and beta rates. These are typically written such that mV are the units of voltage, and ms are the units of time (meaning that the reaction rates have a unit of ms^{-1}). I’m trying to keep everything in seconds, volts, amps etc… so I simply multiply the voltage by 1000 to get millivolts, and multiple the result by 1000 to get units of s^{-1}. The other thing that comes out of left field is line 24, where I set the initial condition for n. I’ve said before that when we were calculating we needed to be told what the initial conditions are, as we’re only going to add or subtract from that initial value. It’s just the same here: we need to know what n is to begin with. This equation calculates what n would be if held at the initial voltage for an infinitely long period of time, and so is a pretty good place to start. In the final post, I’m going to explain why that is, but for know, just believe me.

At line 51, we calculate what is, by following the equation I’ve already laid out, and then add it on to n. On line 56 we calculate what the current potassium conductance is, but multiplying the maximum possible potassium conductance by n^{4}. Then we just follow the same equations as last time to calculate what the current is that is charging the capacitor, and then how that changes the membrane potential.

Okay, the *pièce de résistance* is coming: the sodium channel. Unfortunately, the sodium channel has *2* gating particles, m and h. The m particle is called the activation particle and opens when the cell is depolarized. The h particle is called the inactivation particle, and “closes” (it’s value approaches 0) when the cell is depolarized. However, when the cell is depolarized, the m particle opens faster than the h particle closes, giving you the transient nature of the sodium current. That said, the mathematics and code of these particles are essentially identical to the n particle of the potassium channel. We have alpha and beta rate constants for each particle, and they are controlled by identical functions as the n particle, just with different constants. Then the final sodium conductance is the maximum sodium conductance () multipled by m^{3}h. So when you include them, the code looks like this:

import matplotlib.pyplot as plt import numpy as np import math #Functions to calculate the alpha and beta rates def alpha_n(v): v = v*1000 return 0.01 * (-v-55)/( math.exp((-v-55)/10.0) -1) * 1000 def beta_n(v): v = v*1000 return 0.125*math.exp((-v-65)/80.0) * 1000 def alpha_m(v): v = v*1000 return 0.1 * (-v-40)/( math.exp((-v-40)/10.0) -1) * 1000 def beta_m(v): v = v*1000 return 4*math.exp((-v-65)/18.0) * 1000 def alpha_h(v): v = v*1000 return 0.07*math.exp((-v-65)/20.0) * 1000 def beta_h(v): v = v*1000 return 1/( math.exp((-v-35)/10.0) +1) * 1000 # Basic simulation Properties dt = 10E-6; # 10 us timestep # Basic Cell Properties Cm = 100E-12; # Membrane Capacitance = 100 pF v_init = -80E-3; # Initial membrane potential -80 mV ##HH Potassium Properties # Initial condition for particles n = alpha_n(v_init)/(alpha_n(v_init)+beta_n(v_init)) m = alpha_m(v_init)/(alpha_m(v_init)+beta_m(v_init)) h = alpha_h(v_init)/(alpha_h(v_init)+beta_h(v_init)) Gbar_k = 1E-6 # Maximum potassium conductance Gbar_na= 7E-6 # Maximum sodium conductance Gleak = 5E-9 # 5 nS conductance Ek = -80E-3 # Reversal for HH potassium current Ena = 40E-3 # Reversal for HH sodium current Eleak = -70E-3 # Reversal potential of -60 mV # Injected Current step current_magnitude = 200E-12; # 200 pA #Injected current, 0.2 seconds of 0 current, 0.3 seconds of some current, #and 0.5 seconds of no current i_inj = np.concatenate( (np.zeros([round(0.2/dt),1]), current_magnitude*np.ones([round(0.3/dt), 1]), np.zeros([round(0.5/dt), 1])) ) #Preallocate the voltage output v_out = np.zeros(np.size(i_inj)) #The real computational meat for t in range(np.size(v_out)): if t == 1: v_out[t] = v_init; #At the first time step, set voltage to the initial condition else: #Calculate how the particles change dn = (alpha_n(v_out[t-1]) * (1 - n) - beta_n(v_out[t-1]) * n) * dt n = n + dn dm = (alpha_m(v_out[t-1]) * (1 - m) - beta_m(v_out[t-1]) * m) * dt m = m + dm dh = (alpha_h(v_out[t-1]) * (1 - h) - beta_h(v_out[t-1]) * h) * dt h = h + dh #Calculate the potassium channel conductance Gk = Gbar_k*n*n*n*n; #Calculate what the current through the potassium channel is i_k = Gk * (v_out[t-1] - Ek) #Calculate the sodium channel conductance Gna = Gbar_na*m*m*m*h #Calculate what the current through the sodium channel is i_na = Gna * (v_out[t-1] - Ena) #Calculate the current through ion channels i_leak = Gleak * (v_out[t-1] - Eleak) #Sum all the currents i_cap = i_inj[t] - i_leak - i_k - i_na #Calculate dv, using our favourite equation dv = (i_cap/Cm) * dt #add dv on to our last known voltage v_out[t] = v_out[t-1] + dv #Make the graph t_vec = np.linspace(0, dt*np.size(v_out), np.size(v_out)) plt.plot(t_vec, v_out) plt.xlabel('Time (s)') plt.ylabel('Membrane Potential (V)') plt.show()

And there we have it! A spiking neuron. If you read through that code, and compare it to when we just had the potassium channel, you will see that things look very similar. We are basically performing the exact same computations with slightly different alpha and beta functions. This idea extends any ion channel. Some require slightly different functions, and some may have more than one state (i.e. not just open or closed, which can get pretty complex), but the same fundamental rules apply: you set up a reaction scheme, you write the equivalent differential equation, and then you convert that to code.

In the final post, we’ll discuss a few little details I’ve skipped over, and talk about how to do this for real (spoiler: you don’t do it this way!).

]]>

Before I get started, I just want to say, that if you know what a Jacobian Matrix is, or what Runge–Kutta methods are, this post is not for you. This is neuronal modelling for people at the back of the class.

As I said above, the most fundamental equation in all of this is this:

Which is saying “The rate at which the membrane potential changes is equal to the current flowing, divided by the capacitance of the cell”. But to make it even more clear, let’s look into the left part of the equation: which may already be scaring you. I want you to treat this as just a normal fraction, just like where the symbol just means “a little bit of” so means “a little bit of v divided by a little bit of t”, where v is voltage and t is time. So what we’re saying is, in a little bit of time, how much does v change? Because we’re treating as a simple fraction, we can just multiply each side by and get the equation

So what we are saying is: “The amount that the membrane potential changes (in a small amount of time) is equal to the current flowing, divided by the capacitance of the cell multiplied by that small amount of time. And THAT is basically what all neuronal models perform: for each compartment of the cell, it calculates what i is, then performs that computation. What you might notice is that we’re only talking about how the membrane potential **changes**, not what it actually **is**. And because of that, we need to choose where the membrane potential begins or what is called “the initial conditions”.

With that knowledge in mind, let’s convert this mathematics to code and get started with becoming computational neuroscientists.

**Matlab**

% Basic simulation Properties dt = 10E-6; % 10 us timestep % Basic Cell Properties Cm = 100E-12; % Membrane Capacitance = 100 pF v_init = -70E-3; % Initial membrane potential -70 mV % Injected Current step current_magnitude = 100E-12; % 100 pA %Injected current, 0.2 seconds of 0 current, 0.3 seconds of some current, %and 0.5 seconds of no current i_inj = [zeros(round(0.2/dt),1); current_magnitude*ones(round(0.3/dt), 1); zeros(round(0.5/dt), 1)]; %Preallocate the voltage output v_out = zeros(size(i_inj)); %The real computational meat for t = 1:length(v_out) if t == 1 v_out(t) = v_init; %At the first time step, set voltage to the initial condition else i_cap = i_inj(t); %Calculate what i is dv = i_cap/Cm * dt; %Calculate dv, using our favourite equation v_out(t) = v_out(t-1) + dv; %add dv on to our last known voltage end end %Make the graph t_vec = dt*[1:length(v_out)]; plot(t_vec,v_out); xlabel('Time (s)') ylabel('Membrane Potential (V)')

**Python**

import matplotlib.pyplot as plt import numpy as np # Basic simulation Properties dt = 10E-6; # 10 us timestep # Basic Cell Properties Cm = 100E-12; # Membrane Capacitance = 100 pF v_init = -70E-3; # Initial membrane potential -70 mV # Injected Current step current_magnitude = 100E-12; # 100 pA #Injected current, 0.2 seconds of 0 current, 0.3 seconds of some current, #and 0.5 seconds of no current i_inj = np.concatenate( (np.zeros([round(0.2/dt),1]), current_magnitude*np.ones([round(0.3/dt), 1]), np.zeros([round(0.5/dt), 1])) ) #Preallocate the voltage output v_out = np.zeros(np.size(i_inj)) #The real computational meat for t in range(np.size(v_out)): if t == 0: v_out[t] = v_init; #At the first time step, set voltage to the initial condition else: i_cap = i_inj[t]; #Calculate what i is dv = i_cap/Cm * dt; #Calculate dv, using our favourite equation v_out[t] = v_out[t-1] + dv; #add dv on to our last known voltage #Make the graph t_vec = np.linspace(0, 1, np.size(v_out)) plt.plot(t_vec, v_out) plt.xlabel('Time (s)') plt.ylabel('Membrane Potential (V)') plt.show()

Despite the length of that code, all the import stuff is happening in the for loop. At the first time point we set the voltage equal to the initial voltage, and from then on at each time point we calculate the new voltage by adding something (dv) onto the previous voltage. Now you may be looking at the voltage trace and thinking that it doesn’t look very neuronal. Well that is because we have no ion channels. In order to include ion channels, we need to know how they behave. The standard model for an ion channel starts with this simple formula:

Where we are saying the ionic current for an ion channel X (), is equal to the conductance of the ion channel (), multiplied by the driving force, where the driving force is the membrane potential () minus the reversal potential of that ion channel (). To include this into our model, we need to think very carefully about the direction of that current. That is to say, when is positive, is that current going to make the membrane potential more positive, or more negative? Well let’s think about it, let’s imagine an EPSP: if Vm is -70 mV and the reversal potential is 0 mV, and lets say the conductance is 1 nS. So the current will be -70 nA. Now we know an EPSP must depolarize a cell, but if we plug a negative current into our favourite equation we would get a negative , meaning that the current would hyperpolarize the cell. This is due to the fact that calculates the current through the ion channel, which is equal and opposite to the current going through the membrane capacitor.

So with that understanding of how ion channels behave, let’s include a leak potassium current (i.e. the conductance doesn’t vary over time or voltage).

**Python** Important additions on lines 30 and 31

import matplotlib.pyplot as plt import numpy as np # Basic simulation Properties dt = 10E-6; # 10 us timestep # Basic Cell Properties Cm = 100E-12; # Membrane Capacitance = 100 pF v_init = -70E-3; # Initial membrane potential -70 mV Gk = 5E-9 # 5 nS conductance Ek = -90E-3 # Reversal potential of -90 mV # Injected Current step current_magnitude = 100E-12; # 100 pA #Injected current, 0.2 seconds of 0 current, 0.3 seconds of some current, #and 0.5 seconds of no current i_inj = np.concatenate( (np.zeros([round(0.2/dt),1]), current_magnitude*np.ones([round(0.3/dt), 1]), np.zeros([round(0.5/dt), 1])) ) #Preallocate the voltage output v_out = np.zeros(np.size(i_inj)) #The real computational meat for t in range(np.size(v_out)): if t == 1: v_out[t] = v_init; #At the first time step, set voltage to the initial condition else: i_ion = Gk * (v_out[t-1] - Ek) #Calculate the current through ion channels i_cap = i_inj[t] - i_ion; #Calculate what i is dv = i_cap/Cm * dt; #Calculate dv, using our favourite equation v_out[t] = v_out[t-1] + dv; #add dv on to our last known voltage #Make the graph t_vec = np.linspace(0, 1, np.size(v_out)) plt.plot(t_vec, v_out) plt.xlabel('Time (s)') plt.ylabel('Membrane Potential (V)') plt.show()

**Matlab** Important additions on lines 25 and 26

% Basic simulation Properties dt = 10E-6; % 10 us timestep % Basic Cell Properties Cm = 100E-12; % Membrane Capacitance = 100 pF v_init = -70E-3; % Initial membrane potential -70 mV Gk = 5E-9; % 5 nS conductance Ek = -90E-3; % Reversal potential of -90 mV % Injected Current step current_magnitude = 100E-12; % 100 pA %Injected current, 0.2 seconds of 0 current, 0.3 seconds of some current, %and 0.5 seconds of no current i_inj = [zeros(round(0.2/dt),1); current_magnitude*ones(round(0.3/dt), 1); zeros(round(0.5/dt), 1)]; %Preallocate the voltage output v_out = zeros(size(i_inj)); %The real computational meat for t = 1:length(v_out) if t == 1 v_out(t) = v_init; %At the first time step, set voltage to the initial condition else i_ion = Gk * (v_out(t-1) - Ek); %Calculate the current through ion channels i_cap = i_inj(t) - i_ion; %Calculate what i is dv = i_cap/Cm * dt; %Calculate dv, using our favourite equation v_out(t) = v_out(t-1) + dv; %add dv on to our last known voltage end end %Make the graph t_vec = dt*[1:length(v_out)]; plot(t_vec,v_out); xlabel('Time (s)') ylabel('Membrane Potential (V)')

Now *that* looks more like a neuron. A pretty dull neuron, with no action potentials, but at least we’re getting there. And how do we put in action potentials? Well we have to go and talk to two people: Hodgkin and Huxley. And we’re going to talk to them in part 2.

Typically, when most neuroscientists want to make an electrical model of a cell, (and prepare yourself, as there are going to be quite a few of those) they do something like this:

Where you have a conductance of a give ion channel X ( ) that has a reversal potential ( ) and these produce currents that charge up the membrane capacitor ( ). We literally HAVE to make the model more complex to even generate a field potential, but before we do, we’re going to simplify matters. We know those conductances create a current, which is explained with the equation: . The different currents add up linearly, giving us a total ionic current. So let’s just scrap all of these reversal potentials and conductances, and treat them as a current source. (Personally, I think this is a better way to get an intuitive understanding of membrane biophysics anyway. Membrane resistors cause people lots of confusion). This gives us a simpler looking model. This is a very good way to think about cells. But it is a single “compartment”. That is to say our entire cell always has the exact same membrane potential throughout space. This is, of course, not accurate for most neurons, and more importantly, if a cell truly behaved like this, we would not get field potentials. In order to get field potentials, we need to add a second compartment. Now don’t get scared! This is just two single compartment models, where the intracellular side is joined by a single axial resistor . The outside of each compartment is connected to ground by a resistor and . And finally, the outside of each compartment is connected via an extracellular resistor .Let’s imagine glutamate is released onto the dendrite, which causes AMPA channels to open. This causes an increase in which begins to charge the local capacitor . This causes the local voltage in the dendrite to increase (i.e. we have an EPSP). Now there is a difference in voltage between and and this causes current to flow down the dendrite to the soma (i.e. grows from zero). Now we need to consider a fundamental rule of electricity: Any current that flows into something must always flow out of it too. What this means is the axial current is equal to the sum of the currents and the current that flows through the capacitor . So we know their *must* be current flowing into the part of the circuit called and because of that, the voltage at must be growing. If is non-zero, then the current *must* also be non-zero. And there is our field potential! We need to complete matters though: We started with the idea that an EPSP was occurring in the dendrite, due to an increase in . That current must have come from somewhere, and it is equal to the sum of , and the opposite of the current flowing through (if that confuses you, just consider all the currents flowing into the the part marked and know that they must add to zero). Of course, as soon as is non zero, then we must have a voltage forming over and now we have a field potential occurring at .

Now that may have been a little complex. I urge you to draw the circuit yourself, and follow how the current and voltages form, however, we can make it even simpler. Let’s just explain it with words. If an EPSP forms locally, current must flow from where the membrane potential is more depolarized, to the rest of the cell, that is we get axial current. Because of the rule that the current flowing into a compartment must be equal to the current flowing out of it, when current flows into the second compartment, it must flow out of it. In a two compartment model, this means all axial current flows out into the extracellular space. Some of this current must flow back to the extracellular space where the EPSP formed initially, and some must flow to ground. The current that flowed to ground creates a positive field potential outside the second compartment. Because some of the current flowed to ground, then the current that flowed back to the extracellular space outside of where the EPSP formed is *not* equal to what flowed into the cell to begin with, hence some current must flow from ground to that site, to make up the difference. This creates a negative voltage, relative to ground. This concept, expanded across multiple compartments is what gives rise to “current sources” and “current sinks”, and is fully explored in “current source density experiments”. The current sink is where positive charge moves into the cell (e.g. the site of an EPSP) and a current source is where the positive charge moves out of the cell.

Below I’m simulating the field potential generated in the above two compartment model when an 5 nS EPSP impacts the dendrite. Adjust the properties of the cell to get a feel about what is happening inside and outside the cell. (Forgive the scripts slowness, each graph requires about 40,000 computations)

Now what you should notice is that no matter how you change the parameters of the simulation, the local field potential at the soma is equal and opposite to the field potential at the dendrite. This is a property of all two compartment simulations where the resistors to ground are equal. Exactly why that is might not be immediately apparent, but if we do simplify the circuit, we can get a feel for why that is. (Don’t get concerned about the direction that the arrows are pointing to show current flow. These are not meant to suggest the direction of positive charge movement. We just need to define current in some direction!)

Now we can see what is going on: Any current that flows into one compartment () *must* come out of the other (i.e. ). Now, the charges that make up must come from somewhere, and the current must go some where, so will be non zero. This will cause a voltage drop over meaning that the voltage must be greater than zero. This means current will flow between ground and (i.e. is not zero). Now is the sum of and , and we know is equal to , then this tells us that must be equal and opposite to (i.e. the difference between and must go somewhere, and it goes to ground). Again, go through it yourself and you’ll see that this must be the case.

Now, this is only a two compartment model, so it will never accurately represent the complex field potentials seen in real neural tissue. But it DOES give you the fundamental insight: all current that flows into a cell must flow out of the cell at some point in space. This current then flows back to where the current moved into the cell. This extracellular current flow gives rise to the local field potential. These same ideas explain why the local field potential and EEG usually go in opposite directions and even why the signals of a multi-lead ECG look different from lead to lead.

]]>

The electronics are dead easy: You buy an MCP9808 breakout board, and you connect it to a Raspberry Pi. The MCP9808 is a high accuracy thermometer that communicates via the I2C protocol, something that the Raspberry Pi supports natively. You solder wires to the Vdd, GND, SCL, SDA pins of the breakout board, and connect them to pins 1 (3.3 Volts), pin 9, pin 5 and pin 3, respectively, making sure that the wires are long enough to reach inside the fridge from where you can mount the Raspberry Pi. It’s also probably a good idea to coat the breakout board in silicon sealant.

In order to setup the Raspberry Pi, you first need to turn on SSH, so you can remotely access the Pi once it is in place, and turn on the I2C interface. In order to do that, turn your Pi on, log in, and run sudo raspi-config. It should be pretty obvious what to do: go down to advanced options, and SSH and I2C are both there.

Now you need to install python support for the MCP9808. In the terminal, type

sudo apt-get update sudo apt-get install build-essential python-dev python-pip python-smbus git sudo pip install RPi.GPIO cd ~ git clone https://github.com/adafruit/Adafruit_Python_MCP9808.git cd Adafruit_Python_MCP9808 sudo python setup.py install

With that done, confirm that it is all working with:

cd examples sudo python simpletest.py

Where you should see the temperature at the sensor printing out. Press Ctrl-C to quit.

Now I suggest you set up a gmail address specifically for your freezer. It’s not hard. Also, we’re going to save the gmail password in a public folder, so you’re password will be wide open. So I REALLY suggest you set up a gmail account for the freezer. You also need to go to log into the account and turn on access for less secure apps (I think).

Get back to your home directory, and open a new file called read_temp.py

cd ~ sudo nano read_temp.py

And put in the following code, while changing the relevant lines about who you want to send the emails to (the variable emails, on line 20), and the details of your new, freezer specific email address (line 29 to 38).

#!/usr/bin/python import time import Adafruit_MCP9808.MCP9808 as MCP9808 import smtplib from collections import deque from email.mime.text import MIMEText #print '**************************************' #print '-----EXECUTING TEMP SENSOR SCRIPT-----' #print '**************************************' #Connect to i2c sensor with default pins (SCL=P9_19, SDA = P9_20) sensor = MCP9808.MCP9808() #Initialize sensor sensor.begin() emails = 'email1@gmail.com,email2@gmail.com,email3@gmail.com' #comma separated values temp_buffer = deque(maxlen=4) #4 position buffer for the last four reads def send_mail(address, string): #This should really be in a try statement text = string msg = MIMEText(text) me = 'YOUREMAILADDRESS@gmail.com' them = address msg['Subject'] = 'Temperature Warning' msg['From'] = me msg['To'] = them mailserver = smtplib.SMTP("smtp.gmail.com", 587) mailserver.ehlo() mailserver.starttls() mailserver.ehlo() mailserver.login("YOUREMAILADDRESS@gmail.com", "PASSWORD") mailserver.sendmail(me, msg['To'].split(','), msg.as_string()) mailserver.close() #print 'Email Sent' def mean(l): #The worlds silliest function return sum(l)/len(l) def buf_to_str(buf): string = "" for el in buf: string = string + str(el) + ", " return string def check_temp(): temp = sensor.readTempC() temp_buffer.append(temp) if mean(temp_buffer) > -10: string = "The temperature is currently " + str(temp) + ". The last four readings have been " + buf_to_str(temp_buffer) + ". The time is " + time.strftime('%H:%M:%S') send_mail(emails, string) time.sleep(60) while True: check_temp()

Press Ctrl-X to exit, and confirm the save. Confirm that works properly with sudo python read_temp.py.

Now we need to set it up, so this runs on startup. We should set up a bash script to execute this script, so in the terminal, type sudo nano launcher.sh. Then type in the following:

#!/bin/sh sleep 10 sudo python /home/pi/read_temp.py

We need to set that file to be executable, so type sudo chmod 777 launcher.sh. Finally, we need to launch that file on startup. So enter sudo nano /etc/rc.local and at the end of that file (but before the exit 0 line), type:

/home/pi/launcher.sh &

Making sure that the final line of that file is exit 0. Press Ctrl-X to exit, and then save.

And that should be that. Mount your Pi somewhere around the freezer, get the wires in through the door seals without distrupting them too much, and hang the sensor somewhere senisble, so it doesn’t get in the way too much. Make sure your Pi is connected to the internet, and you should get emails whenever the freezer gets too warm.

If you give this a try, and it’s not working, I’m happy to try to help, but I’m no Unix expert, so I can’t promise I’ll be any use.

]]>

The diagram to the right shows the fundamental form of the voltage divider. We have a supplied voltage (Vin) which is applied across two resistors, then we access the voltage just across one of the resistors (Vout). Vin has now been split (divided) across these two resistors. Why do we do this? Well in circuit design, this might be because some part of out circuit might need a lower voltage to work, and it can be more sensible to make a voltage divider, rather than put in another battery (or voltage regulator) to produce this voltage. What is the value of Vout? Well some very simple math should tell us.

The total resistance of the circuit is just R1 + R2 (resistors in series add). So the current that flows through R1 and R2 can be calculate by Ohms law:

The voltage over R2 = Vout, and again, by Ohms law

So you can either remember that formula, or work it out on the fly. What you should see is that as R2 becomes large relative to R1, then Vout = Vin and as R2 becomes small relative to R1, Vout approaches 0. But perhaps a demonstration might help give you a more intuitive feel, so have a play around with this.

So hopefully you’ve got an idea of how voltage dividers work: If R2 has a low resistance then Vout approaches zero, and if R2 has a high resistance, Vout approaches Vin. So how can we, as electrophysiologists use this? Well why don’t we make Vin some signal we want to record but it’s mixed with some noise we don’t, and wouldn’t it be great if we had a special resistor that would have a low resistance when signals we didn’t want were coming through, and a high resistance at all other times. Let’s say we don’t want high frequency signals; only low frequency signals. So we need a special resistor that has a high resistance to low frequencies, and a low resistance to high frequencies. Well we do have such a ‘resistor’, and it’s called a capacitor! When the voltage across a capacitor is changing rapidly, it draws lots of current (i.e. it behaves like a low resistance) while when the voltage is changing very slowly, it draws almost no current (high resistance). So if we replace R2 with a capacitor we create a low-pass filter. By choosing the values of R1 and the capacitor, we can decide the cut off frequency (Fc) which is roughly the point where the signals start being significantly attenuated, with the equation

I’ve made an interactive demo, to try to give you a feel for how it works.

By swapping the positions of resistor and the capacitor, you can make a high-pass filter, and theoretically, you can also use an inductor (which presents a high resistance to rapidly changing voltages), though you probably wont do this in practice. Furthermore, in practice, filters are made out of many more repetitions of this kind of circuit, and with integrated circuits called op-amps, to make them perform better, but fundamentally they behave in the same fashion: they are frequency dependent voltage dividers. As an electrophysiologist, a lot of the limitations of your favorite recording technique can understood by looking for a voltage divider. You can consider how voltages spread from the soma to the dendrite by using the voltage divider as a metaphor. Indeed, I doubt there is a bioelectrical signal that you can’t get some insight into by looking at it through the lens of the voltage divider.

]]>