Don’t be afraid of parallel programming in Python

When it comes to writing code, I have always been a believer in the rules of optimization, which state:

The first rule of optimization is: Don’t do it.

The second rule of optimization (for experts only) is: Don’t do it yet.

These rules exist because, in general, if you try to rewrite your code to get a speedup, you will probably waste a lot of time and end up with code that is unreadable, fragile, and only a few milliseconds faster. This is especially true in scientific computing, where we write in high-level languages that use highly optimized libraries to perform computationally intensive tasks.

However, there are times when those rules of optimization can be broken. And there is a super simple way of leveraging parallel programming in Python that can give you a >10x speedup.
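To give a flavour of what "super simple" can look like, here is a minimal sketch using the standard library's `multiprocessing.Pool`. This is an assumption on my part about the kind of approach meant here, and `slow_task` is a made-up stand-in for whatever expensive, independent calculation you actually run. The speedup you see depends on how many cores you have and how heavy each task is.

```python
import math
import time
from multiprocessing import Pool


def slow_task(n):
    """Hypothetical stand-in for an expensive, independent computation."""
    return sum(math.sqrt(i) for i in range(n))


def run_serial(inputs):
    """Run every task one after another in a single process."""
    return [slow_task(n) for n in inputs]


def run_parallel(inputs, processes=None):
    """Spread the tasks across worker processes (default: one per CPU core)."""
    with Pool(processes=processes) as pool:
        # Pool.map hands the inputs out to the workers and keeps the output order.
        return pool.map(slow_task, inputs)


if __name__ == "__main__":
    inputs = [200_000] * 16

    start = time.perf_counter()
    serial_results = run_serial(inputs)
    serial_time = time.perf_counter() - start

    start = time.perf_counter()
    parallel_results = run_parallel(inputs)
    parallel_time = time.perf_counter() - start

    print(f"serial:   {serial_time:.2f} s")
    print(f"parallel: {parallel_time:.2f} s")
    print("same results:", serial_results == parallel_results)
```

The `if __name__ == "__main__":` guard matters: on platforms that start workers by re-importing your script, leaving it out makes every worker try to launch its own pool.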

Principal Component Analysis (PCA) For Dummies

PCA is one of the first tools you’ll put in your data science toolbox. Typically, you’ve collected so much data that you don’t even know where to begin, so you perform PCA to reduce the data down to two dimensions, and then you can plot it and look for structure. There are plenty of other reasons to perform PCA, but that’s a common one.

However, when you ask how PCA works, you get either a simple graphical explanation or long webpages that boil down to the statement “The principal components of your data are the eigenvectors of your data’s covariance matrix”. If you understood what that meant, you wouldn’t be looking up how PCA works! I want to try to provide a gateway between the graphical intuition and that statement about covariance and eigenvectors. Hopefully this will give you 1) a deeper understanding of linear algebra, 2) a nice intuition about what the covariance matrix is, and, of course, 3) an understanding of how PCA works.
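If it helps to see that statement about covariance and eigenvectors as code before the intuition arrives, here is a minimal NumPy sketch. The toy data and the function name `pca` are my own assumptions for illustration; in real work you would probably reach for a library implementation instead.

```python
import numpy as np


def pca(data, n_components=2):
    """PCA via the eigenvectors of the covariance matrix.

    data has shape (n_samples, n_features).
    """
    # Centre each feature so the covariance describes spread around the mean.
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)

    # eigh is meant for symmetric matrices; it returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1]  # largest variance first
    eigenvalues = eigenvalues[order]
    components = eigenvectors[:, order[:n_components]]

    # Project the data onto the top components.
    projected = centered @ components
    return projected, components, eigenvalues


# Toy data (an assumption for illustration): three features driven by one hidden variable.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 1))
data = np.hstack([latent, 2 * latent, -latent]) + 0.1 * rng.normal(size=(500, 3))

projected, components, eigenvalues = pca(data, n_components=2)
print("fraction of variance per component:", eigenvalues / eigenvalues.sum())
```

Each column of `components` is an eigenvector of the covariance matrix, and the matching eigenvalue is the variance of the data along that direction.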

How bad is my distribution really?

Most scientists think that t-tests, ANOVAs, linear regression, etc. rely on your data being normally distributed. Is that really true? And given that no data is perfectly normal, how normal does your data have to be? Can I see what effect non-normality will have on my hypothesis testing? I’m going to use some simple tricks that allow us to see how bad our distribution really is.
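One way to probe this, sketched here as an assumption rather than as the method the full post uses, is to simulate lots of experiments where the null hypothesis is true and count how often a t-test cries wolf. The group size, the exponential distribution, and the function name are all choices made for illustration.

```python
import numpy as np

N_PER_GROUP = 20
# Two-sided 5% critical value of Student's t with 2 * 20 - 2 = 38 degrees of freedom.
T_CRIT = 2.024


def false_positive_rate(sampler, n_experiments=20_000, seed=0):
    """Fraction of null experiments in which a two-sample t-test rejects at 5%.

    sampler(rng, size) must return an array of draws with the given shape.
    Both groups come from the same distribution, so every rejection is a false positive.
    """
    rng = np.random.default_rng(seed)
    a = sampler(rng, (n_experiments, N_PER_GROUP))
    b = sampler(rng, (n_experiments, N_PER_GROUP))

    # Pooled-variance t statistic for two groups of equal size.
    pooled_var = (a.var(axis=1, ddof=1) + b.var(axis=1, ddof=1)) / 2
    t = (a.mean(axis=1) - b.mean(axis=1)) / np.sqrt(pooled_var * 2 / N_PER_GROUP)
    return float(np.mean(np.abs(t) > T_CRIT))


normal_rate = false_positive_rate(lambda rng, size: rng.normal(size=size))
skewed_rate = false_positive_rate(lambda rng, size: rng.exponential(size=size))

print(f"normal data:      {normal_rate:.3f}")
print(f"exponential data: {skewed_rate:.3f}")
```

If the test is behaving, both numbers should sit near 0.05. How far the second one drifts from that tells you how much the skew actually costs you at this sample size.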

Neuronal Modelling – The very basics. Part 1.

I think a lot of people are confused about neuronal modelling. I think a lot of people think it is more complex than it is. I think too many people think you have to be a mathematical or computational wizard to understand it, and I think that leads to a lot of good modelling being discounted and a lot of bad modelling being let through peer review. I’m here to tell you that biophysical models of neurons don’t have to be hard to implement or understand. I’m going to start you off on the ground floor. In fact, below the ground floor: this is the basement level. All you need to know is a little coding (I’m going to do both MATLAB and Python to start). But I should temper your expectations. When we are done, you’re not going to be ready to publish fully fledged multi-compartment models of neurons, but at least you will understand the fundamental principles of what is happening. And the most fundamental of all is this…

$\frac{\mathrm{d}v}{\mathrm{d}t} = \frac{i}{C}$
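Read as code, the equation above says: at every small time step, nudge the voltage by the current divided by the capacitance. Here is a minimal forward Euler sketch of that idea. The function name and the numbers (100 pA into 100 pF for 10 ms) are assumptions picked to make the arithmetic easy, not values from any particular neuron.

```python
def integrate_voltage(current_amps, capacitance_farads, dt, n_steps, v0=0.0):
    """Forward Euler integration of dv/dt = i / C for a constant current."""
    v = v0
    trace = [v]
    for _ in range(n_steps):
        # One small step: the change in v is the slope (i / C) times the step size.
        v += dt * current_amps / capacitance_farads
        trace.append(v)
    return trace


# 100 pA into 100 pF gives a slope of 1 V/s; 100 steps of 0.1 ms cover 10 ms.
trace = integrate_voltage(100e-12, 100e-12, dt=1e-4, n_steps=100)
print(f"voltage after 10 ms: {trace[-1] * 1e3:.2f} mV")
```

With a constant current the voltage simply ramps in a straight line, here by about 10 mV over the 10 ms.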

Shouting Into the Void: Interacting with PubMed part II

So you’ve just been accepted for publication. At this point, after months of extra experiments and back and forths with reviewers, you’re probably well and truly sick of your paper. However, as the months roll by, you look at the paper again. You note a cute bit of data analysis here, a nice turn of phrase there, and with a gleam in your eye you look to see how many citations you’ve gotten. And unless you’re very lucky, that number might well still be in single digits. Twelve months of work, and fewer than 10 people have ever cited it. Maybe you feel like it was all for nothing. Well, I’m here to make you feel better, because your work was much more important than that.

Interacting with PubMed. Part I

As a scientist, your life’s work is your publication list. I like to be intimate with mine. Sometimes I just stare at it. I’d buy it a glass of wine if I could. Maybe even caress it softly. Sure, she ain’t much to look at, but she’s mine, and I want to show her off. And if you want a job, you’re going to want to show yours off too. So I’m going to show you how to scrape your publication list from PubMed with Python.
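As a taste of what that can look like, here is a minimal sketch that asks NCBI's E-utilities `esearch` endpoint for the PubMed IDs matching an author search. This is one possible route and an assumption on my part, not necessarily the one the full post takes. The helper names and the author string "Smith J" are placeholders.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_search_url(author, max_results=100):
    """Build an esearch URL that looks up an author in PubMed and asks for JSON."""
    params = {
        "db": "pubmed",
        "term": f"{author}[Author]",
        "retmode": "json",
        "retmax": max_results,
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


def extract_pmids(response):
    """Pull the list of PubMed IDs out of a decoded esearch JSON reply."""
    return response["esearchresult"]["idlist"]


def fetch_pmids(author, max_results=100):
    """Query PubMed over the network and return the matching PubMed IDs."""
    with urlopen(build_search_url(author, max_results)) as reply:
        return extract_pmids(json.load(reply))


if __name__ == "__main__":
    # Placeholder author; swap in your own name as it appears on your papers.
    print(build_search_url("Smith J"))
```

Calling `fetch_pmids("Smith J")` hits the network, so it is left out of the demo above. A bare surname plus initial will also match every other author who shares it, so expect to filter the results.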