#### Stats question

4

I have a bit of a stats issue here, but I'll simplify the actual specifics so it doesn't get too boring...

Imagine I have investigated the effects of five drugs on heart rate. I give subject number 1 drug A and monitor their heart rate at hourly intervals for 24h. After a washout period, I then give this same person drug B, and again monitor heart rate hourly. I do this for drug C, D, and E on the same person. I then repeat this procedure on 2 more people.

I draw a line graph, showing change in HR (compared to the pre-drug HR) over time, with five lines, one for each drug. I want to see whether there is a significant difference between the change in HR induced by drugs A-E (and a placebo control), so I decide to do repeated measures ANOVA for each particular timepoint, with corrected pairwise comparison. However, at one time point (say 15h), the distribution of heart rates over the 3 volunteers is not normal, and therefore I have to do the ANOVA on ranks.

My question is - does this then mean that I have to do EVERY ANOVA on this graph as a ranks ANOVA, or can I just use normal ANOVA for all the others? Effectively, I'm asking whether you can mix rank and non-rank tests of the same type, as long as you define the test of normality which is used to determine this?

On the one hand, I know the ANOVA by ranks is a bit more robust, but then it seems stupid that I could lose significance at another time point, purely because one set of data at another timepoint is not normally distributed.

Can anyone help?

J

I can't really help directly on this, but I was at a meeting yesterday when the subject of statistics came up, and the prof said that whatever you do, you must always know exactly why you did it. He said whenever he sees a load of statistics he wants to make sure people know all about them and why they made the choices they did. (I'm avoiding stats because my sample is nowhere near big enough, and luckily enough the only other work done in my area used percentages!) There was also a lot of talk about whether advice should be sought from a mathematician or a researcher on what to do, and that might be your answer.

M

It sounds like what you're saying is you want to do a separate ANOVA for each time-point - the reason you won't be able to find an answer to the question is because you should be including 'time' as a variable in the analysis. Otherwise, you don't truly control for multiple comparisons. So if you're doing a single 5x25 ANOVA with the factors TIME and DRUG and one of your conditions doesn't have a normal distribution, then the whole thing should be a single ranks ANOVA.

FYI time-series data is not suited to a standard ANOVA at all because you make so many comparisons - if you correct (e.g. with Bonferroni) then it becomes near impossible for an effect to reach significance; if you don't, then with that many comparisons the odds are you'll get a few false positives. If it were me, I'd bootstrap the data and construct my own ANOVA tables - that option might not be available to you though if you can't easily get training in robust statistics.
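To put rough numbers on the multiple-comparisons point, here is a minimal sketch. It assumes one test per hourly time point (24 tests), a significance level of 0.05, and, unrealistically for repeated measures, that the tests are independent. The variable names are illustrative only.

```python
# Illustrative arithmetic only; the figures are assumptions, not the poster's data.
alpha = 0.05   # per-test significance level
n_tests = 24   # assumed: one test per hourly time point

# Bonferroni: each individual test must beat alpha / n_tests.
bonferroni_threshold = alpha / n_tests

# Without correction: expected number of false positives if every null is true.
expected_false_positives = alpha * n_tests

# Chance of at least one false positive, assuming the tests are independent.
familywise_error = 1 - (1 - alpha) ** n_tests

print(f"Bonferroni per-test threshold: {bonferroni_threshold:.4f}")
print(f"Expected false positives (uncorrected): {expected_false_positives:.1f}")
print(f"P(at least one false positive): {familywise_error:.2f}")
```

Under these assumptions the corrected threshold is roughly 0.002 per test, while the uncorrected route gives about one expected false positive and roughly a 70% chance of at least one.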

4

Melsie, thanks for your reply. Most of it's above me (I'm in biomedical science and we generally avoid stats where possible!). However, I'm not sure I follow one of your points. You say that I should include time as a variable. However, if I pick just two time points (say an "early" one at 3h and a "late" one at 24h), then when do I need to correct for time? If I understand what you're saying, it's almost as if the act of taking a measurement at other time points alters the way I have to analyse them, which seems very strange...

E

Hi 4matt,
I'm not all that great with stats myself, but I think the issue is this: If you conduct repeated measurements on the same individuals, then you can't really analyse each time point independently, because they are *not* independent. If you had sampled each timepoint on a different set of individuals, then they might be independent. But in your case, your values for 3h will be correlated with your values for 24h, simply because they were done on the same individuals.
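A small simulation may make the independence point more concrete. This is only a sketch with made-up numbers: it assumes each person has their own stable offset in heart-rate response, and it uses 1000 hypothetical subjects (not the 3 in the actual study) so the pattern is visible.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


n_subjects = 1000  # hypothetical; chosen only to make the effect clear

# Assumed: every subject has a personal offset that persists across time points.
subject_offset = [random.gauss(0, 5) for _ in range(n_subjects)]

# Same individuals measured at 3h and 24h: both values share the offset.
hr_3h_same = [off + random.gauss(0, 3) for off in subject_offset]
hr_24h_same = [off + random.gauss(0, 3) for off in subject_offset]

# A different set of individuals measured at 24h: no shared offset.
other_offset = [random.gauss(0, 5) for _ in range(n_subjects)]
hr_24h_other = [off + random.gauss(0, 3) for off in other_offset]

r_same = pearson(hr_3h_same, hr_24h_same)
r_other = pearson(hr_3h_same, hr_24h_other)

print(f"Correlation, same subjects at 3h and 24h: {r_same:.2f}")
print(f"Correlation, different subjects at 24h:   {r_other:.2f}")
```

With these assumed spreads, the same-subject values come out strongly correlated and the different-subject values do not, which is why the time points can't be treated as separate experiments.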

Plus you're not supposed to do lots of individual tests on one dataset, because then the chance of a type 1 error is higher. If you're using a significance level of 0.05, that means about 5% of tests where there is no real effect will still come out as significant just by chance.
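The 5% figure can be checked with a quick simulation. This is a sketch under stated assumptions: it swaps in a simple two-sided z-test on pure noise (known standard deviation of 1, true mean of 0) rather than an ANOVA, and the sample size and number of runs are arbitrary.

```python
import random
from statistics import NormalDist

random.seed(0)  # fixed seed so the sketch is repeatable

alpha = 0.05
n_experiments = 10_000   # arbitrary number of simulated "studies"
n_per_sample = 30        # arbitrary sample size per study
std_normal = NormalDist()

false_positives = 0
for _ in range(n_experiments):
    # Data with no real effect: true mean 0, standard deviation 1.
    sample = [random.gauss(0, 1) for _ in range(n_per_sample)]
    sample_mean = sum(sample) / n_per_sample
    # z statistic for a known sigma of 1: mean / (sigma / sqrt(n)).
    z = sample_mean * n_per_sample ** 0.5
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    if p_value < alpha:
        false_positives += 1

false_positive_rate = false_positives / n_experiments
print(f"Share of null tests flagged significant: {false_positive_rate:.3f}")
```

Every "significant" result here is a false positive by construction, and the share lands close to the chosen significance level.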

I think you're on the right track with repeated measures ANOVA, since that adjusts for this kind of correlated covariance structure. But you don't want to do it for each timepoint separately, you want to do it for the whole dataset at once, with time as a factor. Then you can test the significance of drug, time and drug*time interaction. So I'd think you'd have to do it all rank, or all non-rank.

If you have any access to advice from qualified statisticians I'd definitely go for that- check with your uni, they might have something available for postgrads. They'll be able to give you advice specifically for your dataset. You'll almost certainly be able to use whatever you learn there in projects later on! There are also a number of online groups that seem to offer free stats advice to researchers out of the goodness of their hearts... I've never tried them, but might be worth a shot if you get stuck (?)

How much you need to worry depends on how obvious your differences between treatments are. If you draw a graph with error bars and it looks pretty clear cut, you can just do a repeated measures ANOVA and no one's likely to argue with your findings. However, if it's not that clear cut, and you are making conclusions based on your stats, you need to be more careful about your statistical design.

Good luck! BTW- I'm no expert, and could be wrong on any of this!