**Mood:** irritated

**Now Playing:** Melody of You (Sixpence None the Richer)

**Topic:** Data Analysis

So, I have the data for my pilot study using the SPQ and the maths-computing inventory. I’ve begun trying to analyse it but seem to hit a brick wall every time I try!

Well, the first brick wall came when I realised that I had grouped Environmental Sciences and Biology together in the questionnaire. This meant I had a Hard-Applied-Life and a Hard-Pure-Life discipline in the same grouping, and I was stuck on how to separate the Env. Sci. students from the Biology students. Thankfully I had the degree each student intended to complete when they entered the OU, so I was able to pick out, at least roughly, those who were from Env. Sci.

The next problem (well, more my fault!): I had already started the analysis when I realised that the maths-computing inventory has negatively marked questions, and I had forgotten to account for those when I calculated the scores for the scales. So I had to go back, reverse-score those items, recalculate the scale scores, and then start the analysis all over again.
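As a rough illustration of what reverse-scoring involves (this is not the SPSS procedure actually used), here is a minimal Python sketch. The 1 to 5 response range, the item names `q1` to `q3`, and the choice of `q2` as the negatively worded item are all assumptions made up for the example.

```python
# Hypothetical example: the 1-5 response range, the item names and the set of
# negatively worded items are assumptions, not taken from the real inventory.
SCALE_MIN, SCALE_MAX = 1, 5


def reverse_score(response, low=SCALE_MIN, high=SCALE_MAX):
    """Flip a Likert response so the highest value becomes the lowest."""
    return low + high - response


def scale_score(responses, negative_items):
    """Sum the item responses, reversing the negatively worded items first."""
    total = 0
    for item, value in responses.items():
        total += reverse_score(value) if item in negative_items else value
    return total


# One made-up student answering three items; q2 is negatively worded.
student = {"q1": 4, "q2": 2, "q3": 5}
negative = {"q2"}

print(scale_score(student, negative))  # q2 is flipped from 2 to 4: 4 + 4 + 5 = 13
```

Forgetting the flip would give 11 instead of 13 for this student, which is why every scale score had to be recalculated.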

Well, after doing several ANOVAs etc. (again my fault!), I decided to check whether my data were normal and had homogeneous variances between the groups (the disciplines, gender and software). Well, whadya know … they weren’t!!! All the scores I had calculated from both the SPQ and the Maths-Computing Inventory were non-normal (Kolmogorov-Smirnov normality test). That did get me frustrated, because I then tried to see whether I could transform them into being normal (log, square, square root, arcsin, reciprocal, and various combinations of those), but no go … As the kurtosis and skewness were both less than 1, I guess that is why transforming brought no improvement. Anyway, I then proceeded to trim the samples by taking out the outliers (well, I only did it for the surface approach score and haven’t really tried the others), and that didn’t improve it one bit: same sort of results … So I gave up on that and accepted that my data were non-normal.
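To make the skewness and kurtosis check concrete, here is a small Python sketch that computes both for a set of scores before and after a few of the transformations mentioned above. The scores are invented, and the formulas are the simple moment-based versions, so statistics packages that apply a small-sample correction may report slightly different values.

```python
import math


def central_moment(xs, k):
    """k-th central moment, dividing by n (no small-sample correction)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** k for x in xs) / len(xs)


def skewness(xs):
    """Moment-based skewness: 0 for a perfectly symmetric sample."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


def excess_kurtosis(xs):
    """Moment-based kurtosis minus 3, so a normal distribution gives about 0."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3.0


# Made-up, right-skewed scores purely for illustration (all positive,
# so the log and reciprocal transforms are defined).
raw = [1, 1, 2, 2, 3, 10]

transforms = {
    "raw": lambda x: x,
    "log": math.log,
    "square root": math.sqrt,
    "reciprocal": lambda x: 1.0 / x,
}

for name, fn in transforms.items():
    values = [fn(x) for x in raw]
    print(f"{name:12s} skew={skewness(values):6.2f} "
          f"kurtosis={excess_kurtosis(values):6.2f}")
```

With strongly skewed data like this made-up sample, the log transform pulls the skewness down noticeably. When the skewness is already small to begin with, there is much less for a transformation to correct.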

Then came the other brick wall … the test of homogeneity of variance (Levene’s test). Some groupings were homogeneous whilst others weren’t: only gender was homogeneous across all the scores. The rest were heterogeneous for at least one variable; for example, the Hard/Soft discipline grouping was heterogeneous for the mathematics motivation scale. Now, I know I can apply the Kruskal-Wallis test to non-normal data, but the groups need to have homogeneous variances, so I have done that for gender, and nearly everything came out not significant except computer confidence. What is interesting is that I wanted to check whether that was influenced by age (age was also related to computer confidence, using the Spearman rank correlation coefficient). In normal parametric statistics you can control for such a variable (perhaps as a covariate), but I wasn’t certain how to do that in nonparametric statistics. So I decided to use the Spearman rank correlation to test the relationship between age and computer confidence while controlling for gender (i.e. just for males and then just for females) to see what would happen … and there does seem to be some correlation between age and computer confidence when gender is controlled (well, for females there is a correlation, but for males there isn’t).
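The "controlling for gender" step above is really a split-by-group analysis: the Spearman coefficient is computed separately within each gender. A minimal Python sketch of that idea follows, using only the standard library. The field names (`gender`, `age`, `confidence`) and every data value are invented for illustration and say nothing about the real pilot results.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) if either list is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def spearman_by_group(rows, group_key, x_key, y_key):
    """Compute Spearman's rho separately within each level of group_key."""
    groups = {}
    for row in rows:
        groups.setdefault(row[group_key], []).append(row)
    return {
        name: spearman([r[x_key] for r in members], [r[y_key] for r in members])
        for name, members in groups.items()
    }


# Invented records: hypothetical field names and values.
rows = [
    {"gender": "F", "age": 20, "confidence": 30},
    {"gender": "F", "age": 30, "confidence": 25},
    {"gender": "F", "age": 40, "confidence": 20},
    {"gender": "F", "age": 50, "confidence": 12},
    {"gender": "M", "age": 22, "confidence": 20},
    {"gender": "M", "age": 33, "confidence": 28},
    {"gender": "M", "age": 41, "confidence": 19},
    {"gender": "M", "age": 55, "confidence": 25},
]

for gender, rho in spearman_by_group(rows, "gender", "age", "confidence").items():
    print(gender, round(rho, 2))
```

This only gives the coefficients; judging significance would still need a p-value from a statistics package. Splitting by group is also not the same as statistically adjusting for gender, since it simply looks at each gender on its own.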

Anyway, I’ve moved away from the point. I now have data that are non-normal with heterogeneous variances, and I don’t know what tests I can use in SPSS for comparing across groups. I have looked things up on the internet, and it seems there is a new statistic called the MOM-H statistic that can be used with non-normal data and heterogeneous variances. It uses Wilcox and Keselman’s MOM (Modified One-Step M-estimator) to decide how much should be trimmed, and then combines this with Schrader and Hettmansperger’s H statistic (all of this developed by Othman et al, 2004). I’ve still got to look up the paper … but my problem is that I’m wondering whether I even want to venture into that realm of deep statistics. I mean, I’m not sure how important these pilot study results will be to my main study … should I go through all that data analysis for nothing?? Can my simple descriptive statistics suffice? I’ve got to decide, because it will take me a while to decipher what all the symbols in those papers mean and how they relate to my data before I can even begin to use the method, and that might take up more time than I really have.