For Survey 2 on the NetGen project, I looked at the database yesterday. I just wanted to make notes on the changes I’ve made and the reasoning behind them before I forget, because Ruslan has told me that the ESRC has particular requirements for UK data archiving.
First of all, Ruslan had already fixed some parts of Survey 2. These are my additional fixes, which I have saved as a new file, 28-09-09:
- Any variables with numeric answers were changed from string to numeric (this makes it easier for SPSS to analyse the data, particularly if we want means, that is, if we treat some of the Likert scales as continuous)
- Some of the data relates to tick boxes: where students ticked a box it is coded as 1, and where they did not it was left blank as a system-missing value. I have recoded these variables into the same variable so that 1 = yes and the system-missing value becomes 0 = no. I think it is a fair assumption that if a student did not tick the box then the answer is no.
- For QE4, the data was coded as follows: 1 = “don’t know”, 2 = “not at all useful”, 3 = “not very useful”, 4 = “fairly useful” and 5 = “very useful”. If means were used to analyse this question they would be wrong, because “don’t know” is represented by 1 and sits at the bottom of the scale. So I’ve recoded into the same variable, so that 1 = “not at all useful”, 2 = “not very useful”, 3 = “fairly useful”, 4 = “very useful” and “don’t know” is now 5.
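As a rough sketch of the three recodes above outside SPSS, here is how they might look in plain Python. The variable names `age` and `tick_email` and the sample rows are made up for illustration; only QE4 and its coding come from the survey. I am also assuming that “don’t know” (now 5) would be left out when a mean is taken, which the recode alone does not do.

```python
# Hypothetical rows: None stands in for an SPSS system-missing (blank) value.
rows = [
    {"age": "19", "tick_email": 1, "QE4": 1},
    {"age": "21", "tick_email": None, "QE4": 5},
    {"age": "", "tick_email": 1, "QE4": 3},
]

# Old QE4 code -> new code: "don't know" moves from 1 to 5,
# and the four usefulness ratings shift down by one.
QE4_RECODE = {1: 5, 2: 1, 3: 2, 4: 3, 5: 4}


def to_number(text):
    """Convert a string answer to a float; blank strings stay missing."""
    text = text.strip()
    return float(text) if text else None


def clean(row):
    """Apply the three recodes to one row and return a new dict."""
    return {
        "age": to_number(row["age"]),
        # Unticked box (blank) is assumed to mean "no" = 0.
        "tick_email": 0 if row["tick_email"] is None else row["tick_email"],
        "QE4": QE4_RECODE.get(row["QE4"]),
    }


cleaned = [clean(r) for r in rows]

# Assumption: "don't know" (5) and missing answers are excluded from the mean.
scores = [r["QE4"] for r in cleaned if r["QE4"] in (1, 2, 3, 4)]
qe4_mean = sum(scores) / len(scores)
```

Recoding through a lookup table rather than a chain of if-statements avoids the usual trap of recoding a value twice (for example 1 to 5 and then 5 to 4).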
These are the things I intend to do today for Survey 2 and, if I get time, for Survey 3 and the linked Survey 2 and Survey 3 file:
Change system-missing to user-missing values
At the moment there are system-missing values (i.e. blanks). I want to change these to user-missing values (e.g. 999) and make sure to declare 999 as a missing value for each variable. I looked around the internet to see whether there was a good reason for doing this or whether it was just my preference, and it seems to be more my preference than anything else … but here is my reasoning for doing it:
- If we want to do a missing-data analysis, it might be easier to work out who has missing answers using 999, as SPSS automatically disregards system-missing values
- If any new variable is computed in SPSS, a blank may mean that SPSS failed to compute an answer rather than that a student failed to give one, so we need to distinguish between the two
- We should really be able to distinguish between questions that are not applicable to the student and questions the student missed out, although in this survey, as far as I can see, there are no ‘not applicable’ questions
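To make the idea concrete, here is a minimal sketch in plain Python of turning blanks into a 999 code and then keeping that code out of any statistics. The variable name `QA1` and the sample rows are hypothetical, and `None` stands in for an SPSS system-missing value.

```python
USER_MISSING = 999  # code meaning "the student left this question blank"


def to_user_missing(row, variables):
    """Return a copy of row with blanks (None) in the named variables set to 999."""
    return {
        key: (USER_MISSING if key in variables and value is None else value)
        for key, value in row.items()
    }


def mean_excluding_missing(rows, variable):
    """Mean of a variable, ignoring both blanks and the 999 code."""
    values = [r[variable] for r in rows if r[variable] not in (None, USER_MISSING)]
    return sum(values) / len(values) if values else None


# Hypothetical data: student 2 skipped QA1.
rows = [{"id": 1, "QA1": 4}, {"id": 2, "QA1": None}, {"id": 3, "QA1": 2}]
recoded = [to_user_missing(r, {"QA1"}) for r in rows]

# Counting who skipped the question is now a simple comparison.
n_missing = sum(1 for r in recoded if r["QA1"] == USER_MISSING)
```

The important part is the second function: once 999 is in the data it has to be declared as missing everywhere, otherwise it gets averaged in as if it were a real answer.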
Look for double entries
There seem to be some double entries for students, and I’m not sure whether to delete them or just exclude them from the analysis. As most of the double entries seem to have missing values anyhow, they could simply be excluded; I probably shouldn’t make any changes to the data itself and should just omit them from the analysis. Hence I need to create a new variable where 1 = include and 0 = exclude, which I can then use to select cases.
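One possible way to build that include/exclude variable, sketched in plain Python. The rule used here (for a repeated ID, keep the entry with the fewest blanks and keep the first one on a tie) is only my assumption about how to pick between double entries, and `netgen_id`, `QA1` and `QA2` are hypothetical names.

```python
def flag_duplicates(rows, id_key="netgen_id"):
    """Add include = 1/0 to each row; for repeated IDs keep the entry with the fewest blanks."""
    best = {}  # ID -> (number of blanks, row index) of the best entry seen so far
    for index, row in enumerate(rows):
        blanks = sum(1 for value in row.values() if value is None)
        current = best.get(row[id_key])
        # Strict "<" means the earlier entry wins when two entries tie.
        if current is None or blanks < current[0]:
            best[row[id_key]] = (blanks, index)
    keep = {index for _, index in best.values()}
    # Rows are copied, so the original data is left unchanged.
    return [dict(row, include=1 if i in keep else 0) for i, row in enumerate(rows)]


# Hypothetical data: student 2NG10 appears twice, the second time mostly blank.
rows = [
    {"netgen_id": "2NG10", "QA1": 3, "QA2": 4},
    {"netgen_id": "2NG10", "QA1": None, "QA2": None},
    {"netgen_id": "2NG11", "QA1": 2, "QA2": None},
]
flagged = flag_duplicates(rows)
analysis_rows = [r for r in flagged if r["include"] == 1]
```

Nothing is deleted: the excluded rows stay in the file with include = 0, so the decision can be revisited later.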
Create a new ID
Unfortunately, at the moment the NetGen ID is a string rather than numeric. Whilst that is OK for identifying students, when sorting students the order tends to be 2NG10, 2NG100, 2NG11, which is not the best for finding an ID. So I’m going to create a numeric ID by removing the 2NG. I’m going to do this in Excel and then copy it back over, since in Excel it will be easy to remove the 2NG.
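For comparison with the Excel route, the same prefix removal can be sketched in a few lines of Python. The function name `numeric_id` is made up for this example, and it assumes every ID starts with 2NG followed only by digits.

```python
def numeric_id(netgen_id, prefix="2NG"):
    """Strip the 2NG prefix and return the rest of the ID as an integer."""
    if not netgen_id.startswith(prefix):
        raise ValueError(f"unexpected ID format: {netgen_id!r}")
    return int(netgen_id[len(prefix):])


ids = ["2NG10", "2NG100", "2NG11"]

# Sorting as text compares character by character, so 2NG100 lands before 2NG11.
as_text = sorted(ids)

# Sorting on the numeric part gives the order you would expect.
as_numbers = sorted(ids, key=numeric_id)

print(as_text)     # ['2NG10', '2NG100', '2NG11']
print(as_numbers)  # ['2NG10', '2NG11', '2NG100']
```

Raising an error on an unexpected ID, rather than silently skipping it, makes it easier to spot any entries that were typed in a different format.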