HPC School 2012
A funded eight-day graduate course for South African students who wish to advance their expertise in high performance computing (HPC), parallel programming and related research topics that may rely on HPC techniques. Dates: 1 to 8 July 2012.
Python course at CHPC
File I/O: Reading ASCII data
In this tutorial we are going to read an ASCII file containing daily concentrations of CO2 measured at Cape Point between 1 January 1995 and 31 December 2007.
You can download the file here: CPT_CO2_dm_95_07.txt
The first few lines look like this:
Date All data Filtd data
01-Jan-95 #N/A #N/A
02-Jan-95 #N/A #N/A
03-Jan-95 #N/A #N/A
04-Jan-95 #N/A #N/A
15-Jan-95 357.98 357.63
16-Jan-95 357.89 357.71
...
So there is one header line. The rest consists of three columns separated by tabs, with bad values marked by the string #N/A. First, let's start a Python session and open our data file. Within the Python interpreter we first define the path and file names.
pname = ''
fname = 'CPT_CO2_dm_95_07.txt'
Then we open the file and keep the returned file object in a variable called fid.
fid = open(pname+fname)
Typing fid at the prompt shows something like the following (this is Python 2 output; Python 3 displays an _io.TextIOWrapper object instead):
<open file 'CPT_CO2_dm_95_07.txt', mode 'r' at 0xb77c3128>
Now I am going to use the readline and readlines methods to read the data.
# Read 1 line in header
header=fid.readline()
This reads only one line (here, the first line) into a variable called header.
Displaying header shows a string:
'Date\tAll data\tFiltd data\r\n'
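The header is not used again in this tutorial, but the same two steps applied to the data rows later (strip the line ending, then split at the tabs) recover the column names. A short sketch, with the header string copied from the output above:

```python
header = 'Date\tAll data\tFiltd data\r\n'
# Drop the line ending, then split at each tab: one name per column
column_names = header.strip('\r\n').split('\t')
print(column_names)   # ['Date', 'All data', 'Filtd data']
```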
Then read all the remaining data lines in one go using the readlines method:
data = fid.readlines()
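The session above never closes fid. A minimal sketch of an alternative, assuming Python 3: the hypothetical helper read_lines builds the path with os.path.join (so pname does not need a trailing slash) and reads inside a with block, which closes the file automatically. Passing newline='' keeps the original \r\n line endings; in Python 3's default text mode they would be translated to \n, so the strings would differ from the Python 2 output shown in this tutorial.

```python
import os

def read_lines(pname, fname):
    """Hypothetical helper: return (header, data) for a file with one header line.

    pname is the directory ('' means the current directory).
    """
    # os.path.join adds a path separator only where one is needed
    path = os.path.join(pname, fname)
    # newline='' returns lines with their CR LF endings untranslated;
    # the with block closes the file even if an error occurs
    with open(path, newline='') as fid:
        header = fid.readline()    # first line only
        data = fid.readlines()     # every remaining line, as a list of strings
    return header, data
```

Usage would then be header, data = read_lines('', 'CPT_CO2_dm_95_07.txt').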
All my data is now stored in a list called data. I can check the number of rows using the built-in len function:
len(data)
4748
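The row count is consistent with one row per day over the whole period. A quick check with the standard datetime module, counting both the first and the last day:

```python
from datetime import date

# One row per day, counting both the first and the last day
expected_rows = (date(2007, 12, 31) - date(1995, 1, 1)).days + 1
print(expected_rows)   # 4748
```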
Now I can go through each line to extract the variables of interest. Displaying the first data row shows that each row ends with \r\n (a carriage return followed by a newline), and that the elements in each row are separated by a tab, \t:
data[0]
'01-Jan-95\t#N/A\t#N/A\r\n'
I also want to replace every bad value "#N/A" with "NaN", a string that float() can later convert to a not-a-number value. To do that I use the string replace method. Here is an example for the first row of data:
row = data[0]
row = row.replace("#N/A","NaN")
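Why replace "#N/A" at all? float() cannot parse "#N/A", but it does accept "NaN" (case-insensitively) and returns a floating-point not-a-number. A small sketch of both behaviours; note that NaN never compares equal to anything, not even itself, so math.isnan is the reliable way to test for it:

```python
import math

# "#N/A" is not a number that float() understands
try:
    float("#N/A")
    converted = True
except ValueError:
    converted = False
print(converted)            # False

value = float("NaN")        # parsed as a floating-point NaN
print(math.isnan(value))    # True
print(value == value)       # False: NaN is not equal even to itself
```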
Next I remove the carriage return and newline characters from the row:
row = row.strip('\r\n')
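strip treats its argument as a set of characters rather than a fixed suffix, and removes any of them from both ends of the string. The same call therefore handles lines ending in \r\n (as here) or in a bare \n (as Python 3's default text mode returns them), while leaving the tabs between fields alone:

```python
rows = ['01-Jan-95\tNaN\tNaN\r\n', '01-Jan-95\tNaN\tNaN\n']
cleaned = [r.strip('\r\n') for r in rows]
# Both variants end up identical, with the inner tabs intact
print(cleaned[0] == cleaned[1])   # True
print(repr(cleaned[0]))           # '01-Jan-95\tNaN\tNaN'
```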
Then I split the row into its three variables at the tab characters:
a,b,c=row.split('\t')
The three elements are now stored in the string variables a, b and c. I can convert b and c directly to floats using the built-in float function:
b=float(b)
c=float(c)
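The per-row steps so far can be collected into one function. A minimal sketch; parse_row is a hypothetical name, not part of the tutorial's getCo2.py, and it is applied here to two rows copied from the sample at the top:

```python
def parse_row(line):
    """Hypothetical helper: apply the replace/strip/split/float steps to one line.

    Returns the date string and the two concentrations as floats
    (NaN where the file has #N/A).
    """
    line = line.replace("#N/A", "NaN")   # bad values -> a string float() accepts
    line = line.strip('\r\n')            # drop the line ending
    a, b, c = line.split('\t')           # exactly three tab-separated fields
    return a, float(b), float(c)

print(parse_row('15-Jan-95\t357.98\t357.63\r\n'))  # ('15-Jan-95', 357.98, 357.63)
print(parse_row('01-Jan-95\t#N/A\t#N/A\r\n'))      # ('01-Jan-95', nan, nan)
```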
The time variable takes more work. In this example I convert it from a string to a number, defining time as the number of days since 1 January 1950. This is done with Python's datetime module: its datetime class parses the date string, and subtracting two datetime objects gives a timedelta object whose days attribute holds the number of whole days.
from datetime import datetime, timedelta
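First I convert the string a to a datetime object. The format "%d-%b-%y" matches a day, an abbreviated month name and a two-digit year, as in 01-Jan-95: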
datetime.strptime(a, "%d-%b-%y")
Then I convert it to the number of days since 1 January 1950:
(datetime.strptime(a, "%d-%b-%y")-datetime(1950,1,1)).days
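A short check of the date handling, using dates from the start and end of the file. With %y, two-digit years 69 to 99 map to 1969 to 1999 and 00 to 68 map to 2000 to 2068, so 95 and 07 land in the right century; %b expects English month abbreviations such as Jan under the default C locale. Subtracting two datetime objects yields a timedelta, and adding a timedelta back to the origin reverses the conversion:

```python
from datetime import datetime, timedelta

origin = datetime(1950, 1, 1)

# %d = day of month, %b = abbreviated month name, %y = two-digit year
first = datetime.strptime("01-Jan-95", "%d-%b-%y")
last = datetime.strptime("31-Dec-07", "%d-%b-%y")
print(first, last)          # 1995-01-01 00:00:00 2007-12-31 00:00:00

delta = first - origin      # a timedelta object
print(delta.days)           # 16436

# Reverse conversion: days since 1950 back to a calendar date
print(origin + timedelta(days=delta.days))   # 1995-01-01 00:00:00
```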
Let us now put it all together in a loop.
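# ones and nan come from NumPy (an interactive pylab session provides them automatically)
from numpy import ones, nan
# Initialise my variables as arrays of NaNs, one element per data row
tserial = ones((len(data),))*nan
allData = ones((len(data),))*nan
filtData = ones((len(data),))*nan
# enumerate supplies the row index i, replacing a manual counter
for i, row in enumerate(data):
    row = row.replace("#N/A","NaN")
    row = row.strip('\r\n')
    a,b,c = row.split('\t')
    # Store time as days since 1-Jan-1950
    tserial[i] = (datetime.strptime(a, "%d-%b-%y")-datetime(1950,1,1)).days
    allData[i] = float(b)
    filtData[i] = float(c)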
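For comparison, a sketch of the same loop without NumPy, packaged as a function that accepts any list of data lines (header already removed), so it can be tried on the sample rows shown at the top. parse_co2_lines is a hypothetical name; it builds plain Python lists rather than the arrays used in the loop:

```python
from datetime import datetime

ORIGIN = datetime(1950, 1, 1)

def parse_co2_lines(lines):
    """Hypothetical helper: return (tserial, allData, filtData) as lists.

    lines: the data lines with the header already removed, e.g. from readlines().
    """
    tserial, allData, filtData = [], [], []
    for row in lines:
        row = row.replace("#N/A", "NaN").strip('\r\n')
        a, b, c = row.split('\t')
        # Time as whole days since 1 January 1950
        tserial.append((datetime.strptime(a, "%d-%b-%y") - ORIGIN).days)
        allData.append(float(b))
        filtData.append(float(c))
    return tserial, allData, filtData

sample = ['01-Jan-95\t#N/A\t#N/A\r\n', '15-Jan-95\t357.98\t357.63\r\n']
t, allv, filt = parse_co2_lines(sample)
print(t)      # [16436, 16450]
print(allv)   # [nan, 357.98]
```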
You can download the example code here: getCo2.py
