A+ A A-

File I/O: Reading binary data

This tutorial:

File I/O: Reading ASCII data

Plotting: time-series

File I/O: Reading binary data

Plotting: Maps


In this tutorial we will read a file of binary data collected from the tropical Rainfall Measuring Mission (TRMM) satellite. A description of the binary data can be found on the RSS / TMI site. The daily data is the most complicated so that what we are going to read. You should then find it easy to read 3-daily , weekly or monthly TMI/TRMM data based on this tutorial.

The file we are using here was downloaded from the SSMI FTP site.

From the site, we see that the daily binary data file consists of fourteen 0.25 x 0.25 degree grid (1440,320) byte maps. Seven ascending maps in the following order: Time (T), Sea Surface Temperature (S), 10-meter Surface Wind Speed using 11 GHz (Z), 10-meter Surface Wind Speed using 37 GHz (W), Atmospheric Water Vapor (V), Cloud Liquid Water (L), and Rain Rate (R), are followed by seven descending maps in the same order.

This means that the data will have the following dimensions (14,1440,320). If we want to separate the data into ascending and descending paths (which I want), then we can order the data so it has dimensions (2,7,1440,320), where (2) represents the number of satellite paths (ascensing or descending), (7) indicates the number of variables in the file (1440) is the length of the longitude and (320) is the length of the latitude. Before we even start reading the data, we can define our longitude and lontitude variables. For this we use the numpy arange method.

# Number of longitude and latitude points
nlon = 1440
nlat = 320
# Define lats and lons
lat = arange(-39.875,40,0.25)
lon = arange(-179.875,180,0.25)

Ok, so the first thing to do to read the data in is to define where our file is on our sytem and to open a file pointer just like when reading the ASCII data in the 1st tutorial. Of course, make sure you have unzipped your binary file first.

fid = open(pname+fname, 'rb')

The method we use is called fromfile. There are other ways to do this but fromfile is supposed to be very fast so that is the one I have chosen. We load the binary data into a variable called data and reshape it straight away to have the data in the correct shape, rather than in 1 long data stream.

data = fromfile(file=fid, dtype=uint8).reshape((2,7,nlat,nlon))

The data has been read in so now we can close the file.


Now we want to convert the data into meaningful geophysical values using the scaling factors and offsets for each variable. We also want to discard all data that is bad such as land, contaminated data etc... On the RSS / TMI site, it says that all values above 250 can be flagged as bad. So we will mask all values above or equal to 250. The mkas variable has the same dimension as our data.


To extract say the SST variable collected during the satellite descending path we will do the following:

sst = data[1,1,:,:]*scale_factor+offset

We then assign a NaN to all the bad data.


You should now have read the SST for the descending path. You can test that everything looks OK by plotting it using the imshow method from pylab.

from pylab import imshow, colorbar, show

Although we've done the hardest bit, there still remains one thing to do. We need to re-order the longitude and the data to go from -180 to 180 rather than from 0 to 360 degrees East. To do this this I am going to create an empty array the same shape as my SST and then I am going to cut my SST variable in the middle, put the second half in front (from 0 to 180 degrees East) and put the first half (from 0 to -180) last. I use the numpy empty_like method to initiate my new SST variable. Here is the code:

# Re-order my SST so that it goes from longitude -180 to 180 rather than 0 to 360
# Flip dimension of variable along longitude
newSST = empty_like(SST)
newSST[:,0:nlon/2] = SST[:,nlon/2:nlon]
newSST[:,nlon/2:nlon] = SST[:,0:nlon/2]

Now my SST variable correctly spans the longitudes from -180 to 180 degrees East. That's it. We can clean it up and put it all together in one function called readTmi1D, which we add to our other functions in the courseFunctions.py file.

Go to top of page

Last Updated on Monday, 06 August 2012 17:23

Hits: 737

Plotting: time-series

This tutorial:

File I/O: Reading ASCII data

Plotting: time-series

File I/O: Reading binary data

Plotting: Maps


In this tutorial we are going to use the data we have just read from our ASCII file in File I/O: Reading ASCII data example and then plot it in a time-series.

For all plotting functionality I use pylab, which comes with the matplotlib package.

First let's load the data using the function we have created in the File I/O: Reading ASCII data example.

from courseFunctions import readCo2
# Path and file name definition
pname = ''
fname = 'CPT_CO2_dm_95_07.txt'
# Read in data
time, x, y = readCo2(pname+fname)

Using pylab we just plot data without any formatting.

from pylab import *

This is all right but we would like to have our x axis formatted to indicate dates rather than just numbers. For this we use the plot_date method of matplotlib. We create a figure object and define the layout of our axis.

fig = figure(figsize =(12., 9.),facecolor='white', edgecolor='black')
ax = fig.add_axes([0.05, 0.5, 0.85, 0.35],axisbg='white')

We convert our time object to a matplotlib time object. I call the matplotlib time object plotDate.

from datetime import datetime, timedelta
st = matplotlib.dates.date2num(datetime(1950,1,1))

I plot my unfiltered data as a thin grey line and my filtered data as a thick black line.

ax.plot_date(plotDate,x, 'gray',linewidth = 1.0)
ax.plot_date(plotDate,y, 'k',linewidth = 1.5)

I want my x-axis to start on the 1-Jan 1995 and to end on the 31st of December 2007. I define 2 matplotlib dates called start_time and end_time to specify the axis-limits. If you wanted to plot data for a shorter period, you could change the definition of start_time and end_time to something else.


I now set the x-axis limits to start_time and end_time.

ax.set_xlim( time_start, time_end)

Now the trick is to use the following methods from matplotlib.dates.

from matplotlib.dates import YearLocator, MonthLocator, DateFormatter

We want to have a reference for every year and every month on our plot.

years    = YearLocator()   # every year
months   = MonthLocator()  # every month

The following defines the format for how our time is displayed on the x-axis. In this case, it will be a 4-digit year.

dateFmt = DateFormatter('%Y')

We want labels on the x-axis for each year.

# format the ticks

We can now print out the plot using the FigureCanvasAgg method.

from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
canvas = FigureCanvas(fig)

You can download the code from file plotCo2.py.

Go to top of page

Last Updated on Monday, 06 August 2012 17:19

Hits: 958

File I/O: Reading ASCII data

This tutorial:

File I/O: Reading ASCII data

Plotting: time-series

File I/O: Reading binary data

Plotting: Maps


In this tutorial we are going to try and read an ASCII file which contains concentration of C02 measured at cape Point between the 1 January 1995 and the 31st December 2007.
You can download the file here: CPT_CO2_dm_95_07.txt

The first few lines look like this:

Date All data Filtd data
01-Jan-95 #N/A #N/A
02-Jan-95 #N/A #N/A
03-Jan-95 #N/A #N/A
04-Jan-95 #N/A #N/A
15-Jan-95 357.98 357.63
16-Jan-95 357.89 357.71

So there is 1 header line. The rest consists of 3 colums seperated by tabs, with bad values assigned the #N/A character. First let's start a python session and open our data file. Then within the python interpreter we define our path and file names.

pname = ''
fname = 'CPT_CO2_dm_95_07.txt'

Then we open the file and assign a file pointer called fid to our file.

fid = open(pname+fname)

The output of fid on the prompt will give something like:
open file 'CPT_CO2_dm_95_07.txt', mode 'r' at 0xb77c3128

Now, I am going to use the readline and readlines method to read my data line by line.

# Read 1 line in header

This only read 1 line (in this case, the first line) into a variable called header.

The output of header will give me a string variable:
'Date\tAll data\tFiltd data\r\n'

Then read all data lines in one go using the readlines method

data = fid.readlines()

All my data is now stored in a list variable called data. I can check the number of rows using the len method.


Now I can go through each line to extract the variable of interest. If I just print out my 1st data row I see that each row is finished with a \r\n, which means a return and new line. I also see that each element in my rows is seperated by a tab \t


I also want to replace all the bad values "#N/A" with a "NaN". To do that I use the replace method. Here is an example for the 1st row of data

row = data[0]
row = row.replace("#N/A","NaN")

Now I remove the return and new line characters from my row

row = row.strip('\r\n')

And then I seperate my 3 variables using the tab character


The 3 elements are now stored in string variables a, b and c. I can directly convert those strings to float using the float method.


For the time variable it is more difficult. In this example, I will convert my time variable from a string to a float. I will define my time as the number of days since 1-January-1950. This is done using methods in the datetime module of python called datetime and timedelta.

from datetime import datetime, timedelta

I convert my variable a to a datetime object

datetime.strptime(a, "%d-%b-%y")

and I then convert it to the number of days since 1-Jan-1950

(datetime.strptime(a, "%d-%b-%y")-datetime(1950,1,1)).days

Let us now put it all together in a loop.

#Initialise my variables as nans of length data
tserial = ones((len(data),))*NaN
allData = ones((len(data),))*NaN
filtData = ones((len(data),))*NaN
for row in data:

    row = row.replace("#N/A","NaN")
    row = row.strip('\r\n')
    # Store time as days since 1-Jan-1950
    tserial[i]=(datetime.strptime(a, "%d-%b-%y")-datetime(1950,1,1)).days
    allData[i]= float(b)
    filtData[i] = float(c)

You can download the example code from: getCo2.py file

Go to top of page

Last Updated on Monday, 06 August 2012 17:22

Hits: 525

Marjolaine Rouault's Tutorial

Last Updated on Monday, 30 July 2012 17:25

Hits: 392

CHPC in the News

Documentation for users:

Tsessebe Cluster Unavailable

Graphical Processing Unit Cluster Unavailable

CHPC SAGrid Cluster Available

Dirisa Storage Unit Unavailable

Social Share

FacebookTwitterGoogle BookmarksLinkedin

Website developed by Multidimensions