CHAPTER 2 CDMS Python Application Programming Interface

Overview

This chapter describes the CDMS Python application programming interface (API). Python is a popular public-domain, object-oriented language. Its features include support for object-oriented development, a rich set of programming constructs, and an extensible architecture. CDMS itself is implemented in a mixture of C and Python. This chapter assumes that the reader is familiar with the basic features of the Python language.

Python supports the notion of a module, which groups together associated classes and methods. The import command makes the module accessible to an application. This chapter documents the cdms module.

The chapter sections correspond to the CDMS classes. Each section contains tables describing the class internal (non-persistent) attributes, constructors (functions for creating an object), and class methods (functions). A method can return an instance of a CDMS class, or one of the Python types:

 

Python types used in CDMS

Type

Description

Array

Numeric or masked multidimensional data array. All elements of the array are of the same type. Defined in the Numeric and MA modules.

Comptime

Absolute time value, a time with representation (year, month, day, hour, minute, second). Defined in the cdtime module. cf. reltime

Dictionary

An unordered collection of objects, indexed by key. All dictionaries in CDMS are indexed by strings, e.g.:

axes['time']

Float

Floating-point value.

Integer

Integer value.

List

An ordered sequence of objects, which need not be of the same type. New members can be inserted or appended. Lists are denoted with square brackets, e.g.,

[1, 2.0, 'x', 'y']

None

No value returned.

Reltime

Relative time value, a time with representation (value, "units since basetime"). Defined in the cdtime module. cf. comptime

Tuple

An ordered sequence of objects, which need not be of the same type. Unlike lists, tuple elements cannot be inserted or appended. Tuples are denoted with parentheses, e.g.,

(1, 2.0, 'x', 'y')
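The two time representations in the table above can be related with a short sketch. It does not use the cdtime module; it relies on Python 3's standard datetime module, supports only a handful of units, and assumes a basetime written as YYYY-MM-DD. The name rel_to_comp is hypothetical.

```python
from datetime import datetime, timedelta

# Seconds per supported unit (an illustrative subset, not cdtime's full list).
UNIT_SECONDS = {"seconds": 1, "minutes": 60, "hours": 3600, "days": 86400}

def rel_to_comp(value, units):
    """Convert (value, "units since basetime") to (y, m, d, h, min, s)."""
    unit, _, base = units.partition(" since ")
    basetime = datetime.strptime(base.strip(), "%Y-%m-%d")
    t = basetime + timedelta(seconds=value * UNIT_SECONDS[unit.strip()])
    return (t.year, t.month, t.day, t.hour, t.minute, t.second)

print(rel_to_comp(31, "days since 1980-01-01"))   # (1980, 2, 1, 0, 0, 0)
print(rel_to_comp(36, "hours since 1980-01-01"))  # (1980, 1, 2, 12, 0, 0)
```

The datetime module uses a proleptic Gregorian calendar only, so this sketch says nothing about the other calendars that cdtime supports.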

A first example

The following Python script reads January and July monthly temperature data from an input dataset, averages over time, and writes the results to an output file. The input temperature data is ordered (time, latitude, longitude).

1 #!/usr/bin/env python

2 import cdms

3 from cdms import MV

4 jones = cdms.open('/pcmdi/cdms/obs/jones_mo.nc')

5 tasvar = jones['tas']

6 jans = tasvar[0::12]

7 julys = tasvar[6::12]

8 janavg = MV.average(jans)

9 janavg.id = "tas_jan"

10 janavg.long_name = "mean January surface temperature"

11 julyavg = MV.average(julys)

12 julyavg.id = "tas_jul"

13 julyavg.long_name = "mean July surface temperature"

14 out = cdms.open('janjuly.nc','w')

15 out.write(janavg)

16 out.write(julyavg)

17 out.comment = "Average January/July from Jones dataset"

18 jones.close()

19 out.close()

 

Line

Notes

2,3

Makes the CDMS and MV modules available. MV defines arithmetic functions.

4

Opens a netCDF file read-only. The result jones is a dataset object.

5

Gets the surface air temperature variable. 'tas' is the name of the variable in the input dataset. This does not actually read the data.

6

Reads all January monthly mean data into a variable jans. Variables can be sliced like arrays. The slice operator [0::12] means 'take every 12th slice from dimension 0, starting at index 0 and ending at the last index.' If the stride 12 were omitted, it would default to 1.

Note that the variable is actually 3-dimensional. Since no slice is specified for the second or third dimensions, all values of those dimensions are retrieved. The slice operation could also have been written [0::12, :, :].

Also note that the same script works for multi-file datasets. CDMS opens the needed data files, extracts the appropriate slices, and concatenates them into the result array.

7

Reads all July data into a masked array julys.

8

Calculate the average January value for each grid zone. Any missing data is handled automatically.

9,10

Set the variable id and long_name attributes. The id is used as the name of the variable when plotted or written to a file.

14

Create a new netCDF output file named 'janjuly.nc' to hold the results.

15

Write the January average values to the output file. The variable will have id "tas_jan" in the file.

write is a utility function which creates the variable in the file, then writes data to the variable. A more general method of data output is first to create a variable, then set a slice of the variable.

Note that janavg and julyavg have the same latitude and longitude information as tasvar. It is carried along with the computations.

17

Set the global attribute 'comment'.

18,19

Close the input dataset and the output file.
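The strided slices and the missing-data handling in the script can be illustrated without CDMS. The following is a minimal Python 3 sketch on a one-dimensional list, with None standing in for a masked element. The data values and the average helper are made up for illustration; MV.average itself operates on multidimensional masked arrays.

```python
# Three years of hypothetical monthly values, ordered by time; None marks
# a missing month (standing in for a masked array element).
monthly = [float(m) for m in range(36)]
monthly[12] = None  # January of the second year is missing

def average(values):
    """Mean of the non-missing values, or None if all are missing."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None

jans = monthly[0::12]    # indices 0, 12, 24
julys = monthly[6::12]   # indices 6, 18, 30

print(jans, average(jans))      # [0.0, None, 24.0] 12.0
print(julys, average(julys))    # [6.0, 18.0, 30.0] 18.0
```

The slice start selects the month (0 for January, 6 for July) and the stride of 12 steps from one year to the next.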

cdms module

The cdms module is the Python interface to CDMS. The objects and methods in this chapter are made accessible with the command:

import cdms

The functions described in this section are not associated with a class. Rather, they are called as module functions, e.g.,

file = cdms.open('sample.nc')

cdms module functions

Type

Definition

Variable

asVariable(s)

Transform s into a transient variable.

s is a masked array, Numeric array, or Variable. If s is already a transient variable, s is returned.

See also: isVariable.

Axis

createAxis(data, bounds=None)

Create an Axis, which is not associated with a file or dataset. This is useful for creating a grid which is not contained in a file or dataset.

data is a one-dimensional, monotonic Numeric array.

bounds is an array of shape (len(data),2), such that for all i, data[i] is in the range [bounds[i,0],bounds[i,1]]. If bounds is not specified, the default boundaries are generated at the midpoints between the consecutive data values, provided that the autobounds mode is 'on' (the default). See setAutoBounds.

Also see: CdmsFile.createAxis

Axis

createEqualAreaAxis(nlat)

Create an equal-area latitude axis. The latitude values range from north to south, and for all axis values x[i], sin(x[i])-sin(x[i+1]) is constant.

nlat is the axis length.

The axis is not associated with a file or dataset.

Axis

createGaussianAxis(nlat)

Create a Gaussian latitude axis. Axis values range from north to south.

nlat is the axis length.

The axis is not associated with a file or dataset.

RectGrid

createGaussianGrid(nlats, xorigin=0.0, order="yx")

Create a Gaussian grid, with shape (nlats, 2*nlats).

nlats is the number of latitudes.

xorigin is the origin of the longitude axis.

order is either "yx" (lat-lon, default) or "xy" (lon-lat).

RectGrid

createGenericGrid(latArray, lonArray, latBounds=None, lonBounds=None, order="yx", mask=None)

Create a generic grid, that is, a grid which is not typed as Gaussian, uniform, or equal-area. The grid is not associated with a file or dataset.

latArray is a NumPy array of latitude values.

lonArray is a NumPy array of longitude values.

latBounds is a NumPy array having shape (len(latArray),2), of latitude boundaries.

lonBounds is a NumPy array having shape (len(lonArray),2), of longitude boundaries.

order is a string specifying the order of the axes, either "yx" for (latitude, longitude), or "xy" for the reverse.

mask (optional) is an integer-valued NumPy mask array, having the same shape and ordering as the grid.

RectGrid

createGlobalMeanGrid(grid)

Generate a grid for calculating the global mean via a regridding operation. The return grid is a single zone covering the range of the input grid.

grid is a RectGrid.

RectGrid

createRectGrid(lat, lon, order, type="generic", mask=None)

Create a rectilinear grid, not associated with a file or dataset. This might be used as the target grid for a regridding operation.

lat is a latitude axis, created by cdms.createAxis.

lon is a longitude axis, created by cdms.createAxis.

order is a string with value "yx" (the first grid dimension is latitude) or "xy" (the first grid dimension is longitude).

type is one of 'gaussian', 'uniform', 'equalarea', or 'generic'.

If specified, mask is a two-dimensional, logical Numeric array (all values are zero or one) with the same shape as the grid.

RectGrid

createUniformGrid(startLat, nlat, deltaLat, startLon, nlon, deltaLon, order="yx", mask=None)

Create a uniform rectilinear grid. The grid is not associated with a file or dataset. The grid boundaries are at the midpoints of the axis values.

startLat is the starting latitude value.

nlat is the number of latitudes. If nlat is 1, the grid latitude boundaries will be startLat +/- deltaLat/2.

deltaLat is the increment between latitudes.

startLon is the starting longitude value.

nlon is the number of longitudes. If nlon is 1, the grid longitude boundaries will be startLon +/- deltaLon/2.

deltaLon is the increment between longitudes.

order is a string with value "yx" (the first grid dimension is latitude) or "xy" (the first grid dimension is longitude).

If specified, mask is a two-dimensional, logical Numeric array (all values are zero or one) with the same shape as the grid.

Axis

createUniformLatitudeAxis(startLat, nlat, deltaLat)

Create a uniform latitude axis. The axis boundaries are at the midpoints of the axis values. The axis is designated as a circular latitude axis.

startLat is the starting latitude value.

nlat is the number of latitudes.

deltaLat is the increment between latitudes.

RectGrid

createZonalGrid(grid)

Create a zonal grid. The output grid has the same latitude as the input grid, and a single longitude. This may be used to calculate zonal averages via a regridding operation.

grid is a RectGrid.

Axis

createUniformLongitudeAxis(startLon, nlon, deltaLon)

Create a uniform longitude axis. The axis boundaries are at the midpoints of the axis values. The axis is designated as a circular longitude axis.

startLon is the starting longitude value.

nlon is the number of longitudes.

deltaLon is the increment between longitudes.

Variable

createVariable(array, typecode=None, copy=0, savespace=0, mask=None, fill_value=None, grid=None, axes=None, attributes=None, id=None)

This function is documented in Table 2.31.

Integer

isVariable(s)

Return 1 if s is a variable, 0 otherwise. See also: asVariable.

Dataset or CdmsFile

open(url,mode='r')

Open or create a Dataset or CdmsFile.

url is a Uniform Resource Locator, referring to a cdunif or XML file. If the URL has the extension '.xml' or '.cdml', a Dataset is returned, otherwise a CdmsFile is returned. If the URL protocol is 'http', the file must be a '.xml' or '.cdml' file, and the mode must be 'r'. If the protocol is 'file' or is omitted, a local file or dataset is opened.

mode is the open mode. See Table 2.22.

Example: Open an existing dataset:

f = cdms.open("sampleset.xml")

 

Example: Create a netCDF file:

f = cdms.open("newfile.nc",'w')

List

order2index(axes, orderstring)

Find the index permutation of axes to match order. Return a list of indices.

axes is a list of axis objects.

orderstring is defined as in orderparse.

List

orderparse(orderstring)

Parse an order string. Returns a list of axes specifiers.

orderstring consists of:

  • Letters t, x, y, z meaning time, longitude, latitude, level
  • Numbers 0-9 representing position in axes
  • Dash (-) meaning insert the next available axis here.
  • The ellipsis ... meaning fill these positions with any remaining axes.
  • (name) meaning an axis whose id is name
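The order-string elements listed above can be tokenized with a short sketch. This is not the cdms implementation: parse_order is a hypothetical helper, and the form of the returned specifiers (integers for positions, strings for everything else, with the parentheses kept on named axes) is an assumption made for illustration. The sketch is written for Python 3.

```python
import re

# One alternative per element listed above: "...", "(name)", "-", a digit,
# or one of the letters t, x, y, z.
_TOKEN = re.compile(r"\.\.\.|\([^()]+\)|-|[0-9]|[txyz]")

def parse_order(orderstring):
    """Split an order string into a list of specifiers (hypothetical helper)."""
    specs = []
    pos = 0
    while pos < len(orderstring):
        match = _TOKEN.match(orderstring, pos)
        if match is None:
            raise ValueError("bad order string at position %d" % pos)
        text = match.group(0)
        # Digits become integer positions; all other tokens stay as strings.
        specs.append(int(text) if text.isdigit() else text)
        pos = match.end()
    return specs

print(parse_order("tzyx"))         # ['t', 'z', 'y', 'x']
print(parse_order("t(level)..."))  # ['t', '(level)', '...']
print(parse_order("0-y"))          # [0, '-', 'y']
```

Keeping the parentheses on a named axis avoids confusing an axis whose id happens to be 't' with the time-axis letter.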

None

setAutoBounds(mode)

Set autobounds mode.

If mode is 'on' (the default), the getBounds method will automatically generate boundary information for an axis or grid, if the boundaries are not explicitly defined.

If mode is 'off', and no boundary data is explicitly defined, the bounds will NOT be generated; the getBounds method will return None for the boundaries.

None

setClassifyGrids(mode)

Set the grid classification mode. This affects how grid type is determined, for the purpose of generating grid boundaries.

If mode is 'on' (the default), grid type is determined by a grid classification method, regardless of the value of grid.getType().

If mode is 'off', the value of grid.getType() determines the grid type.
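Several of the axis functions in this table rest on simple arithmetic, which the following Python 3 sketch reproduces without CDMS: uniform axis values, boundaries placed at midpoints, and an equal-area latitude axis whose sines are equally spaced. All three function names are hypothetical. Extrapolating the outermost boundaries by half a cell, and placing equal-area latitudes at the midpoints in sine, are assumptions; the library may handle these details differently.

```python
import math

def uniform_axis(start, n, delta):
    """Values start, start+delta, ..., as in the createUniform*Axis arguments."""
    return [start + i * delta for i in range(n)]

def midpoint_bounds(values, delta=None):
    """Cell bounds of shape (len(values), 2) placed at midpoints.

    The outer edges are extrapolated by half a cell (an assumption); for a
    single value, delta supplies the cell width (value +/- delta/2).
    """
    n = len(values)
    if n == 1:
        return [[values[0] - delta / 2.0, values[0] + delta / 2.0]]
    edges = [values[0] - (values[1] - values[0]) / 2.0]
    edges += [(values[i] + values[i + 1]) / 2.0 for i in range(n - 1)]
    edges.append(values[-1] + (values[-1] - values[-2]) / 2.0)
    return [[edges[i], edges[i + 1]] for i in range(n)]

def equal_area_latitudes(nlat):
    """North-to-south latitudes (degrees) whose sines are equally spaced."""
    return [math.degrees(math.asin(1.0 - (2 * i + 1) / float(nlat)))
            for i in range(nlat)]

lats = uniform_axis(-60.0, 4, 40.0)
print(lats)                   # [-60.0, -20.0, 20.0, 60.0]
print(midpoint_bounds(lats))  # [[-80.0, -40.0], [-40.0, 0.0], [0.0, 40.0], [40.0, 80.0]]
print(equal_area_latitudes(4))
```

For the equal-area axis, sin(x[i])-sin(x[i+1]) equals 2/nlat for every i, which is the property stated for createEqualAreaAxis.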

 

Class Tags

Tag

Class

'axis'

Axis

'database'

Database

'dataset'

Dataset, CdmsFile

'grid'

RectGrid

'variable'

Variable

'xlink'

Xlink

CdmsObj

A CdmsObj is the base class for all CDMS database objects. At the application level, CdmsObj objects are never created and used directly. Rather the subclasses of CdmsObj (Dataset, Variable, Axis, etc.) are the basis of user application programming.

All objects derived from CdmsObj have a special attribute .attributes. This is a Python dictionary, which contains all the external (persistent) attributes associated with the object. This is in contrast to the internal, non-persistent attributes of an object, which are built-in and predefined. When a CDMS object is written to a file, the external attributes are written, but not the internal attributes.

Example: get a list of all external attributes of obj.

extatts = obj.attributes.keys()

 

Attributes common to all CDMS objects

Type

Name

Definition

Dictionary

attributes

External attribute dictionary for this object.

All attributes may be accessed and set using the Python dot notation ('.').

Getting and setting attributes

Type

Definition

Various

value = obj.attname


Get an internal or external attribute value. If the attribute is external, it is read from the database. If the attribute is not already in the database, it is created as an external attribute. Internal attributes cannot be created, only referenced.

 

obj.attname = value


Set an internal or external attribute value. If the attribute is external, it is written to the database.
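The relationship between dot notation and the attributes dictionary can be sketched with a toy class. This is not the CdmsObj implementation: SketchObj and its _internal tuple are hypothetical, nothing is read from or written to a database, and raising AttributeError for an unknown name is this sketch's own choice. The sketch is written for Python 3.

```python
class SketchObj(object):
    """Toy stand-in for a CDMS object; not the real CdmsObj class."""

    # Internal attributes are predefined; anything else is external.
    _internal = ("id", "attributes")

    def __init__(self, id):
        # Bypass __setattr__ so the internal slots are stored directly.
        object.__setattr__(self, "id", id)
        object.__setattr__(self, "attributes", {})

    def __getattr__(self, name):
        # Called only when normal lookup fails, i.e. for external attributes.
        try:
            return self.attributes[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        if name in self._internal:
            object.__setattr__(self, name, value)
        else:
            self.attributes[name] = value   # external: kept in the dictionary

var = SketchObj("tas")
var.units = "K"                       # creates an external attribute
var.long_name = "surface temperature"
print(sorted(var.attributes.keys()))  # ['long_name', 'units']
print(var.units, var.id)              # K tas
```

Only the entries of the attributes dictionary would be candidates for writing to a file; the internal id stays outside it.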

Axis

An Axis is a one-dimensional coordinate object.

An Axis is contained in a Dataset. Setting a slice of an Axis writes data to the Dataset, referencing an Axis slice reads data from the Dataset. Axis objects are also used to define the domain of a Variable.

An axis in a CdmsFile may be designated the 'unlimited' axis, meaning that it can be extended in length after the initial definition. There can be at most one unlimited axis associated with a CdmsFile.

 

Axis Internal Attributes

Type

Name

Definition

Dictionary

attributes

External attribute dictionary.

String

id

Axis identifier.

Dataset

parent

The dataset which contains the axis.

Tuple

shape

The length of each axis.

 

partition attribute

For an axis in a dataset, the .partition attribute describes how an axis is split across files. It is a list of the start and end indices of each axis partition.

Partitioned axis

For example, the figure "Partitioned axis" shows a time axis representing the 36 months from January 1980 through December 1982, with December 1981 missing. The first partition interval is (0,12), the second is (12,23), and the third is (24,36), where the interval (i,j) represents all indices k such that i <= k < j. The .partition attribute for this axis would be the list:

[0, 12, 12, 23, 24, 36]

Note that the end index of the second interval is strictly less than the start index of the following interval. This indicates that data for that period is missing.
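As a minimal Python 3 sketch of how such a list can be interpreted, the following pairs up the flat list into half-open intervals and reports the index ranges that no partition covers. Both function names are hypothetical; they are not part of cdms.

```python
def partition_intervals(partition):
    """Pair up a flat [start0, end0, start1, end1, ...] list."""
    return [(partition[i], partition[i + 1])
            for i in range(0, len(partition), 2)]

def missing_ranges(partition):
    """Half-open index ranges that fall between consecutive partitions."""
    intervals = partition_intervals(partition)
    return [(end, next_start)
            for (_, end), (next_start, _) in zip(intervals, intervals[1:])
            if end < next_start]

partition = [0, 12, 12, 23, 24, 36]
print(partition_intervals(partition))  # [(0, 12), (12, 23), (24, 36)]
print(missing_ranges(partition))       # [(23, 24)]
```

Index 23 is the 24th month counting from January 1980, that is, December 1981, which matches the missing month in the example.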

 

Axis Constructors

 

cdms.createAxis(data, bounds=None)

Create an axis which is not associated with a dataset or file. See Table 2.2.

Dataset.createAxis(name,ar)

Create an Axis in a Dataset. (This function is not yet implemented. )

CdmsFile.createAxis(name,ar,unlimited=0)

Create an Axis in a CdmsFile.

name is the string name of the Axis.

ar is a 1-D data array which defines the Axis values. It may have the value None if an unlimited axis is being defined.

At most one Axis in a CdmsFile may be designated as being 'unlimited', meaning that it may be extended in length. To define an axis as unlimited, either:

  • set ar to None, and leave unlimited undefined, or
  • set ar to the initial 1-D array, and set unlimited to cdms.Unlimited

cdms.createEqualAreaAxis(nlat)

See Table 2.2.

cdms.createGaussianAxis(nlat)

See Table 2.2.

cdms.createUniformLatitudeAxis(startlat, nlat, deltalat)

See Table 2.2.

cdms.createUniformLongitudeAxis(startlon, nlon, deltalon)

See Table 2.2.

 

 

Axis Methods

Type

Method Definition

Array

array = axis[ i:j]

Read a slice of data from the external dataset. Data is returned in the physical ordering defined in the dataset. See Table 2.9 for a description of slice operators.

None

axis[ i:j] = array

Write a slice of data to the external dataset. (axes in CdmsFiles only)

List of component times

asComponentTime(calendar=None)

Array version of cdtime tocomp. Returns a list of component times.

List of relative times

asRelativeTime()

Array version of cdtime torel. Returns a list of relative times.

None

assignValue(array)

Set the entire value of the axis.

array is a one-dimensional, Numeric array.

Axis

clone(copyData=1)

Return a copy of the axis, as a transient axis. If copyData is 1 (the default) the data itself is copied.

None

designateCircular(modulo, persistent=0)

Designate the axis to be circular.

modulo is the modulus value. Any given axis value x is treated as equivalent to x+modulo.

If persistent is true, the external file or dataset (if any) is modified. By default, the designation is temporary.

None

designateLatitude(persistent=0)

Designate the axis to be a latitude axis.

If persistent is true, the external file or dataset (if any) is modified. By default, the designation is temporary.

None

designateLevel(persistent=0)

Designate the axis to be a vertical level axis.

If persistent is true, the external file or dataset (if any) is modified. By default, the designation is temporary.

None

designateLongitude(persistent=0, modulo=360.0)

Designate the axis to be a longitude axis.

modulo is the modulus value. Any given axis value x is treated as equivalent to x+modulo.

If persistent is true, the external file or dataset (if any) is modified. By default, the designation is temporary.

None

designateTime(persistent=0, calendar = cdtime.MixedCalendar)

Designate the axis to be a time axis.

If persistent is true, the external file or dataset (if any) is modified. By default, the designation is temporary.

calendar is defined as in getCalendar().

Array

getBounds()

Get the associated boundary array. The boundary array has shape (n,2), where n is the length of the axis.

If a boundary array is not explicitly defined and autobounds mode is on, a default array is generated by calling genGenericBounds. Otherwise, if autobounds mode is off, the return value is None. See setAutoBounds.

Integer

getCalendar()

Returns the calendar associated with the (time) axis. Possible return values, as defined in the cdtime module, are:

  • cdtime.GregorianCalendar: the standard Gregorian calendar
  • cdtime.MixedCalendar: mixed Julian/Gregorian calendar
  • cdtime.JulianCalendar: years divisible by 4 are leap years
  • cdtime.NoLeapCalendar: a year is 365 days
  • cdtime.Calendar360: a year is 360 days
  • None: no calendar can be identified

 

Note: If the axis is not a time axis, the global, file-related calendar is returned.

Array

getValue()

Get the entire axis vector.

Integer

isCircular()

Returns true if the axis has circular topology.

An axis is defined as circular if:

  • axis.topology=='circular', or
  • axis.topology is undefined, and the axis is a longitude

The default cycle for circular axes is 360.0

Integer

isLatitude()

Returns true iff the axis is a latitude axis.

Integer

isLevel()

Returns true iff the axis is a level axis.

Integer

isLinear()

Returns true iff the axis has a linear representation.

Integer

isLongitude()

Returns true iff the axis is a longitude axis.

Integer

isTime()

Returns true iff the axis is a time axis.

Integer

len(axis)

The length of the axis.

Tuple

mapInterval(interval)

Same as mapIntervalExt, but returns only the tuple (i,j), and wraparound is restricted to one cycle.

(i,j,k)

mapIntervalExt(interval)

Map a coordinate interval to an index interval.

interval is a tuple having one of the forms:

(x,y)
(x,y,indicator)
(x,y,indicator,cycle)
None or ':'

where x and y are coordinates indicating the interval [x,y), and:

indicator is a two or three-character string, where the first character is 'c' if the interval is closed on the left, 'o' if open, and the second character has the same meaning for the right-hand point. If present, the third character specifies how the interval should be intersected with the axis:

  • 'n' - select node values which are contained in the interval
  • 'b' - select axis elements for which the corresponding cell boundary intersects the interval
  • 'e' - same as 'n', but include an extra node on either side
  • 's' - select axis elements for which the cell boundary is a subset of the interval

The default indicator is 'ccn' , that is, the interval is closed, and nodes in the interval are selected.

If cycle is specified, the axis is treated as circular with the given cycle value. By default, if axis.isCircular() is true, the axis is treated as circular with a default modulus of 360.0.

An interval of None or ':' returns the full index interval of the axis.

The method returns the corresponding index interval as a 3-tuple (i,j,k), where k is the integer stride, and [i,j) is the half-open index interval i<=n<j (i>=n>j if k<0), or None if the intersection is empty.

For an axis which is circular (axis.topology == 'circular'), [i,j) is interpreted as follows (where N = len(axis)):

  • if 0<=i<N and 0<=j <= N, the interval does not wrap around the axis endpoint
  • otherwise the interval wraps around the axis endpoint.

 

See also: mapInterval, Variable.subRegion()

None

setCalendar(calendar, persistent=1)

Set the calendar for this (time) axis.

calendar is defined as in getCalendar().

If persistent is true, the external file or dataset (if any) is modified. This is the default.

TransientAxis

subAxis(i,j,k=1)

Create an axis associated with the integer range [i:j:k]. The stride k can be positive or negative. Wraparound is supported for longitude dimensions or those with a modulus attribute.

String

typecode()

The Numeric datatype identifier.

 

Axis Slice Operators

Slice

Definition

[i]

The ith element, starting with index 0

[i:j]

The ith element through, but not including, element j

[i:]

The ith element through and including the end

[:j]

The beginning element through, but not including, element j

[:]

The entire array

[i:j:k]

Every kth element, starting at i, through but not including j

[-i]

The ith element from the end. -1 is the last element.

Example: A longitude axis has value [0.0, 2.0, ..., 358.0], of length 180. Map the coordinate interval -5.0 <= x < 5.0 to index interval(s), with wraparound. The result index interval -2<=n<3 wraps around, since -2<0, and has a stride of 1. This is equivalent to the two contiguous index intervals -2<=n<0 and 0<=n<3

> axis.isCircular()

1

> axis.mapIntervalExt((-5.0,5.0,'co'))

(-2,3,1)

>
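The wraparound arithmetic in this example can be reproduced with a short Python 3 sketch. It is not the mapIntervalExt implementation: map_interval is a hypothetical function that handles only an ascending circular axis, node selection (the 'n' behaviour), and a stride of 1. An index n outside 0..N-1 is taken to mean values[n % N] shifted by a whole number of cycles.

```python
import math

def map_interval(values, x, y, indicator="cc", cycle=360.0):
    """Map coordinates x..y on a circular, ascending axis to (i, j, 1).

    A negative i, or a j greater than len(values), signals wraparound.
    """
    n_pts = len(values)

    def coord(n):
        # Coordinate of extended index n: the base value plus whole cycles.
        return values[n % n_pts] + cycle * (n // n_pts)

    left_closed = indicator[0] == "c"
    right_closed = indicator[1] == "c"

    # Start at the first index of the cycle at or below x, then walk up to
    # the first node inside the interval.
    i = int(math.floor((x - values[0]) / cycle)) * n_pts
    while coord(i) < x or (coord(i) == x and not left_closed):
        i += 1
    # Walk on to the first node past the right-hand end.
    j = i
    while coord(j) < y or (coord(j) == y and right_closed):
        j += 1
    return (i, j, 1) if i < j else None

lon = [2.0 * k for k in range(180)]          # 0.0, 2.0, ..., 358.0
print(map_interval(lon, -5.0, 5.0, "co"))    # (-2, 3, 1)
print(map_interval(lon, 0.0, 4.0, "cc"))     # (0, 3, 1)
```

In the first call, indices -2 and -1 correspond to 356.0 and 358.0 shifted down by one cycle (-4.0 and -2.0), so the result covers the nodes -4.0, -2.0, 0.0, 2.0, and 4.0.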

CdmsFile

A CdmsFile is a physical file, accessible via the cdunif interface. netCDF files are accessible in read-write mode. All other formats (DRS, HDF, GrADS/GRIB, POP, QL) are accessible read-only.

As of CDMS V3, the legacy cuDataset interface is also supported by CdmsFiles. See cu Module.

 

CdmsFile Internal Attributes

Type

Name

Definition

Dictionary

attributes

Global, external file attributes

Dictionary

axes

Axis objects contained in the file.

Dictionary

grids

Grids contained in the file.

String

id

File pathname.

Dictionary

variables

Variables contained in the file.

 

CdmsFile Constructors

 

fileobj = cdms.open(path, mode)

Open the file specified by path returning a CdmsFile object.

path is the file pathname, a string.

mode is the open mode indicator, as listed in Table 2.22.

fileobj = cdms.createDataset(path)

Create the file specified by path, a string.

 

CdmsFile Methods

Type

Definition

TransientVariable

fileobj(varname, selector)

Calling a CdmsFile object as a function reads the region of data specified by the selector. The result is a transient variable, unless raw=1 is specified. See Selectors.

For example, the following reads data for variable 'prc', year 1980:

f = cdms.open('test.nc')

x = f('prc', time=('1980-1','1981-1'))

Variable, Axis, or Grid

fileobj['id']

Get the persistent variable, axis or grid object having the string identifier. This does not read the data for a variable.

For example:

f = cdms.open('sample.nc')

v = f['prc']

gets the persistent variable v, equivalent to v=f.variables['prc'].

t = f['time']

gets the axis named 'time', equivalent to t=f.axes['time'].

None

close()

Close the file.

Axis

copyAxis(axis, newname=None)

Copy axis values and attributes to a new axis in the file. The returned object is persistent: it can be used to write axis data to or read axis data from the file. If an axis already exists in the file, having the same name and coordinate values, it is returned. It is an error if an axis of the same name exists, but with different coordinate values.

axis is the axis object to be copied.

newname, if specified, is the string identifier of the new axis object. If not specified, the identifier of the input axis is used.

Grid

copyGrid(grid, newname=None)

Copy grid values and attributes to a new grid in the file. The returned grid is persistent. If a grid already exists in the file, having the same name and axes, it is returned. An error is raised if a grid of the same name exists, having different axes.

grid is the grid object to be copied.

newname, if specified, is the string identifier of the new grid object. If unspecified, the identifier of the input grid is used.

Axis

createAxis(id, ar, unlimited=0)

Create a new Axis. This is a persistent object which can be used to read or write axis data to the file.

id is an alphanumeric string identifier, containing no blanks.

ar is the one-dimensional axis array.

Set unlimited to cdms.Unlimited to indicate that the axis is extensible.

RectGrid

createRectGrid(id, lat, lon, order, type="generic", mask=None)

Create a RectGrid in the file. This is not a persistent object: the order, type, and mask are not written to the file. However, the grid may be used for regridding operations.

lat is a latitude axis in the file.

lon is a longitude axis in the file.

order is a string with value "yx" (the first grid dimension is latitude) or "xy" (the first grid dimension is longitude).

type is one of 'gaussian', 'uniform', 'equalarea', or 'generic'.

If specified, mask is a two-dimensional, logical Numeric array (all values are zero or one) with the same shape as the grid.

Variable

createVariable(String id, String datatype, List axes, fill_value=None)

Create a new Variable. This is a persistent object which can be used to read or write variable data to the file.

id is a String name which is unique with respect to all other objects in the file.

datatype is an MA typecode, e.g., MA.Float, MA.Int.

axes is a list of Axis and/or Grid objects.

fill_value is the missing value (optional).

Variable

createVariableCopy(var, newname=None)

Create a new Variable, with the same name, axes, and attributes as the input variable. An error is raised if a variable of the same name exists in the file.

var is the Variable to be copied.

newname, if specified, is the name of the new variable. If unspecified, the returned variable has the same name as var.

Note: Unlike copyAxis, the actual data is not copied to the new variable.

None

sync()

Writes any pending changes to the file.

Variable

write(var, attributes=None, axes=None, extbounds=None, id=None, extend=None, fill_value=None, index=None, typecode=None)

Write a variable or array to the file. The return value is the associated file variable.

If the variable does not exist in the file, it is first defined and all attributes written, then the data is written. By default, the time dimension of the variable is defined as the 'unlimited' dimension of the file. If the data is already defined, then data is extended or overwritten depending on the value of keywords extend and index, and the unlimited dimension values associated with var.

var is a Variable, masked array, or Numeric array.

attributes is the attribute dictionary for the variable. The default is var.attributes.

axes is the list of file axes comprising the domain of the variable. The default is to copy var.getAxisList().

extbounds is the unlimited dimension bounds. Defaults to var.getAxis(0).getBounds()

id is the variable name in the file. Default is var.id.

extend=1 causes the first dimension to be 'unlimited': iteratively writeable. The default is None, in which case the first dimension is extensible if it is time. Set to 0 to turn off this behaviour.

fill_value is the missing value flag.

index is the extended dimension index to write to. The default index is determined by lookup relative to the existing extended dimension.

Note: data can also be written by setting a slice of a file variable, and attributes can be written by setting an attribute of a file variable.

 

CDMS Datatypes

CDMS Datatype

Definition

CdChar

character

CdDouble

double-precision floating-point

CdFloat

floating-point

CdInt

integer

CdLong

long integer

CdShort

short integer

Database

A Database is a collection of datasets and other CDMS objects. It consists of a hierarchical collection of objects, with the database being at the root, or top of the hierarchy. A database is used to:

The figure below illustrates several important points:


"variable=ua, dataset=ncep_reanalysis_mo,database=CDMS".

Overview

To access a database:

  1. Open a connection. The connect method opens a database connection. connect takes a database URI and returns a database object:

db = cdms.connect("ldap://dbhost.llnl.gov/database=CDMS,ou=PCMDI,o=LLNL,c=US")

  2. Search the database, locating one or more datasets, variables, and/or other objects.

The database searchFilter method searches the database. A single database connection may be used for an arbitrary number of searches.

For example, to find all observed datasets:

result = db.searchFilter("category=observed",tag="dataset")

Searches can be restricted to a subhierarchy of the database. This example searches just the dataset 'ncep_reanalysis_mo':

result = db.searchFilter(relbase="dataset=ncep_reanalysis_mo")

  3. Refine the search results if necessary. The result of a search can be narrowed with the searchPredicate method.
  4. Process the results. A search result consists of a sequence of entries. Each entry has a name, the name of the CDMS object, and an attribute dictionary, consisting of the attributes located by the search:

for entry in result:
    print entry.name, entry.attributes

  5. Access the data. The CDMS object associated with an entry is obtained from the getObject method:

obj = entry.getObject()

If the id of a dataset is known, the dataset can be opened directly with the open method:

dset = db.open("ncep_reanalysis_mo")

  6. Close the database connection:

db.close()

 

Database Internal Attributes

Type

Name

Summary

Dictionary

attributes

Database attribute dictionary

LDAP

db

(LDAP only) LDAP database object

String

netloc

Hostname, for server-based databases

String

path

path name

String

uri

Uniform Resource Identifier.

 

Database Constructors

 

db = cdms.connect(uri=None, user="", password="")

Connect to the database.

uri is the Uniform Resource Identifier of the database. The form of the URI depends on the implementation of the database. For a Lightweight Directory Access Protocol (LDAP) database, the form is:

ldap://host[:port]/dbname

 

For example, if the database is located on host 'dbhost.llnl.gov', and is named 'database=CDMS,ou=PCMDI,o=LLNL,c=US', the URI is:

ldap://dbhost.llnl.gov/database=CDMS,ou=PCMDI,o=LLNL,c=US

If unspecified, the URI defaults to the value of environment variable CDMSROOT.

user is the user ID. If unspecified, an anonymous connection is made.

password is the user password. A password is not required for an anonymous connection.
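
As an illustration of the URI form described above, the following sketch splits an LDAP URI into host, port, and database name using the Python 3 standard library. It is not part of CDMS: the helper name split_ldap_uri is hypothetical, and cdms.connect may well parse the URI differently.

```python
from urllib.parse import urlsplit

def split_ldap_uri(uri):
    """Split 'ldap://host[:port]/dbname' into (host, port, dbname).

    Hypothetical helper for illustration only; not a CDMS function.
    """
    parts = urlsplit(uri)
    if parts.scheme != "ldap":
        raise ValueError("expected an ldap:// URI, got %r" % uri)
    # The path component holds the database name, preceded by '/'.
    dbname = parts.path.lstrip("/")
    return parts.hostname, parts.port, dbname

host, port, dbname = split_ldap_uri(
    "ldap://dbhost.llnl.gov/database=CDMS,ou=PCMDI,o=LLNL,c=US")
print(host)    # dbhost.llnl.gov
print(port)    # None (no explicit port in the URI)
print(dbname)  # database=CDMS,ou=PCMDI,o=LLNL,c=US
```

The host and database name recovered here correspond loosely to the netloc and path attributes listed for a Database object.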

 

Database Methods

Type

Definition

None

close()

Close a database connection.

List

listDatasets()

Return a list of the dataset IDs in this database. A dataset ID can be passed to the open command.

Dataset

open(dsetid, mode='r')

Open a dataset.

dsetid is the string dataset identifier

mode is the open mode, 'r' - read-only, 'r+' - read-write, 'w' - create.

openDataset is a synonym for open.

SearchResult

searchFilter(filter=None, tag=None, relbase=None, scope=Subtree, attnames=None, timeout=None)

Search a CDMS database.

filter is the string search filter. Simple filters have the form "tag = value". Simple filters can be combined using logical operators '&', '|', '!' in prefix notation. For example, the filter '(&(objectclass=variable)(id=cli))' finds all variables named "cli". A formal definition of search filters is provided in the following section.

tag restricts the search to objects with that tag ("dataset" | "variable" | "database" | "axis" | "grid").

relbase is the relative name of the base object of the search. The search is restricted to the base object and all objects below it in the hierarchy. For example, to search only dataset 'ncep_reanalysis_mo', specify:

relbase="dataset=ncep_reanalysis_mo"


To search only variable 'ua' in ncep_reanalysis_mo, use:

relbase="variable=ua, dataset=ncep_reanalysis_mo"


If no base is specified, the entire database is searched. See the scope argument also.

scope is the search scope (Subtree | Onelevel | Base). Subtree searches the base object and its descendants. Onelevel searches the base object and its immediate descendants. Base searches the base object alone. The default is Subtree.

attnames: list of attribute names. Restricts the attributes returned. If None, all attributes are returned. Attributes 'id' and 'objectclass' are always included in the list.

timeout: integer number of seconds before timeout. The default is no timeout.
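
The three scope values can be pictured on a toy hierarchy. The sketch below is plain Python, not CDMS code: the object names and the helper objects_in_scope are made up for illustration, and it follows the wording above, in which Onelevel includes the base object itself.

```python
# Toy hierarchy: each object name maps to the names of its children.
# These names are invented for the example.
CHILDREN = {
    "database": ["dataset=a", "dataset=b"],
    "dataset=a": ["variable=ua", "axis=time"],
    "dataset=b": ["variable=ta"],
}

def objects_in_scope(base, scope="Subtree"):
    """Return the object names visited for a given base object and scope."""
    if scope == "Base":
        # Only the base object.
        return [base]
    if scope == "Onelevel":
        # The base object and its immediate descendants.
        return [base] + CHILDREN.get(base, [])
    if scope == "Subtree":
        # The base object and all of its descendants, depth first.
        found = [base]
        for child in CHILDREN.get(base, []):
            found.extend(objects_in_scope(child, "Subtree"))
        return found
    raise ValueError("unknown scope: %r" % scope)

print(objects_in_scope("dataset=a", "Base"))
print(objects_in_scope("database", "Onelevel"))
print(objects_in_scope("database", "Subtree"))
```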

 

Searching a database

The searchFilter method is used to search a database. The result is called a search result, and consists of a sequence of result entries.

In its simplest form, searchFilter takes an argument consisting of a string filter. The search returns a sequence of entries, corresponding to those objects having an attribute which matches the filter. Simple filters have the form (tag = value), where value can contain wildcards. For example:

'(id = ncep*)'

'(project = AMIP2)'

Simple filters can be combined with the logical operators '&', '|', '!'. For example,

'(&(id = bmrc*)(project = AMIP2))'

matches all objects with id starting with 'bmrc', and a 'project' attribute with value 'AMIP2'.

Formally, search filters are strings defined as follows:

filter     ::= "(" filtercomp ")"
filtercomp ::= "&" filterlist |   # and
               "|" filterlist |   # or
               "!" filterlist |   # not
               simple
filterlist ::= filter | filter filterlist
simple     ::= tag op value
op         ::= "=" |    # equality
               "~=" |   # approximate equality
               "<=" |   # lexicographically less than or equal to
               ">="     # lexicographically greater than or equal to
tag        ::= string attribute name
value      ::= string attribute value, may include '*' as a wild card
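
To make the grammar concrete, here is a minimal sketch of an evaluator that tests a filter string against a dictionary mapping attribute names to lists of string values. It is an illustration only, not the CDMS or LDAP implementation. The function name matches is hypothetical, and three choices are assumptions: "~=" is modelled as a case-insensitive match, "!" applied to several filters is true when none of them match, and a bare simple filter such as "category=observed" is accepted by wrapping it in parentheses.

```python
import re

def _wild(value, pattern):
    """True if value matches pattern, where '*' matches any run of characters."""
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.fullmatch(regex, value) is not None

def _match_simple(text, attrs):
    # simple ::= tag op value
    m = re.match(r"\s*([^\s=~<>()]+)\s*(~=|<=|>=|=)\s*(.*?)\s*$", text)
    if m is None:
        raise ValueError("bad simple filter: %r" % text)
    tag, op, pattern = m.groups()
    for value in attrs.get(tag, []):
        if op == "=" and _wild(value, pattern):
            return True
        # Assumption: "approximate" is modelled as case-insensitive.
        if op == "~=" and _wild(value.lower(), pattern.lower()):
            return True
        if op == "<=" and value <= pattern:
            return True
        if op == ">=" and value >= pattern:
            return True
    return False

def _skip_spaces(text, pos):
    while pos < len(text) and text[pos].isspace():
        pos += 1
    return pos

def _parse_filter(text, pos, attrs):
    # filter ::= "(" filtercomp ")"
    pos = _skip_spaces(text, pos)
    if pos >= len(text) or text[pos] != "(":
        raise ValueError("expected '(' at position %d" % pos)
    pos = _skip_spaces(text, pos + 1)
    if pos < len(text) and text[pos] in "&|!":
        op = text[pos]
        pos = _skip_spaces(text, pos + 1)
        values = []
        # filterlist ::= filter | filter filterlist
        while pos < len(text) and text[pos] == "(":
            value, pos = _parse_filter(text, pos, attrs)
            values.append(value)
            pos = _skip_spaces(text, pos)
        if not values:
            raise ValueError("operator %r needs at least one filter" % op)
        if op == "&":
            result = all(values)
        elif op == "|":
            result = any(values)
        else:
            # Assumption: '!' over a list is true when no filter matches.
            result = not any(values)
    else:
        end = text.find(")", pos)
        if end < 0:
            raise ValueError("missing ')'")
        result = _match_simple(text[pos:end], attrs)
        pos = end
    if pos >= len(text) or text[pos] != ")":
        raise ValueError("expected ')' at position %d" % pos)
    return result, pos + 1

def matches(filter_text, attrs):
    """Evaluate filter_text against attrs (attribute name -> list of strings)."""
    text = filter_text.strip()
    if not text.startswith("("):
        text = "(" + text + ")"   # accept the bare 'tag=value' form
    result, pos = _parse_filter(text, 0, attrs)
    if pos != len(text):
        raise ValueError("unexpected text after position %d" % pos)
    return result

entry = {"id": ["bmrc_mo"], "project": ["AMIP2"]}
print(matches("(&(id = bmrc*)(project = AMIP2))", entry))  # True
print(matches("(!(project = AMIP2))", entry))              # False
```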

Attribute names are defined in the chapter on Climate Data Markup Language (CDML). In addition, some special attributes are defined for convenience:

The set of objects searched is called the search scope. The top object in the hierarchy is the base object. By default, all objects in the database are searched; that is, the database is the base object. If the database is very large, this may result in an unnecessarily slow or inefficient search. To remedy this, the search scope can be limited with the tag, relbase, and scope arguments of searchFilter.

A search result is accessed sequentially within a for loop:

result = db.searchFilter('(&(category=obs*)(id=ncep*))')

for entry in result:
    print entry.name

Search results can be narrowed using searchPredicate. In the following example, the result of one search is itself searched for all variables defined on a 94x192 grid:

>>> result = db.searchFilter('parentid=ncep*',tag="variable")

>>> len(result)

65

>>> result2 = result.searchPredicate(lambda x: x.getGrid().shape==(94,192))

>>> len(result2)

3

>>> for entry in result2: print entry.name

variable=rluscs,dataset=ncep_reanalysis_mo,database=CDMS,ou=PCMDI, o=LLNL, c=US

variable=rlds,dataset=ncep_reanalysis_mo,database=CDMS,ou=PCMDI, o=LLNL, c=US

variable=rlus,dataset=ncep_reanalysis_mo,database=CDMS,ou=PCMDI, o=LLNL, c=US

>>>

 

 

 

SearchResult Methods

Type

Definition

ResultEntry

[i]

Return the i-th search result. Results can also be returned in a for loop:

for entry in db.searchFilter(tag="dataset"):
    ...

Integer

len()

Number of entries in the result.

SearchResult

searchPredicate(predicate, tag=None)

Refine a search result, with a predicate search.

predicate is a function which takes a single CDMS object and returns true (1) if the object satisfies the predicate, 0 if not.

tag restricts the search to objects of the class denoted by the tag.

Note: In the current implementation, searchPredicate is much less efficient than searchFilter. For best performance, use searchFilter to narrow the scope of the search, then use searchPredicate for more general searches.

 

A search result is a sequence of result entries. Each entry has a string name, the name of the object in the database hierarchy, and an attribute dictionary. An entry corresponds to an object found by the search, but differs from the object, in that only the attributes requested are associated with the entry. In general, there will be much more information defined for the associated CDMS object, which is retrieved with the getObject method.

 

ResultEntry Attributes

Type        Name        Summary
----------  ----------  ----------------------------------------
String      name        The name of this entry in the database.
Dictionary  attributes  The attributes returned from the search. attributes[key] is a list of all string values associated with the key.

 

ResultEntry Methods

Type

Definition

CdmsObj

getObject()

Return the CDMS object associated with this entry.

Note: For many search applications it is unnecessary to access the associated CDMS object. For best performance this function should be used only when necessary, for example, to retrieve data associated with a variable.

Accessing data

To access data via CDMS:

  1. Locate the dataset ID. This may involve searching the metadata.
  2. Open the dataset, using the open method.
  3. Reference the portion of the variable to be read.

In the next example, a portion of variable 'ua' is read from dataset 'ncep_reanalysis_mo':

dset = db.open('ncep_reanalysis_mo')

ua = dset.variables['ua']

data = ua[0,0]

Examples of database searches

In the following examples, db is the database opened with

db = cdms.connect()

This defaults to the database defined in environment variable CDMSROOT.

List all variables in dataset 'ncep_reanalysis_mo':

for entry in db.searchFilter(filter="parentid=ncep_reanalysis_mo", tag="variable"):
    print entry.name

Find all axes with bounds defined:

for entry in db.searchFilter(filter="bounds=*",tag="axis"):
    print entry.name

 

Locate all GDT datasets:

for entry in db.searchFilter(filter="Conventions=GDT*",tag="dataset"):
    print entry.name

 

Find all variables with missing time values, in observed datasets:

def missingTime(obj):
    time = obj.getTime()
    return time.length != time.partition_length

result = db.searchFilter(filter="category=observed")
for entry in result.searchPredicate(missingTime):
    print entry.name

 

Find all CMIP2 datasets having a variable with id "hfss":

for entry in db.searchFilter(filter="(&(project=CMIP2)(id=hfss))",tag="variable"):
    print entry.getObject().parent.id

 

Find all observed variables on 73x144 grids:

result = db.searchFilter('category=obs*')
for entry in result.searchPredicate(lambda x: x.getGrid().shape==(73,144),tag="variable"):
    print entry.name

 

Find all observed variables with more than 1000 timepoints:

result = db.searchFilter('category=obs*')
for entry in result.searchPredicate(lambda x: len(x.getTime())>1000, tag="variable"):
    print entry.name, len(entry.getObject().getTime())

 

Find the total number of each type of object in the database:

print len(db.searchFilter(tag="database")),"database"

print len(db.searchFilter(tag="dataset")),"datasets"

print len(db.searchFilter(tag="variable")),"variables"

print len(db.searchFilter(tag="axis")),"axes"

 

Dataset

A Dataset is a virtual file. It consists of a metafile, in CDML/XML representation, and one or more data files.

As of CDMS V3, the legacy cuDataset interface is supported by Datasets. See cu Module.

 

Dataset Internal Attributes

Type        Name        Summary
----------  ----------  ----------------------------------------
Dictionary  attributes  Dataset external attributes.
Dictionary  axes        Axes contained in the dataset.
String      datapath    Path of data files, relative to the parent database. If no parent, the datapath is absolute.
Dictionary  grids       Grids contained in the dataset.
String      mode        Open mode.
Database    parent      Database which contains this dataset. If the dataset is not part of a database, the value is None.
String      uri         Uniform Resource Identifier of this dataset.
Dictionary  variables   Variables contained in the dataset.
Dictionary  xlinks      External links contained in the dataset.

 

Dataset Constructors

 

datasetobj = cdms.open(String uri, String mode='r')

Open the dataset specified by the Uniform Resource Identifier, a CDML file. Returns a Dataset object. mode is one of the indicators listed in Table 2.22.

openDataset is a synonym for open.

datasetobj = cdms.createDataset(String path, String directory, String fileTemplate)

(Note: this function is not yet implemented)

Create a new dataset, returning a Dataset object. path is the filepath of a CDML file. fileTemplate describes how the dataset is to be partitioned. It is a pathname, relative to the directory, which contains zero or more template specifiers (see Table 2.23). A template may contain directory names as well as file names. A template specifier is a string of the form '%X' or '%eX', where X is one of the characters listed in Table 2.23. The form '%eX' may be used to specify the end time or level value. A specifier may appear more than once in a template.

 

Open Modes

Mode  Definition
----  ----------------------------------------
'r'   Read-only.
'r+'  Read-write.
'a'   Read-write. Open the file if it exists, otherwise create a new file.
'w'   Create a new file, read-write.

 

Template Specifiers

Specifier  Definition                            Example
---------  ------------------------------------  -----------------
d          day number                            1 .. 31
f          day number, two-digit, zero-filled    01 .. 31
g          month, lower case, three characters   'jan', 'feb', ...
G          month, upper case, three characters   'JAN', 'FEB', ...
H          hour                                  0 .. 23
L          vertical level                        integer
m          month number, not zero-filled         1 .. 12
M          minute                                0 .. 59
n          month number, two-digit, zero-filled  01, 02, ..., 12
S          second                                0 .. 59
v          variable ID                           character
y          year, two-digit, zero-filled          integer
Y          year                                  integer
z          Zulu time                             ex: '6Z19990201'
%          percent sign                          '%'

 

For example, the file template

ccsr-a/mo/%v/ccsr-a/%v_ccsr-a_%Y.%n-%eY.%en.nc

contains the specifiers %v (variable name), %Y (year), %n (month), %eY (end year), and %en (end month). One of the files in the dataset might have the path (relative to the parent directory)

ccsr-a/mo/ta/ccsr-a/ta_ccsr-a_1979.01-1979.12.nc
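
The expansion from template to path can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the CDMS implementation: the function expand_template and its arguments are hypothetical, the hour, minute, and second fields are assumed not to be zero-filled (following the example ranges in the specifier table), and the L (level) and z (Zulu time) specifiers are deliberately left unsupported.

```python
import re
from datetime import datetime

_MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
           "jul", "aug", "sep", "oct", "nov", "dec"]

def _time_field(code, t):
    """Format one time specifier character for the datetime t."""
    table = {
        "d": str(t.day),
        "f": "%02d" % t.day,
        "g": _MONTHS[t.month - 1],
        "G": _MONTHS[t.month - 1].upper(),
        "H": str(t.hour),
        "m": str(t.month),
        "M": str(t.minute),
        "n": "%02d" % t.month,
        "S": str(t.second),
        "y": "%02d" % (t.year % 100),
        "Y": str(t.year),
    }
    if code not in table:
        raise ValueError("unsupported specifier: %%%s" % code)
    return table[code]

def expand_template(template, variable, start, end=None):
    """Replace '%X' and '%eX' specifiers in template.

    Hypothetical helper: start and end are datetime objects giving the
    first and last time covered by the file.
    """
    def replace(match):
        is_end, code = match.group(1), match.group(2)
        if code == "%":
            return "%"
        if code == "v":
            return variable
        if is_end:
            if end is None:
                raise ValueError("template uses %%e%s but no end time given" % code)
            return _time_field(code, end)
        return _time_field(code, start)
    return re.sub(r"%(e?)(.)", replace, template)

path = expand_template("ccsr-a/mo/%v/ccsr-a/%v_ccsr-a_%Y.%n-%eY.%en.nc",
                       "ta", datetime(1979, 1, 1), datetime(1979, 12, 1))
print(path)  # ccsr-a/mo/ta/ccsr-a/ta_ccsr-a_1979.01-1979.12.nc
```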

 

 

 

Dataset Methods

Type

Definition

TransientVariable

datasetobj(varname, selector)

Calling a Dataset object as a function reads the region of data defined by the selector. The result is a transient variable, unless raw=1 is specified. See Selectors.

For example, the following reads data for variable 'prc', year 1980:

f = cdms.open('test.xml')

x = f('prc', time=('1980-1','1981-1'))

 

Variable, Axis, or Grid

datasetobj['id']

The square bracket operator applied to a dataset gets the persistent variable, axis or grid object having the string identifier. This does not read the data for a variable. Returns None if not found.

For example:

f = cdms.open('sample.xml')

v = f['prc']

gets the persistent variable v, equivalent to v=f.variables['prc'].

t = f['time']

gets the axis named 'time', equivalent to t=f.axes['time'].

None

close()

Close the dataset.

RectGrid

createRectGrid(id, lat, lon, order, type="generic", mask=None)

Create a RectGrid in the dataset. This is not a persistent object: the order, type, and mask are not written to the dataset. However, the grid may be used for regridding operations.

lat is a latitude axis in the dataset.

lon is a longitude axis in the dataset.

order is a string with value "yx" (the first grid dimension is latitude) or "xy" (the first grid dimension is longitude).

type is one of 'gaussian','uniform','equalarea',or 'generic'

If specified, mask is a two-dimensional, logical Numeric array (all values are zero or one) with the same shape as the grid.

Axis

getAxis(id)

Get an axis object from the file or dataset.

id is the string axis identifier.

Grid

getGrid(id)

Get a grid object from a file or dataset.

id is the string grid identifier.

List

getPaths()

Get a sorted list of pathnames of datafiles which comprise the dataset. This does not include the XML metafile path, which is stored in the .uri attribute.

Variable

getVariable(id)

Get a variable object from a file or dataset.

id is the string variable identifier.

None

sync()

Write any pending changes to the dataset.

 

MV module

 

The fundamental CDMS data object is the variable. A variable is comprised of:

The MV module is a work-alike replacement for the MA module that carries along the domain and attribute information where appropriate. MV provides the same set of functions as MA. However, MV functions generate transient variables as results. Often this simplifies scripts that perform computation. MA is part of the Python Numeric package, documented at http://numpy.sourceforge.net.

MV can be imported with the command:

import MV

The command

from MV import *

allows use of MV commands without any prefix.

Table 2.25 lists the constructors in MV. All functions return a transient variable. In most cases the keywords axes, attributes, and id are available. axes is a list of axis objects which specifies the domain of the variable. attributes is a dictionary. id is a special attribute string that serves as the identifier of the variable, and should not contain blanks or non-printing characters. It is used when the variable is plotted or written to a file. Since the id is just an attribute, it can also be set like any attribute:

var.id = 'temperature'

 

For completeness MV provides access to all the MA functions. The functions not listed in the following tables are identical to the corresponding MA function: allclose, allequal, common_fill_value, compress, create_mask, dot, e, fill_value, filled, get_print_limit, getmask, getmaskarray, identity, indices, innerproduct, isMA, isMaskedArray, is_mask, isarray, make_mask, make_mask_none, mask_or, masked, pi, put, putmask, rank, ravel, set_fill_value, set_print_limit, shape, size, sort. See the documentation at http://numpy.sourceforge.net for a description of these functions.

 

Variable Constructors in module MV

 

arrayrange(start, stop=None, step=1, typecode=None, axis=None, attributes=None, id=None)

Just like MA.arange() except it returns a variable whose type can be specified by the keyword argument typecode. The axis, attribute dictionary, and string identifier of the result variable may be specified.

Synonym: arange

masked_array(a, mask=None, fill_value=None, axes=None, attributes=None, id=None)

Same as MA.masked_array but creates a variable instead. If no axes are specified, the result has default axes, otherwise axes is a list of axis objects matching a.shape.

masked_object(data, value, copy=1, savespace=0, axes=None, attributes=None, id=None)

Create a variable masked where data is exactly equal to value. The variable is created with the given list of axis objects, attribute dictionary, and string id.

masked_values(data, value, rtol=1e-05, atol=1e-08, copy=1, savespace=0, axes=None, attributes=None, id=None)

Constructs a variable with the given list of axes and attribute dictionary, whose mask is set at those places where

abs (data - value) < atol + rtol * abs (data).

This is a careful way of saying that those elements of data that equal value (to within a tolerance) are to be treated as invalid. If data is not of a floating-point type, masked_object is called instead.

ones(shape, typecode='l', savespace=0, axes=None, attributes=None, id=None)

Return an array of all ones of the given length or shape.

reshape(a, newshape, axes=None, attributes=None, id=None)

Copy of a with a new shape.

resize(a, new_shape, axes=None, attributes=None, id=None)

Return a new array with the specified shape. The original array's total size can be any size.

zeros(shape, typecode='l', savespace=0, axes=None, attributes=None, id=None)

An array of all zeros of the given length or shape.
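
The tolerance test given above for masked_values can be tried out on plain Python lists. The sketch below only reproduces the inequality abs(data - value) < atol + rtol * abs(data) element by element; it does not use MV or MA, and the helper name tolerance_mask is hypothetical.

```python
def tolerance_mask(data, value, rtol=1e-05, atol=1e-08):
    """Return a list of flags: 1 where an element is within tolerance of value."""
    return [1 if abs(x - value) < atol + rtol * abs(x) else 0 for x in data]

# 1.0e20 plays the role of a missing-value marker in this example.
data = [1.0, 1.0e20, 1.0e20 * (1 + 1e-7), 2.5]
print(tolerance_mask(data, 1.0e20))  # [0, 1, 1, 0]
```

The third element differs from the marker by a relative amount of about 1e-7, which is inside the default rtol of 1e-05, so it is flagged along with the exact match.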

The following table describes the MV non-constructor functions. With the exception of argsort, all functions return a transient variable.

 

MV functions

Definition

argsort(x, axis=-1, fill_value=None)

Return a Numeric array of indices for sorting along a given axis.

asarray(data, typecode=None)

Same as cdms.createVariable(data, typecode, copy=0). This is a short way of ensuring that something is an instance of a variable of a given type before proceeding, as in

data = asarray(data)

Also see the variable astype() function.

average(a, axis=0, weights=None)

Computes the average value of the non-masked elements of a along the selected axis. If weights is given, it must match the size and shape of a, and the value returned is:

sum(a*weights)/sum(weights)

In computing these sums, elements that correspond to those that are masked in a or weights are ignored.

choose(conditio