This chapter describes the CDMS Python application programming interface (API). Python is a popular public-domain, object-oriented language. Its features include support for object-oriented development, a rich set of programming constructs, and an extensible architecture. CDMS itself is implemented in a mixture of C and Python. This chapter assumes that the reader is familiar with the basic features of the Python language.
Python supports the notion of a module, which groups together associated classes and methods. The import command makes the module accessible to an application. This chapter documents the cdms module.
The chapter sections correspond to the CDMS classes. Each section contains tables describing the class internal (non-persistent) attributes, constructors (functions for creating an object), and class methods (functions). A method can return an instance of a CDMS class, or one of the Python types:
The following Python script reads January and July monthly temperature data from an input dataset, averages over time, and writes the results to an output file. The input temperature data is ordered (time, latitude, longitude).
jones = cdms.open('/pcmdi/cdms/obs/jones_mo.nc')
# ...
janavg.long_name = "mean January surface temperature"
julyavg = MV.average(julys)
# ...
julyavg.long_name = "mean July surface temperature"
out = cdms.open('janjuly.nc','w')
# ...
out.comment = "Average January/July from Jones dataset"
The cdms module is the Python interface to CDMS. The objects and methods in this chapter are made accessible with the command:
The functions described in this section are not associated with a class. Rather, they are called as module functions, e.g.,
|
Transform s into a transient variable. s is a masked array, Numeric array, or Variable. If s is already a transient variable, s is returned. |
|
|
Create an Axis, which is not associated with a file or dataset. This is useful for creating a grid which is not contained in a file or dataset. data is a one-dimensional, monotonic Numeric array. bounds is an array of shape (len(data),2), such that for all i, data[i] is in the range [bounds[i,0],bounds[i,1]]. If bounds is not specified, the default boundaries are generated at the midpoints between the consecutive data values, provided that the autobounds mode is `on' (the default). See setAutoBounds. |
|
|
Create an equal-area latitude axis. The latitude values range from north to south, and for all axis values x[i], sin(x[i])-sin(x[i+1]) is constant. |
|
|
Create a Gaussian latitude axis. Axis values range from north to south. |
|
|
createGaussianGrid(nlats, xorigin=0.0, order="yx") Create a Gaussian grid, with shape (nlats, 2*nlats). nlats is the number of latitudes. |
|
|
createGenericGrid(latArray, lonArray, latBounds=None, lonBounds=None, order="yx", mask=None) Create a generic grid, that is, a grid which is not typed as Gaussian, uniform, or equal-area. The grid is not associated with a file or dataset. latArray is a NumPy array of latitude values. lonArray is a NumPy array of longitude values. latBounds is a NumPy array having shape (len(latArray),2), of latitude boundaries. lonBounds is a NumPy array having shape (len(lonArray),2), of longitude boundaries. order is a string specifying the order of the axes, either "yx" for (latitude, longitude), or "xy" for the reverse. mask (optional) is an integer-valued NumPy mask array, having the same shape and ordering as the grid. |
|
|
Generate a grid for calculating the global mean via a regridding operation. The return grid is a single zone covering the range of the input grid. |
|
|
createRectGrid(lat, lon, order, type="generic", mask=None) Create a rectilinear grid, not associated with a file or dataset. This might be used as the target grid for a regridding operation. lat is a latitude axis, created by cdms.createAxis. lon is a longitude axis, created by cdms.createAxis. order is a string with value "yx" (the first grid dimension is latitude) or "xy" (the first grid dimension is longitude). type is one of 'gaussian', 'uniform', 'equalarea', or 'generic'. If specified, mask is a two-dimensional, logical Numeric array (all values are zero or one) with the same shape as the grid. |
|
|
createUniformGrid(startLat, nlat, deltaLat, startLon, nlon, deltaLon, order="yx", mask=None) Create a uniform rectilinear grid. The grid is not associated with a file or dataset. The grid boundaries are at the midpoints of the axis values. startLat is the starting latitude value. nlat is the number of latitudes. If nlat is 1, the grid latitude boundaries will be startLat +/- deltaLat/2. deltaLat is the increment between latitudes. startLon is the starting longitude value. nlon is the number of longitudes. If nlon is 1, the grid longitude boundaries will be startLon +/- deltaLon/2. deltaLon is the increment between longitudes. order is a string with value "yx" (the first grid dimension is latitude) or "xy" (the first grid dimension is longitude). If specified, mask is a two-dimensional, logical Numeric array (all values are zero or one) with the same shape as the grid. |
|
|
createUniformLatitudeAxis(startLat, nlat, deltaLat) Create a uniform latitude axis. The axis boundaries are at the midpoints of the axis values. The axis is designated as a circular latitude axis. startLat is the starting latitude value. |
|
|
Create a zonal grid. The output grid has the same latitude as the input grid, and a single longitude. This may be used to calculate zonal averages via a regridding operation. |
|
|
createUniformLongitudeAxis(startLon, nlon, deltaLon) Create a uniform longitude axis. The axis boundaries are at the midpoints of the axis values. The axis is designated as a circular longitude axis. startLon is the starting longitude value. |
|
|
createVariable(array, typecode=None, copy=0, savespace=0, mask=None, fill_value=None, grid=None, axes=None, attributes=None, id=None) This function is documented in Table 2.31. |
|
|
Return 1 if s is a variable, 0 otherwise. See also: asVariable . |
|
|
Open or create a Dataset or CdmsFile. url is a Uniform Resource Locator, referring to a cdunif or XML file. If the URL has the extension '.xml' or '.cdml', a Dataset is returned, otherwise a CdmsFile is returned. If the URL protocol is 'http', the file must be a '.xml' or '.cdml' file, and the mode must be 'r'. If the protocol is 'file' or is omitted, a local file or dataset is opened. mode is the open mode. See Table 2.22. Example: Open an existing dataset: f = cdms.open("sampleset.xml") |
|
|
order2index(axes, orderstring) Find the index permutation of axes to match order. Return a list of indices. |
|
|
If mode is 'on' (the default), the getBounds method will automatically generate boundary information for an axis or grid, if the boundaries are not explicitly defined. If mode is 'off', and no boundary data is explicitly defined, the bounds will NOT be generated; the getBounds method will return None for the boundaries. |
|
|
Set the grid classification mode. This affects how grid type is determined, for the purpose of generating grid boundaries. If mode is 'on' (the default), grid type is determined by a grid classification method, regardless of the value of grid.getType(). If mode is 'off', the value of grid.getType() determines the grid type.
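The default boundary generation described for createAxis can be pictured with a few lines of plain Python. This is a minimal sketch and not the CDMS implementation: the function name midpoint_bounds is hypothetical, and extending the two outer cells by half of the neighboring spacing is an assumption, since the text above only specifies that interior boundaries fall at the midpoints between consecutive values.

```python
def midpoint_bounds(data):
    """Return a list of (lower, upper) pairs, one per axis value.

    Interior cell edges sit halfway between consecutive values.
    The two outer edges are extrapolated by half the adjacent
    spacing (an assumption made for this sketch).
    """
    n = len(data)
    if n < 2:
        raise ValueError("need at least two axis values to infer spacing")
    # First edge: extrapolate below the first value.
    edges = [data[0] - (data[1] - data[0]) / 2.0]
    # Interior edges: midpoints of consecutive values.
    for i in range(n - 1):
        edges.append((data[i] + data[i + 1]) / 2.0)
    # Last edge: extrapolate beyond the last value.
    edges.append(data[-1] + (data[-1] - data[-2]) / 2.0)
    # Cell i spans edges[i] .. edges[i + 1], giving shape (n, 2).
    return [(edges[i], edges[i + 1]) for i in range(n)]


bounds = midpoint_bounds([0.0, 2.0, 6.0])
# bounds == [(-1.0, 1.0), (1.0, 4.0), (4.0, 8.0)]
```

Each value lies inside its own cell, which is the property required of the bounds array: data[i] is in the range [bounds[i][0], bounds[i][1]].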
A CdmsObj is the base class for all CDMS database objects. At the application level, CdmsObj objects are never created and used directly. Rather the subclasses of CdmsObj (Dataset, Variable, Axis, etc.) are the basis of user application programming.
All objects derived from CdmsObj have a special attribute .attributes. This is a Python dictionary, which contains all the external (persistent) attributes associated with the object. This is in contrast to the internal, non-persistent attributes of an object, which are built-in and predefined. When a CDMS object is written to a file, the external attributes are written, but not the internal attributes.
Example: get a list of all external attributes of obj.
extatts = obj.attributes.keys()
All attributes may be accessed and set using the Python dot notation (`.').
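The split between external attributes (kept in the .attributes dictionary) and internal attributes can be illustrated with a toy class. This is only a sketch of the idea, assuming nothing about how CDMS implements it: the class name AttributeHolder and the internal attribute shape are hypothetical.

```python
class AttributeHolder(object):
    """Toy object that keeps external attributes in a dictionary."""

    # Hypothetical internal (non-persistent) names, stored on the instance.
    _internal = ("attributes", "shape")

    def __init__(self, shape):
        object.__setattr__(self, "attributes", {})
        object.__setattr__(self, "shape", shape)

    def __setattr__(self, name, value):
        if name in self._internal:
            object.__setattr__(self, name, value)
        else:
            # Anything else is treated as an external attribute.
            self.attributes[name] = value

    def __getattr__(self, name):
        # Called only when normal lookup fails, i.e. for external attributes.
        try:
            return object.__getattribute__(self, "attributes")[name]
        except KeyError:
            raise AttributeError(name)


obj = AttributeHolder(shape=(12, 46, 72))
obj.long_name = "surface temperature"   # lands in obj.attributes
obj.units = "K"
extatts = sorted(obj.attributes.keys())
# extatts == ['long_name', 'units']; obj.shape is not in the dictionary
```

In this sketch only the dictionary contents would be candidates for writing to a file, which mirrors the statement that external attributes are written and internal attributes are not.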
An Axis is a one-dimensional coordinate object.
An Axis is contained in a Dataset. Setting a slice of an Axis writes data to the Dataset; referencing an Axis slice reads data from the Dataset. Axis objects are also used to define the domain of a Variable.
An axis in a CdmsFile may be designated the `unlimited' axis, meaning that it can be extended in length after the initial definition. There can be at most one unlimited axis associated with a CdmsFile.
For an axis in a dataset, the .partition attribute describes how an axis is split across files. It is a list of the start and end indices of each axis partition.
For example, the figure "Partitioned axis" shows a time axis representing the 36 months from January 1980 through December 1982, with December 1981 missing. The first partition interval is (0,12), the second is (12,23), and the third is (24,36), where the interval (i,j) represents all indices k such that i <= k < j. The .partition attribute for this axis would be the list:
Note that the end index of the second interval is strictly less than the start index of the following interval. This indicates that data for that period is missing.
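The interval arithmetic in this example can be checked with a short, CDMS-independent sketch. It assumes the .partition list is flat, alternating start and end indices, which is how the description "a list of the start and end indices of each axis partition" reads; the helper names partition_intervals and missing_ranges are hypothetical.

```python
def partition_intervals(partition):
    """Pair a flat [start0, end0, start1, end1, ...] list into (start, end) tuples."""
    if len(partition) % 2:
        raise ValueError("partition list must hold start/end pairs")
    return [(partition[k], partition[k + 1]) for k in range(0, len(partition), 2)]


def missing_ranges(partition):
    """Return the half-open index ranges not covered between consecutive intervals."""
    intervals = partition_intervals(partition)
    gaps = []
    for (_, end), (next_start, _) in zip(intervals, intervals[1:]):
        if end < next_start:
            gaps.append((end, next_start))
    return gaps


# The three intervals from the example: 1980, Jan-Nov 1981, and 1982.
partition = [0, 12, 12, 23, 24, 36]
intervals = partition_intervals(partition)   # [(0, 12), (12, 23), (24, 36)]
gaps = missing_ranges(partition)             # [(23, 24)]: index 23 is December 1981
```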
|
cdms.createAxis(data, bounds=None) Create an axis which is not associated with a dataset or file. See Table 2.2. |
|
Create an Axis in a Dataset. (This function is not yet implemented.) |
|
CdmsFile.createAxis(name,ar,unlimited=0) name is the string name of the Axis. ar is a 1-D data array which defines the Axis values. It may have the value None if an unlimited axis is being defined. At most one Axis in a CdmsFile may be designated as being 'unlimited', meaning that it may be extended in length. To define an axis as unlimited, either: |
|
cdms.createEqualAreaAxis(nlat) See Table 2.2. |
|
Read a slice of data from the external dataset. Data is returned in the physical ordering defined in the dataset. See Table 2.9 for a description of slice operators. |
|
|
Write a slice of data to the external dataset. (axes in CdmsFiles only) |
|
|
asComponentTime(calendar=None) Array version of cdtime tocomp. Returns a list of component times. |
|
|
Array version of cdtime torel. Returns a list of relative times. |
|
|
Return a copy of the axis, as a transient axis. If copyData is 1 (the default) the data itself is copied. |
|
|
designateCircular(modulo, persistent=0) Designate the axis to be circular. modulo is the modulus value. Any given axis value x is treated as equivalent to x+modulo. If persistent is true, the external file or dataset (if any) is modified. By default, the designation is temporary. |
|
|
designateLatitude(persistent=0): Designate the axis to be a latitude axis. If persistent is true, the external file or dataset (if any) is modified. By default, the designation is temporary. |
|
|
Designate the axis to be a vertical level axis. If persistent is true, the external file or dataset (if any) is modified. By default, the designation is temporary. |
|
|
designateLongitude(persistent=0, modulo=360.0) Designate the axis to be a longitude axis. modulo is the modulus value. Any given axis value x is treated as equivalent to x+modulo. If persistent is true, the external file or dataset (if any) is modified. By default, the designation is temporary. |
|
|
designateTime(persistent=0, calendar = cdtime.MixedCalendar) Designate the axis to be a time axis. If persistent is true, the external file or dataset (if any) is modified. By default, the designation is temporary. |
|
|
Get the associated boundary array. The boundary array has shape (n,2), where n is the length of the axis. If a boundary array is not explicitly defined and autoBounds mode is on, a default array is generated by calling genGenericBounds. Otherwise if autoBounds mode is off, the return value is None. See setAutoBounds. |
|
|
Returns the calendar associated with the (time) axis. Possible return values, as defined in the cdtime module, are:
Note: If the axis is not a time axis, the global, file-related calendar is returned. |
|
|
Returns true if the axis has circular topology. |
|
|
Same as mapIntervalExt, but returns only the tuple (i,j), and wraparound is restricted to one cycle. |
|
|
Map a coordinate interval to an index interval. interval is a tuple having one of the forms:
(x,y) where x and y are coordinates indicating the interval [x,y), and: indicator is a two or three-character string, where the first character is 'c' if the interval is closed on the left, 'o' if open, and the second character has the same meaning for the right-hand point. If present, the third character specifies how the interval should be intersected with the axis:
The default indicator is 'ccn' , that is, the interval is closed, and nodes in the interval are selected. If cycle is specified, the axis is treated as circular with the given cycle value. By default, if axis.isCircular() is true, the axis is treated as circular with a default modulus of 360.0. An interval of None or ':' returns the full index interval of the axis. |
|
|
The method returns the corresponding index interval as a 3-tuple (i,j,k), where k is the integer stride, and [i,j) is the half-open index interval i<=n<j (i>=n>j if k<0), or None if the intersection is empty. For an axis which is circular (axis.topology == 'circular'), [i,j) is interpreted as follows (where N = len(axis)): |
|
|
setCalendar(calendar, persistent=1) Set the calendar for this (time) axis. calendar is defined as in getCalendar(). If persistent is true, the external file or dataset (if any) is modified. This is the default. |
|
|
Create an axis associated with the integer range [i:j:k]. The stride k can be positive or negative. Wraparound is supported for longitude dimensions or those with a modulus attribute. |
|
|
Every kth element, starting at i, through but not including j |
|
Example: A longitude axis has values [0.0, 2.0, ..., 358.0], of length 180. Map the coordinate interval -5.0 <= x < 5.0 to index interval(s), with wraparound. The result index interval -2<=n<3 wraps around, since -2<0, and has a stride of 1. This is equivalent to the two contiguous index intervals -2<=n<0 and 0<=n<3.
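The arithmetic behind this example can be reproduced in plain Python for the special case of a uniformly spaced circular axis. This is a sketch under stated assumptions, not the mapIntervalExt algorithm: the function names are hypothetical, only the half-open 'co' case with nodes selected is handled, and the spacing delta is assumed positive.

```python
import math


def map_interval_uniform(x, y, first=0.0, delta=2.0):
    """Map the coordinate interval [x, y) to an index interval (i, j, 1).

    The axis values are first + n*delta, continued periodically, so
    negative indices denote wraparound. Returns None if no node falls
    inside the interval.
    """
    i = int(math.ceil((x - first) / delta))   # first node with value >= x
    j = int(math.ceil((y - first) / delta))   # first node with value >= y
    if i >= j:
        return None
    return (i, j, 1)


def wrapped_indices(i, j, n):
    """Translate a possibly negative index interval into physical indices."""
    return [k % n for k in range(i, j)]


interval = map_interval_uniform(-5.0, 5.0)      # (-2, 3, 1)
physical = wrapped_indices(-2, 3, 180)          # [178, 179, 0, 1, 2]
```

The physical indices 178 and 179 correspond to longitudes 356.0 and 358.0, which are equivalent to -4.0 and -2.0 under a modulus of 360.0.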
A CdmsFile is a physical file, accessible via the cdunif interface. netCDF files are accessible in read-write mode. All other formats (DRS, HDF, GrADS/GRIB, POP, QL) are accessible read-only.
As of CDMS V3, the legacy cuDataset interface is also supported by CdmsFiles. See cu Module.
|
fileobj = cdms.open(path, mode) Open the file specified by path returning a CdmsFile object. path is the file pathname, a string. mode is the open mode indicator, as listed in Table 2.22. |
|
Calling a CdmsFile object as a function reads the region of data specified by the selector. The result is a transient variable, unless raw=1 is specified. See Selectors. For example, the following reads data for variable 'prc', year 1980: |
|
|
Get the persistent variable, axis or grid object having the string identifier. This does not read the data for a variable. Example: v = f['prc'] gets the persistent variable v, equivalent to v=f.variables['prc']. Example: t = f['time'] gets the axis named 'time', equivalent to t=f.axes['time']. |
|
|
Copy axis values and attributes to a new axis in the file. The returned object is persistent: it can be used to write axis data to or read axis data from the file. If an axis already exists in the file, having the same name and coordinate values, it is returned. It is an error if an axis of the same name exists, but with different coordinate values. axis is the axis object to be copied. newname, if specified, is the string identifier of the new axis object. If not specified, the identifier of the input axis is used. |
|
|
Copy grid values and attributes to a new grid in the file. The returned grid is persistent. If a grid already exists in the file, having the same name and axes, it is returned. An error is raised if a grid of the same name exists, having different axes. grid is the grid object to be copied. newname, if specified, is the string identifier of the new grid object. If unspecified, the identifier of the input grid is used. |
|
|
createAxis(id, ar, unlimited=0) Create a new Axis. This is a persistent object which can be used to read or write axis data to the file. id is an alphanumeric string identifier, containing no blanks. ar is the one-dimensional axis array. Set unlimited to cdms.Unlimited to indicate that the axis is extensible. |
|
|
createRectGrid(id, lat, lon, order, type="generic", mask=None) Create a RectGrid in the file. This is not a persistent object: the order, type, and mask are not written to the file. However, the grid may be used for regridding operations. lat is a latitude axis in the file. lon is a longitude axis in the file. order is a string with value "yx" (the first grid dimension is latitude) or "xy" (the first grid dimension is longitude). type is one of 'gaussian', 'uniform', 'equalarea', or 'generic'. If specified, mask is a two-dimensional, logical Numeric array (all values are zero or one) with the same shape as the grid. |
|
|
createVariable(String id, String datatype,List axes, fill_value=None) Create a new Variable. This is a persistent object which can be used to read or write variable data to the file. id is a String name which is unique with respect to all other objects in the file. datatype is an MA typecode, e.g., MA.Float, MA.Int. |
|
|
createVariableCopy(var, newname=None) Create a new Variable, with the same name, axes, and attributes as the input variable. An error is raised if a variable of the same name exists in the file. var is the Variable to be copied. newname, if specified, is the name of the new variable. If unspecified, the returned variable has the same name as var. Note: Unlike copyAxis, the actual data is not copied to the new variable. |
|
|
write(var, attributes=None, axes=None, extbounds=None, id=None, extend=None, fill_value=None, index=None, typecode=None) Write a variable or array to the file. The return value is the associated file variable. If the variable does not exist in the file, it is first defined and all attributes written, then the data is written. By default, the time dimension of the variable is defined as the 'unlimited' dimension of the file. If the data is already defined, then data is extended or overwritten depending on the value of keywords extend and index, and the unlimited dimension values associated with var. var is a Variable, masked array, or Numeric array. attributes is the attribute dictionary for the variable. The default is var.attributes. axes is the list of file axes comprising the domain of the variable. The default is to copy var.getAxisList(). extbounds is the unlimited dimension bounds. Defaults to var.getAxis(0).getBounds(). id is the variable name in the file. Default is var.id. extend=1 causes the first dimension to be 'unlimited': iteratively writeable. The default is None, in which case the first dimension is extensible if it is time. Set to 0 to turn off this behaviour. fill_value is the missing value flag. index is the extended dimension index to write to. The default index is determined by lookup relative to the existing extended dimension. Note: data can also be written by setting a slice of a file variable, and attributes can be written by setting an attribute of a file variable. |
A Database is a collection of datasets and other CDMS objects. It consists of a hierarchical collection of objects, with the database being at the root, or top of the hierarchy. A database is used to:
The figure below illustrates several important points:
"variable=ua, dataset=ncep_reanalysis_mo,database=CDMS".
db = cdms.connect("ldap://dbhost.llnl.gov/database=CDMS,ou=PCMDI,o=LLNL,c=US")
The database searchFilter method searches the database. A single database connection may be used for an arbitrary number of searches.
For example, to find all observed datasets:
result = db.searchFilter("category=observed",tag="dataset")
Searches can be restricted to a subhierarchy of the database. This example searches just the dataset `ncep_reanalysis_mo':
result = db.searchFilter(relbase="dataset=ncep_reanalysis_mo")
for entry in result:
    print entry.name, entry.attributes
If the id of a dataset is known, the dataset can be opened directly with the open method:
dset = db.open("ncep_reanalysis_mo")
The searchFilter method is used to search a database. The result is called a search result, and consists of a sequence of result entries.
In its simplest form, searchFilter takes an argument consisting of a string filter. The search returns a sequence of entries, corresponding to those objects having an attribute which matches the filter. Simple filters have the form (tag = value), where value can contain wildcards. For example:
Simple filters can be combined with the logical operators `&', `|', `!'. For example,
'(&(id = bmrc*)(project = AMIP2))'
matches all objects with id starting with 'bmrc', and a 'project' attribute with value 'AMIP2'.
Formally, search filters are strings defined as follows:
filtercomp ::= "&" filterlist | # and
filterlist ::= filter | filter filterlist
"<=" | # lexicographically less than or equal to
">=" # lexicographically greater than or equal to
value ::= string attribute value, may include '*' as a wild card
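To make the filter notation concrete, the following sketch parses a filter string and evaluates it against an attribute dictionary. It is an illustration only and not the CDMS or LDAP implementation. The assumptions are: filters use the parenthesized prefix form of the example '(&(id = bmrc*)(project = AMIP2))', only the '=' comparison is handled (not '<=' or '>='), and wildcards are matched with Python's fnmatch.fnmatchcase, which also gives special meaning to '?' and '['. The names parse_filter and matches are hypothetical.

```python
from fnmatch import fnmatchcase


def parse_filter(text):
    """Parse a prefix-notation filter string into nested tuples."""
    pos = 0

    def parse():
        nonlocal pos
        if text[pos] != "(":
            raise ValueError("expected '(' at position %d" % pos)
        pos += 1
        op = text[pos]
        if op in ("&", "|"):
            # Logical and/or: one or more parenthesized sub-filters follow.
            pos += 1
            children = []
            while text[pos] == "(":
                children.append(parse())
            node = (op, children)
        elif op == "!":
            # Logical not: exactly one sub-filter follows.
            pos += 1
            node = ("!", [parse()])
        else:
            # Simple filter of the form: tag = value
            end = text.index(")", pos)
            tag, value = text[pos:end].split("=", 1)
            node = ("=", tag.strip(), value.strip())
            pos = end
        if text[pos] != ")":
            raise ValueError("expected ')' at position %d" % pos)
        pos += 1
        return node

    return parse()


def matches(node, attrs):
    """Evaluate a parsed filter against a dictionary of string attributes."""
    op = node[0]
    if op == "&":
        return all(matches(child, attrs) for child in node[1])
    if op == "|":
        return any(matches(child, attrs) for child in node[1])
    if op == "!":
        return not matches(node[1][0], attrs)
    _, tag, pattern = node
    return tag in attrs and fnmatchcase(str(attrs[tag]), pattern)


flt = parse_filter("(&(id = bmrc*)(project = AMIP2))")
hit = matches(flt, {"id": "bmrc_b", "project": "AMIP2"})    # True
miss = matches(flt, {"id": "ncep", "project": "AMIP2"})     # False
```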
Attribute names are defined in the chapter on Climate Data Markup Language (CDML). In addition, some special attributes are defined for convenience:
The set of objects searched is called the search scope. The top object in the hierarchy is the base object. By default, all objects in the database are searched, that is, the database is the base object. If the database is very large, this may result in an unnecessarily slow or inefficient search. To remedy this the search scope can be limited in several ways:
A search result is accessed sequentially within a for loop:
result = db.searchFilter('(&(category=obs*)(id=ncep*))')
for entry in result:
    print entry.name
Search results can be narrowed using searchPredicate. In the following example, the result of one search is itself searched for all variables defined on a 94x192 grid:
>>> result = db.searchFilter('parentid=ncep*',tag="variable")
>>> result2 = result.searchPredicate(lambda x: x.getGrid().shape==(94,192))
>>> for entry in result2: print entry.name
variable=rluscs,dataset=ncep_reanalysis_mo,database=CDMS,ou=PCMDI, o=LLNL, c=US
variable=rlds,dataset=ncep_reanalysis_mo,database=CDMS,ou=PCMDI, o=LLNL, c=US
variable=rlus,dataset=ncep_reanalysis_mo,database=CDMS,ou=PCMDI, o=LLNL, c=US
A search result is a sequence of result entries. Each entry has a string name, the name of the object in the database hierarchy, and an attribute dictionary. An entry corresponds to an object found by the search, but differs from the object, in that only the attributes requested are associated with the entry. In general, there will be much more information defined for the associated CDMS object, which is retrieved with the getObject method.
|
The attributes returned from the search. attributes[key] is a list of all string values associated with the key. |
In the next example, a portion of variable 'ua' is read from dataset 'ncep_reanalysis_mo':
In the following examples, db is the database opened with
This defaults to the database defined in environment variable CDMSROOT.
List all variables in dataset 'ncep_reanalysis_mo':
for entry in db.searchFilter(filter="parentid=ncep_reanalysis_mo", tag="variable"):
Find all axes with bounds defined:
for entry in db.searchFilter(filter="bounds=*",tag="axis"):
for entry in db.searchFilter(filter="Conventions=GDT*",tag="dataset"):
Find all variables with missing time values, in observed datasets:
return time.length != time.partition_length
result = db.searchFilter(filter="category=observed")
for entry in result.searchPredicate(missingTime):
Find all CMIP2 datasets having a variable with id "hfss":
for entry in db.searchFilter(filter="(&(project=CMIP2)(id=hfss))",tag="variable"):
    print entry.getObject().parent.id
Find all observed variables on 73x144 grids:
result = db.searchFilter('category=obs*')
for entry in result.searchPredicate(lambda x: x.getGrid().shape==(73,144),tag="variable"):
Find all observed variables with more than 1000 timepoints:
result = db.searchFilter('category=obs*')
for entry in result.searchPredicate(lambda x: len(x.getTime())>1000, tag="variable"):
    print entry.name, len(entry.getObject().getTime())
Find the total number of each type of object in the database:
print len(db.searchFilter(tag="database")),"database"
print len(db.searchFilter(tag="dataset")),"datasets"
print len(db.searchFilter(tag="variable")),"variables"
A Dataset is a virtual file. It consists of a metafile, in CDML/XML representation, and one or more data files.
As of CDMS V3, the legacy cuDataset interface is supported by Datasets. See cu Module.
|
datasetobj = cdms.open(String uri, String mode='r') Open the dataset specified by the Uniform Resource Identifier (URI), a CDML file. Returns a Dataset object. mode is one of the indicators listed in Table 2.22. |
|
datasetobj = cdms.createDataset(String path, String directory, String fileTemplate) (Note: this function is not yet implemented) Create a new dataset, returning a Dataset object. path is the filepath of a CDML file. fileTemplate describes how the dataset is to be partitioned. It is a pathname, relative to the directory, which contains zero or more template specifiers (see Table 2.23). A template may contain directory names as well as file names. A template specifier is a string of the form '%X' or '%eX', where X is one of the characters listed in Table 2.23. The form '%eX' may be used to specify the end time or level value. A specifier may appear more than once in a template. |
|
read-write. Open the file if it exists, otherwise create a new file |
|
For example, the file template
ccsr-a/mo/%v/ccsr-a/%v_ccsr-a_%Y.%n-%eY.%en.nc
contains the specifiers %v (variable name), %Y (year), %n (month), %eY (end year), and %en (end month). One of the files in the dataset might have the path (relative to the parent directory)
ccsr-a/mo/ta/ccsr-a/ta_ccsr-a_1979.01-1979.12.nc
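The substitution from template to path can be sketched with a regular expression. This is a hypothetical helper, not a CDMS function: expand_template and the values dictionary are invented for illustration, the pattern simply treats '%e' followed by a letter as an end specifier, and the two-digit month strings are taken from the example path rather than from a documented format.

```python
import re

# '%eX' (end specifier) is tried before the one-letter form '%X'.
SPECIFIER = re.compile(r"%(e[A-Za-z]|[A-Za-z])")


def expand_template(template, values):
    """Replace each %X or %eX specifier with the string in values."""
    return SPECIFIER.sub(lambda match: values[match.group(1)], template)


template = "ccsr-a/mo/%v/ccsr-a/%v_ccsr-a_%Y.%n-%eY.%en.nc"
values = {
    "v": "ta",      # variable name
    "Y": "1979",    # year
    "n": "01",      # month
    "eY": "1979",   # end year
    "en": "12",     # end month
}
path = expand_template(template, values)
# path == 'ccsr-a/mo/ta/ccsr-a/ta_ccsr-a_1979.01-1979.12.nc'
```

A specifier that is missing from the dictionary raises KeyError, which is a convenient way for a sketch like this to flag an incomplete set of values.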
|
datasetobj (varname, selector) Calling a Dataset object as a function reads the region of data defined by the selector. The result is a transient variable, unless raw=1 is specified. See Selectors. For example, the following reads data for variable 'prc', year 1980: |
|
|
The square bracket operator applied to a dataset gets the persistent variable, axis or grid object having the string identifier. This does not read the data for a variable. Returns None if not found. Example: v = f['prc'] gets the persistent variable v, equivalent to v=f.variables['prc']. Example: t = f['time'] gets the axis named `time', equivalent to t=f.axes['time']. |
|
|
createRectGrid(id, lat, lon, order, type="generic", mask=None) Create a RectGrid in the dataset. This is not a persistent object: the order, type, and mask are not written to the dataset. However, the grid may be used for regridding operations. lat is a latitude axis in the dataset. lon is a longitude axis in the dataset. order is a string with value "yx" (the first grid dimension is latitude) or "xy" (the first grid dimension is longitude). type is one of 'gaussian', 'uniform', 'equalarea', or 'generic'. If specified, mask is a two-dimensional, logical Numeric array (all values are zero or one) with the same shape as the grid. |
|
|
Get a sorted list of pathnames of datafiles which comprise the dataset. This does not include the XML metafile path, which is stored in the .uri attribute. |
|
The fundamental CDMS data object is the variable. A variable is comprised of:
The MV module is a work-alike replacement for the MA module, that carries along the domain and attribute information where appropriate. MV provides the same set of functions as MA. However, MV functions generate transient variables as results. Often this simplifies scripts that perform computation. MA is part of the Python Numeric package, documented at http://numpy.sourceforge.net.
MV can be imported with the command:
import MV
The command
from MV import *
allows use of MV commands without any prefix.
Table 2.25 lists the constructors in MV. All functions return a transient variable. In most cases the keywords axes, attributes, and id are available. axes is a list of axis objects which specifies the domain of the variable. attributes is a dictionary. id is a special attribute string that serves as the identifier of the variable, and should not contain blanks or non-printing characters. It is used when the variable is plotted or written to a file. Since the id is just an attribute, it can also be set like any attribute:
For completeness MV provides access to all the MA functions. The functions not listed in the following tables are identical to the corresponding MA function: allclose, allequal, common_fill_value, compress, create_mask, dot, e, fill_value, filled, get_print_limit, getmask, getmaskarray, identity, indices, innerproduct, isMA, isMaskedArray, is_mask, isarray, make_mask, make_mask_none, mask_or, masked, pi, put, putmask, rank, ravel, set_fill_value, set_print_limit, shape, size, sort. See the documentation at http://numpy.sourceforge.net for a description of these functions.
The following table describes the MV non-constructor functions. With the exception of argsort, all functions return a transient variable.
A RectGrid is a two-dimensional, horizontal, rectilinear grid. A RectGrid can be defined in terms of a pair of axes, one longitude and one latitude. A two-dimensional, logical mask array may optionally be associated with a RectGrid.
|
cdms.createRectGrid(lat, lon, order, type="generic", mask=None) Create a grid not associated with a file or dataset. See Table 2.2. |
|
CdmsFile.createRectGrid(id, lat, lon, order, type="generic", mask=None) Create a grid associated with a file. See Table 2.12. |
|
Dataset.createRectGrid(id, lat, lon, order, type="generic", mask=None) Create a grid associated with a dataset. See Table 2.24. |
|
cdms.createGaussianGrid(nlats, xorigin=0.0, order="yx") See Table 2.2. |
|
cdms.createGenericGrid(latArray, lonArray, latBounds=None, lonBounds=None, order="yx", mask=None) |
|
cdms.createRectGrid(lat, lon, order, type="generic", mask=None) |
|
cdms.createUniformGrid(startLat, nlat, deltaLat, startLon, nlon, deltaLon, order="yx", mask=None) |
A Variable is a multidimensional data object, consisting of:
A Variable which is contained in a Dataset or CdmsFile is called a persistent variable. Setting a slice of a persistent Variable writes data to the Dataset or file, and referencing a Variable slice reads data from the Dataset or file. Variables may also be transient, not associated with a Dataset or CdmsFile.
Variables support arithmetic operations. The basic Python operators +, -, *, /, and ** are supported, as are abs, sqrt, and the operations defined in the MV module. The result of an arithmetic operation is a transient variable; that is, the axis information is transferred to the result.
The methods subRegion and subSlice return transient variables. In addition, a transient variable may be created with the cdms.createVariable method. The vcs and regrid module methods take advantage of the attribute, domain, and mask information in a transient variable.
The cu module defines the original CDAT I/O interface. It is maintained for backward compatibility. As of CDMS V3, CDMS variables support the cu Slab interface defined in cu Module.
|
The name of the variable in the file or files which represent the dataset. If different from id, the variable is `aliased'. |
||
|
Read a slice of data from the file or dataset, resulting in a transient variable. Singleton dimensions are `squeezed' out. Data is returned in the physical ordering defined in the dataset. The forms of the slice operator are listed in Table 2.33. |
|
|
Write a slice of data to the external dataset. The forms of the slice operator are listed in Table 2.21. (Variables in CdmsFiles only) |
|
|
Calling a variable as a function reads the region of data defined by the selector. The result is a transient variable, unless the raw=1 keyword is specified. See Selectors. |
|
|
Write the entire data array. Equivalent to var[:] = ar. (Variables in CdmsFiles only). |
|
|
Cast the variable to a new datatype. Typecodes are as for MV, MA, and Numeric modules. |
|
|
Return a copy of a transient variable. If copyData is 1 (the default) the variable data is copied as well. If copyData is 0, the result transient variable shares the original transient variable's data array. |
|
|
crossSectionRegrid(newLevel, newLatitude, method="log", missing=None, order=None) Return a lat/level vertical cross-section regridded to a new set of latitudes newLatitude and levels newLevel. The variable should be a function of latitude, level, and (optionally) time. newLevel is an axis of the result pressure levels. newLatitude is an axis of the result latitudes. method is optional, either "log" to interpolate in the log of pressure (default), or "linear" for linear interpolation. missing is a missing data value. The default is var.getMissing(). order is an order string such as "tzy" or "zy". The default is var.getOrder(). |
|
|
Return the index of the axis specified by axis_spec. Return -1 if no match. |
|
|
getAxisList(axes=None, omit=None, order=None) Get an ordered list of axis objects in the domain of the variable. If axes is not None, only the axes it specifies are included; axes is a list of specifications as described below. Axes are returned in the order specified unless the order keyword is given. If omit is not None, the axes it specifies are omitted; omit is either an integer dimension number or a list of specifications as described below. order is an optional string determining the output order. Specifications for the axes or omit keywords are a list, each element having one of the following forms:
order can be a string containing the characters t, x, y, z, or -. If a dash ('-') is given, any elements of the result not chosen otherwise are filled in from left to right with remaining candidates. |
|
|
getAxisListIndex(axes=None, omit=None, order=None) Return a list of indices of axis objects. Arguments are as for getAxisList. |
|
|
Get the domain. Each element of the list is itself a tuple of the form (axis,start,length,true_length) where axis is an axis object, start is the start index of the domain relative to the axis object, length is the length of the axis, and true_length is the actual number of (defined) points in the domain. |
|
|
Return the associated grid, or None if the variable is not gridded. |
|
|
Get the order string of a spatio-temporal variable. The order string specifies the physical ordering of the data. It is a string of characters with length equal to the rank of the variable, indicating the order of the variable's time, level, latitude, and/or longitude axes. Each character is one of:
't': time, 'z': vertical level, 'y': latitude, 'x': longitude. Example: A variable with ordering "tzyx" is 4-dimensional, where the ordering of axes is (time, level, latitude, longitude). Note: The order string is of the form required for the order argument of a regridder function. |
|
|
Get the file paths associated with the index region specified by intervals. intervals is a list of scalars, 2-tuples representing [i,j), slices, and/or Ellipses. If no argument(s) are present, all file paths associated with the variable are returned. Returns a list of tuples of the form (path,slicetuple), where path is the path of a file, and slicetuple is itself a tuple of slices, of the same length as the rank of the variable, representing the portion of the variable in the file corresponding to intervals. |
|
|
Get the file template associated with this variable. If no template is associated with the variable, the dataset template is returned. |
|
|
The length of the first dimension of the variable. If the variable is zero-dimensional (scalar), a length of 0 is returned. |
|
|
pressureRegrid(newLevel, method="log", missing=None, order=None) Return the variable regridded to a new set of pressure levels newLevel. The variable must be a function of latitude, longitude, pressure level, and (optionally) time. newLevel is an axis of the result pressure levels. method is optional, either "log" to interpolate in the log of pressure (default), or "linear" for linear interpolation. missing is a missing data value. The default is var.getMissing(). order is an order string such as "tzyx" or "zyx". The default is var.getOrder(). |
|
|
regrid (togrid, missing=None, order=None, mask=None) Return the variable regridded to the horizontal grid togrid. missing is a Float specifying the missing data value. The default is 1.0e20. order is a string indicating the order of dimensions of the array. It has the form returned from variable.getOrder(). For example, the string "tzyx" indicates that the dimension order of array is (time, level, latitude, longitude). If unspecified, the function assumes that the last two dimensions of array match the input grid.
mask is a Numeric array, of datatype Integer or Float, consisting of ones and zeros. A value of 0 or 0.0 indicates that the corresponding data value is to be ignored for purposes of regridding. If mask is two-dimensional of the same shape as the input grid, it overrides the mask of the input grid. If the mask has more than two dimensions, it must have the same shape as array. In this case, the missing data value is also ignored. Such an n-dimensional mask is useful if the pattern of missing data varies with level (e.g., ocean data) or time. |
|
|
Set all axes of the variable. axislist is a list of axis objects. |
|
|
subRegion(*region, time=None, level=None, latitude=None, longitude=None, squeeze=0, raw=0) Read a coordinate region of data, returning a transient variable. A region is a hyperrectangle in coordinate space. region is an argument list, each item of which specifies an interval of a coordinate axis. The intervals are listed in the order of the variable axes. If trailing dimensions are omitted, all values of those dimensions are retrieved. If an axis is circular (axis.isCircular() is true) or cycle is specified (see below), then data will be read with wraparound in that dimension. Only one axis may be read with wraparound. A coordinate interval has one of the forms listed in Table 2.34. Also see axis.mapIntervalExt. The optional keyword arguments time, level, latitude, and longitude may also be used to specify the dimension for which the interval applies. This is particularly useful if the order of dimensions is not known in advance. An exception is raised if a keyword argument conflicts with a positional region argument. The optional keyword argument squeeze determines whether or not the shape of the returned array contains dimensions whose length is 1; by default this argument is 0, and such dimensions are not 'squeezed out'. The optional keyword argument raw specifies whether the return object is a variable or a masked array. By default, a transient variable is returned, having the axes and attributes corresponding to the region read. If raw=1, an MA masked array is returned, equivalent to the transient variable without the axis and attribute information. |
|
|
subSlice(*specs, time=None, level=None, latitude=None, longitude=None, squeeze=0, raw=0) Read a slice of data, returning a transient variable. This is a functional form of the slice operator [] with the squeeze option turned off. specs is an argument list, each element of which specifies a slice of the corresponding dimension. There can be zero or more positional arguments, each of the form: (a) a single integer n, meaning slice(n, n+1); (b) an instance of the slice class; (c) a tuple, which will be used as arguments to create a slice; (d) ':', which means a slice covering that entire dimension; (e) Ellipsis (...), which means to fill the slice list with ':' leaving only enough room at the end for the remaining positional arguments; (f) a Python slice object, of the form slice(i,j,k). If there are fewer slices than corresponding dimensions, all values of the trailing dimensions are read. The keyword arguments are defined as in subRegion. There must be no conflict between the positional arguments and the keywords. In (a)-(c) and (f), negative numbers are treated as offsets from the end of that dimension, as in normal Python indexing. |
|
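The positional forms accepted by subSlice can be pictured as a normalization step that turns each argument into one Python slice per dimension. The following is a rough sketch of that idea, not the cdms implementation. The function normalize_specs is hypothetical, ignores the keyword arguments, and treats the index -1 specially because slice(-1, 0) would select nothing.

```python
def normalize_specs(specs, rank):
    """Expand subSlice-style positional specs into one slice per dimension.

    Hypothetical helper illustrating forms (a)-(f); it is not part of cdms.
    """
    def to_slice(spec):
        if isinstance(spec, slice):      # (b), (f): already a slice
            return spec
        if isinstance(spec, int):        # (a): single index n -> [n, n+1)
            # slice(-1, 0) would be empty, so the last element ends at None
            return slice(spec, spec + 1 if spec != -1 else None)
        if isinstance(spec, tuple):      # (c): tuple -> slice arguments
            return slice(*spec)
        if spec == ':':                  # (d): the whole dimension
            return slice(None)
        raise TypeError('unsupported slice spec: %r' % (spec,))

    specs = list(specs)
    result = []
    for i, spec in enumerate(specs):
        if spec is Ellipsis:
            # (e): pad with full slices, leaving room for the specs that follow
            n_after = len(specs) - i - 1
            n_fill = rank - len(result) - n_after
            result.extend([slice(None)] * n_fill)
        else:
            result.append(to_slice(spec))
    # Fewer specs than dimensions: read all of the trailing dimensions
    result.extend([slice(None)] * (rank - len(result)))
    if len(result) != rank:
        raise ValueError('too many slice specs for rank %d' % rank)
    return result

print(normalize_specs((0, ':', (2, 10, 2)), 4))
print(normalize_specs((Ellipsis, -1), 3))
```

Applying the resulting slices one dimension at a time keeps every dimension in the result, which matches the statement that subSlice runs with the squeeze option turned off.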
Example: Get a region of data.
Variable ta is a function of (time, latitude, longitude). Read data corresponding to all times, latitudes -45.0 up to but not including +45.0, longitudes 0.0 through and including longitude 180.0:
data = ta.subRegion(':', (-45.0,45.0,'co'), (0.0, 180.0))
or equivalently:
data = ta.subRegion(latitude=(-45.0,45.0,'co'), longitude=(0.0, 180.0))
Read all data for March, 1980:
data = ta.subRegion(time=('1980-3','1980-4','co'))
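The interval indicator used in these examples ('co' for closed at the lower end, open at the upper end) can be illustrated on a plain list of coordinate values. The sketch below is hypothetical and is not how cdms maps intervals. It assumes low <= high, treats a missing indicator as 'cc' (inferred from the longitude example above), and ignores wraparound and cell boundaries.

```python
def select_indices(values, interval):
    """Return the indices of axis values that fall inside a coordinate interval.

    interval is (low, high) or (low, high, indicator). The two-character
    indicator marks each end as closed ('c') or open ('o'). The default
    of 'cc' is an assumption made for this sketch.
    """
    low, high = interval[0], interval[1]
    indicator = interval[2] if len(interval) > 2 else 'cc'

    def inside(x):
        above = x >= low if indicator[0] == 'c' else x > low
        below = x <= high if indicator[1] == 'c' else x < high
        return above and below

    return [i for i, x in enumerate(values) if inside(x)]

# Made-up coordinate values for illustration
latitudes = [-90.0, -45.0, 0.0, 45.0, 90.0]
longitudes = [0.0, 90.0, 180.0, 270.0]
lat_idx = select_indices(latitudes, (-45.0, 45.0, 'co'))   # 45.0 is excluded
lon_idx = select_indices(longitudes, (0.0, 180.0))         # 180.0 is included
print(lat_idx)
print(lon_idx)
```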
A selector is a specification of a region of data to be selected from a variable. For example, the statement
x = v(time='1979-1-1', level=(1000.0,100.0))
means `select the values of variable v for time '1979-1-1' and levels 1000.0 to 100.0 inclusive, setting x to the result.' Selectors are generally used to represent regions of space and time.
The form for using a selector is
result = v(s)
where v is a variable and s is the selector. An equivalent form is
result = f('varid', s)
where f is a file or dataset, and `varid' is the string ID of a variable.
A selector consists of a list of selector components . For example, the selector
time='1979-1-1', level=(1000.0,100.0)
has two components: time='1979-1-1', and level=(1000.0,100.0). This illustrates that selector components can be defined with keywords, using the form keyword=value.
Note that for the keywords time, level, latitude, and longitude, the selector can be used with any variable. If the corresponding axis is not found, the selector component is ignored. This is very useful for writing general purpose scripts. The required keyword overrides this behavior. These keywords take values that are coordinate ranges or index ranges as defined in Table 2.34.
The following keywords are available:
|
Restrict the axis with ID axisid to a value or range of values. |
See Table 2.34 |
|
|
Restrict latitude values to a value or range. Short form: lat |
See Table 2.34 |
|
|
Restrict vertical levels to a value or range. Short form: lev |
See Table 2.34 |
|
|
Restrict longitude values to a value or range. Short form: lon |
See Table 2.34 |
|
|
Return a masked array (MA.array) rather than a transient variable. |
0: return a transient variable (default); 1: return a masked array. |
|
|
0: leave singleton dimensions (default); 1: remove singleton dimensions. |
||
|
See Table 2.34 |
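The rule that a keyword component is ignored when the variable has no matching axis, unless that axis is required, can be sketched as a small filter. The helper below is hypothetical. It assumes that required names the axes which must be present, and the exception type is a choice made for this sketch only.

```python
def applicable_components(axis_ids, components, required=()):
    """Keep only the selector components that match an axis of the variable.

    axis_ids lists the variable's axes, e.g. ['time', 'latitude', 'longitude'].
    A component naming an absent axis is dropped, unless the axis is listed
    in required, in which case an error is raised. Not a cdms function.
    """
    applied = {}
    for key, value in components.items():
        if key in axis_ids:
            applied[key] = value
        elif key in required:
            raise ValueError('required axis not found: %s' % key)
    return applied

selector = {'time': ('1979-1-1', '1979-2-1'), 'level': 1000.0}
surface_axes = ['time', 'latitude', 'longitude']   # no level axis
applied = applicable_components(surface_axes, selector)
print(applied)
```

With this behavior the same selector can be reused for a surface field and an upper-air field, which is what makes the keyword form convenient in general purpose scripts.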
Another form of selector components is the positional form, where the component order corresponds to the axis order of a variable. For example:
x9 = hus(('1979-1-1','1979-2-1'),1000.0)
reads data for the range ('1979-1-1','1979-2-1') of the first axis, and coordinate value 1000.0 of the second axis. Non-keyword arguments of the form(s) listed in Table 2.34 are treated as positional. Such selectors are more concise, but not as general or flexible as the other types described in this section.
Selectors are objects in their own right. This means that a selector can be defined and reused, independent of a particular variable. Selectors are constructed using the cdms.selectors.Selector class. The constructor takes an argument list of selector components. For example:
from cdms.selectors import Selector
sel = Selector(time=('1979-1-1','1979-2-1'), level=1000.)
For convenience CDMS provides several predefined selectors, which can be used directly or can be combined into more complex selectors. The selectors time, level, latitude, longitude, and required are equivalent to their keyword counterparts. For example:
x = hus(time('1979-1-1','1979-2-1'), level(1000.))
and
x = hus(time=('1979-1-1','1979-2-1'), level=1000.)
are equivalent. Additionally, the predefined selectors latitudeslice, longitudeslice, levelslice, and timeslice take arguments (startindex, stopindex[, stride]):
from cdms import timeslice, levelslice
x = v(timeslice(0,2), levelslice(16,17))
Finally, a collection of selectors is defined in module cdutil.region :
NH=NorthernHemisphere=domain(latitude=(0.,90.))
SH=SouthernHemisphere=domain(latitude=(-90.,0.))
Tropics=domain(latitude=(-23.4,23.4))
NPZ=AZ=ArcticZone=domain(latitude=(66.6,90.))
SPZ=AAZ=AntarcticZone=domain(latitude=(-90.,-66.6))
Selectors can be combined using the & operator, or by refining them in the call:
from cdms.selectors import Selector
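The idea of combining selectors can be illustrated without CDMS. The sketch below uses a hypothetical ToySelector class, which is not cdms.selectors.Selector, to show how an & operator and call-time refinement might both produce a new selector while leaving the original reusable. Letting the right-hand component win on a clash is an assumption of this sketch.

```python
class ToySelector(object):
    """Minimal stand-in for a reusable selector (not the cdms class)."""

    def __init__(self, **components):
        self.components = dict(components)

    def __and__(self, other):
        # Combining two selectors yields a new selector holding both sets
        # of components; on a clash the right-hand selector wins here.
        merged = dict(self.components)
        merged.update(other.components)
        return ToySelector(**merged)

    def refine(self, **components):
        # Mimics adding keyword components at call time.
        return self & ToySelector(**components)

sel2 = ToySelector(time=('1979-1-1', '1979-2-1'))
sel3 = sel2 & ToySelector(level=1000.0)    # combined with &
sel4 = sel2.refine(level=1000.0)           # refined with a keyword
print(sorted(sel3.components.items()))
```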
CDMS provides a variety of ways to select or slice data. In the following examples, variable `hus' is contained in file sample.nc, and is a function of (time, level, latitude, longitude). Time values are monthly starting at 1979-1-1. There are 17 levels, the last level being 1000.0. The name of the vertical level axis is `plev'. All the examples select the first two times and the last level. The last two examples remove the singleton level dimension from the result array.
x = hus(time=('1979-1-1','1979-2-1'), level=1000.)
# Interval indicator (see mapIntervalExt)
x = hus(time=('1979-1-1','1979-3-1','co'), level=1000.)
x = hus(time=('1979-1-1','1979-2-1'), plev=1000.)
x9 = hus(('1979-1-1','1979-2-1'),1000.0)
x = hus(time('1979-1-1','1979-2-1'), level(1000.))
from cdms import timeslice, levelslice
x = hus(timeslice(0,2), levelslice(16,17))
x = f('hus', time=('1979-1-1','1979-2-1'), level=1000.)
x = hus(time=slice(0,2), level=slice(16,17))
from cdms.selectors import Selector
sel = Selector(time=('1979-1-1','1979-2-1'), level=1000.)
sel2 = Selector(time=('1979-1-1','1979-2-1'))
# Squeeze singleton dimension (level)
x = hus(time=('1979-1-1','1979-2-1'), level=1000., squeeze=1)
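The effect of the squeeze keyword on the result can be seen by computing shapes alone. The following sketch is hypothetical and works on plain tuples rather than CDMS variables. Only the 17 levels come from the description above; the other dimension lengths are made up for illustration.

```python
def result_shape(shape, slices, squeeze=0):
    """Shape obtained by applying one slice per dimension to an array shape.

    Hypothetical helper: with squeeze=1, dimensions of length 1 are
    dropped, mirroring the squeeze keyword.
    """
    lengths = [len(range(*s.indices(n))) for s, n in zip(slices, shape)]
    if squeeze:
        lengths = [n for n in lengths if n != 1]
    return tuple(lengths)

# hus is a function of (time, level, latitude, longitude)
shape = (24, 17, 46, 72)
first_two_times_last_level = [slice(0, 2), slice(16, 17),
                              slice(None), slice(None)]
kept = result_shape(shape, first_two_times_last_level)
squeezed = result_shape(shape, first_two_times_last_level, squeeze=1)
print(kept)
print(squeezed)
```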
In this example, two datasets are opened, containing surface air temperature (`tas') and upper-air temperature (`ta') respectively. Surface air temperature is a function of (time, latitude, longitude). Upper-air temperature is a function of (time, level, latitude, longitude). Time is assumed to have a relative representation in the datasets (e.g., with units "months since basetime").
Data is extracted from both datasets for January of the first input year through December of the second input year. For each time and level, three quantities are calculated: slope, variance, and correlation. The results are written to a netCDF file. For brevity, the functions corrCoefSlope and removeSeasonalCycle are omitted.
# Calculate variance, slope, and correlation of
# surface air temperature with upper air temperature
# by level, and save to a netCDF file. 'pathTa' is the location of
# the file containing ta, 'pathTas' is the file which contains tas.
# Data is extracted from January of year1 through December of year2.
import cdms, MV

def ccSlopeVarianceBySeasonFiltNet(pathTa,pathTas,month1,month2):
    # Open the files for ta and tas
    fta = cdms.open(pathTa)
    ftas = cdms.open(pathTas)
    # Get the upper air temperature variable
    taObj = fta['ta']
    # Get the surface temperature for the closed interval [month1,month2]
    tas = ftas('tas', time=(month1,month2,'cc'))
    # Allocate result arrays on the non-time axes (level, latitude, longitude)
    newaxes = taObj.getAxisList(omit='time')
    newshape = tuple([len(a) for a in newaxes])
    cc = MV.zeros(newshape, typecode=MV.Float, axes=newaxes, id='correlation')
    b = MV.zeros(newshape, typecode=MV.Float, axes=newaxes, id='slope')
    v = MV.zeros(newshape, typecode=MV.Float, axes=newaxes, id='variance')
    # Remove seasonal cycle from surface air temperature
    tas = removeSeasonalCycle(tas)
    # For each level of air temperature, remove seasonal cycle
    # from upper air temperature, and calculate statistics
    for ilev in range(newshape[0]):
        ta = taObj.subRegion(time=(month1,month2,'cc'), \
                             level=slice(ilev, ilev+1), squeeze=1)
        ta = removeSeasonalCycle(ta)
        cc[ilev], b[ilev] = corrCoefSlope(tas ,ta)
        v[ilev] = MV.sum( ta**2 )/(1.0*ta.shape[0])
    # Write slope, correlation, and variance variables
    f = cdms.open('CC_B_V_ALL.nc','w')
    f.write(b)
    f.write(cc)
    f.write(v)
    f.close()

pathTa = '/pcmdi/cdms/sample/ccmSample_ta.xml'
pathTas = '/pcmdi/cdms/sample/ccmSample_tas.xml'
ccSlopeVarianceBySeasonFiltNet(pathTa,pathTas,'80-1','81-12')
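Because corrCoefSlope and removeSeasonalCycle are omitted from the listing, the sketch below shows one plausible way such helpers could work on a single time series held in a plain Python list. The names remove_seasonal_cycle and corr_coef_slope are hypothetical, the data is made up, and regressing the second series on the first is an assumption. The actual functions may differ.

```python
import math

def remove_seasonal_cycle(series, period=12):
    """Subtract each calendar month's mean from a monthly series.

    Hypothetical stand-in for removeSeasonalCycle; assumes the series
    covers at least one full period.
    """
    means = [sum(series[m::period]) / float(len(series[m::period]))
             for m in range(period)]
    return [x - means[i % period] for i, x in enumerate(series)]

def corr_coef_slope(x, y):
    """Correlation coefficient and least-squares slope of y regressed on x.

    Hypothetical stand-in for corrCoefSlope; both series are assumed to
    be anomalies (zero mean), so no means are subtracted here.
    """
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    sxy = sum(a * b for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy), sxy / sxx

# Two years of made-up monthly data sharing a seasonal cycle
cycle = [float(m) for m in range(12)]
wiggle = [1.0 if m % 2 == 0 else -0.5 for m in range(12)]
tas = [c + w for c, w in zip(cycle, wiggle)] + \
      [c - w for c, w in zip(cycle, wiggle)]
ta = [10.0 + 2.0 * c + 3.0 * w for c, w in zip(cycle, wiggle)] + \
     [10.0 + 2.0 * c - 3.0 * w for c, w in zip(cycle, wiggle)]

tas_anom = remove_seasonal_cycle(tas)
ta_anom = remove_seasonal_cycle(ta)
cc, b = corr_coef_slope(tas_anom, ta_anom)
# Variance of the anomalies, in the same form as the listing above
v = sum(a * a for a in ta_anom) / (1.0 * len(ta_anom))
print(cc)
print(b)
print(v)
```

Once the seasonal cycle is removed the anomalies have zero mean, which is why the variance in the listing can be written as the sum of squares divided by the number of time points.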
In the next example, the pointwise variance of a variable over time is calculated, for all times in a dataset. The name of the dataset and variable are entered, then the variance is calculated and plotted via the vcs module.
# Calculates gridpoint total variance
# Wait for return in an interactive window
print 'Hit return to continue: ',
# Calculate pointwise variance of variable over time
# Returns the variance and the number of points
# for which the data is defined, for each grid point
# Check that the first axis is a time axis
raise 'First axis is not time, variable:', x.id
variance = (n*sumxx - (sumx * sumx))/(n * (n-1.))
print 'Enter dataset path [/pcmdi/cdms/obs/erbs_mo.xml]: ',
path = string.strip(sys.stdin.readline())
if path=='': path='/pcmdi/cdms/obs/erbs_mo.xml'
# Select a variable from the dataset
print 'Variables in file:',path
varnames = dataset.variables.keys()
var = dataset.variables[varname]
print '%-10s: %s'%(varname,long_name)
varname = string.strip(sys.stdin.readline())
# Calculate variance, count, and set attributes
variance.id = 'variance_%s'%var.id
variance.units = '(%s)^2'%var.units
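The variance expression in the script, (n*sumxx - sumx*sumx)/(n*(n-1)), is the usual sample variance written in terms of running sums. The following is a small check of that identity for a single series in plain Python. The function variance_from_sums is hypothetical and, unlike the script, does not account for missing data at individual grid points.

```python
import statistics

def variance_from_sums(values):
    """Sample variance computed from the sum and the sum of squares."""
    n = float(len(values))
    sumx = sum(values)
    sumxx = sum(v * v for v in values)
    return (n * sumxx - sumx * sumx) / (n * (n - 1.0))

# Made-up series for illustration
series = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
from_sums = variance_from_sums(series)
reference = statistics.variance(series)
print(from_sums)
print(reference)
```

The running-sum form needs only one pass over the time axis, although it can lose precision when the mean is large compared with the spread of the data.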
The result of running this script is as follows:
Enter dataset path [/pcmdi/cdms/sample/obs/erbs_mo.xml]:
Variables in file: /pcmdi/cdms/sample/obs/erbs_mo.xml
albtcs : Albedo TOA clear sky [%]
rlcrft : LW Cloud Radiation Forcing TOA [W/m^2]
rlut : LW radiation TOA (OLR) [W/m^2]
rlutcs : LW radiation upward TOA clear sky [W/m^2]
rscrft : SW Cloud Radiation Forcing TOA [W/m^2]
rsdt : SW radiation downward TOA [W/m^2]
rsut : SW radiation upward TOA [W/m^2]
rsutcs : SW radiation upward TOA clear sky [W/m^2]
<The number of points is plotted>