

1.10 Limitations of NetCDF

The netCDF data model is widely applicable to data that can be organized into a collection of named array variables with named attributes, but there are some important limitations to the model and its implementation in software.

Currently, netCDF offers a limited number of external numeric data types: 8-, 16-, or 32-bit integers, and 32- or 64-bit floating-point numbers. This limited set of sizes may use file space inefficiently compared to packing data in bit fields. For example, arrays of 9-bit values must be stored in 16-bit short integers. Storing arrays of 1- or 2-bit values in 8-bit values is even less efficient.
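The space cost described above can be estimated with a little arithmetic. The following is a minimal sketch in Python, not part of the netCDF library; the function name storage_bytes and the choice of one million values are illustrative assumptions. It compares the bytes used when each value occupies the smallest integer width listed above against an ideal bit-packed layout.

```python
def storage_bytes(n_values, bits_per_value):
    """Compare container storage with ideal bit packing (illustrative only)."""
    # Integer widths, in bits, taken from the paragraph above.
    widths = (8, 16, 32)
    # Smallest listed width that can hold one value.
    container = next(w for w in widths if w >= bits_per_value)
    stored = n_values * container // 8
    # Ideal packing, rounded up to a whole number of bytes.
    packed = (n_values * bits_per_value + 7) // 8
    return {
        "container_bits": container,
        "stored_bytes": stored,
        "packed_bytes": packed,
        "efficiency": packed / stored,
    }


if __name__ == "__main__":
    for bits in (9, 2, 1):
        print(bits, storage_bytes(1_000_000, bits))
```

Under these assumptions, 9-bit values use a little over half of the space they occupy, and 1-bit values use one eighth of it.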

With the classic netCDF file format, there are constraints that limit how a dataset is structured to store more than 2 GiBytes of data in a single netCDF dataset (see NetCDF Classic Format Limitations). A GiByte is 2^30 or 1,073,741,824 bytes, as compared to a Gbyte, which is 1,000,000,000 bytes, so 2 GiBytes is 2^31 or 2,147,483,648 bytes. This limitation is a result of the 32-bit fields used for storing relative offsets within a classic netCDF format file. Since one of the goals of netCDF is portable data and some computing platforms still can't deal with files larger than 2 GiB, it is best to keep files that must be portable below this limit. Nevertheless, it is possible to create and access netCDF files larger than 2 GiB on platforms that provide support for such files (see Large File Support).
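The link between 32-bit offsets and the 2 GiB boundary can be checked directly. The sketch below is illustrative only: it assumes the offset is encoded as a signed, big-endian 32-bit integer, and the helper name fits_in_32bit_offset is hypothetical rather than anything from the netCDF library.

```python
import struct

GIB = 2**30  # one GiByte, 1,073,741,824 bytes


def fits_in_32bit_offset(offset):
    """Return True if a byte offset fits in a signed 32-bit field (assumed encoding)."""
    try:
        # ">i" is a big-endian signed 32-bit integer in the struct module.
        struct.pack(">i", offset)
    except struct.error:
        return False
    # A negative value would pack, but is not a valid file offset.
    return offset >= 0


if __name__ == "__main__":
    print(fits_in_32bit_offset(2 * GIB - 1))  # last offset below 2 GiB
    print(fits_in_32bit_offset(2 * GIB))      # first offset at 2 GiB
```

Under that assumption, the largest representable offset is 2^31 - 1, one byte short of 2 GiB.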

The new 64-bit offset format (introduced in version 3.6.0) allows large files, and makes it easy to create fixed variables of about 4 GiB, and record variables of about 4 GiB per record (see NetCDF 64 bit Offset Format Limitations). However, old netCDF applications will not be able to read the 64-bit offset files until they are upgraded to at least version 3.6.0 of netCDF.

Another limitation of the classic (and 64-bit offset) model is that only one unlimited (changeable) dimension is permitted for each netCDF dataset. Multiple variables can share an unlimited dimension, but then they must all grow together. Hence the netCDF model does not permit variables with several unlimited dimensions or the use of multiple unlimited dimensions in different variables within the same dataset. Variables that have non-rectangular shapes (for example, ragged arrays) cannot be represented conveniently.

The extent to which data can be completely self-describing is limited: there is always some assumed context without which sharing and archiving data would be impractical. NetCDF permits storing meaningful names for variables, dimensions, and attributes; units of measure in a form that can be used in computations; text strings for attribute values that apply to an entire data set; and simple kinds of coordinate system information. But for more complex kinds of metadata (for example, the information necessary to provide accurate georeferencing of data on unusual grids or from satellite images), it is often necessary to develop conventions.

Specific additions to the netCDF data model might make some of these conventions unnecessary or allow some forms of metadata to be represented in a uniform and compact way. For example, adding explicit georeferencing to the netCDF data model would simplify elaborate georeferencing conventions at the cost of complicating the model. The problem is finding an appropriate trade-off between the richness of the model and its generality (i.e., its ability to encompass many kinds of data). A data model tailored to capture the shared context among researchers within one discipline may not be appropriate for sharing or combining data from multiple disciplines.

The classic netCDF data model does not support nested data structures such as trees, nested arrays, or other recursive structures. (This limitation also applies to 64-bit offset files.) Through use of indirection and conventions it is possible to represent some kinds of nested structures, but the result may fall short of the netCDF goal of self-describing data.
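As one illustration of indirection, a ragged array (rows of differing length) can be stored as two flat one-dimensional arrays: one holding all values end to end, and one holding the length of each row. The sketch below uses plain Python lists to stand in for netCDF variables; the names flatten_ragged and read_row are hypothetical, and the layout is one possible convention, not something defined by the netCDF data model itself.

```python
def flatten_ragged(rows):
    """Store rows of differing length as a flat value list plus per-row counts."""
    values, counts = [], []
    for row in rows:
        values.extend(row)
        counts.append(len(row))
    return values, counts


def read_row(values, counts, i):
    """Recover row i by summing the counts of the rows before it."""
    start = sum(counts[:i])
    return values[start:start + counts[i]]


if __name__ == "__main__":
    values, counts = flatten_ragged([[1, 2, 3], [], [4, 5]])
    print(values, counts)
    print(read_row(values, counts, 2))
```

A reader must know the convention to reassemble the rows, since nothing in the two flat arrays says that one indexes the other. This is the sense in which such encodings can fall short of self-describing data.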

Finally, concurrent access to a netCDF dataset is limited. One writer and multiple readers may access data in a single dataset simultaneously, but there is no support for multiple concurrent writers.