Common Format Features
Certain format specifications are shared between the bottle and CTD WHP-exchange files. All WHP-exchange text files MUST be UTF-8 encoded. Unix style line endings, LINE FEED (U+000A), SHOULD be used. DOS line endings, CARRIAGE RETURN (U+000D) followed by LINE FEED (U+000A), MAY be used. Other line endings SHOULD NOT be used.
UTF-8 was chosen as the encoding for WHP-exchange files because it is backwards compatible with ASCII: valid ASCII files are also valid UTF-8 files. UTF-8 also allows the full range of Unicode code points, so non-ASCII text can be represented. Non-ASCII text should only be encountered in the comment lines of an exchange file.
Be careful when editing or creating files on Windows, as the default text encoding may be UTF-16. UTF-16 is not compatible with UTF-8 or ASCII.
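The encoding and line-ending rules above can be sketched as a small decoding step. This is only an illustration, not part of the specification; the function name `decode_exchange` is hypothetical.

```python
def decode_exchange(raw: bytes) -> list[str]:
    """Decode raw WHP-exchange bytes into a list of lines (hypothetical helper)."""
    # Strict decoding raises UnicodeDecodeError for bytes that are not valid
    # UTF-8, for example a UTF-16 file that starts with a byte order mark.
    text = raw.decode("utf-8")
    # Accept DOS line endings (CR LF) by normalizing them to LF.
    text = text.replace("\r\n", "\n")
    lines = text.split("\n")
    # Drop the empty string left behind by a final line terminator.
    if lines and lines[-1] == "":
        lines.pop()
    return lines
```

Decoding strictly, rather than with `errors="replace"`, makes a wrongly encoded file fail early instead of producing silently corrupted text.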
For both CTD and bottle files, the first rows must be the elements described in the following sections, in the order presented.
File Identification Stamp
The first line of a WHP-exchange file contains the file identifier and a creation stamp separated by a COMMA (U+002C).
The file identifier will be either BOTTLE in the case of water samples or CTD in the case of a CTD profile.
The creation stamp contains information on when the file was created and who created it.
A bottle file identifier will look like:
BOTTLE,20140716CCHSIOSCD
A CTD file identifier will look like:
CTD,20140716CCHSIOSCD
If the first line of a WHP-exchange file does not start with BOTTLE or CTD, an attempt to read the rest of the file will likely fail.
When writing a WHP-exchange format reader, always check if this identification stamp is present and has a valid value.
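A reader might perform that check along these lines. This is a minimal sketch; `split_identification_stamp` and `VALID_IDENTIFIERS` are hypothetical names, not part of any CCHDO library.

```python
VALID_IDENTIFIERS = ("BOTTLE", "CTD")


def split_identification_stamp(first_line: str) -> tuple[str, str]:
    """Return (file identifier, creation stamp) from the first line of a file."""
    # The identifier and creation stamp are separated by a single comma.
    identifier, comma, stamp = first_line.strip().partition(",")
    if not comma:
        raise ValueError("no comma in file identification stamp")
    if identifier not in VALID_IDENTIFIERS:
        raise ValueError(f"unknown file identifier: {identifier!r}")
    return identifier, stamp
```

Raising an error here, before any other line is read, avoids trying to interpret a file that is not in WHP-exchange format.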
The creation stamp contains the following information, shown here for the example stamp 20140716CCHSIOSCD:
- 20140716: A date stamp in the form of YYYYMMDD (ISO 8601).
- CCH: The division (or group) of the institution that wrote the file, typically three characters. The CCHDO uses CCH as the division.
- SIO: The institution that the group is associated with, typically three characters. The CCHDO is located at the Scripps Institution of Oceanography, thus SIO is used.
- SCD: The initials of the person who wrote the file, typically three characters. Use only code points U+0041 to U+005A (A to Z) for the initials. In this example, SCD.
Do not rely on the creation stamp being the same length in every WHP-exchange file. While all the same elements will be present, their lengths may vary.
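Because only the date portion has a fixed width, a cautious reader can parse the date and keep the remainder as a single string. The sketch below assumes that approach; `parse_creation_stamp` is a hypothetical helper.

```python
from datetime import date, datetime


def parse_creation_stamp(stamp: str) -> tuple[date, str]:
    """Split a creation stamp into its date and the remaining author tag."""
    if len(stamp) < 8:
        raise ValueError("creation stamp is too short to hold a YYYYMMDD date")
    # strptime raises ValueError if the first eight characters are not a date.
    created = datetime.strptime(stamp[:8], "%Y%m%d").date()
    # Division, institution, and initials follow. Their lengths may vary,
    # so they are returned together rather than sliced at fixed offsets.
    return created, stamp[8:]
```

For the example stamp, this gives the date 2014-07-16 and the tag `CCHSIOSCD`.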
Optional Comment Lines
After the File Identification Stamp, any number of comment lines, including none, may appear.
Comment lines start with a NUMBER SIGN (U+0023), #.
Comment lines typically contain information about the file history and will often contain data citation information.
# This is one line of comments
# An additional line of comments
An example of the beginning of a file, including the File Identification Stamp:
BOTTLE,20140716CCHSIOSCD
# This is a comment line
# BOTTLE,20130215CCHSIOSCD
Notice that an older File Identification Stamp is in a comment line. This is a convention often used by the CCHDO to record when changes were made to files.
Comments may contain UTF-8 encoded code points above U+007F, especially in proper names that may be present with data citation information. If writing your own WHP-exchange reader, ensure that it can handle code points above U+007F or have it skip comment lines without trying to read them.
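One way to set comment lines aside without interpreting them is sketched below. The function name `split_comments` is hypothetical, and `lines` is assumed to hold every line after the File Identification Stamp, already decoded as UTF-8.

```python
def split_comments(lines: list[str]) -> tuple[list[str], list[str]]:
    """Separate the leading comment lines from the rest of the file."""
    comments = []
    index = 0
    # Comment lines start with NUMBER SIGN (U+0023). They are kept verbatim,
    # so code points above U+007F pass through untouched.
    while index < len(lines) and lines[index].startswith("#"):
        comments.append(lines[index])
        index += 1
    return comments, lines[index:]
```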
Parameter and Unit Lines
There are additional headers specific to CTD WHP-exchange files. See the Additional CTD Headers section for details on these additional headers.
After any format-specific headers, the parameter and unit lines are next. The parameter names are first, units are second.
Parameter names are COMMA (U+002C) separated values that define the columns the exchange file will contain.
The names must be unique, capitalized, contain no empty fields, and not end with a trailing comma.
The parameter names must contain only code points in the range U+0021 to U+007E, except a COMMA (U+002C).
A trailing comma, or a comma that occurs at the end of the line with nothing else after it, MUST NOT be included on the parameter line.
Certain parameter names, or parameter combinations, are required to be present.
See the respective sections on Required Bottle Parameters and CTD required headers for information specific to each format.
The unit line contains information for the units of each parameter listed in the parameter line.
The unit line, like the parameter line, consists of comma separated values.
Like the parameter names, units must contain only code points in the range U+0021 to U+007E, except a COMMA (U+002C).
A trailing comma MUST NOT be included in the unit line.
Units may contain empty fields if the parameter has no units.
Units for a parameter must be in the same column as that parameter; essentially, the same number of commas occur before the parameter name and its unit.
Parameter names and units MUST NOT contain commas as part of the name or unit. Commas are reserved for separating the names, units, and data into columns.
The parameter and unit lines of a CTD file might look like this:
CTDPRS,CTDPRS_FLAG_W,CTDTMP,CTDSAL,CTDOXY
DBAR,,ITS-90,PSS-78,UMOL/KG
Note the presence of a quality flag column (suffixed with _FLAG_W), which has no units. This is denoted by two commas next to each other in the unit line.
For more information on quality flags, see the Quality Codes section.
Whitespace MUST have no meaning in the exchange format, so it may be included for purely aesthetic reasons.
The parameter and unit lines could very easily have looked like:
CTDPRS, CTDPRS_FLAG_W, CTDTMP, CTDSAL, CTDOXY
DBAR, , ITS-90, PSS-78, UMOL/KG
Some technical details on formatting the whitespace:
While not strictly required, parameter, unit, and data lines may contain whitespace matching the length of the print format of the parameter. This is a convention followed by the CCHDO to ease reading of files by humans. Quality flag columns usually have a 1 character width, which will often cause the parameter, unit, and data lines to not be aligned into pretty columns.
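The rules for the parameter and unit lines can be checked together while pairing each name with its unit. The following is a sketch under the rules stated in this section, not a reference implementation; `parse_header_lines` is a hypothetical name.

```python
def parse_header_lines(param_line: str, unit_line: str) -> list[tuple[str, str]]:
    """Pair each parameter name with its unit, checking the rules above."""
    # Whitespace has no meaning, so strip it from every field.
    names = [field.strip() for field in param_line.split(",")]
    units = [field.strip() for field in unit_line.split(",")]
    if "" in names:
        # This also catches a trailing comma on the parameter line.
        raise ValueError("empty parameter name")
    if any(name != name.upper() for name in names):
        raise ValueError("parameter names must be capitalized")
    if len(set(names)) != len(names):
        raise ValueError("duplicate parameter name")
    if len(units) != len(names):
        # A trailing comma on the unit line shows up as an extra column.
        raise ValueError("parameter and unit lines differ in column count")
    for field in names + units:
        if any(not ("\u0021" <= ch <= "\u007e") for ch in field):
            raise ValueError(f"character outside U+0021 to U+007E in {field!r}")
    return list(zip(names, units))
```

An empty unit, such as the one for a quality flag column, is returned as an empty string.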
The data lines occur directly after the unit line.
Each line of data contains COMMA (U+002C) separated values of related data.
Each data point of the data line may contain any combination of characters from U+0020 to U+007F, except a COMMA (U+002C).
Like the Parameter and Unit Lines, a trailing comma MUST NOT be included at the end of each line.
Data points for each parameter of the Parameter and Unit Lines must be in the same column as that parameter, i.e. the same number of commas occur before the parameter label and the datum.
Numeric data which occurs on the data lines MUST only contain numbers, spaces, an optional decimal marker, and an optional negative sign.
All whitespace within data lines has no semantic meaning.
Integers may be represented as bare numerals with no decimal marker.
All real numeric data (i.e. data that are real numbers) MUST be decimal and MUST represent their decimal mark using a FULL STOP (U+002E).
For both negative real numbers and integers, prepend a HYPHEN-MINUS (U+002D) - to the numeric portion. Positive real numbers MUST NOT be prefixed by a PLUS SIGN (U+002B).
The validity of each datum is determined by the parameter column in which it occurs.
For example, the EXPOCODE column may contain any combination of letters, numbers, or symbols (except a comma).
A CTDPRS column may only contain real decimal numbers (U+0030 to U+0039) using a FULL STOP (U+002E) . as the decimal mark.
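The numeric rules above can be expressed as a regular expression. This is a sketch: the exact pattern, the name `is_numeric_datum`, and the choice to ignore only the whitespace surrounding a value are assumptions of the example.

```python
import re

# Optional HYPHEN-MINUS, then ASCII digits with an optional FULL STOP.
# No PLUS SIGN and no exponent notation are accepted.
NUMERIC = re.compile(r"-?(?:[0-9]+\.?[0-9]*|\.[0-9]+)")


def is_numeric_datum(field: str) -> bool:
    """Return True if a data field is a valid integer or decimal number."""
    # Whitespace has no meaning, so padding around the value is ignored.
    return NUMERIC.fullmatch(field.strip()) is not None
```

A reader would apply a check like this only to columns whose parameter is numeric, since columns such as EXPOCODE hold free text.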
Parameters may have a different precision depending on how the measurement was made. The CCHDO maintains a list of parameter names which includes precisions for historic reasons. Previous versions of the Exchange format specification stated the CCHDO would pad “meaningless” zeros to the end of any data without enough precision. Newer software allows the CCHDO to keep the precision as reported, both less and more precise. For these and other reasons, a mix of precisions may occur in a column of data.
Always report the precision as measured.
The exchange format currently has no support for quoted strings within the parameter, unit, and data lines. This means it is not possible for any meaningful whitespace to be included.
After all data lines, the end of the data is indicated by a line containing only END_DATA.
Here is a short example of what exchange data might look like:
2.0,2, 19.1840, 34.6935, 220.8
4.0,2, 19.1992, 34.6924, 220.7
6.0,2, 19.2002, 34.6922, 220.5
8.0,2, 19.2022, 34.6920, 220.5
END_DATA
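Reading data lines up to the END_DATA marker might be sketched as follows. The name `read_data_rows` is hypothetical, and `lines` is assumed to start at the first line after the unit line.

```python
def read_data_rows(lines: list[str]) -> list[list[str]]:
    """Collect data rows up to, but not including, the END_DATA line."""
    rows = []
    for line in lines:
        if line.strip() == "END_DATA":
            return rows
        # Whitespace has no meaning, so strip it from every value.
        rows.append([value.strip() for value in line.split(",")])
    # Reaching this point means the file ended without an END_DATA line.
    raise ValueError("END_DATA line not found")
```

Values are returned as strings so that the precision reported in the file is preserved. Conversion to numbers can happen later, per column.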
Post Data Content
After the END_DATA line, any additional content may be included without format restriction.
Additional content after END_DATA MUST continue to be UTF-8 encoded.