Decoding and Encoding of BUFR Messages At The UK Met Office

DataManagement Technical Note Number 1

Author: C Long

$Revision: 1.4 $
$Date: 2000/11/15 16:09:02 $

     INTRODUCTION



1    TABLES



1.1  Table B

1.2  Table D

1.3  Local tables B & D

1.4  Code figures and flags

1.5  Descriptor representation



2    DECODING



2.1  BUFR message structure and decoding strategy

2.2  Replication

2.3  Basic BUFR operations and structure of decode

2.4  Bit manipulation to construct values

2.5  Output and display 

2.6  Coordinates and instrumentation

2.7  Increments



3    ENCODING



3.1  Compression

3.2  Setting up descriptor sequences

3.3  Preparation of values to be encoded

3.4  Run-length encoding



4    QUALITY OPERATIONS



4.1  Bit maps

4.2  Bit maps and operators

4.3  Assumptions made to clarify specification

4.4  Programming strategy

4.5  Use of decode output in application programs

4.6  A comparable UK Met Office extension



5    TO SET UP A BUFR SYSTEM



5.1  Table access

5.2  Programs to handle messages

5.3  Calls to encode & decode



Crown Copyright 1990, 1993, 1995



Meteorological Office,

London Road, BRACKNELL, RG12 2SZ



Note:       This paper has not been published.  Permission to quote from it should be obtained from the Director of Met Office User Services.

Introduction

BUFR is a Binary Universal Form for Representing data.

BUFR is universal in that it contains a description of the data as well as the values. The description gives a list of the elements whose values follow. It does this in a coded form that requires a set of tables to interpret it. BUFR was developed for meteorological data, but can transmit whatever elements have table entries.

BUFR is binary in that values are not confined to some number of decimal digits, as with a character-based code, or to a machine-dependent word-length, but coded in a number of bits given by one of the above tables, which can be changed if necessary by appropriate operations.

The simplest BUFR message consists of a number of descriptors followed by the values of the corresponding elements. But not all BUFR descriptors correspond to elements: some descriptors represent operations to change the way a value is coded (as above), others make the description more concise by repeating descriptors or getting sequences of descriptors from a table rather than including them in the message.

So the most essential component of a BUFR system is the table of elements, Table B. Less essential, in that messages can be made without them, are the table of sequences (Table D) and the set of possible operations.

For each element the entry in Table B gives a name, the SI units, the number of bits in which to code a value, a scale factor (a power of 10 by which the value is multiplied), and a reference value to be subtracted from the scaled value to leave a non-negative number to be encoded.

The operations enable the number of bits, the scale and the reference value to be changed. They also make it possible to add quality control flags, values and differences, to skip fields and so on - the list may be further extended.

Space taken up by the description can be saved by replication and use of Table D sequences. Space in the data section can be saved by compression if several similar sets of values are coded together, by expressing the set of values of each element as a minimum, an increment width and a set of increments (in a reduced number of bits) to be added to that minimum.

The sequence of descriptors can arrange data in ways not covered by existing code forms. Space and time (coordinate) elements locate the values that follow them, space and time increments can be defined, so time sequences of regularly occurring values can be encoded. Run-length encoding is provided for images.

The sections which follow describe how the tables have been set up and the various operations encoded at the UK Met Office. These notes are to be read in conjunction with the section on FM 94 BUFR in the WMO Manual on Codes, and concentrate on points which could cause confusion.

1 TABLES

1.1 Table B

The table of elements, Table B, is fundamental to both encoding and decoding in BUFR, whereas the other tables are not always necessary.

Table B has 64 element classes, each with room for 256 entries for elements in the class. A class contains e.g. temperatures, or various humidity elements, or year, month, day... second. No information is conveyed by the choice of class for an element (but see 2.6: the distinction between coordinate classes and others is important): it just groups related or similar elements together to show at a glance what entries already exist in that field.

An entry consists of: descriptor, name, units, scale factor, reference value and number of bits used to encode a value (or rather to encode value*(10^scale)-refval). In the format defined in the WMO Manual on Codes for exchange of Table B each entry takes 95 characters, though this allows only 40 characters for the name, which is elsewhere defined as up to 64 characters. In this form a full Table B would take up more than a megabyte, which may not be a practical proposition, so we use a more compact machinable form to make the information accessible more efficiently.
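The relation value*(10^scale)-refval can be illustrated with a minimal Python sketch. The function names to_coded and from_coded are hypothetical and not part of the Met Office software; the figures in the usage note below are illustrative only.

```python
def to_coded(value, scale, refval):
    """Integer stored in the message: value*(10^scale) - refval."""
    # Divide for negative scales so that integer powers of ten stay exact.
    scaled = value * 10 ** scale if scale >= 0 else value / 10 ** (-scale)
    return round(scaled) - refval


def from_coded(coded, scale, refval):
    """Inverse operation, as a decoder would apply it."""
    total = coded + refval
    return total / 10 ** scale if scale >= 0 else total * 10 ** (-scale)
```

For instance, with scale 2 and reference value -9000, a latitude of -45.25 would be coded as 4475; with scale -8, a frequency of 500000000 Hz would be coded as 5.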

Most names of elements are much less than the maximum length, and the table is still only sparsely filled, although elements are accumulating rapidly. So a packed version of the table is possible, the names and units being replaced by 8-bit lengths followed by that number of characters.

This is the approach adopted by the UK Met Office. Our operational Table B was based on the idea that the most frequently used entries were likely to be for elements 1-64 in classes 1-32, an eighth of the table. (This seemed sensible in the 1980s, but now looks less consistent with the existing entries!) Faster access is provided to these "kernel" entries, by means of pointers which can be located immediately from the class and element number, whereas "local" entries (all the rest) can only be found by sequential searches within classes.

This gives a table with the following components:

  1. Packed "kernel" entries one after another in a character string (no need for the element number or the total length)
  2. Pointers to the "kernel" entries in a 32*64 array
  3. Packed "local" entries one after another in a character string (including the element number and the total length)
  4. Pointers to the "local" classes with the number of elements in the class, through which a sequential search will be necessary.

Each entry consists of:

  length in    description of field
  octets

    1          length of entry    (only if "local")
    1          element number     (only if "local")
    1          length of name     (=LN)
   LN          name
    1          length of units    (=LU)
   LU          units
    1          format (R: real, N: numeric, F: flag/code, C: chars)
    1          scale
    4          reference value
    1          field width

(The "format" - not in the printed Table B - was introduced to help in decisions about which elements an operation applied to. But the rules have since been simplified, and the distinction between flags, code figures and characters could be made using the units field.)
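The packed layout above can be sketched in Python as follows. This is only an illustration under stated assumptions: text is taken to be ASCII, the scale a signed octet, and the reference value a signed big-endian 4-octet integer; none of these details is confirmed by the description, and the function names are hypothetical. Only the "kernel" form (without entry length and element number) is shown.

```python
import struct


def is_kernel(x, y):
    """True for elements 1-64 in classes 1-32, the directly indexed part."""
    return 1 <= x <= 32 and 1 <= y <= 64


def pack_kernel_entry(name, units, fmt, scale, refval, width):
    """Build one packed entry: lengths, strings, format, scale, refval, width."""
    name_b = name.encode("ascii")
    units_b = units.encode("ascii")
    return (bytes([len(name_b)]) + name_b
            + bytes([len(units_b)]) + units_b
            + fmt.encode("ascii")
            + struct.pack(">biB", scale, refval, width))


def unpack_kernel_entry(buf, pos=0):
    """Return (entry, position of the next entry) for the entry at pos."""
    ln = buf[pos]
    name = buf[pos + 1:pos + 1 + ln].decode("ascii")
    pos += 1 + ln
    lu = buf[pos]
    units = buf[pos + 1:pos + 1 + lu].decode("ascii")
    pos += 1 + lu
    fmt = chr(buf[pos])
    scale, refval, width = struct.unpack_from(">biB", buf, pos + 1)
    entry = {"name": name, "units": units, "format": fmt,
             "scale": scale, "refval": refval, "width": width}
    return entry, pos + 7  # format (1) + scale (1) + refval (4) + width (1)
```

Because each entry carries its own string lengths, a pointer to the start of an entry is all that is needed to read it, which is what makes the pointer array for "kernel" entries sufficient.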

1.2 Table D

Table D consists of sequences of descriptors which were expected to be in frequent use. Some sequences are simply for location, others correspond to particular kinds of report, and Table D is divided correspondingly into categories of sequences for the same kind of data, like Table B into classes. Again no information is conveyed by the category, but it is convenient if entries for complete messages go into categories simply related to those in Table A.

Table D is merely a form of shorthand to cut down the length of the description in a BUFR message. The fact that a sequence occurs in a particular category should not be taken as providing information about, say, the instruments used (for which appropriate elements should be used). And the words in the right-hand column of the Manual on Codes' list of sequences should not be taken to imply information which is not given explicitly by the descriptors themselves: no words appear in the Manual on Codes' BNF definition of the format of Table D!

Our operational Table D is constructed on the same principles as Table B. Each "kernel" entry is ND descriptors (two octets each) preceded by a length 1+ND*2. A "local" entry has the number of the sequence in the category inserted between its length (1+1+ND*2) and the descriptors. The pointers are set up in the same way.
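As a rough illustration of the "kernel" entry layout just described, the following Python sketch reads one entry from a byte string. It assumes a one-octet length and big-endian two-octet descriptors; the function name is hypothetical.

```python
def unpack_kernel_sequence(buf, pos=0):
    """Return (descriptors, position of next entry) for a kernel entry.

    The entry is a length octet (1 + ND*2) followed by ND descriptors
    of two octets each.
    """
    length = buf[pos]
    nd = (length - 1) // 2
    descriptors = [int.from_bytes(buf[pos + 1 + 2 * k:pos + 3 + 2 * k], "big")
                   for k in range(nd)]
    return descriptors, pos + length
```

A "local" entry would differ only in having the sequence number between the length and the descriptors.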

An arbitrary limit of 16 descriptors in a sequence was suggested in the early stages of BUFR, in the hope that any long descriptor sequence could be broken down into sequences useful in other contexts. We find that sequences of more than 2 or 3 descriptors are seldom useful in more than one context, and that splitting up sequences to restrict the maximum to 16 just fills up the table faster (any change needs a new sub-sequence and a new overall sequence to include it), so we prefer to describe messages in single local sequences not restricted to 16 descriptors.

1.3 Local Tables B and D

The WMO Manual on Codes gives BNF definitions of Table B, Table C (pointlessly) and Table D (wrongly: it ends up with simply one descriptor following another, with nothing to delimit the sequences!) - and then an "exchange format" (with no binary fields) for Table B only. But Class 0 provides for transmission of new or local table entries in messages, so that it is an exchange format too, satisfactorily defined for Table D as well as Table B, but unfortunately inconsistent with the BNF exchange format.

Class 0 exchange could be used, for instance, to send one centre's local entries, needed in the decoding of the data which follows, to another centre. Local entries from another centre may well clash with our own local entries, so we must let them override our entries for the duration of a particular decoding task but not update Table B or D permanently. Having established this need, we can take advantage of the system to define sequences for internal use, only required while a certain kind of message is being handled.
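The idea of a temporary override can be sketched with Python's collections.ChainMap, which looks up keys in the first mapping and falls back to the second. This is not how the Met Office software is implemented; the table contents and the function name are made up for illustration.

```python
from collections import ChainMap

# Hypothetical permanent table, keyed by FXXYYY string.
permanent_table_b = {"002196": "our local entry"}


def tables_for_task(permanent, overrides):
    """Table view for one decoding task.

    Lookups see the overrides first; the permanent table is never
    modified, because ChainMap writes go to the first mapping only.
    """
    return ChainMap(dict(overrides), permanent)


task_table = tables_for_task(permanent_table_b,
                             {"002196": "other centre's entry"})
```

When the task ends, the view is simply discarded and the permanent table is found unchanged.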

Coding a Local Table B

For Table B we use 112-character entries with fields defined by Class 0 rather than the BNF format, so that input from a data set is consistent with input from a message.

On the IBM mainframe, the Local B file should be given the DDNAME LOCALELM e.g.

//GO.LOCALELM DD DSN=MDB.BUFR.LOCALB,DISP=SHR

On an HP, T3E or IBMSP Unix machine, the Local B file should be given the name or symbolic link LOCALELM in the run directory (the directory the BUFR executable will run in), or if using the environment variable BUFR_LIBRARY, put LOCALELM in the library pointed to by $BUFR_LIBRARY.

Example LOCALELM file:

LOCAL BUFR TABLE B ENTRIES IN EXCHANGE FORMAT, AS DEFINED IN WMO MANUAL ON CODES ON CODES (112-BYTE ENTRIES)

FXXYYYNAME1...........................NAME2...........................UNITS...................SCALREFVAL.....WID

002196satellite classification                                        CODE TABLE                 0          0  9

002197satellite channel centre frequency                              HZ                        -8          0 26

002198satellite channel band width                                    HZ                        -8          0 26

002221segment size at nadir in x direction                            M                          0          0 18

002222segment size at nadir in y direction                            M                          0          0 18

002231height assignment method                                        CODE TABLE                 0          0  4

IMPORTANT NOTE: There must be at least 1 blank line at the end of LOCALELM. If not, BUFR encoding or decoding will almost certainly fail because TABLEB will not be opened later.
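A 112-character entry of this kind can be split into fields with a short fixed-width parser. The sketch below is in Python rather than the Fortran of the operational software, and the field widths are an assumption read off the header line of the example (6 for FXXYYY, 64 for the two name lines, 24 for units, 4 for scale, 11 for reference value, 3 for width).

```python
# Field widths assumed from the header line of the example file.
# They add up to 112 characters.
FIELDS = (("descriptor", 6), ("name", 64), ("units", 24),
          ("scale", 4), ("refval", 11), ("width", 3))


def parse_localelm_entry(line):
    """Split one 112-character entry into a dictionary of fields."""
    line = line.rstrip("\n").ljust(112)
    entry, pos = {}, 0
    for key, size in FIELDS:
        entry[key] = line[pos:pos + size].strip()
        pos += size
    for key in ("scale", "refval", "width"):
        # Tolerate a sign separated from its digits, e.g. "-  8".
        entry[key] = int(entry[key].replace(" ", ""))
    return entry


# An entry built to the assumed widths, modelled on 002197 above.
example = ("002197" + "satellite channel centre frequency".ljust(64)
           + "HZ".ljust(24) + "-8".rjust(4) + "0".rjust(11) + "26".rjust(3))
```

Calling parse_localelm_entry(example) gives a dictionary with scale -8, reference value 0 and width 26.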

Coding a Local Table D

No exchange format being laid down for Table D, we set up our input as in the example below, in a form designed to be readable and self-explanatory rather than a bare string of FXXYYY values, and either put it in a file to be read when Table D is consulted or call LOCALD, having read the sequence, to make it available through later calls. This is not so much an extension to BUFR as a better documented way of putting information in Table D.

On the IBM mainframe, the Local D file should be given the DDNAME LOCALSEQ e.g.

//GO.LOCALSEQ DD DSN=MDB.BUFR.LOCALD,DISP=SHR

On an HP, T3E or IBMSP Unix machine, the Local D file should be given the name or symbolic link LOCALSEQ in the run directory (the directory the BUFR executable will run in), or if using the environment variable BUFR_LIBRARY, put LOCALSEQ in the library pointed to by $BUFR_LIBRARY.

Example LOCALSEQ file:

309255   UPPER AIR SIGNIFICANT TEMPERATURES AND WINDS



001001, 001002, 001011        STATION NUMBER OR CALL SIGN 

005002, 006002, 007001        LATITUDE & LONGITUDE, STATION HEIGHT

004001, 004002, 004003        DATE (YEAR, MONTH, DAY)

004004, 004005                HOUR & MINUTE (IF KNOWN) OF LAUNCH

002011, 002014,               SONDE TYPE, TRACKING SYSTEM

002013                        RADIATION CORRECTION

022042                        WATER TEMPERATURE

104000, 031001, 008002        CLOUD DATA (LOW, MIDDLE, HIGH)

020012, 020011, 020013        CLOUD TYPE, AMOUNT & BASE FOR EACH LEV

008001, 106000, 031001        SEMI-STANDARD LEVELS (775MB ETC)

010004, 010003                PRESSURE & HEIGHT

012001, 012003                TEMPERATURE & DEW POINT

011001, 011002                WIND SPEED & DIRECTION

008001, 103000, 031001        SIGNIFICANT TEMPERATURES

010004, 012001, 012003        PRESSURE, TEMPERATURE & DEW POINT

008001, 103000, 031001        SIGNIFICANT WINDS

010004, 011001, 011002        PRESSURE, WIND SPEED & DIRECTION
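A file in this style can be read by picking out the six-figure numbers and ignoring the explanatory text. The Python sketch below handles a single sequence only and assumes that the first six-figure number is the sequence descriptor and that no six-figure number appears in the comments; it is not the Met Office routine, and the function name is hypothetical.

```python
import re

# A run of exactly six digits, not embedded in a longer word or number.
SIX_FIGURES = re.compile(r"(?<![0-9A-Za-z])\d{6}(?![0-9A-Za-z])")


def parse_single_sequence(text):
    """Return (sequence descriptor, list of member descriptors)."""
    numbers = SIX_FIGURES.findall(text)
    if not numbers:
        raise ValueError("no descriptors found")
    return numbers[0], numbers[1:]
```

Text such as "775MB" in a comment is not mistaken for a descriptor, since it is not a run of exactly six digits.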

1.4 Code figures and flags

BUFR Table A is not an essential part of the encoding/decoding system, but more for data base or telecommunications use. Table C is essential, but is not a table in any formal sense, consisting of plain-language descriptions of operations which have to be programmed in different ways.

But one further table can usefully be made, for decoding purposes only, or rather for displaying data coded in BUFR: it consists of brief descriptions corresponding to the code figures and flags. It seems best to avoid (as far as possible) displaying the code figures themselves: even where these correspond to existing WMO codes, not all users can be expected to know the codes, and many code and flag tables have been made specially for BUFR, either from scratch or by combining existing tables.

The problem is that (unlike element names) descriptions of code figures can be very long, especially where effectively several code figures and flags have been combined, as for present weather. This means that a brief form, say 12 characters, displayable in a table column, is not always easy to find.

But most of the code figures have, despite this, been compressed into a 12-character form which hopefully remains meaningful: those remaining will appear as figures in a display, leaving the user to look them up in the Manual on Codes.

The structure of the code figure table is as follows. Each description of a code figure (maximum 12 octets) is preceded by a length (1 octet), each set of code figures in a table by a count (1 octet). For each table there is an index entry consisting of the descriptor (2 octets) and a pointer (2 octets), these index entries being stored sequentially with a count of code tables at the start.

1.5 Descriptor representation

The Table B described in 1.1 is for use by decoding and encoding programs. If the output from a decode consists of parallel arrays of descriptors and values, then a calling program needs to be able to recognise descriptors. (But see 2.3: the arrays are not always parallel in the sense that the n-th row or column in the values array consists of values of the n-th element in the descriptor array; some descriptors may have to be skipped.)

Descriptors appear as 6-figure numbers in the BUFR documentation. But if F, XX & YYY in FXXYYY are fields of 2, 6 & 8 bits respectively, then the numerical value of a descriptor is not equal to FXXYYY read as a single integer, but F*16384+X*256+Y rather than F*100000+X*1000+Y.

We therefore need several functions for converting from one form to another: from a 16-bit field in section 3 of a BUFR message to separate values of F, X & Y and hence the 6-figure displayable form as above (for, say, error messages), and from a 6-figure form (as in the documentation, and therefore more readable) to the 16-bit form used in encoding and decoding. DESFXY (DESCR,F,X,Y) converts a 16-bit descriptor to values of F, X & Y (all integer) and the function IDES (FXXYYY) converts from 6-figure form F*100000+X*1000+Y to 16-bit form.
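The two conversions can be written out as follows. These are Python analogues for illustration, not the Fortran routines themselves: the names ides and desfxy are borrowed from the text, desfxy returns a tuple rather than setting arguments, and display is a hypothetical helper.

```python
def ides(fxxyyy):
    """6-figure form F*100000+X*1000+Y to 16-bit form F*16384+X*256+Y."""
    f, rest = divmod(fxxyyy, 100000)
    x, y = divmod(rest, 1000)
    return f * 16384 + x * 256 + y


def desfxy(descr):
    """Split a 16-bit descriptor into F (2 bits), X (6 bits), Y (8 bits)."""
    return descr >> 14, (descr >> 8) & 0x3F, descr & 0xFF


def display(descr):
    """Readable FXXYYY string for a 16-bit descriptor."""
    f, x, y = desfxy(descr)
    return f"{f}{x:02d}{y:03d}"
```

For example, the sequence descriptor 309255 has the 16-bit value 3*16384+9*256+255 = 51711, and 008002 has the value 2050.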

But note that to find a given meteorological element in a message it is generally not enough to find a single descriptor: to find an element like tropopause temperature means finding two descriptors, not necessarily consecutive: 008002 with a value of 3 for the tropopause and only then a temperature descriptor. So in practice a data base interface is needed between a BUFR decode as described here and a meteorological user.

2 DECODING

2.1 BUFR message structure and decoding strategy

A BUFR message consists of a start and end (ASCII characters 'BUFR' and '7777' respectively, 'BUFR' being followed by the total length of the message in edition 2) delimiting 4 sections, each starting with a length, which is always an even number of octets. The first 2 sections (if present: the second is optional) are for handling the BUFR message as a whole (during transmission or in a data base), giving a rough classification of the data and a single "representative" time (which does not mean that the time can be omitted from the data, or that the data can't have more complex time structures). Decoding is concerned with sections 3 and 4, the description and values respectively (section 3 starts with the number of "sets of values" - "reports" in traditional terms - and a compression flag, set if the reports are encoded together with compression, not set if they follow one another, reusing the same description).

BUFR

Section 1: length in octets 1-3, originating centre in octets 5-6,
           flag for section 2 in octet 8, type of data in octets 9-10,
           date/time in octets 13-17

Section 3: length in octets 1-3, number of reports in octets 5-6,
           compression flag in octet 7, descriptors in octets 8-9,
           10-11 etc

Section 4: length in octets 1-3, bit string starting in octet 5

7777

The task of decoding as defined here is to achieve a correspondence between descriptors and bits in the data section, so that we know how many bits make up a value, what element it is a value of, any scale changes etc, and then return arrays of descriptors and values in such a way that it remains clear to a calling program which value corresponds to which descriptor.
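As a starting point for that task, the fields of section 3 listed above can be picked out as in the Python sketch below. It assumes the section has already been isolated as a byte string, and that the compression flag is the second most significant bit of octet 7 (mask 0x40), as in the WMO specification; the text above gives only the octet. The function name is hypothetical.

```python
def parse_section3(sec3):
    """Return (number of reports, compression flag, 16-bit descriptors)."""
    length = int.from_bytes(sec3[0:3], "big")      # octets 1-3
    nreports = int.from_bytes(sec3[4:6], "big")    # octets 5-6
    compressed = bool(sec3[6] & 0x40)              # octet 7
    # Descriptors start in octet 8; because the section length is even,
    # one padding octet is left over at the end and is ignored.
    descriptors = [int.from_bytes(sec3[k:k + 2], "big")
                   for k in range(7, length - 1, 2)]
    return nreports, compressed, descriptors
```

The descriptors returned are in 16-bit form and still have to be expanded together with the data in section 4.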

Conceptually this is a matter of taking Table B entries, perhaps with modified scale figure etc, and adding a further column to give the corresponding value. But in fact there is no need to set up the whole of such an array, which could well occupy a megabyte for a large message. If the aim is to display the contents of the message, then lines can be output as they are set up rather than held in core; if not, then what is wanted as output is an array of values with all operations performed and a corresponding array of descriptors to identify the elements (the other columns are only used while an element is being handled and can be discarded when the next element is reached - except when quality operations are possible (see 4.2)).

So, although at first it might seem convenient to separate expansion of the description, that is the process of looking up sequences, performing replications, adding quality control fields etc, from the bit manipulation involved in finding the corresponding values, this is not advisable for reasons of efficiency.

But there are more fundamental reasons for combining expansion of descriptor sequences and bit manipulation. To see why, we need further consideration of the replication operation.

2.2 Replication

The operation called replication has grown more complicated as BUFR has developed. There are now three complications, which will be treated in turn.

First we must distinguish between explicit and delayed replication. A replication descriptor says how many descriptors to repeat. It may also say how many times to repeat them, but this count may be set to zero, in which case it has to be found in the data. This makes sense where, say, the number of levels in a profile is not known beforehand and may vary from profile to profile: delayed replication enables the same sequence of descriptors to be used for all profiles (though obviously not with compression if the count varies).

A descriptor sequence which includes delayed replication cannot be expanded in isolation from the data. It would be possible to find the replication counts before the values of the elements (by adding up the number of bits to skip) and so keep the two processes more or less separate - but there are further complications.

Replication originally applied only to descriptors: the descriptor sequence was abbreviated to save space and has to be expanded to match the data. But when a replication operator is followed by a data repetition count, rather than an ordinary delayed replication, the data value itself must be repeated the same number of times. This is for run-length encoding of images consisting of a fixed number of values of a given element, the precision being such that many successive values may be the same.

For instance, any line of a radar image can be broken up into segments consisting of identical pixel values and segments where the values vary. The first kind of segment calls for data repetition, a descriptor and a value both encoded once to be repeated N times in the output; the second requires replication, N values to be coded in the message and one descriptor repeated N times in the output to correspond. Clearly such a descriptor sequence cannot be expanded in isolation from the data.

The third complication is the replication of coordinate increments. An element in one of the time or place classes immediately before a replication operator is taken to be included in the N-fold replication as an increment to be added N times, but without any further value in the data. There can be increments for more than one coordinate element.

Now consider nested replications, say for coding an image line by line: an outer replication for the number of lines in the image and inner replications to describe each line. The outer replication is preceded by, say, a latitude increment, the inner by a longitude increment; no pixel values occur except inside the inner replication.

Clearly the increment before the outer replication must be distinguished during the decoding process from that before the inner replication, or else it will be replicated again: it must be flagged as already replicated, and only unflagged when the expansion is complete.

In other words, there are descriptor sequences which cannot be reduced to sequences of element descriptors without destroying vital features of their relationship to the data. Hence sections 3 and 4 must be handled together.

2.3 Basic BUFR operations and structure of decode

The basic structure of the decoding program follows the descriptor structure. The different kinds of descriptor are as follows (omitting operators concerned with quality operations; for these see 4.2):
F=0: element   (class X, element Y in Table B)

     an element can be character or numeric,

       a numeric element a number, code figure or flag(s),

          and any element not in Class 31 can have associated fields



F=1: replication   (of the following X descriptors Y times)

     Y>0: explicit (count in descriptor)

     Y=0: delayed  (count in data, either ordinary replication

                   or data repetition)



F=2: operation

     X=1: change field width   (by Y-128 bits)

     X=2: change the scale, i.e. multiply by a power of ten

         (by 10^(Y-128))

     X=3: change reference values

     X=4: add Y-bit quality control field

     X=5: insert string of Y characters

     X=6: hide local descriptor

         [for quality operations see 4.2] 



F=3: sequence   (category X, sequence Y in Table D)







F=1           If replication is delayed, the count is found in the 

              data. Increments immediately before the replication 

              operator are counted and the increment descriptors 

              added to the end of the sequence of descriptors to be 

              replicated. Space is made (as for a sequence) and the 

              replication carried out. The values of any replicated 

              increments will be copied in the output value array.

                 If a count in the data is zero, delete all the 

              descriptors that would have been replicated, including 

              the increments, as well as the replication operator and 

              count.

                 If the count in the data indicates run-length 

              encoding, flag the element descriptor (assuming that

              only one element at a time can be run-length encoded)

              and repeat it, leaving the operation to be completed by 

              repeating the values in the value array. We also need 

              a flag to be set when the descriptors are repeated and 

              then unset when the value has been got from the bit 

              string, to avoid looking in the bit string for further

              values.



F=2,X=1,2,4   Width increment, scale increment and stacks of Q/C 

              field width and field meanings are set accordingly and 

              used whenever values of an element are found. Each 

              value is then preceded in the output by the meaning of 

              each field and the field itself, for as many pairs of 

              meaning and value as are currently nested.



F=2,X=3       Changed reference values are listed (in parallel arrays 

              of descriptor and reference value) and the list 

              consulted whenever values of an element are found.



F=2,X=5       Inserted characters are put in the same string as 

              character values.



F=2,X=6       The descriptor and value are skipped - unless there is 

              a local Table B entry with the same data width.



F=3           Insertion of a sequence is simple. Space is made by 

              moving the remaining descriptors down; the inserted 

              descriptors overwrite the sequence descriptor itself, 

              and scanning of the descriptors continues with no 

              adjustment to the pointer, i.e. with the first 

              descriptor in the inserted sequence.
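The in-place handling of F=3 and F=1 can be sketched in Python as below. This is a deliberately simplified illustration, not the Met Office program: descriptors are (F, X, Y) tuples, table_d is a dictionary of sequences, next_count is a hypothetical callable standing in for reading a delayed replication count from the data, the operator and count descriptor are simply dropped, and increments, data repetition and F=2 operators are ignored.

```python
def expand(descriptors, table_d, next_count):
    """Expand Table D sequences and replications in a descriptor list."""
    seq = list(descriptors)
    n = 0
    while n < len(seq):
        f, x, y = seq[n]
        if f == 3:
            # The sequence overwrites its own descriptor; the pointer is
            # not moved, so scanning continues with the first inserted one.
            seq[n:n + 1] = table_d[(f, x, y)]
        elif f == 1:
            if y > 0:
                first, count = n + 1, y             # explicit replication
            else:
                first, count = n + 2, next_count()  # skip count descriptor
            body = seq[first:first + x]
            # A count of zero deletes operator, count and body together.
            seq[n:first + x] = body * count
        else:
            n += 1
    return seq
```

Because the pointer is left at the first replicated descriptor, any sequence or replication nested inside the replicated descriptors is expanded in turn, once per copy.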

2.4 Bit manipulation to construct values

Descriptor manipulation can only be handled by a complicated program which can be given a clear structure, that of the descriptors, but not easily broken up. Only a few tasks are sufficiently self-contained to be done in subroutines: these are looking up tables (B, D and codes), already discussed, and finding a value in the bit string, where the task is to get (or put, if output) a value V in WIDTH bits after I bits in the bit string.

There are several ways of doing this. It can be done a bit at a time, testing whether a bit is set in the bit string and building up the value by doubling and either adding one or not adding accordingly.

Our Fortran program takes a slightly more complicated (but faster?) approach, working an octet at a time. We start in octet N=I/8. In this octet NINIT=I-N*8, i.e. MOD(I,8), bits have already been used. The value will extend over NOCTET=(WIDTH+NINIT+7)/8 octets, and in the last of these octets NLAST=WIDTH+NINIT-(NOCTET-1)*8 bits will be used.

The value is segmented in this way, bits being shifted in an octet by multiplying or dividing by powers of 2. A value that fits into one octet is treated as a special case.

A character value is encoded one octet at a time.

A value which is all ones, i.e. equal to 2^WIDTH-1, is missing except in the case of a one-bit element or associated field, which is simply a flag set on or off.

Operationally we use an Assembler program which works one 32-bit integer at a time. It cuts encoding/decoding times by a third.

The method is to skip I/32 words, load two words, shift left MOD(I,32) bits to get rid of unwanted bits of previous values, and shift right 32-WIDTH bits to align the value, losing any bits of following values.
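The word-at-a-time method can be imitated in Python, treating the pair of 32-bit words as one 64-bit quantity. This is only a sketch of the idea, not a transcription of the Assembler routine; the function name is hypothetical and a width of 1 to 32 bits is assumed.

```python
MASK64 = (1 << 64) - 1


def get_value_words(words, i, width):
    """Value of `width` bits found after skipping i bits of 32-bit words."""
    n = i // 32                                   # whole words to skip
    high = words[n]
    low = words[n + 1] if n + 1 < len(words) else 0
    pair = (high << 32) | low                     # two words loaded together
    pair = (pair << (i % 32)) & MASK64            # drop bits of earlier values
    return (pair >> 32) >> (32 - width)           # align, drop later values
```

For a 13-bit value starting after 5 bits, the left shift is 5 bits and the right shift 19 bits.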

The Assembler method assumes that no value will be too big for an integer, in our case 32 bits, and both routines at present output integer values - but it may be that in the future there will be elements with so many bits that this loses precision.

Example: a 13-bit value is split between octets as follows:

         =====+++   ++++++++   ++====== 

          octet 1    octet 2   octet 3



NOCTET=3, NINIT=5, NLAST=2



Build up the value V as follows:                in this case:



V1=MOD(OCTET(1),TWOTO(8-NINIT))                 V1=MOD(OCTET(1),8)

V2=V1*256+OCTET(2)                              V2=V1*256+OCTET(2)

V =V2*TWOTO(NLAST)+OCTET(3)/TWOTO(8-NLAST)      V=V2*4+OCTET(3)/64



where TWOTO is an array of powers of 2.
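The octet-at-a-time construction can be followed through in the Python sketch below, which uses the same quantities N, NINIT, NOCTET and NLAST. It is an illustration rather than the Fortran program itself; the function names are hypothetical, and is_missing applies the all-ones rule for fields wider than one bit.

```python
def get_value_octets(octets, i, width):
    """Value of `width` bits found after skipping i bits of the string."""
    n = i // 8                                  # first octet holding the value
    ninit = i % 8                               # bits already used in it
    noctet = (width + ninit + 7) // 8           # octets the value extends over
    nlast = width + ninit - (noctet - 1) * 8    # bits used in the last octet
    v = octets[n] % (1 << (8 - ninit))          # drop bits of earlier values
    if noctet == 1:
        return v >> (8 - nlast)                 # special case: a single octet
    for k in range(1, noctet - 1):
        v = v * 256 + octets[n + k]             # whole octets in the middle
    return v * (1 << nlast) + (octets[n + noctet - 1] >> (8 - nlast))


def is_missing(value, width):
    """All ones means missing, except for one-bit flags."""
    return width > 1 and value == (1 << width) - 1
```

With octets 0x06, 0xAF, 0x00 and the 13-bit example above (NINIT=5), the steps give V1=6, V2=1711 and V=6844.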

2.5 Output and display

The array of values output from a BUFR decode must in general be a real array. If integers were used, the units would have to be as in Table B (or else the user wouldn't know what they were). This is all right when converting to those units (if there has been a scale change) means multiplying by a positive power of ten; but when it means dividing, precision is lost, and the extra precision may be just what is wanted by the user!

For character elements the corresponding value points to a character string: the value is length*2^16 plus pointer.
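The combined value can be taken apart again as in the Python sketch below. It assumes that the pointer counts characters from 1, as a Fortran substring index would; the text does not say so, and the function names are hypothetical.

```python
def pack_char_ref(length, pointer):
    """Combine length and pointer as length*2^16 plus pointer."""
    return length * 65536 + pointer


def unpack_char_ref(value):
    """Return (length, pointer) from a value in the output array."""
    return divmod(int(value), 65536)


def char_value(value, names):
    """Fetch the characters a value refers to (pointer assumed 1-based)."""
    length, pointer = unpack_char_ref(value)
    return names[pointer - 1:pointer - 1 + length]
```

With a character string "EGLL    HEATHROW", a length of 8 and a pointer of 9 give the value 524297, which char_value turns back into "HEATHROW".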

Ideally the N-th descriptor in the output would correspond to the N-th value or N-th row of values, i.e. all operators would have been used and then deleted, leaving only element descriptors. But unfortunately this is not generally so.

In the expansion of the BUFR descriptor sequence the following aims at first sight seem reasonable: (1) to leave a valid sequence of descriptors after any operation, (2) to end up with a sequence in one-to-one correspondence with the values, i.e. with no operators left in it, (3) to end up with a sequence that can be used to reencode selected subsets of values (reports) from a compressed message, (4) to end up with a sequence which can be used to decode another subset (if there are several subsets in the message with no compression).

Of these aims (3) is questionable, because what is wanted in section 3 of a BUFR message is more likely to be the original than the expanded sequence, (2) requires decisions about whether delayed replication counts are to be put in the output value array and what descriptors should correspond to quality control fields, (1) is unattainable for reasons like those described in 2.2, and (4) is internal to the decoding process, so better abandoned - it's simpler to keep the original sequence and repeat the expansion.

In fact aim (2) is inconsistent with (1) and (3): if our aim is correspondence with the values, and therefore operators are deleted after use, then we're left with replication counts with no replication operators; if the operators were left, then the descriptor count (X) would have to be adjusted during subsequent operations, which would be difficult.

So the best we can aim for is some correspondence between descriptors and values (essential - though some descriptors may have to be skipped) and the possibility of reencoding starting with the original descriptor sequence (though this would depend on the operations used).

So the output descriptor and value arrays depart from one-to-one correspondence and immediate reencodability in the following ways:

These decisions, designed to avoid any repetition of descriptor manipulation in the calling program, may seem arbitrary, especially the first one: they meet our current needs (July 95) but clearly the handling of replication may seem unsatisfactory - a better general solution might be 1XX000 in the descriptor array (with XX adjusted to describe the number of output descriptors now replicated - not an easy task!) and Y, the corresponding count, in the values array.

Our BUFR decode provides an optional display of the values (one line each: element name, units, value - if the value is a code figure, then if possible it is replaced by a brief description, and flags are handled similarly, a bit at a time).

Example of display:

WMO BLOCK NUMBER                   NUMERIC            33
WMO STATION NUMBER                 NUMERIC            946
LATITUDE (COARSE ACCURACY)         DEGREES            45.00
LONGITUDE (COARSE ACCURACY)        DEGREES            34.00
HEIGHT OF STATION                  M                  205
TYPE OF STATION                    CODE TABLE         MANNED
YEAR                               YEAR               1996
MONTH                              MONTH              4
DAY                                DAY                21
HOUR                               HOUR
     0     3    6    9   12   15
WIND DIRECTION AT 10M              DEGREES TRUE
    170    0   30   60   50   230
WIND SPEED AT 10M       NUMERIC          M/S
    3.1 *********   2.1  4.1  3.1  5.1
CLOUD TYPE
NO CL CLOUD  NO CL CLOUD  NO CL CLOUD  CU CAL  NO CL CLOUD  CU CAL

CLOUD TYPE
AC TR LEVEL AC TR LEVEL  AC TR LEVEL  AC TR LEVEL  AC TR LEVEL
AC TR LEVEL

CLOUD TYPE
NO CH CLOUD CI FIB (UNC) CI SPI SHEAF CI SPI SHEAF NO CH CLOUD
NO CH CLOUD

2.6 Coordinates and instrumentation

The handling of "coordinate" elements in BUFR is problematic. Because encoding and decoding can be done without reference to the concept, vague statements like the note to 94.5.3.3 have crept in.

This vagueness has led, for instance, to a disagreement about whether a station height (007001) can be incremented by the height increment 007005 to give heights in a profile. Being in Class 7, a coordinate class, 007001 can reasonably be taken to apply to any data which follows - rather than just giving information about the station, in which case it should be in a non-coordinate class. But even so the combination of 007001 and 007005 has been objected to.

If note 94.5.3.3, about coordinate elements "contradicting" one another, is seriously meant as something programmable, if we must in principle always be able to output from a decode all the coordinates of a given element, then we need up to ten (one for each coordinate class, some of them at present reserved) 256*256 bit tables, specified as part of the BUFR documentation, to allow decisions about contradiction to be made for all possible elements.

Fortunately a looser interpretation is possible in contexts not involving increments, which leaves decisions about contradiction to the user at the data base interface. We can say that the coordinate/value distinction is entirely a matter for retrieval, i.e. selection of data from BUFR messages: the user specifies values of coordinate elements (bearing in mind, for instance, that more than one descriptor is possible for latitude and longitude!) and it is at this stage that decisions must be taken about which of several "contradictory" coordinate elements to use.

But this is only one aspect of the problem. A coordinate can be redefined by a conflicting element in the same class - but there are times when we want to say that a coordinate no longer applies rather than decide between two conflicting coordinate elements. This is especially true of class 2, instrumentation. There is some agreement that an instrumentation "coordinate" can be cancelled by a missing value of the same element, but this leaves problems of interpretation.

Suppose we use the element "sonde type" and then (for comparison, say) have a measurement not made by a sonde. There may be a corresponding instrumentation element which could be interpreted as contradicting "sonde type"; but this is certainly not true for all possible elements. Sonde type implies, among other things, a temperature-measuring instrument; but the instrument used in a screen at the surface is assumed to be known! A missing value for sonde type could mean (if the above convention is accepted) that the coordinate no longer applies: but sonde data for an unknown sonde type is quite conceivable!

(But is Table A meant to rule out such combinations of data? This is another vague feature of the BUFR system: the classification is partly by place and partly by instrumentation. "Surface data: land" appears to cover measurements by satellite and sonde and aircraft on the ground as well as ordinary anemometers and thermometers! Is the category of satellite sea surface temperature data 0, 5 or 31?)

Note also that the proliferation of instrumentation data in BUFR has made some early element names inappropriate: 002003, "type of measuring equipment used", is clearly meant only for PILOTs when the code figures are examined.

2.7 Increments

Increments for time and place elements were a late addition to the BUFR system, perhaps not explained in sufficient detail.

Clearly the current position is obtained by adding the increment, if there is one, to the original position. But what if there is more than one increment for the same element? The general BUFR rules would say the second overrides the first, so add the second increment to the original value; but increments before replications are clearly meant to take effect cumulatively, i.e. the value before the replication count is added repeatedly to the original value.

We must then assume that if a new original position is given, any increment is cancelled. If, for instance, we reach the end of a row in scanning an image, restating the original longitude will take us back to the start of the next row. Until the longitude is restated the increments remain in force, even outside the replication which added them, so that a run-length-encoded row, consisting of several segments, each with its own replication, will accumulate increments along the whole row, rather than go back to the original value at the start of each segment.

So we must assume that increments involved in replications always (not just within the replication) take effect cumulatively: that an increment can be cancelled by resetting the original coordinate at the start of a row, but then each step is always added to the current value of the increment, however many segments there are in the row.

Our decode program replicates the increments explicitly if an increment descriptor appears before a replication operator: the increments can then be converted to incremented values of the coordinate in a further pass through the output array.

Increments before replication operators are recognised by the presence of the word 'increment' in the name. The matching up of increments and elements incremented is (fortunately) an operation that can be handled outside the basic decode. We suggest incrementing an element only if an element in the same class (in classes 4-7) and with the same units is found with the same name as far as 'increment', or at least with the word 'increment' in its name. The looser test avoids tying the increment recognition process to the word order of English: other centres may use translated element names, and the equivalent of 'increment' could come at the start rather than the end of a name! But one day there may be an element with 'increment' in its name which despite that is not an increment in the sense of this section, so this is still not a satisfactory proposal.

One BUFR rule about increments is clearly stated: 94.5.4.3 says that a replicated increment is added the first time to give the coordinate of the first set of replicated data, so the original coordinate in the BUFR message must be the first position or time minus the increment.
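The arithmetic of a replicated increment can be shown with a small Python sketch. It is an illustration of the rule just stated, not the further pass through the output array made by our software; the function names are hypothetical.

```python
# Sketch only: how a replicated increment turns one coordinate into many.
# The increment is added before the first replicated set, so the coordinate
# stored in the message is the first position minus the increment.

def original_for(first_position, increment):
    """Coordinate to put in the message so that the first set lands on first_position."""
    return first_position - increment


def incremented_coordinates(original, increment, count):
    """Coordinates of `count` replicated sets, accumulating the increment each time."""
    coordinates = []
    current = original
    for _ in range(count):
        current += increment
        coordinates.append(current)
    return coordinates


start = original_for(50.0, -0.5)                   # 50.5 goes in the message
print(incremented_coordinates(start, -0.5, 4))     # [50.0, 49.5, 49.0, 48.5]
```

For a row made up of several segments, the last coordinate returned for one segment would be passed as `original` for the next, so that increments accumulate along the whole row until the coordinate is restated.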

3 ENCODING

3.1 Compression

A decoding program must provide for every BUFR possibility to be able to decode any messages received. An encoding program, on the other hand, can make simplifying assumptions about what operations are needed.

One such assumption concerns compression of character values.

Compression in general consists in taking N values of an element, finding the minimum and coding that in the current number of bits for the element, followed by an increment field width and N increments which, when added to the minimum, reconstruct the values.

Compression is done by scanning the values to find the maximum and minimum, allowing for missing data. Find the number of bits needed to code maximum minus minimum plus one (from the next highest power of 2, the smallest M such that max-min+1<2^M). That is the increment width. One is added because all ones would be taken as representing missing data; so if max-min=(2^M)-1 for some M, the number of bits needed is not M but M+1. Missing values are ignored in finding the minimum, but a flag is set if missing values exist: max=min with no values missing means no increments to be coded, but max=min with missing values means one-bit increments, set to 1 if the value is missing.

If a value cannot be encoded in the field width, it is set to missing before it can affect the range of values.

Now consider compression of characters. Character fields are left-aligned, so compression saves nothing even if all the values are short compared with the field width (only a change of the field width itself would save space). It saves a lot of work to assume (when encoding) that the "local reference value" coded before the increment width and increments is not necessarily the minimum as above (i.e. such that at least one increment is zero), but can be simply a convenient value, in this case binary zeros so that the characters are encoded unchanged. (But a check is made to see if all the character values are the same, in which case the value and a zero increment width are coded.)

During decoding, on the other hand, no such assumption can be made: other centres may well have gone through the laborious business of subtracting character strings to arrive at increments which are not in themselves characters.

Examples:

values to be coded  45, 37, 19, 22, 17
minimum = 17, max minus min = 28, hence 5 bits

values to be coded  21, 3, 13, 34, 5, 8
minimum = 3, max minus min = 31
- but an increment of 31 in 5 bits would have all 5 bits set
  and therefore mean missing, hence 6 bits are needed
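The rule for the increment width can be checked with a short Python sketch. It assumes the values are already scaled integers and uses None for a missing value; both choices, and the function name, are illustrative only. The treatment of a set with every value missing is an assumption, as that case is not covered above.

```python
# Sketch only: increment width for compression, following the rule above.

def increment_width(values):
    """Smallest M such that max - min + 1 < 2**M, or 0 if no increments are needed."""
    present = [v for v in values if v is not None]
    any_missing = len(present) < len(values)
    if not present:
        return 0  # assumption: all values missing, nothing to distinguish
    spread = max(present) - min(present)
    if spread == 0 and not any_missing:
        return 0  # all values equal and none missing: no increments coded
    # bit_length gives the smallest M with spread + 1 < 2**M, which keeps
    # the all-ones pattern free to represent missing data
    return (spread + 1).bit_length()


print(increment_width([45, 37, 19, 22, 17]))   # 5
print(increment_width([21, 3, 13, 34, 5, 8]))  # 6
print(increment_width([7, None, 7]))           # 1
```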

3.2 Setting up descriptor sequences

Apart from the handling of compression, encoding is more or less the reverse of decoding. The only problem is how the descriptors and values should correspond.

Obviously any set of values can be encoded given a descriptor sequence which is in one-to-one correspondence. But usually a description comparable in length with the data is not acceptable when BUFR provides so many ways of abbreviating it. If data which has just been decoded is re-encoded, and the descriptor sequence in the original message is reusable, then it is reused; but there is no obvious way of making a shorter descriptor sequence automatically when no unexpanded sequence is available. Such a process would be like decompiling machine code into a high-level language.

In other words, descriptor sequences can be expanded but not contracted. We therefore need a way of checking that a sequence chosen by a user from entries in Table D and Table B will expand as expected, a way of showing clearly where values should come in an input array, what scale changes are required and so on.

A program to do this will obviously not be able to use counts in the data, but can for instance inset descriptors which will be replicated. (Because delayed replications can't be carried out, different programming techniques are called for: we need a stack of nested replications, with counts of descriptors at each level.)

One of the features of BUFR more easily overlooked when setting up descriptor sequences is the distinction between coordinate elements and others (see 2.6). Time and place precede values at that time and place, and elements in certain other classes, like instrumentation, likewise apply (until changed) to the values that follow.

This effect is not overridden by replication: if the coordinates in a group of replicated descriptors don't come first, they apply to the first values of the elements which follow in the replicated group and the second values of the elements before - then comes a further coordinate change, and so on.

Of course a user who wants all the data in a message knows how to interpret it and won't connect the values and coordinates wrongly. But a general retrieval program going through data of different kinds might well look for values of a certain element at given places and times, ignoring any other elements, and return wrong data if the coordinates are out of place.

The above-mentioned program (SCRIPT) for showing how a sequence will expand puts a blank line in front of any coordinate element (or sequence of successive coordinates), hoping that an unexpected break will warn a user that the strict interpretation may not be what is intended.

3.3 Preparation of values to be encoded

To provide an array of values for encoding, first expand the intended descriptor sequence as in 3.2: this will give a list of elements with units and scale factor specified and also lines like "replication factor" (for a delayed replication count) and "n-bit Q/C field".

If the input is a real array, then the scale column can usually be ignored. What is required is values in the units specified. The scale can be taken as a warning about what rounding will be done in the course of encoding - but then presumably the precision of the data is reflected by the description chosen by the user at an earlier stage (whether to code temperatures in whole degrees, or tenths, or hundredths - with a change of scale if necessary). The user only needs to ensure at this stage that temperatures are in Kelvin rather than Celsius. (Obviously, if the input is an integer array, then a temperature in tenths is required if the scale factor is 1.)

The reference value in Table B is likewise not the user's concern. For temperature it was possible to choose units (degrees Kelvin) which always give positive values, so no nonzero reference value was needed; for latitude that is not possible, and so the encoding process must subtract a large enough negative number to give always a positive number to encode. But this requires no action by the user.

An example may help. A temperature is normally in degrees Kelvin with a scale factor of 1, i.e. in tenths. So real input requires a value like, say, 287.6; this number will be multiplied during encoding by 10 to give 2876, the value to go into the bit string (unless, of course, there is compression).
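The same arithmetic in a short Python sketch, which also shows the reference value being subtracted. The function name is hypothetical, and the latitude figures (scale 2, reference value -9000) are used only as an illustrative case of a negative reference value.

```python
# Sketch only: from a value in Table B units to the number put in the bit string
# (no compression). Scale is a power of ten; the reference value is subtracted
# after scaling.

def to_bit_value(value, scale, refval):
    """Scale, round to an integer, then subtract the reference value."""
    return round(value * 10**scale) - refval


print(to_bit_value(287.6, 1, 0))      # 2876  (temperature in tenths of a degree)
print(to_bit_value(45.0, 2, -9000))   # 13500 (illustrative negative reference value)
```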

Beware that if the scale is changed and the reference value is not zero, then it may be necessary for the user to change the reference value to go with the new scale. (But a change is not essential if the scale change leads to less precision; and the expected range of values may be such that for greater precision no change is needed - the reference value only needs to be a large enough negative number.)

Beware also of scale changes for precipitation, where negative values are really code figures and so the reference value should stay as -1 or -2 regardless of changes. So a trace is always -1 or -2 regardless of scale. The encode and decode both assume that a negative value of any class 13 element with a reference value of -1 or -2 is a trace and therefore not scaled.
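A Python sketch of the trace convention follows. It is only an illustration of the rule as stated above: the function name is hypothetical, and the check that the element is in class 13 is assumed to have been made by the caller.

```python
# Sketch only: precipitation values where a negative value is a code figure
# (a trace) and must not be scaled. Assumes the caller has already checked
# that the element is in class 13.

def precipitation_bit_value(value, scale, refval):
    """Scale ordinary amounts, but leave a negative trace value unscaled."""
    if value < 0 and refval in (-1, -2):
        return round(value) - refval       # trace: code figure, no scaling
    return round(value * 10**scale) - refval


print(precipitation_bit_value(2.5, 1, -1))   # 26
print(precipitation_bit_value(-1, 1, -1))    # 0
print(precipitation_bit_value(-1, 2, -1))    # 0, unchanged by the scale change
```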

For character values we make the corresponding number in the value array a pointer to a character string (see 5.3 for details of the call). There is no need for a length, which is given by Table B with any adjustment. "Inserted characters" (operation 5, which gives the length) simply follow on in the input character string with no pointer in the value array.

3.4 Run-length encoding

Class 31 in Table B defines two kinds of counts for use in repetition operations: one repeats descriptors only, the other repeats data too.

The first is for straightforward delayed replication, which is explained clearly enough in the documentation. The second is for "run-length encoding" of images: if the range of pixel values is small, so that, when an image is scanned, many successive values will be the same, it is convenient to give the number of identical values rather than encoding the value that many times.

A descriptor pattern which makes this possible without requiring a different sequence of descriptors for each image is as follows. Any row can be broken up into a set of "parcels", each consisting of a string of different values followed by a number of strings of identical values. In this way an image can be described by a general sequence of 17 descriptors (see below), to be expanded using the counts in the data.

The basic BUFR software can encode an image in this way if passed the counts and told to use this descriptor pattern. But this is not the only possible approach to image encoding, so the sequence of descriptors is not embedded in the basic programs, and the above outline can be implemented in various ways: for instance, greater compression could be achieved (at the expense of more elaborate programming) by treating values repeated only 2 or 3 times as if they were different (the values themselves take up less space than the extra counts required).

Our method is to provide a preliminary call, to a program RUNLEN, which takes a 2-dimensional array representing an image and returns a sequence of values with counts inserted, ready to be encoded with the descriptors which are likewise returned by the program (with the element concerned, e.g. pixel value, and increments inserted). This is only one way of run-length encoding an image: the user can, of course, replace the call to RUNLEN by any program which produces valid sequences of values and descriptors to be passed to the encoding program.

1   005001          initial latitude (minus increment)
2   005011          latitude increment from row to row
3   113000          replicate the rows of the image
4   031002          number of rows

5    006001         initial longitude (minus increment)
6    110000         replicate "parcels" of different and same in row
7    031002         number of parcels in row

8     006011        longitude increment along row
9     101000        repeat a string of different values
10    031002        number of different values
11    030001        descriptor for pixel element itself
12    104000        replicate runs of identical values
13    031002        number of runs

14     006011       longitude increment along row
15     101000       replicate a string of identical values
16     031012       number of identical values
17     030001       descriptor for pixel element itself
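The division of a row into parcels, as used by the descriptor pattern listed above, can be sketched in Python. This is not the RUNLEN program: the function name and the `min_run` parameter are hypothetical. Setting `min_run` to 3 or 4 gives the variant mentioned earlier, in which values repeated only a few times are treated as if they were different.

```python
# Sketch only: split one image row into "parcels", each a string of different
# values followed by one or more runs of identical values.
from itertools import groupby


def split_row(row, min_run=2):
    """Return a list of (different_values, [(count, value), ...]) parcels."""
    runs = [(len(list(group)), value) for value, group in groupby(row)]
    parcels = []
    different, identical = [], []
    for count, value in runs:
        if count < min_run:
            if identical:
                # a short run after identical runs starts a new parcel
                parcels.append((different, identical))
                different, identical = [], []
            different.extend([value] * count)
        else:
            identical.append((count, value))
    if different or identical:
        parcels.append((different, identical))
    return parcels


row = [5, 7, 7, 7, 2, 2, 9, 4, 4, 4, 4]
print(split_row(row))
# [([5], [(3, 7), (2, 2)]), ([9], [(4, 4)])]
print(split_row(row, min_run=3))
# [([5], [(3, 7)]), ([2, 2, 9], [(4, 4)])]
```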

4 QUALITY OPERATIONS

The quality operations finally accepted in 1994 were the first major extension of BUFR and called for extensive reprogramming. The definitions of the operations are not clearly expressed and some points remain ambiguous, so the ideas involved, the assumptions made and the programming involved will be discussed here in some detail.

Because this proposal was under consideration for so long we had to implement a comparable scheme to meet UK Met Office needs long before acceptance. Its code tables are the same as those accepted and its operators can be distinguished from the new ones, so provision for it has been left in the software as we have no plans for immediate conversion to the new operations in our own data base. This chapter therefore ends with a brief description of our BUFR extension.

4.1 Bit maps

All the operations to add flags or values depend on the relation between operators and bit maps, so we start with a definition of a bit map.

A bit map is a set of values of the one-bit flag element 031031 (0 - data present, 1 - data not present). An N-bit map defines a subset of the N elements (elements rather than descriptors!) preceding an operator of the form 2XX000, where XX=22, 23, 24, 25 or 32. Elements here means effectively values in the data section, i.e. any delayed replication counts are included.

If M bits in a bit map are zero, then values of the corresponding M elements will follow in the data section as the result of any operation which uses this bit map. These values will be corrections, original values, differences, statistics etc as indicated by XX (together with 008023 or 008024 if XX is 24 or 25) or Class 33 elements in the case of 222000. But the values may not follow immediately and may not be consecutive; their positions in the data that follows will be shown by M place-holders of the form 2XX255 or M Class 33 descriptors. The I-th place-holder corresponds to a value of the I-th of the M elements with zeros in the bit map, encoded with its scale, data width & reference value as modified by any operations in force for the original value.
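How a bit map picks out elements can be shown with a few lines of Python. The function name and the element labels are illustrative; the list passed in stands for the values in the data section before the operator, in order.

```python
# Sketch only: the elements selected by an N-bit map, i.e. those of the N
# elements before the operator whose bit is 0 (0 - data present).

def elements_selected(bitmap, preceding):
    """Return the M elements with zero bits, in the order they were encoded."""
    n = len(bitmap)
    if n > len(preceding):
        raise ValueError("bit map longer than the set of preceding elements")
    window = preceding[len(preceding) - n:]   # the N elements referred back to
    return [element for bit, element in zip(bitmap, window) if bit == 0]


before_operator = ["block", "station", "temperature", "dew point", "wind speed"]
print(elements_selected([1, 0, 0, 1], before_operator))
# ['temperature', 'dew point']
```

The I-th place-holder (or Class 33 descriptor) that follows then corresponds to the I-th entry of the returned list.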

4.2 Bit maps and operators

That much is clear. But we need to relate bit maps to operators. Each quality operator needs a bit map, in the same way as a delayed replication operator is completed by a following count; but the bit map is much less closely tied to a particular operator. And the elements a bit map refers back to may be those before a previous operator.

The set of operators finally accepted has redundancies resulting from the different versions the proposal went through. Of the four operators added later, 236000, 237000, 237255 & 235000, only 235000 is essential as the proposal now stands, and its definition is too restrictive.

236000 defines a bit map for use later, but a bit map can be recognised without it. 237000 reuses a bit map, but only one bit map can be currently defined, so again the descriptor is unnecessary. 237255 cancels a bit map, but a new bit map, taken to supersede the old one, would have the same effect. Only 235000 is essential: it unsets the end of the set of values referred back to by a bit map, leaving the next 2XX000 (where XX is 22 to 25 or 32) to reset it. Without this all quality operations would refer back to the same point.

4.3 Assumptions made where specification is ambiguous

We want our decode to be successful with messages from as many different encoders as possible, so we adopt the least restrictive interpretation: we assume that any replication of the single element 031031 is a bit map and overrides any previously defined bit map. By replication we mean that a replication operator is used rather than the descriptor 031031 just being repeated so many times; the replication can be delayed or not. The point from which to count back is defined by the first 2XX000 operator with XX = 22 to 25 or 32 or by the first such operator after 235000. So different bit maps can be used to refer back to (different subsets of) the same set of elements.
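The test for a bit map under this interpretation can be sketched in Python. Descriptors are written as six-character strings FXXYYY purely for readability (not the representation used in our software), the function name is hypothetical, and the set of delayed replication counts is limited to 031000, 031001 and 031002 as an assumption.

```python
# Sketch only: recognise a replication of the single element 031031.

DELAYED_COUNTS = {"031000", "031001", "031002"}  # assumed set of count descriptors


def bitmap_replication_at(descriptors, i):
    """True if a replication of the single descriptor 031031 starts at index i."""
    operator = descriptors[i]
    if not operator.startswith("101"):        # F=1, X=01: one descriptor replicated
        return False
    if operator[3:] != "000":                 # count given by YYY
        return i + 1 < len(descriptors) and descriptors[i + 1] == "031031"
    # delayed replication: a count descriptor precedes the replicated element
    return (i + 2 < len(descriptors)
            and descriptors[i + 1] in DELAYED_COUNTS
            and descriptors[i + 2] == "031031")


print(bitmap_replication_at(["222000", "101005", "031031"], 1))            # True
print(bitmap_replication_at(["222000", "101000", "031002", "031031"], 1))  # True
print(bitmap_replication_at(["101003", "012001"], 0))                      # False
```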

Our decode also allows the same bit map to be used for different sets of elements. This possibility is, strictly speaking, ruled out by the operations as currently defined, but taking the least restrictive approach we see no reason why 235000 should cancel the bit map at the same time as changing the set of elements referred back to. If a new bit map follows, it will override the previous one; if not, the previous bit map can be left in force.

The only alternative is to stop the decode because a rule has been broken, whereas it may well be possible to continue successfully. But remember that, while this may be a useful feature, messages should still be encoded to follow the rules as closely as possible, or more restrictive decodes may fail!

4.4 Outline of program changes

These new operations call for a different programming approach. Without them there was no need to preserve details of how a value was encoded: field width, scale etc could be used and then discarded. But now we must be able to refer back to these details, which may be different from those in Table B. We must therefore keep a log of values encoded or decoded, keeping field width, scale & reference value, as well as the subscript of the descriptor in the expanded descriptor array (N.B. there is not always a one-to-one correspondence between descriptors and values!). This sounds like a big overhead, but in fact it makes no noticeable difference to the times taken (which implies that BUFR encode/decode is a very time-consuming process already!).

Given this log, we need action to carry out quality operations at the following points:

4.5 Interface between decode and calling programs

The above is enough if the aim of a BUFR decode is to print out the values in a message. But the interface is not so clear-cut. There is information in the sequence of values that may be better made explicit. For instance, our decode currently precedes each associated field with a meaning (in case there is more than one such field) rather than leave the meaning set just once with the operator, but does nothing to show which coordinates apply to a particular value.

In the case of quality operations, if the message contains several temperatures and a correction to one of them, the decode as described above would print out a temperature but not make it clear which original value was being corrected. Rather than leave higher-level programs with the same manipulation of bit maps to repeat, we need pointers to link original value and correction in the output descriptor array. This array already needs to include scale change and (modified) replication operators as well as element descriptors, because (as explained in 2.3) information which may be needed would otherwise be lost.

As pointers we use the place-holders (because XX gives information about the value added by the quality operation) with numbers set in the top bits. Each place-holder was replaced above by a descriptor; to set these pointers we keep a list of descriptors to be inserted in the sequence before completing the decode. The n-th insertion in this list puts a place-holder with n in the top bits after the original value and an identical place-holder with n set after the correction or whatever value is added. More than one such pointer can follow the original value. We can then get from original value to correction or vice versa by searching for a uniquely identified descriptor.

4.6 An extension used for a similar purpose at the UK Met Office

So far (July 95) we have made no use of the above operations at the UK Met Office. But since 1990 we have used a comparable extension to keep model values and flags.

We need a set of comparable values or differences (observed minus analysed, observed minus forecast etc) to be attached to reported values. As well as the differences themselves we need descriptors to say whether the values are analysed, forecast, statistics, previous or neighbouring values, and also which model produced the forecast, the times of fields and so on.

We introduce an operator 223YYY which puts the Y descriptors which follow before each non-coordinate element to which the operation applies. The Y descriptors (or their expansion) may include 008023 and 008024, which will be taken as markers for added values or differences (respectively) of the element to which the sequence is attached; the descriptor for the element itself will be inserted as the attached sequence is expanded. The last descriptor of the sequence must be 008023 with a value of zero, to indicate that observed data (the original value) follows.

Added values of the element are encoded like the original value, with any changes of data width, scale and reference value in force; differences are encoded (at present) with a data width of N and a reference value of -2^(N-1), where N is the data width for the original descriptor. It would be better to use a width of N+1 and reference value of -2^N (as in the new operations, which had the benefit of our experience!), giving a range twice the original and centred on zero, which can encode all possible differences; at present any large relative humidity difference, for instance, may have to be set to missing.
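The two choices of width and reference value can be compared in a short Python sketch. The function name is hypothetical, and the 7-bit element (values 0 to 126, all ones meaning missing) is only an illustrative case standing in for something like relative humidity.

```python
# Sketch only: range of differences that can be encoded for a given data width
# and reference value, keeping the all-ones pattern for missing data.

def encodable_range(width, refval):
    """Smallest and largest value that fit in `width` bits with this reference value."""
    return refval, refval + 2**width - 2


n = 7  # illustrative data width of the original element (values 0 to 126)
print(encodable_range(n, -2**(n - 1)))   # (-64, 62): present scheme
print(encodable_range(n + 1, -2**n))     # (-128, 126): covers every difference
```

With the present scheme a difference of, say, 80 between two 7-bit values cannot be encoded and has to be set to missing; with the wider field every difference from -126 to 126 fits.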

The only changes of data width etc which apply to other elements in the attached sequence are those defined within it, which lapse when the original value is reached; any coordinate changes made within the sequence are likewise assumed to lapse at the end.

So the action taken for a descriptor with F=2, X=23 and Y>0 is as follows:

The start and end (marked by a descriptor 008023 indicating an observed value) of the descriptors defining the sequence of values are kept, the sequence being kept in its place until the operation is cancelled and inserted after any element descriptor, the element descriptor itself being put in the sequence after each 008023/024. An operator 223000 is put in the output descriptor array to mark the start of each occurrence of such a sequence.

5 TO SET UP A BUFR SYSTEM

5.1 Table access

The tables can be set up in machine-readable form as follows:

Table    Made by     From input     Size    Rough number of
         program     (file name)            entries (Oct 93)

B        NEWTABLB    EDITABLB       44K     450 elements
D        NEWTABLD    EDITABLD       16K     150 sequences
Codes    NEWCODES    EDITCODE       16K     120 code tables

Note: the 120 code tables contain about 1000 code figures.

The input data (readable) can of course be edited to add new entries. These should be inserted so as to leave the descriptor numbers in sequence rather than being put at the end, since no sort is done by NEWTABLB etc. Remember that new sequences (in this version of Table D) must have no more than 16 descriptors (i.e. longer sequences should be broken up).

The three tables can be accessed (with the same file names as above) by the following programs:

TABLEB (X,Y,SCALE,REFVAL,WIDTH,FORMAT,NAME,UNITS)
returns the fields of the Table B entry for 0XXYYY,

where X & Y (integers) are input and the rest (3 integers and 3 character strings) are returned.

(WIDTH=0 if there is no entry 0XXYYY in Table B)

TABLED (X,Y,SEQ,NSEQ)

returns the sequence 3XXYYY in Table D,

where X & Y are input and NSEQ is the number of descriptors returned in SEQ

(all arguments are integer, NSEQ=0 if no sequence 3XXYYY in Table D)

CODE (DESCR,VALUE,WORDS)
returns in WORDS a description (not more than 12 characters) corresponding to the code figure VALUE of the descriptor DESCR (both integers)

(WORDS=' ' if no such code figure or value)

5.2 Programs to handle messages

(For a system with EBCDIC characters there are calls to EB2ASC and ASC2EB to translate from EBCDIC to ASCII and the other way round; these are replaced by dummies in a system with ASCII characters. ENCODI and DECODI, which take values in integer rather than real arrays but handle only a subset of the possible operations, and LOCALB/LOCALD can likewise be dummies if there is no intention to use them.)

VALUE (STRING,IBEFOR,WIDTH)

      gets a value in WIDTH bits after the first IBEFOR bits of
      STRING, where STRING is section 4 of a BUFR message (starting
      with the length).

VALOUT (STRING,IBEFOR,WIDTH,value)

      puts a value in WIDTH bits after the first IBEFOR bits of
      STRING.
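For readers without access to these routines, the bit handling can be sketched in Python. The functions below are analogues written for illustration, with hypothetical names; they are not the VALUE and VALOUT programs, and they assume bits are numbered from the most significant bit of the first byte.

```python
# Sketch only: read and write a WIDTH-bit value after the first IBEFOR bits
# of a byte string.

def get_bits(data, ibefor, width):
    """Return the unsigned integer held in `width` bits after the first `ibefor` bits."""
    value = 0
    for bit in range(ibefor, ibefor + width):
        byte = data[bit // 8]
        value = (value << 1) | ((byte >> (7 - bit % 8)) & 1)
    return value


def put_bits(data, ibefor, width, value):
    """Store `value` in `width` bits of the bytearray `data` after the first `ibefor` bits."""
    for i in range(width):
        bit = ibefor + i
        mask = 1 << (7 - bit % 8)
        if (value >> (width - 1 - i)) & 1:
            data[bit // 8] |= mask
        else:
            data[bit // 8] &= ~mask & 0xFF


buffer = bytearray(4)
put_bits(buffer, 5, 12, 2876)      # a 12-bit temperature after 5 bits
print(get_bits(buffer, 5, 12))     # 2876
```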

5.3 Calls for encoding and decoding

Once the programs have been compiled and the tables made, the following calls encode or decode messages:
ENCODE A VERSION 2 BUFR MESSAGE
===============================

CALL ENBUFV2(DESCR,VALUES,NDESCR,NELEM,NOBS,NAMES,DATIME,MESAGE,CMP,L,
             EDITION,MASTERTABLE,ORIGCENTRE,DATATYPE,DATASUBTYPE,
             VERMASTAB,VERLOCTAB,EXTRASECT1,CHARSECT1,EXTRASECT2,
             CHARSECT2,SECT3TYPE)



where

DESCR       Integer i/p then o/p : Is an integer list of BUFR descriptors,
            in an array big enough for any expansion needed. The array is
            changed by a BUFR encode, so it needs to be reset if another
            encode is to be attempted with the original descriptors.

VALUES      Real i/p : Is a NOBS*NELEM real array of values to be
            encoded (in the units given by Table B; set missing values to
            -9999999.0)

NDESCR      Integer i/p then o/p : Is the number of descriptors (if this
            is zero, the descriptor sequence in MESAGE will be used; if
            the sequence needs expansion, NDESCR will be found changed on
            return).

NELEM       Integer i/p then o/p : Is the number of values implied by
            the descriptor sequence (not always the final value of NDESCR,
            because the output descriptors include some operators - see 2.3)

NOBS        Integer i/p : Is the number of sets of values to be encoded
            together



NAMES       Character i/p : Is a character string containing any character
            values. For each of these, except those "inserted" by 205YYY,
            the VALUES array contains a subscript pointing to the
            start of a field in this string (the length coming from
            Table B).

DATIME      Integer i/p : Is a 5-integer date/time (year, month, day, hour,
            minute)

MESAGE      Character o/p : Is a character string for the output BUFR message
            (i.e. it will consist of binary data)

CMP         Logical i/p : Is TRUE if compression is required, FALSE if not

L           Integer o/p : Is the length of the BUFR message in octets



EDITION     Integer i/p : The BUFR edition number (section 1).
            Code -99 for the default (=2)

MASTERTABLE Integer i/p : The BUFR master table (section 1). Code -99
            for the default (=0)

ORIGCENTRE  Integer i/p : Originating centre (section 1). Code -99
            for the default (=74)

DATATYPE    Integer i/p : Data category type (section 1). Code -99
            for the default (=255)

DATASUBTYPE Integer i/p : Data category subtype (section 1). Code -99
            for the default (=0)

VERMASTAB   Integer i/p : Version number of master tables (section 1).
            Code -99 for the default (=2)

VERLOCTAB   Integer i/p : Version number of local tables (section 1).
            Code -99 for the default (=1)

            

EXTRASECT1  Logical i/p : Code TRUE if there is extra data to be added
            to the end of section 1. If so, the data in CHARSECT1 will
            be added.

CHARSECT1   Character i/p : Extra data to add to the end of section 1.

EXTRASECT2  Logical i/p : Code TRUE if there is data to be put in
            section 2. If so, the data in CHARSECT2 will be added.

CHARSECT2   Character i/p : Extra data to put in section 2.

SECT3TYPE   Integer i/p : section 3, byte 7 (type of data). Code 1 for
            observed, 0 for other. Code -99 for default (=1)
(The length of MESAGE cannot be much more than the total length of the three inputs DESCR, VALUES and NAMES. The dimension of DESCR may have to be greater than NELEM, because some manipulations expand the sequence before deleting from it.)
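
The relationship between VALUES and NAMES on input can be sketched as follows. This Python fragment is only an illustration: `pack_names` is a hypothetical helper, the field length of 8 characters is an assumed value that would really come from Table B, blank padding of short fields is an assumption, and the list-of-lists layout (one inner list per observation) says nothing about the storage order of the Fortran array.

```python
MISSING = -9999999.0   # missing-value indicator given in the argument list above


def pack_names(fields):
    """Concatenate fixed-length character fields.

    Returns the NAMES string and, for each field, the 1-based subscript of
    its first character (1-based because the caller is Fortran).
    """
    names = ""
    subscripts = []
    for text, length in fields:
        subscripts.append(len(names) + 1)
        names += text[:length].ljust(length)   # truncate or blank-pad to the field length
    return names, subscripts


# Two observations, each with an identifier (assumed 8 characters) and a
# temperature in kelvin; None marks a temperature that was not reported.
identifiers = [("03772", 8), ("EGLL", 8)]
temperatures = [281.4, None]

names, subscripts = pack_names(identifiers)
values = [
    [float(subscript), MISSING if temp is None else temp]
    for subscript, temp in zip(subscripts, temperatures)
]
```

Each character element therefore occupies one slot in VALUES like any numeric element, and the text itself travels separately in NAMES.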
DECODE ANY BUFR MESSAGE
=======================

CALL DEBUFR(DESCR,VALUES,NAMES,NDESCR,NOBS,MESAGE,DSPLAY)



where

DESCR       will be returned as an integer list of descriptors in 16-bit
            form (see 1.5),

VALUES      will be returned as a NOBS*NDESCR real array of values in
            the units given by Table B,

NAMES       is a character string for any character values returned
            (for each of which the VALUES array will contain
            length*(2^16) plus a subscript pointing to the start of a
            field in this string, the corresponding descriptor being
            flagged by adding 2^17),

NDESCR      must be input as the length of DESCR and will be returned
            as the output descriptor count. The input length must be at
            least twice the number of descriptors actually returned, as
            some workspace is needed by the DECODE routine,

NOBS        must be input as the length of VALUES and will be
            returned as the number of sets of values (reports,
            profiles),

MESAGE      is the input BUFR message,

DSPLAY      is set to TRUE for a display of element names and values.
(Unfortunately there is no way of telling how big DESCR, VALUES and NAMES must be without first decoding the message, so the dimensions are passed in NDESCR and NOBS to stop the decode writing beyond the ends of the arrays.)
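
An application reading the decode output has to undo the packing described for DESCR, VALUES and NAMES. The Python sketch below shows one way to do so. The function names are invented for illustration, the split of the 16-bit descriptor into F (2 bits), X (6 bits) and Y (8 bits) is the usual BUFR layout and is assumed to be the one meant here, and the subscript is taken to count from 1.

```python
CHAR_FLAG = 2 ** 17     # added to a descriptor whose value is a character field
LENGTH_UNIT = 2 ** 16   # a character entry in VALUES is length*(2^16) + subscript


def is_character(descr: int) -> bool:
    """True if the descriptor carries the character flag."""
    return bool(descr & CHAR_FLAG)


def split_descriptor(descr: int):
    """Return (F, X, Y) from the low 16 bits, assuming a 2 + 6 + 8 bit layout."""
    d = descr & 0xFFFF
    return d >> 14, (d >> 8) & 0x3F, d & 0xFF


def character_value(value: float, names: str) -> str:
    """Extract the field that a character entry of VALUES points to."""
    length, start = divmod(int(value), LENGTH_UNIT)
    return names[start - 1:start - 1 + length]   # subscript assumed 1-based


# Example: descriptor 0 01 015 flagged as character, with an 8-character
# field starting at subscript 9 of NAMES.
descr = ((0 << 14) | (1 << 8) | 15) + CHAR_FLAG
names = "03772   EGLL    "
value = float(8 * LENGTH_UNIT + 9)

fxy = split_descriptor(descr)
text = character_value(value, names) if is_character(descr) else None
```

Because VALUES is a real array, the packed length and subscript must be small enough to be held exactly in a real number; the sketch simply converts the value back to an integer before splitting it.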