Prototype grammar for CF-metadata "standard names" (XSugar version)

Page history last edited by robertm 15 years, 8 months ago

[This note was prepared as an internal working document within Plasmo (Plant Systems Biology Modelling), a project funded by the UK's Biotechnology and Biological Sciences Research Council to develop a web-based portal for plant growth models.   It is posted here in case it is of interest to other groups concerned with the development of naming schemes for environmental variables.]

 

Introduction

This page documents an exercise in formalising a set of rules for composing "standard names" for variables from a set of basic terms.

 

I was pointed to the CF ("Climate and Forecasting") metadata initiative by Bryan Lawrence, who is head of BADC (British Atmospheric Data Centre), following a posting on his blog.

 

The CF-metadata initiative aims to provide a metadata schema for NetCDF, a standardised data format for environmental data. Part of this initiative involves developing a set of standard terms (what they call "standard names") for environmental variables.

 

This is still largely based on the use of long atomic names, e.g. 'atmosphere_net_rate_of_absorption_of_longwave_energy'. Recent discussions on the CF-metadata mailing list suggest that they are now entertaining the idea of allowing a couple of basic elements of a name (e.g. a chemical) to be represented separately in NetCDF, but that is still at a very rudimentary level. Each individual atomic name has to go through an approval process before it can be added to the CF-metadata dictionary (the "CF standard name table").

 

However, what they have done is to come up with a reasonably sophisticated set of Guidelines for the Construction of CF Standard Names. It is important to appreciate that, from the CF-metadata community's point-of-view, the name is still an atomic string, consisting of words and underscore characters. The Guidelines merely give a set of rules which can be used to construct this atomic string.

 

This is a great starting point for our (Plasmo's) goal of coming up with a compound naming scheme for model variables. The Guidelines give sets of terms (e.g. for 'process' and 'medium'), as well as specifying rules for combining these to make standard names. While some of the actual terms will differ from those typically used to name variables in plant models, there are numerous cases where the same terms are used (e.g. 'area'). Moreover, the rules for combining them (in phrases such as 'ratio_of') will be common to both domains. It is also a great help that another group has already done much of the thinking about how to form composite names.

 

The XSugar stylesheet

XSugar is an ideal technology to use for formalising the CF-metadata guidelines, for the following reasons:

  • It uses something close to a standard grammar notation (BNF - Backus-Naur Form) for expressing the rules which are used to compose the CF-metadata standard names.
  • The "right-hand side" of each rule expresses the structure of the corresponding XML, so we can automatically generate XML for a composite name (just as MathML expresses an equation as a compound term which corresponds to the simple textual representation of the equation).
  • The bi-directionality of XSugar means that we can just as easily generate a simple atomic text string, equivalent to those in the CF-metadata table of standard names, from the XML representation.

 

I have developed an XSugar stylesheet which is very closely based on the CF-metadata Guidelines.   A brief overview of the stylesheet is given here. It would help if you have the CF-metadata Guidelines beside you as you read the rest of this note.

 

Transformations

This section relates closely to the Transformations table given in the Guidelines.

 

The heart of the stylesheet is the term 'standard_name'. This is defined recursively in terms of itself, rather like a rule in mathematics which says that "an expression can be a square-root function whose argument is an expression". So, for example, we have the rule:

 

standard_name : "change_over_time_in_" [standard_name X]
              = <change_over_time_in> standard_name X </>

The rule begins with a name ('standard_name'). The rest of the first line defines the syntax for one possible form of standard_name: the text "change_over_time_in_" followed by... a standard_name. This means that if

   height

is a standard_name, then so is

  change_over_time_in_height

The second line of this rule gives the corresponding XML format. In this case, we use an element <change_over_time_in>. So, if this rule applied to the above example, the resulting XML would look like:

 

<change_over_time_in> 
  (some XML notation for representing height) 
</change_over_time_in>
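
The recursion in this rule can be imitated outside XSugar. The following is a minimal Python sketch, not part of the stylesheet: the function names and the shortened BASE_NAMES set are assumptions made for illustration, and only the single "change_over_time_in_" transformation is handled.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened stand-in for the stylesheet's list of base terms.
BASE_NAMES = {"area", "density", "frequency"}
PREFIX = "change_over_time_in_"

def parse_standard_name(text):
    """Return an XML element for `text`, or raise ValueError."""
    if text.startswith(PREFIX):
        # Recursive case: the remainder must itself be a standard name.
        element = ET.Element("change_over_time_in")
        element.append(parse_standard_name(text[len(PREFIX):]))
        return element
    if text in BASE_NAMES:
        # Base case: a plain term ends the recursion.
        element = ET.Element("basename")
        element.text = text
        return element
    raise ValueError(f"not a standard name: {text!r}")

def to_xml(text):
    """Serialise the parsed name as an XML string."""
    return ET.tostring(parse_standard_name(text), encoding="unicode")

print(to_xml("change_over_time_in_area"))
# <change_over_time_in><basename>area</basename></change_over_time_in>
```

Because the function calls itself on the remainder of the string, a name such as change_over_time_in_change_over_time_in_area is accepted as well, which mirrors the recursive definition of standard_name.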

 

Qualification

This section relates closely to the handling of Qualifications in the Guidelines.

The Guidelines give this template for constructing standard_names from various terms:

 

[surface] [component] standard_name [at surface] [in medium] 
[due to process] [where type] [assuming condition]

 

where each bit enclosed in square brackets is optional.

 

This already looks pretty close to an XSugar rule, so the translation is straightforward. The corresponding rule is:

 

standard_name      : [surface S] [component C] [standard_name N] 
                     [at_surface AS] [in_medium M] [due_to_process P] 
                     [where_type T] [assuming_condition AC]
                   = [surface S] [component C] [standard_name N] 
                     [at_surface AS] [in_medium M] [due_to_process P] 
                     [where_type T] [assuming_condition AC]

In fact, there are only two differences. First, the rule has the XML half (after the = sign) as well as the plain-text half. Second, each term has a variable, which enables matching to be done between the plain-text and the XML parts of the rule.
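
To make the template concrete, here is a rough Python sketch of how the optional parts might be assembled into the plain-text form. It is not derived from the stylesheet: the class name QualifiedName is hypothetical, the joining keywords (at, in, due_to, where, assuming) are assumed from the wording of the Guidelines template, and the values in the usage lines are purely illustrative.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class QualifiedName:
    """One standard name plus its optional qualifications (hypothetical model)."""
    name: str
    surface: Optional[str] = None
    component: Optional[str] = None
    at_surface: Optional[str] = None
    medium: Optional[str] = None
    process: Optional[str] = None
    where_type: Optional[str] = None
    condition: Optional[str] = None

    def to_text(self) -> str:
        parts = []
        # Leading qualifiers are plain prefixes.
        if self.surface:
            parts.append(self.surface)
        if self.component:
            parts.append(self.component)
        parts.append(self.name)
        # Trailing qualifiers are introduced by a keyword (assumed spelling).
        trailing = (
            ("at", self.at_surface),
            ("in", self.medium),
            ("due_to", self.process),
            ("where", self.where_type),
            ("assuming", self.condition),
        )
        for keyword, value in trailing:
            if value:
                parts.append(f"{keyword}_{value}")
        return "_".join(parts)

print(QualifiedName("area", surface="cloud_top").to_text())
# cloud_top_area
print(QualifiedName("heat_flux", component="upward", medium="air",
                    process="convection").to_text())
# upward_heat_flux_in_air_due_to_convection
```

Any part left as None is simply skipped, which plays the same role as the square brackets in the template.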

 

Each part is defined by its own rules, all of which have the same format. We can take 'surface' as an example:

 

surface            : [Surface S] "_"
                   = <surface> [Surface S] </>

surface            : =

There are two rules for 'surface'. The second rule allows for the case where we do not have a surface term in the standard name: it is XSugar's way of capturing the fact that the term is optional. The first rule looks like the transformation rule explained above. The main difference is that the first word in the square brackets starts with a capital letter. This means that we use a regular expression rather than an XSugar rule to define it. In this (and all other cases in this stylesheet), the regular expression is simply a list of possible words, taken directly from the Guidelines. So, in the case of Surface, it is:

 

Surface = toa|tropopause|surface|adiabatic_condensation_level|cloud_top|
  convective_cloud_top|cloud_base|convective_cloud_base|freezing_level|
  ground_level|maximum_wind_speed_level|sea_floor|sea_ice_base|sea_level|
  top_of_atmosphere_boundary_layer|top_of_atmosphere_model|
  top_of_dry_convection

You can find exactly the same list in the Guidelines.
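
The pair of 'surface' rules (an optional term followed by an underscore) can be approximated with an ordinary regular expression. The sketch below uses Python's re module rather than XSugar; the SURFACES list is deliberately shortened, and the function name split_surface is an assumption made for illustration.

```python
import re

# Shortened, illustrative subset of the Surface terms listed above.
SURFACES = ["toa", "surface", "cloud_top", "convective_cloud_top", "sea_level"]

# Longest terms first, so that a term is never shadowed by a shorter
# term that happens to be a prefix of it.
_alternatives = "|".join(sorted(SURFACES, key=len, reverse=True))

# Optional "<surface>_" prefix, then the rest of the name.
SURFACE_RE = re.compile(rf"(?:(?P<surface>{_alternatives})_)?(?P<rest>.+)")

def split_surface(name):
    """Return (surface or None, remainder) for a plain-text name."""
    match = SURFACE_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"empty or malformed name: {name!r}")
    return match.group("surface"), match.group("rest")

print(split_surface("cloud_top_change_over_time_in_area"))
# ('cloud_top', 'change_over_time_in_area')
print(split_surface("area"))
# (None, 'area')
```

The optional group corresponds to the empty second rule: when no surface term is present, the whole string is passed on as the remainder.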

 

Generic names

What we have seen so far is how to express rules for composing standard_names from standard_names. At some stage this process has to bottom out. The Guidelines point to this by giving a table of what they call 'Generic names' (called BaseName in the current XSugar stylesheet). These are defined by another regular expression, which simply lists the names given in the 'Generic names' table in the Guidelines.

 

This regular expression is thus:

 

BaseName = amount|area|area_fraction|density|energy|energy_content|
  energy_density|frequency|frequency_of_occurrence|heat_flux|heat_transport|
  horizontal_streamfunction|horizontal_velocity_potential|mass|mass_flux|
  mass_fraction|mass_mixing_ratio|mass_transport|mole_fraction|mole_flux|
  momentum_flux|partial_pressure|period|power|pressure|probability|
  radiative_flux|specific_eddy_kinetic_energy|speed|stress|temperature|
  thickness|velocity|volume|volume_flux|volume_fraction|volume_transport|
  vorticity
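
One detail worth noting when a word list like this is used as a regular expression is that some terms are prefixes of others (e.g. 'area' and 'area_fraction'). The Python sketch below illustrates the point with a shortened copy of the list; it describes the behaviour of Python's re module only, and makes no claim about how XSugar itself resolves such cases. The function names are hypothetical.

```python
import re

# Shortened, illustrative copy of the start of the BaseName list.
BASE_NAME_RE = re.compile("amount|area|area_fraction|density|energy|energy_content")

def is_base_name(text):
    """True only if the whole string is one of the listed terms."""
    return BASE_NAME_RE.fullmatch(text) is not None

def leading_base_name(text):
    """Return the first alternative that matches at the start, if any."""
    match = BASE_NAME_RE.match(text)
    return match.group(0) if match else None

print(is_base_name("area_fraction"))       # True
print(is_base_name("area_of_leaf"))        # False
print(leading_base_name("area_fraction"))  # area
```

Matching against the whole string gives the intended answer, whereas a prefix match stops at the first alternative that fits, so the order of the terms matters whenever the expression is used to consume only part of a name.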

 

Example

The following is a text file created to test the stylesheet:

 

area
change_over_time_in_area
ratio_of_log10_frequency_to_density
cloud_top_change_over_time_in_area
 

These all correspond to the composition rules given in the Guidelines, but they are not necessarily accepted "standard names" - these have to go through an approval process.

 

And here is the XML created from these names.

 

<?xml version="1.0" encoding="windows-1252"?>
<standard_names>
  <standard_name>
    <basename>area</basename>
  </standard_name>
  <standard_name>
    <change_over_time_in>
      <basename>area</basename>
    </change_over_time_in>
  </standard_name>
  <standard_name>
    <ratio_of>
      <log10>
        <basename>frequency</basename>
      </log10>
      <basename>density</basename>
    </ratio_of>
  </standard_name>
  <standard_name>
    <surface>cloud_top</surface>
    <change_over_time_in>
      <basename>area</basename>
    </change_over_time_in>
  </standard_name>
</standard_names>
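
The reverse direction, from XML back to the atomic text string, can be sketched in the same spirit. The Python below is an illustration only and does not use XSugar: the function name to_text is an assumption, and it handles just the element types that appear in the example above (with the XML declaration omitted and two of the four entries reproduced).

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<standard_names>
  <standard_name>
    <ratio_of>
      <log10><basename>frequency</basename></log10>
      <basename>density</basename>
    </ratio_of>
  </standard_name>
  <standard_name>
    <surface>cloud_top</surface>
    <change_over_time_in><basename>area</basename></change_over_time_in>
  </standard_name>
</standard_names>
"""

UNARY = {"change_over_time_in", "log10"}  # transformations with one argument

def to_text(element):
    """Rebuild the underscore-separated name for one XML element."""
    children = list(element)
    if element.tag in ("basename", "surface"):
        return element.text.strip()
    if element.tag == "standard_name":
        # Qualifiers and the name proper are joined with underscores.
        return "_".join(to_text(child) for child in children)
    if element.tag == "ratio_of":
        numerator, denominator = children
        return f"ratio_of_{to_text(numerator)}_to_{to_text(denominator)}"
    if element.tag in UNARY:
        (child,) = children
        return f"{element.tag}_{to_text(child)}"
    raise ValueError(f"unexpected element: {element.tag}")

names = [to_text(entry) for entry in ET.fromstring(SAMPLE)]
print(names)
# ['ratio_of_log10_frequency_to_density', 'cloud_top_change_over_time_in_area']
```

In XSugar a single rule serves both directions, so this hand-written inverse is exactly the kind of duplicate code that the stylesheet approach avoids.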

 

The complete XSugar stylesheet

The full XSugar stylesheet, containing both the regular expressions and the grammar rules described above, is available here.

 

The XSugar stylesheet is also shown on this page.  This is an experiment, using a wiki page both as the source document for XSugar itself, and as a human-readable page.  
