
Prototype grammar for CF-metadata "standard names" (Prolog version)

Page history last edited by robertm 12 years, 4 months ago

This page documents a grammar for CF-metadata "standard names" expressed using Prolog's built-in grammar notation.  

 

The CF-metadata initiative aims to provide a metadata schema for NetCDF, a standardised data format for environmental data. Part of this initiative involves developing a set of standard terms (what they call "standard names") for environmental variables.

 

The scheme is still based on the use of long atomic names, e.g. 'atmosphere_net_rate_of_absorption_of_longwave_energy'. Recent discussions on the CF-metadata mailing list suggest that they are now entertaining the idea of allowing a couple of basic elements of a name (e.g. a chemical) to be represented separately in NetCDF, but that is still at a very rudimentary level. Each individual atomic name has to go through an approval process before it can be added to the CF-metadata dictionary (the "CF standard name table").

 

However, what they have done is to come up with a reasonably sophisticated set of Guidelines for the Construction of CF Standard Names. It is important to appreciate that, from the CF-metadata community's point-of-view, the name is still an atomic string, consisting of words and underscore characters. The Guidelines merely give a set of rules which can be used to construct this atomic string.

 

 

 

What is "Prolog's built-in grammar notation"?

A Prolog ("Programming in Logic") program consists of a collection of facts and rules which together define a body of information. This information can be queried by asking a question. The Prolog interpreter tries to answer the question by reasoning with the facts and rules.

 

Prolog is a logic-based programming language, widely used in artificial intelligence because of its ability to represent and reason with knowledge about some aspect of the world.

 

One use of Prolog is to express how compound structures can be built up from their parts - for example, how a sentence can be built up from noun-phrases and verb-phrases, which themselves can be built up from words.

 

Because this is such a common use, and corresponds closely to the concept of a grammar, Prolog has a special notation for expressing grammar rules.

 

For example, a simple Prolog grammar for English could be:

   sentence --> noun_phrase, verb_phrase.

   noun_phrase --> [the], noun.

   verb_phrase --> verb, noun_phrase.

   noun --> [dog].

   noun --> [bone].

   verb --> [eats].

 

We can check whether a particular sentence is valid according to the grammar by entering a query such as the following into the Prolog interpreter:

?-sentence([the,dog,eats,the,bone],[]).

to which the Prolog interpreter answers 'yes' (in this case), or 'no' (if the sentence is not valid).
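The same check can also be made through the standard phrase/2 predicate, which hides the extra list argument. The following is a minimal sketch: the grammar is the toy grammar above, and valid_sentence/1 is a hypothetical helper name introduced only for illustration.

```prolog
% Toy English grammar from the text, written as a DCG.
sentence    --> noun_phrase, verb_phrase.
noun_phrase --> [the], noun.
verb_phrase --> verb, noun_phrase.
noun --> [dog].
noun --> [bone].
verb --> [eats].

% valid_sentence/1 (hypothetical helper): succeeds if the whole word
% list can be parsed as a sentence. phrase(sentence, Words) behaves
% like calling sentence(Words, []) directly.
valid_sentence(Words) :-
    phrase(sentence, Words).

% Example queries:
% ?- valid_sentence([the,dog,eats,the,bone]).   % succeeds
% ?- valid_sentence([the,dog,eats]).            % fails: the verb has no object
```

Wrapping the call this way means a user only ever supplies the list of words, not the trailing empty list.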

 

Don't worry about the syntactic details (it is easy to hide these from the user).  

 

Expressing the CF-metadata "standard name" guidelines as grammar rules

 

Transformations

This section relates closely to the Transformations table given in the Guidelines.

 

The heart of the grammar is the term 'standard_name'. This is defined recursively, in terms of itself, much as mathematics might have a rule which says that "an expression can be a square-root function whose argument is an expression". So, for example, we have the rule:

 

standard_name --> [change,over,time,in], standard_name.

The rule begins with a name ('standard_name'). The rest of the first line defines the syntax for one possible form of standard_name: the text "change_over_time_in_" followed by... a standard_name. The only difference is in the surface syntax, with words which stand for themselves enclosed in [...], and words separated by a comma instead of an underscore. 

 

This means that if

   height

is a standard_name, then so is

  change_over_time_in_height
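The recursion can be tried out in isolation. The sketch below assumes, purely for illustration, that 'height' is available as a basename (it is the example used in the text, not an entry in the grammar listing), and uses the standard phrase/2 predicate to run the grammar.

```prolog
% A standard_name is either a basename or a transformation
% applied to another standard_name.
standard_name --> basename.
standard_name --> [change,over,time,in], standard_name.

% Assumption: 'height' is used only because the text uses it as an example.
basename --> [height].

% Example queries:
% ?- phrase(standard_name, [height]).                         % succeeds
% ?- phrase(standard_name, [change,over,time,in,height]).     % succeeds
% ?- phrase(standard_name,
%           [change,over,time,in,change,over,time,in,height]). % succeeds (nested)
% ?- phrase(standard_name, [change,over,time,in]).            % fails: nothing to transform
```

Because the recursive call comes after the literal words have been consumed, each step shortens the remaining word list and the parse terminates.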

 

Qualification

This section relates closely to the handling of Qualifications in the Guidelines.

The Guidelines give this template for constructing standard_names from various terms:

 

[surface] [component] standard_name [at surface] [in medium]
[due to process] [where type] [assuming condition]

 

Note that here (i.e. in the Guidelines) the square brackets denote an optional term, whereas in the Prolog syntax square brackets denote a word which stands for itself.

 

This already looks pretty close to a Prolog grammar rule, so it is easy to write one. The corresponding rule (named standard_name1 in the full listing below) is:

 

standard_name --> surface, component, standard_name, at_surface,
                  in_medium, due_to_process, where_type, assuming_condition.
 

Each part is defined by its own rules, all of which have the same format. We can take due_to_process as an example:

 

due_to_process --> [due,to], process.

due_to_process --> [].

 

The first rule states that due_to_process consists of the words 'due' and 'to', followed by the name of a process (see below). The second says that it can be empty, i.e. having a due_to_process term is optional.

 

The individual processes are listed as follows, corresponding exactly with the list of processes in the Guidelines.

 

process --> [advection]. 

process --> [convection].

process --> [deep,convection].

process --> [diabatic,processes].

process --> [diffusion].

process --> [dry,convection].

process --> [gravity,wave,drag].

process --> [gyre].

process --> [isostatic,adjustment].

process --> [large,scale,precipitation].

process --> [longwave,heating].

process --> [moist,convection].

process --> [overturning].

process --> [shallow,convection].

process --> [shortwave,heating].

process --> [thermodynamics].
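The optional-term pattern can be checked on its own. This is a small sketch using only a subset of the process rules above, run through the standard phrase/2 predicate.

```prolog
% Optional qualifier: either "due to <process>" or nothing at all.
due_to_process --> [due,to], process.
due_to_process --> [].

% Subset of the process list, enough for the queries below.
process --> [advection].
process --> [deep,convection].
process --> [gravity,wave,drag].

% Example queries:
% ?- phrase(due_to_process, [due,to,deep,convection]).   % succeeds
% ?- phrase(due_to_process, []).                         % succeeds: the term is optional
% ?- phrase(due_to_process, [due,to]).                   % fails: a process name is required
% ?- phrase(due_to_process, [due,to,friction]).          % fails: not a listed process
```

The other qualifiers (at_surface, in_medium, where_type, assuming_condition) follow the same two-rule shape.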

 

 

Generic names

What we have seen so far is how to express rules for composing standard_names from standard_names. At some stage this process has to bottom out. The Guidelines provide for this by giving a table of what they call 'Generic names' (called 'basename' in the Prolog grammar). These are defined as a series of grammar terminals, as illustrated by the list below, which corresponds to the first few in the Guidelines.

 

basename --> [amount].

basename --> [area].

basename --> [area,fraction].

basename --> [density].

basename --> [energy].

basename --> [energy,content].

basename --> [energy,density].

basename --> [frequency].

basename --> [frequency,of,occurrence].

basename --> [heat,flux].

 

 

Examples

The following are examples of standard names that parse successfully according to the grammar rules:

downward_northward_stress_at_sea_ice_base

upward_heat_flux_at_ground_level_in_soil
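To check names in their usual underscore-separated form, the atom first has to be split into a list of words. The sketch below assumes SWI-Prolog, whose atomic_list_concat/3 splits an atom when given a non-empty separator. It uses only a subset of the grammar, enough for the two examples, and valid_name/1 is a hypothetical helper name. The qualified form is named standard_name1, as in the full listing.

```prolog
% Subset of the grammar, enough for the two example names.
standard_name1 --> surface, component, standard_name, at_surface, in_medium.

standard_name --> basename.

basename --> [stress].
basename --> [heat,flux].

component --> direction, direction.
component --> direction.
component --> [].

direction --> [upward].
direction --> [downward].
direction --> [northward].

surface --> [ground,level].
surface --> [sea,ice,base].
surface --> [].

at_surface --> [at], surface.
at_surface --> [].

in_medium --> [in], medium.
in_medium --> [].

medium --> [soil].

% valid_name/1 (hypothetical helper): split the atom on underscores
% (SWI-Prolog split mode of atomic_list_concat/3), then parse the words.
% once/1 commits to the first successful parse.
valid_name(Name) :-
    atomic_list_concat(Words, '_', Name),
    once(phrase(standard_name1, Words)).

% Example queries:
% ?- valid_name(downward_northward_stress_at_sea_ice_base).   % succeeds
% ?- valid_name(upward_heat_flux_at_ground_level_in_soil).    % succeeds
% ?- valid_name(heat_flux_upward).                            % fails: direction in the wrong place
```

This is one way of hiding the list syntax from the user: the caller supplies the name exactly as it would appear in the CF standard name table.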

 

  

 

Listing of the current version of the Prolog grammar rules for CF-metadata "standard names"

 

This listing gives the full current version of the grammar for CF-metadata "standard names". It is based closely on the Guidelines. However, only a small fraction of the current "standard names" are valid according to this grammar, so as an experiment, I have added a few more rules to illustrate how the grammar can be extended through the recognition of patterns in the "standard names".

 

standard_name1 --> surface, component, standard_name, at_surface, in_medium,

due_to_process, where_type, assuming_condition.

 

standard_name --> basename.

standard_name --> [change,over,time,in], standard_name.

standard_name --> [convergence], [of], standard_name.

standard_name --> [horizontal], [convergence], [of], standard_name.

standard_name --> [correlation,of], standard_name, [and], standard_name.

standard_name --> [correlation,of], standard_name, [and], standard_name, [over], standard_name.

standard_name --> [covariance,of], standard_name, [and], standard_name.

standard_name --> [covariance,of], standard_name, [and], standard_name, [over], standard_name.

standard_name --> direction, [derivative,of], standard_name.

standard_name --> [derivative,of], standard_name, [wrt], standard_name.

standard_name --> [direction,of], standard_name.

standard_name --> [divergence,of], standard_name.

standard_name --> [horizontal,divergence,of], standard_name.

standard_name --> [histogram,of], standard_name.

standard_name --> [histogram,of], standard_name, [over], standard_name.

standard_name --> [integral,of], standard_name, [wrt], standard_name.

 

basename --> [amount].

basename --> [area].

basename --> [area,fraction].

basename --> [density].

basename --> [energy].

basename --> [energy,content].

basename --> [energy,density].

basename --> [frequency].

basename --> [frequency,of,occurrence].

basename --> [heat,flux].

basename --> [heat,transport].

basename --> [horizontal,streamfunction].

basename --> [horizontal,velocity,potential].

basename --> [mass].

basename --> [mass,flux].

basename --> [mass,fraction].

basename --> [mass,mixing,ratio].

basename --> [mass,transport].

basename --> [mole,fraction].

basename --> [mole,flux].

basename --> [momentum,flux].

basename --> [partial,pressure].

basename --> [period].

basename --> [power].

basename --> [pressure].

basename --> [probability].

basename --> [radiative,flux].

basename --> [specific,eddy,kinetic,energy].

basename --> [speed].

basename --> [stress].

basename --> [temperature].

basename --> [thickness].

basename --> [velocity].

basename --> [volume].

basename --> [volume,flux].

basename --> [volume,fraction].

basename --> [volume,transport].

basename --> [vorticity].

 

 

assuming_condition --> [assuming], condition.

assuming_condition --> [].

 

condition --> [clear,sky].

condition --> [deep,snow].

condition --> [no,snow].

 

 

component --> direction, direction.

component --> direction.

component --> [].

 

direction --> [upward].

direction --> [downward].

direction --> [northward].

direction --> [southward].

direction --> [eastward].

direction --> [westward].

direction --> [x].

direction --> [y].

direction --> [].

 

 

in_medium --> [in], medium.

in_medium --> [].

 

medium --> [air].

medium --> [atmosphere,boundary,layer].

medium --> [mesosphere].

medium --> [sea,ice].

medium --> [sea,water].

medium --> [soil].

medium --> [soil,water].

medium --> [stratosphere].

medium --> [thermosphere].

medium --> [troposphere].

 

 

at_surface --> [at], surface.

at_surface --> [].

surface --> [toa].

surface --> [tropopause].

surface --> [surface].

surface --> [adiabatic,condensation,level].

surface --> [cloud,top].

surface --> [convective,cloud,top].

surface --> [cloud,base].

surface --> [convective,cloud,base].

surface --> [freezing,level].

surface --> [ground,level].

surface --> [maximum,wind,speed,level].

surface --> [sea,floor].

surface --> [sea,ice,base].

surface --> [sea,level].

surface --> [top,of,atmosphere,boundary,layer].

surface --> [top,of,atmosphere,model].

surface --> [top,of,dry,convection].

surface --> [].

 

 

due_to_process --> [due,to], process.

due_to_process --> [].

 

process --> [advection].

process --> [convection].

process --> [deep,convection].

process --> [diabatic,processes].

process --> [diffusion].

process --> [dry,convection].

process --> [gravity,wave,drag].

process --> [gyre].

process --> [isostatic,adjustment].

process --> [large,scale,precipitation].

process --> [longwave,heating].

process --> [moist,convection].

process --> [overturning].

process --> [shallow,convection].

process --> [shortwave,heating].

process --> [thermodynamics].

 

 

where_type --> [where], wheretype.

where_type --> [].

 

wheretype --> [cloud].

wheretype --> [land].

wheretype --> [open,sea].

wheretype --> [sea].

wheretype --> [sea,ice].

wheretype --> [vegetation].

wheretype --> [].

 
