Guideline for data management in the Asian Monsoon Year

(draft version 2-beta by Masuda on 1 Nov. 2007)

1. [Principle]

Each participant in the Asian Monsoon Year (AMY) will eventually make its data products "open". Here, "open" means essentially the same as "the free and unrestricted international exchange of data and products" as used in WMO Resolution 40 (Cg-XII) and WMO Resolution 25 (Cg-XIII). License conditions with respect to commercial use and to redistribution need not be uniform across data products (see Section 2).

Observational research projects will make data from their regular observations either "open" or available as "AMY internal data sets" within 1 year after the observation, and will make data from all their observations "open" within 2 years after the observation (see Section 3).

While consolidated management of some AMY data is desirable (see Section 6), a large part of AMY data management will be conducted in a distributed manner. Each project participating in AMY should designate its data depository (Section 4). The Data Management Working Group for AMY (DMWG) will coordinate these efforts, including compiling a common data catalog (Section 5).

2. [Condition on use and distribution]

2.1 [AMY data products]

AMY data products should be "open"; that is, anyone may access them, not only AMY participants. (However, those who have violated the conditions of use and distribution may be excluded from access as a sanction.)

Users should acknowledge the originating project when they publish results that use the data products. The documentation of each data product should contain an identifier that can be used for citation.

The license conditions may differ from one data product to another in the following aspects:

(a) use for commercial purposes [Note 1]
(b) redistribution [Note 2] [Note 3]

[Note 1] We use the term "commercial purposes" as defined in the WMO Resolutions, as far as applicable. Annex 4 to WMO Resolution 40 (Cg-XII) defines "For commercial purposes" as "For recompense beyond the incremental cost of reproduction and delivery." This definition refers to commercial distributors of meteorological information, not to end users. Conditions for end users may be specified in the documentation; otherwise, we rely on common sense, guided by the WMO Resolutions.
[Note 2] If redistribution is permitted, potential users may receive the data from another user. If it is not, they must contact the depository.
[Note 3] If a substantial part of a data product is incorporated into another product in such a way that the original information can effectively be reconstructed, and the resulting product is distributed, this is considered a form of redistribution.

The depository should make these conditions readily visible to users, and users should respect them.

In AMY, the condition applied by default (i.e., unless otherwise specified) is: "Those who want to do (a) or (b) must obtain explicit agreement from the representative of the originating project. If the originating project has expired, the primary depository will decide on its behalf." (This policy is similar, though not identical, to those of CEOP Phase 1 and GAME.)

2.2 [AMY internal data sets]

Some data sets may be placed in a state in which only AMY participants can access them. This restriction is only temporary.

If a depository holds data in this category, it should verify the identity of each user to make sure that the user is an AMY participant.

2.3 [Special data sets]

Some data sets may be impossible to make "open" because of issues such as privacy, national security, or intellectual property rights, and yet may be shared among a limited group of members of a collaborative research effort. Such data are not considered AMY data products, but information about them may be included in the common catalog if both the originating project and the DMWG consider it appropriate.

3. [Time frame]

This section applies only to observational research projects, but other participants are expected to follow the same spirit.

Data from regular observations should be brought either to the "open" state or to the state of "AMY internal data set" within 1 year after the observation. By "regular", we mean that the stations are monitored regularly and that the workflow of observations and data management has already been established when AMY activities start.

Every project participating in AMY, including those with regular observations, should make its data products "open" within 2 years after the observation.
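As an illustration of these time limits, a depository might compute the latest release dates for each observation. The sketch below is a hypothetical helper, not part of this guideline: it assumes that "within N years after the observation" means up to the same calendar date N years later, and it maps an observation on 29 February to 28 February in a non-leap target year. The names add_years and release_deadlines are illustrative only.

```python
from datetime import date


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years (assumed convention, not defined by AMY)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # Only 29 February can fail here, when the target year is not a leap year;
        # we assume the deadline then falls on 28 February.
        return d.replace(year=d.year + years, day=28)


def release_deadlines(observed: date, regular: bool) -> dict[str, date]:
    """Latest dates by which data observed on `observed` must reach each state.

    Hypothetical helper: "open" applies to every project (2 years);
    "open_or_internal" applies only to regular observations (1 year).
    """
    deadlines = {"open": add_years(observed, 2)}
    if regular:
        # Regular observations must be "open" or an "AMY internal data set" within 1 year.
        deadlines["open_or_internal"] = add_years(observed, 1)
    return deadlines
```

For example, under these assumptions, data from a regular station observed on 15 June 2008 would have to be "open" or an "AMY internal data set" by 15 June 2009, and "open" by 15 June 2010.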

Projects are encouraged to make data available earlier than these deadlines.

Projects are also encouraged to make data available in real time, either through the operational telecommunication channels or via the Internet. Real-time data are helpful both to operational forecasting and to the operations of observational research projects. Copies of data that have been transmitted in real time should also be archived in the depository, preferably after more thorough quality checking.

4. [Distributed data management]

Each project will designate a primary depository, which is responsible for making its data products publicly available.

DMWG will help find a depository when the originating project cannot find one by itself.

By agreement with the originating project, the dataset may also be made publicly available from secondary depositories.

If the primary depository is going to cease operation, it should make sure that its role is taken over by another institution.

The depository should make the data products available either by placing them permanently on the Internet or by responding to users' requests. (Sometimes the depository may need to deliver data products off-line, but this is expected to be limited to cases in which the user has little Internet access.)

The depository may convert the data format or add quality checks, but it should also retain the original version it received.

5. [Common data catalog]

DMWG will organize the AMY Data Management Mailing List (Data-ML), which includes representatives of all depositories.

With contributions from Data-ML members, DMWG will compile a catalog of provisional data products and revise it continuously. Eventually the catalog will contain information on actual data products, with links to depositories. The catalog editors, appointed by DMWG, will frequently contact individual Data-ML members to keep the catalog up to date.

At a minimum, the catalog will contain the following information:

It is desirable that the catalog also contain more detailed information about the instrumentation and the surroundings of the stations.

6. [Prospects of consolidated data management]

It is desirable to keep various data in a single archive where users can use them in an integrated manner. This, however, requires a huge effort to unify the formats of both data and metadata (information about data). At present, there is no concrete plan for consolidated data management. Where there is a clear research focus, consolidating the data related to that theme will be fruitful, and AMY will encourage such initiatives. In particular,

To make data available to users from a consolidated data depository, agreements between that depository and the originating projects are required. DMWG will help reach such agreements when requested.

Even in distributed data management, standardization of data formats and metadata is desirable. This subject will be coordinated in Data-ML.

2007-Nov-01
MASUDA Kooiti (one of the co-chairs, AMY Data Management Working Group)
MAHASRI Data Management Team in Yokohama and Yokosuka
mahadmyy(at)jamstec.go.jp