Construction of the Automated Data Processing and Delayed-Mode
Quality Control System for Profiling Floats
Yasushi TAKATSUKI*1 Yasuko ICHIKAWA*2 Taiyo KOBAYASHI*2
Keisuke MIZUNO*1 Kensuke TAKEUCHI*2
An automated data processing and quality management system for
profiling floats of the ARGO project has been constructed.
The system automatically processes profiling float data
about 20 hours after each float's descent and stores the data in the database system.
The float data is published on the World Wide Web.
For quality control of the float data, we prepared historical
databases such as WOA98 and HydroBase, and we collect ocean data
located near the float observations in space and time
from the GTS via the NEAR-GOOS Regional Real-Time Data Base.
The system supports quality control with tools such as
overlay plots of the float data against the historical and neighboring data.
Keywords: ARGO project, Automated data processing, quality control, database, WWW
*1 Ocean Observation and Research Department
*2 Frontier Observational Research System for Global Change
1. Introduction
An international project was launched in 2000 for the construction of
a new ocean observation system through the worldwide deployment of
subsurface floats (profiling floats) that measure
vertical profiles of water temperature and salinity from the sea
surface down to a depth of 2,000 m.
The floats that will be used in this project feature the same design
as ALACE (Davis et al., 1992). A float drifts
at the preset depth (normally 2,000 m) and resurfaces at specified
intervals (normally 10 days).
During its ascent to the surface, a float measures vertical
profiles of temperature and salinity, and transmits the data
at the surface via satellite. The
Argo project will collect 100,000 temperature/salinity
profiles annually by deploying
approximately 3,000 floats in oceans worldwide (Roemmich and Owens, 2000).
The Argo project is part of the Global Ocean
Observing System (GOOS) and will contribute to the
Climate Variability and Predictability Study (CLIVAR) and
the Global Ocean Data Assimilation Experiment (GODAE).
The project is supported by the World
Meteorological Organization (WMO) and
the Intergovernmental Oceanographic Commission (IOC) of UNESCO.
In Japan, the 5-year project, "Construction of the Advanced
Ocean Monitoring System (ARGO project)," was launched in 2000
as part of the millennium project (Mizuno, 2000).
This project is unique to Japan,
but it also contributes directly to the international Argo project.
The Ocean Observation and Research Department of
the Japan Marine Science and Technology Center and the Frontier
Observational Research System for Global Change (JAMSTEC/FORSGC)
will be in charge of
constructing an observational data-processing and management system
for this project, which will
consist of storing the data from deployed floats,
performing delayed-mode quality control, and
conducting data distribution through the "database system."
Here, we summarize the
automated data-processing and quality control system implemented
in FY 2000, which is part of the "database system".
2. System Overview
Data quality control in the Argo project is performed in two steps:
automated quality control,
performed within 1 to 2 days after data acquisition, and
delayed-mode quality control, performed
within 3 months after data acquisition.
The system carries out both real-time data processing,
which performs data acquisition including automated
quality control on a server, and the delayed-mode
quality-control processing, which compares the acquired data
with historical data and/or in-situ data
using a quality control program on a PC,
and also performs quality flag assignment and data
correction. Figure 1 shows the conceptual design of the system.
The system is operated on a local network within JAMSTEC,
so data retrieval from the Internet must be conducted by e-mail
through a JAMSTEC mail server and
by telnet/ftp through the firewall.
For security reasons, data distribution to the Internet
is currently performed using the
HTTP protocol on the World Wide Web.
The data is managed by the database system and
manipulated with SQL.
Figure 1. Conceptual drawing of the data processing system for profiling floats.
3. System Design
The following points were taken into consideration
in the system construction. First, all processing,
from float data reception to automated quality control
to distribution via the Web, will basically be
performed automatically. Even in the event of a system failure,
services will continue whenever
possible. Second, the quality-control procedures for
the system should require minimal knowledge
of the database system management and the like.
The programs were written
primarily in Perl, a scripting language, for
easy maintenance, and IDL was used as the graphics tool.
3.1. Improvement of system availability
Figure 2 shows the hardware components of the server part.
To prevent an interruption in service
due to hardware failure, two server machines with identical
components are provided, and
clustering software (VERITAS Cluster Server) is installed on
both machines. The clustering
software on each machine monitors the operational status of the
other machine through a direct Ethernet
connection and periodically monitors the operational
status of the software it manages. Each
service managed by the clustering software is assigned
a virtual host name ("floatdb1" for the
database server function and "floatdb2" for
the Web server function) and an IP address. By
accessing the virtual host, the user may access a service
without having to know which server machine actually
performs the service. Normally, the database
server (floatdb1) runs on the machine float1adm,
and the Web server (floatdb2) runs on the
machine float2adm. However,
in the event of a functional failure in one of the servers, the service
of the failed machine is terminated and the other
machine takes over the service.
Figure 2. Hardware components of Database server and Web server.
To enable data recovery in the event of a failure
in the database file, a backup of the database is
made daily by the automatic backup software (VERITAS Netbackup)
and the DLT tape changer.
Currently, the backup data is set to be stored on tape for at least 7 days.
To protect the entire system from unexpected power failures,
it is connected to an uninterruptible
power supply (UPS) unit. The entire system is shut down
if the power failure lasts longer than 10
minutes. After power is restored, the system restarts automatically.
3.2. Automated Data Processing
The system automatically processes float data at constant
time intervals. The flow of the
automated processing is shown in Fig. 3. The details of
each step are provided in the following
sections. The data processing functions described in
Sections 3.2.1 to 3.2.4 are performed on the
database server, and those in Section 3.2.5 are performed
on the Web server.
Figure 3. Flowchart of automatic float data processing.
3.2.1. Retrieve and Classify Data
A float transmits data using the ARGOS system.
The ARGOS system currently (July 2001)
has five polar orbiting satellites. These satellites have
sun-synchronous polar orbits with orbiting
altitudes of 850 km. Each satellite takes about 102 minutes
to complete one revolution around the Earth,
and its orbit shifts about 25 degrees
westward per revolution.
Two satellites pass over a given
region 6 to 7 times per day in the equatorial
regions, and roughly 28 times per day in the polar regions.
Data can be transmitted for 8 to 15 minutes
each time a satellite passes over a float (CLS/Argos, 1996).
The ARGOS system normally delivers only the data received by
two satellites, but use of the multi-satellite service makes it
possible to acquire all available data received by
satellites in operation.
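The orbital figures quoted above are mutually consistent, as a short calculation suggests. This is only a rough check; it assumes that, for a sun-synchronous orbit, the Earth turns 360 degrees relative to the orbital plane in one 24-hour day. The variable names are illustrative.

```python
# Rough consistency check of the orbital figures quoted in the text.
# Assumption: for a sun-synchronous orbit the Earth turns 360 degrees
# relative to the orbital plane in one 24-hour solar day.
MINUTES_PER_DAY = 24 * 60
period_min = 102.0  # orbital period quoted in the text

revolutions_per_day = MINUTES_PER_DAY / period_min
westward_shift_deg = 360.0 * period_min / MINUTES_PER_DAY

print(f"revolutions per day: {revolutions_per_day:.1f}")
print(f"westward shift per revolution: {westward_shift_deg:.1f} deg")
```

Under this assumption the shift comes to 25.5 degrees per revolution, in line with the figure of about 25 degrees given in the text.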
The maximum data volume that can be transmitted in a single message through
the ARGOS system is limited to 32 bytes.
The size of data transmitted
by a float in a single resurfacing (float
internal information and the pressure, temperature and
salinity data measured for approximately 60
layers) is approximately 400 bytes.
Therefore, the float divides
the data into 12 to 14 message blocks and
transmits them in sequence. Due to the restrictions of
the ARGOS system, the transmission repetition period
of the ARGOS Platform Transmitter Terminal (PTT)
onboard the floats that have been
deployed thus far has been either 44 or 90 seconds.
It is therefore difficult to transmit all of the
data during a single satellite pass over the float.
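To illustrate why a single pass is rarely enough, the sketch below compares the time needed to cycle once through all message blocks with the 8 to 15 minute pass duration. It assumes one block is sent per repetition period, which the text suggests but does not state explicitly; `cycle_minutes` is a hypothetical helper.

```python
# Time for a float to cycle once through all message blocks, to be compared
# with the 8 to 15 minute satellite pass quoted in the text.
# Assumption: exactly one block is transmitted per repetition period.
def cycle_minutes(n_blocks: int, repetition_s: float) -> float:
    """Minutes needed to transmit every block once."""
    return n_blocks * repetition_s / 60.0


for n_blocks in (12, 14):
    for repetition_s in (44, 90):
        minutes = cycle_minutes(n_blocks, repetition_s)
        print(f"{n_blocks} blocks, {repetition_s} s period: {minutes:.1f} min")
```

With a 90-second period, one full cycle already exceeds the longest pass, and with a 44-second period there is little margin for messages lost to transmission errors.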
Furthermore, as the position of the float on the surface is
determined by the ARGOS system, all of the float data
obtained while it is on the surface should also
be collected. The data received by the ARGOS satellite is
delivered by e-mail via ground stations.
With the current e-mail delivery settings, all received
data should be delivered. However,
compared to the data delivered by floppy disk once per month,
the data sent by e-mail sometimes lacks some data blocks.
To guard against problems in the e-mail delivery,
the system also accesses the
Service ARGOS host by telnet every
6 hours in order to retrieve data.
The data acquired by e-mail or telnet is classified by ARGOS ID,
and is then stored in the respective
work files for each ARGOS ID to process later.
During the classification process,
data not in the ARGOS format is rejected as invalid.
Figure 4 shows the histogram distribution of the time required
to deliver data
by e-mail. The time required for a satellite to move to a
position above a ground station is included
in this time. Over 40% of the data is delivered within 1 hour
after data transmission, and over 98% of
the data is delivered within 12 hours. The longest elapsed time
has been 22 hours.
The current system begins the decoding process when no new data has been
added by e-mail or telnet within
the past 12 hours, since (1) the flyover interval of a satellite
is 2 to 3 hours, (2) more than 20 hours
may be required for the satellite to deliver the received data,
and (3) the ARGOS system may
recalculate the positioning data. As parallel acquisition of
data is made by e-mail and telnet, there
have yet to be any cases in which new data was acquired after
the decoding processing during
normal operations. In the rare case in which new data is acquired,
it is handled as a duplicate data
error by the database when the data is inserted, provided that the data
contains profile-number information.
However, if the profile-number information is missing,
the data will be inserted into the database as new data
and must be confirmed and deleted manually from the database.
Therefore, all steps in the processing are
recorded in the job report, and we need to check
the decoded data against the float schedule.
Figure 4. Histogram and cumulative receive rate as a function of the
elapsed time after transmission.
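The decoding trigger described above amounts to a simple quiet-period test. The following is a minimal sketch, not the system's actual Perl code; `ready_to_decode` and its arguments are hypothetical names.

```python
from datetime import datetime, timedelta

STANDBY = timedelta(hours=12)  # standby time quoted in the text


def ready_to_decode(last_arrival: datetime, now: datetime,
                    standby: timedelta = STANDBY) -> bool:
    """True when no new data (e-mail or telnet) has arrived for the standby time."""
    return now - last_arrival >= standby


last = datetime(2001, 7, 1, 0, 0)
print(ready_to_decode(last, datetime(2001, 7, 1, 11, 0)))   # still waiting
print(ready_to_decode(last, datetime(2001, 7, 1, 12, 30)))  # decoding may start
```

Keeping the standby time as a parameter makes it easy to experiment with shorter waiting periods.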
3.2.2. Decoding Data
The received data may contain errors that occurred during transmission.
Therefore, the first byte of
the 32 bytes of transmission data is assigned to the 8-bit Cyclic
Redundancy Check (CRC) code calculated from
the remaining 31 bytes. The CRC effectively detects burst errors
(successive bit errors) and has revealed that
nearly 20% of the received data
contains errors (Nakajima et al., 2001).
An average of 120 pieces of data are received
every time a float resurfaces, so if data containing CRC errors
is removed, an average of 90 pieces
of data are obtained. As a single float datum contains
12 to 14 message blocks, the entire datum
cannot be decoded unless all of the blocks are received.
It can be seen from the number of
messages received up to July 2001 (Fig. 5) that
approximately 1% of the message blocks were not
received at all. If losses were independent, then with 12 to 14 blocks
per profile this rate (1 - 0.99^13, or about 0.12) would imply roughly one
incomplete profile for every 10 float resurfacings.
Fortunately, the non-received blocks tend
to be concentrated in specific profiles, so of the
235 profiles obtained thus far, only 10 were incomplete (less than 5%).
Figure 5. Histogram of duplicate number for each data message block.
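The CRC test on each 32-byte message can be sketched as follows. The text does not give the generator polynomial or bit ordering used by the floats, so the polynomial 0x07 with a zero initial value is an assumption chosen purely for illustration; `crc8` and `message_is_valid` are hypothetical names.

```python
def crc8(data: bytes, poly: int = 0x07, init: int = 0x00) -> int:
    """Bitwise CRC-8, MSB first. The polynomial is an assumption, not the float's."""
    crc = init
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def message_is_valid(message: bytes) -> bool:
    """Check that the first byte equals the CRC of the remaining 31 bytes."""
    if len(message) != 32:
        return False
    return message[0] == crc8(message[1:])


payload = bytes(range(31))                       # made-up payload
good = bytes([crc8(payload)]) + payload
bad = good[:5] + bytes([good[5] ^ 0x10]) + good[6:]  # flip one bit
print(message_is_valid(good), message_is_valid(bad))
```

Messages that fail this test would be discarded before decoding.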
With the 8-bit CRC, an erroneous message goes undetected
with a probability of about 1/2^8. Several
cases were found in which the data sent from the floats
contained errors that could not be detected
by the CRC (Fig. 6). However, as the probability that the same
error occurs in different transmissions is very low,
erroneous data is rejected in
the actual processing by adopting the most frequent data pattern.
If there is no difference in the
pattern frequency or there is only one piece of
data for the corresponding block, there is a possibility that
the block contains an error. Therefore, these blocks are
recorded in the job report.
Figure 6. Example of error data that passed CRC check.
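The selection of the most frequent pattern among the copies that passed the CRC can be sketched as below. This is an illustrative reading of the procedure, not the system's code; `select_block` is a hypothetical name, and the second return value stands for the entry written to the job report.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def select_block(copies: Sequence[bytes]) -> Tuple[Optional[bytes], bool]:
    """Return (most frequent copy, needs_review) for one message block.

    needs_review is True when only one copy exists or when the two most
    frequent patterns are tied, i.e. the cases recorded in the job report.
    """
    if not copies:
        return None, True
    ranked = Counter(copies).most_common()
    best, best_count = ranked[0]
    tied = len(ranked) > 1 and ranked[1][1] == best_count
    return best, tied or len(copies) == 1


print(select_block([b"\x01\x02", b"\x01\x02", b"\x01\x03"]))
print(select_block([b"\x01\x02"]))
```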
3.2.3. Automated Quality Control
For the Argo project, common automated quality-control
processing has been adopted
internationally. Currently, discussions are underway
regarding the actual processing method, and
a decision should be made at the Argo Data Management Team
Meeting scheduled for September
2001. As soon as the contents of the automated quality-control
processing are decided, quality
control at JAMSTEC/FORSGC will be conducted in accordance
with the decided procedure.
Meanwhile, some form of automated quality control should be
performed to prevent erroneous data
from being distributed on the WWW. Therefore,
we perform an automated quality-control procedure
based on the Real-time Quality Control Manual of the
Global Temperature and Salinity Pilot Program (IOC, 1990).
The current automated quality-control procedures are described below.
If the data fails any of the checks below,
the data flag is set to 3,
which means "possibly erroneous value".
- Positioning Check:
The position of a float is
calculated from the Doppler effect of the signal
received by the ARGOS satellite.
On average, 10 positions are
determined in this manner while a float is on the surface.
The routine confirms that the
float's drifting velocity calculated from these positions and
time does not exceed 5 kt,
and also that the drifting velocity does not
exceed the average drifting velocity during
the surfacing by a factor greater than 2.5.
- Global Range Check of Temperature, Salinity, and Depth:
The routine confirms that
the data values fall within the range of -2 to 35 deg-C
for temperature, 0 to 40 for salinity,
and 0 to 10,000 m for depth, which are values normally
observed in the open sea.
- Depth Check:
The seafloor depth from ETOPO5 (NOAA, 1988)
at the point nearest the
resurfacing point of the float is compared with the maximum depth
recorded in the float data
to confirm that the seafloor is deeper.
- Pressure Inversion Check:
Confirms that the pressure
value increases in the proper sequence.
- Range Check by Depth:
Confirms that the temperature
and salinity values by depth
do not fall outside the range specified in Table 1.
- Freezing Point Check:
Confirms that the water temperature is not lower than the
freezing point calculated by the following equation
T = -0.0575 S + 1.710523E-3 S^(3/2) - 2.154996E-4 S^2 - 7.53E-4 P
Here, S is the practical salinity and P is the pressure (dbar).
- Spike Check:
Confirms that the value calculated
for the vertical profile of temperature
and salinity using the following equation has no
spike higher than the threshold value
(water temperature of 2.0 deg-C and salinity of 0.3):
Vtest = |V2 - (V3 + V1)/2|-|V1 - V3|/2
Here, V2 is the value of the layer to be tested,
and V1 and V3 are the values of the
layers directly above and below the tested layer, respectively.
- Slope Check:
Confirms that the value calculated using
the following equation for the
vertical profile of temperature and salinity has no slope
steeper than the threshold value
(temperature of 10 deg-C and salinity of 5.0):
Vtest = |V2 - (V3 + V1)/2|
Here, V2 is the value of the layer to be tested,
and V1 and V3 are the values of the
layers directly above and below the tested layer, respectively.
- Density Inversion Check:
Confirms that there
are no inversions in the density
calculated from the temperature and salinity values.
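Several of the profile checks listed above can be sketched in a few lines. This is a hedged illustration rather than the operational code: the function names are hypothetical, the arrays are assumed to be ordered from the surface downward, and the depth, range-by-depth, and density inversion checks are omitted because they need external data (ETOPO5, Table 1, an equation of state).

```python
# Thresholds quoted in the text; array order (surface downward) is an assumption.
T_RANGE = (-2.0, 35.0)   # deg-C
S_RANGE = (0.0, 40.0)    # practical salinity
SPIKE_LIMIT = {"temp": 2.0, "sal": 0.3}
SLOPE_LIMIT = {"temp": 10.0, "sal": 5.0}


def freezing_point(s, p):
    """Freezing point (deg-C) for practical salinity s and pressure p (dbar)."""
    return (-0.0575 * s + 1.710523e-3 * s ** 1.5
            - 2.154996e-4 * s ** 2 - 7.53e-4 * p)


def spike_value(v1, v2, v3):
    return abs(v2 - (v3 + v1) / 2) - abs(v1 - v3) / 2


def slope_value(v1, v2, v3):
    return abs(v2 - (v3 + v1) / 2)


def suspect_levels(pres, temp, sal):
    """Return one boolean per level; True marks a level that would get flag 3."""
    n = len(pres)
    suspect = [False] * n
    for i in range(n):
        in_range = (T_RANGE[0] <= temp[i] <= T_RANGE[1]
                    and S_RANGE[0] <= sal[i] <= S_RANGE[1])
        if not in_range:
            suspect[i] = True      # global range check
        elif temp[i] < freezing_point(sal[i], pres[i]):
            suspect[i] = True      # freezing point check
        if i > 0 and pres[i] <= pres[i - 1]:
            suspect[i] = True      # pressure inversion check
    for name, values in (("temp", temp), ("sal", sal)):
        for i in range(1, n - 1):
            v1, v2, v3 = values[i - 1], values[i], values[i + 1]
            if (spike_value(v1, v2, v3) > SPIKE_LIMIT[name]
                    or slope_value(v1, v2, v3) > SLOPE_LIMIT[name]):
                suspect[i] = True  # spike or slope check
    return suspect


pres = [10.0, 20.0, 30.0, 40.0, 50.0]
temp = [20.0, 19.8, 25.0, 19.4, 19.2]   # made-up profile with one spike
sal = [34.5, 34.5, 34.5, 34.5, 34.5]
print(suspect_levels(pres, temp, sal))
```

In this made-up profile only the third level, where the temperature jumps to 25.0 deg-C, is marked as suspect.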
Table 1. Range check value of temperature and salinity for each depth range used for automatic quality control.
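The positioning check in the list above can be sketched in the same spirit. The great-circle distance on a spherical Earth and the use of the mean of the segment speeds as the "average drifting velocity" are assumptions; all names are hypothetical.

```python
import math

KNOT_MS = 1852.0 / 3600.0    # one knot in m/s
EARTH_RADIUS_M = 6371000.0   # spherical-Earth assumption


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def segment_speeds(fixes):
    """Speeds (m/s) between consecutive (time_s, lat, lon) fixes sorted by time."""
    speeds = []
    for (t1, la1, lo1), (t2, la2, lo2) in zip(fixes, fixes[1:]):
        if t2 <= t1:
            raise ValueError("fixes must have strictly increasing times")
        speeds.append(distance_m(la1, lo1, la2, lo2) / (t2 - t1))
    return speeds


def suspect_segments(fixes, max_kt=5.0, factor=2.5):
    """True for each segment faster than max_kt or than factor times the mean."""
    speeds = segment_speeds(fixes)
    if not speeds:
        return []
    mean_speed = sum(speeds) / len(speeds)
    return [v > max_kt * KNOT_MS or v > factor * mean_speed for v in speeds]


fixes = [(0, 30.00, 140.0), (3600, 30.01, 140.0), (7200, 30.02, 140.0),
         (10800, 30.52, 140.0), (14400, 30.53, 140.0)]  # made-up positions
print(suspect_segments(fixes))
```

The sketch marks segments rather than individual positions; deciding which of the two end points is erroneous would need further logic.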
3.2.4. Data Insert to the Database
The number of messages received by the ARGOS satellite and
the time and date of the last update
are inserted into the database along with the pressure,
temperature, salinity, and float internal
information obtained through decoding; all positioning
information from the ARGOS system; and
the data flag following quality control processing.
If the observation values are corrected or a
change is made to the data flag in later quality control
processing, the entire revision history and the
values prior to revision are also recorded in the database.
3.2.5. Automated Update of Webpages
The website currently has the structure shown in Fig. 7.
The system checks the database for new or updated data every 3 hours,
and generates or updates the pages for the corresponding float data.
All pages are updated once every 24 hours.
Figure 7. World Wide Web Site structure of the
"Japan ARGO Delayed-mode Data base"
3.3. Float Information Management
Metadata such as the float type, serial number,
float deployment information, and settings for
the drift/resurface time are inserted into the database,
as with float observation data. Insertion into
or updating of the database is done through a Web browser
for the sake of convenience (Fig. 8).
These pages are only accessible from the internal network,
and another http server is dedicated
to float information management.
Figure 8. Top page of the "Float information management system."
3.4. Quality control of Float Data
The Argo project aims to achieve an accuracy of 0.005 deg-C
in temperature and 0.01 in practical salinity.
However, conductivity sensors are greatly affected by
slight physical deformations and/or impurities on
the sensor surface and are likely to suffer large deviations in
accuracy. It is therefore extremely difficult
to maintain high accuracy during long-term operations.
Conductivity sensors with long-term stability
are currently under development, but it is also
important to control the quality of data acquired by
floats that have already been deployed. In the past,
some studies have been made by Freeland (1997) and
Bacon et al. (2001) to correct salinity data obtained by the floats
using historical data or
temporally/spatially neighboring data. At JAMSTEC as well,
quality-control methods are being
examined in the ARGO project (Kobayashi et al., 2001).
The developed system enables a comparison between the float data and the
historical data (Fig. 9), a
comparison of pieces of float data, and comparison of the float
data to temporally/spatially
neighboring data on a PC. The system also makes it possible to
change the flag on the screen
and to update the database according to the above comparison.
The World Ocean Atlas 1998 (NOAA, 1999),
published by the National Oceanographic Data Center of
the National Oceanic and Atmospheric Administration,
and HydroBase (Macdonald et al., 2001) data are
prepared in the database as historical data
for quality control.
For temporally/spatially neighboring data, the global
subsurface temperature and salinity data obtained through
the GTS and managed by the Regional Real-Time Database (RRTDB) of
the North East Asian Regional GOOS (Yoshida and Toyoshima, 2001)
are retrieved every day by ftp and inserted into the database for comparison.
Figure 9. Sample screen shot of the "Float data quality control program."
Comparison graph of Float data and historical data (WOA98).
4. System Improvement
The data has been published on the WWW by the present system since
April 1, 2001. On one
occasion, the database server went down because a file system filled up
owing to inappropriate script
settings, but otherwise there have been no major problems in the
automated data processing.
The following items are being examined as themes for the future
with the progress of the Argo project.
4.1. Full Adaptation to All Argo Floats
The present system is designed to handle only data from the
profiling floats deployed by JAMSTEC.
At the International Argo Data Management Group workshop held in
October 2000, it was agreed
that all float data should have a common format, and that all
data should be exchanged through
two global centers. The common format is to be decided at the
Argo Data Management Team
Meeting scheduled for September 2001. When the format has been
finalized and the global
centers begin operations, the system at JAMSTEC will be
revised so that the database
will be able to handle all float data and that JAMSTEC
will be able to provide global Argo float data.
4.2. Time Required for Real-Time Processing
At present, 12 to 32 hours are required after a float ends
transmission and descends for the
completion of all real-time processing in automated data
processing (Fig. 10). The average
processing time is approximately 20 hours, and data processing is
complete within 24 hours for 90%
of the data. To achieve the goal of the millennium project,
which is to improve the precision of
long-term prediction, the real-time data should quickly be
assimilated into the model. To reduce
the time required for the real-time processing of float data,
it is necessary to reduce the time
required for data acquisition, the most time-consuming process.
It is also necessary to examine a suitable standby time
before the decoding processing is performed, which is currently
uniformly set at 12 hours.
Adopting a satellite communication system other than the ARGOS system
may also reduce the time required for data acquisition.
Figure 10. Histogram of the elapsed time from the float descent
to data decoding.
4.3. Reconstruction of Error Data
As previously mentioned in Section 3.2.2, the present communication
system has a high error rate,
and CRC errors may even be detected in every copy received for a certain data block.
In the performance test on the ARGOS
system conducted by Sherman (1992), no difference was found
in the characteristics of errors for the two
patterns of a repetition of 1 and an alternation of 1 and 0,
and it was therefore concluded that
errors in the ARGOS system result not from simple bit loss,
but rather from a noise burst that causes
errors in several successive bits.
Figure 11 shows an example
of a data block with CRC errors. There are
some portions in which many bits have been inverted, and
other portions in which only a few bits
have been inverted. However, even if all the data received
for a certain message block contains errors,
it may be possible to reconstruct the most
likely data sequence by comparing each bit in the received
blocks and calculating the CRC.
Figure 12 shows an example of such reconstruction for the data in Fig. 11.
It is preferable to reconstruct data by examining the
results of decoding and following this
method. We are therefore examining convenient methods for the
reconstruction of error data on the quality-control PC.
Figure 11. Example of bit error in ARGOS message.
Figure 12. Example of recovery data from bit error shown in Fig. 11.
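One way to read the bit-by-bit comparison is as a per-bit majority vote over all received copies of a block, accepted only if the result passes the CRC. The sketch below illustrates that reading and is not the method finally adopted; the CRC polynomial 0x07 is an assumption (the text does not specify it), and `bitwise_majority` and `reconstruct` are hypothetical names.

```python
from typing import Optional, Sequence


def crc8(data: bytes, poly: int = 0x07) -> int:
    """Bitwise CRC-8, MSB first. The polynomial is an assumption."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = ((crc << 1) ^ poly) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc


def bitwise_majority(copies: Sequence[bytes]) -> bytes:
    """Take each bit from the majority of copies (ties resolve to 0)."""
    n = len(copies)
    out = bytearray(len(copies[0]))
    for i in range(len(out)):
        for bit in range(8):
            ones = sum((c[i] >> bit) & 1 for c in copies)
            if 2 * ones > n:
                out[i] |= 1 << bit
    return bytes(out)


def reconstruct(copies: Sequence[bytes]) -> Optional[bytes]:
    """Return the majority block if its CRC byte matches, else None."""
    if not copies or len({len(c) for c in copies}) != 1:
        return None
    candidate = bitwise_majority(copies)
    return candidate if candidate[0] == crc8(candidate[1:]) else None


payload = bytes(range(31))                      # made-up payload
good = bytes([crc8(payload)]) + payload


def corrupt(msg: bytes, index: int) -> bytes:
    """Invert all eight bits of one byte to mimic a short noise burst."""
    return msg[:index] + bytes([msg[index] ^ 0xFF]) + msg[index + 1:]


copies = [corrupt(good, 3), corrupt(good, 10), corrupt(good, 20)]
print(reconstruct(copies) == good)
```

In the example each copy fails the CRC on its own, yet the bursts fall in different places, so the vote recovers the original block.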
4.4. Efficient Delayed-Mode Quality Control
Delayed-mode quality control at JAMSTEC is basically performed
for data acquired by floats
deployed by JAMSTEC. With the progress of the Argo project,
the number of floats collecting data
for quality control will increase significantly, and it will
therefore be necessary periodically to execute
a data quality-control program that supports the quality-control
jobs, and to efficiently perform
quality control using the report produced by the program.
We must examine what type of
information the report should contain to advance the
quality-control jobs and incorporate it into the system.
5. Summary
A system for automated data-processing and quality control of
float data was constructed as a
database system to store data from deployed floats, perform
delayed-mode quality control, and
publicize data, for the processing and management of
observation data collected by the project,
"Construction of the Advanced Ocean Monitoring System (ARGO Project),"
which is part of the
Millennium Project. The system has two main functions:
automated real-time data-processing of
float data, and delayed-mode advanced quality control.
The automated real-time data processing
can distribute observation data on the WWW within one to
two days after the float's resurfacing.
For quality-control purposes, historical data (WOA98, HydroBase)
and the temporally/spatially
neighboring data are provided to enable on-screen comparison
with the float data.
In 2001, the system will be revised for compatibility
with data from all Argo floats, to enable
reconstruction of error data, and to perform efficient
delayed-mode quality control.
Acknowledgments
D. Swift of the University of Washington provided information
on the data processing system for the
profiling floats operated at the University of Washington.
Dr. R. Molinari and the staff members at
the GOOS Center of the Atlantic Oceanographic and
Meteorological Laboratory (AOML) of the
National Oceanic and Atmospheric Administration (NOAA)
provided information on the quality control
processing conducted at the GOOS Center. We would like to
express our deep appreciation to all
of these individuals.
References
- Davis, R.E., D. C. Webb, L. A. Regier, and J. Dufour,
"The Autonomous Lagrangian Circulation Explorer (ALACE)",
J. Atm. and Oceanic Technol., 9 (3), 264-285 (1992).
- Roemmich, D. and W. B. Owens,
"The Argo project: global ocean observations for
understanding and prediction of climate variability",
Oceanography, 13, 45-50 (2000).
- Mizuno, K.,
"A plan of the establishment of Advanced Ocean Observation
System (Japan ARGO)" (in Japanese),
Techno Marine, 854, 485-490 (2000).
- CLS/Service Argos, Users Manual 1.0.
(CLS/Service Argos, Inc., January 1996).
- Nakajima, H., Y. Takatsuki, K. Mizuno, K. Takeuchi, and N. Shikama,
"Data communication status of the ARGO floats"
(in Japanese with English abstract),
JAMSTECR, 44, 153-161 (2001).
- IOC, Manuals and Guides #22
"GTSPP Real-time Quality control manual" (1990).
- NOAA,
Data Announcement 88-MGG-02,
Digital relief of the Surface of the Earth.
(NOAA, National Geophysical Data Center, Boulder, Colorado, 1988).
"Algorithms for Computation of Fundamental Properties of Seawater."
UNESCO Technical Papers in marine science, 44 (1983).
- Freeland, H.,
"Calibration of the Conductivity Cells on P-ALACE Floats"
1997 U.S. WOCE Report, 37-38, (1997).
- Bacon, S., L. R. Centurioni and W. J. Gould,
"The Evaluation of Salinity measurements from PALACE Floats",
J. Atm. and Oceanic Technol., 18 (7), 1258-1266 (2001).
- Kobayashi, T., Y. Ichikawa, Y. Takatsuki, T. Suga, N. Iwasaka,
K. Ando, K. Mizuno, N. Shikama, and K. Takeuchi,
"Quality control of Argo data based on high quality climatological
data set (HydroBase) I"
(in Japanese with English abstract),
JAMSTECR, 44, 101-114 (2001).
- NOAA,
World Ocean Atlas 1998 (WOA98),
(NOAA, National Oceanographic Data Center, Ocean Climate Laboratory,
- Macdonald, A. M., T. Suga and R. G. Curry,
"An isopycnally averaged North Pacific climatology",
J. Atm. and Oceanic Technol., 18 (3), 394-420 (2001).
- Yoshida, T. and S. Toyoshima,
"Present status and future view of
data management in NEAR-GOOS" (in Japanese),
Kaiyo Monthly, 33 (5), 311-316 (2001).
- Sherman, J.,
"Observations of Argos Performance",
J. Atm. and Oceanic Technol., 9 (6), 323-328 (1992).
Appendix. Data Format of the Profiling Float
The floats that have been deployed to date by the Japan Marine
Science and Technology Center
and the Frontier Observational Research System for Global Change
(JAMSTEC/FORSGC) use the
ARGOS system for data transmission. The data is transmitted in
blocks of 32 bytes in hexadecimal
notation (Fig. A1). There are currently two types of data formats,
which are presented in Fig. A2.
The conversion to water temperature (T), salinity (S), and
pressure (P) is performed using the
equations shown below. Here, BH and BL are the values of the
higher byte and lower byte,
respectively (both have the range 0x00 - 0xFF [0-255 in decimal notation]).
The power supply voltage (V),
the internal pressure of the float (p), and the piston
motor drive time (t) are calculated
using the following equations. Here, B is the value of the
corresponding byte and BH and BL
are the values of the higher and lower bytes, respectively.
||(deg-C, for 0<=(BH *256+BL)<=62536)
||(deg-C, for (BH *256+BL)>62536)
|S=(BH *256+BL)/1000||(in PSS78)
In addition, note that 5 dbars are added to the
"pressure at the surface immediately before the last
descent" in the encoding, so it will be necessary to
subtract 5 dbars from the value converted using
the equation above.
All other items take the value of the corresponding byte.
|t=(BH *256+BL)*2||(sec *only type A2)
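The salinity and piston-time conversions listed here can be sketched as follows. Only these two formulas are taken from the text; the function names and the example byte values are made up for illustration.

```python
def word(bh: int, bl: int) -> int:
    """Combine a higher and a lower byte (0-255 each) into BH*256+BL."""
    if not (0 <= bh <= 0xFF and 0 <= bl <= 0xFF):
        raise ValueError("bytes must be in the range 0x00-0xFF")
    return bh * 256 + bl


def salinity(bh: int, bl: int) -> float:
    """S = (BH*256+BL)/1000, in PSS78."""
    return word(bh, bl) / 1000.0


def piston_time_s(bh: int, bl: int) -> int:
    """t = (BH*256+BL)*2, in seconds (type A2 only)."""
    return word(bh, bl) * 2


print(salinity(0x86, 0xC4))       # bytes 0x86 0xC4 give 34500, i.e. 34.5
print(piston_time_s(0x00, 0x1E))  # bytes 0x00 0x1E give 30, i.e. 60 s
```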
Figure A1. Sample message from ARGOS system.
Figure A2. Data format arrangement of profiling floats in operation.
(a) First message of type A1
(b) First message of type A2
(c) Other messages
Table A1. Data table of "Profile termination flag byte" in hexadecimal notation.
|00||Pressure reached surface pressure (Normally terminated)|
|02||Pressure reached zero.|
|04||Pressure unchanged for 25 minutes. (Does not terminate profile)|
|08||Piston fully extended before surface.|
|10||UP time expired before surface and UP time was reset. (only for type A1)|
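A decoder for the profile termination flag byte might look like the sketch below. The values in Table A1 are powers of two, so the sketch assumes they are bit flags that can be combined in one byte; the table itself does not state this, and `describe_termination` is a hypothetical name.

```python
from typing import List

NORMAL = "Pressure reached surface pressure (normally terminated)."

# Assumption: the non-zero values of Table A1 are independent bit flags.
TERMINATION_FLAGS = {
    0x02: "Pressure reached zero.",
    0x04: "Pressure unchanged for 25 minutes (does not terminate profile).",
    0x08: "Piston fully extended before surface.",
    0x10: "UP time expired before surface and UP time was reset (type A1 only).",
}


def describe_termination(flag_byte: int) -> List[str]:
    """Return the Table A1 descriptions that apply to a flag byte."""
    if flag_byte == 0x00:
        return [NORMAL]
    return [text for bit, text in TERMINATION_FLAGS.items() if flag_byte & bit]


print(describe_termination(0x00))
print(describe_termination(0x0C))  # 0x04 and 0x08 set together
```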