RDA Dynamic Data Citation for frequently modifying High Resolution Climate Data Adopted By The Climate Change Centre Austria (CCCA)

The Climate Change Centre Austria (CCCA) Data Centre expected a comprehensive project outcome: entirely new simulated high-resolution climate scenarios for Austria, covering the period from 1965 to 2100 at daily resolution. The output comprised 13 model runs, 5 meteorological parameters (such as temperature) and 3 emission scenarios, delivered as more than 1,600 NetCDF files with an average size of 13 GB. How could we implement proper data management processes for data packages of this size? We were looking for best practices on persistent identifiers and subsetting tools for such large data containers. By chance, I met members of the RDA Data Citation Working Group, and the idea of applying the RDA recommendation on dynamic data citation in a pilot, the “NetCDF Pilot Implementation of Climate Scenarios”, was born.

 

“High-resolution climate data change frequently because of their complex dependencies and the statistical methods used for downscaling. To re-use these data and services in a reproducible manner, and to share and cite them, data analysts and researchers need a way to identify the exact version used.”

Chris Schubert, Head of CCCA Data Centre

With the operational application for Dynamic Data Citation in place, the data become significantly more attractive to data analysts. Users receive a dynamically generated citation text that contains the original author, the dataset label, the version, the selected subsetting parameters and the link to the persistent identifier. For a newly created and published subset, all metadata are inherited from the original dataset and supplemented by the arguments that define the subset, such as the adapted bounding box, the selected parameter and the name of the subset creator.
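The metadata inheritance and the generated citation text described above can be illustrated with a minimal Python sketch. It is not the CCCA implementation: the class `DatasetMetadata`, the functions `derive_subset` and `citation_text`, the citation layout and all example values (author, title, identifiers) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class DatasetMetadata:
    """Hypothetical metadata record; field names are assumptions."""
    author: str
    title: str
    version: str
    pid: str  # persistent identifier
    parameter: Optional[str] = None
    bbox: Optional[Tuple[float, float, float, float]] = None  # west, south, east, north
    time_range: Optional[Tuple[str, str]] = None
    subset_creator: Optional[str] = None


def derive_subset(parent, *, pid, creator, parameter, bbox, time_range):
    """Copy every field of the parent record, then supplement it with
    the arguments that define the subset and its own identifier."""
    return replace(
        parent,
        pid=pid,
        subset_creator=creator,
        parameter=parameter,
        bbox=bbox,
        time_range=time_range,
    )


def citation_text(meta):
    """Build a citation string from whichever fields are present."""
    parts = [f"{meta.author}: {meta.title}, version {meta.version}"]
    if meta.parameter:
        parts.append(f"parameter: {meta.parameter}")
    if meta.bbox:
        parts.append("bounding box: " + ", ".join(f"{v:g}" for v in meta.bbox))
    if meta.time_range:
        parts.append(f"time range: {meta.time_range[0]} to {meta.time_range[1]}")
    if meta.subset_creator:
        parts.append(f"subset created by {meta.subset_creator}")
    parts.append(f"PID: {meta.pid}")
    return "; ".join(parts)


# Hypothetical example values, not real CCCA records or identifiers.
parent = DatasetMetadata(
    author="Example Author",
    title="Example Climate Scenario",
    version="1.0",
    pid="example-pid-parent",
)
subset = derive_subset(
    parent,
    pid="example-pid-subset",
    creator="Example User",
    parameter="temperature",
    bbox=(9.5, 46.0, 17.5, 49.0),
    time_range=("1965-01-01", "2100-12-31"),
)
print(citation_text(subset))
```

In this sketch the parent record is immutable, so creating a subset never alters the original metadata; the subset record carries the inherited author, title and version together with its own identifier and subsetting arguments.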

If we had not adopted the Dynamic Data Citation subset service, CCCA's users would be forced to download the data themselves, which would create an unintended first break in the data provenance information. Data would still be prepared on the user’s desktop computer, for example by selecting the area of interest and the time range there. Dynamic data citation also clearly improves the handling of data quality, because corrections and improvements remain traceable.