The IOOS Catalog (https://data.ioos.us/) is an open data portal containing IOOS’ portfolio of oceanographic observations and forecast products provided by IOOS’ 11 Regional Associations (RAs), functional Data Assembly Centers (DACs) such as the HF Radar DAC and Glider DAC, and IOOS’ federal partners. The Catalog inventories all IOOS Data Management and Communications (DMAC)-compliant data access service endpoints provided by these entities in a single metadata repository, for discovery by end users.
The Catalog is populated by ISO 19115 metadata records that describe the observations taken and forecast model outputs produced by the RAs and DACs, using DMAC-recommended standard vocabularies and data formats wherever possible (netCDF-CF, ACDD). The RAs and DACs publish their metadata to web-accessible folders or OGC CS-W services, and the Catalog harvests metadata from these locations daily. Because IOOS data provider metadata is often generated automatically by software that reads native data file attributes (such as CF attributes in a netCDF file), a daily automated harvest of the ISO XML metadata is necessary to keep frequently changing information, such as dataset time coverage, current.
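Time coverage is a good example of information that changes between harvests. As a minimal sketch, assuming an ISO 19139-encoded record that carries its temporal extent as a `gml:TimePeriod` (the sample record and its dates are hypothetical and trimmed), the current begin and end positions can be read with Python's standard library. Matching on local element names lets the same code handle GML 3.1 and 3.2 namespaces.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# Hypothetical, trimmed ISO 19139 record; in real records the extent is nested
# under gmd:identificationInfo/gmd:MD_DataIdentification.
SAMPLE_RECORD = """<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gml="http://www.opengis.net/gml/3.2">
  <gmd:extent>
    <gmd:EX_Extent>
      <gmd:temporalElement>
        <gmd:EX_TemporalExtent>
          <gmd:extent>
            <gml:TimePeriod gml:id="coverage">
              <gml:beginPosition>2015-01-01T00:00:00Z</gml:beginPosition>
              <gml:endPosition>2024-06-30T23:00:00Z</gml:endPosition>
            </gml:TimePeriod>
          </gmd:extent>
        </gmd:EX_TemporalExtent>
      </gmd:temporalElement>
    </gmd:EX_Extent>
  </gmd:extent>
</gmd:MD_Metadata>"""


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix so GML 3.1 and 3.2 elements both match."""
    return tag.rsplit("}", 1)[-1]


def time_coverage(iso_xml: str) -> Optional[Tuple[Optional[str], Optional[str]]]:
    """Return (begin, end) from the first gml:TimePeriod, or None if there is none."""
    root = ET.fromstring(iso_xml)
    for elem in root.iter():
        if local_name(elem.tag) != "TimePeriod":
            continue
        begin = end = None
        for child in elem:
            name = local_name(child.tag)
            if name == "beginPosition":
                begin = (child.text or "").strip() or None
            elif name == "endPosition":
                end = (child.text or "").strip() or None
        return begin, end
    return None


if __name__ == "__main__":
    print(time_coverage(SAMPLE_RECORD))
```

Re-reading these values on every harvest is what keeps the Catalog's time coverage in step with datasets that are still growing.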
The Catalog provides a searchable graphical user interface (https://data.ioos.us/) for interactive data discovery, as well as a native API (https://data.ioos.us/api/3) and an OGC CS-W-compatible service (https://data.ioos.us/csw?service=CSW&request=GetCapabilities) for machine-based access to its inventory of data products. Downstream national IOOS products, such as the IOOS Environmental Sensor Map (https://sensors.ioos.us), use the Catalog inventory to find the real-time sensor observation services to include in their map portals.
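As a minimal sketch of machine access through the native API, the code below assumes the standard CKAN action API layout (`/api/3/action/package_search`) under the API root given above. The URL builder runs offline; the network call is kept under the `__main__` guard.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# CKAN action API root; assumed to sit under the Catalog's /api/3 endpoint.
CKAN_ACTION_API = "https://data.ioos.us/api/3/action"


def package_search_url(q: str, rows: int = 10, start: int = 0) -> str:
    """Build a CKAN package_search URL for a Solr query string `q`."""
    params = {"q": q, "rows": rows, "start": start}
    return f"{CKAN_ACTION_API}/package_search?{urlencode(params)}"


def package_search(q: str, rows: int = 10, start: int = 0) -> dict:
    """Run package_search and return CKAN's 'result' object (count, results, ...)."""
    with urlopen(package_search_url(q, rows, start), timeout=30) as resp:
        payload = json.load(resp)
    if not payload.get("success"):
        raise RuntimeError(f"CKAN API error: {payload.get('error')}")
    return payload["result"]


if __name__ == "__main__":
    result = package_search("sea_water_temperature", rows=5)
    print("matching datasets:", result["count"])
    for dataset in result["results"]:
        print(dataset["name"], "|", dataset.get("title"))
```

Paging through large result sets is a matter of increasing `start` in steps of `rows`.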
The IOOS Catalog is made up of three main components:
- the Data Catalog (the public user interface, based on CKAN open data catalog software)
- the Harvest Registry (an internal site used by data providers to manage metadata harvest sources)
- the Service Monitor (service uptime monitoring)
Additional information about the IOOS Catalog and components, including instructions for IOOS data providers, can be found here: https://ioos.github.io/catalog.
Catalog GitHub repository: https://github.com/ioos/catalog. See the GitHub repository for project development timelines and Catalog source code, or to file an issue.
Frequently Asked Questions
What are the dataset filtering options provided by the IOOS Data Catalog?
The current faceted filtering options provided by the CKAN software underlying the Data Catalog are:
- Location (i.e. geographic bounding box)
- Organization (e.g. PacIOOS, GLOS, etc.)
- Tags/keywords (e.g. sea_water_temperature, sea_water_electrical_conductivity)
- Formats (e.g. HTML, SOS, OPeNDAP, etc.)
- Date and time
- Data providers
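Through the API, the organization, tag, and format facets can be expressed as Solr filter queries in CKAN's `fq` parameter. The sketch below assumes CKAN's default index field names (`organization`, `tags`, `res_format`) and uses `pacioos` as a hypothetical organization identifier. The Location, date, and data provider facets are left out because their parameter and field names are not documented here.

```python
import json
from urllib.parse import urlencode

CKAN_ACTION_API = "https://data.ioos.us/api/3/action"


def solr_phrase(value: str) -> str:
    """Quote a value as a Solr phrase, escaping backslashes and double quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def faceted_search_params(q="*:*", organization=None, tags=(), res_format=None, rows=10):
    """Build package_search parameters that AND together the requested facet filters."""
    filters = []
    if organization:
        filters.append(f"organization:{solr_phrase(organization)}")
    for tag in tags:
        filters.append(f"tags:{solr_phrase(tag)}")
    if res_format:
        filters.append(f"res_format:{solr_phrase(res_format)}")
    params = {
        "q": q,
        "rows": rows,
        # Ask CKAN for facet counts on these fields (returned in result["search_facets"]).
        "facet.field": json.dumps(["organization", "tags", "res_format"]),
    }
    if filters:
        params["fq"] = " AND ".join(filters)
    return params


def to_url(params: dict) -> str:
    return f"{CKAN_ACTION_API}/package_search?{urlencode(params)}"


if __name__ == "__main__":
    params = faceted_search_params(
        organization="pacioos",  # hypothetical organization identifier
        tags=["sea_water_temperature"],
        res_format="OPeNDAP",
    )
    print(to_url(params))
```

Keeping the facets in `fq` rather than `q` leaves the free-text query free for keywords while the filters narrow the result set.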
How can I search for datasets by time window?
The datasets search page provides an option to search by start and/or end time.
I sometimes get different search results from the CS-W service and the Data Catalog. Why is this?
The CS-W service is provided by pycsw, a Python-based OGC Catalog Service implementation. pycsw has its own internal search implementation, which sometimes returns different results than CKAN’s embedded search, which is powered by the Apache Solr search platform. There is a long-term plan to abstract pycsw’s search implementation so that external, pluggable search backends (including Solr) can interoperate with pycsw, but this functionality is not yet available.
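One way to see the difference is to send the same free-text term to both services and compare match counts. The sketch below assumes the CSW 2.0.2 key-value GetRecords binding with a CQL `csw:AnyText` constraint and `resultType=hits` (counts only), and CKAN's `package_search` with `rows=0`. The two queries are not strictly equivalent, which is part of the point; parameter spellings should be checked against the service's GetCapabilities response.

```python
import json
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

CSW_ENDPOINT = "https://data.ioos.us/csw"
CKAN_ACTION_API = "https://data.ioos.us/api/3/action"
CSW_NS = "http://www.opengis.net/cat/csw/2.0.2"


def csw_getrecords_url(text: str) -> str:
    """CSW 2.0.2 GetRecords (KVP) that asks only for the number of matching records."""
    literal = text.replace("'", "''")  # escape single quotes inside the CQL literal
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "resultType": "hits",  # counts only, no records returned
        "constraintLanguage": "CQL_TEXT",
        "constraint_language_version": "1.1.0",
        "constraint": f"csw:AnyText like '%{literal}%'",
    }
    return f"{CSW_ENDPOINT}?{urlencode(params)}"


def ckan_count_url(text: str) -> str:
    """package_search with rows=0 returns the match count without any datasets."""
    return f"{CKAN_ACTION_API}/package_search?{urlencode({'q': text, 'rows': 0})}"


def csw_match_count(response_xml) -> int:
    """Read numberOfRecordsMatched from a csw:GetRecordsResponse document."""
    results = ET.fromstring(response_xml).find(f"{{{CSW_NS}}}SearchResults")
    return int(results.get("numberOfRecordsMatched"))


def ckan_match_count(response_json) -> int:
    """Read result.count from a CKAN action API response."""
    return int(json.loads(response_json)["result"]["count"])


if __name__ == "__main__":
    term = "sea_water_temperature"
    with urlopen(csw_getrecords_url(term), timeout=60) as resp:
        print("CS-W matches:", csw_match_count(resp.read()))
    with urlopen(ckan_count_url(term), timeout=60) as resp:
        print("CKAN matches:", ckan_match_count(resp.read()))
```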
Can I filter search results in the Data Catalog by IOOS data provider or platform operator?
Partially. The Data Catalog's data provider field is populated from the ISO 19115 responsible party whose CI_RoleCode is "originator". With the development of standardized metadata profiles that represent this information in the metadata coming from the IOOS RAs and DACs, the CKAN catalog software could be extended to offer richer filtering options, including concepts such as RA or platform operator. For more information on planned work, see: https://github.com/ioos/catalog/milestone/7.
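To illustrate where the data provider value comes from, the sketch below collects the organisation names of all `originator` parties in an ISO 19139 record. It assumes the provider name is carried in `gmd:organisationName`; the sample record is hypothetical and its element placement is simplified, and this is an illustration rather than the Catalog's own harvester code.

```python
import xml.etree.ElementTree as ET
from typing import List

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Hypothetical, trimmed record with one originator and one publisher.
SAMPLE_RECORD = """<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:contact>
    <gmd:CI_ResponsibleParty>
      <gmd:organisationName><gco:CharacterString>Example Observing Group</gco:CharacterString></gmd:organisationName>
      <gmd:role><gmd:CI_RoleCode codeListValue="originator">originator</gmd:CI_RoleCode></gmd:role>
    </gmd:CI_ResponsibleParty>
  </gmd:contact>
  <gmd:contact>
    <gmd:CI_ResponsibleParty>
      <gmd:organisationName><gco:CharacterString>Example Data Center</gco:CharacterString></gmd:organisationName>
      <gmd:role><gmd:CI_RoleCode codeListValue="publisher">publisher</gmd:CI_RoleCode></gmd:role>
    </gmd:CI_ResponsibleParty>
  </gmd:contact>
</gmd:MD_Metadata>"""


def originators(iso_xml: str) -> List[str]:
    """Organisation names of every CI_ResponsibleParty whose role is 'originator'."""
    root = ET.fromstring(iso_xml)
    names: List[str] = []
    for party in root.iter(f"{{{NS['gmd']}}}CI_ResponsibleParty"):
        role = party.find("gmd:role/gmd:CI_RoleCode", NS)
        if role is None or role.get("codeListValue") != "originator":
            continue
        org = party.find("gmd:organisationName/gco:CharacterString", NS)
        # Skip empty names and avoid listing the same organisation twice.
        if org is not None and org.text and org.text.strip() not in names:
            names.append(org.text.strip())
    return names


if __name__ == "__main__":
    print(originators(SAMPLE_RECORD))
```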
How do I identify oceanographic forecast model output datasets in the Catalog versus oceanographic observations?
At the moment, there is no straightforward way to do this. We hope to address this in the near future. As part of an upcoming development milestone (https://github.com/ioos/catalog/milestone/7), we will investigate enhancements to the CKAN software and develop metadata conventions that let our data providers tag oceanographic forecast model output in a distinctive way, so that the Data Catalog interface can present it separately to users. The ability to filter data by time coverage should complement such a tagging capability, making it possible to search for forecast data covering the present time onward.
Can I search for datasets with particular GCMD keywords and/or CF standard names?
Yes. Under the hood, CKAN uses Solr for search and accepts Solr query strings both in the dataset search on the website and in the `package_search` CKAN API endpoint.
CF Standard Names are primarily provided by the `extras_cf_standard_names` field, which can be queried like any other Solr field, for example `extras_cf_standard_names:sea_water_temperature`.
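Several standard names can be combined with Solr's `OR` operator and sent to the `package_search` API. A minimal sketch, assuming the `extras_cf_standard_names` field named above and the standard CKAN action API path; the validation regex follows the CF rule that standard names consist of lowercase letters, digits, and underscores and begin with a letter, so no Lucene escaping is needed.

```python
import re
from urllib.parse import urlencode

CKAN_ACTION_API = "https://data.ioos.us/api/3/action"
# CF standard names: lowercase letters, digits and underscores, starting with a letter.
CF_NAME = re.compile(r"^[a-z][a-z0-9_]*$")


def cf_standard_names_query(names) -> str:
    """OR together one extras_cf_standard_names clause per standard name."""
    clauses = []
    for name in names:
        if not CF_NAME.match(name):
            raise ValueError(f"not a valid CF standard name: {name!r}")
        clauses.append(f"extras_cf_standard_names:{name}")
    return " OR ".join(clauses)


def cf_search_url(names, rows: int = 10) -> str:
    params = {"q": cf_standard_names_query(names), "rows": rows}
    return f"{CKAN_ACTION_API}/package_search?{urlencode(params)}"


if __name__ == "__main__":
    print(cf_search_url(["sea_water_temperature", "sea_water_electrical_conductivity"]))
```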
GCMD keywords work somewhat differently and are organized by hierarchy.
For example, a query of https://data.ioos.us/dataset?q=gcmd_keywords:"Earth Science > Oceans > Ocean Chemistry > Chlorophyll" is equivalent to specifying any of the following values for the `gcmd_keywords` field, which differ only in letter case and in the spacing around the `>` separators:
- "EARTH SCIENCE > OCEANS > OCEAN CHEMISTRY > CHLOROPHYLL"
- "Earth Science>Oceans>Ocean Chemistry>Chlorophyll"
- "Earth Science > Oceans > Ocean Chemistry > Chlorophyll"
Wildcards can also be used, for example `*Water\ Temperature`. Note the use of the escaped space, per the Lucene query syntax.
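Escaping by hand gets error-prone once a term contains spaces, colons, or parentheses. Because wildcards are not applied inside a quoted phrase in the standard Lucene syntax, a wildcard term needs its spaces escaped instead. A minimal sketch of query-building helpers, modeled on the character set that SolrJ's `ClientUtils.escapeQueryChars` escapes (backslash, `+ - ! ( ) : ^ [ ] " { } ~ * ? | & ; /`, and whitespace); pairing the wildcard with the `gcmd_keywords` field is an assumption for illustration.

```python
# Characters escaped by SolrJ's ClientUtils.escapeQueryChars; whitespace is checked separately.
LUCENE_SPECIAL = set('\\+-!():^[]"{}~*?|&;/')


def escape_term(text: str) -> str:
    """Backslash-escape Lucene/Solr special characters and whitespace in a literal term."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL or ch.isspace() else ch for ch in text)


def phrase_query(field: str, phrase: str) -> str:
    """Quoted phrase query: inside quotes only backslashes and double quotes need escaping."""
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


def suffix_wildcard_query(field: str, suffix: str) -> str:
    """Wildcard term: escape the literal part, then prepend an unescaped '*'."""
    return f"{field}:*{escape_term(suffix)}"


if __name__ == "__main__":
    print(phrase_query("gcmd_keywords", "Earth Science > Oceans > Ocean Chemistry > Chlorophyll"))
    print(suffix_wildcard_query("gcmd_keywords", "Water Temperature"))
```

Here `suffix_wildcard_query("gcmd_keywords", "Water Temperature")` yields `gcmd_keywords:*Water\ Temperature`, with the space escaped and the leading `*` left active as a wildcard.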
References
- Climate and Forecast (CF) Conventions: http://cfconventions.org/
- Attribute Convention for Data Discovery (ACDD): http://wiki.esipfed.org/index.php/Attribute_Convention_for_Data_Discovery