Requesting A Data Catalog
Almost every site or ‘station’ in the NWIS network collects more than one type of data. A simple way to find out what gets collected at a station would be to request everything collected over the past day, like this:
[1]:
import hydrofunctions as hf
karthaus = hf.NWIS('01542500', 'iv', period='P1D')
Requested data from https://waterservices.usgs.gov/nwis/iv/?format=json%2C1.1&sites=01542500&period=P1D
You can list what is contained in the request:
[2]:
karthaus
[2]:
USGS:01542500: WB Susquehanna River at Karthaus, PA
00010: <15 * Minutes> Temperature, water, degrees Celsius
00060: <15 * Minutes> Discharge, cubic feet per second
00065: <15 * Minutes> Gage height, feet
00095: <15 * Minutes> Specific conductance, water, unfiltered, microsiemens per centimeter at 25 degrees Celsius
00300: <15 * Minutes> Dissolved oxygen, water, unfiltered, milligrams per liter
00400: <15 * Minutes> pH, water, unfiltered, field, standard units
Start: 2021-08-09 04:00:00+00:00
End: 2021-08-10 03:30:00+00:00
The basic NWIS object will provide a list of every parameter collected at the site, the frequency of observations for that parameter, the name of the parameter, and the units of the observations. It also tells you the date and time of the first and last observation in the request.
This is great, but it doesn’t tell you when a parameter was first collected, or if a parameter was discontinued. If you leave out the ‘period’ part of the request, the USGS will give you the most recent value for every parameter, no matter how old, but this still doesn’t tell you when observations were first collected.
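The request URL printed above shows how the period argument ends up in the query sent to the USGS. As a rough sketch using only the standard library, the same query string can be assembled with and without a period. The helper name build_iv_url is hypothetical and is not part of hydrofunctions; it only illustrates what changes when the period is left out.

```python
from urllib.parse import urlencode

BASE_URL = "https://waterservices.usgs.gov/nwis/iv/"


def build_iv_url(site, period=None):
    """Assemble an instantaneous-values request URL (illustrative helper only)."""
    params = {"format": "json,1.1", "sites": site}
    if period is not None:
        # ISO 8601 duration, e.g. 'P1D' for the past day
        params["period"] = period
    return BASE_URL + "?" + urlencode(params)


with_period = build_iv_url("01542500", period="P1D")
without_period = build_iv_url("01542500")  # no period: most recent values only

print(with_period)
print(without_period)
```

The first URL matches the one reported for the Karthaus request; the second simply omits the period parameter.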
For more detailed information about the parameters collected at a site, request a ‘data catalog’ using the data_catalog() function. This will return a hydroRDB object containing a table (dataframe) with a row for every parameter that you request, and a header that describes every column in the dataset.
Some of the most useful fields in the data catalog are:
data type code (data_type_cd): describes the frequency of observations
    dv: daily values
    uv, rt, or iv: ‘real time’ data collected more frequently than daily
    sv: site visits conducted irregularly
    ad: values listed in the USGS Annual Water Reports
    more information: https://waterservices.usgs.gov/rest/Site-Service.html#outputDataTypeCd
parameter code (parm_cd): describes the type of data collected
statistic code (stat_cd): describes the statistic used to report the parameter
begin date, end date (begin_date, end_date): the first and last observation made for this parameter
record count (count_nu): the number of records between the begin and end dates
More information about the values in the data catalog is located in the header, and also at https://waterservices.usgs.gov/rest/Site-Service.html
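To make these fields concrete, here is a minimal sketch that picks out the ‘real time’ series and estimates how long each has been collected. The rows are copied from the example catalog for site 01585200 shown later on this page; plain dictionaries stand in for rows of the dataframe so that no request is needed, and record_years is a hypothetical helper.

```python
from datetime import date

# A few rows copied from the example catalog for site 01585200.
# parm_cd 60 is parameter 00060 (discharge); 65 is 00065 (gage height).
catalog_rows = [
    {"data_type_cd": "dv", "parm_cd": 60, "begin_date": "1957-07-01",
     "end_date": "2021-08-08", "count_nu": 19913},
    {"data_type_cd": "qw", "parm_cd": 10, "begin_date": "1965-01-06",
     "end_date": "1987-08-20", "count_nu": 119},
    {"data_type_cd": "uv", "parm_cd": 60, "begin_date": "1996-10-01",
     "end_date": "2021-08-09", "count_nu": 9078},
    {"data_type_cd": "uv", "parm_cd": 65, "begin_date": "2007-10-01",
     "end_date": "2021-08-09", "count_nu": 5061},
]


def record_years(row):
    """Approximate length of record in whole years (365-day years)."""
    begin = date.fromisoformat(row["begin_date"])
    end = date.fromisoformat(row["end_date"])
    return (end - begin).days // 365


# Keep only the 'real time' data types described above.
real_time = [row for row in catalog_rows
             if row["data_type_cd"] in ("uv", "rt", "iv")]

for row in real_time:
    print(row["parm_cd"], row["begin_date"], record_years(row))
```

With a real catalog, the same selection could be made directly on the dataframe by filtering on the data_type_cd column.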
For more information about a site and the data collected at the site, try these sources:
To access information about the site itself, use the site_file() function.
To access the rating curve at a site (for translating water stage into discharge), use the rating_curve() function.
To access field data collected by USGS personnel during site visits, use the field_meas() function.
To access the annual peak discharges at a site, use the peaks() function.
To access daily, monthly, or annual statistics for data at a site, use the stats() function.
Example Usage
[3]:
output = hf.data_catalog('01585200')
Retrieved the data catalog for site #01585200 from https://waterservices.usgs.gov/nwis/site/?format=rdb&sites=01585200&seriesCatalogOutput=true&siteStatus=all
Our new ‘output’ is a hydroRDB object. It has several useful properties, including:
.table, which returns a dataframe of the data. Each row corresponds to a different parameter.
.header, which is the original descriptive header provided by the USGS. It lists and describes the variables in the dataset.
.columns, which is a list of the column names
.dtypes, which is a list of the data types and column widths for each variable in the USGS RDB format.
If you print or evaluate the hydroRDB object, it will return a tuple of the header and dataframe table.
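To illustrate how these four properties relate to the underlying RDB text, here is a rough sketch that splits a tiny hand-made RDB sample into a header, column names, data types, and rows. This is not how hydrofunctions implements its parser: split_rdb is a hypothetical helper, the sample is a shortened stand-in rather than a real USGS response, and the column widths in it are illustrative.

```python
# Hand-made sample: '#' comment lines, a tab-delimited line of column names,
# a line of column formats, then the data rows.
SAMPLE_RDB = (
    "# US Geological Survey\n"
    "# agency_cd -- Agency\n"
    "# site_no -- Site identification number\n"
    "# data_type_cd -- Data type\n"
    "agency_cd\tsite_no\tdata_type_cd\n"
    "5s\t15s\t2s\n"
    "USGS\t01585200\tdv\n"
    "USGS\t01585200\tuv\n"
)


def split_rdb(text):
    """Split RDB text into (header, columns, dtypes, rows)."""
    lines = text.splitlines()
    header_lines = [line for line in lines if line.startswith("#")]
    body = [line for line in lines if line and not line.startswith("#")]
    columns = body[0].split("\t")   # first non-comment line: column names
    dtypes = body[1].split("\t")    # second non-comment line: widths and types
    rows = [dict(zip(columns, line.split("\t"))) for line in body[2:]]
    return "\n".join(header_lines), columns, dtypes, rows


header, columns, dtypes, rows = split_rdb(SAMPLE_RDB)
print(columns)
print(dtypes)
print(rows[1])
```

Reading every value as a string, as this sketch does, keeps the leading zero in the site number.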
[4]:
print(output.header)
#
#
# US Geological Survey
# retrieved: 2021-08-09 23:51:11 -04:00 (sdas01)
#
# The Site File stores location and general information about groundwater,
# surface water, and meteorological sites
# for sites in USA.
#
# File-format description: http://help.waterdata.usgs.gov/faq/about-tab-delimited-output
# Automated-retrieval info: http://waterservices.usgs.gov/rest/Site-Service.html
#
# Contact: gs-w_support_nwisweb@usgs.gov
#
# The following selected fields are included in this output:
#
# agency_cd -- Agency
# site_no -- Site identification number
# station_nm -- Site name
# site_tp_cd -- Site type
# dec_lat_va -- Decimal latitude
# dec_long_va -- Decimal longitude
# coord_acy_cd -- Latitude-longitude accuracy
# dec_coord_datum_cd -- Decimal Latitude-longitude datum
# alt_va -- Altitude of Gage/land surface
# alt_acy_va -- Altitude accuracy
# alt_datum_cd -- Altitude datum
# huc_cd -- Hydrologic unit code
# data_type_cd -- Data type
# parm_cd -- Parameter code
# stat_cd -- Statistical code
# ts_id -- Internal timeseries ID
# loc_web_ds -- Additional measurement description
# medium_grp_cd -- Medium group code
# parm_grp_cd -- Parameter group code
# srs_id -- SRS ID
# access_cd -- Access code
# begin_date -- Begin date
# end_date -- End date
# count_nu -- Record count
#
[5]:
# Transposing the table to show all of the columns as rows:
output.table.T
[5]:
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
agency_cd | USGS | USGS | USGS | USGS | USGS | USGS | USGS | USGS | USGS | USGS | USGS | USGS | USGS | USGS |
site_no | 01585200 | 01585200 | 01585200 | 01585200 | 01585200 | 01585200 | 01585200 | 01585200 | 01585200 | 01585200 | 01585200 | 01585200 | 01585200 | 01585200 |
station_nm | WEST BRANCH HERRING RUN AT IDLEWYLDE, MD | WEST BRANCH HERRING RUN AT IDLEWYLDE, MD | WEST BRANCH HERRING RUN AT IDLEWYLDE, MD | WEST BRANCH HERRING RUN AT IDLEWYLDE, MD | WEST BRANCH HERRING RUN AT IDLEWYLDE, MD | WEST BRANCH HERRING RUN AT IDLEWYLDE, MD | WEST BRANCH HERRING RUN AT IDLEWYLDE, MD | WEST BRANCH HERRING RUN AT IDLEWYLDE, MD | WEST BRANCH HERRING RUN AT IDLEWYLDE, MD | WEST BRANCH HERRING RUN AT IDLEWYLDE, MD | WEST BRANCH HERRING RUN AT IDLEWYLDE, MD | WEST BRANCH HERRING RUN AT IDLEWYLDE, MD | WEST BRANCH HERRING RUN AT IDLEWYLDE, MD | WEST BRANCH HERRING RUN AT IDLEWYLDE, MD |
site_tp_cd | ST | ST | ST | ST | ST | ST | ST | ST | ST | ST | ST | ST | ST | ST |
dec_lat_va | 39.373639 | 39.373639 | 39.373639 | 39.373639 | 39.373639 | 39.373639 | 39.373639 | 39.373639 | 39.373639 | 39.373639 | 39.373639 | 39.373639 | 39.373639 | 39.373639 |
dec_long_va | -76.584333 | -76.584333 | -76.584333 | -76.584333 | -76.584333 | -76.584333 | -76.584333 | -76.584333 | -76.584333 | -76.584333 | -76.584333 | -76.584333 | -76.584333 | -76.584333 |
coord_acy_cd | S | S | S | S | S | S | S | S | S | S | S | S | S | S |
dec_coord_datum_cd | NAD83 | NAD83 | NAD83 | NAD83 | NAD83 | NAD83 | NAD83 | NAD83 | NAD83 | NAD83 | NAD83 | NAD83 | NAD83 | NAD83 |
alt_va | 278.13 | 278.13 | 278.13 | 278.13 | 278.13 | 278.13 | 278.13 | 278.13 | 278.13 | 278.13 | 278.13 | 278.13 | 278.13 | 278.13 |
alt_acy_va | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 |
alt_datum_cd | NAVD88 | NAVD88 | NAVD88 | NAVD88 | NAVD88 | NAVD88 | NAVD88 | NAVD88 | NAVD88 | NAVD88 | NAVD88 | NAVD88 | NAVD88 | NAVD88 |
huc_cd | 2060003 | 2060003 | 2060003 | 2060003 | 2060003 | 2060003 | 2060003 | 2060003 | 2060003 | 2060003 | 2060003 | 2060003 | 2060003 | 2060003 |
data_type_cd | ad | dv | pk | qw | qw | qw | qw | qw | qw | qw | qw | sv | uv | uv |
parm_cd | NaN | 60.0 | NaN | NaN | 10.0 | 20.0 | 28.0 | 61.0 | 65.0 | 30207.0 | 30209.0 | NaN | 60.0 | 65.0 |
stat_cd | NaN | 3.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
ts_id | 0 | 68214 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 69659 | 69660 |
loc_web_ds | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
medium_grp_cd | wat | wat | wat | wat | wat | wat | wat | wat | wat | wat | wat | wat | wat | wat |
parm_grp_cd | NaN | NaN | NaN | ALL | PHY | PHY | INF | PHY | PHY | PHY | PHY | NaN | NaN | NaN |
srs_id | 0 | 1645423 | 0 | 0 | 1645597 | 1645720 | 0 | 1645415 | 17164583 | 17164583 | 1645415 | 0 | 1645423 | 17164583 |
access_cd | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
begin_date | 2006 | 1957-07-01 | 1958-07-06 | 1965-01-06 | 1965-01-06 | 1965-01-06 | 1981-10-06 | 1965-01-06 | 1965-01-06 | 1965-01-06 | 1965-01-06 | 1957-07-10 | 1996-10-01 | 2007-10-01 |
end_date | 2020 | 2021-08-08 | 2020-07-22 | 1987-08-20 | 1987-08-20 | 1987-08-20 | 1984-01-30 | 1987-08-20 | 1987-08-20 | 1987-08-20 | 1987-08-20 | 2021-06-17 | 2021-08-09 | 2021-08-09 |
count_nu | 15 | 19913 | 54 | 119 | 119 | 119 | 7 | 119 | 118 | 118 | 119 | 511 | 9078 | 5061 |
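In the table above, parm_cd, stat_cd and huc_cd are displayed as numbers (60.0, 3.0, 2060003), so the leading zeros of codes such as 00060 are not shown. Below is a minimal sketch for restoring them, assuming five-digit parameter and statistic codes and an eight-digit hydrologic unit code. pad_code is a hypothetical helper and is not part of hydrofunctions.

```python
import math


def pad_code(value, width):
    """Return a zero-padded code string, or None when the value is missing."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return None
    return str(int(value)).zfill(width)


discharge = pad_code(60.0, 5)        # parm_cd as shown in the table
mean_stat = pad_code(3.0, 5)         # stat_cd as shown in the table
huc = pad_code(2060003, 8)           # huc_cd as shown in the table
missing = pad_code(float("nan"), 5)  # NaN entries stay missing

print(discharge, mean_stat, huc, missing)
```

The padded parameter code can then be matched against the codes listed by an NWIS request, such as 00060 for discharge.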