hydrofunctions 0.2.0 package

Package contents

Hydrofunctions

Hydrofunctions is a suite of convenience functions to help you explore hydrology data interactively.

Basic Usage:

>>> import hydrofunctions as hf

>>> site = '01570500'
>>> harrisburg = hf.NWIS(site, 'iv', period='P10D')
Requested data from https://waterservices.usgs.gov/nwis/iv/?format=json%2C1.1&sites=01570500&period=P10D

>>> harrisburg.ok
True

Examine the dataset:

>>> harrisburg
USGS:01570500: Susquehanna River at Harrisburg, PA
    00045: <30 * Minutes> Precipitation, total, inches
    00060: <30 * Minutes> Discharge, cubic feet per second
    00065: <30 * Minutes> Gage height, feet
Start: 2019-04-06 00:30:00+00:00
End:   2019-04-15 23:00:00+00:00

The listing reports each of the parameters collected at the site that was requested, how frequently the data are collected, and the name of the parameter written out with units. The start and end of the dataset are given in Universal Time (UTC).

View the first five rows of a dataframe that only contains the discharge data:

>>> harrisburg.df('discharge').head()
                           USGS:01570500:00060:00000
datetimeUTC
2019-04-06 00:30:00+00:00                    44200.0
2019-04-06 01:00:00+00:00                    44000.0
2019-04-06 01:30:00+00:00                    44000.0
2019-04-06 02:00:00+00:00                    43700.0
2019-04-06 02:30:00+00:00                    43700.0

Because the .df() method returns a dataframe, you have access to all of the methods associated with Pandas dataframes, including .plot(), .describe(), and .info().

List all of the different attributes and methods with dir():

>>> dir(harrisburg)

Read more about Hydrofunctions here: https://hydrofunctions.readthedocs.io/

hydrofunctions.charts module

hydrofunctions.charts

This module contains charting functions for Hydrofunctions.


hydrofunctions.charts.cycleplot(Qseries, cycle='diurnal', compare=None, y_label='Discharge (ft³/s)', legend=True, legend_loc='best', title='')[source]

Creates a chart to illustrate annual and diurnal cycles.

This chart will use the pandas groupby method to plot the mean and median values for a time-indexed dataframe. It helps you identify diurnal patterns by plotting the mean and median values over 24 hours for a diurnal pattern, and over a year for annual patterns.

This function will also use the ‘compare’ argument to create a series of charts to compare how well these cycles appear in different groups. For example, is the diurnal cycle more visible in December versus June? In this case, you would use:

hf.cycleplot(myDataFrame, cycle='diurnal', compare = 'month')

This will produce twelve charts, each covering 24 hours. A line will represent the mean values over 24 hours, another line represents the median, and two grey stripes represent the 0.4 to 0.6 quantile, and the 0.2 to 0.8 quantile range.
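The binning behind a diurnal plot can be illustrated without pandas. The sketch below is not the library's implementation (which, per the description above, relies on the pandas groupby method); it uses a hypothetical helper, diurnal_summary, and made-up readings to show how values are grouped by hour of day before the mean and median are taken.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, median


def diurnal_summary(readings):
    """Group (timestamp, value) pairs by hour of day.

    Returns {hour: (mean, median)}. Hypothetical helper for illustration;
    it is not part of hydrofunctions.
    """
    bins = defaultdict(list)
    for timestamp, value in readings:
        bins[timestamp.hour].append(value)
    return {hour: (mean(vals), median(vals)) for hour, vals in sorted(bins.items())}


# Made-up discharge readings spanning three days.
readings = [
    (datetime(2019, 4, 6, 6, 0), 100.0),
    (datetime(2019, 4, 7, 6, 0), 120.0),
    (datetime(2019, 4, 8, 6, 0), 200.0),
    (datetime(2019, 4, 6, 18, 0), 80.0),
    (datetime(2019, 4, 7, 18, 0), 90.0),
]
summary = diurnal_summary(readings)
print(summary[6])   # mean and median of the 06:00 readings
print(summary[18])  # mean and median of the 18:00 readings
```

A single large reading pulls the mean away from the median, which is one reason the chart draws both lines.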

Parameters:
  • Qseries (series) –

    a Pandas series of discharge values.

    • Values should be arranged in columns
    • Should use a DatetimeIndex
  • cycle (str) –

    The period of the cycle to be illustrated, along with the method for binning. The options are:

    • diurnal (default): plots the values for a 24 hour cycle.
    • diurnal-smallest: uses the smallest increment of time available to bin the time units for a 24 hour cycle.
    • diurnal-hour: uses hours to bin measurements for a 24-hour cycle.
    • annual: plots values into a 365 day cycle.
    • annual-day: the annual cycle using 365 day-long bins.
    • annual-week: the annual cycle using 52 week-long bins.
    • annual-month: the annual cycle using 12 month-long bins.
    • weekly: a 7-day cycle using seven 24-hour long bins. Note that unlike the others, this is not a natural cycle, and likely has anthropogenic origins.
  • compare (str) –

    The system for splitting the data into groups for a set of comparison charts.

    • None (default): No comparison will be made; only one chart.
    • month: twelve plots will be produced, one for each month.
    • weekday: seven plots will be produced, one for each day of the week.
    • weekend: two plots will be produced, one for the five weekdays, one for Saturday and Sunday.
    • night: two plots will be produced, one for night (6pm to 6am), one for day (6am to 6pm).
  • y_label (str) – The label for the y axis.
  • legend (bool) – default True. Whether the legend should be plotted.
  • legend_loc (str) –

    default is ‘best’. The location of the legend.

    • ’best’: Automatically choose the option below with the least overlap.
    • ’upper left’, ‘upper right’, ‘lower left’, ‘lower right’: place the legend at the corresponding corner of the axes/figure.
    • ’upper center’, ‘lower center’, ‘center left’, ‘center right’: place the legend at the center of the corresponding edge of the axes/figure.
    • ’center’: place the legend at the center of the axes/figure.
    • The location can also be a 2-tuple giving the coordinates of the lower-left corner of the legend in axes coordinates.
  • title (str) – default is ‘’. Text to use as a figure title. If no text is provided, no title will be created (default).
Returns:

Returns a tuple that includes a matplotlib ‘figure’ and ‘axes’. The figure is a container with all of the drawing inside of it; the axes are an array of matplotlib charts. Together, they will plot immediately in a Jupyter notebook if the command %matplotlib inline was previously issued. The figure and axes may be altered after they are returned.

Return type:

fig, ax (matplotlib.figure.Figure, matplotlib.axes.Axes)

Note

Inspired by Jake VanderPlas. 2016. Python Data Science Handbook. O’Reilly Media, Inc. https://jakevdp.github.io/PythonDataScienceHandbook/03.11-working-with-time-series.html

hydrofunctions.charts.flow_duration(Qdf, xscale='logit', yscale='log', ylabel='Stream Discharge (m³/s)', symbol='.', legend=True, legend_loc='best', title='')[source]

Creates a flow duration chart from a dataframe of discharges.

Parameters:
  • Qdf (dataframe) –

    a dataframe of discharge values.

    • Values should be arranged in columns
    • No sorting necessary
    • Rows do not need an index
    • If more than one column, each column will be added as a separate color to the chart.
    • Only include columns with discharge values; no metadata
  • xscale (str, 'logit' | 'linear') – Type of x scale for plotting probabilities. Default is ‘logit’, so that each standard deviation is nearly the same distance on the x scale. ‘linear’ is the other option.
  • yscale (str, 'log' | 'linear') – The type of y scale for plotting discharge. Default is ‘log’.
  • ylabel (str, default ‘Stream Discharge (m³/s)’) – The label for the Y axis.
  • xlabel (not implemented) –
  • symbol (str, '.' | ',') –

    formatting symbol for points.

    • point: ‘.’ (default)
    • pixel point: ‘,’
    • circle: ‘o’
    • triangle up: ‘^’

    See https://matplotlib.org/api/markers_api.html for full list of point formatters.

  • legend (bool, default True) – Whether the legend should be plotted.
  • legend_loc (str, default best) –

    the location of the legend.

    • ’best’: Automatically choose the option below with the least overlap.
    • ’upper left’, ‘upper right’, ‘lower left’, ‘lower right’: place the legend at the corresponding corner of the axes/figure.
    • ’upper center’, ‘lower center’, ‘center left’, ‘center right’: place the legend at the center of the corresponding edge of the axes/figure.
    • ’center’: place the legend at the center of the axes/figure.
    • The location can also be a 2-tuple giving the coordinates of the lower-left corner of the legend in axes coordinates.
  • title (str, default ‘’) – Text to use as a figure title. If no text is provided, no title will be created (default).
Returns:

Returns a tuple that includes a matplotlib ‘figure’ and ‘axes’. The figure is a container with all of the drawing inside of it; the axes are an array of matplotlib charts. Together, they will plot immediately in a Jupyter notebook if the command %matplotlib inline was previously issued. The figure and axes may be altered after they are returned.

Return type:

fig, ax (matplotlib.figure.Figure, matplotlib.axes.Axes)
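A flow duration chart plots each discharge against the fraction of time it is equalled or exceeded. The sketch below shows one common way to compute those fractions, the Weibull plotting position rank / (n + 1); whether flow_duration() uses this exact formula is an assumption, and exceedance_probabilities and logit are hypothetical helpers that are not part of hydrofunctions. The logit transform shows why a 'logit' x scale stretches both tails of the distribution symmetrically.

```python
import math


def exceedance_probabilities(discharges):
    """Return (discharge, probability) pairs, largest discharge first.

    Probability uses the Weibull plotting position rank / (n + 1),
    an assumption made for this sketch.
    """
    ordered = sorted(discharges, reverse=True)
    n = len(ordered)
    return [(q, rank / (n + 1)) for rank, q in enumerate(ordered, start=1)]


def logit(p):
    """Log-odds transform used by a 'logit' axis scale."""
    return math.log(p / (1 - p))


# Made-up discharge values; no sorting is needed beforehand.
pairs = exceedance_probabilities([30.0, 10.0, 40.0, 20.0])
for q, p in pairs:
    print(q, p, round(logit(p), 3))
```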

hydrofunctions.exceptions module

hydrofunctions.exceptions

This module contains all of the custom exceptions defined in this package. The base class is HydroException and all custom exceptions are subclasses of HydroException.

Use the errors like this:

from hydrofunctions.exceptions import HydroNoDataError, HydroEncodeError

try:
    # some code here that might return no data
    # more code that might get encoded improperly
    pass
except HydroNoDataError:
    # handle error here.
    pass
except HydroEncodeError:
    # handle this error here.
    pass
else:
    # code to complete if there is no exception raised.
    pass
finally:
    # code that you want to run whether an exception is raised or not.
    # If an exception wasn't caught, then this code gets run, and the
    # exception gets re-raised after this finally clause gets run.
    pass

Keep the try clause short: if you put too many things in there, it can be difficult to figure out what broke. On the other hand, as in the example above, it is more readable if you group a series of statements and then handle their exceptions together.

Example:

>>> raise HydroNoDataError("Oh no, NWIS doesn't have this data for you!")

https://axialcorps.com/2013/08/29/5-simple-rules-for-building-great-python-packages/


exception hydrofunctions.exceptions.HydroEncodeError[source]

Bases: hydrofunctions.exceptions.HydroException

Raised when an error occurs while encoding or decoding an argument.

Example:

try:
    # bunch of code from your package
    pass
except HydroException:
    # blanket condition to handle all errors from your package
    pass

exception hydrofunctions.exceptions.HydroException[source]

Bases: Exception

This is the base class for all exceptions created for the HydroFunctions package. This class is not meant to be raised.

exception hydrofunctions.exceptions.HydroNoDataError[source]

Bases: hydrofunctions.exceptions.HydroException

Raised when a service returns an empty dataset or indicates that it has no data for the request.

Usage:

raise HydroNoDataError("The NWIS service had no data for this request.")

Do not catch this error for interactive sessions: The user should get a useful message from the error when they try to request something that doesn’t exist.

Catch this error in automated systems so that the system can reconsider the request and either fix the request or move on to the next request.

Example:

try:
    hf.NWIS('666666666')
except HydroNoDataError as err:
    print("This is just to illustrate how to capture this error.")
    print(err)
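
In an automated system, the pattern described above might look like the following sketch. To stay runnable without a network connection it defines local stand-ins: the two exception classes mirror the hierarchy documented here, while fetch and its canned data are hypothetical.

```python
class HydroException(Exception):
    """Stand-in for hydrofunctions.exceptions.HydroException."""


class HydroNoDataError(HydroException):
    """Stand-in for hydrofunctions.exceptions.HydroNoDataError."""


def fetch(site):
    """Hypothetical data request; raises when a site has no data."""
    canned = {"01570500": [44200.0, 44000.0]}
    if site not in canned:
        raise HydroNoDataError(f"No data for site {site}.")
    return canned[site]


results, skipped = {}, []
for site in ["01570500", "666666666"]:
    try:
        results[site] = fetch(site)
    except HydroNoDataError as err:
        # Record the failure and move on to the next request.
        skipped.append((site, str(err)))

print(results)
print(skipped)
```

Catching only HydroNoDataError, rather than the HydroException base class, lets other problems surface instead of being silently skipped.
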
exception hydrofunctions.exceptions.HydroUserWarning[source]

Bases: UserWarning

Warn user of a hazardous condition or when an action has been triggered that may be unexpected.

This is the base class for all warnings created for the HydroFunctions package. This class can be used if there is no more specific warning available.

Usage:

import warnings
# ... code
warnings.warn('This is my warning message.', HydroUserWarning)

Note

Warnings can be hidden or turned off depending on how the user is accessing Python and the settings for their interface.

Use HydroException if a process must be shut down, or is doomed to fail anyway. This will at least give the user a helpful error message.
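Because warnings can be silenced by the user's settings, it can help to see how one is emitted and captured. The sketch below defines a local stand-in for HydroUserWarning and a hypothetical resample_notice function; only the standard-library warnings calls are real.

```python
import warnings


class HydroUserWarning(UserWarning):
    """Stand-in for hydrofunctions.exceptions.HydroUserWarning."""


def resample_notice():
    """Hypothetical function that warns but still returns a result."""
    warnings.warn("One series is sampled less often than another.", HydroUserWarning)
    return "finished"


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not suppressed
    outcome = resample_notice()

print(outcome)
print(caught[0].category.__name__, caught[0].message)
```

Unlike an exception, the warning does not interrupt the function: the return value still arrives, and the caller decides what to do with the recorded warning.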

hydrofunctions.helpers module

hydrofunctions.helpers

This module holds functions designed to help out the user in an IPython session.


hydrofunctions.helpers.count_number_of_truthy(my_list)[source]
hydrofunctions.helpers.draw_map(width=700, height=400, url='http://hydrocloud.org')[source]

Draws a map of stream gages in a Jupyter Notebook.

This function will draw an interactive map of stream gages from hydrocloud.org into an iframe and display it in a Jupyter Notebook. Each dot represents a stream gage. Click on the dot to learn its name, which you can use to request data.

Parameters:
  • width (int) – The width of the map iframe.
  • height (int) – The height of the map iframe.
  • url (str) – The URL to put inside of the IFrame. Defaults to http://hydrocloud.org
Returns:

HTML display object.

Example:

>>> import hydrofunctions as hf
>>> hf.draw_map()

A map appears.

>>> hf.draw_map(width=900, height=600)

Draws a larger map.

hydrofunctions.hydrofunctions module

hydrofunctions.hydrofunctions

This module contains the main functions used in an interactive session.


hydrofunctions.hydrofunctions.calc_freq(index)[source]
hydrofunctions.hydrofunctions.extract_nwis_df(nwis_dict, interpolate=True)[source]

Returns a Pandas dataframe and a metadata dict from the NWIS response object or the json dict of the response.

Parameters:

nwis_dict (obj) – the json from a response object as returned by get_nwis().json(). Alternatively, you may supply the response object itself.

Returns:

a pandas dataframe.

Raises:
  • HydroNoDataError – when the request is valid, but NWIS has no data for the parameters provided in the request.
  • HydroUserWarning – when one dataset is sampled at a lower frequency than another dataset in the same request.
hydrofunctions.hydrofunctions.get_nwis(site, service='dv', start_date=None, end_date=None, stateCd=None, countyCd=None, bBox=None, parameterCd='all', period=None)[source]

Request stream gauge data from the USGS NWIS.

Parameters:
  • site (str or list of strings) – a valid site is ‘01585200’ or [‘01585200’, ‘01646502’]. site should be None if stateCd or countyCd are not None.
  • service (str) –
    can either be ‘iv’ or ‘dv’ for instantaneous or daily data.
    • ’dv’(default): daily values. Mean value for an entire day.
    • ’iv’: instantaneous value measured at this time. Also known as ‘Real-time data’. Can be measured as often as every five minutes by the USGS. 15 minutes is more typical.
  • start_date (str) – should take on the form yyyy-mm-dd
  • end_date (str) – should take on the form yyyy-mm-dd
  • stateCd (str) – a valid two-letter state postal abbreviation. Default is None.
  • countyCd (str or list of strings) – a valid county FIPS code. Default is None.
  • bBox (str, list, or tuple) –
    a set of coordinates that defines a bounding box.
    • Coordinates are in decimal degrees
    • Longitude values are negative (west of the prime meridian).
    • Latitude values are positive (north of the equator).
    • comma-delimited, no spaces, if provided as a string.
    • The order of the boundaries should be: “West,South,East,North”
    • Example: “-83.000000,36.500000,-81.000000,38.500000”
  • parameterCd (str or list of strings) –
    NWIS parameter code. Usually a five digit code. Default is ‘all’. A valid code can also be given as a list: parameterCd=['00060','00065']
    • if value of ‘all’ is submitted, then NWIS will return every parameter collected at this site. (default option)
    • stage: ‘00065’
    • discharge: ‘00060’
    • not all sites collect all parameters!
    • See https://nwis.waterdata.usgs.gov/usa/nwis/pmcodes for full list
  • period (str) –
    NWIS period code. Default is None.
    • Format is “PxxD”, where xx is the number of days before today.
    • Either use start_date or period, but not both.
Returns:

a response object. This function will always return the response, even if the NWIS returns a status_code that indicates a problem.

  • response.url: the url we used to request data
  • response.json(): the content decoded from JSON
  • response.status_code: the internet status code
  • response.ok: True when we get a ‘200’ status_code

Raises:
  • ConnectionError – due to connection problems like refused connection or DNS Error.
  • SyntaxWarning – when NWIS returns a response code that is not 200.

Example:

>>> import hydrofunctions as hf
>>> response = hf.get_nwis('01585200', 'dv', '2012-06-01', '2012-07-01')
>>> response
<Response [200]>
>>> response.json()
*JSON ensues*
>>> hf.extract_nwis_df(response)
*a Pandas dataframe appears*

Other Valid Ways to Make a Request:

>>> sites = ['07180500', '03380475', '06926000'] # Request a list of sites.
>>> service = 'iv'  # Request real-time data
>>> days = 'P10D'  # Request the last 10 days.
>>> stage = '00065' # Sites that collect discharge usually collect water depth too.
>>> response2 = hf.get_nwis(sites, service, period=days, parameterCd=stage)

Request Data By Location:

>>> # Request the most recent daily data for every site in Maine
>>> response3 = hf.get_nwis(None, 'dv', stateCd='ME')
>>> response3
<Response [200]>

The specification for the USGS NWIS IV service is located here: http://waterservices.usgs.gov/rest/IV-Service.html
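The request URL echoed in the Basic Usage section can be reproduced with the standard library. This is a sketch of the idea rather than the code hydrofunctions runs: build_nwis_url is a hypothetical helper, only the format, sites, and period query parameters are taken from that echoed URL, and joining a list of sites with commas is an assumption based on the comma-separated gage IDs accepted elsewhere in this package.

```python
from urllib.parse import urlencode


def build_nwis_url(site, service="dv", period=None):
    """Assemble an NWIS query URL (hypothetical helper, not the library's)."""
    # A list of sites is sent as one comma-separated value.
    sites = site if isinstance(site, str) else ",".join(site)
    params = {"format": "json,1.1", "sites": sites}
    if period is not None:
        params["period"] = period
    return f"https://waterservices.usgs.gov/nwis/{service}/?{urlencode(params)}"


url = build_nwis_url("01570500", "iv", period="P10D")
print(url)
```

Note that urlencode percent-encodes the comma in the format value, which is why the echoed URL shows json%2C1.1.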

hydrofunctions.hydrofunctions.get_nwis_property(nwis_dict, key=None, remove_duplicates=False)[source]

Returns a list containing property data from an NWIS response object.

Parameters:
  • nwis_dict (dict) – the json returned in a response object as produced by get_nwis().json().
  • key (str) –
    a valid NWIS response property key. Default is None. The index is returned if key is None. Valid keys are:
    • None
    • name - constructed name “provider:site:parameterCd:statistic”
    • siteName
    • siteCode
    • timeZoneInfo
    • geoLocation
    • siteType
    • siteProperty
    • variableCode
    • variableName
    • variableDescription
    • valueType
    • unit
    • options
    • noDataValue
  • remove_duplicates (bool) – a flag used to remove duplicate values in the returned list.
Returns:

a list with the data for the passed key string.

Raises:
  • HydroNoDataError – when the request is valid, but NWIS has no data for the parameters provided in the request.
  • ValueError – when the key is not available.
hydrofunctions.hydrofunctions.nwis_custom_status_codes(response)[source]

Raise custom warning messages from the NWIS when it returns a status_code that is not 200.

Parameters:response – a response object as returned by get_nwis().
Returns:
  • None
    if response.status_code == 200
  • response.status_code
    for all other status codes.
Raises:SyntaxWarning – when a non-200 status code is returned. https://en.wikipedia.org/wiki/List_of_HTTP_status_codes

Note

To raise an exception, call response.raise_for_status(). This will raise requests.exceptions.HTTPError with a helpful message, or it will return None for status code 200. From: http://docs.python-requests.org/en/master/user/quickstart/#response-status-codes

NWIS status_code messages come from:
https://waterservices.usgs.gov/docs/portable_code.html
Additional status code documentation:
https://waterservices.usgs.gov/rest/IV-Service.html#Error
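
The behaviour described above, warn on anything other than 200 and hand back the status code, can be sketched as follows. The function name warn_on_status and the message text are hypothetical, and a SimpleNamespace stands in for a Requests response so that no network access is needed.

```python
import warnings
from types import SimpleNamespace


def warn_on_status(response):
    """Return None for status 200; otherwise warn and return the code."""
    if response.status_code == 200:
        return None
    warnings.warn(
        f"NWIS returned status code {response.status_code}.", SyntaxWarning
    )
    return response.status_code


fake_ok = SimpleNamespace(status_code=200)
fake_bad = SimpleNamespace(status_code=400)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    first = warn_on_status(fake_ok)
    second = warn_on_status(fake_bad)

print(first, second, len(caught))
```
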
hydrofunctions.hydrofunctions.read_parquet(filename)[source]
hydrofunctions.hydrofunctions.save_parquet(filename, dataframe, hf_meta)[source]
hydrofunctions.hydrofunctions.select_data(nwis_df)[source]

Create a boolean array of columns that contain data.

Parameters:nwis_df – A pandas dataframe created by extract_nwis_df.
Returns:an array of Boolean values corresponding to the columns in the original dataframe.

Example

>>> my_dataframe.loc[:, select_data(my_dataframe)]

returns a dataframe with only the data columns; the qualifier columns do not show.

hydrofunctions.station module

hydrofunctions.station

This module contains the Station and NWIS classes, which are used for organizing and managing data for data collection sites.


class hydrofunctions.station.NWIS(site=None, service='dv', start_date=None, end_date=None, stateCd=None, countyCd=None, bBox=None, parameterCd='all', period=None, file=None)[source]

Bases: hydrofunctions.station.Station

A class for working with data from the USGS NWIS service.

Parameters:
  • site (str or list of strings) – a valid site is ‘01585200’ or [‘01585200’, ‘01646502’]. Default is None. If site is not specified, you will need to select sites using stateCd or countyCd.
  • service (str) –
    can either be ‘iv’ or ‘dv’ for instantaneous or daily data.
    • ’dv’(default): daily values. Mean value for an entire day.
    • ’iv’: instantaneous value measured at this time. Also known as ‘Real-time data’. Can be measured as often as every five minutes by the USGS. 15 minutes is more typical.
  • start_date (str) – should take on the form ‘yyyy-mm-dd’
  • end_date (str) – should take on the form ‘yyyy-mm-dd’
  • stateCd (str) – a valid two-letter state postal abbreviation, such as ‘MD’. Default is None. Selects all stations in this state. Because this type of site selection returns a large number of sites, you should limit the amount of data requested for each site.
  • countyCd (str or list of strings) – a valid county FIPS code. Default is None. Requests all stations within the county or list of counties. See https://en.wikipedia.org/wiki/FIPS_county_code for an explanation of FIPS codes.
  • bBox (str, list, or tuple) –
    a set of coordinates that defines a bounding box.
    • Coordinates are in decimal degrees.
    • Longitude values are negative (west of the prime meridian).
    • Latitude values are positive (north of the equator).
    • comma-delimited, no spaces, if provided as a string.
    • The order of the boundaries should be: “West,South,East,North”
    • Example: “-83.000000,36.500000,-81.000000,38.500000”
  • parameterCd (str or list of strings) –

    NWIS parameter code. Usually a five digit code. Default is ‘all’. A valid code can also be given as a list: parameterCd=['00060','00065']. This will request data for these parameters.

    • if value is ‘all’, or no value is submitted, then NWIS will return every parameter collected at this site. (default option)
    • stage: ‘00065’
    • discharge: ‘00060’
    • not all sites collect all parameters!
    • See https://nwis.waterdata.usgs.gov/usa/nwis/pmcodes for full list
  • period (str) –
    NWIS period code. Default is None.
    • Format is “PxxD”, where xx is the number of days before today, with a maximum of 999 days accepted.
    • Either use start_date or period, but not both.
df(*args)[source]

Return a subset of columns from the dataframe.

Parameters:
  • '' – If no args are provided, the entire dataframe will be returned.
  • 'all' (str) – the entire dataframe will be returned.
  • 'data' (str) – all of the parameters will be returned, with no flags.
  • 'flags' (str) – Only the _qualifier flags will be returned. Unless the flags arg is provided, only data columns will be returned. Visit https://waterdata.usgs.gov/usa/nwis/uv?codes_help#dv_cd1 to see a more complete listing of possible codes.
  • 'discharge' or 'q' (str) – discharge columns (‘00060’) will be returned.
  • 'stage' (str) – Gauge height columns (‘00065’) will be returned.
  • any five digit number (str) – any matching parameter columns will be returned. ‘00065’ returns stage, for example.
  • any eight to twelve digit number (str) – any matching stations will be returned.
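
The column names shown in the Basic Usage section follow the pattern provider:site:parameterCd:statistic, so selecting columns comes down to matching parts of that name. The sketch below is a simplified illustration with a hypothetical select_columns helper and made-up column names; the real method also handles 'all', 'data', and 'flags', which are left out here.

```python
ALIASES = {"discharge": "00060", "q": "00060", "stage": "00065"}


def select_columns(columns, *args):
    """Return the column names that match every argument.

    Five-digit strings are treated as parameter codes and longer digit
    strings as station IDs. Hypothetical helper, not the library's code.
    """
    selected = list(columns)
    for arg in args:
        code = ALIASES.get(arg, arg)
        if len(code) == 5:
            selected = [c for c in selected if c.split(":")[2] == code]
        else:
            selected = [c for c in selected if c.split(":")[1] == code]
    return selected


columns = [
    "USGS:01570500:00060:00000",
    "USGS:01570500:00065:00000",
    "USGS:01541200:00060:00000",
]
print(select_columns(columns, "discharge"))
print(select_columns(columns, "stage", "01570500"))
```
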
get_data()[source]

Deprecated since version 0.2.0: No longer needed. The NWIS object will request data upon creation.

read(file)[source]

Read a dataframe and metadata from a parquet file.

Parameters:file (str) – the filename to read from.
save(file)[source]

Save the dataframe and metadata to a parquet file.

Parameters:file (str) – the filename to save to.
class hydrofunctions.station.Station(site=None)[source]

Bases: object

A class for organizing stream gauge data for a single request.

station_dict = {}

hydrofunctions.typing module

hydrofunctions.typing

This module contains functions for testing that user input is valid.

Why ‘pre-check’ user inputs, instead of using standard Python duck typing? These functions are meant to enhance an interactive session for the user, and will check a user’s parameters before requesting data from an online resource. Otherwise, the server will return a 404 code and the user will have no idea why. Hydrofunctions tries to raise an exception (usually a TypeError) before a request is made, so that the user can fix their request. It also tries to provide a helpful error message to an interactive session user.

Suggested format for these functions:

  • first check that the input is a string,
  • then do a regular expression to check that the input is more or less valid.
  • raise exceptions when user input breaks format.
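
The three steps above can be sketched as follows for a period string. check_period_sketch is a hypothetical stand-in rather than the library's check_period(); the one-to-three-digit limit mirrors the 999-day maximum noted for the period parameter, and the choice of TypeError for both failures is an assumption.

```python
import re


def check_period_sketch(value):
    """Validate a 'PxxD' period string before any request is made."""
    # Step 1: the input must be a string.
    if not isinstance(value, str):
        raise TypeError(f"period must be a string, not {type(value).__name__}.")
    # Step 2: a regular expression checks the overall shape.
    if re.fullmatch(r"P\d{1,3}D", value) is None:
        # Step 3: raise a helpful error when the format is broken.
        raise TypeError(f"period should look like 'P10D'; received {value!r}.")
    return value


print(check_period_sketch("P10D"))
for bad in (10, "10D", "P1000D"):
    try:
        check_period_sketch(bad)
    except TypeError as err:
        print(err)
```
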

hydrofunctions.typing.check_NWIS_bBox(input)[source]

Checks that the USGS bBox is valid.
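
As a rough sketch of what such a check involves, the hypothetical helper below accepts a bounding box as a string, list, or tuple, confirms that the West,South,East,North ordering makes sense, and returns the comma-delimited string form. The specific rules it enforces are assumptions drawn from the bBox description for get_nwis(), not the library's actual tests.

```python
def normalize_bbox(bbox):
    """Return a bounding box as a 'West,South,East,North' string."""
    parts = bbox.split(",") if isinstance(bbox, str) else list(bbox)
    if len(parts) != 4:
        raise ValueError("bBox needs exactly four values: West,South,East,North.")
    west, south, east, north = (float(p) for p in parts)
    if not (west < east and south < north):
        raise ValueError("bBox order should be West,South,East,North.")
    # Six decimal places matches the example in the parameter description.
    return ",".join(f"{v:.6f}" for v in (west, south, east, north))


print(normalize_bbox((-83.0, 36.5, -81.0, 38.5)))
print(normalize_bbox("-83.000000,36.500000,-81.000000,38.500000"))
```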

hydrofunctions.typing.check_NWIS_service(input)[source]

Checks that the service is valid: either ‘iv’ or ‘dv’

hydrofunctions.typing.check_datestr(input)[source]

Checks that the start_date or end_date parameter is in yyyy-mm-dd format.

hydrofunctions.typing.check_parameter_string(candidate, param)[source]

Checks that a parameter is a string or a list of strings.

hydrofunctions.typing.check_period(input)[source]

Checks that the period parameter is in the P##D format, where ## is the number of days before now.

hydrofunctions.usgs_rdb module

hydrofunctions.usgs_rdb

This module is for working with the various USGS dataservices that use the rdb text format. These include the statistics service, the field measurements service, the rating curve service, and the peak discharge service.

hydrofunctions.usgs_rdb.data_catalog(site)[source]

Load a history of the data collected at a site into a Pandas dataframe.

Parameters:site (str) – The gauge ID number for the site.
Returns:a hydroRDB object or tuple consisting of the header and a pandas dataframe.

Example:

>>> test = data_catalog('01542500')
>>> test
hydroRDB(header=<a multi-line string of the header>,
         table=<a Pandas dataframe>)

You can also access the header, dataframe, column names, and data types through the associated properties header, table, columns, dtypes:

>>> test.table
<a Pandas dataframe>
hydrofunctions.usgs_rdb.field_meas(site)[source]

Load USGS field measurements of stream discharge into a Pandas dataframe.

Parameters:site (str) – The gauge ID number for the site.
Returns:a hydroRDB object or tuple consisting of the header and a pandas dataframe. Each row of the table represents an observation on a given date of river conditions at the gauge by USGS personnel. Values are stored in columns, and include the measured stream discharge, channel width, channel area, depth, and velocity.

Example:

>>> test = field_meas('01542500')
>>> test
hydroRDB(header=<a multi-line string of the header>,
         table=<a Pandas dataframe>)

You can also access the header, dataframe, column names, and data types through the associated properties header, table, columns, dtypes:

>>> test.table
<a Pandas dataframe>
Discussion:

The USGS operates over 8,000 stream gages around the United States and territories. Each of these sensors records the depth, or ‘stage’ of the water. In order to translate this stage data into stream discharge, the USGS staff creates an empirical relationship called a ‘rating curve’ between the river stage and stream discharge. To construct this curve, USGS personnel visit each gage every one to eight weeks, and measure the stage and the discharge of the river manually.

The field_meas() function returns all of the field-collected data for this site. The USGS uses these data to create the rating curve. You can use these data to see how the site has changed over time, or to read the notes about local conditions.

The rating_curve() function returns the most recent ‘expanded shift-adjusted’ rating curve constructed for this site. This is the current official rating curve.

To plot a rating curve from the field measurements, use:

>>> header, data = hf.field_meas('01581830')

>>> data.plot(x='gage_height_va', y='discharge_va', kind='scatter')

Rating curves are typically plotted with the independent variable, gage_height, on the Y axis.

hydrofunctions.usgs_rdb.get_usgs_RDB_service(url, headers=None, params=None)[source]

Request data from a USGS dataservice and handle errors.

Parameters:
  • url (str) – a string used by Requests as the base URL.
  • headers (dict) – a dict of parameters used to request the data.
  • params (dict) – a dict of parameters used to modify the url of a REST service.
Returns:

A Requests response object.

Raises:

This function will raise an exception for any non-200 status code, and in cases where the USGS service returns anything that is not obviously an RDB file. If an exception is raised, then an attempt will be made to display the error page which the USGS sometimes sends back to the user.

class hydrofunctions.usgs_rdb.hydroRDB(header, table, columns, dtypes)[source]

Bases: object

A class for holding the information from USGS rdb files.

Parameters:
  • header (str) – A multi-line string from the header of the rdb file. The header often contains important metadata and user warnings.
  • table (pandas dataframe) – This is a dataframe made from the rdb file.
  • columns (str) – A string from the rdb file that lists the column names.
  • dtypes (str) – A string from the rdb file that gives the data type and length of each column.
Properties:
header (str):
A multi-line string from the header of the rdb file. The header often contains important metadata and user warnings.
table (pandas dataframe):
This is a dataframe made from the rdb file.
columns (str):
A string from the rdb file that lists the column names.
dtypes (str):
A string from the rdb file that gives the data type and length of each column.

You can also access the header and the dataframe as a named tuple:

hydroRDB(header=<a multi-line string>, table=<pandas dataframe>)
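
The two-item unpacking used elsewhere in these docs (header, data = hf.field_meas(...)), alongside four named attributes, can be modelled with a small class. This is a sketch of one way to get that behaviour, using a hypothetical RDBResult class with placeholder values; how hydroRDB itself is implemented is not shown here.

```python
class RDBResult:
    """Hypothetical stand-in that mimics the access patterns of hydroRDB."""

    def __init__(self, header, table, columns, dtypes):
        self.header = header
        self.table = table
        self.columns = columns
        self.dtypes = dtypes

    def __iter__(self):
        # Only the header and table take part in tuple-style unpacking.
        return iter((self.header, self.table))

    def __repr__(self):
        return f"RDBResult(header={self.header!r}, table={self.table!r})"


result = RDBResult("# a comment line", [[1, 2]], "a\tb", "5s\t5s")
header, table = result   # unpack like a two-item tuple
print(header)
print(result.columns)    # the other attributes remain available
```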

Note

  • The args to create this object are supplied by hf.read_rdb().
  • The hydroRDB object is returned from several functions that request RDB files from a USGS data service, including: peaks(), field_meas(), rating_curve(), stats(), site_file(), and data_catalog().
  • You can read more about the RDB format here: https://pubs.usgs.gov/of/2003/ofr03123/6.4rdb_format.pdf
hydrofunctions.usgs_rdb.peaks(site)[source]

Return a series of annual peak discharges.

Parameters:site (str) – The gauge ID number for the site.
Returns:a hydroRDB object or tuple consisting of the header and a table. The header is a multi-line string of metadata supplied by the USGS with the data series. The table is a dataframe containing the annual peak discharge series.

Example:

>>> test = peaks('01542500')
>>> test
hydroRDB(header=<a multi-line string of the header>,
         table=<a Pandas dataframe>)

You can also access the header, dataframe, column names, and data types through the associated properties header, table, columns, dtypes:

>>> test.table
<a Pandas dataframe>
hydrofunctions.usgs_rdb.rating_curve(site)[source]

Return the most recent USGS expanded-shift-adjusted rating curve for a given stream gage as a dataframe.

Parameters:site (str) – The gage ID number for the site.
Returns:a hydroRDB object or tuple consisting of the header and a table. The header is a multi-line string of metadata supplied by the USGS with the data series. The table is a dataframe containing the latest official rating curve for the site.

Example:

>>> test = rating_curve('01542500')
>>> test
hydroRDB(header=<a multi-line string of the header>,
         table=<a Pandas dataframe>)

You can also access the header, dataframe, column names, and data types through the associated properties header, table, columns, dtypes:

>>> test.table
<a Pandas dataframe>
Discussion:

The USGS operates over 8,000 stream gauges around the United States and territories. Each of these sensors records the depth, or ‘stage’ of the water. In order to translate this stage data into stream discharge, the USGS staff creates an empirical relationship called a ‘rating curve’ between the river stage and stream discharge.

See hf.field_meas() to access the field data used to construct the rating curve.

Note: Rating curves change over time.

hydrofunctions.usgs_rdb.read_rdb(text)[source]

Read strings that are in rdb format.

Parameters:text (str) – A long string containing the contents of an rdb file. A common way to obtain this is from the .text property of a requests response.
Returns:
header (multi-line string):
Every commented line at the top of the rdb file is marked with a ’#’ symbol. Each of these lines is stored in this output.
outputDF (pandas.DataFrame):
A dataframe containing the information in the rdb file. site_no and parameter_cd are interpreted as strings, but every other number is interpreted as a float or int; missing values become np.nan; everything else is a string.
columns (list of strings):
The column names, taken from the rdb header row.
dtypes (list of strings):
The second header row from the rdb file. These mostly tell the column width, and typically record everything as string data (‘s’) type. The exceptions are dates, which are listed with a ‘d’.
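
The rdb layout (comment lines, a row of column names, a row of data types, then the data, all tab-delimited) can be parsed with a few lines of plain Python. The sketch below is not read_rdb(): parse_rdb_sketch is a hypothetical helper, the sample text and its values are made up, and fields are left as strings rather than being loaded into a dataframe.

```python
def parse_rdb_sketch(text):
    """Split rdb text into header, rows, column names, and dtype codes."""
    lines = text.strip().splitlines()
    header_lines = [line for line in lines if line.startswith("#")]
    body = [line for line in lines if not line.startswith("#")]
    columns = body[0].split("\t")   # first non-comment row: column names
    dtypes = body[1].split("\t")    # second row: width and type codes
    rows = [dict(zip(columns, line.split("\t"))) for line in body[2:]]
    return "\n".join(header_lines), rows, columns, dtypes


sample = (
    "# Made-up example of the rdb layout\n"
    "# Lines starting with '#' are comments\n"
    "site_no\tpeak_dt\tpeak_va\n"
    "15s\t10d\t8s\n"
    "01542500\t1936-03-18\t34000\n"
    "01542500\t1972-06-23\t41000\n"
)
header, rows, columns, dtypes = parse_rdb_sketch(sample)
print(columns)
print(rows[0]["site_no"], rows[1]["peak_va"])
```

Keeping site_no as a string preserves its leading zero, which is the reason the description above singles out that column.
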
hydrofunctions.usgs_rdb.site_file(site)[source]

Load USGS site file into a Pandas dataframe.

Parameters:site (str) – The gauge ID number for the site.
Returns:a hydroRDB object or tuple consisting of the header and a pandas dataframe.

Example:

>>> test = site_file('01542500')
>>> test
hydroRDB(header=<a multi-line string of the header>,
         table=<a Pandas dataframe>)

You can also access the header, dataframe, column names, and data types through the associated properties header, table, columns, dtypes:

>>> test.table
<a Pandas dataframe>
hydrofunctions.usgs_rdb.stats(site, statReportType='daily', **kwargs)[source]

Return statistics from the USGS Stats Service as a dataframe.

Parameters:
  • site (str) – The gage ID number for the site, or a series of gage IDs separated by commas, like this: ‘01546500,01548000’.
  • statReportType ('annual'|'monthly'|'daily') – There are three different types of report that you can request. - ‘daily’ (default): this
Returns:

a hydroRDB object or tuple consisting of the header and a table. The header is a multi-line string of metadata supplied by the USGS with the data series. The table is a dataframe containing the latest official statistics for the site.

Raises:

HTTPError when a non-200 http status code is returned.

Example:

>>> test = stats('01542500', 'monthly')
>>> test
hydroRDB(header=<a multi-line string of the header>,
         table=<a Pandas dataframe>)

You can also access the header, dataframe, column names, and data types through the associated properties header, table, columns, dtypes:

>>> test.table
<a Pandas dataframe>

Note

This function is based on the USGS statistics service, described here: https://waterservices.usgs.gov/rest/Statistics-Service.html

The USGS Statistics Service allows you to specify a wide array of additional parameters in your request. You can provide these parameters as keyword arguments, like in this example:

>>> hf.stats('01452500', parameterCD='00060')

This will only request statistics for discharge, which is specified with the ‘00060’ parameter code.

Additional useful parameters include:

  • parameterCD=‘00060,00065’ Limit the request for statistics to only one parameter or to a list of parameters. The default behavior is to provide statistics for every parameter that has been measured at this site. In this example, statistics for discharge (‘00060’) and stage (‘00065’) are requested.
  • statYearType=’water’ Calculate annual statistics based on the water year, which runs from October 1st to September 30th. This parameter is only for use with annual reports. If not specified, the default behavior will use calendar years for reporting.
  • missingData=’on’ Calculate statistics even when there are some missing values. If not specified, the default behavior is to drop years that have fewer than 365 values from annual reports, and to drop months that have fewer than 30 values in monthly reports. The number of values used to calculate a statistic is reported in the ‘count_nu’ column.
  • You can read about other useful parameters here: https://waterservices.usgs.gov/rest/Statistics-Service.html#statistical_Controls
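
The water-year convention mentioned for statYearType can be expressed in a couple of lines. The sketch assumes the usual USGS convention that a water year is named for the calendar year in which it ends; water_year is a hypothetical helper, not part of hydrofunctions.

```python
from datetime import date


def water_year(day):
    """Return the water year (October 1 through September 30) for a date."""
    return day.year + 1 if day.month >= 10 else day.year


print(water_year(date(2018, 9, 30)))
print(water_year(date(2018, 10, 1)))
```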