hydrofunctions 0.2.4 package

Package contents

Hydrofunctions

Hydrofunctions is a suite of convenience functions to help you explore hydrology data interactively.

Basic Usage:

>>> import hydrofunctions as hf

>>> site = '01589440'
>>> jones = hf.NWIS(site, 'iv', period='P10D')
Requested data from https://waterservices.usgs.gov/nwis/iv/?format=json%2C1.1&sites=01589440&period=P10D

Examine the dataset:

>>> jones
USGS:01589440: JONES FALLS AT SORRENTO, MD
    00060: <15 * Minutes> Discharge, cubic feet per second
    00065: <15 * Minutes> Gage height, feet
Start: 2022-10-27 17:30:00+00:00
End:   2022-11-06 17:15:00+00:00

The listing reports each of the parameters collected at the site that was requested, how frequently the data are collected, and the name of the parameter written out with units. The start and end of the dataset are given in Universal Time (UTC).

View the first five rows of a dataframe that only contains the discharge data:

>>> jones.df('discharge').head()
                           USGS:01589440:00060:00000
datetimeUTC
2022-10-27 17:30:00+00:00                    14.6
2022-10-27 17:45:00+00:00                    15.2
2022-10-27 18:00:00+00:00                    15.2
2022-10-27 18:15:00+00:00                    15.8
2022-10-27 18:30:00+00:00                    16.4

Because the .df() method returns a dataframe, you have access to all of the Pandas dataframe methods, including .plot(), .describe(), and .info().

Learn more about hydrofunctions and the NWIS object with help():

>>> help(hf)
>>> help(hf.NWIS)

Read more about Hydrofunctions here: https://hydrofunctions.readthedocs.io/

hydrofunctions.charts module

hydrofunctions.charts

This module contains charting functions for Hydrofunctions.


hydrofunctions.charts.cycleplot(Qseries, cycle='diurnal', compare=None, y_label='Discharge (ft³/s)', legend=True, legend_loc='best', title='')[source]

Creates a chart to illustrate annual and diurnal cycles.

This chart will use the pandas groupby method to plot the mean and median values for a time-indexed dataframe. It helps you identify diurnal patterns by plotting the mean and median values over 24 hours for a diurnal pattern, and over a year for annual patterns.

This function will also use the ‘compare’ argument to create a series of charts to compare how well these cycles appear in different groups. For example, is the diurnal cycle more visible in December versus June? In this case, you would use:

hf.cycleplot(myDataFrame, cycle='diurnal', compare='month')

This will produce twelve charts, each covering 24 hours. In each chart, one line represents the mean values over 24 hours, another line represents the median, and two grey stripes represent the 0.4 to 0.6 and the 0.2 to 0.8 quantile ranges.

Parameters:
  • Qseries (series) –

    a Pandas series of discharge values.

    • Values should be arranged in columns

    • Should use a dateTimeIndex

  • cycle (str) –

    The period of the cycle to be illustrated, along with the method for binning. The options are:

    • diurnal (default): plots the values for a 24 hour cycle.

    • diurnal-smallest: uses the smallest increment of time available to bin the time units for a 24 hour cycle.

    • diurnal-hour: uses hours to bin measurements for a 24-hour cycle.

    • annual: plots values into a 365 day cycle.

    • annual-day: the annual cycle using 365 day-long bins.

    • annual-week: the annual cycle using 52 week-long bins.

    • annual-month: the annual cycle using 12 month-long bins.

    • weekly: a 7-day cycle using seven 24-hour long bins. Note that unlike the others, this is not a natural cycle, and it likely has anthropogenic origins.

  • compare (str) –

    The system for splitting the data into groups for a set of comparison charts.

    • None (default): No comparison will be made; only one chart.

    • month: twelve plots will be produced, one for each month.

    • weekday: seven plots will be produced, one for each day of the week.

    • weekend: two plots will be produced, one for the five weekdays, one for Saturday and Sunday.

    • night: two plots will be produced, one for night (6pm to 6am), one for day (6am to 6pm).

  • y_label (str) – The label for the y axis.

  • legend (bool) – default True. Whether the legend should be plotted.

  • legend_loc (str) –

    default is ‘best’. The location of the legend.

    • ‘best’: Automatically choose the option below with the least overlap.

    • ‘upper left’, ‘upper right’, ‘lower left’, ‘lower right’: place the legend at the corresponding corner of the axes/figure.

    • ‘upper center’, ‘lower center’, ‘center left’, ‘center right’: place the legend at the center of the corresponding edge of the axes/figure.

    • ‘center’: place the legend at the center of the axes/figure.

    • The location can also be a 2-tuple giving the coordinates of the lower-left corner of the legend in axes coordinates.

  • title (str) – default is ‘’. Text to use as a figure title. If no text is provided, no title will be created (default).

Returns:

Returns a tuple that includes a matplotlib ‘figure’ and ‘axes’. The figure is a container with all of the drawing inside of it; the axes are an array of matplotlib charts. Together, they will plot immediately in a Jupyter notebook if the command %matplotlib inline was previously issued. The figure and axes may be altered after they are returned.

Return type:

fig, ax (matplotlib.figure.Figure, matplotlib.axes.Axes)

Note

inspired by https://jakevdp.github.io/PythonDataScienceHandbook/03.11-working-with-time-series.html Jake VanderPlas. 2016. Python Data Science Handbook. O’Reilly Media, Inc.
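The binning that cycleplot describes can be sketched with the standard library alone. This is only an illustration of the idea, not the function's implementation: cycleplot itself works on a pandas series with groupby, the name diurnal_summary is hypothetical, and the quantile interpolation chosen here is an assumption.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, median, quantiles


def diurnal_summary(readings):
    """Bin (timestamp, value) pairs by hour of day and summarise each bin."""
    bins = defaultdict(list)
    for timestamp, value in readings:
        bins[timestamp.hour].append(value)

    summary = {}
    for hour, values in sorted(bins.items()):
        # n=5 yields the 0.2, 0.4, 0.6 and 0.8 quantiles in one call.
        q20, q40, q60, q80 = quantiles(values, n=5, method="inclusive")
        summary[hour] = {
            "mean": mean(values),
            "median": median(values),
            "inner_band": (q40, q60),  # the 0.4 to 0.6 stripe
            "outer_band": (q20, q80),  # the 0.2 to 0.8 stripe
        }
    return summary


# Three days of synthetic hourly values: hour of day plus the day number.
start = datetime(2022, 10, 27)
readings = [
    (start + timedelta(hours=h), float((h % 24) + h // 24))
    for h in range(72)
]
summary = diurnal_summary(readings)
print(summary[6])
```

Each key of the result corresponds to one position along the 24-hour x axis; a 'compare' grouping would repeat the same summary once per group.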

hydrofunctions.charts.flow_duration(Qdf, xscale='logit', yscale='log', ylabel='Stream Discharge (m³/s)', symbol='.', legend=True, legend_loc='best', title='')[source]

Creates a flow duration chart from a dataframe of discharges.

Parameters:
  • Qdf (dataframe) –

    a dataframe of discharge values.

    • Values should be arranged in columns

    • No sorting necessary

    • Rows do not need an index

    • If more than one column, each column will be added as a separate color to the chart.

    • Only include columns with discharge values; no metadata

  • xscale (str, 'logit' | 'linear') – Type of x scale for plotting probabilities. Default is ‘logit’, so that each standard deviation is nearly the same distance on the x scale. ‘linear’ is the other option.

  • yscale (str, 'log' | 'linear') – The type of y scale for plotting discharge. Default is ‘log’.

  • ylabel (str, default ‘Stream Discharge (m³/s)’) – The label for the Y axis.

  • xlabel (not implemented) –

  • symbol (str, '.' | ',') –

    formatting symbol for points.

    • point: ‘.’ (default)

    • pixel point: ‘,’

    • circle: ‘o’

    • triangle up: ‘^’

    See https://matplotlib.org/api/markers_api.html for full list of point formatters.

  • legend (bool, default True) – Whether the legend should be plotted.

  • legend_loc (str, default best) –

    the location of the legend.

    • ‘best’: Automatically choose the option below with the least overlap.

    • ‘upper left’, ‘upper right’, ‘lower left’, ‘lower right’: place the legend at the corresponding corner of the axes/figure.

    • ‘upper center’, ‘lower center’, ‘center left’, ‘center right’: place the legend at the center of the corresponding edge of the axes/figure.

    • ‘center’: place the legend at the center of the axes/figure.

    • The location can also be a 2-tuple giving the coordinates of the lower-left corner of the legend in axes coordinates.

  • title (str, default ‘’) – Text to use as a figure title. If no text is provided, no title will be created (default).

Returns:

Returns a tuple that includes a matplotlib ‘figure’ and ‘axes’. The figure is a container with all of the drawing inside of it; the axes are an array of matplotlib charts. Together, they will plot immediately in a Jupyter notebook if the command %matplotlib inline was previously issued. The figure and axes may be altered after they are returned.

Return type:

fig, ax (matplotlib.figure.Figure, matplotlib.axes.Axes)
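A flow duration chart plots each discharge against the fraction of time that discharge is equalled or exceeded. The sketch below shows one common way to compute those probabilities, using the rank / (n + 1) plotting position. That formula and the name exceedance_probabilities are assumptions made for illustration; flow_duration may rank values differently.

```python
def exceedance_probabilities(discharges):
    """Return (probability, discharge) pairs, highest discharge first."""
    ranked = sorted(discharges, reverse=True)
    n = len(ranked)
    # Assumed convention: rank / (n + 1), so no value gets probability 0 or 1.
    return [(rank / (n + 1), q) for rank, q in enumerate(ranked, start=1)]


# The five discharge values shown in the basic usage example.
flows = [14.6, 15.2, 15.2, 15.8, 16.4]
curve = exceedance_probabilities(flows)
for probability, discharge in curve:
    print(f"{probability:.3f}  {discharge}")
```

Keeping probabilities strictly between 0 and 1 matters for the default 'logit' x scale, which is undefined at both endpoints.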

hydrofunctions.exceptions module

hydrofunctions.exceptions

This module contains all of the custom exceptions defined in this package. The base class is HydroException and all custom exceptions are subclasses of HydroException.

Use the errors like this:

try:
    # some code here that might return no data
    # more code that might get encoded improperly
    pass
except HydroNoDataError:
    # handle the missing-data error here.
    pass
except HydroEncodeError:
    # handle the encoding error here.
    pass
else:
    # code to complete if there is no exception raised.
    pass
finally:
    # code that you want to run whether an exception is raised or not.
    # If an exception wasn't caught, then this code gets run, and the
    # exception gets re-raised after this finally clause gets run.
    pass

Keep the try clause short: if you put too many things in there, it can be difficult to figure out what broke. On the other hand, as in the example above, it is more readable if you group a series of statements and then handle their exceptions together.

Example:

>>> raise HydroNoDataError("Oh no, NWIS doesn't have this data for you!")

https://axialcorps.com/2013/08/29/5-simple-rules-for-building-great-python-packages/


exception hydrofunctions.exceptions.HydroEncodeError(msg='')[source]

Bases: HydroException

Raised when an error occurs while encoding or decoding an argument.

Example:

try:
    # bunch of code from your package
    pass
except HydroException:
    # blanket condition to handle all errors from your package
    pass

exception hydrofunctions.exceptions.HydroException(msg='')[source]

Bases: Exception

This is the base class for all exceptions created for the HydroFunctions package. This class is not meant to be raised.
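Because every custom exception derives from HydroException, a script can catch a specific subclass where it knows how to recover and fall back on the base class for everything else. The sketch below uses stand-in classes and a hypothetical fetch() function so that it runs without the package or a network connection; only the class names and their inheritance are taken from this documentation.

```python
class HydroException(Exception):
    """Stand-in for hydrofunctions.exceptions.HydroException."""


class HydroNoDataError(HydroException):
    """Stand-in for hydrofunctions.exceptions.HydroNoDataError."""


class HydroEncodeError(HydroException):
    """Stand-in for hydrofunctions.exceptions.HydroEncodeError."""


def fetch(site):
    """Hypothetical request: one site has data, one cannot be encoded."""
    if site == "bad site":
        raise HydroEncodeError(f"Could not encode site {site!r}")
    if site != "01589440":
        raise HydroNoDataError(f"No data for site {site}")
    return {"site": site}


def fetch_all(sites):
    """Collect results, recording problems instead of stopping the batch."""
    results, problems = {}, []
    for site in sites:
        try:
            results[site] = fetch(site)
        except HydroNoDataError as err:
            # Specific handler: skip this site and move on to the next one.
            problems.append(("no data", str(err)))
        except HydroException as err:
            # Blanket handler for any other error raised by the package.
            problems.append(("other", str(err)))
    return results, problems


results, problems = fetch_all(["01589440", "666666666", "bad site"])
print(results)
print(problems)
```

The specific handler has to come first: Python tries except clauses in order, so a leading `except HydroException` would also swallow the no-data case.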

exception hydrofunctions.exceptions.HydroNoDataError(msg='')[source]

Bases: HydroException

Raised when a service returns an empty dataset or indicates that it has no data for the request.

Usage:

raise HydroNoDataError("The NWIS service had no data for this request.")

Do not catch this error for interactive sessions: The user should get a useful message from the error when they try to request something that doesn’t exist.

Catch this error in automated systems so that the system can reconsider the request and either fix the request or move on to the next request.

Example:

import hydrofunctions as hf
from hydrofunctions.exceptions import HydroNoDataError

try:
    hf.NWIS('666666666')
except HydroNoDataError as err:
    print("This is just to illustrate how to capture this error.")
    print(err)

exception hydrofunctions.exceptions.HydroUserWarning(msg='')[source]

Bases: UserWarning

Warn user of a hazardous condition or when an action has been triggered that may be unexpected.

This is the base class for all warnings created for the HydroFunctions package. This class can be used if there is no more specific warning available.

Usage:

import hydrofunctions as hf
import warnings
... code
warnings.warn('This is my warning message.', hf.HydroUserWarning)

Note

Warnings can be hidden or turned off depending on how the user is accessing Python and the settings for their interface.

Use HydroException if a process must be shut down, or is doomed to fail anyway. This will at least give the user a helpful error message.
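Since warnings can be silenced by the user's filters, it can help to see how one is emitted and captured explicitly. The following sketch relies only on the standard warnings module; the HydroUserWarning class here is a stand-in with the same base class as the real one, and check_frequencies is a hypothetical function.

```python
import warnings


class HydroUserWarning(UserWarning):
    """Stand-in for hydrofunctions.exceptions.HydroUserWarning."""


def check_frequencies(freq_minutes):
    """Hypothetical check: warn when datasets are sampled at different rates."""
    if len(set(freq_minutes)) > 1:
        warnings.warn(
            "Datasets are sampled at different frequencies.",
            HydroUserWarning,
        )


with warnings.catch_warnings(record=True) as caught:
    # "always" overrides any filter that would otherwise hide the warning.
    warnings.simplefilter("always")
    check_frequencies([15, 60])

for item in caught:
    print(item.category.__name__, item.message)
```

Unlike an exception, the warning does not interrupt check_frequencies(); execution continues after warnings.warn() returns.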

hydrofunctions.helpers module

hydrofunctions.helpers

This module holds functions designed to help out the user in an IPython session.


hydrofunctions.helpers.count_number_of_truthy(my_list)[source]

hydrofunctions.helpers.draw_map(width=700, height=400, url='http://hydrocloud.org')[source]

Draws a map of stream gages in a Jupyter Notebook.

This function will draw an interactive map of stream gages from hydrocloud.org into an iframe and display it in a Jupyter Notebook. Each dot represents a stream gage. Click on the dot to learn its name, which you can use to request data.

Parameters:
  • width (int) – The width of the map iframe.

  • height (int) – The height of the map iframe.

  • url (str) – The URL to put inside of the IFrame. Defaults to http://hydrocloud.org

Returns:

HTML display object.

Example:

>>> import hydrofunctions as hf
>>> hf.draw_map()

A map appears.

>>> hf.draw_map(width=900, height=600)

Draws a larger map.

hydrofunctions.hydrofunctions module

hydrofunctions.hydrofunctions

This module contains the main functions used in an interactive session.


hydrofunctions.hydrofunctions.calc_freq(index)[source]

hydrofunctions.hydrofunctions.extract_nwis_df(nwis_dict, interpolate=False)[source]

Returns a Pandas dataframe and a metadata dict from the NWIS response object or the json dict of the response.

Parameters:
  • nwis_dict (obj) – the json from a response object as returned by get_nwis().json(). Alternatively, you may supply the response object itself.

  • interpolate (bool) – fill missing data values with interpolated values. Default False.

Returns:

a pandas dataframe and a dictionary of metadata for the request.

Raises:
  • HydroNoDataError – when the request is valid, but NWIS has no data for the parameters provided in the request.

  • HydroUserWarning – when one dataset is sampled at a lower frequency than another dataset in the same request.

hydrofunctions.hydrofunctions.get_nwis(site, service='dv', start_date=None, end_date=None, stateCd=None, countyCd=None, bBox=None, parameterCd='all', period=None, verbose=True)[source]

Request stream gauge data from the USGS NWIS.

Parameters:
  • site (str or list of strings) – a valid site is ‘01585200’ or [‘01585200’, ‘01646502’]. site should be None if stateCd or countyCd are not None.

  • service (str) –

    can either be ‘iv’ or ‘dv’ for instantaneous or daily data.
    • ‘dv’ (default): daily values. Mean value for an entire day.

    • ‘iv’: instantaneous value measured at this time. Also known as ‘Real-time data’. Can be measured as often as every five minutes by the USGS. 15 minutes is more typical.

  • start_date (str) – should take on the form yyyy-mm-dd

  • end_date (str) – should take on the form yyyy-mm-dd

  • stateCd (str) – a valid two-letter state postal abbreviation. Default is None.

  • countyCd (str or list of strings) – a valid county FIPS code. Default is None.

  • bBox (str, list, or tuple) –

    a set of coordinates that defines a bounding box.
    • Coordinates are in decimal degrees

    • Longitude values are negative (west of the prime meridian).

    • Latitude values are positive (north of the equator).

    • comma-delimited, no spaces, if provided as a string.

    • The order of the boundaries should be: “West,South,East,North”

    • Example: “-83.000000,36.500000,-81.000000,38.500000”

  • parameterCd (str or list of strings) –

    NWIS parameter code. Usually a five digit code. Default is ‘all’. A valid code can also be given as a list: parameterCd=['00060','00065']
    • if value of ‘all’ is submitted, then NWIS will return every parameter collected at this site. (default option)

    • stage: ‘00065’

    • discharge: ‘00060’

    • not all sites collect all parameters!

    • See https://nwis.waterdata.usgs.gov/usa/nwis/pmcodes for full list

  • period (str) –

    NWIS period code. Default is None.
    • Format is “PxxD”, where xx is the number of days before today.

    • Either use start_date or period, but not both.

  • verbose (bool) – If True (default), will print confirmation messages with the url before and after the request is made.

Returns:

a response object. This function will always return the response, even if the NWIS returns a status_code that indicates a problem.

  • response.url: the url we used to request data

  • response.json(): the content translated as json

  • response.status_code: the HTTP status code

  • response.ok: True when we get a ‘200’ status_code

Raises:
  • ConnectionError – due to connection problems like refused connection or DNS Error.

  • SyntaxWarning – when NWIS returns a response code that is not 200.

Example:

>>> import hydrofunctions as hf
>>> response = hf.get_nwis('01585200', 'dv', '2012-06-01', '2012-07-01')
>>> response
<Response [200]>
>>> response.json()
*JSON ensues*
>>> hf.extract_nwis_df(response)
*a Pandas dataframe appears*

Other Valid Ways to Make a Request:

>>> sites = ['07180500', '03380475', '06926000'] # Request a list of sites.
>>> service = 'iv'  # Request real-time data
>>> days = 'P10D'  # Request the last 10 days.
>>> stage = '00065' # Sites that collect discharge usually collect water depth too.
>>> response2 = hf.get_nwis(sites, service, period=days, parameterCd=stage)

Request Data By Location:

>>> # Request the most recent daily data for every site in Maine
>>> response3 = hf.get_nwis(None, 'dv', stateCd='ME')
>>> response3
<Response [200]>

The specification for the USGS NWIS IV service is located here: http://waterservices.usgs.gov/rest/IV-Service.html
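The URL printed when verbose is True can be reproduced with the standard library, which is a handy way to check a request before sending it. This sketch only builds the string and never contacts the service. The helper name build_nwis_url is hypothetical, it handles site-based requests only, and joining several sites or parameter codes with commas is an assumption about the query format.

```python
from urllib.parse import urlencode

BASE_URL = "https://waterservices.usgs.gov/nwis/"


def build_nwis_url(site, service="dv", period=None, parameter_cd="all"):
    """Hypothetical helper: assemble an NWIS request URL without sending it."""
    sites = site if isinstance(site, str) else ",".join(site)
    params = {"format": "json,1.1", "sites": sites}
    if parameter_cd != "all":
        codes = parameter_cd if isinstance(parameter_cd, str) else ",".join(parameter_cd)
        params["parameterCd"] = codes
    if period is not None:
        params["period"] = period
    # urlencode percent-encodes the commas, giving "json%2C1.1".
    return f"{BASE_URL}{service}/?{urlencode(params)}"


url = build_nwis_url("01589440", "iv", period="P10D")
print(url)
# → https://waterservices.usgs.gov/nwis/iv/?format=json%2C1.1&sites=01589440&period=P10D
```

The printed URL matches the one shown in the basic usage example at the top of this page.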

hydrofunctions.hydrofunctions.get_nwis_property(nwis_dict, key=None, remove_duplicates=False)[source]

Returns a list containing property data from an NWIS response object.

Parameters:
  • nwis_dict (dict) – the json returned in a response object as produced by get_nwis().json().

  • key (str) –

    a valid NWIS response property key. Default is None. The index is returned if key is None. Valid keys are:
    • None

    • name - constructed name “provider:site:parameterCd:statistic”

    • siteName

    • siteCode

    • timeZoneInfo

    • geoLocation

    • siteType

    • siteProperty

    • variableCode

    • variableName

    • variableDescription

    • valueType

    • unit

    • options

    • noDataValue

  • remove_duplicates (bool) – a flag used to remove duplicate values in the returned list.

Returns:

a list with the data for the passed key string.

Raises:
  • HydroNoDataError – when the request is valid, but NWIS has no data for the parameters provided in the request.

  • ValueError – when the key is not available.

hydrofunctions.hydrofunctions.nwis_custom_status_codes(response)[source]

Raise custom warning messages from the NWIS when it returns a status_code that is not 200.

Parameters:

response – a response object as returned by get_nwis().

Returns:

None if response.status_code == 200

Raises:

HydroNoDataError – when a non-200 status code is returned. https://en.wikipedia.org/wiki/List_of_HTTP_status_codes

Note

NWIS status_code messages come from:

https://waterservices.usgs.gov/docs/portable_code.html

Additional status code documentation:

https://waterservices.usgs.gov/rest/IV-Service.html#Error

hydrofunctions.hydrofunctions.read_json_gzip(filename)[source]

Read a gzipped JSON file into a Python dictionary.

Reads JSON files that have been zipped and returns a Python dictionary. Usually the files should have the extension .json.gz. Hydrofunctions uses this format to store the original JSON format WaterML response from the USGS NWIS.

Parameters:

filename (str) – A string with the filename and extension.

Returns:

a dictionary of the file contents.

hydrofunctions.hydrofunctions.read_parquet(filename)[source]

Read a hydrofunctions parquet file.

This function will read a parquet file that was saved by hydrofunctions.save_parquet() and return a dataframe and a metadata dictionary.

Parameters:

filename (str) – A string with the filename and extension.

Returns:

  • dataframe (pd.DataFrame) – a pandas dataframe.

  • meta (dict) – a dictionary with the metadata for the NWIS data request, if it exists.

hydrofunctions.hydrofunctions.save_json_gzip(filename, json_dict)[source]

Save a Python dictionary as a gzipped JSON file.

This save function is especially designed to compress and save the original JSON response from the USGS NWIS. If no file extension is specified, then a .json.gz extension will be provided.

Parameters:
  • filename (str) – A string with the filename and extension.

  • json_dict (dict) – A dictionary representing the json content.
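The gzipped JSON round trip can be sketched with the standard library. This illustrates the file format rather than the package's own code: write_json_gz and load_json_gz are hypothetical names, and the dictionary is a small placeholder rather than a real WaterML response.

```python
import gzip
import json
import os
import tempfile


def write_json_gz(filename, json_dict):
    """Hypothetical helper: write a dict as gzip-compressed JSON."""
    if not os.path.splitext(filename)[1]:
        filename += ".json.gz"  # supply the default extension when none is given
    with gzip.open(filename, "wt", encoding="utf-8") as handle:
        json.dump(json_dict, handle)
    return filename


def load_json_gz(filename):
    """Hypothetical helper: read gzip-compressed JSON back into a dict."""
    with gzip.open(filename, "rt", encoding="utf-8") as handle:
        return json.load(handle)


original = {"site": "01589440", "values": [14.6, 15.2, 15.2]}
with tempfile.TemporaryDirectory() as folder:
    path = write_json_gz(os.path.join(folder, "jones_falls"), original)
    restored = load_json_gz(path)

print(os.path.basename(path), restored == original)
```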

hydrofunctions.hydrofunctions.save_parquet(filename, dataframe, hf_meta)[source]

Save a hydrofunctions parquet file.

This function will save a dataframe and a dictionary into the parquet format. Parquet is a compact, easy-to-process file format that works well with Pandas and large datasets. This function will accompany the dataframe with a dictionary of NWIS metadata that is produced by the hydrofunctions.extract_nwis_df() function. This file can then be read by the hydrofunctions.read_parquet() function.

Parameters:
  • filename (str) – A string with the filename and extension.

  • dataframe (pd.DataFrame) – a pandas dataframe.

  • hf_meta (dict) – a dictionary with the metadata for the NWIS data request, if it exists.

hydrofunctions.hydrofunctions.select_data(nwis_df)[source]

Create a boolean array of columns that contain data.

Parameters:

nwis_df – A pandas dataframe created by extract_nwis_df.

Returns:

an array of Boolean values corresponding to the columns in the original dataframe.

Example

>>> my_dataframe.loc[:, select_data(my_dataframe)]

returns a dataframe with only the data columns; the qualifier columns do not show.

hydrofunctions.logging module

hydrofunctions.logging

This module contains the tools used for internal diagnostic logging.

Logging is disabled by default. Users can start logging by using the hf._start_logging() function. This will create a file “hydrofunctions_testing.log” in the main directory. This function also allows users to set the level of severity that will be logged. The default is to capture all messages, including the lowest level ‘DEBUG’ messages.

To create log messages within a module, follow these steps:

  1. Create a custom logger for the module.
    • Place the statement logger = logging.getLogger(__name__) at the top of the module.

    • This will create a custom logger that is named after the module.

    • Call the logger like this: logger.info("Hello!")

  2. Log a message within your code.
    • Create a message: msg = "This is the text of the message."

    • Include the value of important variables

    • There is no need to include the time or name of the module or function. These are included in the standard message format.

    • Decide on a ‘level’ for the message:
      • DEBUG: this is the lowest level; for tracking ordinary internal values

      • INFO: internal or user events that are working as expected

      • WARNING: situations where back-up procedures are needed, unexpected situations and possibly ordinary exceptions that have been caught

      • ERROR: internal problems that prevent the software from completing an action

      • CRITICAL: serious errors that cause the shutdown of the software

    • Add the message to the log file: logger.info(msg)

    • Each level of severity has its own method: .debug(), .info(), .critical(), etc.

    • HydroExceptions such as HydroNoDataError generate their own error logs.

    • It might be useful to log other errors when they are raised.

  3. Start the logging system.
    • Logging is off by default.

    • To start logging, call hydrofunctions._start_logging()

    • You can specify the level that will be captured in the log with the loglevel parameter.

    • Set level like this: hf._start_logging('info') (Case does not matter)

    • The default is to capture from the lowest level (DEBUG) up

    • Starting the logging system will create a new file “hydrofunctions_testing.log” if it doesn’t already exist; if it does, new messages are appended at the bottom, beneath a start-up message from the ‘root’ logger.

  4. Read the log.
    • The file, “hydrofunctions_testing.log” will appear in the root directory

    • All messages from hydrofunctions will have the following:
      • timestamp

      • name of logger: ‘root’ for the start message, all others should be named for the module that creates the message

      • levelname: the level of the message (‘DEBUG’, ‘INFO’, etc)

      • funcName: the name of the function that sent the message to the log

      • message: the message generated by the logger function: logger.info("Hello!")

    • The first message created by the hf._start_logging() will be from ‘root’

    • Messages from dependencies will be captured too.
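The four steps above can be sketched with the standard logging module. To stay self-contained, the sketch writes to an in-memory stream instead of “hydrofunctions_testing.log”, and start_logging is a hypothetical stand-in for hf._start_logging(); the exact message format used by the package is an assumption.

```python
import io
import logging

# Step 1: a module-level logger named after the module.
logger = logging.getLogger(__name__)


def start_logging(stream, loglevel="debug"):
    """Hypothetical stand-in for hf._start_logging() that logs to a stream."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter(
            "%(asctime)s %(name)s %(levelname)s %(funcName)s %(message)s"
        )
    )
    root = logging.getLogger()
    root.addHandler(handler)
    root.setLevel(loglevel.upper())  # case does not matter to the caller
    return handler


def request_site(site):
    # Step 2: build a message that includes the important values, then log it.
    msg = f"Requesting data for site {site}"
    logger.info(msg)


# Step 3: start logging and capture INFO messages and above.
log_stream = io.StringIO()
handler = start_logging(log_stream, "info")
request_site("01589440")
logger.debug("This DEBUG message is below the 'info' threshold.")
logging.getLogger().removeHandler(handler)

# Step 4: read the log.
print(log_stream.getvalue())
```

The timestamp, logger name, level and function name are filled in by the formatter, which is why the message text itself does not need to repeat them.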


hydrofunctions.station module

hydrofunctions.station

This module contains the Station and NWIS classes, which are used for organizing and managing data for data collection sites.


class hydrofunctions.station.NWIS(site=None, service='dv', start_date=None, end_date=None, stateCd=None, countyCd=None, bBox=None, parameterCd='all', period=None, interpolate=False, file=None, verbose=True)[source]

Bases: Station

A class for working with data from the USGS NWIS service.

Parameters:
  • site (str or list of strings) – a valid site is ‘01585200’ or [‘01585200’, ‘01646502’]. Default is None. If site is not specified, you will need to select sites using stateCd or countyCd.

  • service (str) –

    can either be ‘iv’ or ‘dv’ for instantaneous or daily data.
    • ‘dv’ (default): daily values. Mean value for an entire day.

    • ‘iv’: instantaneous value measured at this time. Also known as ‘Real-time data’. Can be measured as often as every five minutes by the USGS. 15 minutes is more typical.

  • start_date (str) – should take on the form ‘yyyy-mm-dd’

  • end_date (str) – should take on the form ‘yyyy-mm-dd’

  • stateCd (str) – a valid two-letter state postal abbreviation, such as ‘MD’. Default is None. Selects all stations in this state. Because this type of site selection returns a large number of sites, you should limit the amount of data requested for each site.

  • countyCd (str or list of strings) – a valid county FIPS code. Default is None. Requests all stations within the county or list of counties. See https://en.wikipedia.org/wiki/FIPS_county_code for an explanation of FIPS codes.

  • bBox (str, list, or tuple) –

    a set of coordinates that defines a bounding box.
    • Coordinates are in decimal degrees.

    • Longitude values are negative (west of the prime meridian).

    • Latitude values are positive (north of the equator).

    • comma-delimited, no spaces, if provided as a string.

    • The order of the boundaries should be: “West,South,East,North”

    • Example: “-83.000000,36.500000,-81.000000,38.500000”

  • parameterCd (str or list of strings) –

    NWIS parameter code. Usually a five digit code. Default is ‘all’. A valid code can also be given as a list: parameterCd=['00060','00065']. This will request data for the given parameter or parameters.

    • if value is ‘all’, or no value is submitted, then NWIS will return every parameter collected at this site. (default option)

    • stage: ‘00065’

    • discharge: ‘00060’

    • not all sites collect all parameters!

    • See https://nwis.waterdata.usgs.gov/usa/nwis/pmcodes for full list

  • period (str) –

    NWIS period code. Default is None.
    • Format is “PxxD”, where xx is the number of days before today, with a maximum of 999 days accepted.

    • Either use start_date or period, but not both.

  • interpolate (bool) – Fill missing values through interpolation. Default False.

  • file (str) –

    A filename for acting as a cache for the data request. Accepts file extensions of ‘.json.gz’ (default) and ‘.parquet’. If this parameter is included, the NWIS object will first attempt to read its data from the file. If the file does not exist, it will use the other parameters to obtain the data and will then save to the provided filename.

    Zipped JSON files will save the original WaterML JSON provided by the NWIS. Parquet files will save the dataframe and the metadata for the NWIS object.

  • verbose (bool) – Print output for actions such as making data requests. Default is True.
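The caching behaviour described for the file parameter follows a read-through pattern: read the file if it exists, otherwise make the request and save the result. The sketch below shows that pattern for the '.json.gz' case only. The names load_or_request and fake_request are hypothetical, and the request is simulated so that nothing is sent to the NWIS.

```python
import gzip
import json
import os
import tempfile


def load_or_request(file, request):
    """Return cached data from `file`, or call `request()` and cache the result."""
    if os.path.exists(file):
        with gzip.open(file, "rt", encoding="utf-8") as handle:
            return json.load(handle)
    data = request()
    with gzip.open(file, "wt", encoding="utf-8") as handle:
        json.dump(data, handle)
    return data


calls = []


def fake_request():
    """Simulated data request; records each call so we can count them."""
    calls.append("request")
    return {"site": "01589440", "period": "P10D"}


with tempfile.TemporaryDirectory() as folder:
    cache_file = os.path.join(folder, "jones.json.gz")
    first = load_or_request(cache_file, fake_request)   # file missing: requests
    second = load_or_request(cache_file, fake_request)  # file present: reads it

print(len(calls), first == second)
```

One consequence of this pattern is that a cached file is never refreshed automatically; deleting the file forces a new request.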

df(*args)[source]

Return a subset of columns from the dataframe.

Parameters:
  • '' – If no args are provided, the entire dataframe will be returned.

  • 'all' (str) – the entire dataframe will be returned.

  • 'data' (str) – all of the parameters will be returned, with no flags.

  • 'flags' (str) – Only the _qualifier flags will be returned. Unless the flags arg is provided, only data columns will be returned. Visit https://waterdata.usgs.gov/usa/nwis/uv?codes_help#dv_cd1 to see a more complete listing of possible codes.

  • 'discharge' or 'q' (str) – discharge columns (‘00060’) will be returned.

  • 'stage' (str) – Gauge height columns (‘00065’) will be returned.

  • any five digit number (str) – any matching parameter columns will be returned. ‘00065’ returns stage, for example.

  • any eight to twelve digit number (str) – any matching stations will be returned.
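Column names follow the “provider:site:parameterCd:statistic” pattern, so selecting columns amounts to matching pieces of the name. The sketch below works on a plain list of names to illustrate the idea. The function select_columns is hypothetical, the '_qualifiers' suffix for flag columns is an assumption, and the real df() method may combine several arguments differently.

```python
ALIASES = {"q": "00060", "discharge": "00060", "stage": "00065"}


def select_columns(columns, *args):
    """Return data-column names that match any parameter code, alias or site."""
    wanted = [ALIASES.get(arg, arg) for arg in args]
    selected = []
    for name in columns:
        if name.endswith("_qualifiers"):
            continue  # flag columns are left out of this sketch
        _provider, site, parameter, _statistic = name.split(":")
        if not wanted or any(item in (site, parameter) for item in wanted):
            selected.append(name)
    return selected


columns = [
    "USGS:01589440:00060:00000",
    "USGS:01589440:00060:00000_qualifiers",
    "USGS:01589440:00065:00000",
    "USGS:01589440:00065:00000_qualifiers",
]
print(select_columns(columns, "q"))
print(select_columns(columns, "00065"))
print(select_columns(columns, "01589440"))
```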

get_data()[source]

Deprecated since version 0.2.0: No longer needed. The NWIS object requests data upon creation.

read(file)[source]

Read from a zipped WaterML file ‘.json.gz’ or from a parquet file.

Parameters:

file (str) – the filename to read from.

save(file)[source]

Save the dataframe and metadata to a parquet file.

Parameters:

file (str) – the filename to save to.

class hydrofunctions.station.Station(site=None)[source]

Bases: object

A class for organizing stream gauge data for a single request.

station_dict = {}

hydrofunctions.validate module

hydrofunctions.validate

This module contains functions for testing that user input is valid.

Why ‘pre-check’ user inputs, instead of using standard Python duck typing? These functions are meant to enhance an interactive session for the user, and will check a user’s parameters before requesting data from an online resource. Otherwise, the server will return a 404 code and the user will have no idea why. Hydrofunctions tries to raise an exception (usually a TypeError) before a request is made, so that the user can fix their request. It also tries to provide a helpful error message to an interactive session user.

Suggested format for these functions:

  • first check that the input is a string,

  • then do a regular expression to check that the input is more or less valid.

  • raise exceptions when user input breaks format.
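The suggested format can be sketched as a small pre-check that runs before any request is made. This is an illustration of the three steps, not the package's validators: the precheck function and both patterns are hypothetical, and the three-digit limit on the period is taken from the 999-day maximum mentioned for the NWIS class.

```python
import re

PERIOD_PATTERN = re.compile(r"P\d{1,3}D")        # e.g. "P10D"
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")  # e.g. "2012-06-01"


def precheck(value, pattern, name):
    """Check that an optional parameter is a string in the expected format."""
    if value is None:
        return None  # optional parameters may be left unset
    # First, check that the input is a string.
    if not isinstance(value, str):
        raise TypeError(f"{name} must be a string, not {type(value).__name__}.")
    # Then use a regular expression to check that it is more or less valid.
    if not pattern.fullmatch(value):
        raise TypeError(f"{name}={value!r} is not in the expected format.")
    return value


print(precheck("P10D", PERIOD_PATTERN, "period"))
print(precheck("2012-06-01", DATE_PATTERN, "start_date"))

try:
    precheck("10 days", PERIOD_PATTERN, "period")
except TypeError as err:
    print(err)
```

The regular expression only checks the shape of the input: a date such as "2012-13-45" would still pass, which is in keeping with the "more or less valid" goal stated above.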


hydrofunctions.validate.check_NWIS_bBox(input)[source]

Checks that the USGS bBox is valid.

hydrofunctions.validate.check_NWIS_service(input)[source]

Checks that the service is valid: either ‘iv’ or ‘dv’

hydrofunctions.validate.check_datestr(input)[source]

Checks that the start_date or end_date parameter is in yyyy-mm-dd format.

hydrofunctions.validate.check_parameter_string(candidate, param)[source]

Checks that a parameter is a string or a list of strings.

hydrofunctions.validate.check_period(input)[source]

Checks that the period parameter is in the P##D format, where ## is the number of days before now.

hydrofunctions.usgs_rdb module

hydrofunctions.usgs_rdb

This module is for working with the various USGS dataservices that use the rdb text format. These include the statistics service, the field measurements service, the rating curve service, and the peak discharge service.

hydrofunctions.usgs_rdb.data_catalog(site, verbose=True)[source]

Load a history of the data collected at a site into a Pandas dataframe.

Parameters:
  • site (str) – The gauge ID number for the site.

  • verbose (bool) – If True (default), will print confirmation messages with the url before and after the request.

Returns:

a hydroRDB object or tuple consisting of the header and a pandas dataframe. The dataframe will have one row for every type of data collected at each site requested; for each data parameter it will provide information including: parameter code, date of first observation, date of last observation, and total number of observations. A full description of the data catalog is given in the header; more information is available at: http://waterservices.usgs.gov/rest/Site-Service.html

For information about the site itself, including watershed area and HUC code, use the ‘site_file’ function.

Example:

>>> test = data_catalog('01542500')
>>> test
hydroRDB(header=<a multi-line string of the header>,
         table=<a Pandas dataframe>)

You can also access the header, dataframe, column names, and data types through the associated properties header, table, columns, dtypes:

>>> test.table
<a Pandas dataframe>

hydrofunctions.usgs_rdb.field_meas(site, verbose=True)[source]

Load USGS field measurements of stream discharge into a Pandas dataframe.

Parameters:
  • site (str) – The gauge ID number for the site.

  • verbose (bool) – If True (default), will print confirmation messages with the url before and after the request.

Returns:

a hydroRDB object or tuple consisting of the header and a pandas dataframe. Each row of the table represents an observation on a given date of river conditions at the gauge by USGS personnel. Values are stored in columns, and include the measured stream discharge, channel width, channel area, depth, and velocity.

Example:

>>> test = field_meas('01542500')
>>> test
hydroRDB(header=<a multi-line string of the header>,
         table=<a Pandas dataframe>)

You can also access the header, dataframe, column names, and data types through the associated properties header, table, columns, dtypes:

>>> test.table
<a Pandas dataframe>

Discussion:

The USGS operates over 8,000 stream gages around the United States and territories. Each of these sensors records the depth, or ‘stage’ of the water. In order to translate this stage data into stream discharge, the USGS staff creates an empirical relationship called a ‘rating curve’ between the river stage and stream discharge. To construct this curve, USGS personnel visit each gage every one to eight weeks and measure the stage and the discharge of the river manually.

The field_meas() function returns all of the field-collected data for this site. The USGS uses these data to create the rating curve. You can use these data to see how the site has changed over time, or to read the notes about local conditions.

The rating_curve() function returns the most recent ‘expanded shift-adjusted’ rating curve constructed for this site. This is the current official rating curve.

To plot a rating curve from the field measurements, use:

>>> header, data = hf.field_meas('01581830')

>>> data.plot(x='gage_height_va', y='discharge_va', kind='scatter')

Rating curves are typically plotted with the independent variable, gage_height, on the Y axis.

hydrofunctions.usgs_rdb.get_usgs_RDB_service(url, headers=None, params=None)[source]

Request data from a USGS dataservice and handle errors.

Parameters:
  • url (str) – a string used by Requests as the base URL.

  • headers (dict) – a dict of parameters used to request the data.

  • params (dict) – a dict of parameters used to modify the url of a REST service.

Returns:

A Requests response object.

Raises:

This function will raise an exception for any non-200 status code, and in cases where the USGS service returns anything that is not obviously an RDB file. If an exception is raised, then an attempt will be made to display the error page which the USGS sometimes sends back to the user.
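The two failure cases described above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the exception class RdbRequestError and the function check_rdb_response are hypothetical names, and the test for "obviously an RDB file" is assumed here to be that the reply starts with a ‘#’ comment line.

```python
class RdbRequestError(Exception):
    """Hypothetical exception for a failed or unusable RDB request."""


def check_rdb_response(status_code, text):
    """Return text if it looks like an RDB file, otherwise raise.

    Assumption: an RDB file begins with '#' comment lines, so any reply
    that does not begin with '#' is treated as an error page.
    """
    if status_code != 200:
        # Include the start of the reply so the user can read the error page.
        raise RdbRequestError(f"HTTP status {status_code}: {text[:200]}")
    if not text.startswith("#"):
        raise RdbRequestError(f"Reply is not an RDB file: {text[:200]}")
    return text


try:
    check_rdb_response(200, "<html>No sites found</html>")
except RdbRequestError as err:
    print("Request failed:", err)
```

Wrapping calls in a try/except block like this lets an interactive session continue after a bad site number or a service outage.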

class hydrofunctions.usgs_rdb.hydroRDB(header, table, columns, dtypes, rdb_str)[source]

Bases: object

A class for holding the information from USGS rdb files.

Parameters:
  • header (str) – A multi-line string from the header of the rdb file. The header often contains important metadata and user warnings.

  • table (pandas dataframe) – This is a dataframe made from the rdb file.

  • columns (str) – A string from the rdb file that lists the column names.

  • dtypes (str) – A string from the rdb file that gives the data type and length of each column.

  • rdb_str (str) – The complete original text of the rdb file.

Properties:
header (str):

A multi-line string from the header of the rdb file. The header often contains important metadata and user warnings.

table (pandas dataframe):

This is a dataframe made from the rdb file.

columns (str):

A string from the rdb file that lists the column names.

dtypes (str):

A string from the rdb file that gives the data type and length of each column.

rdb (str):

The original, unparsed rdb file as returned by the USGS.

You can also access the header and the dataframe as a named tuple:

hydroRDB(header=<a multi-line string>, table=<pandas dataframe>)
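One way an object can expose five properties while still unpacking into only a header and a table is sketched below. The class RdbResult is a hypothetical stand-in written for this illustration; it is not the hydroRDB source, and the sample values are made up.

```python
class RdbResult:
    """Hypothetical stand-in for hydroRDB: five attributes, two-item unpacking."""

    def __init__(self, header, table, columns, dtypes, rdb_str):
        self.header = header
        self.table = table
        self.columns = columns
        self.dtypes = dtypes
        self.rdb = rdb_str

    def __iter__(self):
        # Only header and table take part in tuple-style unpacking.
        return iter((self.header, self.table))

    def __repr__(self):
        return f"RdbResult(header={self.header!r}, table={self.table!r})"


raw = "# a header line\nsite_no\tyear\n15s\t4s\n01542500\t1940\n"
result = RdbResult("# a header line", [["01542500", "1940"]], "site_no\tyear", "15s\t4s", raw)

header, table = result      # unpack like a two-item tuple
print(header)
print(result.columns)       # the other values remain available as attributes
```

This is why both `test = hf.peaks(site)` followed by `test.table`, and `header, data = hf.field_meas(site)`, appear in the examples in this module.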

Note

  • The args to create this object are supplied by hf.read_rdb().

  • The hydroRDB object is returned from several functions that request RDB files from a USGS data service, including: peaks(), field_meas(), rating_curve(), stats(), site_file(), and data_catalog().

  • You can read more about the RDB format here: https://pubs.usgs.gov/of/2003/ofr03123/6.4rdb_format.pdf

hydrofunctions.usgs_rdb.peaks(site, verbose=True)[source]

Return a series of annual peak discharges.

Parameters:
  • site (str) – The gauge ID number for the site.

  • verbose (bool) – If True (default), will print confirmation messages with the url before and after the request.

Returns:

a hydroRDB object or tuple consisting of the header and a table. The header is a multi-line string of metadata supplied by the USGS with the data series. The table is a dataframe containing the annual peak discharge series. You can use these data to conduct a flood frequency analysis.

Example:

>>> test = hf.peaks('01542500')
>>> test
hydroRDB(header=<a multi-line string of the header>,
         table=<a Pandas dataframe>)

You can also access the header, dataframe, column names, and data types through the associated properties header, table, columns, dtypes:

>>> test.table
<a Pandas dataframe>

hydrofunctions.usgs_rdb.rating_curve(site, verbose=True)[source]

Return the most recent USGS expanded-shift-adjusted rating curve for a given stream gage as a dataframe.

Parameters:
  • site (str) – The gage ID number for the site.

  • verbose (bool) – If True (default), will print confirmation messages with the url before and after the request.

Returns:

a hydroRDB object or tuple consisting of the header and a table. The header is a multi-line string of metadata supplied by the USGS with the data series. The table is a dataframe containing the latest official rating curve for the site.

Example:

>>> test = rating_curve('01542500')
>>> test
hydroRDB(header=<a multi-line string of the header>,
         table=<a Pandas dataframe>)

You can also access the header, dataframe, column names, and data types through the associated properties header, table, columns, dtypes:

>>> test.table
<a Pandas dataframe>

Discussion:

The USGS operates over 8,000 stream gauges around the United States and territories. Each of these sensors records the depth, or ‘stage’ of the water. In order to translate this stage data into stream discharge, the USGS staff creates an empirical relationship called a ‘rating curve’ between the river stage and stream discharge.

See hf.field_meas() to access the field data used to construct the rating curve.

Note: Rating curves change over time.

hydrofunctions.usgs_rdb.read_rdb(text)[source]

Read strings that are in rdb format.

Parameters:

text (str) – A long string containing the contents of an rdb file. A common way to obtain this would be from the .text property of a requests response.

Returns:

header (multi-line string):

Every commented line at the top of the rdb file is marked with a ’#’ symbol. Each of these lines is stored in this output.

outputDF (pandas.DataFrame):

A dataframe containing the information in the rdb file. site_no and parameter_cd are interpreted as strings, but every other number is interpreted as a float or int; missing values are stored as np.nan, and everything else as strings.

columns (list of strings):

The column names, taken from the rdb header row.

dtypes (list of strings):

The second header row from the rdb file. These mostly tell the column width, and typically record everything as string data (‘s’) type. The exception is dates, which are listed with a ‘d’.
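The layout of an rdb file described above (comment lines marked with ‘#’, then a row of column names, then a row of data types, then the data) can be shown with a small standard-library sketch. The function parse_rdb is hypothetical and is not how read_rdb is implemented: it assumes tab-separated columns, keeps every value as a string, and returns a list of dicts instead of a dataframe. The sample text is invented for illustration and is not real USGS data.

```python
def parse_rdb(text):
    """Split rdb text into (header, columns, dtypes, rows)."""
    header_lines = []
    body_lines = []
    for line in text.splitlines():
        if line.startswith("#"):
            header_lines.append(line)   # commented metadata lines
        elif line.strip():
            body_lines.append(line)     # skip blank lines
    columns = body_lines[0].split("\t")  # first body row: column names
    dtypes = body_lines[1].split("\t")   # second body row: width and type codes
    rows = [dict(zip(columns, line.split("\t"))) for line in body_lines[2:]]
    return "\n".join(header_lines), columns, dtypes, rows


sample = (
    "# Illustrative rdb text, not real USGS data\n"
    "# Columns are separated by tabs\n"
    "site_no\tpeak_dt\tpeak_va\n"
    "15s\t10d\t8s\n"
    "01542500\t1940-04-01\t1000\n"
)
header, columns, dtypes, rows = parse_rdb(sample)
print(columns)
print(dtypes)
print(rows[0]["site_no"])
```

Keeping site_no as a string matters because gage IDs such as ‘01542500’ have a leading zero that would be lost if the value were converted to a number.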

hydrofunctions.usgs_rdb.site_file(site, verbose=True)[source]

Load USGS site file into a Pandas dataframe.

Parameters:
  • site (str) – The gauge ID number for the site.

  • verbose (bool) – If True (default), will print confirmation messages with the url before and after the request.

Returns:

a hydroRDB object or tuple consisting of the header and a pandas dataframe. The dataframe will have one row for every site requested; for each site it will provide detailed site characteristics such as watershed area, drainage basin HUC code, site latitude, longitude, altitude, and datum; the date the site was established, hole depth for wells, and other information. All of the columns are listed in the header; for more information, visit: http://waterservices.usgs.gov/rest/Site-Service.html

For information on the data collected at this site (including the start and stop dates for data collection), use the ‘data_catalog’ function.

Example:

>>> test = site_file('01542500')
>>> test
hydroRDB(header=<a multi-line string of the header>,
         table=<a Pandas dataframe>)

You can also access the header, dataframe, column names, and data types through the associated properties header, table, columns, dtypes:

>>> test.table
<a Pandas dataframe>

hydrofunctions.usgs_rdb.stats(site, statReportType='daily', verbose=True, **kwargs)[source]

Return statistics from the USGS Stats Service as a dataframe.

Parameters:
  • site (str) – The gage ID number for the site, or a series of gage IDs separated by commas, like this: ‘01546500,01548000’.

  • statReportType ('daily'|'monthly'|'annual') –

    There are three different types of report that you can request.

    • ’daily’ (default): calculate statistics for each of 365 days.

    • ’monthly’: calculate statistics for each of the twelve months.

    • ’annual’: calculate annual statistics for each year since the start of the record.

  • verbose (bool) – If True (default), will print confirmation messages with the url before and after the request.

Returns:

a hydroRDB object or tuple consisting of the header and a table. The header is a multi-line string of metadata supplied by the USGS with the data series. The table is a dataframe containing the latest official statistics for the site.

Raises:

HTTPError – when a non-200 http status code is returned.

Example:

>>> test = stats('01542500', 'monthly')
>>> test
hydroRDB(header=<a multi-line string of the header>,
         table=<a Pandas dataframe>)

You can also access the header, dataframe, column names, and data types through the associated properties header, table, columns, dtypes:

>>> test.table
<a Pandas dataframe>

Note

This function is based on the USGS statistics service, described here: https://waterservices.usgs.gov/rest/Statistics-Service.html

The USGS Statistics Service allows you to specify a wide array of additional parameters in your request. You can provide these parameters as keyword arguments, like in this example:

>>> hf.stats('01452500', parameterCD='00060')

This will only request statistics for discharge, which is specified with the ‘00060’ parameter code.

Additional useful parameters include:

  • parameterCD=’00060,00065’ Limit the request for statistics to only one parameter or to a list of parameters. The default behavior is to provide statistics for every parameter that has been measured at this site. In this example, statistics for discharge (‘00060’) and stage (‘00065’) are requested.

  • statYearType=’water’ Calculate annual statistics based on the water year, which runs from October 1st to September 30th. This parameter is only for use with annual reports. If not specified, the default behavior will use calendar years for reporting.

  • missingData=’on’ Calculate statistics even when there are some missing values. If not specified, the default behavior is to drop years that have fewer than 365 values from annual reports, and to drop months that have fewer than 30 values in monthly reports. The number of values used to calculate a statistic is reported in the ‘count_nu’ column.

  • You can read about other useful parameters here: https://waterservices.usgs.gov/rest/Statistics-Service.html#statistical_Controls
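To show how keyword arguments like the ones listed above could end up in a request, here is a minimal sketch that merges them into a query string. It is not the code used by hf.stats(): the function build_stats_url is hypothetical, and the base URL and the ‘sites’ and ‘format’ query names are assumptions based on the public USGS service pages.

```python
from urllib.parse import urlencode

# Assumption: base URL of the USGS Statistics Service.
STAT_URL = "https://waterservices.usgs.gov/nwis/stat/"


def build_stats_url(site, statReportType="daily", **kwargs):
    """Combine the fixed arguments with any extra keyword arguments."""
    query = {"sites": site, "statReportType": statReportType, "format": "rdb"}
    query.update(kwargs)  # extra parameters are passed through unchanged
    return STAT_URL + "?" + urlencode(query)


url = build_stats_url(
    "01542500",
    "annual",
    parameterCD="00060,00065",
    statYearType="water",
)
print(url)
```

Note that urlencode percent-encodes the comma in a list of parameter codes as %2C, the same encoding seen in the request URL printed by hf.NWIS in the Basic Usage example.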

hydrofunctions.waterwatch module

hydrofunctions.waterwatch

This module is for working with the five USGS WaterWatch Data Services. A description of the data services is available at: https://waterwatch.usgs.gov/webservices/

Main page: https://waterwatch.usgs.gov

NOTICE (taken from waterwatch.usgs.gov): In January 2020, USGS WaterWatch began operating in maintenance-only mode. Existing tools, features, and web data services are being fully maintained as before, but new tools and enhancements will no longer be developed. Please click here for more information or contact USGS WaterWatch if you have any questions.

The WaterWatch program provides five data services with REST APIs:

  • Current Conditions Real-Time Streamflow Service

  • Flood and High Flow Service

  • Average Streamflow for 7, 14, and 28 Days Service

  • Hourly Flow Change Service

  • Flood Stage Service

Hydrofunctions allows you to access each of these services as either a dictionary or a dataframe with the station ID as the key/index.

hydrofunctions.waterwatch.filter_flood_stages(all_flood_stages, sites_numbers=None)[source]

Filters the flood stages to keep only the specified station numbers.
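A minimal sketch of this kind of filtering is shown below. The function filter_stages is hypothetical and is not the library's implementation; it assumes the input is a dictionary keyed by station number, and that a requested station with no entry maps to None, as in the get_flood_stage() example output. The stage values are copied from that example.

```python
def filter_stages(all_stages, site_numbers=None):
    """Keep only the requested stations; return everything if none are given."""
    if site_numbers is None:
        return dict(all_stages)
    # dict.get returns None for stations that have no flood stage record.
    return {site: all_stages.get(site) for site in site_numbers}


all_stages = {
    "07144100": {
        "action_stage": "20",
        "flood_stage": "22",
        "moderate_flood_stage": "25",
        "major_flood_stage": "26",
    },
}
print(filter_stages(all_stages, ["07144100", "07144101"]))
```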

hydrofunctions.waterwatch.get_flood_stage(site=None, output_format='dict')[source]

Retrieves flood stages for a list of station numbers.

This function retrieves a dictionary of flood stages for each site. The ‘stage’ of a river is the height of the river surface at a stream gauge, expressed as a height above an arbitrary datum. It is similar to water depth, except that datums are usually set so that zero (0) is well below the lowest elevation of the stream bed. This is done so that even if there is erosion over time, the stream bed and the river stage will never reach an elevation that is less than zero. Stage is usually expressed in feet in this dataset. You can retrieve the stage of the river using the parameter ‘00065’, whereas the discharge of the river is ‘00060’.

There are several kinds of flood stage reported in these data:

  • action stage: If the water gets above this level, it triggers an action by the National Weather Service.

  • flood stage: Water at this level begins to be a hazard to lives, property, or commerce. Not necessarily the same as bankfull stage.

  • moderate flood stage: structures and roads begin to be inundated.

  • major flood stage: requires significant evacuations of people or transfer of property to higher elevations.

See https://waterwatch.usgs.gov/webservices/ for more information.
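The four levels described above can be compared against a gage height reading, as in the sketch below. The function classify_stage is hypothetical and not part of hydrofunctions. The threshold values are copied from the example output for site 07144100 in this section, where they are stored as strings; treating a reading equal to a threshold as having reached that level is an assumption.

```python
def classify_stage(stage_ft, thresholds):
    """Return the name of the highest flood level reached, or None."""
    order = [
        "action_stage",
        "flood_stage",
        "moderate_flood_stage",
        "major_flood_stage",
    ]
    reached = None
    for name in order:
        value = thresholds.get(name)
        # Thresholds may be missing or stored as strings, so convert first.
        if value is not None and stage_ft >= float(value):
            reached = name
    return reached


thresholds = {
    "action_stage": "20",
    "flood_stage": "22",
    "moderate_flood_stage": "25",
    "major_flood_stage": "26",
}
for reading in (19.0, 23.5, 26.0):
    print(reading, classify_stage(reading, thresholds))
```

The reading itself would come from gage height data (parameter ‘00065’), for example the most recent value in an NWIS dataframe.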

Parameters:
  • site (str or list of str) – The USGS site ID number or a list of numbers.

  • output_format (str) – Optional output format. Returns a dict if ‘dict’ (default); otherwise returns a pd.DataFrame.

Returns: Dictionary or DataFrame of station numbers and their flood stages. If there is no flood stage for a station, None is returned.

Example

>>> stations = ["07144100", "07144101"]
>>> res = get_flood_stage(stations, output_format="dict")  # dictionary output
>>> print(res)
{'07144100': {'action_stage': '20', 'flood_stage': '22', 'moderate_flood_stage': '25', 'major_flood_stage': '26'},
 '07144101': None}
>>> print(get_flood_stage(stations))