`pysep`#

Subpackages#

pysep.utils

Submodules#

Package Contents#

Classes#

`Pysep`	Download, preprocess, and save waveform data using ObsPy
`RecordSection`	Record section plotting tool which takes ObsPy streams and:

Functions#

`get_data`([config_file, event, inv, st, write_files, ...])	Interactive/scripting function to run PySep and return quality controlled,
`plotw_rs`(args, *kwargs)	Plot Waveform Record Section (main call function for RecordSection)

Attributes#

`__version__`
`logger`
`ch`
`FORMAT`
`formatter`

pysep.__version__[source]#

pysep.logger[source]#

pysep.ch[source]#

pysep.FORMAT = '[%(asctime)s] - %(name)s - %(levelname)s: %(message)s'[source]#

pysep.formatter[source]#

class pysep.Pysep(config_file=None, event_selection='default', client='IRIS', origin_time=None, reference_time=None, networks='*', stations='*', locations='*', channels='*', station_ids=None, use_mass_download=False, extra_download_pct=0.005, event_latitude=None, event_longitude=None, event_depth_km=None, event_magnitude=None, remove_response=True, remove_clipped=True, remove_insufficient_length=True, remove_masked_data=True, water_level=60, detrend=True, demean=True, taper_percentage=0.0, rotate=None, pre_filt='default', fill_data_gaps=False, gap_fraction=1.0, mindistance_km=0, maxdistance_km=20000.0, minazimuth=0, maxazimuth=360, minlatitude=None, minlongitude=None, maxlatitude=None, maxlongitude=None, resample_freq=None, scale_factor=1, phase_list=None, seconds_before_event=20, seconds_after_event=20, seconds_before_ref=100, seconds_after_ref=300, taup_model='ak135', output_unit='VEL', user=None, password=None, client_debug=False, log_level='DEBUG', timeout=600, write_files='inv,event,stream,sac,config_file,station_list', plot_files='all', llnl_db_path=None, output_dir=None, overwrite=False, legacy_naming=False, overwrite_event_tag=None, **kwargs)[source]#

Download, preprocess, and save waveform data using ObsPy

Note

Parameters for general data gathering control

Parameters:

client (str) – ObsPy FDSN client to query data from, e.g., IRIS, LLNL, NCEDC or any FDSN clients accepted by ObsPy. Defaults to ‘IRIS’
minlatitude (float) – for event, station and waveform retrieval. Defines the minimum latitude for a rectangular bounding box that is used to search for data. Only used for events if `event_selection`==’search’
maxlatitude (float) – for event, station and waveform retrieval. Defines the maximum latitude for a rectangular bounding box that is used to search for data. Only used for events if `event_selection`==’search’
minlongitude (float) – for event, station and waveform retrieval. Defines the minimum longitude for a rectangular bounding box that is used to search for data. Only used for events if `event_selection`==’search’
maxlongitude (float) – for event, station and waveform retrieval. Defines the maximum longitude for a rectangular bounding box that is used to search for data. Only used for events if `event_selection`==’search’
user (str) – User ID if IRIS embargoes data behind passwords. This is passed into the instantiation of client.
password (str) – Password if IRIS embargoes data behind passwords. This is passed into the instantiation of ‘client’
use_mass_download (bool) – Use ObsPy’s mass download option to download all available stations in the region regardless of data provider.
client_debug (bool) – turn on DEBUG mode for the ObsPy FDSN client, which outputs information-rich log messages to std out. Use for debugging when FDSN fails mysteriously.
timeout (float) – time out time in units of seconds, passed to the client to determine how long to wait for return data before exiting. Defaults to 600s.
llnl_db_path (str) – If `client`==’LLNL’, PySEP assumes we are accesing data from the LLNL waveeform database (which must be stored local). Points to the path where this is saved.

Note

Event selection parameters

Parameters:

event_selection (str) – How to define the Event which is used to define the event origin time and hypocentral location. - ‘default’: User defines Event origin_time, and location with event_latitude and event_longitude - ‘search’: PySEP will use client to search for a Catalog event defined by event_origintime, event_magnitude and event_depth_km. Buffer time around the origin_time can be defined by seconds_before_event and seconds_after_event.
origin_time (str) – the event origin time used as a central reference point for data gathering. Must be in a string format that is recognized by ObsPy UTCDateTime. For example ‘2000-01-01T00:00:00’.
event_latitude (float) – latitude of the event in units of degrees. used for defining the event hypocenter and for removing stations based on distance from the event.
event_longitude (float) – longitude of the event in units of degrees. used for defining the event hypocenter and for removing stations based on distance from the event.
event_depth_km (float or NoneType) –
depth of event in units of kilometers. postive values for deeper depths. Used for:
1. `event_selection`==’search’
2. estimating phase arrivals with TauP
3. plotting events and title on source receiver maps
If set to None, (2) and (3) will fail. Best-guesses are acceptable.
event_magnitude (float or NoneType) – event magnitude in Mw used for `event_selection`==’search’ and source receiver map plotting. If provided as None, map plotting will fail.
seconds_before_event (float) – For event selection only, only used if event_selection`==’search’. Time [s] before given `origin_time to search for a matching catalog event from the given client
seconds_after_event (float) – For event selection only, only used if event_selection`==’search’. Time [s] after given `origin_time to search for a matching catalog event from the given client

Note

Waveform and station metadata gathering parameters

Parameters:

reference_time (str) – Waveform origin time. If not given, defaults to the event origin time. This allows for a static time shift from the event origin time, e.g., if there are timing errors with relation to the origin_time. Defaults to NoneType (origin_time).
seconds_before_ref (float) – For waveform fetching. Defines the time before reference_time to fetch waveform data. Units [s]
seconds_after_ref (float) – For waveform fetching. Defines the time after reference_time to fetch waveform data. Units [s]
extra_download_pct (float) – extra download percentage. Adds a buffer around origin_time + seconds_before_ref + extra_download_pct (also -seconds_after_ref), which gathers a bit of extra data which will be trimmed away. Used because gathering data directly at the requested time limits may lead to shorter expected waveforms after resampling or preprocessing procedures. Given as a percent [0,1], defaults to .5%.
networks (str) – name or names of networks to query for, if names plural, must be a comma-separated list, i.e., ‘AK,AT,AV’. Wildcards okay, defaults to ‘*’.
stations (str) – station name or names to query for. If multiple stations, input as a list of comma-separated values, e.g., ‘STA01,STA02,STA03’. Wildcards acceptable, if using wildcards, use a ‘-’ to exclude stations (e.g., ‘,-STA01’ will gather all stations available, except STA01. Defaults to ‘’
locations (str) – locations name or names to query for, wildcard okay. See stations for inputting multiple location values. Default ‘*’.
channels (str) – channel name or names to query for, wildcard okay. If multiple stations, input as a list of comma-separated values, e.g., ‘HH?,BH?’. Wildcards acceptable. Defaults to ‘*’.
station_ids (list of str) – an alternative to gathering based on individual codes, allow the user to input a direct list of trace IDs which will be broken up and used to gather waveforms and metadata. NOTE: OVERRIDES network, stations, locations, and channels, these parameters will NOT be used. Station ids should be provided as: [‘NN.SSS.LL.CCC’, …]

Note

Station removal and curtailing parameters

Parameters:

mindistance_km (float) –
Used for removing stations and mass download option
- Removing stations: Remove any stations who are closer than the
given minimum distance away from event (units: km). Always applied - Mass Download: If use_mass_download is True and `domain_type`==’circular’, defines the minimum radius around the event hypocenter to gather waveform data and station metadata
maxdistance_km (float) –
Used for removing stations and mass download option
- Removing stations: Remove any stations who are farther than the
given maximum distance away from event (units: km). Always applied - Mass Download: If use_mass_download is True and `domain_type`==’circular’, defines the maximum radius around the event hypocenter to gather waveform data and station metadata
minazimuth (float) – for station removal. stations whose azimuth relative to the event hypocenter that do not fall within the bounds [minazimuth, maxazimuth] are removed from the final list. Defaults to 0 degrees.
minazimuth – for station removal. stations whose azimuth relative to the event hypocenter that do not fall within the bounds [minazimuth, maxazimuth] are removed from the final list. Defaults to 360 degrees.
remove_clipped (bool) – remove any clipped stations from gathered stations. Checks the max amplitude of against a maximum value expected for a 24 bit signal. Defaults False
remove_insufficient_length (bool) – remove waveforms whose trace length does not match the average (mode) trace length in the stream. Defaults to True
remove_masked_data (bool) – If fill_data_gaps is False or None, data with gaps that go through the merge process will contain masked arrays (essentially retaining gaps). By default, PySEP will remove these data during processing. To keep this data, set remove_masked_data == True.
fill_data_gaps (str or int or float or bool) –
How to deal with data gaps (missing sections of waveform over a continuous time span). False by default, which means data with gaps are removed completely. Users who want access to data with gaps must choose how gaps are filled. See API for ObsPy.core.stream.Stream.merge() for how merge is handled:

Options include:
- ’mean’: fill with the mean of all data values in the gappy data
- <int or float>: fill with a constant, user-defined value, e.g.,
0 or 1.23 or 9.999 - ‘interpolate’: linearly interpolate from the last value pre-gap to the first value post-gap - ‘latest’: fill with the last value of pre-gap data - False: do not fill data gaps, which will lead to stations w/ data gaps being removed.

NOTE: Be careful about data types, as there are no checks that the fill value matches the internal data types. This may cause unexpected errors.
gap_fraction (float) – if fill_data_gaps is not None, determines the maximum allowable fraction (percentage) of data that gaps can comprise. For example, a value of 0.3 means that 30% of the data (in samples) can be gaps that will be filled by fill_data_gaps. Traces with gap fractions that exceed this value will be removed. Defaults to 1. (100%) of data can be gaps.

Note

Data processing parameters

Parameters:

detrend (bool) – apply simple linear detrend as the first preprocessing step
demean (bool) – apply demeaning to data during instrument reseponse removal. Only applied if remove_response == True.
taper_percentage (float) – apply a taper to the waveform with ObsPy taper, fraction between 0 and 1 as the percentage of the waveform to be tapered Applied generally used when data is noisy, e.g., HutchisonGhosh2016 Note: To get the same results as the default taper in SAC, use max_percentage=0.05 and leave type as hann. Tapering also happens while resampling (see util_write_cap.py). Only applied if remove_response == True.
rotate (list of str or NoneType) –
choose how to rotate the waveform data. pre-rotation processing will be applied. Can include the following options (order insensitive):
- ZNE: Rotate from arbitrary components to North, East, Up
- RTZ: Rotate from ZNE to Radial, Transverse, Up
- UVW: Rotate from ZNE to orthogonal UVW orientation
If set to None, no rotation processing will take place.
resample_freq (float) – frequency to resample data in units Hz. If not given, no data resampling will take place. Defaults to NoneType
scale_factor (float) – scale all data by a constant factor Note: for CAP use 10**2 (to convert m/s to cm/s). Defaults to NoneType (no scaling applied)

Note

Instrument response removal parameters

Parameters:

remove_response (bool) – remove instrument response using station response information gathered from client. Defaults to True.
output_unit (str) – the output format of the waveforms if instrument response removal is applied. Only relevant if `remove_response`==True. See ObsPy.core.trace.Trace.remove_response for acceptable values. Typical values are: ‘DISP’, ‘VEL’, ‘ACC’ (displacement [m], velocity [m/s], acceleration [m/s^2]).
water_level (float or None) – a water level threshold to apply during filtering for small values. Passed to Obspy.core.trace.Trace.remove_response
pre_filt (str, tuple or NoneType) –
apply a pre-filter to the waveforms before deconvolving instrument response. Options are:
- ’default’: automatically calculate (f0, f1, f2, f3) based on the
length of the waveform (dictating longest allowable period) and the sampling rate (dictating shortest allowable period). This is the default behavior. * NoneType: do not apply any pre-filtering * tuple of float: (f0, f1, f2, f3) define the corners of your pre filter in units of frequency (Hz)

Note

SAC header control parameters

Parameters:

phase_list (list of str) – phase names to get ray information from TauP with. Defaults to ‘ttall’, which is ObsPy’s default for getting all phase arrivals. Must match Phases expected by TauP (see ObsPy TauP documentation for acceptable phases). Earliest P and S phase arrivals will be added to SAC headers, the remainder will be discarded.
taup_model (str) – name of TauP model to use to calculate phase arrivals See also phase_list which defines phases to grab arrival data for. Defaults to ‘AK135’. See ObsPy TauP documentation for avilable models.

Note

PySEP Configuration parameters

Parameters:

config_file (str) – path to YAML configuration file which will be used to overwrite internal default parameters. Used for command-line version of PySEP
log_level (str) – Level of verbosity for the internal PySEP logger. In decreasing order of verbosity: ‘DEBUG’, ‘INFO’, ‘WARNING’, ‘CRITICAL’
legacy_naming (bool) – if True, revert to old PySEP naming schema for event tags, which is responsible for naming the output directory and SAC files. Legacy filenames look something like ‘20000101000000.NN.SSS.LL.CC.c’ (event origin time, network, station, location, channel, component). Default to False
overwrite_event_tag (str or bool) –
option to allow the user to set their own event tag, rather than the automatically generated one.
- NoneType (default): use automatically generated event tag which
  consists of event origin time and Flinn-Engdahl region
- ’’: empty string will dump ALL files into output_dir, no new
  directories will be made
- str: User-defined event tag which will be created in output_dir,
  all files will be stored in {output_dir}/{overwrite_event_tag}/*

Note

Output file and figure control

Parameters:

write_files (str or NoneType) –
Which files to write out after data gathering.
1. User-defined comma-separated list of the following
  - weights_az: write out CAP weight file sorted by azimuth
  - weights_dist: write out CAP weight file sorted by distance
  - weights_code: write out CAP weight file sorted by station code
  - station_list: write out a text file with station information
  - inv: save a StationXML (.xml) file (ObsPy inventory)
  - event: save a QuakeML (.xml) file (ObsPy Catalog)
  - stream: save an ObsPy stream in Mseed (.ms) (ObsPy Stream)
  - config_file: save YAML config file w/ all input parameters
  - sac: save all waveforms as SAC (.sac) files w/ correct headers
  - sac_raw: save raw waveforms. these are straight from the
    data center with no quality check and no SAC headers
  - sac_zne: save only ZNE channel SAC files
  - sac_rtz: save only RTZ channel SAC files
  - sac_uvw: save only UVW channel SAC files
Example input: `write_files`==’inv,event,stream,sac’ By Default: ‘inv,event,stream,sac,config_file,station_list’ 2) If NoneType or an empty string, no files will be written. 3) If ‘all’, write all files listed in (1)
write_files –
What to plot after data gathering. Should be a comma-separated list of the following:
- map: plot a source-receiver map with event and all stations
- record_section: plot a record section with default parameters
- all: plot all of the above (default value)
If None, no files will be plotted.
output_dir (str) – path to output directory where all the files and figures defined by write_files and plot_files will be stored. Defaults to the current working directory.
overwrite (bool) – If True, overwrite an existing PySEP event directory. This prevents Users from re-downloading data. Defaults to False.

check()[source]#: Check input parameter validity against expected Pysep behavior

get_client()[source]#

Options to choose different Clients based on attribute client which will be used to gather waveforms and metadata

Return type:: obspy.clients.fdsn.client.Client
Returns:: Client used to gather waveforms and metadata

load(config_file=None, overwrite_event=True)[source]#

Overwrite default parameters using a YAML config file

Parameters:

config_file (str) – YAML configuration file to load from
overwrite_event (bool) – overwrite event search parameters (origin time, lat, lon etc.) from the YAML config file. Defaults to True

get_event()[source]#

Exposed API for grabbing event metadata depending on the event_selection choice.

Options for event_selection are:: ‘search’: query FDSN with event parameters ‘default’: create an event from scratch using user parameters

or if ‘client’==’LLNL’, grab event from internal database

Return type:: obspy.core.event.Event
Returns:: Matching event given event criteria. If multiple events are returned with the query, returns the first in the catalog

_query_event_from_client(magnitude_buffer=0.1, depth_buffer_km=1.0)[source]#

Retrieve an event catalog using ObsPy Client.get_events(). Searches Client for a given origin time, location, depth (optional) and magnitude (optional).

To use this, set attribute `event_selection`==’search’

Parameters:

magnitude_buffer (float) – if attribute event_magnitude is given, will search events for events with magnitude: event_magnitude +/- magnitude_buffer
depth_buffer_km (float) – if attribute event_depth_km is given, will search events for events with depth: event_depth_km +/- depth_buffer_km

Return type:

obspy.core.event.Event

Returns:

Matching event given event criteria. If multiple events are returned with the query, returns the first in the catalog

_create_event_from_scratch()[source]#

Make a barebones event object based on user-defined parameters which will then be used to query for waveforms and StationXML data

Return type:: obspy.core.event.Event
Returns:: Event object with origin and magnitude information appended

_get_event_from_llnl_catalog()[source]#

Special getter function for Lawrence Livermore National Lab data LLNL database has a special client

TODO Do we need more filtering in the catalog?

Return type:: obspy.core.event.Event
Returns:: Event information queried from LLNL database

get_stations()[source]#

Exposed API for grabbing station metadata from client. Download station metadata using ObsPy get_stations() with a user-defined bounding box and for user-defined networks, stations etc.

Return type:: obspy.core.inventory.Inventory
Returns:: Station metadata queried from Client

get_waveforms()[source]#

Exposed API for grabbing waveforms from client. Internal logic determines how waveforms are queried, but mainly it is controlled by the internal inv attribute detailing station information, and reference times for start and end times.

Note

We do not use the minimumlength variable so that we can figure out which stations have data gaps

Return type:: obspy.core.stream.Stream
Returns:: Stream of channel-separated waveforms

_bulk_query_waveforms_from_client()[source]#

Make a bulk request query to the Client based on the internal inv attribute defining the available station metadata.

Return type:: obspy.core.stream.Stream
Returns:: Stream of channel-separated waveforms

mass_download()[source]#

Use ObsPy Mass downloader to grab events from a pre-determined region

Keyword Arguments:

domain_type (str) –
How to define the search region domain - rectangular: rectangular bounding box defined by min/max

latitude/longitude
- circular: circular bounding circle defined by the events latitude and longitude, with radii defined by mindistance_km and maxdistance_km
delete_tmpdir (bool) – Remove the temporary directories that store the MSEED and StationXML files which were downloaded by the mass downloader. Saves space but also if anything fails prior to saving data, the downloaded data will not be saved. Defaults to True.

curtail_stations()[source]#

Remove stations from inv based on station distance, azimuth, etc.

Note

One-function function currently, but we can expand curtailing here if need by

Return type:: obspy.core.inventory.Inventory
Returns:: station metadata that has been curtailed based on acceptable paramaters

preprocess()[source]#

Very simple preprocessing to remove response and apply a prefilter scale waveforms (if necessary) and clean up waveform time series

Return type:: obspy.core.stream.Stream
Returns:: a preprocessed stream with response removed, amplitude scaled (optional), and time series standardized

_remove_response_llnl(st)[source]#: Remove response information from LLNL stations. This requires using the custom LLNL DB client. There are also some internal checks that need to be bypassed else they cause the program to crash

rotate_streams()[source]#

Rotate arbitrary three-component seismograms to desired orientation ‘ZNE’, ‘RTZ’ or ‘UVW’.

Warning

This function combines all traces, both rotated and non-rotated components (ZNE, RTZ, UVW, but not raw, e.g., 12Z), into a single stream. This is deemed okay because we don’t do any component-specific operations after rotation.

Return type:: obspy.core.stream.Stream
Returns:: a stream that has been rotated to desired coordinate system with SAC headers that have been adjusted for the rotation, as well as non-rotated streams which are saved incase user needs access to other components

write(write_files=None, _return_filenames=False, _subset=None, **kwargs)[source]#

Write out various files specifying information about the collected stations and waveforms.

Options are:

config_file: write the current configuration as a YAML file

station_list: write a text file with station information

inv: write the inventory as a StationXML file

event: write the event as a QuakeML file

stream: write the stream as a single MSEED file

sac_zne: write the stream as individual (per-channel) SAC files
for ZNE components with the appropriate SAC header

sac_rtz: write out per-channel SAC files for RTZ components

sac_uvw: write out per-channel SAC files for UVW components

weights_dist: write out CAP ‘weights.dat’ file sorted by distance

weights_az: write out CAP ‘weights.dat’ file sorted by azimuth

weights_code: write out CAP ‘weights.dat’ file sorted by sta code

Parameters:

write_files (list of str) – list of files that should be written out, must match the acceptable list defined in the function or here in the docstring. If not given, defaults to internal list of files
_return_filenames (bool) – internal flag to not actually write anything but just return a list of acceptable filenames. This keeps all the file naming definitions in one function. This is only required by the check() function.
_subset (list) – internal parameter used for intermediate file saving. PySEP will attempt to save files once they have been collected however if the files it tries to save do not match against the User-defined file list, they will be ignored.

Keyword Arguments:

order_station_list_by (str) – how to order the station list available options are: network, station, latitude, longitude, elevation, burial.
config_fid (str) – optional name for the configuration file name defaults to ‘pysep_config.yaml’
station_fid (str) – optional name for the stations list file name defaults to ‘station_list.txt’
inv_fid (str) – optional name for saved ObsPy inventory object, defaults to ‘inv.xml’
event_fid (str) – optional name for saved ObsPy Event object, defaults to ‘event.xml’
stream_fid (str) – optional name for saved ObsPy Stream miniseed object, defaults to ‘stream.ms’
sac_subdir (str) – sub-directory within output directory and event directory to save SAC files. Defaults to SAC/. Use an empty string to dump files directly into the event directory

_write_sac(st, output_dir=os.getcwd(), components=None)[source]#

Write SAC files with a specific naming schema, which allows for both legacy (old PySEP) or non-legacy (new PySEP) naming.

Parameters:

st (obspy.core.stream.Stream) – Stream to be written
output_dir (str) – where to save the SAC files, defaults to the current working directory
components (str) – acceptable component values for saving files, allows only saving subsets of the Stream. Example ‘RTZNE’ or just ‘R’. Must match against Trace.stats.component

write_config(fid=None, overwrite=False)[source]#

Write a YAML config file based on the internal Pysep attributes. Remove a few internal attributes (those containing data) before writing and also change types on a few to keep the output file simple but also re-usable for repeat queries.

Parameters:

fid (str) – name of the file to write. defaults to config.yaml
overwrite (bool) – if True and fid already exists, save a new config file with the same name, overwriting the old file. if False (default), throws a warning if encountering existing fid and does not write config file

plot()[source]#: Plot map and record section if requested. Allow general error catching for mapping and record section plotting because we don’t want these auxiliary steps to crash the entire workflow since they are not critical.

_event_tag_and_output_dir()[source]#

Convenience function to establish and naming schema for files and directories. Also takes care of making empty directories.

Return type:: tuple of str
Returns:: (unique event tag, path to output directory)

_set_log_file(mode)[source]#

Write logger to file as well as stdout, with the same format as the stdout logger. Need mode==1 to move the log file after everything is done because we don’t know the event tag prior to starting the logs

Parameters:: mode (int) – Two options for using this function 0: set the logger to a temporary file ‘pysep.log’, 1: move the logger from the temporary file into final output dir

run(event=None, inv=None, st=None, **kwargs)[source]#

Run PySEP: Seismogram Extraction and Processing. Steps in order are:

Set default parameters or load from config file

Check parameter validity, exit if unexpected values

Get data and metadata (QuakeML, StationXML, waveforms)

Remove unacceptable stations based on user-defined criteria

Remove unacceptable waveforms based on user-defined criteria

Generate some new metadata for tagging and output

Pre-process waveforms and standardize for general use

Generate output files and figures as end-product

Parameters:

event (obspy.core.event.Event) – optional user-provided event object which will force a skip over QuakeML/event searching
inv (obspy.core.inventory.Inventory) – optional user-provided inventory object which will force a skip over StationXML/inventory searching
st (obspy.core.stream.Stream) – optional user-provided strean object which will force a skip over waveform searching

pysep.get_data(config_file=None, event=None, inv=None, st=None, write_files=None, plot_files=None, log_level=None, *args, **kwargs)[source]#

Interactive/scripting function to run PySep and return quality controlled, SAC-headed stream object which can then be used for other processes.

Note

By default turns file writing and plotting OFF so that this function acts solely as a data collection/processing call.

Note

args and kwargs are passed directly to Pysep.__init__() so you can define all your parameters in this call, or through a config file

>>> from pysep import get_data >>> st = get_data(config_file=’config.yaml’)

Parameters:

config_file (str) – path to YAML config file which will overload any default configs
event (obspy.core.event.Event) – optional user-provided event object which will force a skip over QuakeML/event searching
inv (obspy.core.inventory.Inventory) – optional user-provided inventory object which will force a skip over StationXML/inventory searching
st (obspy.core.stream.Stream) – optional user-provided strean object which will force a skip over waveform searching
write_files (list or None) – list of files to write, acceptable options defined in write(). Defaults to None, no files will be written
plot_files (list or None) – list of files to plot, acceptable options defined in plot(). Defaults to None, no figures will be made
log_level (str or None) – verbosity of logger. Defaults to no logging to mimic a standard function call rather than a standalone package

Return type:

tuple of (obspy.core.event.Event, obspy.core.inventory.Inventory, obspy.core.stream.Stream)

Returns:

returns obspy objects defining data and metadata that have been collected by PySEP

class pysep.RecordSection(pysep_path=None, syn_path=None, stations=None, cmtsolution=None, st=None, st_syn=None, sort_by='default', scale_by=None, time_shift_s=None, zero_pad_s=None, move_out=None, min_period_s=None, max_period_s=None, preprocess=None, max_traces_per_rs=None, integrate=0, xlim_s=None, components='ZRTNE12', y_label_loc='default', y_axis_spacing=1, amplitude_scale_factor=1, azimuth_start_deg=0.0, distance_units='km', tmarks=None, geometric_spreading_factor=0.5, geometric_spreading_k_val=None, geometric_spreading_exclude=None, geometric_spreading_ymax=None, geometric_spreading_save=None, figsize=(9, 11), show=True, save='./record_section.png', overwrite=False, log_level='DEBUG', cartesian=False, synsyn=False, **kwargs)[source]#

Record section plotting tool which takes ObsPy streams and:

preprocesses and filters waveforms,
sorts source-receiver pairs based on User input,
produces record section waveform figures.

Note

Used for reading in Pysep-generated waveforms

Parameters:: pysep_path (str) – path to Pysep output, which is expected to contain trace-wise SAC waveform files which will be read in.

Note

Used for reading in SPECFEM-generated synthetic waveforms

Parameters:

syn_path (str) – full path to directory containing synthetic seismograms that have been outputted by SPECFEM. The synthetics are expected in the format: ‘??..*.sem?*’, which generally matches SPECFEM files like ‘NZ.BFZ.BXE.semd’
stations (str) – full path to STATIONS file used to define the station coordinates. Format is dictated by SPECFEM
cmtsolution (str) – required for synthetics, full path to SPECFEM source file, which was used to generate SPECFEM synthetics. Example filenames are CMTSOLUTION, FORCESOLUTION, SOURCE.
cartesian (bool) – lets RecSec know that the domain is set in Cartesian coordinates (SPECFEM2D or SPECFEM3D_CARTESIAN in UTM), such that data/metadata reading will need to adjust acoordingly because the ObsPy tools used to read metadata will fail for domains defined in XYZ
synsyn (bool) – flag to let RecSec know that we are plotting two sets of synthetic seismograms. Such that both pysep_path and syn_path will be both attempt to read in synthetic data. Both sets of synthetics MUST share the same cmtsolution and stations metadata

Note

Used for defining user-input waveforms data

Parameters:

st (obspy.core.stream.Stream) – Stream objects containing observed time series to be plotted on the record section. Can contain any number of traces
st_syn (obspy.core.stream.Stream) – Stream objects containing synthetic time series to be plotted on the record section. Must be same length as st

Note

Waveform plotting organization parameters

Parameters:

sort_by (str) –
How to sort the Y-axis of the record section, available:
- ’default’: Don’t sort, just iterate directly through Stream
- ’alphabetical’: sort alphabetically A->Z. Components sorted
  separately with parameter components
- ’azimuth’: sort by source-receiver azimuth (deg) with constant
  vertical spacing on the y-axis. Requires azimuth_start_deg
- ’backazimuth’: sort by source-receiver backazimuth (deg) with
  constant vertical spacing. Requires azimuth_start_deg
- ’distance’: sort by source-receiver distance (km) with constant
  
  vertical spacing. Smallest distances at the top of the figure.
  Requires distance_units
- ’abs_distance’: absolute vertical spacing of waveforms defined by
  source-receiver distance. Smallest distance at the top of the figure. Requires distance_units
- ’abs_azimuth’: absolute vertical spacing of waveforms defined
  by source-receiver azimuth (deg).
- ’abs_backazimuth’: absolute vertical spacing of waveforms by
  source-receiver backazimuth (deg).
- ’*_r’: Add a ‘_r’ to any of the values about to REVERSE the sort,
  e.g., ‘alphabetical_r’ sort will go Z->A
scale_by (str) –
scale amplitude of waveforms by available:
- None: Not set, no amplitude scaling, waveforms shown raw
- ’normalize’: scale each trace by the maximum amplitude,
  i.e., > a /= max(abs(a)) # where ‘a’ is time series amplitudes
- ’global_norm’: scale by the largest amplitude to be displayed on
  the screen. Will not consider waveforms which have been excluded on other basis (e.g., wrong component) i.e., > st[i].max /= max([max(abs(tr.data)) for tr in st])
- ’geometric_spreading’: scale amplitudes by expected reduction
  through geometric spreading. Related parameters are:
  - geometric_spreading_factor
  - geometric_spreading_k_val
  - geometric_spreading_exclude
  - geometric_spreading_ymax A
  Equation is A(d) = k / sin(d) ** f
  
  Where A(d) is the amplitude reduction factor as a function of distnace, d. ‘k’ is the geometric_spreading_k_val and ‘f’ is the geometric_spreading_factor. ‘k’ is calculated automatically if not given.
time_shift_s (float OR list of float OR str) –
apply static time shift to waveforms, two options:
1. float (e.g., -10.2), will shift ALL waveforms by
  that number (i.e., -10.2 second time shift applied)
2. list (e.g., [5., -2., … 11.2]), will apply individual time
  shifts to EACH trace in the stream. The length of this list MUST match the number of traces in your input stream.
3. str: apply time shift based on a theoretical TauP phase arrival
  if available in the SAC header. These should have been appended by PySEP during data download. If no value is available in the SAC header, defaults to 0. This may have unintended consequences so you should manually check that all values are okay. Available options are: - ‘first_arrival_time’: shift based on earliest phase arrival - ‘p_arrival_time’: shift based on earliest P phase arrival - ‘s_arrival_time’: shift based on earliest S phase arrival
zero_pad_s (list) –
zero pad data in units of seconsd. applied after tapering and before filtering. Input as a tuple of floats,
- (start, end): a list of floats will zero-pad the START and END
  of the trace. Either can be 0 to ignore zero-padding at either end
amplitude_scale_factor (float OR list of float) –
apply scale factor to all amplitudes. Used as a dial to adjust amplitudes manually. Defaults to 1. Two options:
1. float (e.g., 1.2), will multiply ALL waveforms by that number
2. list (e.g., [5., -2., … 11.2]), will apply individual amplitude
  scale to EACH trace in the stream. The length of this list MUST match the number of traces in your input stream.
move_out (float) – Optional. A velocity value that will be used to calculate move out, which will time shift seismograms based on their source receiver distance. This parameter will be ADDED to time_shift_s (both float and list), if it is provided. Should be in units of distance_units/s
azimuth_start_deg (float) – If sorting by azimuth, this defines the azimuthal angle for the waveform at the top of the figure. Set to 0 for default behavior
distance_units (str) – Y-axis units when sorting by epicentral distance ‘km’: kilometers on the sphere ‘deg’: degrees on the sphere ‘km_utm’: kilometers on flat plane, UTM coordinate system

Note

Data processing parameters

Parameters:

preprocess (str) –
choose which data to preprocess, options are:
- None: do not run preprocessing step, including filter (Default)
- ’st’: process waveforms in st
- ’st_syn’: process waveforms in st_syn. st still must be given
- ’both’: process waveforms in both st and st_syn
min_period_s (float) – minimum filter period in seconds
max_period_s (float) – maximum filter period in seconds
components (str) – a sequence of strings representing acceptable components from the data. Also determines the order these are shown EVEN when sorted by other variables. For example, components==’ZR’ would only display Z and R components, and Z components would be should BEFORE R components for the SAME station.
integrate (int) –
apply integration integrate times on all traces. acceptable values [-inf, inf], where positive values are integration and negative values are differentiation

e.g., if integrate == 2, will integrate each trace twice. or if integrate == -1, will differentiate once or if integrate == 0, do nothing (default)

Note

Geometric spreading parameters, used for amplitude scaling

Parameters:

geometric_spreading_factor (float) – factor to scale amplitudes by predicting the expected geometric spreading amplitude reduction and correcting for this factor. Related optional parameter: geometric_spreading_k_val. For Rayleigh waves, geometric_spreading_factor == 0.5 (default)
geometric_spreading_k_val (float) – Optional constant scaling value used to scale the geometric spreading factor equation. If not given, calculated automatically using max amplitudes Value should be between 0.5 and 1.0 for regional surface waves.
geometric_spreading_exclude (list) – a list of station names that should match the input stations. Used to exclude stations from the automatic caluclation of the geometric
geometric_spreading_ymax (float) – Optional value for geometric spreading plot. Sets the max y-value on the plot. If not set, defaults to whatever the peak y-value plotted is.
geometric_spreading_save (str) – file id to save separate geometric spreading scatter plot iff `scale_by`==’geometric_spreading’. If NoneType, will not save. By default, turned OFF

Note

Figure generation control parameters

Parameters:

max_traces_per_rs (int) – maximum number of traces to show on a single record section plot. Defaults to all traces in the Stream
xlim_s (list of float) – [start, stop] in units of time, seconds, to set the xlimits of the figure
y_axis_spacing (float) – spacing between adjacent seismograms applied to Y-axis on relative (not absolute) scales. Defaults to 1.
y_label_loc (str) –
Location to place waveform labels on the y-axis
- ’default’: auto choose the best location based on sort_by
- ’y_axis’: Replace tick labels on the y-axis (left side of figure),
  This won’t work if using absolute sorting and will be over- written by ‘default’
- ’y_axis_abs’: For absolute y-axis only. waveform labels plotted
  on the left side outside border, with y-axis labels overlapping (showing distance or azimuth)
- ’y_axis_abs_right’: For absolute y-axis only. waveform labels
  plotted on the right side outside border, with y-axis labels on the left side of the figure (showing distance or azimuth)
- ’y_axis_right’: Replace tick labels on the right side of the
  y-axis. This option won’t work with absolute sorting
- ’x_min’: Place labels on top of the waveforms at the global min
  x-value on the figure
- ’x_max’: Place labels on top of the waveforms at the global max
  x-value on the figure
- None: Don’t plot any text labels
tmarks (list of float) – place vertical lines at given reference times. Used for marking reference times such as the event origin, or phase arrival. Input as a list of times in units of seconds (where T=0 is the event origin time). For example `tmarks`=[0, 100, 200] would set vertical lines at 0s, 100s and 200s
figsize (tuple of float) – size the of the figure, passed into plt.subplots()
show (bool) – show the figure as a graphical output
save (str) – path to save output figure, will create the parent directory if it doesn’t exist. If None, will not save (default).

Note

Internal RecSec parameters

Parameters:

overwrite (bool) – if the path defined by save exists, will overwrite the existing figure
log_level (str) – level of the internal logger. In order of ascending verbosity: ‘CRITICAL’, ‘WARNING’, ‘INFO’, ‘DEBUG’.

Raises:

AssertionError – if any parameters are set incorrectly

_generate_synthetic_stream(syn_path, source, stations, cartesian=False, fmt='*.*.*.sem?*')[source]#

Convenience fucntion to read in synthetic seismograms from SPECFEM2D, SPECFEM3D or SPECFEM3D_GLOBE. Can be used to read in both st and st_syn

Parameters:

syn_path (str) – full path to directory containing synthetic seismograms
source (str) – path to source file which defined the source that generated the synthetics. Acceptable values are CMTSOLUTION (from SPECFEM3D/GLOBE), and SOURCE (from SPECFEM2D)
stations (str) – full path to STATIONS file used to define the station coordinates. Format is dictated by SPECFEM
fmt (str) – the expected filename format of the sythetics. Based on ASCII style files generated by SPECFEM. Defaults to ‘??..*.sem?*’. Expected filename looks something like: ‘NN.SSS.BXC.semd’

Return type:

obspy.core.stream.Stream

Returns:

Stream object with synthetic waveforms

check_parameters()[source]#

Check that parameters are set properly and in line with how they are expected by the program

Note

Not using assertions here because we want all incorrect parameters to be evaluated together and displayed at once, that way the user doesn’t have to run this function multiple times to figure out how to set their parameters correctly

Raises:: AssertionError – If any parameters are not set as expected by plotw_rs functionality

get_skip_idx()[source]#

Get a list of any traces that don’t adhere to user-defined boundaries such as dist, az, baz, id, or component matches. Don’t actually remove the traces from the stream but rather just collect indices we can use to skip when plotting.

TODO add distance, azi and backazi skip criteria

Return type:: np.array
Returns:: returns an indexing list which can be used to skip over traces that don’t adhere to certain criteria

get_parameters()[source]#

Calculate parameters in a specific order and based on the user-defined information.

Note

The order of function calls here is important! Some of the ‘get’ functions require the results of other ‘get’ functions.

Calculated Parameters

np.array idx:
    a linear indexing of all the traces in the stream
np.array station_ids:
    an ordered list of station ids, used to get station names
    that match the index defined in `idx`
np.array max_amplitudes:
    abs max amplitudes of each trace, used for normalization
np.array amplitude_scaling:
    An array to scale amplitudes based on user choices
np.array time_shift_s:
    An array to time shift time series based on user choices
np.array y_axis:
    Y-Axis values based on sorting algorithm, used for plotting
np.array distances:
    source-receiver distances in `distance_units` units
np.array azimuths:
    source-receiver azimuths in degrees
np.array backazimuths:
    source-receiver backazimuths in degrees
np.array sorted_idx:
    sorted indexing on all the traces of the stream based on the
    chosen sorting algorithm

get_xlims(st=None)[source]#

The x-limits of each trace depend on the overall time shift (either static or applied through move out), as well as the sampling rate of each trace (which can vary). Retrieve an index-dependent list of x-limits which can be used to truncate the time series during plotting.

Note

Requires that get_time_shifts() has already been run

Parameters:: st (obspy.core.stream.Stream) – stream object to get xlims for. By default this is the ‘data’ stored in st but it can also be given st_syn to get synthetic x limits which may differ
Return type:: np.array
Returns:: an array of tuples defining the start and stop indices for EACH trace to be used during plotting. Already includes time shift information so xlim can be applied DIRECTLY to the time shifted data

get_srcrcv_stats()[source]#

Get source receiver information such as min max values, and count-related numbers (e.g., num stations) to be used mainly for print statements and text information

Stats Arguments

np.array event_names:
    unique event names taken from the SAC header
int nevents:
    number of unique events in the stream
np.array unique_sta_ids:
    unique station codes taken from trace stats
int nstation_ids:
    number of unique station codes
np.array network_codes:
    unique network codes taken from station ids
int nnetwork:
    number of unique network codes
np.array station_codes:
    unique station codes taken from station ids
int nstation:
    number of unique station codes
np.array location_codes:
    unique location codes taken from station ids
int nlocation:
    number of unique location codes
np.array channel_codes:
    unique channel codes taken from station ids
int nchannel:
    number of unique channel codes
bool reverse_sort:
    determine if the user wants to reverse their sort, they do this
    by appending '_r' to the end of the `sort_by` argument

get_time_offsets()[source]#: Find time shift to data constituting difference between trace start time and event origin time. Both synthetics and data should have the correct timing. Time offsets will be used during plotting to make sure all traces have the same T=0

Note

Appends time offset directly to the stats header, overwriting any value that might have been there already

get_time_shifts()[source]#

Very simple function which allows float inputs for time shifts and ensures that time shifts are always per-trace arrays Applies the move out by calculating a time shift using src-rcv distance

Return type:: np.array
Returns:: a stream-lengthed array of time shifts that can be applied per trace

get_srcrcv_dist_az_baz()[source]#

Convenience function to wrap _get_srcrcv_dist_az_baz_trace into a loop over the whole stream and return lists of distances, azimuths, and backazimuths

Rtype distances:: np.array
Return distances:: source-receiver distances in user-defined units in the original order of Stream
Rtype azimuths:: np.array
Return azimuths:: source-receiver azimuths (deg) in the original order of Stream
Rtype backazimuths:: np.array
Return backazimuths:: source-receiver azimuths (deg) in the original order of Stream

_get_srcrcv_dist_az_baz_trace(tr=None, idx=0)[source]#

Check the source-receiver characteristics such as src-rcv distance, azimuth, backazimuth for a given trace.

Note

This function ASSUMES that SAC headers have been written to the traces. Otherwise we will need more complicated ways to get event lat and lon

Parameters:

tr (obspy.core.trace.Trace) – trace to get srcrcv information for. If None, will use idx
idx (int) – if `tr`==None, user can provide index of self.st (Stream) defaults to index 0

Rtype gcdist:

float

Return gcdist:

great circle distance in units specified by distance_units, can be ‘km’ or ‘deg’

Rtype az:

float

Return az:

azimuth (degrees) between source and receiver

Rtype baz:

float

Return baz:

azimuth (degrees) between source and receiver

get_amplitude_scaling(_choice='st')[source]#

Scale the amplitudes of all the waveforms by producing a Stream dependent scale factor based on user choice. It is expected that the output array will be DIVIDED by the data arrays:

i.e., st[i].data /= self.amplitude_scaling[i]

Note

Needs to be run AFTER preprocessing because filtering etc. will affect the final amplitudes of the waveforms

Parameters:: _choice (str) – Internal choice only required for ‘normalize’ scaling. st for amplitude scaling w.r.t data array, st_syn for amplitude scaling w.r.t synthetics
Return type:: np.array
Returns:: an array corresponding to the Stream indexes which provides a per-trace scaling coefficient

calculate_geometric_spreading()[source]#

Stations with larger source-receiver distances will have their amplitude scaled by a larger value.

For information on geometric spreading, see Stein and Wysession, Section 4.3.4, Eq 20 (for Rayleigh waves, geometrical spreading factor = 0.5). For our purposes, this will fold the attenuation factor into the same single factor that accounts for geometric spreading.

Equation is for amplitude reduction ‘A’ as a factor of distance ‘d’:

A(d) = k / sin(d) ** f

where ‘k’ is the geometric_spreading_k_val and ‘f’ is the geometric_spreading_factor. ‘k’ is calculated automatically if not given by the User. In the code, A(d) is represented by w_vector

Note

This does not take into account the variation in amplitude from focal mechanism regions

TODO

Look in Stein and Wysession and figure out vector names
Allow ignoring specific components, currently only allowed to
exclude on a per-station basis

Return type:: list
Returns:: scale factor per trace in the stream based on theoretical geometrical spreading factor. This is meant to be MULTIPLIED by the data arrays

get_sorted_idx()[source]#

Sort the source-receiver pairs by the chosen parameters. Except for in default sorting, always maintains component ordering defined by the user, i.e., for the same station, components will be ordered by the component parameter.

Return type:: np.array
Returns:: returns an indexing list which is sorted based on the user defined sort_by argument.

_sort_by_alphabetical()[source]#

Sort by full station name in order. That is, network is sorted, then within network, station is sorted, and so on. Components are sorted last and NOT in alphabetical order. They are ordered based on the user-input components parameter.

Return type:: np.array
Returns:: indexing of networks and stations sorted alphabetically

_get_component_order()[source]#

When we are sorting, we want components to be sorted by the preferred order (i.e, ZRT, Z index is 0, T is 2), which means the components have to be reordered to adhere to the sorting algorithm. ALSO we need to ensure that we honor component order even if everything else is being sorted reverse alphabetically so the components entry needs to be the opposite of stats.reverse_sort

Return type:: list
Returns:: components converted to integer values which can be used for sorting algorithms.

get_y_axis_positions()[source]#

Determine how seismograms are vertically separated on the Y-Axis when plotting. Allows for constant separation, or absolute separation based on distance or (back)azimuth separation.

Return type:: np.array
Returns:: an array of actual y-axis positions that can be passed directly to plt.plot() (or similar)

get_x_axis_tick_values()[source]#: Determine, based on the length of the plotted traces, how often tick marks should be applied so that they are spaced as aesthetically pleasing values.

process_st()[source]#

Preprocess the Stream with optional filtering in place.

Note

Data in memory will be irretrievably altered by running preprocess.

TODO Add feature to allow list-like periods to individually filter: seismograms. At the moment we just apply a blanket filter.

plot(subset=None, page_num=None, **kwargs)[source]#

Plot record sections based on all the available information

Parameters:

subset (list of int) – subset of sorted_idx if there are too many waveforms to plot on one page (set by max_traces_per_rs). e.g., to get the first 10 entries, subset=[0,10]
page_num (int) – If multiple pages are required due to exceeding max_traces_per_rs, page_num will make adjustments to the figure name and title to differentiate different pages of the same record section

_log_relative_information(start=0, stop=None)[source]#

If both st and st_syn are being plotted, then write some relative information about their amplitudes to the main logger.

Related to #116

_plot_trace(idx, y_index, choice='st')[source]#

Plot a single trace on the record section, with amplitude scaling, time shifts, etc. Observed waveforms are black, synthetics are red.

Parameters:

idx (int) – index of the trace to plot and all trace-ordered values like amplitude scaling and time shifts
y_index (int) – numerical order which this trace was plotted, as each trace is plotted on top of the previous one. y_index should be iterated linearly in the loop that calls this function.
choice (str) – choice of ‘st’ or ‘st_syn’ depending on whether you want to plot the observed or synthetic waveforms

_plot_tmarks()[source]#: Plot vertical lines at given reference times based on user input values

_plot_azimuth_bins(start=None, stop=None)[source]#

If plotting by azimuth, create visual bin separators so the user has a visual understanding of radiation patterns etc.

Parameters:

start (int) – optional starting index to determine the bounds of the azimuth bins
stop (int) – optional stop index to determine the bounds of the azimuth bins

_plot_axes(start=0, stop=None)[source]#

Contains the logic in how to handle the x- and y-axis labels, ticks etc.

Logic options for how to plot the y-axis: - Relative: Relative plotting means the yticklabels will get replaced

by station information. This can either be on the right or left side but will be attached to the actual axis

Absolute: Absolute plotting means the y-axis actual represents
something (e.g., distance, azimuth). That means labels can not replace the y-ticks and they have to be placed within the figure

Note

Relative plotting can also place labels OFF the y-axis, at which point the y-axis is turned off because it contains no usable information

Parameters:

start (int) – optional starting index for creating text labels
stop (int) – optional stop index for creating text labels

_set_y_axis_absolute()[source]#: If ‘abs_’ in sort_by, then the Y-axis should be shown in absolute scale. That means we need to format the text labels, add some labelling etc.

_set_y_axis_text_labels(start=0, stop=-1, loc='y_axis')[source]#

Plot a text label next to each trace describing the station, azimuth and source-receiver distance. We need to do this all at once because we have to replace all ticks and tick labels at once.

Note

if using the ‘subset’ option in plot, need to also tell the y-axis plotter that we are only plotting a subset of data by using the start and stop parameters

Parameters:

start (int) – starting index for plotting, default to start 0
stop (int) – stop index for plotting, default to end -1
loc (str) –
location to place the y_axis text labels, available: - y_axis: Place labels along the y-axis (left side of the figure)

Will replace the actual y-tick labels so not applicable for absolute sorting which requries the y-axis labels
- y_axis_abs: for absolute plotting, place waveform labels to the
  left side of the figure (outside border), co-existing with y-axis labels
- y_axis_right: same as y_axis but set on the right side of figure
- x_min: Place labels on the waveforms at the minimum x value
- x_max: Place labels on the waveforms at the maximum x value

_plot_title(nwav=None)[source]#

Create the title of the plot based on event and station information Allow dynamic creation of title based on user input parameters

Parameters:: nwav (int) – if using subset, the title needs to know how many waveforms it’s showing on the page. self.plot() should tell it

run()[source]#: Convenience run function to run the RecordSection workflow in order. No internal logic for breaking the figure up into multiple record sections. For that see main function plotw_rs.

pysep.plotw_rs(*args, **kwargs)[source]#

Plot Waveform Record Section (main call function for RecordSection)

Instantiates the RecordSection class, runs processing and parameter getting, and then plots record section. Contains additional logic for breaking up figures into multiple pages if requested by the user, while keeping sort order and waveform spacing consistent across multiple reord sections.

Note

See RecordSection.__init__() for acceptable args and kwargs

pysep#

Subpackages#

Submodules#

Package Contents#

Classes#

Functions#

Attributes#

`pysep`#