pysep
#
Subpackages#
Submodules#
Package Contents#
Classes#
Download, preprocess, and save waveform data using ObsPy |
|
Record section plotting tool which takes ObsPy streams and: |
Functions#
Attributes#
- class pysep.Pysep(config_file=None, event_selection='default', client='IRIS', origin_time=None, reference_time=None, networks='*', stations='*', locations='*', channels='*', station_ids=None, use_mass_download=False, extra_download_pct=0.005, event_latitude=None, event_longitude=None, event_depth_km=None, event_magnitude=None, remove_response=True, remove_clipped=True, remove_insufficient_length=True, remove_masked_data=True, water_level=60, detrend=True, demean=True, taper_percentage=0.0, rotate=None, pre_filt='default', fill_data_gaps=False, gap_fraction=1.0, mindistance_km=0, maxdistance_km=20000.0, minazimuth=0, maxazimuth=360, minlatitude=None, minlongitude=None, maxlatitude=None, maxlongitude=None, resample_freq=None, scale_factor=1, phase_list=None, seconds_before_event=20, seconds_after_event=20, seconds_before_ref=100, seconds_after_ref=300, taup_model='ak135', output_unit='VEL', user=None, password=None, client_debug=False, log_level='DEBUG', timeout=600, write_files='inv,event,stream,sac,config_file,station_list', plot_files='all', llnl_db_path=None, output_dir=None, overwrite=False, legacy_naming=False, overwrite_event_tag=None, **kwargs)[source]#
Download, preprocess, and save waveform data using ObsPy
Note
Parameters for general data gathering control
- Parameters:
client (str) – ObsPy FDSN client to query data from, e.g., IRIS, LLNL, NCEDC or any FDSN clients accepted by ObsPy. Defaults to ‘IRIS’
minlatitude (float) – for event, station and waveform retrieval. Defines the minimum latitude for a rectangular bounding box that is used to search for data. Only used for events if `event_selection`==’search’
maxlatitude (float) – for event, station and waveform retrieval. Defines the maximum latitude for a rectangular bounding box that is used to search for data. Only used for events if `event_selection`==’search’
minlongitude (float) – for event, station and waveform retrieval. Defines the minimum longitude for a rectangular bounding box that is used to search for data. Only used for events if `event_selection`==’search’
maxlongitude (float) – for event, station and waveform retrieval. Defines the maximum longitude for a rectangular bounding box that is used to search for data. Only used for events if `event_selection`==’search’
user (str) – User ID if IRIS embargoes data behind passwords. This is passed into the instantiation of client.
password (str) – Password if IRIS embargoes data behind passwords. This is passed into the instantiation of ‘client’
use_mass_download (bool) – Use ObsPy’s mass download option to download all available stations in the region regardless of data provider.
client_debug (bool) – turn on DEBUG mode for the ObsPy FDSN client, which outputs information-rich log messages to std out. Use for debugging when FDSN fails mysteriously.
timeout (float) – time out time in units of seconds, passed to the client to determine how long to wait for return data before exiting. Defaults to 600s.
llnl_db_path (str) – If `client`==’LLNL’, PySEP assumes we are accesing data from the LLNL waveeform database (which must be stored local). Points to the path where this is saved.
Note
Event selection parameters
- Parameters:
event_selection (str) – How to define the Event which is used to define the event origin time and hypocentral location. - ‘default’: User defines Event origin_time, and location with event_latitude and event_longitude - ‘search’: PySEP will use client to search for a Catalog event defined by event_origintime, event_magnitude and event_depth_km. Buffer time around the origin_time can be defined by seconds_before_event and seconds_after_event.
origin_time (str) – the event origin time used as a central reference point for data gathering. Must be in a string format that is recognized by ObsPy UTCDateTime. For example ‘2000-01-01T00:00:00’.
event_latitude (float) – latitude of the event in units of degrees. used for defining the event hypocenter and for removing stations based on distance from the event.
event_longitude (float) – longitude of the event in units of degrees. used for defining the event hypocenter and for removing stations based on distance from the event.
event_depth_km (float or NoneType) –
depth of event in units of kilometers. postive values for deeper depths. Used for:
`event_selection`==’search’
estimating phase arrivals with TauP
plotting events and title on source receiver maps
If set to None, (2) and (3) will fail. Best-guesses are acceptable.
event_magnitude (float or NoneType) – event magnitude in Mw used for `event_selection`==’search’ and source receiver map plotting. If provided as None, map plotting will fail.
seconds_before_event (float) – For event selection only, only used if event_selection`==’search’. Time [s] before given `origin_time to search for a matching catalog event from the given client
seconds_after_event (float) – For event selection only, only used if event_selection`==’search’. Time [s] after given `origin_time to search for a matching catalog event from the given client
Note
Waveform and station metadata gathering parameters
- Parameters:
reference_time (str) – Waveform origin time. If not given, defaults to the event origin time. This allows for a static time shift from the event origin time, e.g., if there are timing errors with relation to the origin_time. Defaults to NoneType (origin_time).
seconds_before_ref (float) – For waveform fetching. Defines the time before reference_time to fetch waveform data. Units [s]
seconds_after_ref (float) – For waveform fetching. Defines the time after reference_time to fetch waveform data. Units [s]
extra_download_pct (float) – extra download percentage. Adds a buffer around origin_time + seconds_before_ref + extra_download_pct (also -seconds_after_ref), which gathers a bit of extra data which will be trimmed away. Used because gathering data directly at the requested time limits may lead to shorter expected waveforms after resampling or preprocessing procedures. Given as a percent [0,1], defaults to .5%.
networks (str) – name or names of networks to query for, if names plural, must be a comma-separated list, i.e., ‘AK,AT,AV’. Wildcards okay, defaults to ‘*’.
stations (str) – station name or names to query for. If multiple stations, input as a list of comma-separated values, e.g., ‘STA01,STA02,STA03’. Wildcards acceptable, if using wildcards, use a ‘-’ to exclude stations (e.g., ‘,-STA01’ will gather all stations available, except STA01. Defaults to ‘’
locations (str) – locations name or names to query for, wildcard okay. See stations for inputting multiple location values. Default ‘*’.
channels (str) – channel name or names to query for, wildcard okay. If multiple stations, input as a list of comma-separated values, e.g., ‘HH?,BH?’. Wildcards acceptable. Defaults to ‘*’.
station_ids (list of str) – an alternative to gathering based on individual codes, allow the user to input a direct list of trace IDs which will be broken up and used to gather waveforms and metadata. NOTE: OVERRIDES network, stations, locations, and channels, these parameters will NOT be used. Station ids should be provided as: [‘NN.SSS.LL.CCC’, …]
Note
Station removal and curtailing parameters
- Parameters:
mindistance_km (float) –
Used for removing stations and mass download option
Removing stations: Remove any stations who are closer than the
given minimum distance away from event (units: km). Always applied - Mass Download: If use_mass_download is True and `domain_type`==’circular’, defines the minimum radius around the event hypocenter to gather waveform data and station metadata
maxdistance_km (float) –
Used for removing stations and mass download option
Removing stations: Remove any stations who are farther than the
given maximum distance away from event (units: km). Always applied - Mass Download: If use_mass_download is True and `domain_type`==’circular’, defines the maximum radius around the event hypocenter to gather waveform data and station metadata
minazimuth (float) – for station removal. stations whose azimuth relative to the event hypocenter that do not fall within the bounds [minazimuth, maxazimuth] are removed from the final list. Defaults to 0 degrees.
minazimuth – for station removal. stations whose azimuth relative to the event hypocenter that do not fall within the bounds [minazimuth, maxazimuth] are removed from the final list. Defaults to 360 degrees.
remove_clipped (bool) – remove any clipped stations from gathered stations. Checks the max amplitude of against a maximum value expected for a 24 bit signal. Defaults False
remove_insufficient_length (bool) – remove waveforms whose trace length does not match the average (mode) trace length in the stream. Defaults to True
remove_masked_data (bool) – If fill_data_gaps is False or None, data with gaps that go through the merge process will contain masked arrays (essentially retaining gaps). By default, PySEP will remove these data during processing. To keep this data, set remove_masked_data == True.
fill_data_gaps (str or int or float or bool) –
How to deal with data gaps (missing sections of waveform over a continuous time span). False by default, which means data with gaps are removed completely. Users who want access to data with gaps must choose how gaps are filled. See API for ObsPy.core.stream.Stream.merge() for how merge is handled:
Options include:
’mean’: fill with the mean of all data values in the gappy data
<int or float>: fill with a constant, user-defined value, e.g.,
0 or 1.23 or 9.999 - ‘interpolate’: linearly interpolate from the last value pre-gap to the first value post-gap - ‘latest’: fill with the last value of pre-gap data - False: do not fill data gaps, which will lead to stations w/ data gaps being removed.
NOTE: Be careful about data types, as there are no checks that the fill value matches the internal data types. This may cause unexpected errors.
gap_fraction (float) – if fill_data_gaps is not None, determines the maximum allowable fraction (percentage) of data that gaps can comprise. For example, a value of 0.3 means that 30% of the data (in samples) can be gaps that will be filled by fill_data_gaps. Traces with gap fractions that exceed this value will be removed. Defaults to 1. (100%) of data can be gaps.
Note
Data processing parameters
- Parameters:
detrend (bool) – apply simple linear detrend as the first preprocessing step
demean (bool) – apply demeaning to data during instrument reseponse removal. Only applied if remove_response == True.
taper_percentage (float) – apply a taper to the waveform with ObsPy taper, fraction between 0 and 1 as the percentage of the waveform to be tapered Applied generally used when data is noisy, e.g., HutchisonGhosh2016 Note: To get the same results as the default taper in SAC, use max_percentage=0.05 and leave type as hann. Tapering also happens while resampling (see util_write_cap.py). Only applied if remove_response == True.
rotate (list of str or NoneType) –
choose how to rotate the waveform data. pre-rotation processing will be applied. Can include the following options (order insensitive):
ZNE: Rotate from arbitrary components to North, East, Up
RTZ: Rotate from ZNE to Radial, Transverse, Up
UVW: Rotate from ZNE to orthogonal UVW orientation
If set to None, no rotation processing will take place.
resample_freq (float) – frequency to resample data in units Hz. If not given, no data resampling will take place. Defaults to NoneType
scale_factor (float) – scale all data by a constant factor Note: for CAP use 10**2 (to convert m/s to cm/s). Defaults to NoneType (no scaling applied)
Note
Instrument response removal parameters
- Parameters:
remove_response (bool) – remove instrument response using station response information gathered from client. Defaults to True.
output_unit (str) – the output format of the waveforms if instrument response removal is applied. Only relevant if `remove_response`==True. See ObsPy.core.trace.Trace.remove_response for acceptable values. Typical values are: ‘DISP’, ‘VEL’, ‘ACC’ (displacement [m], velocity [m/s], acceleration [m/s^2]).
water_level (float or None) – a water level threshold to apply during filtering for small values. Passed to Obspy.core.trace.Trace.remove_response
pre_filt (str, tuple or NoneType) –
apply a pre-filter to the waveforms before deconvolving instrument response. Options are:
’default’: automatically calculate (f0, f1, f2, f3) based on the
length of the waveform (dictating longest allowable period) and the sampling rate (dictating shortest allowable period). This is the default behavior. * NoneType: do not apply any pre-filtering * tuple of float: (f0, f1, f2, f3) define the corners of your pre filter in units of frequency (Hz)
Note
SAC header control parameters
- Parameters:
phase_list (list of str) – phase names to get ray information from TauP with. Defaults to ‘ttall’, which is ObsPy’s default for getting all phase arrivals. Must match Phases expected by TauP (see ObsPy TauP documentation for acceptable phases). Earliest P and S phase arrivals will be added to SAC headers, the remainder will be discarded.
taup_model (str) – name of TauP model to use to calculate phase arrivals See also phase_list which defines phases to grab arrival data for. Defaults to ‘AK135’. See ObsPy TauP documentation for avilable models.
Note
PySEP Configuration parameters
- Parameters:
config_file (str) – path to YAML configuration file which will be used to overwrite internal default parameters. Used for command-line version of PySEP
log_level (str) – Level of verbosity for the internal PySEP logger. In decreasing order of verbosity: ‘DEBUG’, ‘INFO’, ‘WARNING’, ‘CRITICAL’
legacy_naming (bool) – if True, revert to old PySEP naming schema for event tags, which is responsible for naming the output directory and SAC files. Legacy filenames look something like ‘20000101000000.NN.SSS.LL.CC.c’ (event origin time, network, station, location, channel, component). Default to False
overwrite_event_tag (str or bool) –
option to allow the user to set their own event tag, rather than the automatically generated one.
- NoneType (default): use automatically generated event tag which
consists of event origin time and Flinn-Engdahl region
- ’’: empty string will dump ALL files into output_dir, no new
directories will be made
- str: User-defined event tag which will be created in output_dir,
all files will be stored in {output_dir}/{overwrite_event_tag}/*
Note
Output file and figure control
- Parameters:
write_files (str or NoneType) –
Which files to write out after data gathering.
User-defined comma-separated list of the following
weights_az: write out CAP weight file sorted by azimuth
weights_dist: write out CAP weight file sorted by distance
weights_code: write out CAP weight file sorted by station code
station_list: write out a text file with station information
inv: save a StationXML (.xml) file (ObsPy inventory)
event: save a QuakeML (.xml) file (ObsPy Catalog)
stream: save an ObsPy stream in Mseed (.ms) (ObsPy Stream)
config_file: save YAML config file w/ all input parameters
sac: save all waveforms as SAC (.sac) files w/ correct headers
- sac_raw: save raw waveforms. these are straight from the
data center with no quality check and no SAC headers
sac_zne: save only ZNE channel SAC files
sac_rtz: save only RTZ channel SAC files
sac_uvw: save only UVW channel SAC files
Example input: `write_files`==’inv,event,stream,sac’ By Default: ‘inv,event,stream,sac,config_file,station_list’ 2) If NoneType or an empty string, no files will be written. 3) If ‘all’, write all files listed in (1)
write_files –
What to plot after data gathering. Should be a comma-separated list of the following:
map: plot a source-receiver map with event and all stations
record_section: plot a record section with default parameters
all: plot all of the above (default value)
If None, no files will be plotted.
output_dir (str) – path to output directory where all the files and figures defined by write_files and plot_files will be stored. Defaults to the current working directory.
overwrite (bool) – If True, overwrite an existing PySEP event directory. This prevents Users from re-downloading data. Defaults to False.
- get_client()[source]#
Options to choose different Clients based on attribute client which will be used to gather waveforms and metadata
- Return type:
obspy.clients.fdsn.client.Client
- Returns:
Client used to gather waveforms and metadata
- load(config_file=None, overwrite_event=True)[source]#
Overwrite default parameters using a YAML config file
- Parameters:
config_file (str) – YAML configuration file to load from
overwrite_event (bool) – overwrite event search parameters (origin time, lat, lon etc.) from the YAML config file. Defaults to True
- get_event()[source]#
Exposed API for grabbing event metadata depending on the event_selection choice.
- Options for event_selection are:
‘search’: query FDSN with event parameters ‘default’: create an event from scratch using user parameters
or if ‘client’==’LLNL’, grab event from internal database
- Return type:
obspy.core.event.Event
- Returns:
Matching event given event criteria. If multiple events are returned with the query, returns the first in the catalog
- _query_event_from_client(magnitude_buffer=0.1, depth_buffer_km=1.0)[source]#
Retrieve an event catalog using ObsPy Client.get_events(). Searches Client for a given origin time, location, depth (optional) and magnitude (optional).
To use this, set attribute `event_selection`==’search’
- Parameters:
magnitude_buffer (float) – if attribute event_magnitude is given, will search events for events with magnitude: event_magnitude +/- magnitude_buffer
depth_buffer_km (float) – if attribute event_depth_km is given, will search events for events with depth: event_depth_km +/- depth_buffer_km
- Return type:
obspy.core.event.Event
- Returns:
Matching event given event criteria. If multiple events are returned with the query, returns the first in the catalog
- _create_event_from_scratch()[source]#
Make a barebones event object based on user-defined parameters which will then be used to query for waveforms and StationXML data
- Return type:
obspy.core.event.Event
- Returns:
Event object with origin and magnitude information appended
- _get_event_from_llnl_catalog()[source]#
Special getter function for Lawrence Livermore National Lab data LLNL database has a special client
TODO Do we need more filtering in the catalog?
- Return type:
obspy.core.event.Event
- Returns:
Event information queried from LLNL database
- get_stations()[source]#
Exposed API for grabbing station metadata from client. Download station metadata using ObsPy get_stations() with a user-defined bounding box and for user-defined networks, stations etc.
- Return type:
obspy.core.inventory.Inventory
- Returns:
Station metadata queried from Client
- get_waveforms()[source]#
Exposed API for grabbing waveforms from client. Internal logic determines how waveforms are queried, but mainly it is controlled by the internal inv attribute detailing station information, and reference times for start and end times.
Note
We do not use the minimumlength variable so that we can figure out which stations have data gaps
- Return type:
obspy.core.stream.Stream
- Returns:
Stream of channel-separated waveforms
- _bulk_query_waveforms_from_client()[source]#
Make a bulk request query to the Client based on the internal inv attribute defining the available station metadata.
- Return type:
obspy.core.stream.Stream
- Returns:
Stream of channel-separated waveforms
- mass_download()[source]#
Use ObsPy Mass downloader to grab events from a pre-determined region
- Keyword Arguments:
domain_type (str) –
How to define the search region domain - rectangular: rectangular bounding box defined by min/max
latitude/longitude
circular: circular bounding circle defined by the events latitude and longitude, with radii defined by mindistance_km and maxdistance_km
delete_tmpdir (bool) – Remove the temporary directories that store the MSEED and StationXML files which were downloaded by the mass downloader. Saves space but also if anything fails prior to saving data, the downloaded data will not be saved. Defaults to True.
- curtail_stations()[source]#
Remove stations from inv based on station distance, azimuth, etc.
Note
One-function function currently, but we can expand curtailing here if need by
- Return type:
obspy.core.inventory.Inventory
- Returns:
station metadata that has been curtailed based on acceptable paramaters
- preprocess()[source]#
Very simple preprocessing to remove response and apply a prefilter scale waveforms (if necessary) and clean up waveform time series
- Return type:
obspy.core.stream.Stream
- Returns:
a preprocessed stream with response removed, amplitude scaled (optional), and time series standardized
- _remove_response_llnl(st)[source]#
Remove response information from LLNL stations. This requires using the custom LLNL DB client. There are also some internal checks that need to be bypassed else they cause the program to crash
- rotate_streams()[source]#
Rotate arbitrary three-component seismograms to desired orientation ‘ZNE’, ‘RTZ’ or ‘UVW’.
Warning
This function combines all traces, both rotated and non-rotated components (ZNE, RTZ, UVW, but not raw, e.g., 12Z), into a single stream. This is deemed okay because we don’t do any component-specific operations after rotation.
- Return type:
obspy.core.stream.Stream
- Returns:
a stream that has been rotated to desired coordinate system with SAC headers that have been adjusted for the rotation, as well as non-rotated streams which are saved incase user needs access to other components
- write(write_files=None, _return_filenames=False, _subset=None, **kwargs)[source]#
Write out various files specifying information about the collected stations and waveforms.
Options are:
config_file: write the current configuration as a YAML file
station_list: write a text file with station information
inv: write the inventory as a StationXML file
event: write the event as a QuakeML file
stream: write the stream as a single MSEED file
- sac_zne: write the stream as individual (per-channel) SAC files
for ZNE components with the appropriate SAC header
sac_rtz: write out per-channel SAC files for RTZ components
sac_uvw: write out per-channel SAC files for UVW components
weights_dist: write out CAP ‘weights.dat’ file sorted by distance
weights_az: write out CAP ‘weights.dat’ file sorted by azimuth
weights_code: write out CAP ‘weights.dat’ file sorted by sta code
- Parameters:
write_files (list of str) – list of files that should be written out, must match the acceptable list defined in the function or here in the docstring. If not given, defaults to internal list of files
_return_filenames (bool) – internal flag to not actually write anything but just return a list of acceptable filenames. This keeps all the file naming definitions in one function. This is only required by the check() function.
_subset (list) – internal parameter used for intermediate file saving. PySEP will attempt to save files once they have been collected however if the files it tries to save do not match against the User-defined file list, they will be ignored.
- Keyword Arguments:
order_station_list_by (str) – how to order the station list available options are: network, station, latitude, longitude, elevation, burial.
config_fid (str) – optional name for the configuration file name defaults to ‘pysep_config.yaml’
station_fid (str) – optional name for the stations list file name defaults to ‘station_list.txt’
inv_fid (str) – optional name for saved ObsPy inventory object, defaults to ‘inv.xml’
event_fid (str) – optional name for saved ObsPy Event object, defaults to ‘event.xml’
stream_fid (str) – optional name for saved ObsPy Stream miniseed object, defaults to ‘stream.ms’
sac_subdir (str) – sub-directory within output directory and event directory to save SAC files. Defaults to SAC/. Use an empty string to dump files directly into the event directory
- _write_sac(st, output_dir=os.getcwd(), components=None)[source]#
Write SAC files with a specific naming schema, which allows for both legacy (old PySEP) or non-legacy (new PySEP) naming.
- Parameters:
st (obspy.core.stream.Stream) – Stream to be written
output_dir (str) – where to save the SAC files, defaults to the current working directory
components (str) – acceptable component values for saving files, allows only saving subsets of the Stream. Example ‘RTZNE’ or just ‘R’. Must match against Trace.stats.component
- write_config(fid=None, overwrite=False)[source]#
Write a YAML config file based on the internal Pysep attributes. Remove a few internal attributes (those containing data) before writing and also change types on a few to keep the output file simple but also re-usable for repeat queries.
- Parameters:
fid (str) – name of the file to write. defaults to config.yaml
overwrite (bool) – if True and fid already exists, save a new config file with the same name, overwriting the old file. if False (default), throws a warning if encountering existing fid and does not write config file
- plot()[source]#
Plot map and record section if requested. Allow general error catching for mapping and record section plotting because we don’t want these auxiliary steps to crash the entire workflow since they are not critical.
- _event_tag_and_output_dir()[source]#
Convenience function to establish and naming schema for files and directories. Also takes care of making empty directories.
- Return type:
tuple of str
- Returns:
(unique event tag, path to output directory)
- _set_log_file(mode)[source]#
Write logger to file as well as stdout, with the same format as the stdout logger. Need mode==1 to move the log file after everything is done because we don’t know the event tag prior to starting the logs
- Parameters:
mode (int) – Two options for using this function 0: set the logger to a temporary file ‘pysep.log’, 1: move the logger from the temporary file into final output dir
- run(event=None, inv=None, st=None, **kwargs)[source]#
Run PySEP: Seismogram Extraction and Processing. Steps in order are:
Set default parameters or load from config file
Check parameter validity, exit if unexpected values
Get data and metadata (QuakeML, StationXML, waveforms)
Remove unacceptable stations based on user-defined criteria
Remove unacceptable waveforms based on user-defined criteria
Generate some new metadata for tagging and output
Pre-process waveforms and standardize for general use
Generate output files and figures as end-product
- Parameters:
event (obspy.core.event.Event) – optional user-provided event object which will force a skip over QuakeML/event searching
inv (obspy.core.inventory.Inventory) – optional user-provided inventory object which will force a skip over StationXML/inventory searching
st (obspy.core.stream.Stream) – optional user-provided strean object which will force a skip over waveform searching
- pysep.get_data(config_file=None, event=None, inv=None, st=None, write_files=None, plot_files=None, log_level=None, *args, **kwargs)[source]#
Interactive/scripting function to run PySep and return quality controlled, SAC-headed stream object which can then be used for other processes.
Note
By default turns file writing and plotting OFF so that this function acts solely as a data collection/processing call.
Note
args and kwargs are passed directly to Pysep.__init__() so you can define all your parameters in this call, or through a config file
>>> from pysep import get_data >>> st = get_data(config_file=’config.yaml’)
- Parameters:
config_file (str) – path to YAML config file which will overload any default configs
event (obspy.core.event.Event) – optional user-provided event object which will force a skip over QuakeML/event searching
inv (obspy.core.inventory.Inventory) – optional user-provided inventory object which will force a skip over StationXML/inventory searching
st (obspy.core.stream.Stream) – optional user-provided strean object which will force a skip over waveform searching
write_files (list or None) – list of files to write, acceptable options defined in write(). Defaults to None, no files will be written
plot_files (list or None) – list of files to plot, acceptable options defined in plot(). Defaults to None, no figures will be made
log_level (str or None) – verbosity of logger. Defaults to no logging to mimic a standard function call rather than a standalone package
- Return type:
tuple of (obspy.core.event.Event, obspy.core.inventory.Inventory, obspy.core.stream.Stream)
- Returns:
returns obspy objects defining data and metadata that have been collected by PySEP
- class pysep.RecordSection(pysep_path=None, syn_path=None, stations=None, cmtsolution=None, st=None, st_syn=None, sort_by='default', scale_by=None, time_shift_s=None, zero_pad_s=None, move_out=None, min_period_s=None, max_period_s=None, preprocess=None, max_traces_per_rs=None, integrate=0, xlim_s=None, components='ZRTNE12', y_label_loc='default', y_axis_spacing=1, amplitude_scale_factor=1, azimuth_start_deg=0.0, distance_units='km', tmarks=None, geometric_spreading_factor=0.5, geometric_spreading_k_val=None, geometric_spreading_exclude=None, geometric_spreading_ymax=None, geometric_spreading_save=None, figsize=(9, 11), show=True, save='./record_section.png', overwrite=False, log_level='DEBUG', cartesian=False, synsyn=False, **kwargs)[source]#
Record section plotting tool which takes ObsPy streams and:
preprocesses and filters waveforms,
sorts source-receiver pairs based on User input,
produces record section waveform figures.
Note
Used for reading in Pysep-generated waveforms
- Parameters:
pysep_path (str) – path to Pysep output, which is expected to contain trace-wise SAC waveform files which will be read in.
Note
Used for reading in SPECFEM-generated synthetic waveforms
- Parameters:
syn_path (str) – full path to directory containing synthetic seismograms that have been outputted by SPECFEM. The synthetics are expected in the format: ‘??..*.sem?*’, which generally matches SPECFEM files like ‘NZ.BFZ.BXE.semd’
stations (str) – full path to STATIONS file used to define the station coordinates. Format is dictated by SPECFEM
cmtsolution (str) – required for synthetics, full path to SPECFEM source file, which was used to generate SPECFEM synthetics. Example filenames are CMTSOLUTION, FORCESOLUTION, SOURCE.
cartesian (bool) – lets RecSec know that the domain is set in Cartesian coordinates (SPECFEM2D or SPECFEM3D_CARTESIAN in UTM), such that data/metadata reading will need to adjust acoordingly because the ObsPy tools used to read metadata will fail for domains defined in XYZ
synsyn (bool) – flag to let RecSec know that we are plotting two sets of synthetic seismograms. Such that both pysep_path and syn_path will be both attempt to read in synthetic data. Both sets of synthetics MUST share the same cmtsolution and stations metadata
Note
Used for defining user-input waveforms data
- Parameters:
st (obspy.core.stream.Stream) – Stream objects containing observed time series to be plotted on the record section. Can contain any number of traces
st_syn (obspy.core.stream.Stream) – Stream objects containing synthetic time series to be plotted on the record section. Must be same length as st
Note
Waveform plotting organization parameters
- Parameters:
sort_by (str) –
How to sort the Y-axis of the record section, available:
’default’: Don’t sort, just iterate directly through Stream
- ’alphabetical’: sort alphabetically A->Z. Components sorted
separately with parameter components
- ’azimuth’: sort by source-receiver azimuth (deg) with constant
vertical spacing on the y-axis. Requires azimuth_start_deg
- ’backazimuth’: sort by source-receiver backazimuth (deg) with
constant vertical spacing. Requires azimuth_start_deg
- ’distance’: sort by source-receiver distance (km) with constant
- vertical spacing. Smallest distances at the top of the figure.
Requires distance_units
- ’abs_distance’: absolute vertical spacing of waveforms defined by
source-receiver distance. Smallest distance at the top of the figure. Requires distance_units
- ’abs_azimuth’: absolute vertical spacing of waveforms defined
by source-receiver azimuth (deg).
- ’abs_backazimuth’: absolute vertical spacing of waveforms by
source-receiver backazimuth (deg).
- ’*_r’: Add a ‘_r’ to any of the values about to REVERSE the sort,
e.g., ‘alphabetical_r’ sort will go Z->A
scale_by (str) –
scale amplitude of waveforms by available:
None: Not set, no amplitude scaling, waveforms shown raw
- ’normalize’: scale each trace by the maximum amplitude,
i.e., > a /= max(abs(a)) # where ‘a’ is time series amplitudes
- ’global_norm’: scale by the largest amplitude to be displayed on
the screen. Will not consider waveforms which have been excluded on other basis (e.g., wrong component) i.e., > st[i].max /= max([max(abs(tr.data)) for tr in st])
- ’geometric_spreading’: scale amplitudes by expected reduction
through geometric spreading. Related parameters are:
geometric_spreading_factor
geometric_spreading_k_val
geometric_spreading_exclude
geometric_spreading_ymax A
Equation is A(d) = k / sin(d) ** f
Where A(d) is the amplitude reduction factor as a function of distnace, d. ‘k’ is the geometric_spreading_k_val and ‘f’ is the geometric_spreading_factor. ‘k’ is calculated automatically if not given.
time_shift_s (float OR list of float OR str) –
apply static time shift to waveforms, two options:
- float (e.g., -10.2), will shift ALL waveforms by
that number (i.e., -10.2 second time shift applied)
- list (e.g., [5., -2., … 11.2]), will apply individual time
shifts to EACH trace in the stream. The length of this list MUST match the number of traces in your input stream.
- str: apply time shift based on a theoretical TauP phase arrival
if available in the SAC header. These should have been appended by PySEP during data download. If no value is available in the SAC header, defaults to 0. This may have unintended consequences so you should manually check that all values are okay. Available options are: - ‘first_arrival_time’: shift based on earliest phase arrival - ‘p_arrival_time’: shift based on earliest P phase arrival - ‘s_arrival_time’: shift based on earliest S phase arrival
zero_pad_s (list) –
zero pad data in units of seconsd. applied after tapering and before filtering. Input as a tuple of floats,
- (start, end): a list of floats will zero-pad the START and END
of the trace. Either can be 0 to ignore zero-padding at either end
amplitude_scale_factor (float OR list of float) –
apply scale factor to all amplitudes. Used as a dial to adjust amplitudes manually. Defaults to 1. Two options:
float (e.g., 1.2), will multiply ALL waveforms by that number
- list (e.g., [5., -2., … 11.2]), will apply individual amplitude
scale to EACH trace in the stream. The length of this list MUST match the number of traces in your input stream.
move_out (float) – Optional. A velocity value that will be used to calculate move out, which will time shift seismograms based on their source receiver distance. This parameter will be ADDED to time_shift_s (both float and list), if it is provided. Should be in units of distance_units/s
azimuth_start_deg (float) – If sorting by azimuth, this defines the azimuthal angle for the waveform at the top of the figure. Set to 0 for default behavior
distance_units (str) – Y-axis units when sorting by epicentral distance ‘km’: kilometers on the sphere ‘deg’: degrees on the sphere ‘km_utm’: kilometers on flat plane, UTM coordinate system
Note
Data processing parameters
- Parameters:
preprocess (str) –
choose which data to preprocess, options are:
None: do not run preprocessing step, including filter (Default)
’st’: process waveforms in st
’st_syn’: process waveforms in st_syn. st still must be given
’both’: process waveforms in both st and st_syn
min_period_s (float) – minimum filter period in seconds
max_period_s (float) – maximum filter period in seconds
components (str) – a sequence of strings representing acceptable components from the data. Also determines the order these are shown EVEN when sorted by other variables. For example, components==’ZR’ would only display Z and R components, and Z components would be should BEFORE R components for the SAME station.
integrate (int) –
apply integration integrate times on all traces. acceptable values [-inf, inf], where positive values are integration and negative values are differentiation
e.g., if integrate == 2, will integrate each trace twice. or if integrate == -1, will differentiate once or if integrate == 0, do nothing (default)
Note
Geometric spreading parameters, used for amplitude scaling
- Parameters:
geometric_spreading_factor (float) – factor to scale amplitudes by predicting the expected geometric spreading amplitude reduction and correcting for this factor. Related optional parameter: geometric_spreading_k_val. For Rayleigh waves, geometric_spreading_factor == 0.5 (default)
geometric_spreading_k_val (float) – Optional constant scaling value used to scale the geometric spreading factor equation. If not given, calculated automatically using max amplitudes Value should be between 0.5 and 1.0 for regional surface waves.
geometric_spreading_exclude (list) – a list of station names that should match the input stations. Used to exclude stations from the automatic caluclation of the geometric
geometric_spreading_ymax (float) – Optional value for geometric spreading plot. Sets the max y-value on the plot. If not set, defaults to whatever the peak y-value plotted is.
geometric_spreading_save (str) – file id to save separate geometric spreading scatter plot iff `scale_by`==’geometric_spreading’. If NoneType, will not save. By default, turned OFF
Note
Figure generation control parameters
- Parameters:
max_traces_per_rs (int) – maximum number of traces to show on a single record section plot. Defaults to all traces in the Stream
xlim_s (list of float) – [start, stop] in units of time, seconds, to set the xlimits of the figure
y_axis_spacing (float) – spacing between adjacent seismograms applied to Y-axis on relative (not absolute) scales. Defaults to 1.
y_label_loc (str) –
Location to place waveform labels on the y-axis
’default’: auto choose the best location based on sort_by
- ’y_axis’: Replace tick labels on the y-axis (left side of figure),
This won’t work if using absolute sorting and will be over- written by ‘default’
- ’y_axis_abs’: For absolute y-axis only. waveform labels plotted
on the left side outside border, with y-axis labels overlapping (showing distance or azimuth)
- ’y_axis_abs_right’: For absolute y-axis only. waveform labels
plotted on the right side outside border, with y-axis labels on the left side of the figure (showing distance or azimuth)
- ’y_axis_right’: Replace tick labels on the right side of the
y-axis. This option won’t work with absolute sorting
- ’x_min’: Place labels on top of the waveforms at the global min
x-value on the figure
- ’x_max’: Place labels on top of the waveforms at the global max
x-value on the figure
None: Don’t plot any text labels
tmarks (list of float) – place vertical lines at given reference times. Used for marking reference times such as the event origin, or phase arrival. Input as a list of times in units of seconds (where T=0 is the event origin time). For example `tmarks`=[0, 100, 200] would set vertical lines at 0s, 100s and 200s
figsize (tuple of float) – size the of the figure, passed into plt.subplots()
show (bool) – show the figure as a graphical output
save (str) – path to save output figure, will create the parent directory if it doesn’t exist. If None, will not save (default).
Note
Internal RecSec parameters
- Parameters:
overwrite (bool) – if the path defined by save exists, will overwrite the existing figure
log_level (str) – level of the internal logger. In order of ascending verbosity: ‘CRITICAL’, ‘WARNING’, ‘INFO’, ‘DEBUG’.
- Raises:
AssertionError – if any parameters are set incorrectly
- _generate_synthetic_stream(syn_path, source, stations, cartesian=False, fmt='*.*.*.sem?*')[source]#
Convenience fucntion to read in synthetic seismograms from SPECFEM2D, SPECFEM3D or SPECFEM3D_GLOBE. Can be used to read in both st and st_syn
- Parameters:
syn_path (str) – full path to directory containing synthetic seismograms
source (str) – path to source file which defined the source that generated the synthetics. Acceptable values are CMTSOLUTION (from SPECFEM3D/GLOBE), and SOURCE (from SPECFEM2D)
stations (str) – full path to STATIONS file used to define the station coordinates. Format is dictated by SPECFEM
fmt (str) – the expected filename format of the sythetics. Based on ASCII style files generated by SPECFEM. Defaults to ‘??..*.sem?*’. Expected filename looks something like: ‘NN.SSS.BXC.semd’
- Return type:
obspy.core.stream.Stream
- Returns:
Stream object with synthetic waveforms
- check_parameters()[source]#
Check that parameters are set properly and in line with how they are expected by the program
Note
Not using assertions here because we want all incorrect parameters to be evaluated together and displayed at once, that way the user doesn’t have to run this function multiple times to figure out how to set their parameters correctly
- Raises:
AssertionError – If any parameters are not set as expected by plotw_rs functionality
- get_skip_idx()[source]#
Get a list of any traces that don’t adhere to user-defined boundaries such as dist, az, baz, id, or component matches. Don’t actually remove the traces from the stream but rather just collect indices we can use to skip when plotting.
TODO add distance, azi and backazi skip criteria
- Return type:
np.array
- Returns:
returns an indexing list which can be used to skip over traces that don’t adhere to certain criteria
- get_parameters()[source]#
Calculate parameters in a specific order and based on the user-defined information.
Note
The order of function calls here is important! Some of the ‘get’ functions require the results of other ‘get’ functions.
Calculated Parameters
np.array idx: a linear indexing of all the traces in the stream np.array station_ids: an ordered list of station ids, used to get station names that match the index defined in `idx` np.array max_amplitudes: abs max amplitudes of each trace, used for normalization np.array amplitude_scaling: An array to scale amplitudes based on user choices np.array time_shift_s: An array to time shift time series based on user choices np.array y_axis: Y-Axis values based on sorting algorithm, used for plotting np.array distances: source-receiver distances in `distance_units` units np.array azimuths: source-receiver azimuths in degrees np.array backazimuths: source-receiver backazimuths in degrees np.array sorted_idx: sorted indexing on all the traces of the stream based on the chosen sorting algorithm
- get_xlims(st=None)[source]#
The x-limits of each trace depend on the overall time shift (either static or applied through move out), as well as the sampling rate of each trace (which can vary). Retrieve an index-dependent list of x-limits which can be used to truncate the time series during plotting.
Note
Requires that get_time_shifts() has already been run
- Parameters:
st (obspy.core.stream.Stream) – stream object to get xlims for. By default this is the ‘data’ stored in st but it can also be given st_syn to get synthetic x limits which may differ
- Return type:
np.array
- Returns:
an array of tuples defining the start and stop indices for EACH trace to be used during plotting. Already includes time shift information so xlim can be applied DIRECTLY to the time shifted data
- get_srcrcv_stats()[source]#
Get source receiver information such as min max values, and count-related numbers (e.g., num stations) to be used mainly for print statements and text information
Stats Arguments
np.array event_names: unique event names taken from the SAC header int nevents: number of unique events in the stream np.array unique_sta_ids: unique station codes taken from trace stats int nstation_ids: number of unique station codes np.array network_codes: unique network codes taken from station ids int nnetwork: number of unique network codes np.array station_codes: unique station codes taken from station ids int nstation: number of unique station codes np.array location_codes: unique location codes taken from station ids int nlocation: number of unique location codes np.array channel_codes: unique channel codes taken from station ids int nchannel: number of unique channel codes bool reverse_sort: determine if the user wants to reverse their sort, they do this by appending '_r' to the end of the `sort_by` argument
- get_time_offsets()[source]#
Find time shift to data constituting difference between trace start time and event origin time. Both synthetics and data should have the correct timing. Time offsets will be used during plotting to make sure all traces have the same T=0
Note
Appends time offset directly to the stats header, overwriting any value that might have been there already
- get_time_shifts()[source]#
Very simple function which allows float inputs for time shifts and ensures that time shifts are always per-trace arrays Applies the move out by calculating a time shift using src-rcv distance
- Return type:
np.array
- Returns:
a stream-lengthed array of time shifts that can be applied per trace
- get_srcrcv_dist_az_baz()[source]#
Convenience function to wrap _get_srcrcv_dist_az_baz_trace into a loop over the whole stream and return lists of distances, azimuths, and backazimuths
- Rtype distances:
np.array
- Return distances:
source-receiver distances in user-defined units in the original order of Stream
- Rtype azimuths:
np.array
- Return azimuths:
source-receiver azimuths (deg) in the original order of Stream
- Rtype backazimuths:
np.array
- Return backazimuths:
source-receiver azimuths (deg) in the original order of Stream
- _get_srcrcv_dist_az_baz_trace(tr=None, idx=0)[source]#
Check the source-receiver characteristics such as src-rcv distance, azimuth, backazimuth for a given trace.
Note
This function ASSUMES that SAC headers have been written to the traces. Otherwise we will need more complicated ways to get event lat and lon
- Parameters:
tr (obspy.core.trace.Trace) – trace to get srcrcv information for. If None, will use idx
idx (int) – if `tr`==None, user can provide index of self.st (Stream) defaults to index 0
- Rtype gcdist:
float
- Return gcdist:
great circle distance in units specified by distance_units, can be ‘km’ or ‘deg’
- Rtype az:
float
- Return az:
azimuth (degrees) between source and receiver
- Rtype baz:
float
- Return baz:
azimuth (degrees) between source and receiver
- get_amplitude_scaling(_choice='st')[source]#
Scale the amplitudes of all the waveforms by producing a Stream dependent scale factor based on user choice. It is expected that the output array will be DIVIDED by the data arrays:
i.e., st[i].data /= self.amplitude_scaling[i]
Note
Needs to be run AFTER preprocessing because filtering etc. will affect the final amplitudes of the waveforms
- Parameters:
_choice (str) – Internal choice only required for ‘normalize’ scaling. st for amplitude scaling w.r.t data array, st_syn for amplitude scaling w.r.t synthetics
- Return type:
np.array
- Returns:
an array corresponding to the Stream indexes which provides a per-trace scaling coefficient
- calculate_geometric_spreading()[source]#
Stations with larger source-receiver distances will have their amplitude scaled by a larger value.
For information on geometric spreading, see Stein and Wysession, Section 4.3.4, Eq 20 (for Rayleigh waves, geometrical spreading factor = 0.5). For our purposes, this will fold the attenuation factor into the same single factor that accounts for geometric spreading.
Equation is for amplitude reduction ‘A’ as a factor of distance ‘d’:
A(d) = k / sin(d) ** f
where ‘k’ is the geometric_spreading_k_val and ‘f’ is the geometric_spreading_factor. ‘k’ is calculated automatically if not given by the User. In the code, A(d) is represented by w_vector
Note
This does not take into account the variation in amplitude from focal mechanism regions
- TODO
Look in Stein and Wysession and figure out vector names
- Allow ignoring specific components, currently only allowed to
exclude on a per-station basis
- Return type:
list
- Returns:
scale factor per trace in the stream based on theoretical geometrical spreading factor. This is meant to be MULTIPLIED by the data arrays
- get_sorted_idx()[source]#
Sort the source-receiver pairs by the chosen parameters. Except for in default sorting, always maintains component ordering defined by the user, i.e., for the same station, components will be ordered by the component parameter.
- Return type:
np.array
- Returns:
returns an indexing list which is sorted based on the user defined sort_by argument.
- _sort_by_alphabetical()[source]#
Sort by full station name in order. That is, network is sorted, then within network, station is sorted, and so on. Components are sorted last and NOT in alphabetical order. They are ordered based on the user-input components parameter.
- Return type:
np.array
- Returns:
indexing of networks and stations sorted alphabetically
- _get_component_order()[source]#
When we are sorting, we want components to be sorted by the preferred order (i.e, ZRT, Z index is 0, T is 2), which means the components have to be reordered to adhere to the sorting algorithm. ALSO we need to ensure that we honor component order even if everything else is being sorted reverse alphabetically so the components entry needs to be the opposite of stats.reverse_sort
- Return type:
list
- Returns:
components converted to integer values which can be used for sorting algorithms.
- get_y_axis_positions()[source]#
Determine how seismograms are vertically separated on the Y-Axis when plotting. Allows for constant separation, or absolute separation based on distance or (back)azimuth separation.
- Return type:
np.array
- Returns:
an array of actual y-axis positions that can be passed directly to plt.plot() (or similar)
- get_x_axis_tick_values()[source]#
Determine, based on the length of the plotted traces, how often tick marks should be applied so that they are spaced as aesthetically pleasing values.
- process_st()[source]#
Preprocess the Stream with optional filtering in place.
Note
Data in memory will be irretrievably altered by running preprocess.
- TODO Add feature to allow list-like periods to individually filter
seismograms. At the moment we just apply a blanket filter.
- plot(subset=None, page_num=None, **kwargs)[source]#
Plot record sections based on all the available information
- Parameters:
subset (list of int) – subset of sorted_idx if there are too many waveforms to plot on one page (set by max_traces_per_rs). e.g., to get the first 10 entries, subset=[0,10]
page_num (int) – If multiple pages are required due to exceeding max_traces_per_rs, page_num will make adjustments to the figure name and title to differentiate different pages of the same record section
- _log_relative_information(start=0, stop=None)[source]#
If both st and st_syn are being plotted, then write some relative information about their amplitudes to the main logger.
Related to #116
- _plot_trace(idx, y_index, choice='st')[source]#
Plot a single trace on the record section, with amplitude scaling, time shifts, etc. Observed waveforms are black, synthetics are red.
- Parameters:
idx (int) – index of the trace to plot and all trace-ordered values like amplitude scaling and time shifts
y_index (int) – numerical order which this trace was plotted, as each trace is plotted on top of the previous one. y_index should be iterated linearly in the loop that calls this function.
choice (str) – choice of ‘st’ or ‘st_syn’ depending on whether you want to plot the observed or synthetic waveforms
- _plot_azimuth_bins(start=None, stop=None)[source]#
If plotting by azimuth, create visual bin separators so the user has a visual understanding of radiation patterns etc.
- Parameters:
start (int) – optional starting index to determine the bounds of the azimuth bins
stop (int) – optional stop index to determine the bounds of the azimuth bins
- _plot_axes(start=0, stop=None)[source]#
Contains the logic in how to handle the x- and y-axis labels, ticks etc.
Logic options for how to plot the y-axis: - Relative: Relative plotting means the yticklabels will get replaced
by station information. This can either be on the right or left side but will be attached to the actual axis
- Absolute: Absolute plotting means the y-axis actual represents
something (e.g., distance, azimuth). That means labels can not replace the y-ticks and they have to be placed within the figure
Note
Relative plotting can also place labels OFF the y-axis, at which point the y-axis is turned off because it contains no usable information
- Parameters:
start (int) – optional starting index for creating text labels
stop (int) – optional stop index for creating text labels
- _set_y_axis_absolute()[source]#
If ‘abs_’ in sort_by, then the Y-axis should be shown in absolute scale. That means we need to format the text labels, add some labelling etc.
- _set_y_axis_text_labels(start=0, stop=-1, loc='y_axis')[source]#
Plot a text label next to each trace describing the station, azimuth and source-receiver distance. We need to do this all at once because we have to replace all ticks and tick labels at once.
Note
if using the ‘subset’ option in plot, need to also tell the y-axis plotter that we are only plotting a subset of data by using the start and stop parameters
- Parameters:
start (int) – starting index for plotting, default to start 0
stop (int) – stop index for plotting, default to end -1
loc (str) –
location to place the y_axis text labels, available: - y_axis: Place labels along the y-axis (left side of the figure)
Will replace the actual y-tick labels so not applicable for absolute sorting which requries the y-axis labels
- y_axis_abs: for absolute plotting, place waveform labels to the
left side of the figure (outside border), co-existing with y-axis labels
y_axis_right: same as y_axis but set on the right side of figure
x_min: Place labels on the waveforms at the minimum x value
x_max: Place labels on the waveforms at the maximum x value
- _plot_title(nwav=None)[source]#
Create the title of the plot based on event and station information Allow dynamic creation of title based on user input parameters
- Parameters:
nwav (int) – if using subset, the title needs to know how many waveforms it’s showing on the page. self.plot() should tell it
- pysep.plotw_rs(*args, **kwargs)[source]#
Plot Waveform Record Section (main call function for RecordSection)
Instantiates the RecordSection class, runs processing and parameter getting, and then plots record section. Contains additional logic for breaking up figures into multiple pages if requested by the user, while keeping sort order and waveform spacing consistent across multiple reord sections.
Note
See RecordSection.__init__() for acceptable args and kwargs