PySEP: data download¶
The PySEP class is a tool for downloading seismic waveform and metadata en-masse. It applies some preprocessing, instrument response removal and rotation to all data for use in moment tensor and tomographic inversions. PySEP can be used via the command line, or through scripting.
Note
See PySEP class API documentation for a list of available input parameters
What it does
Create or gather event metadata (QuakeML) with user-defined event parameters
Gather station metdata (StationXML) in a bounding box surrounding an event,
Curtail station list due to distance or azimuth constraints
Gather three-component waveform data for chosen station catalog
Quality check waveforms: remove gaps, null out missing components, address clipped amplitudes
Preprocess waveforms: detrend, remove response, amplitude scaling, standardizing time series
Rotate streams to desired components: ZNE, RTZ, UVW (triaxial orthogonal)
Append SAC headers with event and station metadata, and TauP arrivals
Write per-component SAC files, plus StationXML, QuakeML and MSEED files
Write CAP (Cut-and-Paste) weight files for moment tensor inversions
Write config YAML files which can be used to reproduce data gathering/processing
PySEP via Command Line¶
The most convenient way of using PySEP is as a command line tool. To bring up the command line tool help message:
pysep -h
To list out available pre-defined configuration files which can be used as an example of how PySEP operates:
pysep -l # or pysep --list
-p/--preset -e/--event
-p mtuq_workshop_2022 -e 2009-04-07T201255_ANCHORAGE.yaml
-p mtuq_workshop_2022 -e 2014-08-25T161903_ICELAND.yaml
-p mtuq_workshop_2022 -e 2017-09-03T033001_NORTH_KOREA.yaml
-p mtuq_workshop_2022 -e 2020-04-04T015318_SOUTHERN_CALIFORNIA.yaml
...
To run one of the pre-defined configuration files to gather and process data:
pysep -p mtuq_workshop_2022 -e 2017-09-03T033001_NORTH_KOREA.yaml
When performing your own data collection, you can create an empty configuration file that you must fill out with your own parameters:
pysep -W # or pysep --write
To run this configuration file:
pysep -c pysep_config.yaml
Input Parameters and YAML Parameter File¶
Note
See PySEP class API documentation for a list of available input parameters
When using PySEP from the command line, all input variables are controlled by a YAML parameter file, which is a text file containing a set of parameter names and corresponding values, for example:
# key: value
origin_time: '2009-04-07T20:12:55.351000Z'
These parameters control everything from the hypocentral location of your earthquake, to the specific waveform data you want to collect, to the types of preprocessing steps to be applied.
To generate a template parameter file you can run:
pysep -W # or pysep --write
Scripting PySEP¶
Note
See PySEP class API documentation for a list of available input parameters when scripting PySEP.
PySEP can similarly be scripted into other tools. When using a YAML parameter file, this looks like:
from pysep import Pysep
sep = Pysep(config_file='pysep_config.yaml')
sep.run()
The results of the run function are stored as internal attributes. The most important attributes are
st: ObsPy Stream object with all waveforms gathered and processed
inv: Station metadata and response information
event: Event object defining the event which the waveforms recorded
sep.st
print(sep.st[0].stats.sac)
sep.inv
sep.event
You can also pass parameters directly to the instantiation of the PySEP class. See the PySEP docstring for input parameter types and definitions.
from pysep import Pysep
sep = Pysep(origin_time="2000-01-01T00:00:00", event_latitude=64.8596,
event_longitude=-147.8498, event_depth_km=15., ....
)
PySEP Outputs¶
PySEP uses a default directory structure when saving files:
output_dir
: By default, PySEP writes all files to the User-defined parameteroutput_dir
, which defaults to the current working directory.event_tag
: Files are written into a sub-directory defined by the event origin time, and Flinn-Engdahl region. For example:2009-04-07T201255_SOUTHERN_ALASKA
sac_subdir
: All waveform files are saved in a further sub-directory (default: SAC/), to avoid cluttering up the output directory. Waveform files are tagged by the event_tag and trace ID.
Users can use the parameters write_files
and plot_files
to control
exactly what files are produced during a PySEP (see API documentation for details).
By default, PySEP will write SAC files, StationXML, QuakeML and config files, and create a source-receiver map and record section.
Override Directory Names¶
In some cases it may be useful for Users to save files directly to their working directory, without all the automatically generated sub-directories.
To ignore the automatically generated event tag, you can set the
overwrite_event_tag
parameter as an empty string. Via the command line:pysep -c pysep_config.yaml --overwrite_event_tag ''
or via scripting:
sep = Pysep(overwrite_event_tag="")
To set your own event tag, use a string value
pysep -c pysep_config.yaml --overwrite_event_tag event_abc
To ignore the SAC subdirectory and save waveform files directly in the output directory, use the
sac_subdir
parameter, which should be input in your YAML parameter file:sac_subdir: ''
or via scripting
sep = Pysep(sac_subdir="") # or use a string value to define your own
Example: if a User only wants to save SAC waveforms for the rotated RTZ component within their current working directory, ignoring all automatically generated sub directories, all other written files and all plots:
sep = Pysep(overwrite_event_tag="", sac_subdir="", write_files="sac_rtz", plot_files="")
Override Filenames¶
Note
The output SAC file names are hardcoded as trace IDs (with or without the event tag). If additional control over file IDs is a required feature, please open up a GitHub issue
The event tag used to name the output directory and written SAC files can be set
manually by the user using the overwrite_event_tag
argument.
Other output file names can also be changed from their default values, see the
write function
for write file options and
arguments to use for changing file names.
An example of this via the command line:
pysep -c pysep_config.yaml \
--overwrite_event_tag event_abc \
--config_fid event_abc.yaml \
--stations_fid event_abc_stations.txt \
--inv_fid event_abc_inv.xml \
--event_fid event_abc_event.xml \
--stream_fid event_abc_st.ms
Or with scripting
sep = Pysep(overwrite_event_tag="event_abc",
config_fid="event_abc.yaml", ...)
Legacy Filenaming Schema¶
The new version of PySEP uses a file naming schema that is incompatible with previous versions, which may lead to problems in established workflows.
To honor the legacy naming schema of PySEP, simply use the legacy_naming
parameter. This will change how the event tag is formatted, how the output
directory is structured, and how the output SAC files are named.
pysep -c pysep_config.yaml --legacy_naming
Or with scripting
sep = Pysep(legacy_naming=True, ...)
Multiple Event Input¶
To use the same configuration file with multiple events, you can use an event file passed to PySEP through the command line.
Note
Multiple event input is only available for command line usage of PySEP. We suggest using a for loop if you would like to script multiple event input using PySEP
When using this option, the event parameters inside the config file will be ignored, but all the other parameters will be used to gather data and metadata.
Event input files should be text files where each row describes one event with the following parameters as columns:
For an example event input file called ‘event_input.txt’, call structure is:
pysep -c pysep_config.yaml -E event_input.txt
ObsPy Mass Downloader¶
ObsPy’s Mass Download feature allows for large data downloads over all available data services. This may be useful if you don’t care where your data comes from and just want to download all data available.
Note
This ignores the client parameter and downloads waveform and station metadata for all available data services.
To use the mass download option from the command line, you will first need to add the following parameter to your YAML config file:
use_mass_download: true
Then from the command line, you can run things as normal, PySEP will know to use the mass download option to grab waveform and station metadata. There are two additional keyword arguments which you can provide from the command line.
- domain_type (str): Define the search region domain as
- rectangular: rectangular bounding box defined by minlatitude,
minlongitude, maxlatitude and maxlongitude
circular: circular bounding circle defined by the event_latitude, event_longitude and min and max radii defined by mindistance_km and maxdistance_km
delete_tmpdir (bool): Removes the temporary directories that store the MSEED and StationXML files which were downloaded by the mass downloader. Saves space but also if anything fails prior to saving data, the downloaded data will not be saved. Defaults to True.
pysep -c config.yaml --domain_type circular --delete_tmpdir False
Or with scripting
sep = Pysep(config_file="config.yaml", use_mass_download=True,
domain_type="circular", delete_tmpdir=False, ...)
Select Trace ID Input¶
It may be useful to not download data for all possible combinations of network, station, location and channel, but rather to gather data by trace id only.
PySEP accepts a list of station IDs under the parameter station_ids, which allows users to selectively gather data and metadata for a particular event.
Note
Using the station_ids option will ignore any values provided to network, station, location and channel. Trace IDs must be in the form: ‘NN.SSS.LL.CCC’
Note
Select trace ID input is only available through scripting. If this is desirable as a command line input, please open up a GitHub issue.
from pysep import Pysep
station_ids = ["II.DAV.00.LHZ", "IU.XMAS.*.LHZ", "IU.SDV.10.LHZ"]
sep = Pysep(station_ids=station_ids, ...)