Abstract¶
LSST Data Management’s verification program ensures that code meets
performance specifications during construction. During operations,
verification continues with an additional focus towards ensuring that
data releases meet science requirements. To facilitate verification
activities, we are introducing the LSST Verification Framework. This
framework is implemented in the lsst.verify
Python package,
available at https://github.com/lsst/verify.
This technical note introduces the framework’s concepts and usage patterns through a working tutorial. First, this tutorial demonstrates how new metrics (observable concepts) and specifications (requirements and milestones that metric measurements should meet) are created. Then we measure metrics, using both a lightweight approach that is easy to retrofit into LSST Science Pipelines Tasks and a second more rigorous measurement approach that enables detailed diagnostics. Finally, this tutorial shows how metric measurements can be analyzed in a Jupyter notebook environment.
Set up¶
This technical note is available as a Jupyter notebook from its GitHub
repository: https://github.com/lsst-sqre/sqr-019. You are encouraged to
run and modify this notebook to help you learn about the lsst.verify
package. This section covers the dependencies needed to run this
notebook.
First, install the LSST Science Pipelines with
lsstsw. Specifically,
build and set up
the verify
package:
rebuild verify
setup verify
The verify package is not yet distributed with the LSST Science Pipelines Stack as of this writing. lsstsw is the most convenient means of installing verify from scratch.
Next, install these packages with Anaconda:
conda install jupyter pandas bokeh
These additional packages are used for this technical note, but are not
required by lsst.verify
itself.
These are the Python imports needed for this technical note:
In [1]:
# Standard library and third party packages used by this notebook
import json
import os
from tempfile import TemporaryDirectory
import astropy.units as u
import numpy as np
import yaml
# For demonstration plots
from bokeh.charts import Scatter
from bokeh.models import Range1d, Span
from bokeh.layouts import row
from bokeh.io import show, output_notebook
from bokeh.plotting import figure
import pandas
# Load Bokeh
output_notebook()
In [2]:
# The Verification Framework itself
import lsst.verify
Introduction¶
lsst.verify is the new framework for making verification measurements in the LSST Science Pipelines. Verification is an activity where well-defined quantities, called metrics, are measured to ensure that LSST’s data and pipelines meet requirements, which we call specifications.
You might be familiar with
validate_drp
[DMTN-008]. That package currently
measures metrics of ProcessCcdTask
outputs and posts results to
SQUASH
[SQR-009]. By tracking metric
measurements we are able to understand trends in the algorithmic
performance of the LSST Science Pipelines, and ultimately verify that we
will meet our requirements.
With lsst.verify
we sought to generalize the process of defining
metrics, measuring those metrics, and tracking those measurements in
SQUASH. Rather than supporting only specially-designed verification
afterburner Tasks, our goal is to empower developers to track
performance metrics of their own specific pipeline Tasks. By defining
metrics relevant to specific Tasks, verification becomes a highly
relevant integration testing activity for day-to-day pipelines
development. The lsst.verify
design is described in
SQR-017.
This tutorial demonstrates key features and patterns in the
lsst.verify
framework, from defining metrics and specifications, to
making measurements, to analyzing and summarizing performance.
Defining metrics¶
Metrics are definitions of measurable things that you want to track. A measurable thing could be anything: the \(\chi^2\) of a fit, the number of sources identified and measured, or even the latency or memory usage of a function.
In the verification framework, all metrics are centrally defined in the
verify_metrics package. To
define a new metric, simply add or modify a YAML file in the
/metrics
directory of
verify_metrics. Each Stack
package that measures metrics has its own YAML definitions file
(jointcal.yaml
, validate_drp.yaml
, and so on).
SQUASH watches verify_metrics
so that when a metric is committed to
the GitHub repo it is also known to the SQUASH dashboard.
For this tutorial, we will create metrics for hypothetical demo1
and
demo2
packages.
First, content for a hypothetical /metrics/demo1.yaml
file:
In [3]:
demo1_metrics_yaml = """
ZeropointRMS:
  unit: mmag
  description: >
    Photometric calibration RMS.
  reference:
    url: https://example.com/PhotRMS
  tags:
    - photometry
    - demo
Completeness:
  unit: mag
  description: >
    Magnitude of the catalog's 50% completeness limit.
  reference:
    url: https://example.com/Complete
  tags:
    - photometry
    - demo
"""
This YAML defines two metrics: demo1.ZeropointRMS
and
demo1.Completeness
. A metric consists of:
- A name. Names are prefixed by the name of the package that defines them.
- A description. This helps to document metrics, even if they are more thoroughly defined in other documentation (see the reference field).
- A unit. Units are astropy.units-compatible strings. Metrics of unitless quantities should use the dimensionless_unscaled unit, an empty string.
- References. References can be made to URLs, or even to document handles and page numbers. We may expand the reference field’s schema to accommodate formalized reference identifiers in the future.
- Tags. These help us group metrics together in reports.
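To make the relationship between these fields concrete, here is a minimal sketch that models a metric definition as a plain Python dataclass and builds instances from a mapping shaped like the parsed YAML above. The DemoMetric class and the metrics_from_mapping helper are hypothetical names invented for this illustration; lsst.verify provides its own Metric and MetricSet classes, and this sketch does not reproduce their behaviour.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DemoMetric:
    """Hypothetical stand-in for a metric definition (not lsst.verify.Metric)."""

    package: str
    metric: str
    description: str
    unit: str = ""  # an empty string stands for a dimensionless quantity
    reference_url: Optional[str] = None
    tags: List[str] = field(default_factory=list)

    @property
    def name(self) -> str:
        # Fully-qualified names are prefixed by the defining package.
        return f"{self.package}.{self.metric}"


def metrics_from_mapping(package: str, definitions: Dict[str, dict]) -> List[DemoMetric]:
    """Build metrics from a mapping shaped like a parsed metrics YAML file."""
    metrics = []
    for metric_name, fields in definitions.items():
        metrics.append(DemoMetric(
            package=package,
            metric=metric_name,
            description=fields["description"].strip(),
            unit=fields.get("unit", ""),
            reference_url=fields.get("reference", {}).get("url"),
            tags=sorted(fields.get("tags", [])),
        ))
    return sorted(metrics, key=lambda m: m.name)


# The same content as demo1.yaml, written as the mapping a YAML parser would return.
parsed = {
    "ZeropointRMS": {
        "unit": "mmag",
        "description": "Photometric calibration RMS.\n",
        "reference": {"url": "https://example.com/PhotRMS"},
        "tags": ["photometry", "demo"],
    },
    "Completeness": {
        "unit": "mag",
        "description": "Magnitude of the catalog's 50% completeness limit.\n",
        "reference": {"url": "https://example.com/Complete"},
        "tags": ["photometry", "demo"],
    },
}
demo_metric_list = metrics_from_mapping("demo1", parsed)
print([m.name for m in demo_metric_list])
```

The package prefix comes from the file name (demo1.yaml) rather than from the YAML content, which is why the helper takes it as a separate argument.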
For the purposes of this demo, we’ll parse this YAML object into a
lsst.verify.MetricSet
collection. Normally this doesn’t need to be
done since metrics should be pre-defined in verify_metrics
, which
are automatically loaded as we’ll see later.
In [4]:
with TemporaryDirectory() as temp_dir:
    demo1_metrics_path = os.path.join(temp_dir, 'demo1.yaml')
    with open(demo1_metrics_path, mode='w') as f:
        f.write(demo1_metrics_yaml)
    demo_metrics = lsst.verify.MetricSet.load_single_package(demo1_metrics_path)
demo_metrics
Out[4]:
Name | Description | Units | Reference | Tags |
---|---|---|---|---|
str18 | str50 | str15 | str28 | str16 |
demo1.Completeness | Magnitude of the catalog's 50% completeness limit. | $\mathrm{mag}$ | https://example.com/Complete | demo, photometry |
demo1.ZeropointRMS | Photometric calibration RMS. | $\mathrm{mmag}$ | https://example.com/PhotRMS | demo, photometry |
Defining metric specifications¶
Specifications are tests of metric measurements. A specification can be thought of as a milestone; if a measurement passes a specification then data and code are working as expected.
Like metrics, specifications are usually defined centrally in the
verify_metrics repository.
Specifications for each package are defined in one or more YAML files in
the specs
subdirectory of verify_metrics
. See the validate_drp
directory for an
example.
Here is a typical specification written in YAML; in this case for the
demo1.ZeropointRMS
metric:
name: "minimum"
metric: "ZeropointRMS"
threshold:
  operator: "<="
  unit: "mmag"
  value: 20.0
tags:
  - "minimum"
The fully-qualified name for this specification is
demo1.ZeropointRMS.minimum
, following a
{package}.{metric}.{spec_name}
format. Specification names should be
unique, but otherwise can be anything. The Verification Framework does
not place special meaning on “minimum,” “design,” and “stretch”
specifications. Instead, we recommend that you use tags to designate
specifications with operational meaning.
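The naming convention can be illustrated with a short sketch that splits a fully-qualified specification name into its three parts. The SpecName tuple and parse_spec_name function are hypothetical helpers written for this note, and the sketch assumes that none of the three parts contains a dot; lsst.verify has its own name handling, which is not shown here.

```python
from typing import NamedTuple


class SpecName(NamedTuple):
    """The three parts of a '{package}.{metric}.{spec_name}' string."""

    package: str
    metric: str
    spec: str


def parse_spec_name(fully_qualified: str) -> SpecName:
    """Split a fully-qualified specification name into its parts.

    Assumes that the package, metric, and specification names contain no dots.
    """
    parts = fully_qualified.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"Expected package.metric.spec_name, got {fully_qualified!r}")
    return SpecName(*parts)


name = parse_spec_name("demo1.ZeropointRMS.minimum")
print(name.package, name.metric, name.spec)
```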
The core of a specification is its test. The
demo1.ZeropointRMS.minimum
specification defines its test in the
threshold
YAML field. Here, a measurement passes the specification
if \(\mathrm{measurement} \leq 20.0~\mathrm{mmag}\).
We envision other types of specifications beyond thresholds (binary comparisons). Possibilities include ranges and tolerances.
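As a rough sketch of what a threshold test involves, the following example evaluates a measurement against a mapping with the same operator, unit, and value fields as the YAML above. The OPERATORS table, the TO_MAG conversion factors, and the passes_threshold function are assumptions made for this illustration; the framework itself works with Astropy quantities and handles unit conversion for you.

```python
import operator

# Comparison operators a threshold might use, keyed by their YAML string form.
# This table is an assumption for illustration.
OPERATORS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
}

# Conversion factors to magnitudes, enough for this example only.
TO_MAG = {"mag": 1.0, "mmag": 1e-3}


def passes_threshold(value, unit, threshold):
    """Return True if ``value`` (in ``unit``) satisfies the threshold mapping."""
    compare = OPERATORS[threshold["operator"]]
    # Convert both sides to the same unit before comparing.
    measured = value * TO_MAG[unit]
    limit = threshold["value"] * TO_MAG[threshold["unit"]]
    return compare(measured, limit)


minimum = {"operator": "<=", "unit": "mmag", "value": 20.0}
print(passes_threshold(13.0, "mmag", minimum))  # a 13 mmag scatter
print(passes_threshold(0.025, "mag", minimum))  # a 25 mmag scatter, given in mag
```

Converting both sides to a common unit before comparing is the important step: a measurement reported in mag can still be tested against a threshold written in mmag.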
Metadata queries: making specifications only act upon certain measurements¶
Often you’ll make measurements of a metric in many contexts: with different datasets, from different cameras, in different filters, and so on. A specification we define for one measurement context might not be relevant for other contexts. LPM-17, for example, does this frequently by defining different specifications for \(gri\) datasets than \(uzy\). To prevent false alerts, the Verification Framework allows you to define criteria for when a specification applies to a measurement.
Originally we intended to leverage the provenance of a pipeline execution. Provenance, in general, fully describes the environment of the pipeline run, the datasets that were processed and produced, and the pipeline configuration. We envisioned that specifications might query the provenance of a metric measurement to determine if the specification is applicable. While this is our long-term design intent, a comprehensive pipeline provenance framework does not yet exist.
To shim the provenance system’s functionality, the Verification Framework introduces a complementary concept called job metadata. Whereas provenance is passively gathered during pipeline execution, metadata is explicitly added by pipeline developers and operators. Metadata could be a task configuration, filter name, dataset name, or any state known during a Task’s execution.
For example, suppose that a specification only applies to CFHT/MegaCam
datasets in the \(r\)-band. This requirement is written into the
specification’s definition with a metadata_query
field:
name: "minimum_megacam_r"
metric: "ZeropointRMS"
threshold:
  operator: "<="
  unit: "mmag"
  value: 20.0
tags:
  - "minimum"
metadata_query:
  camera: "megacam"
  filter_name: "r"
If a job has metadata with matching camera
and filter_name
fields, the specification applies:
{
    'camera': 'megacam',
    'filter_name': 'r',
    'dataset_repo': 'https://github.com/lsst/ci_cfht.git'
}
On the other hand, if a job has metadata that is either missing fields, or has conflicting values, the specification does not apply:
{
    'filter_name': 'i',
    'dataset_repo': 'https://github.com/lsst/ci_cfht.git'
}
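The matching rule described here can be sketched in a few lines: a query matches when every one of its terms is present in the job metadata with an equal value, and extra metadata keys are ignored. The query_matches function is a hypothetical helper for this illustration, not the framework's implementation, and it assumes simple equality comparisons only.

```python
def query_matches(metadata_query, job_metadata):
    """Return True if every query term is present in the metadata with an equal value."""
    return all(
        key in job_metadata and job_metadata[key] == expected
        for key, expected in metadata_query.items()
    )


query = {"camera": "megacam", "filter_name": "r"}

matching_job = {
    "camera": "megacam",
    "filter_name": "r",
    "dataset_repo": "https://github.com/lsst/ci_cfht.git",
}
other_job = {
    "filter_name": "i",
    "dataset_repo": "https://github.com/lsst/ci_cfht.git",
}

print(query_matches(query, matching_job))  # all terms match; extra keys are ignored
print(query_matches(query, other_job))     # camera is missing and filter_name conflicts
```

Under this rule an empty query matches any job, which corresponds to a specification without a metadata_query applying everywhere.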
Specification inheritance¶
Metadata queries help us write specifications that monitor precisely the pipeline runs we are interested in, with test criteria that make sense. But this also means that we are potentially writing many more specifications for each metric. Most specifications for a given metric share common characteristics, such as units, threshold operators, the metric name, and even some base metadata query terms. To write specifications without repeating ourselves, we can take advantage of specification inheritance.
As an example, let’s write a basic specification in YAML for the
demo1.ZeropointRMS
metric, and write another specification that is
customized for CFHT/MegaCam \(r\)-band data:
In [5]:
zeropointrms_specs_yaml = """
---
name: "minimum"
metric: "ZeropointRMS"
threshold:
  operator: "<="
  unit: "mmag"
  value: 20.0
tags:
  - "minimum"
---
name: "minimum_megacam_r"
base: ["ZeropointRMS.minimum"]
threshold:
  value: 15.0
metadata_query:
  camera: "megacam"
  filter_name: "r"
"""
with TemporaryDirectory() as temp_dir:
    # Write YAML to disk, emulating the verify_metrics package for this demo
    specs_dirname = os.path.join(temp_dir, 'demo1')
    os.makedirs(specs_dirname)
    demo1_specs_path = os.path.join(specs_dirname, 'zeropointRMS.yaml')
    with open(demo1_specs_path, mode='w') as f:
        f.write(zeropointrms_specs_yaml)
    # Parse the YAML into a set of Specification objects
    demo1_specs = lsst.verify.SpecificationSet.load_single_package(specs_dirname)
demo1_specs
Out[5]:
Name | Test | Tags |
---|---|---|
str36 | str27 | str7 |
demo1.ZeropointRMS.minimum | $x$ <= 20.0 $\mathrm{mmag}$ | minimum |
demo1.ZeropointRMS.minimum_megacam_r | $x$ <= 15.0 $\mathrm{mmag}$ | minimum |
The demo1.ZeropointRMS.minimum_megacam_r
specification indicates
that it inherits from demo1.ZeropointRMS.minimum
by referencing it in
the base
field.
With inheritance, demo1.ZeropointRMS.minimum_megacam_r
includes all
fields defined in its base, adds new fields, and overrides values.
Notice how the threshold has changed from 20.0 mmag to 15.0 mmag.
Specification partials for even more composable specifications¶
Suppose we want to create specifications for many metrics that apply to
the megacam
camera. Specification inheritance doesn’t help because
we need to repeat the metadata query for each metric:
---
# Base specification: demo1.ZeropointRMS.minimum
name: "minimum"
metric: "ZeropointRMS"
threshold:
  operator: "<="
  unit: "mmag"
  value: 20.0
tags:
  - "minimum"
---
# Base specification: demo1.Completeness.minimum
name: "minimum"
metric: "Completeness"
threshold:
  operator: ">="
  unit: "mag"
  value: 20.0
tags:
  - "minimum"
---
# A demo1.ZeropointRMS specification targeting MegaCam r-band
name: "minimum_megacam_r"
base: ["ZeropointRMS.minimum"]
threshold:
  value: 15.0
metadata_query:
  camera: "megacam"
  filter_name: "r"
---
# A demo1.Completeness specification targeting MegaCam r-band
name: "minimum_megacam_r"
base: ["Completeness.minimum"]
threshold:
  value: 24.0
metadata_query:
  camera: "megacam"
  filter_name: "r"
To avoid duplicating metadata_query
information for all MegaCam
\(r\)-band specifications across many metrics, we can extract that
information into a partial. Partials are formatted like
specifications, but are never parsed as stand-alone specifications. That
means a partial can, as the name implies, define common partial
information that can be mixed into many specifications.
Here’s the same example as before, but written with a #megacam-r
partial:
---
# Partial for MegaCam r-band specifications
id: "megacam-r"
metadata_query:
  camera: "megacam"
  filter_name: "r"
---
# Base specification: demo1.ZeropointRMS.minimum
name: "minimum"
metric: "ZeropointRMS"
threshold:
  operator: "<="
  unit: "mmag"
  value: 20.0
tags:
  - "minimum"
---
# Base specification: demo1.Completeness.minimum
name: "minimum"
metric: "Completeness"
threshold:
  operator: ">="
  unit: "mag"
  value: 20.0
tags:
  - "minimum"
---
# A demo1.ZeropointRMS specification targeting MegaCam r-band
name: "minimum_megacam_r"
base: ["ZeropointRMS.minimum", "#megacam-r"]
threshold:
  value: 15.0
---
# A demo1.Completeness specification targeting MegaCam r-band
name: "minimum_megacam_r"
base: ["Completeness.minimum", "#megacam-r"]
threshold:
  value: 24.0
As you can see, we’ve added the megacam-r
partial to the inheritance
chain defined in the base
fields. The
demo1.ZeropointRMS.minimum_megacam_r
and
demo1.Completeness.minimum_megacam_r
specifications each inherit from
a base specification and the #megacam-r
partial. The #
prefix
denotes a partial, not a specification. It’s also possible to reference
partials in other YAML files; see the validate_drp
specifications
for an example.
Inheritance is evaluated left to right. For example,
demo1.Completeness.minimum_megacam_r
is built up in this order:
- Use the demo1.Completeness.minimum specification.
- Override with information from #megacam-r.
- Override with information from the demo1.Completeness.minimum_megacam_r specification’s own YAML fields.
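The left-to-right evaluation order can be sketched as a sequence of recursive dictionary merges. The merge_fields and resolve functions below are hypothetical and written only to illustrate the ordering; in particular, the choice to merge nested mappings key by key while replacing lists wholesale is an assumption of this sketch, not a statement about how lsst.verify resolves inheritance. The partial's id field is left out of the example mapping for brevity.

```python
import copy


def merge_fields(base, override):
    """Recursively merge ``override`` into a copy of ``base``.

    Nested mappings are merged key by key; any other value in ``override``
    replaces the value from ``base``.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_fields(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def resolve(own_fields, bases):
    """Apply bases left to right, then the specification's own fields."""
    resolved = {}
    for base in bases:
        resolved = merge_fields(resolved, base)
    # The 'base' field only drives inheritance; it is not part of the result.
    own = {k: v for k, v in own_fields.items() if k != "base"}
    return merge_fields(resolved, own)


completeness_minimum = {
    "name": "minimum",
    "metric": "Completeness",
    "threshold": {"operator": ">=", "unit": "mag", "value": 20.0},
    "tags": ["minimum"],
}
megacam_r = {"metadata_query": {"camera": "megacam", "filter_name": "r"}}
own = {
    "name": "minimum_megacam_r",
    "base": ["Completeness.minimum", "#megacam-r"],
    "threshold": {"value": 24.0},
}

spec = resolve(own, [completeness_minimum, megacam_r])
print(spec)
```

Because the threshold mapping is merged rather than replaced, the derived specification keeps the operator and unit from its base and overrides only the value.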
Specifications: putting it all together¶
We’ve seen how to write metric specifications in YAML, and how to write
them more efficiently with inheritance and partials. Now let’s write out
a full specification set, like we might in verify_metrics
:
In [6]:
demo1_specs_yaml = """
# Partials that define metadata queries
# for pipeline execution contexts with
# MegaCam r and u-band data, or HSC r-band.
---
id: "megacam-r"
metadata_query:
  camera: "megacam"
  filter_name: "r"
---
id: "megacam-u"
metadata_query:
  camera: "megacam"
  filter_name: "u"
---
id: "hsc-r"
metadata_query:
  camera: "hsc"
  filter_name: "r"
# We'll also write partials for each metric,
# that set up the basic test. Alternatively
# we could create full specifications to
# inherit from for each camera.
---
id: "ZeropointRMS"
metric: "demo1.ZeropointRMS"
threshold:
  operator: "<="
  unit: "mmag"
---
id: "Completeness"
metric: "demo1.Completeness"
threshold:
  operator: ">="
  unit: "mag"
# Partials to tag specifications as
# "minimum" requirements or "stretch
# goals"
---
id: "tag-minimum"
tags:
  - "minimum"
---
id: "tag-stretch"
tags:
  - "stretch"
# ZeropointRMS specifications
# tailored for each camera, in
# minimum and stretch goal variants.
---
name: "minimum_megacam_r"
base: ["#ZeropointRMS", "#megacam-r", "#tag-minimum"]
threshold:
  value: 15.0
---
name: "stretch_megacam_r"
base: ["#ZeropointRMS", "#megacam-r", "#tag-stretch"]
threshold:
  value: 10.0
---
name: "minimum_megacam_u"
base: ["#ZeropointRMS", "#megacam-u", "#tag-minimum"]
threshold:
  value: 30.0
---
name: "stretch_megacam_u"
base: ["#ZeropointRMS", "#megacam-u", "#tag-stretch"]
threshold:
  value: 20.0
---
name: "minimum_hsc_r"
base: ["#ZeropointRMS", "#hsc-r", "#tag-minimum"]
threshold:
  value: 12.0
---
name: "stretch_hsc_r"
base: ["#ZeropointRMS", "#hsc-r", "#tag-stretch"]
threshold:
  value: 6.0
# Completeness specifications,
# tailored for each camera in
# minimum and stretch goal variants
---
name: "minimum_megacam_r"
base: ["#Completeness", "#megacam-r", "#tag-minimum"]
threshold:
  value: 24.0
---
name: "stretch_megacam_r"
base: ["#Completeness", "#megacam-r", "#tag-stretch"]
threshold:
  value: 26.0
---
name: "minimum_megacam_u"
base: ["#Completeness", "#megacam-u", "#tag-minimum"]
threshold:
  value: 20.0
---
name: "stretch_megacam_u"
base: ["#Completeness", "#megacam-u", "#tag-stretch"]
threshold:
  value: 24.0
---
name: "minimum_hsc_r"
base: ["#Completeness", "#hsc-r", "#tag-minimum"]
threshold:
  value: 20.0
---
name: "stretch_hsc_r"
base: ["#Completeness", "#hsc-r", "#tag-stretch"]
threshold:
  value: 28.0
"""
with TemporaryDirectory() as temp_dir:
    # Write YAML to disk, emulating the verify_metrics package for this demo
    specs_dirname = os.path.join(temp_dir, 'demo1')
    os.makedirs(specs_dirname)
    demo1_specs_path = os.path.join(specs_dirname, 'demo1.yaml')
    with open(demo1_specs_path, mode='w') as f:
        f.write(demo1_specs_yaml)
    # Parse the YAML into a set of Specification objects
    demo_specs = lsst.verify.SpecificationSet.load_single_package(specs_dirname)
demo_specs
Out[6]:
Name | Test | Tags |
---|---|---|
str36 | str27 | str7 |
demo1.Completeness.minimum_hsc_r | $x$ >= 20.0 $\mathrm{mag}$ | minimum |
demo1.Completeness.minimum_megacam_r | $x$ >= 24.0 $\mathrm{mag}$ | minimum |
demo1.Completeness.minimum_megacam_u | $x$ >= 20.0 $\mathrm{mag}$ | minimum |
demo1.Completeness.stretch_hsc_r | $x$ >= 28.0 $\mathrm{mag}$ | stretch |
demo1.Completeness.stretch_megacam_r | $x$ >= 26.0 $\mathrm{mag}$ | stretch |
demo1.Completeness.stretch_megacam_u | $x$ >= 24.0 $\mathrm{mag}$ | stretch |
demo1.ZeropointRMS.minimum_hsc_r | $x$ <= 12.0 $\mathrm{mmag}$ | minimum |
demo1.ZeropointRMS.minimum_megacam_r | $x$ <= 15.0 $\mathrm{mmag}$ | minimum |
demo1.ZeropointRMS.minimum_megacam_u | $x$ <= 30.0 $\mathrm{mmag}$ | minimum |
demo1.ZeropointRMS.stretch_hsc_r | $x$ <= 6.0 $\mathrm{mmag}$ | stretch |
demo1.ZeropointRMS.stretch_megacam_r | $x$ <= 10.0 $\mathrm{mmag}$ | stretch |
demo1.ZeropointRMS.stretch_megacam_u | $x$ <= 20.0 $\mathrm{mmag}$ | stretch |
More metrics and specifications for the demo2 package¶
All the metrics we’ve created have been associated with the hypothetical “demo1” pipeline package. Let’s quickly create another set of metrics and specifications for a “demo2” pipeline package, which we’ll use later. This is an opportunity to show that metrics and specifications can be created dynamically in Python too.
In [7]:
sourcecount_metric = lsst.verify.Metric(
'demo2.SourceCount',
"Number of matched sources.",
unit=u.dimensionless_unscaled,
tags=['demo'])
demo_metrics.insert(sourcecount_metric)
print(demo_metrics['demo2.SourceCount'])
demo2.SourceCount (dimensionless_unscaled): Number of matched sources.
Notice that demo2.SourceCount
is just a count; it doesn’t have
physical units. We designated this type of unit with Astropy’s
astropy.units.dimensionless_unscaled
unit. Its string form is an empty string:
In [8]:
u.dimensionless_unscaled == u.Unit('')
Out[8]:
True
Next, we’ll create complementary specifications:
In [9]:
sourcecount_minimum_spec = lsst.verify.ThresholdSpecification(
'demo2.SourceCount.minimum_cfht_r',
250 * u.dimensionless_unscaled,
'>=',
tags=['minimum'],
metadata_query={'camera': 'megacam', 'filter_name': 'r'}
)
demo_specs.insert(sourcecount_minimum_spec)
sourcecount_stretch_spec = lsst.verify.ThresholdSpecification(
'demo2.SourceCount.stretch_cfht_r',
500 * u.dimensionless_unscaled,
'>=',
tags=['stretch'],
metadata_query={'camera': 'megacam', 'filter_name': 'r'}
)
demo_specs.insert(sourcecount_stretch_spec)
That’s it. We now have a set of metrics and specifications defined for
two packages, demo1
and demo2
. Here are the metrics in full:
In [10]:
demo_metrics
Out[10]:
Name | Description | Units | Reference | Tags |
---|---|---|---|---|
str18 | str50 | str15 | str28 | str16 |
demo1.Completeness | Magnitude of the catalog's 50% completeness limit. | $\mathrm{mag}$ | https://example.com/Complete | demo, photometry |
demo1.ZeropointRMS | Photometric calibration RMS. | $\mathrm{mmag}$ | https://example.com/PhotRMS | demo, photometry |
demo2.SourceCount | Number of matched sources. | $\mathrm{}$ | | demo |
And the specifications in full:
In [11]:
demo_specs
Out[11]:
Name | Test | Tags |
---|---|---|
str36 | str27 | str7 |
demo1.Completeness.minimum_hsc_r | $x$ >= 20.0 $\mathrm{mag}$ | minimum |
demo1.Completeness.minimum_megacam_r | $x$ >= 24.0 $\mathrm{mag}$ | minimum |
demo1.Completeness.minimum_megacam_u | $x$ >= 20.0 $\mathrm{mag}$ | minimum |
demo1.Completeness.stretch_hsc_r | $x$ >= 28.0 $\mathrm{mag}$ | stretch |
demo1.Completeness.stretch_megacam_r | $x$ >= 26.0 $\mathrm{mag}$ | stretch |
demo1.Completeness.stretch_megacam_u | $x$ >= 24.0 $\mathrm{mag}$ | stretch |
demo1.ZeropointRMS.minimum_hsc_r | $x$ <= 12.0 $\mathrm{mmag}$ | minimum |
demo1.ZeropointRMS.minimum_megacam_r | $x$ <= 15.0 $\mathrm{mmag}$ | minimum |
demo1.ZeropointRMS.minimum_megacam_u | $x$ <= 30.0 $\mathrm{mmag}$ | minimum |
demo1.ZeropointRMS.stretch_hsc_r | $x$ <= 6.0 $\mathrm{mmag}$ | stretch |
demo1.ZeropointRMS.stretch_megacam_r | $x$ <= 10.0 $\mathrm{mmag}$ | stretch |
demo1.ZeropointRMS.stretch_megacam_u | $x$ <= 20.0 $\mathrm{mmag}$ | stretch |
demo2.SourceCount.minimum_cfht_r | $x$ >= 250.0 $\mathrm{}$ | minimum |
demo2.SourceCount.stretch_cfht_r | $x$ >= 500.0 $\mathrm{}$ | stretch |
Of course, these examples are contrived for this tutorial. Normally metrics and specifications aren’t defined in notebooks or code, but with a pull request to the verify_metrics GitHub repository.
Making measurements¶
Now that we’ve defined metrics, we can measure them. Measurements happen in Pipelines code, either within regular Tasks, or in dedicated afterburner Tasks.
The Verification Framework provides two patterns for making
measurements: either using the full measurement API, or a more
lightweight capture of measurement quantities. For the demo1
package
we’ll use the more comprehensive approach, and then make lightweight
measurements for the demo2
package.
Measuring ZeropointRMS¶
In our Task, we might have arrays of observed photometry matched to catalog stars with known photometry:
In [12]:
catalog_mags = np.random.uniform(18, 26, size=100)*u.mag
obs_mags = catalog_mags - 25*u.mag + np.random.normal(scale=12.0, size=100)*u.mmag
From these the task might estimate a zeropoint:
In [13]:
zp = np.median(catalog_mags - obs_mags)
And a scatter:
In [14]:
zp_rms = np.std(catalog_mags - obs_mags)
zp_rms
is a measurement of the demo1.ZeropointRMS
metric that
we’d like to capture. Let’s create a lsst.verify.Measurement
object
to do that:
In [15]:
zp_meas = lsst.verify.Measurement('demo1.ZeropointRMS', zp_rms)
We’ve captured the measurement, but there’s more information that will be useful for later understanding the measurement. These additional data are called measurement extras:
In [16]:
zp_meas.extras['zp'] = lsst.verify.Datum(zp, label="m_0",
description="Estimated zeropoint.")
zp_meas.extras['catalog_mags'] = lsst.verify.Datum(catalog_mags, label="m_cat",
description="Catalog magnitudes.")
zp_meas.extras['obs_mags'] = lsst.verify.Datum(obs_mags, label="m_obs",
description="Instrument magnitudes.")
The Datum
objects act as wrappers for information, like Astropy
quantities, that add plotting labels and descriptions to help document
our datasets.
In a Task, we might want to add annotations about the Task’s configuration. These annotations will be added to the metadata of the pipeline execution. For example, this is an annotation of the function used to estimate the RMS:
In [17]:
zp_meas.notes['estimator'] = 'numpy.std'
Measuring Completeness¶
Our task also measures photometric completeness. Let’s make another
Measurement
to record this metric measurement, along with extras:
In [18]:
# Here's a mock dataset
mag_grid = np.linspace(22, 28, num=50, endpoint=True)
c_percent = 1. / np.cosh((mag_grid - mag_grid.min()) / 2.) * 100.
# Make the measurement
completeness_mag = np.interp(50.0, c_percent[::-1], mag_grid[::-1]) * u.mag
# Package the measurement
completeness_meas = lsst.verify.Measurement(
'demo1.Completeness',
completeness_mag,
)
completeness_meas.extras['mag_grid'] = lsst.verify.Datum(
mag_grid*u.mag,
label="m",
description="Magnitude")
completeness_meas.extras['c_frac'] = lsst.verify.Datum(
c_percent*u.percent,
label="C",
description="Photometric catalog completeness.")
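As a cross-check on the mock measurement, the sketch below repeats the interpolation with the standard library only and compares it with the closed-form answer. The completeness_percent and crossing_magnitude functions are hypothetical helpers for this illustration. Note that np.interp requires increasing x-coordinates, which is why the tutorial cell reverses both arrays; the helper here instead walks the decreasing curve directly.

```python
import math


def completeness_percent(mag, bright_limit=22.0):
    """Mock completeness curve from the tutorial: 100 / cosh((m - m_bright) / 2)."""
    return 100.0 / math.cosh((mag - bright_limit) / 2.0)


def crossing_magnitude(mags, percents, level=50.0):
    """Linearly interpolate the magnitude where a decreasing curve crosses ``level``."""
    points = list(zip(mags, percents))
    for (m0, c0), (m1, c1) in zip(points, points[1:]):
        if c0 >= level >= c1:
            # Fraction of the way from (m0, c0) to (m1, c1) at which c == level.
            t = (c0 - level) / (c0 - c1)
            return m0 + t * (m1 - m0)
    raise ValueError("curve never crosses the requested level")


# Same grid as the tutorial: 50 points from 22 to 28 mag, inclusive.
grid = [22.0 + 6.0 * i / 49 for i in range(50)]
curve = [completeness_percent(m) for m in grid]

interpolated = crossing_magnitude(grid, curve)
# Closed form: cosh(x) = 2 at x = acosh(2), so m = 22 + 2 * acosh(2).
analytic = 22.0 + 2.0 * math.acosh(2.0)
print(round(interpolated, 2), round(analytic, 2))
```

Both estimates agree to well within the grid spacing, so linear interpolation on a 50-point grid is adequate for this mock curve.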
Packaging measurements in a Verification Job¶
In the Verification Framework, a “job” is a pipeline run that produces
metric measurements. The lsst.verify.Job
class allows us to package
several measurements from the pipeline run. With a Job
object, we
can then analyze the measurements, save verification datasets to disk,
and dispatch datasets to the SQUASH database.
Normally when we create a Job
object from scratch we seed it with
the metrics and specifications defined in the verify_metrics
repo:
In [19]:
job = lsst.verify.Job.load_metrics_package()
Of course, we created ad hoc metrics and specifications outside of
verify_metrics
. We can add those to the job
:
In [20]:
job.metrics.update(demo_metrics)
job.specs.update(demo_specs)
Now add the measurements:
In [21]:
job.measurements.insert(zp_meas)
job.measurements.insert(completeness_meas)
The pipeline Task that is making this Job knows about the camera and filter of the dataset. The Task code can record this metadata:
In [22]:
job.meta.update({'camera': 'megacam', 'filter_name': 'r'})
Job metadata is a dict
-like mapping. Here’s the full set of metadata
recorded for the job
:
In [23]:
print(job.meta)
{
"camera": "megacam",
"demo1.ZeropointRMS.estimator": "numpy.std",
"filter_name": "r"
}
As expected, the camera
and filter_name
are present, but so is
the estimator
annotation that we attached to the
demo1.ZeropointRMS
measurement. Measurement annotations are
automatically included in a Job’s metadata, but keys are prefixed with
the measurement’s metric name. Specification metadata_query
definitions can act on both job and measurement-level metadata.
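The key-prefixing behaviour can be sketched with plain dictionaries. The job_metadata function below is a hypothetical illustration of the combined view, not the framework's implementation; it assumes that each measurement's notes are stored under short keys and are exposed at the job level as {metric name}.{key}.

```python
def job_metadata(job_level, measurement_notes):
    """Combine job-level metadata with per-measurement notes.

    ``measurement_notes`` maps a metric name to that measurement's notes;
    each note key is prefixed with the metric name in the combined view.
    """
    combined = dict(job_level)
    for metric_name, notes in measurement_notes.items():
        for key, value in notes.items():
            combined[f"{metric_name}.{key}"] = value
    # Sort keys so the combined view prints in a stable order.
    return dict(sorted(combined.items()))


meta = job_metadata(
    {"camera": "megacam", "filter_name": "r"},
    {"demo1.ZeropointRMS": {"estimator": "numpy.std"}, "demo1.Completeness": {}},
)
print(meta)
```

Prefixing keeps annotations from different measurements from colliding, while still letting a metadata query refer to them by their full key.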
Before a Task exits, it should write the verification Job dataset to
disk. Serialization to disk is a temporary shim until Job
datasets
can be persisted through the Butler.
The native serialization format of the Verification Framework is JSON:
In [24]:
job.write('demo1.verify.json')
Making lightweight quantity-only measurements with output_quantities()¶
lsst.verify.Measurement
and lsst.verify.Job
classes are
necessary for producing rich job datasets (for example, associating
extras with measurements). Many Tasks, though, won’t need this
functionality. A Task might record a measurement as an Astropy quantity
and persist that measurement with as little overhead as possible. The
lsst.verify.output_quantities
function enables this use case.
First, a Task will create a dictionary to collect measurements throughout the lifetime of the Task’s execution:
In [25]:
demo2_measurements = {}
Then the task measures the demo2.SourceCount
metric:
In [26]:
demo2_measurements['demo2.SourceCount'] = 350*u.dimensionless_unscaled
Measurements are always Astropy quantities.
Finally, before the Task returns, it can output measurements to disk.
The default filename format for the Verification job dataset file is
{package}.verify.json
.
In [27]:
lsst.verify.output_quantities('demo2', demo2_measurements)
Out[27]:
'demo2.verify.json'
Post processing verification jobs¶
Our hypothetical pipeline has produced measurements for two packages:
demo1
and demo2
. These measurements are persisted to
demo1.verify.json
and demo2.verify.json
files on disk. Now we’d
like to gather these measurements and either submit them to the SQUASH
dashboard, or collate the measurements for local analysis.
The dispatch_verify.py
tool lets us do this. For this demo we won’t
upload measurements to SQUASH. Instead we will combine the measurements
into one JSON file.
In [28]:
%%bash
export DYLD_LIBRARY_PATH=$LSST_LIBRARY_PATH
dispatch_verify.py --test --ignore-lsstsw --write demo.verify.json demo1.verify.json demo2.verify.json
verify.bin.dispatchverify.main INFO: Loading demo1.verify.json
verify.bin.dispatchverify.main INFO: Loading demo2.verify.json
verify.bin.dispatchverify.main INFO: Merging verification Job JSON.
verify.bin.dispatchverify.main INFO: Refreshing metric definitions from verify_metrics
verify.bin.dispatchverify.main INFO: Writing Job JSON to demo.verify.json.
The flags used here are:
- --test: prevents dispatch_verify.py from attempting to upload to the SQUASH service.
- --ignore-lsstsw: since the $LSSTSW environment variable may not be available in this notebook context, we’ll avoid scraping it for information (such as Git commits and branches of packages included in the Pipeline stack).
- --write demo.verify.json: write the merged job dataset to demo.verify.json.
- demo1.verify.json and demo2.verify.json are inputs, as positional arguments, pointing to the job JSON files that we created earlier with metric measurements.
See dispatch_verify.py --help
for more information.
Analyze verification results locally¶
For code development, it’s convenient to look at the results of verification measurements locally, rather than in SQUASH. The Verification Framework is designed for this workflow, with special affordances for Jupyter Notebook users.
The collated measurement dataset produced by dispatch_verify.py
earlier is in the file demo.verify.json
. Let’s open this dataset
using the Job.deserialize
class method:
In [29]:
with open('demo.verify.json') as f:
    job = lsst.verify.Job.deserialize(**json.load(f))
Making reports¶
With a job dataset, we can make a report that summarizes the pass/fail
status of specifications that have a corresponding measurement. Reports,
lsst.verify.Report
instances, are thin wrappers around Astropy
Tables, and look great in Jupyter Notebooks:
In [30]:
job.report().show()
Out[30]:
Status | Specification | Measurement | Test | Metric Tags | Spec. Tags |
---|---|---|---|---|---|
✅ | demo1.Completeness.minimum_megacam_r | 24.6 $\mathrm{mag}$ | $x$ >= 24.0 $\mathrm{mag}$ | demo, photometry | minimum |
❌ | demo1.Completeness.stretch_megacam_r | 24.6 $\mathrm{mag}$ | $x$ >= 26.0 $\mathrm{mag}$ | demo, photometry | stretch |
✅ | demo1.ZeropointRMS.minimum_megacam_r | 13.0 $\mathrm{mmag}$ | $x$ <= 15.0 $\mathrm{mmag}$ | demo, photometry | minimum |
❌ | demo1.ZeropointRMS.stretch_megacam_r | 13.0 $\mathrm{mmag}$ | $x$ <= 10.0 $\mathrm{mmag}$ | demo, photometry | stretch |
✅ | demo2.SourceCount.minimum_cfht_r | 350.0 $\mathrm{}$ | $x$ >= 250.0 $\mathrm{}$ | demo | minimum |
❌ | demo2.SourceCount.stretch_cfht_r | 350.0 $\mathrm{}$ | $x$ >= 500.0 $\mathrm{}$ | demo | stretch |
Notice that the report only shows specification tests that are relevant to the measurements. Recall that the job metadata indicates these measurements are with CFHT/MegaCam in the \(r\)-band:
In [31]:
print(job.meta)
{
"camera": "megacam",
"demo1.ZeropointRMS.estimator": "numpy.std",
"filter_name": "r",
"packages": {}
}
Thus all the specifications having to do with HSC or the \(u\)-band aren’t tested because those tests are meaningless with the current measurements.
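One way to picture this selection is as a metadata match: a specification is tested only when every term it asks for agrees with the job metadata. The sketch below is an illustration under that assumption, not the framework’s own matching code; the specification names and query dictionaries are hypothetical.

```python
def spec_applies(query, job_meta):
    """Return True if every key in the query matches the job metadata."""
    return all(job_meta.get(key) == value for key, value in query.items())


job_meta = {"camera": "megacam", "filter_name": "r"}

# Hypothetical specification names mapped to hypothetical metadata queries.
spec_queries = {
    "example.minimum_megacam_r": {"camera": "megacam", "filter_name": "r"},
    "example.minimum_megacam_u": {"camera": "megacam", "filter_name": "u"},
    "example.minimum_hsc_r": {"camera": "hsc", "filter_name": "r"},
    "example.any_camera": {},  # an empty query matches every job
}

applicable = [name for name, query in spec_queries.items()
              if spec_applies(query, job_meta)]
print(applicable)
```

Only the MegaCam \(r\)-band entry and the unconstrained entry survive, which mirrors why the HSC and \(u\)-band specifications are absent from the report.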
When there are many measurements and specifications, you might be more
interested in producing reports around specific topics. Such tailored
reports can be made by passing arguments to the Job.report
method.
For example, this is a report listing only demo1
package metrics:
In [32]:
job.report(name='demo1').show()
Out[32]:
Status | Specification | Measurement | Test | Metric Tags | Spec. Tags |
---|---|---|---|---|---|
✅ | demo1.Completeness.minimum_megacam_r | 24.6 $\mathrm{mag}$ | $x$ >= 24.0 $\mathrm{mag}$ | demo, photometry | minimum |
❌ | demo1.Completeness.stretch_megacam_r | 24.6 $\mathrm{mag}$ | $x$ >= 26.0 $\mathrm{mag}$ | demo, photometry | stretch |
✅ | demo1.ZeropointRMS.minimum_megacam_r | 13.0 $\mathrm{mmag}$ | $x$ <= 15.0 $\mathrm{mmag}$ | demo, photometry | minimum |
❌ | demo1.ZeropointRMS.stretch_megacam_r | 13.0 $\mathrm{mmag}$ | $x$ <= 10.0 $\mathrm{mmag}$ | demo, photometry | stretch |
And this report shows results for the demo1.ZeropointRMS
metrics:
In [33]:
job.report(name='demo1.ZeropointRMS').show()
Out[33]:
Status | Specification | Measurement | Test | Metric Tags | Spec. Tags |
---|---|---|---|---|---|
✅ | demo1.ZeropointRMS.minimum_megacam_r | 13.0 $\mathrm{mmag}$ | $x$ <= 15.0 $\mathrm{mmag}$ | demo, photometry | minimum |
❌ | demo1.ZeropointRMS.stretch_megacam_r | 13.0 $\mathrm{mmag}$ | $x$ <= 10.0 $\mathrm{mmag}$ | demo, photometry | stretch |
Recall that we added tags to the specifications to designate minimum and stretch goals, as seen in the demo1.ZeropointRMS.minimum_megacam_r specification:
In [34]:
job.specs['demo1.ZeropointRMS.minimum_megacam_r'].tags
Out[34]:
{'minimum'}
We can tailor the report to show tests only against these minimum
specifications:
In [35]:
job.report(spec_tags=['minimum']).show()
Out[35]:
Status | Specification | Measurement | Test | Metric Tags | Spec. Tags |
---|---|---|---|---|---|
✅ | demo1.Completeness.minimum_megacam_r | 24.6 $\mathrm{mag}$ | $x$ >= 24.0 $\mathrm{mag}$ | demo, photometry | minimum |
✅ | demo1.ZeropointRMS.minimum_megacam_r | 13.0 $\mathrm{mmag}$ | $x$ <= 15.0 $\mathrm{mmag}$ | demo, photometry | minimum |
✅ | demo2.SourceCount.minimum_cfht_r | 350.0 $\mathrm{}$ | $x$ >= 250.0 $\mathrm{}$ | demo | minimum |
Notice that the spec_tags argument takes a sequence of tags. The tags are combined with AND: a specification must carry every listed tag to appear in the report. For example, there are no specifications that are both minimum and stretch, so the report is empty:
In [36]:
job.report(spec_tags=['minimum', 'stretch']).show()
Out[36]:
Status | Specification | Measurement | Test | Metric Tags | Spec. Tags |
---|---|---|---|---|---|
In addition to specification tags, you can filter by metric tags by
setting the metric_tags
argument.
Finally, these filters can be combined. For example, this report
summarizes specification tests for metrics from the demo1
package
against minimum
goals:
In [37]:
job.report(name='demo1', spec_tags=['minimum']).show()
Out[37]:
Status | Specification | Measurement | Test | Metric Tags | Spec. Tags |
---|---|---|---|---|---|
✅ | demo1.Completeness.minimum_megacam_r | 24.6 $\mathrm{mag}$ | $x$ >= 24.0 $\mathrm{mag}$ | demo, photometry | minimum |
✅ | demo1.ZeropointRMS.minimum_megacam_r | 13.0 $\mathrm{mmag}$ | $x$ <= 15.0 $\mathrm{mmag}$ | demo, photometry | minimum |
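The filtering behaviour shown in these reports can be sketched in plain Python. The `select_specs` function below is hypothetical and is not how `Job.report` is implemented; it assumes that `name` matches on dotted-name prefixes and that `spec_tags` uses AND semantics, which is consistent with the reports above. The specification names and tags are copied from those reports.

```python
def select_specs(tags_by_spec, name=None, spec_tags=None):
    """Select specification names by dotted-name prefix and required tags."""
    required = set(spec_tags or [])
    selected = []
    for spec_name, tags in tags_by_spec.items():
        # Assumed name rule: exact match or a prefix ending at a dot.
        if name is not None and not (
                spec_name == name or spec_name.startswith(name + ".")):
            continue
        # AND semantics: every requested tag must be present.
        if not required.issubset(tags):
            continue
        selected.append(spec_name)
    return selected


tags_by_spec = {
    "demo1.Completeness.minimum_megacam_r": {"minimum"},
    "demo1.Completeness.stretch_megacam_r": {"stretch"},
    "demo1.ZeropointRMS.minimum_megacam_r": {"minimum"},
    "demo1.ZeropointRMS.stretch_megacam_r": {"stretch"},
    "demo2.SourceCount.minimum_cfht_r": {"minimum"},
    "demo2.SourceCount.stretch_cfht_r": {"stretch"},
}

print(select_specs(tags_by_spec, spec_tags=["minimum", "stretch"]))
print(select_specs(tags_by_spec, name="demo1", spec_tags=["minimum"]))
```

Using `set.issubset` makes the AND behaviour explicit: asking for both `minimum` and `stretch` returns nothing because no specification carries both tags.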
Data behind the measurements¶
Besides reports of specifications that were met or failed during a job, we’re also interested in the context of the measurements. What was the distribution of points? Where were sources on the detector? These questions cannot be answered by metrics, which are scalars by definition. But they might be answered by the blob datasets that accompany measurements.
Recall that during the demo1
measurements we added “extras,”
consisting of raw arrays of magnitudes, as well as the fitted zeropoint.
We can access these blob datasets and make plots for deeper
investigation.
First, we access the demo1.ZeropointRMS
metric measurement in the
job:
In [38]:
m = job.measurements['demo1.ZeropointRMS']
The extra data associated with the measurement are stored as key-value
items in the measurement’s extras
attribute:
In [39]:
list(m.extras.keys())
Out[39]:
['zp', 'obs_mags', 'catalog_mags']
For this tutorial we’ll use Bokeh to make interactive plots with this
data. Often it’s easiest to pack a Pandas DataFrame for plotting with
Bokeh. We’ll make the DataFrame from the Astropy Quantity
array,
accessed from the quantity
attributes of each item:
In [40]:
df = pandas.DataFrame({
    "obs_mags": m.extras['obs_mags'].quantity,
    "catalog_mags": m.extras['catalog_mags'].quantity,
    "delta_mags": (m.extras['catalog_mags'].quantity
                   - m.extras['obs_mags'].quantity),
})
These items, obs_mags and catalog_mags, are lsst.verify.Datum instances. Datum objects allow us to pack information with data, such as plot labels. Here we’ll use that metadata to build plot labels:
In [41]:
# Scatter plot of observed vs. catalog stellar photometry
p = Scatter(df, x='obs_mags', y='catalog_mags',
            title="Zeropoint stellar sample",
            xlabel="{0.label} [{0.unit}]".format(m.extras['obs_mags']),
            ylabel="{0.label} [{0.unit}]".format(m.extras['catalog_mags']),
            plot_width=350, plot_height=350)
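The axis labels above rely on attribute lookup inside `str.format` replacement fields (`{0.label}`, `{0.unit}`). As a minimal sketch of why it helps to keep a label and unit next to the data, here is a hypothetical `LabeledArray` class standing in for a datum; it is not the `lsst.verify.Datum` class, and the label text is invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class LabeledArray:
    """Hypothetical stand-in for a datum: values plus plotting metadata."""
    values: list
    unit: str
    label: str


obs = LabeledArray([20.1, 21.4], unit="mag", label="Observed magnitude")
cat = LabeledArray([20.0, 21.5], unit="mag", label="Catalog magnitude")

# {0.label} reads the .label attribute of the first positional argument.
axis_label = "{0.label} [{0.unit}]".format(obs)
diff_label = "{0.label} - {1.label} [{0.unit}]".format(cat, obs)
print(axis_label)
print(diff_label)
```

Because the label travels with the data, the plotting code never hard-codes axis text, and relabeling a dataset updates every plot built from it.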
In [42]:
# Histogram of zeropoint estimates from individual matched stars.
# We're not using the Histogram Bokeh chart for some extra control.
hist_counts, hist_edges = np.histogram(df['delta_mags'], bins=10)
h = figure(tools="xpan, xwheel_zoom, reset",
           active_scroll="xwheel_zoom",
           y_range=(0, hist_counts.max()+2),
           y_axis_label="Count",
           # delta_mags is catalog - observed, so label it in that order
           x_axis_label="{0.label} - {1.label} [{0.unit}]".format(
               m.extras['catalog_mags'], m.extras['obs_mags']),
           plot_width=350, plot_height=350)
# Draw histogram bars on the figure
h.quad(bottom=0,
       left=hist_edges[:-1],
       right=hist_edges[1:],
       top=hist_counts,
       color="lightblue",
       line_color="#3A5785")
# Line at zeropoint estimate
span = Span(location=m.extras['zp'].quantity.value,
            dimension='height', line_color="black",
            line_dash='dashed', line_width=3)
h.add_layout(span)
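The `quad` call pairs each bin’s left and right edges by slicing the edge array: `N` bins have `N + 1` edges, so `edges[:-1]` gives the left sides and `edges[1:]` the right sides. The sketch below shows that pairing with a small pure-Python equal-width histogram. It is a simplified stand-in for `np.histogram`, not a reimplementation of it, and the function name is hypothetical.

```python
def equal_width_histogram(values, bins):
    """Count values in equal-width bins; the last bin includes its right edge."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("values must span a nonzero range")
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for value in values:
        # Clamp so that value == hi falls in the final bin.
        index = min(int((value - lo) / width), bins - 1)
        counts[index] += 1
    return counts, edges


counts, edges = equal_width_histogram([0.0, 1.0, 2.0, 3.0, 4.0], bins=2)

# One (left, right, height) triple per bar, as passed to a quad glyph.
left, right = edges[:-1], edges[1:]
bars = list(zip(left, right, counts))
print(bars)
```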
In [43]:
# Plot side-by-side
show(row(p, h), notebook_handle=True)
Out[43]:
<Bokeh Notebook handle for In[43]>
The key to building useful plots is packing the right blob data with measurements to begin with. As you write your code, imagine what plots might usefully augment metric measurements.
Summary and outlook¶
This technical note has demonstrated the full usage cycle of lsst.verify:
- Defining metrics.
- Defining specifications of metrics.
- Measuring metrics.
- Associating extra datasets with measurements.
- Analyzing verification pipeline jobs, including building pass/fail reports and making plots.
We encourage Data Management engineers and scientists to consider how you might instrument your own code, particularly pipeline Tasks, with verification measurements. By systematically monitoring performance metrics in your code, you will gain a clearer picture of how code development is affecting your systems.
This technical note has only shown local usage patterns with the lsst.verify framework. We are integrating lsst.verify with the SQUASH dashboard application. With SQUASH, your metric measurements are centrally available to the whole organization. We believe that lsst.verify and SQUASH will become an everyday service for DM developers to ensure that code contributions do not introduce adverse performance side effects across the Stack.
References¶
Astropy Collaboration et al. (2013). Astropy: A community Python package for astronomy. A&A, 558, A33. doi:10.1051/0004-6361/201322068.
Fausti, Angelo (2016). SQUASH dashboard prototype. SQuaRE Technical Note SQR-009. https://sqr-009.lsst.io.
Ivezić, Željko, and The LSST Science Collaboration (2011). LSST Science Requirements Document. LPM-17. https://ls.st/LPM-17.
Parejko, John and Sick, Jonathan (2017). Validation Metrics Framework. SQuaRE Technical Note SQR-017. https://sqr-017.lsst.io.
Wood-Vasey, Michael (2016). Introducing validate_drp: Calculate SRD Key Performance Metrics for an output repository. Data Management Technical Note DMTN-008. https://dmtn-008.lsst.io.