Abstract¶
LSST Data Management’s verification program ensures that code meets
performance specifications during construction. During operations,
verification continues with an additional focus on ensuring that
data releases meet science requirements. To facilitate verification
activities, we are introducing the LSST Verification Framework. This
framework is implemented in the lsst.verify
Python package,
available at https://github.com/lsst/verify.
This technical note introduces the framework’s concepts and usage patterns through a working tutorial. First, this tutorial demonstrates how new metrics (observable concepts) and specifications (requirements and milestones that metric measurements should meet) are created. Then we measure metrics, using both a lightweight approach that is easy to retrofit into LSST Science Pipelines Tasks and a second more rigorous measurement approach that enables detailed diagnostics. Finally, this tutorial shows how metric measurements can be analyzed in a Jupyter notebook environment.
Set up¶
This technical note is available as a Jupyter notebook from its GitHub
repository: https://github.com/lsst-sqre/sqr-019. You are encouraged to
run and modify this notebook to help you learn about the lsst.verify
package. This section covers the dependencies needed to run this
notebook.
First, install the LSST Science Pipelines with
lsstsw. Specifically,
build and setup
the verify
package:
rebuild verify
setup verify
The verify package is not yet distributed with the LSST Science Pipelines Stack as of this writing. lsstsw is the most convenient means of installing verify from scratch.
Next, install the following packages:
pip install bokeh pandas
These additional packages are used by this technical note, but are not required by lsst.verify itself.
These are the Python imports needed for this technical note:
In [1]:
# Standard library and third party packages used by this notebook
import json
import os
from tempfile import TemporaryDirectory
import astropy.units as u
import numpy as np
import yaml
# For demonstration plots
from bokeh.io import output_notebook
from bokeh.plotting import figure, show
from bokeh.models import Range1d, Span
from bokeh.layouts import row
import pandas
# Load Bokeh
output_notebook()
In [2]:
# The Verification Framework itself
import lsst.verify
Introduction¶
lsst.verify is the new framework for making verification measurements in the LSST Science Pipelines. Verification is an activity where well-defined quantities, called metrics, are measured to ensure that LSST’s data and pipelines meet requirements, which we call specifications.
You might be familiar with
validate_drp
[DMTN-008]. That package currently
measures metrics of ProcessCcdTask
outputs and posts results to
SQUASH
[SQR-009]. By tracking metric
measurements we are able to understand trends in the algorithmic
performance of the LSST Science Pipelines, and ultimately verify that we
will meet our requirements.
With lsst.verify
we sought to generalize the process of defining
metrics, measuring those metrics, and tracking those measurements in
SQUASH. Rather than supporting only specially-designed verification
afterburner Tasks, our goal is to empower developers to track
performance metrics of their own specific pipeline Tasks. By defining
metrics relevant to specific Tasks, verification becomes a highly
relevant integration testing activity for day-to-day pipelines
development. The lsst.verify
design is described in
SQR-017.
This tutorial demonstrates key features and patterns in the
lsst.verify
framework, from defining metrics and specifications, to
making measurements, to analyzing and summarizing performance.
Defining metrics¶
Metrics are definitions of measurable things that you want to track. A measurable thing could be anything: the \(\chi^2\) of a fit, the number of sources identified and measured, or even the latency or memory usage of a function.
In the verification framework, all metrics are centrally defined in the
verify_metrics package. To
define a new metric, simply add or modify a YAML file in the
/metrics
directory of
verify_metrics. Each Stack
package that measures metrics has its own YAML definitions file
(jointcal.yaml
, validate_drp.yaml
, and so on).
SQUASH watches verify_metrics so that when a metric is committed to the GitHub repo it is also known to the SQUASH dashboard.
For this tutorial, we will create metrics for hypothetical demo1
and
demo2
packages.
First, content for a hypothetical /metrics/demo1.yaml
file:
In [3]:
demo1_metrics_yaml = """
ZeropointRMS:
  unit: mmag
  description: >
    Photometric calibration RMS.
  reference:
    url: https://example.com/PhotRMS
  tags:
    - photometry
    - demo
Completeness:
  unit: mag
  description: >
    Magnitude of the catalog's 50% completeness limit.
  reference:
    url: https://example.com/Complete
  tags:
    - photometry
    - demo
"""
This YAML defines two metrics: demo1.ZeropointRMS and demo1.Completeness. A metric consists of:
- A name. Names are prefixed by the name of the package that defines them.
- A description. This helps to document metrics, even if they are more thoroughly defined in other documentation (see the reference field).
- A unit. Units are astropy.units-compatible strings. Metrics of unitless quantities should use the dimensionless_unscaled unit, an empty string.
- References. References can be made to URLs, or even to document handles and page numbers. We may expand the reference field’s schema to accommodate formalized reference identifiers in the future.
- Tags. These help us group metrics together in reports.
For the purposes of this demo, we’ll parse this YAML object into a lsst.verify.MetricSet collection. Normally this doesn’t need to be done, since metrics should be pre-defined in verify_metrics, which is loaded automatically as we’ll see later.
In [5]:
with TemporaryDirectory() as temp_dir:
    demo1_metrics_path = os.path.join(temp_dir, 'demo1.yaml')
    with open(demo1_metrics_path, mode='w') as f:
        f.write(demo1_metrics_yaml)
    demo_metrics = lsst.verify.MetricSet.load_single_package(
        demo1_metrics_path)
demo_metrics
Out[5]:
Name | Description | Units | Reference | Tags |
---|---|---|---|---|
str18 | str50 | str15 | str28 | str16 |
demo1.Completeness | Magnitude of the catalog's 50% completeness limit. | $\mathrm{mag}$ | https://example.com/Complete | demo, photometry |
demo1.ZeropointRMS | Photometric calibration RMS. | $\mathrm{mmag}$ | https://example.com/PhotRMS | demo, photometry |
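When the metrics you need are already committed to verify_metrics, there is no need for the temporary file shown above: the whole package of metric definitions can be loaded directly. Here is a brief sketch, assuming verify_metrics is set up in your environment (validate_drp.PA1 is used only as an example of a metric shipped with that package):
# Load every metric defined in the verify_metrics package
all_metrics = lsst.verify.MetricSet.load_metrics_package()
# Look up an individual metric by its fully-qualified name
print(all_metrics['validate_drp.PA1'])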
Defining metric specifications¶
Specifications are tests of metric measurements. A specification can be thought of as a milestone; if a measurement passes a specification then data and code are working as expected.
Like metrics, specifications are usually defined centrally in the
verify_metrics repository.
Specifications for each package are defined in one or more YAML files in
the specs
subdirectory of verify_metrics
. See the validate_drp
directory for an
example.
Here is a typical specification written in YAML; in this case for the
demo1.ZeropointRMS
metric:
name: "minimum"
metric: "ZeropointRMS"
threshold:
operator: "<="
unit: "mmag"
value: 20.0
tags:
- "minimum"
The fully-qualified name for this specification is
demo1.ZeropointRMS.minimum
, following a
{package}.{metric}.{spec_name}
format. Specification names should be
unique, but otherwise can be anything. The Verification Framework does
not place special meaning on “minimum,” “design,” and “stretch”
specifications. Instead, we recommend that you use tags to designate
specifications with operational meaning.
The core of a specification is its test. The
demo1.ZeropointRMS.minimum
specification defines its test in the
threshold
YAML field. Here, a measurement passes the specification
if \(\mathrm{measurement} \leq 20.0~\mathrm{mmag}\).
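The same test can be expressed with the Python API that appears later in this tutorial. Here is a sketch of the pass/fail semantics; the check method call follows the lsst.verify.ThresholdSpecification API and should be treated as illustrative:
spec = lsst.verify.ThresholdSpecification(
    'demo1.ZeropointRMS.minimum', 20.0 * u.mmag, '<=')
print(spec.check(12.5 * u.mmag))  # True: 12.5 mmag <= 20.0 mmag
print(spec.check(0.025 * u.mag))  # False: 25 mmag exceeds the threshold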
We envision other types of specifications beyond thresholds (binary comparisons). Possibilities include ranges and tolerances.
Metadata queries: making specifications only act upon certain measurements¶
Often you’ll make measurements of a metric in many contexts: with different datasets, from different cameras, in different filters, and so on. A specification we define for one measurement context might not be relevant for other contexts. LPM-17, for example, does this frequently by defining different specifications for \(gri\) datasets than \(uzy\). To prevent false alerts, the Verification Framework allows you to define criteria for when a specification applies to a measurement.
Originally we intended to leverage the provenance of a pipeline execution. Provenance, in general, fully describes the environment of the pipeline run, the datasets that were processed and produced, and the pipeline configuration. We envisioned that specifications might query the provenance of a metric measurement to determine if the specification is applicable. While this is our long-term design intent, a comprehensive pipeline provenance framework does not yet exist.
To shim the provenance system’s functionality, the Verification Framework introduces a complementary concept called job metadata. Whereas provenance is passively gathered during pipeline execution, metadata is explicitly added by pipeline developers and operators. Metadata could be a task configuration, filter name, dataset name, or any state known during a Task’s execution.
For example, suppose that a specification only applies to CFHT/MegaCam
datasets in the \(r\)-band. This requirement is written into the
specification’s definition with a metadata_query
field:
name: "minimum_megacam_r"
metric: "ZeropointRMS"
threshold:
operator: "<="
unit: "mmag"
value: 20.0
tags:
- "minimum"
metadata_query:
camera: "megacam"
filter_name: "r"
If a job has metadata with matching camera
and filter_name
fields, the specification applies:
{
    'camera': 'megacam',
    'filter_name': 'r',
    'dataset_repo': 'https://github.com/lsst/ci_cfht.git'
}
On the other hand, if a job has metadata that is either missing fields, or has conflicting values, the specification does not apply:
{
    'filter_name': 'i',
    'dataset_repo': 'https://github.com/lsst/ci_cfht.git'
}
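The matching rule is simply that every key-value pair in metadata_query must also appear in the job metadata. Here is a stand-alone sketch of that logic; the Framework performs this check internally, and the helper function below is purely illustrative:
def metadata_query_matches(metadata_query, job_metadata):
    """Return True if every metadata_query term appears in the job metadata."""
    return all(job_metadata.get(key) == value
               for key, value in metadata_query.items())

metadata_query = {'camera': 'megacam', 'filter_name': 'r'}

# Matching metadata: the specification applies
print(metadata_query_matches(metadata_query, {
    'camera': 'megacam',
    'filter_name': 'r',
    'dataset_repo': 'https://github.com/lsst/ci_cfht.git'}))  # True

# Missing or conflicting fields: the specification does not apply
print(metadata_query_matches(metadata_query, {
    'filter_name': 'i',
    'dataset_repo': 'https://github.com/lsst/ci_cfht.git'}))  # False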
Specification inheritance¶
Metadata queries help us write specifications that monitor precisely the pipeline runs we are interested in, with test criteria that make sense. But this also means that we are potentially writing many more specifications for each metric. Most specifications for a given metric share common characteristics, such as units, threshold operators, the metric name, and even some base metadata query terms. To write specifications without repeating ourselves, we can take advantage of specification inheritance.
As an example, let’s write a basic specification in YAML for the
demo1.ZeropointRMS
metric, and write another specification that is
customized for CFHT/MegaCam \(r\)-band data:
In [6]:
zeropointrms_specs_yaml = """
---
name: "minimum"
metric: "ZeropointRMS"
threshold:
  operator: "<="
  unit: "mmag"
  value: 20.0
tags:
  - "minimum"
---
name: "minimum_megacam_r"
base: ["ZeropointRMS.minimum"]
threshold:
  value: 15.0
metadata_query:
  camera: "megacam"
  filter_name: "r"
"""
with TemporaryDirectory() as temp_dir:
    # Write YAML to disk, emulating the verify_metrics package for this demo
    specs_dirname = os.path.join(temp_dir, 'demo1')
    os.makedirs(specs_dirname)
    demo1_specs_path = os.path.join(specs_dirname, 'zeropointRMS.yaml')
    with open(demo1_specs_path, mode='w') as f:
        f.write(zeropointrms_specs_yaml)
    # Parse the YAML into a set of Specification objects
    demo1_specs = lsst.verify.SpecificationSet.load_single_package(
        specs_dirname)
demo1_specs
Out[6]:
Name | Test | Tags |
---|---|---|
str36 | str27 | str7 |
demo1.ZeropointRMS.minimum | $x$ <= 20.0 $\mathrm{mmag}$ | minimum |
demo1.ZeropointRMS.minimum_megacam_r | $x$ <= 15.0 $\mathrm{mmag}$ | minimum |
The demo1.ZeropointRMS.minimum_megacam_r specification indicates that it inherits from demo1.ZeropointRMS.minimum by referencing it in the base field.
With inheritance, demo1.ZeropointRMS.minimum_megacam_r
includes all
fields defined in its base, adds new fields, and overrides values.
Notice how the threshold has changed from 20.0 mmag, to 15.0 mmag.
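We can confirm that the inheritance resolved as described by pulling the derived specification out of the set we just parsed. A small sketch (the threshold and metadata_query attributes follow the lsst.verify specification API; the printed representation may differ slightly):
derived = demo1_specs['demo1.ZeropointRMS.minimum_megacam_r']
print(derived.threshold)       # 15.0 mmag, overriding the base's 20.0 mmag
print(derived.metadata_query)  # {'camera': 'megacam', 'filter_name': 'r'}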
Specification partials for even more composable specifications¶
Suppose we want to create specifications for many metrics that apply to the megacam camera. Specification inheritance alone doesn’t help, because we would still need to repeat the metadata query for each metric:
---
# Base specification: demo1.ZeropointRMS.minimum
name: "minimum"
metric: "ZeropointRMS"
threshold:
  operator: "<="
  unit: "mmag"
  value: 20.0
tags:
  - "minimum"
---
# Base specification: demo1.Completeness.minimum
name: "minimum"
metric: "Completeness"
threshold:
  operator: ">="
  unit: "mag"
  value: 20.0
tags:
  - "minimum"
---
# A demo1.ZeropointRMS specification targeting MegaCam r-band
name: "minimum_megacam_r"
base: ["ZeropointRMS.minimum"]
threshold:
  value: 15.0
metadata_query:
  camera: "megacam"
  filter_name: "r"
---
# A demo1.Completeness specification targeting MegaCam r-band
name: "minimum_megacam_r"
base: ["Completeness.minimum"]
threshold:
  value: 24.0
metadata_query:
  camera: "megacam"
  filter_name: "r"
To avoid duplicating metadata_query information for all MegaCam \(r\)-band specifications across many metrics, we can extract that information into a partial. Partials are formatted like specifications, but are never parsed as stand-alone specifications. That means a partial can, as the name implies, define common partial information that can be mixed into many specifications.
Here’s the same example as before, but written with a #megacam-r partial:
---
# Partial for MegaCam r-band specifications
id: "megacam-r"
metadata_query:
  camera: "megacam"
  filter_name: "r"
---
# Base specification: demo1.ZeropointRMS.minimum
name: "minimum"
metric: "ZeropointRMS"
threshold:
  operator: "<="
  unit: "mmag"
  value: 20.0
tags:
  - "minimum"
---
# Base specification: demo1.Completeness.minimum
name: "minimum"
metric: "Completeness"
threshold:
  operator: ">="
  unit: "mag"
  value: 20.0
tags:
  - "minimum"
---
# A demo1.ZeropointRMS specification targeting MegaCam r-band
name: "minimum_megacam_r"
base: ["ZeropointRMS.minimum", "#megacam-r"]
threshold:
  value: 15.0
---
# A demo1.Completeness specification targeting MegaCam r-band
name: "minimum_megacam_r"
base: ["Completeness.minimum", "#megacam-r"]
threshold:
  value: 24.0
As you can see, we’ve added the #megacam-r partial to the inheritance chain defined in the base fields. The demo1.ZeropointRMS.minimum_megacam_r and demo1.Completeness.minimum_megacam_r specifications each inherit from both a base specification and the #megacam-r partial. The # prefix indicates a partial rather than a specification. It’s also possible to reference partials in other YAML files; see the validate_drp specifications for an example.
Inheritance is evaluated left to right. For example, demo1.Completeness.minimum_megacam_r is built up in this order:
- Use the demo1.Completeness.minimum specification.
- Override with information from #megacam-r.
- Override with information from the demo1.Completeness.minimum_megacam_r specification’s own YAML fields.
Specifications: putting it all together¶
We’ve seen how to write metric specifications in YAML, and how to write them more efficiently with inheritance and partials. Now let’s write out a full specification set, like we might in verify_metrics:
In [7]:
demo1_specs_yaml = """
# Partials that define metadata queries
# for pipeline execution contexts with
# MegaCam r and u-band data, or HSC r-band.
---
id: "megacam-r"
metadata_query:
  camera: "megacam"
  filter_name: "r"
---
id: "megacam-u"
metadata_query:
  camera: "megacam"
  filter_name: "u"
---
id: "hsc-r"
metadata_query:
  camera: "hsc"
  filter_name: "r"
# We'll also write partials for each metric,
# that set up the basic test. Alternatively
# we could create full specifications to
# inherit from for each camera.
---
id: "ZeropointRMS"
metric: "demo1.ZeropointRMS"
threshold:
  operator: "<="
  unit: "mmag"
---
id: "Completeness"
metric: "demo1.Completeness"
threshold:
  operator: ">="
  unit: "mag"
# Partials to tag specifications as
# "minimum" requirements or "stretch
# goals"
---
id: "tag-minimum"
tags:
  - "minimum"
---
id: "tag-stretch"
tags:
  - "stretch"
# ZeropointRMS specifications
# tailored for each camera, in
# minimum and stretch goal variants.
---
name: "minimum_megacam_r"
base: ["#ZeropointRMS", "#megacam-r", "#tag-minimum"]
threshold:
  value: 15.0
---
name: "stretch_megacam_r"
base: ["#ZeropointRMS", "#megacam-r", "#tag-stretch"]
threshold:
  value: 10.0
---
name: "minimum_megacam_u"
base: ["#ZeropointRMS", "#megacam-u", "#tag-minimum"]
threshold:
  value: 30.0
---
name: "stretch_megacam_u"
base: ["#ZeropointRMS", "#megacam-u", "#tag-stretch"]
threshold:
  value: 20.0
---
name: "minimum_hsc_r"
base: ["#ZeropointRMS", "#hsc-r", "#tag-minimum"]
threshold:
  value: 12.0
---
name: "stretch_hsc_r"
base: ["#ZeropointRMS", "#hsc-r", "#tag-stretch"]
threshold:
  value: 6.0
# Completeness specifications,
# tailored for each camera in
# minimum and stretch goal variants.
---
name: "minimum_megacam_r"
base: ["#Completeness", "#megacam-r", "#tag-minimum"]
threshold:
  value: 24.0
---
name: "stretch_megacam_r"
base: ["#Completeness", "#megacam-r", "#tag-stretch"]
threshold:
  value: 26.0
---
name: "minimum_megacam_u"
base: ["#Completeness", "#megacam-u", "#tag-minimum"]
threshold:
  value: 20.0
---
name: "stretch_megacam_u"
base: ["#Completeness", "#megacam-u", "#tag-stretch"]
threshold:
  value: 24.0
---
name: "minimum_hsc_r"
base: ["#Completeness", "#hsc-r", "#tag-minimum"]
threshold:
  value: 20.0
---
name: "stretch_hsc_r"
base: ["#Completeness", "#hsc-r", "#tag-stretch"]
threshold:
  value: 28.0
"""
with TemporaryDirectory() as temp_dir:
    # Write YAML to disk, emulating the verify_metrics package for this demo
    specs_dirname = os.path.join(temp_dir, 'demo1')
    os.makedirs(specs_dirname)
    demo1_specs_path = os.path.join(specs_dirname, 'demo1.yaml')
    with open(demo1_specs_path, mode='w') as f:
        f.write(demo1_specs_yaml)
    # Parse the YAML into a set of Specification objects
    demo_specs = lsst.verify.SpecificationSet.load_single_package(
        specs_dirname)
demo_specs
Out[7]:
Name | Test | Tags |
---|---|---|
str36 | str27 | str7 |
demo1.Completeness.minimum_hsc_r | $x$ >= 20.0 $\mathrm{mag}$ | minimum |
demo1.Completeness.minimum_megacam_r | $x$ >= 24.0 $\mathrm{mag}$ | minimum |
demo1.Completeness.minimum_megacam_u | $x$ >= 20.0 $\mathrm{mag}$ | minimum |
demo1.Completeness.stretch_hsc_r | $x$ >= 28.0 $\mathrm{mag}$ | stretch |
demo1.Completeness.stretch_megacam_r | $x$ >= 26.0 $\mathrm{mag}$ | stretch |
demo1.Completeness.stretch_megacam_u | $x$ >= 24.0 $\mathrm{mag}$ | stretch |
demo1.ZeropointRMS.minimum_hsc_r | $x$ <= 12.0 $\mathrm{mmag}$ | minimum |
demo1.ZeropointRMS.minimum_megacam_r | $x$ <= 15.0 $\mathrm{mmag}$ | minimum |
demo1.ZeropointRMS.minimum_megacam_u | $x$ <= 30.0 $\mathrm{mmag}$ | minimum |
demo1.ZeropointRMS.stretch_hsc_r | $x$ <= 6.0 $\mathrm{mmag}$ | stretch |
demo1.ZeropointRMS.stretch_megacam_r | $x$ <= 10.0 $\mathrm{mmag}$ | stretch |
demo1.ZeropointRMS.stretch_megacam_u | $x$ <= 20.0 $\mathrm{mmag}$ | stretch |
More metrics and specifications for the demo2 package¶
All the metrics we’ve created have been associated with the hypothetical “demo1” pipeline package. Let’s quickly create another set of metrics and specifications for a “demo2” pipeline package, which we’ll use later. This is an opportunity to show that metrics and specifications can be created dynamically in Python too.
In [8]:
sourcecount_metric = lsst.verify.Metric(
    'demo2.SourceCount',
    "Number of matched sources.",
    unit=u.dimensionless_unscaled,
    tags=['demo'])
demo_metrics.insert(sourcecount_metric)
print(demo_metrics['demo2.SourceCount'])
demo2.SourceCount (dimensionless_unscaled): Number of matched sources.
Notice that demo2.SourceCount
is just a count; it doesn’t have
physical units. We designated this type of unit with Astropy’s
astropy.units.dimensionless_unscaled
unit. Its string form is an empty string:
In [9]:
u.dimensionless_unscaled == u.Unit('')
Out[9]:
True
Next, we’ll create complementary specifications:
In [10]:
sourcecount_minimum_spec = lsst.verify.ThresholdSpecification(
    'demo2.SourceCount.minimum_cfht_r',
    250 * u.dimensionless_unscaled,
    '>=',
    tags=['minimum'],
    metadata_query={
        'camera': 'megacam',
        'filter_name': 'r'
    })
demo_specs.insert(sourcecount_minimum_spec)
sourcecount_stretch_spec = lsst.verify.ThresholdSpecification(
    'demo2.SourceCount.stretch_cfht_r',
    500 * u.dimensionless_unscaled,
    '>=',
    tags=['stretch'],
    metadata_query={
        'camera': 'megacam',
        'filter_name': 'r'
    })
demo_specs.insert(sourcecount_stretch_spec)
That’s it. We now have a set of metrics and specifications defined for
two packages, demo1
and demo2
. Here are the metrics in full:
In [11]:
demo_metrics
Out[11]:
Name | Description | Units | Reference | Tags |
---|---|---|---|---|
str18 | str50 | str15 | str28 | str16 |
demo1.Completeness | Magnitude of the catalog's 50% completeness limit. | $\mathrm{mag}$ | https://example.com/Complete | demo, photometry |
demo1.ZeropointRMS | Photometric calibration RMS. | $\mathrm{mmag}$ | https://example.com/PhotRMS | demo, photometry |
demo2.SourceCount | Number of matched sources. | $\mathrm{}$ | demo |
And the specifications in full:
In [12]:
demo_specs
Out[12]:
Name | Test | Tags |
---|---|---|
str36 | str27 | str7 |
demo1.Completeness.minimum_hsc_r | $x$ >= 20.0 $\mathrm{mag}$ | minimum |
demo1.Completeness.minimum_megacam_r | $x$ >= 24.0 $\mathrm{mag}$ | minimum |
demo1.Completeness.minimum_megacam_u | $x$ >= 20.0 $\mathrm{mag}$ | minimum |
demo1.Completeness.stretch_hsc_r | $x$ >= 28.0 $\mathrm{mag}$ | stretch |
demo1.Completeness.stretch_megacam_r | $x$ >= 26.0 $\mathrm{mag}$ | stretch |
demo1.Completeness.stretch_megacam_u | $x$ >= 24.0 $\mathrm{mag}$ | stretch |
demo1.ZeropointRMS.minimum_hsc_r | $x$ <= 12.0 $\mathrm{mmag}$ | minimum |
demo1.ZeropointRMS.minimum_megacam_r | $x$ <= 15.0 $\mathrm{mmag}$ | minimum |
demo1.ZeropointRMS.minimum_megacam_u | $x$ <= 30.0 $\mathrm{mmag}$ | minimum |
demo1.ZeropointRMS.stretch_hsc_r | $x$ <= 6.0 $\mathrm{mmag}$ | stretch |
demo1.ZeropointRMS.stretch_megacam_r | $x$ <= 10.0 $\mathrm{mmag}$ | stretch |
demo1.ZeropointRMS.stretch_megacam_u | $x$ <= 20.0 $\mathrm{mmag}$ | stretch |
demo2.SourceCount.minimum_cfht_r | $x$ >= 250.0 $\mathrm{}$ | minimum |
demo2.SourceCount.stretch_cfht_r | $x$ >= 500.0 $\mathrm{}$ | stretch |
Of course, these examples are contrived for this tutorial. Normally metrics and specifications aren’t defined in notebooks or code, but with a pull request to the verify_metrics GitHub repository.
Making measurements¶
Now that we’ve defined metrics, we can measure them. Measurements happen in Pipelines code, either within regular Tasks, or in dedicated afterburner Tasks.
The Verification Framework provides two patterns for making
measurements: either using the full measurement API, or a more
lightweight capture of measurement quantities. For the demo1
package
we’ll use the more comprehensive approach, and then make lightweight
measurements for the demo2
package.
Measuring ZeropointRMS¶
In our Task, we might have arrays of observed magnitudes matched to catalog stars with known photometry:
In [13]:
catalog_mags = np.random.uniform(18, 26, size=100)*u.mag
obs_mags = catalog_mags - 25*u.mag + np.random.normal(scale=12.0, size=100)*u.mmag
From these the task might estimate a zeropoint:
In [14]:
zp = np.median(catalog_mags - obs_mags)
And a scatter:
In [15]:
zp_rms = np.std(catalog_mags - obs_mags)
zp_rms
is a measurement of the demo1.ZeropointRMS
metric that
we’d like to capture. Let’s create a lsst.verify.Measurement
object
to do that:
In [16]:
zp_meas = lsst.verify.Measurement('demo1.ZeropointRMS', zp_rms)
We’ve captured the measurement, but there’s more information that will be useful for later understanding the measurement. These additional data are called measurement extras:
In [17]:
zp_meas.extras['zp'] = lsst.verify.Datum(
    zp, label="m_0", description="Estimated zeropoint.")
zp_meas.extras['catalog_mags'] = lsst.verify.Datum(
    catalog_mags, label="m_cat", description="Catalog magnitudes.")
zp_meas.extras['obs_mags'] = lsst.verify.Datum(
    obs_mags, label="m_obs", description="Instrument magnitudes.")
The Datum objects act as wrappers for information, like Astropy quantities, that add plotting labels and descriptions to help document our datasets.
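A Datum’s wrapped quantity and its documentation fields can be read back through attributes that mirror the constructor arguments; we’ll use them later to build plot labels. For example:
print(zp_meas.extras['zp'].quantity)     # the wrapped Astropy quantity
print(zp_meas.extras['zp'].label)        # 'm_0'
print(zp_meas.extras['zp'].description)  # 'Estimated zeropoint.'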
In a Task, we might want to add annotations about the Task’s configuration. These annotations will be added to the metadata of the pipeline execution. For example, this is an annotation of the function used to estimate the RMS:
In [18]:
zp_meas.notes['estimator'] = 'numpy.std'
Measuring Completeness¶
Our task also measures photometric completeness. Let’s make another Measurement to record this metric measurement, along with extras:
In [19]:
# Here's a mock dataset
mag_grid = np.linspace(22, 28, num=50, endpoint=True)
c_percent = 1. / np.cosh((mag_grid - mag_grid.min()) / 2.) * 100.
# Make the measurement
completeness_mag = np.interp(50.0, c_percent[::-1], mag_grid[::-1]) * u.mag
# Package the measurement
completeness_meas = lsst.verify.Measurement(
    'demo1.Completeness',
    completeness_mag,
)
completeness_meas.extras['mag_grid'] = lsst.verify.Datum(
    mag_grid * u.mag, label="m", description="Magnitude")
completeness_meas.extras['c_frac'] = lsst.verify.Datum(
    c_percent * u.percent,
    label="C",
    description="Photometric catalog completeness.")
Packaging measurements in a Verification Job¶
In the Verification Framework, a “job” is a pipeline run that produces
metric measurements. The lsst.verify.Job
class allows us to package
several measurements from the pipeline run. With a Job
object, we
can then analyze the measurements, save verification datasets to disk,
and dispatch datasets to the SQUASH database.
Normally when we create a Job
object from scratch we seed it with
the metrics and specifications defined in the verify_metrics
repo:
In [20]:
job = lsst.verify.Job.load_metrics_package()
Of course, we created ad hoc metrics and specifications outside of
verify_metrics
. We can add those to the job
:
In [21]:
job.metrics.update(demo_metrics)
job.specs.update(demo_specs)
Now add the measurements:
In [22]:
job.measurements.insert(zp_meas)
job.measurements.insert(completeness_meas)
The pipeline Task that is making this Job knows about the camera and filter of the dataset. The Task code can record this metadata:
In [23]:
job.meta.update({'camera': 'megacam', 'filter_name': 'r'})
Job metadata is a dict
-like mapping. Here’s the full set of metadata
recorded for the job
:
In [24]:
print(job.meta)
{
    "camera": "megacam",
    "demo1.ZeropointRMS.estimator": "numpy.std",
    "filter_name": "r"
}
As expected, the camera and filter_name fields are present, but so is the estimator annotation that we attached to the demo1.ZeropointRMS measurement. Measurement annotations are automatically included in a Job’s metadata, but their keys are prefixed with the measurement’s metric name. Specification metadata_query definitions can act on both job-level and measurement-level metadata.
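Because job.meta is dict-like, those prefixed keys can be read back directly; for example:
print(job.meta['demo1.ZeropointRMS.estimator'])  # 'numpy.std'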
Before a Task exits, it should write the verification Job dataset to
disk. Serialization to disk is a temporary shim until Job
datasets
can be persisted through the Butler.
The native serialization format of the Verification Framework is JSON:
In [25]:
job.write('demo1.verify.json')
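The file written above is plain JSON, so it can be inspected with the standard library. A quick sketch (printing the top-level keys is just a way to peek at the structure; their exact names are not spelled out here):
with open('demo1.verify.json') as f:
    job_doc = json.load(f)
# Inspect the top-level structure of the serialized job dataset
print(sorted(job_doc.keys()))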
Making lightweight quantity-only measurements with output_quantities()¶
The lsst.verify.Measurement and lsst.verify.Job classes are necessary for producing rich job datasets (for example, for associating extras with measurements). Many Tasks, though, won’t need this functionality. A Task might simply record a measurement as an Astropy quantity and persist it with as little overhead as possible. The lsst.verify.output_quantities function enables this use case.
First, a Task will create a dictionary to collect measurements throughout the lifetime of the Task’s execution:
In [26]:
demo2_measurements = {}
Then the task measures the demo2.SourceCount
metric:
In [27]:
demo2_measurements['demo2.SourceCount'] = 350*u.dimensionless_unscaled
Measurements are always Astropy quantities.
Finally, before the Task returns, it can output measurements to disk.
The default filename format for the Verification job dataset file is
{package}.verify.json
.
In [28]:
lsst.verify.output_quantities('demo2', demo2_measurements)
Out[28]:
'demo2.verify.json'
Post processing verification jobs¶
Our hypothetical pipeline has produced measurements for two packages:
demo1
and demo2
. These measurements are persisted to
demo1.verify.json
and demo2.verify.json
files on disk. Now we’d
like to gather these measurements and either submit them to the SQUASH
dashboard, or collate the measurements for local analysis.
The dispatch_verify.py tool lets us do this. Before uploading measurements to SQUASH, let us see how to combine the measurements into a single JSON file.
In [29]:
%%bash
export DYLD_LIBRARY_PATH=$LSST_LIBRARY_PATH
dispatch_verify.py --test --ignore-lsstsw --write demo.verify.json demo1.verify.json demo2.verify.json
verify.bin.dispatchverify.main INFO: Loading demo1.verify.json
verify.bin.dispatchverify.main INFO: Loading demo2.verify.json
verify.bin.dispatchverify.main INFO: Merging verification Job JSON.
verify.bin.dispatchverify.main INFO: Refreshing metric definitions from verify_metrics
verify.bin.dispatchverify.main INFO: Writing Job JSON to demo.verify.json.
The flags used here are:
- --test: prevents dispatch_verify.py from attempting to upload to the SQUASH service.
- --ignore-lsstsw: since this notebook may not be run from an lsstsw-based installation, we avoid scraping it for information (such as Git commits and branches of packages included in the Pipeline stack).
- --write demo.verify.json: write the merged job dataset to demo.verify.json.
- demo1.verify.json and demo2.verify.json are inputs, given as positional arguments, pointing to the job JSON files that we created earlier with metric measurements.
See dispatch_verify.py --help
for more information.
Dispatching verification jobs to SQUASH¶
We’ll use a sandbox instance of
SQUASH specially deployed for
this tutorial. We’ll show how to register a new user and update the
SQUASH database with the metrics and specifications defined earlier.
Finally, we’ll upload a verification job to SQUASH with
dispatch_verify.py
.
The SQUASH RESTful API is used for managing the SQUASH metrics dashboard. In the sandbox instance it can be reached by the following URL:
In [30]:
squash_api_url = "https://squash-restful-api-sandbox.lsst.codes"
Here we create a new user in SQUASH. An authenticated user is required to make POST requests to the SQUASH RESTful API.
In [31]:
import getpass
username = getpass.getuser()
password = getpass.getpass(prompt='Password for user `{}`: '.format(username))
Password for user `afausti`: ········
In [32]:
import requests
credentials = {'username': username, 'password': password}
r = requests.post('{}/register'.format(squash_api_url), json=credentials)
r.json()
Out[32]:
{'message': 'User created successfully.'}
Uploading metrics definition and specifications¶
In practice, a change in the verify_metrics package would automatically trigger an update to SQUASH. However, the metrics and specifications defined in this tutorial for the demo1 and demo2 packages must be loaded into SQUASH manually. This can be done through the SQUASH RESTful API.
In [33]:
r = requests.post('{}/auth'.format(squash_api_url), json=credentials)
r.json()
Out[33]:
{'access_token': 'eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJleHAiOjE2MDI1MzU0OTAsImlhdCI6MTUyNDc3NTQ5MCwibmJmIjoxNTI0Nzc1NDkwLCJpZGVudGl0eSI6Mn0.cyS6ozuO8eCxjSHaCZplKqv2YAn52YtHca0PB9dAuI8'}
In [34]:
headers = {'Authorization': 'JWT {}'.format(r.json()['access_token'])}
In [35]:
r = requests.post(
    '{}/metrics'.format(squash_api_url),
    json={'metrics': demo_metrics.json},
    headers=headers)
r.json()
Out[35]:
{'message': 'List of metrics successfully created.'}
In [36]:
r = requests.post(
    '{}/specs'.format(squash_api_url),
    json={'specs': demo_specs.json},
    headers=headers)
r.json()
Out[36]:
{'message': 'List of metric specificationss successfully created.'}
Uploading verification jobs¶
Finally, let us upload the demo.verify.json
file to SQUASH so that
we can visualize the results in the metrics
dashboard.
In [37]:
%%bash -s "$squash_api_url" "$username" "$password"
export DYLD_LIBRARY_PATH=$LSST_LIBRARY_PATH
dispatch_verify.py --ignore-lsstsw --url $1 --user $2 --password $3 demo.verify.json
verify.bin.dispatchverify.main INFO: Loading demo.verify.json
verify.bin.dispatchverify.main INFO: Refreshing metric definitions from verify_metrics
verify.bin.dispatchverify.main INFO: Uploading Job JSON to https://squash-restful-api-sandbox.lsst.codes.
verify.squash.get INFO: GET https://squash-restful-api-sandbox.lsst.codes status: 200
verify.squash.post INFO: POST https://squash-restful-api-sandbox.lsst.codes/auth status: 200
verify.squash.post INFO: POST https://squash-restful-api-sandbox.lsst.codes/job status: 202
The flags used here are:
- --ignore-lsstsw: since this notebook may not be run from an lsstsw-based installation, we avoid scraping it for information (such as Git commits and branches of packages included in the Pipeline stack).
- --url: points to the SQUASH RESTful API URL.
- --user and --password: credentials for the SQUASH user.
- demo.verify.json is the job JSON file that we created earlier.
See dispatch_verify.py --help
for more information.
Analyze verification results locally¶
For code development, it’s convenient to look at the results of verification measurements locally, rather than in SQUASH. The Verification Framework is designed for this workflow, with special affordances for Jupyter Notebook users.
The collated measurement dataset produced by dispatch_verify.py
earlier is in the file demo.verify.json
. Let’s open this dataset
using the Job.deserialize
class method:
In [38]:
with open('demo.verify.json') as f:
    job = lsst.verify.Job.deserialize(**json.load(f))
Making reports¶
With a job dataset, we can make a report that summarizes the pass/fail
status of specifications that have a corresponding measurement. Reports,
lsst.verify.Report
instances, are thin wrappers around Astropy
Tables, and look great in Jupyter Notebooks:
In [39]:
job.report().show()
Out[39]:
Status | Specification | Measurement | Test | Metric Tags | Spec. Tags |
---|---|---|---|---|---|
✅ | demo1.Completeness.minimum_megacam_r | 24.6 $\mathrm{mag}$ | $x$ >= 24.0 $\mathrm{mag}$ | demo, photometry | minimum |
❌ | demo1.Completeness.stretch_megacam_r | 24.6 $\mathrm{mag}$ | $x$ >= 26.0 $\mathrm{mag}$ | demo, photometry | stretch |
✅ | demo1.ZeropointRMS.minimum_megacam_r | 12.2 $\mathrm{mmag}$ | $x$ <= 15.0 $\mathrm{mmag}$ | demo, photometry | minimum |
❌ | demo1.ZeropointRMS.stretch_megacam_r | 12.2 $\mathrm{mmag}$ | $x$ <= 10.0 $\mathrm{mmag}$ | demo, photometry | stretch |
✅ | demo2.SourceCount.minimum_cfht_r | 350.0 $\mathrm{}$ | $x$ >= 250.0 $\mathrm{}$ | demo | minimum |
❌ | demo2.SourceCount.stretch_cfht_r | 350.0 $\mathrm{}$ | $x$ >= 500.0 $\mathrm{}$ | demo | stretch |
Notice that the report only shows specification tests that are relevant to the measurements. Recall that the job metadata indicates these measurements are with CFHT/MegaCam in the \(r\)-band:
In [40]:
print(job.meta)
{
    "camera": "megacam",
    "demo1.ZeropointRMS.estimator": "numpy.std",
    "filter_name": "r",
    "packages": {}
}
Thus all the specifications having to do with HSC or the \(u\)-band aren’t tested because those tests are meaningless with the current measurements.
When there are many measurements and specifications, you might be more
interested in producing reports around specific topics. Such tailored
reports can be made by passing arguments to the Job.report
method.
For example, this is a report listing only demo1
package metrics:
In [41]:
job.report(name='demo1').show()
Out[41]:
Status | Specification | Measurement | Test | Metric Tags | Spec. Tags |
---|---|---|---|---|---|
✅ | demo1.Completeness.minimum_megacam_r | 24.6 $\mathrm{mag}$ | $x$ >= 24.0 $\mathrm{mag}$ | demo, photometry | minimum |
❌ | demo1.Completeness.stretch_megacam_r | 24.6 $\mathrm{mag}$ | $x$ >= 26.0 $\mathrm{mag}$ | demo, photometry | stretch |
✅ | demo1.ZeropointRMS.minimum_megacam_r | 12.2 $\mathrm{mmag}$ | $x$ <= 15.0 $\mathrm{mmag}$ | demo, photometry | minimum |
❌ | demo1.ZeropointRMS.stretch_megacam_r | 12.2 $\mathrm{mmag}$ | $x$ <= 10.0 $\mathrm{mmag}$ | demo, photometry | stretch |
And this report shows results for the demo1.ZeropointRMS
metrics:
In [42]:
job.report(name='demo1.ZeropointRMS').show()
Out[42]:
Status | Specification | Measurement | Test | Metric Tags | Spec. Tags |
---|---|---|---|---|---|
✅ | demo1.ZeropointRMS.minimum_megacam_r | 12.2 $\mathrm{mmag}$ | $x$ <= 15.0 $\mathrm{mmag}$ | demo, photometry | minimum |
❌ | demo1.ZeropointRMS.stretch_megacam_r | 12.2 $\mathrm{mmag}$ | $x$ <= 10.0 $\mathrm{mmag}$ | demo, photometry | stretch |
Recall that we added tags to the specifications to designate minimum and stretch goals, as seen in the demo1.ZeropointRMS.minimum_megacam_r specification:
In [43]:
job.specs['demo1.ZeropointRMS.minimum_megacam_r'].tags
Out[43]:
{'minimum'}
We can tailor the report to show tests only against these minimum
specifications:
In [44]:
job.report(spec_tags=['minimum']).show()
Out[44]:
Status | Specification | Measurement | Test | Metric Tags | Spec. Tags |
---|---|---|---|---|---|
✅ | demo1.Completeness.minimum_megacam_r | 24.6 $\mathrm{mag}$ | $x$ >= 24.0 $\mathrm{mag}$ | demo, photometry | minimum |
✅ | demo1.ZeropointRMS.minimum_megacam_r | 12.2 $\mathrm{mmag}$ | $x$ <= 15.0 $\mathrm{mmag}$ | demo, photometry | minimum |
✅ | demo2.SourceCount.minimum_cfht_r | 350.0 $\mathrm{}$ | $x$ >= 250.0 $\mathrm{}$ | demo | minimum |
Notice that the spec_tags
argument takes a sequence of tags. Each
tag is treated as an AND
filter with the others. For example, there
are no specifications that are both minimum
and stretch
, so the
report is empty:
In [45]:
job.report(spec_tags=['minimum', 'stretch']).show()
Out[45]:
Status | Specification | Measurement | Test | Metric Tags | Spec. Tags |
---|
In addition to specification tags, you can filter by metric tags by
setting the metric_tags
argument.
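For example, a report restricted to metrics tagged photometry would be requested like this (its output is not shown here):
job.report(metric_tags=['photometry']).show()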
Finally, these filters can be combined. For example, this report
summarizes specification tests for metrics from the demo1
package
against minimum
goals:
In [46]:
job.report(name='demo1', spec_tags=['minimum']).show()
Out[46]:
Status | Specification | Measurement | Test | Metric Tags | Spec. Tags |
---|---|---|---|---|---|
✅ | demo1.Completeness.minimum_megacam_r | 24.6 $\mathrm{mag}$ | $x$ >= 24.0 $\mathrm{mag}$ | demo, photometry | minimum |
✅ | demo1.ZeropointRMS.minimum_megacam_r | 12.2 $\mathrm{mmag}$ | $x$ <= 15.0 $\mathrm{mmag}$ | demo, photometry | minimum |
Data behind the measurements¶
Besides reports of specifications that were met or failed during a job, we’re also interested in the context of the measurements. What was the distribution of points? Where were sources on the detector? These questions cannot be answered by metrics, which are scalars by definition. But they might be answered by the blob datasets that accompany measurements.
Recall that during the demo1
measurements we added “extras,”
consisting of raw arrays of magnitudes, as well as the fitted zeropoint.
We can access these blob datasets and make plots for deeper
investigation.
First, we access the demo1.ZeropointRMS
metric measurement in the
job:
In [47]:
m = job.measurements['demo1.ZeropointRMS']
The extra data associated with the measurement are stored as key-value
items in the measurement’s extras
attribute:
In [48]:
list(m.extras.keys())
Out[48]:
['zp', 'catalog_mags', 'obs_mags']
For this tutorial we’ll use Bokeh to make interactive plots with these data. Often it’s easiest to pack the data into a Pandas DataFrame for plotting with Bokeh. We’ll make the DataFrame from the Astropy Quantity arrays, accessed from the quantity attribute of each item:
In [49]:
df = pandas.DataFrame({
    "obs_mags": m.extras['obs_mags'].quantity,
    "catalog_mags": m.extras['catalog_mags'].quantity,
    "delta_mags": m.extras['catalog_mags'].quantity - m.extras['obs_mags'].quantity
})
These items, obs_mags
and catalog_mags
, are
lsst.verify.Datum
instances. Datum
objects allow us to pack
information with data, such as plot labels. Here we’ll use that metadata
to build plot labels:
In [50]:
# Scatter plot of observed vs. catalog stellar photometry
p = figure(
    title="Zeropoint stellar sample",
    x_axis_label="{0.label} [{0.unit}]".format(m.extras['obs_mags']),
    y_axis_label="{0.label} [{0.unit}]".format(m.extras['catalog_mags']),
    plot_width=350,
    plot_height=350)
p.circle(df['obs_mags'], df['catalog_mags'], size=5)
# Histogram of zeropoint estimates from individual matched stars.
# We're not using the Histogram Bokeh chart for some extra control.
hist_counts, hist_edges = np.histogram(df['delta_mags'], bins=10)
h = figure(
    tools="xpan, xwheel_zoom, reset",
    active_scroll="xwheel_zoom",
    y_range=(0, hist_counts.max() + 2),
    y_axis_label="Count",
    # Label order matches delta_mags = catalog_mags - obs_mags
    x_axis_label="{0.label} - {1.label} [{0.unit}]".format(
        m.extras['catalog_mags'], m.extras['obs_mags']),
    plot_width=350,
    plot_height=350)
# Draw histogram edges on the figure
h.quad(
    bottom=0,
    left=hist_edges[:-1],
    right=hist_edges[1:],
    top=hist_counts,
    color="lightblue",
    line_color="#3A5785")
# Line at zeropoint estimate
span = Span(
    location=m.extras['zp'].quantity.value,
    dimension='height',
    line_color="black",
    line_dash='dashed',
    line_width=3)
h.add_layout(span)
In [51]:
# Plot side-by-side
show(row(p, h), notebook_handle=True)
Out[51]:
<Bokeh Notebook handle for In[51]>
The key to building useful plots is packing the right blob data with measurements to begin with. As you write your code, imagine what plots might usefully augment metric measurements.
Summary and outlook¶
This technical note has demonstrated the full usage cycle of
lsst.verify
:
- Defining metrics.
- Defining specifications of metrics.
- Measuring metrics.
- Associating extra datasets with measurements.
- Integration with the SQUASH dashboard application.
- Analyzing verification pipeline jobs, including building pass/fail reports and making plots.
We encourage Data Management engineers and scientists to consider how you might instrument your own code, particularly pipeline Tasks, with verification measurements. By systematically monitoring performance metrics in your code, you will gain a clearer picture of how code development is affecting your systems.
With SQUASH, your metric measurements are centrally available to the whole organization. We believe that lsst.verify and SQUASH will become an everyday service for DM developers to ensure that code contributions do not introduce adverse performance side-effects across the Stack.
References¶
Astropy Collaboration et al. (2013). Astropy: A community Python package for astronomy. A&A, 558, A33, doi:10.1051/0004-6361/201322068.
Fausti, Angelo (2016). SQUASH dashboard prototype. SQuaRE Technical Note SQR-009. https://sqr-009.lsst.io.
Ivezić, Željko, and The LSST Science Collaboration (2011). LSST Science Requirements Document. LPM-17. https://ls.st/LPM-17.
Parejko, John and Sick, Jonathan (2017). Validation Metrics Framework. SQuaRE Technical Note SQR-017. https://sqr-017.lsst.io.
Wood-Vasey, Michael (2016). Introducing validate_drp: Calculate SRD Key Performance Metrics for an output repository. Data Management Technical Note DMTN-008. https://dmtn-008.lsst.io.