Monitoring Trellis

Trellis is an event-driven system that tracks data and orchestrates workflows using metadata. The metadata is stored in a Neo4j graph database, where it activates database triggers that launch compute jobs to process new data as soon as it is available. The two easiest ways to monitor Trellis are (A) connecting to the Neo4j database and interacting with it through the Neo4j Browser, which comes with the Neo4j Desktop client that you can download here, and (B) checking the Monitoring dashboard or creating additional dashboards and widgets.

Neo4j Browser

The database credentials for each Neo4j instance are stored in Secret Manager, in the same project the database runs in, under the name “trellis-neo4j-credentials”. Follow the steps in Activity 1 to try interacting with the database through the Neo4j Browser (Stanford staff only).
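If you would rather fetch the credentials from a script than click through the console, something like the following might work. It is only a sketch: it assumes the google-cloud-secret-manager Python package is installed, that you are already authenticated, and that you want the latest version of the secret. The project ID in the usage note is a made-up placeholder, and the format of the secret payload is not specified here, so the text is returned unparsed.

```python
SECRET_ID = "trellis-neo4j-credentials"  # secret name given in the text above


def secret_version_name(project_id, version="latest"):
    """Build the Secret Manager resource name for the credentials secret."""
    return f"projects/{project_id}/secrets/{SECRET_ID}/versions/{version}"


def fetch_credentials_text(project_id):
    """Return the secret payload as text (assumes UTF-8; format not assumed)."""
    # Imported lazily so secret_version_name() works without the client library.
    from google.cloud import secretmanager

    client = secretmanager.SecretManagerServiceClient()
    response = client.access_secret_version(
        request={"name": secret_version_name(project_id)}
    )
    return response.payload.data.decode("utf-8")
```

For example, `fetch_credentials_text("my-trellis-project")` would read the secret from a hypothetical project named my-trellis-project. Avoid printing or saving the result anywhere it could leak.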

Also, Trellis behavior is governed by a configuration file called trellis-config.yaml in the Trellis bucket of each project it is deployed in. Try reading the object without taking it out of the cloud environment (because you should never take anything out of the cloud environment).
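One possible way to read the configuration object in place is sketched below. It assumes the google-cloud-storage Python package and that you run it from inside the cloud environment (for example Cloud Shell or a VM in the same project); run from a laptop, it would pull the content out of the cloud, which is exactly what you are trying to avoid. The bucket name is not given in this post, so it is passed in as a parameter.

```python
CONFIG_OBJECT = "trellis-config.yaml"  # object name given in the text above


def read_trellis_config(client, bucket_name, object_name=CONFIG_OBJECT):
    """Return the config object's content as text without writing a local file."""
    blob = client.bucket(bucket_name).blob(object_name)
    return blob.download_as_text()


def show_trellis_config(bucket_name):
    """Print the config; bucket_name is whatever the Trellis bucket is called."""
    # Imported lazily so read_trellis_config() can be exercised with a stub client.
    from google.cloud import storage

    print(read_trellis_config(storage.Client(), bucket_name))
```

The content is held in memory and printed to the terminal only; nothing is saved to disk.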

Activity 1

  • Download the Neo4j Desktop Client
  • Retrieve the Neo4j database credentials
  • Connect to the database from your local Neo4j client
  • Try running some queries to determine the state of MVP data (see below)
  • Bonus: Try creating a Cypher query to find any samples that were delivered in the last week, but haven’t finished variant calling. Use the browser to navigate the graph and identify the last completed step. (Hint: don’t start from scratch)

Trellis Monitoring Dashboard

The other way to keep an eye on Trellis and the systems that run it (compute instances, cloud functions, etc.) is through Monitoring dashboards. I’ve created one specifically for Trellis. If you are not Stanford staff, you can check out the slide decks below to get a better idea of what the dashboard looks like.

Activity 2

Using the Monitoring dashboard, answer these questions:

  • How long has the average Neo4j query taken to run in the last 7 days?
  • How long has the longest query taken?
  • How much MVP data are we storing?
  • Which storage buckets host the most data?

From the Archives

Because we are not currently running variant calling, it’s harder to get a feel for what the dashboard should look like, but you can look back at some of my old slide decks to get an idea of what normal and aberrant patterns look like.

Core Trellis repositories

The source for the Trellis application and its deployment, as well as supporting resources, is stored in these GitHub repositories.

Activity 3

Figure out which branch of the trellis-mvp-gatk repository is being deployed to production. Hint: check the Cloud Build configuration.

Cypher queries for checking the state of MVP data

1. How many sequencing samples have been delivered?

MATCH (n:PersonalisSequencing)
RETURN COUNT(DISTINCT n)

2. How many samples have variants called?

MATCH (n:PersonalisSequencing)<-[:WAS_USED_BY]-(:Sample)<-[:GENERATED]-(:Person)-[:HAS_BIOLOGICAL_OME]->(:Genome)-[:HAS_VARIANT_CALLS]->(v:Merged:Vcf)
RETURN COUNT(DISTINCT n)

3. How many samples have been delivered within the last 48 hours?

MATCH (n:PersonalisSequencing)
WHERE n.timeCreatedEpoch > (datetime().epochSeconds - 172800)
RETURN COUNT(n)

4. How many samples have started variant calling in the last 7 days?

MATCH (n:PersonalisSequencing)-[:GENERATED]->(:Fastq)-[:WAS_USED_BY]->(:Job:FastqToUbam)
WHERE n.timeCreatedEpoch > (datetime().epochSeconds - 604800)
RETURN COUNT(DISTINCT n)

5. How many samples have completed variant calling in the last 7 days?

WITH datetime().epochSeconds - 604800 AS sevenDayEpoch
MATCH (n:PersonalisSequencing)<-[]-(:Sample)<-[]-(:Person)-[:HAS_BIOLOGICAL_OME]->(:Genome)-[:HAS_VARIANT_CALLS]->(v:Merged:Vcf)
WHERE v.timeCreatedEpoch > sevenDayEpoch
RETURN COUNT(DISTINCT n)

6. How many Fastqs are not connected to a PersonalisSequencing node? (Error: broken relationship)

MATCH (n:Fastq)
WHERE NOT (:PersonalisSequencing)-[]->(n)
RETURN COUNT(n)

7. How many samples have finished variant calling and have all essential QC documents?

MATCH (n:PersonalisSequencing)<-[:WAS_USED_BY]-(:Sample)<-[:GENERATED]-(:Person)-[:HAS_BIOLOGICAL_OME]->(g:Genome)-[:HAS_VARIANT_CALLS]->(v:Merged:Vcf)
WHERE (g)-[:HAS_QC_DATA]->(:Flagstat)
AND (g)-[:HAS_QC_DATA]->(:Fastqc)
AND (g)-[:HAS_QC_DATA]->(:Vcfstats)
RETURN COUNT(DISTINCT n)

8. How many QC documents does every sample have?

MATCH (n:PersonalisSequencing)<-[:WAS_USED_BY]-(:Sample)<-[:GENERATED]-(:Person)-[:HAS_BIOLOGICAL_OME]->(g:Genome)-[:HAS_VARIANT_CALLS]->(v:Merged:Vcf)
WITH DISTINCT g
MATCH (g)-[:HAS_QC_DATA]->(qc)
WITH g, COLLECT(qc) AS qcs
RETURN SIZE(qcs) AS qcCount, COUNT(g) AS genomes
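
The literals in queries 3 to 5 are time windows in seconds: 172800 is 48 hours and 604800 is 7 days. If you run these checks from a script, it may be easier to compute the cutoff once and pass it in as a query parameter. The sketch below assumes the official neo4j Python driver and that timeCreatedEpoch is stored in seconds, as the comparisons against epochSeconds above suggest. The function names and the $cutoff parameter are my own illustrations, not part of Trellis.

```python
import time

SECONDS_PER_HOUR = 60 * 60

# Query 3 from above, with the time window supplied as a parameter.
DELIVERED_SINCE = (
    "MATCH (n:PersonalisSequencing) "
    "WHERE n.timeCreatedEpoch > $cutoff "
    "RETURN COUNT(n) AS delivered"
)


def window_seconds(days=0, hours=0):
    """Convert a window of days and hours into seconds."""
    return (days * 24 + hours) * SECONDS_PER_HOUR


def cutoff_epoch(days=0, hours=0, now=None):
    """Epoch second marking the start of the window ending at `now`."""
    now = int(time.time()) if now is None else int(now)
    return now - window_seconds(days=days, hours=hours)


def count_delivered_since(session, cutoff):
    """Run the parameterized query on an open neo4j session."""
    record = session.run(DELIVERED_SINCE, cutoff=cutoff).single()
    return record["delivered"]
```

With a session opened through `neo4j.GraphDatabase.driver(uri, auth=(user, password)).session()`, calling `count_delivered_since(session, cutoff_epoch(hours=48))` should answer the same question as query 3.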

Discuss

Join the discussion of this post on GitHub!