Monitoring Application
Introduction
The Monitoring app monitors the Public Catalog system as a whole.
It is a web application that displays a dashboard showing the status of the
different services and cron processes, with checks run across all the nodes.
It also records available memory, disk space, Apache requests and response
times, and displays graphs of these variables over time. To simplify
troubleshooting, it provides easy access to the numerous logs for the different
services across all the nodes. Finally, it provides links to other admin
panels. The home page is refreshed regularly (with an HTML meta tag).
Access
The Monitoring app is available at the path /monitoring, using the same
credentials as the Traefik and Solr admin panels. Access is controlled
by Traefik.
Re-deploying
If you ever need to re-deploy the stack, you can use the pc-deploy script.
Make sure you run it as the deploy user so that the proper Docker container registry credentials are passed.
UI Sections
Status
This section displays quick-glance status information for key services, jobs, and node data, including (not a complete list):
- Disk space
- VuFind
- Solr
- MariaDB
- FOLIO harvest
- Alphabetical browse update
- Backup jobs
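As an illustration of the kind of check behind an entry such as "Disk space", here is a minimal sketch using only the Python standard library. It is not the app's actual code: the function names, the returned dictionary and the 10% threshold are all assumptions made for the example.

```python
import shutil

# Hypothetical threshold: flag a filesystem when less than 10% of it is free.
MIN_FREE_RATIO = 0.10


def disk_status(total_bytes: int, free_bytes: int,
                min_free_ratio: float = MIN_FREE_RATIO) -> str:
    """Return "OK" or "LOW" depending on the share of free space."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return "OK" if free_bytes / total_bytes >= min_free_ratio else "LOW"


def check_path(path: str = "/") -> dict:
    """Measure disk usage for the filesystem holding `path` and classify it."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return {
        "path": path,
        "free_bytes": usage.free,
        "status": disk_status(usage.total, usage.free),
    }
```

Keeping the classification (`disk_status`) separate from the measurement (`check_path`) makes the threshold logic easy to test without touching a real filesystem.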
Graphs
Links to various charts with data over time, such as memory, disk usage and response time.
Logs
Links to the logs for each service and job. Each log page shows the logs from every node in the cluster; for example, you can see the Solr logs for each node.
Other admin apps
This section contains links to other outside services, like the Traefik dashboard and Solr's administrative interface.
JSON status for nodes
The monitoring app is running on each node. The status specific to each
node can be obtained with the path /monitoring/node/status. So for
instance, within a container using the Docker network, one can get
node 2's status with http://monitoring2/monitoring/node/status.
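The per-node endpoint can be queried for several nodes at once. The sketch below shows one possible way to do that with asyncio and aiohttp. It assumes the host pattern monitoringN generalizes from the node 2 example above and that the endpoint returns a JSON body; the function names, the 10-second timeout and the error format are hypothetical, not taken from the app.

```python
import asyncio

STATUS_PATH = "/monitoring/node/status"


def node_status_url(node: int) -> str:
    """Build the status URL for a node (assumes hosts are named monitoringN)."""
    return f"http://monitoring{node}{STATUS_PATH}"


async def aiohttp_fetch(url: str) -> dict:
    """Fetch one URL with aiohttp and decode the JSON body."""
    import aiohttp  # third-party; imported lazily so the helpers work without it

    timeout = aiohttp.ClientTimeout(total=10)  # hypothetical timeout
    async with aiohttp.ClientSession(timeout=timeout) as session:
        async with session.get(url) as response:
            response.raise_for_status()
            return await response.json()


async def all_node_statuses(nodes, fetch=aiohttp_fetch) -> dict:
    """Query every node concurrently; a failing node yields an error entry."""
    nodes = list(nodes)
    results = await asyncio.gather(
        *(fetch(node_status_url(n)) for n in nodes), return_exceptions=True
    )
    return {
        n: {"error": repr(r)} if isinstance(r, Exception) else r
        for n, r in zip(nodes, results)
    }
```

From inside a container on the Docker network, this could be called as `asyncio.run(all_node_statuses([1, 2, 3]))`, where the node numbers are placeholders for the actual cluster size. Passing `return_exceptions=True` to `asyncio.gather` means one unreachable node does not prevent the others from being reported.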
Implementation
Implementation is in Python with Flask, in Docker. The starting point
is simply python app/app.py. It uses a MariaDB database called
monitoring (using Galera like the other services).
Here is a summary of what top-level files/directories are for:
- static: CSS and JavaScript files
- templates: Jinja templates
- app.py: main file, including all the routes, and starting the scheduler
- collector.py: regular task saving the variables in a database; also collects Apache requests and response times by looking at the access log
- graphs.py: functions to create graphs
- home.py: prepares the home page template using functions in status.py
- logs.py: gathers and displays the logs; log files are read from the ${STACK_NAME}_logs Docker volume
- status.py: gathers all status information
- util.py: utilities (mainly to do async HTTP requests in parallel with asyncio and aiohttp)
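To make the collector.py description more concrete, here is a rough sketch of how request counts and response times might be derived from an Apache access log. The log format is an assumption: the source does not state it, so the example supposes the combined format with `%D` (time taken to serve the request, in microseconds) appended. The regular expression and the function name are hypothetical.

```python
import re
from statistics import mean

# Assumed LogFormat (not confirmed by the source):
#   %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" %D
LINE_RE = re.compile(
    r'^\S+ \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "[^"]*" (?P<micros>\d+)$'
)


def summarize(lines):
    """Return (request_count, mean_response_ms) for lines in the assumed format."""
    durations_ms = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:  # lines that do not fit the assumed format are skipped
            durations_ms.append(int(match.group("micros")) / 1000)
    if not durations_ms:
        return 0, None
    return len(durations_ms), mean(durations_ms)
```

A periodic task could call `summarize` on the lines written since the previous run and store the two values alongside the memory and disk figures.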
Pylint arguments
--disable=missing-module-docstring
--disable=missing-class-docstring
--disable=missing-function-docstring
--max-line-length=120
--good-names=i,j,k,x,y,ex