Monitoring Application
Introduction
The Monitoring app helps to monitor the Public Catalog system as a whole.
It is a web application displaying a dashboard of the status for the different services
and cron processes, doing checks across all the nodes. It also records available
memory, disk space, apache requests and response times to display graphs of these
variables over time. To make troubleshooting easy, it also provides easy access to
the numerous logs for the different services across all the nodes. Finally, it
provides links to other admin panels. The home page is refreshed regularly (with an html meta
).
Access
The Monitoring app is available with the path /monitoring
, using the same
credentials as the Traefik and Solr admin panels. Access is controlled with traefik.
UI Sections
Status
This section displays quick-glance status information for key services, jobs, and node data such as (not a complete list): * Disk space * VuFind * Solr * MariaDB * FOLIO harvest * Alphabetical browse update * Backup jobs
Graphs
Links to various charts with data over time, such as memory, disk usage and response time.
Logs
Links to the logs for each service and job, and each log page shows the logs for each of the nodes in the cluster. For example you can see the Solr logs for each node.
Other admin apps
This section contains links to other outside services, like the Traefik dashboard and Solr's administrative interface.
JSON status for nodes
The monitoring app is running on each node. The status specific to each node can be
obtained with the path /monitoring/node/status
. So for instance within a container
using the docker network, one can get node 2's status with http://monitoring2/monitoring/node/status
.
Implementation
Implementation is in Python with Flask, in Docker. The starting point is simply
python app/app.py
. It is using a mariadb database called monitoring
(using galera like the other services).
Here is a summary of what top-level files/directories are for:
- static
: CSS and Javascript files
- templates
: Jinja templates
- app.py
: main file, including all the routes, and starting the scheduler
- collector.py
: regular task saving the variables in a database; also collects apache requests and response times by looking at the access log.
- graphs.py
: functions to create graphs
- home.py
: prepares the home page template using functions in status.py
.
- logs.py
: gathers and displays the logs; log files are read from the ${STACK_NAME}_logs
docker volume.
- status.py
: gathers all status information
- util.py
: utilities (mainly to do async http requests in parallel with asyncio
and aiohttp
)
Pylint arguments
--disable=missing-module-docstring
--disable=missing-class-docstring
--disable=missing-function-docstring
--max-line-length=120
--good-names=i,j,k,x,y,ex