BlockPI HyperNode: setting up monitoring (with Grafana and Prometheus)
Good administration of a validator node on any blockchain means keeping uptime as high as possible, both to provide a reliable staking service and to help the chain run rock solid (or could we say block solid?).
Monitoring and alerting on your node's performance is therefore a must. The de facto standard tools for this are Prometheus for gathering metrics and Grafana for building useful dashboards and sending alerts. In this article we'll walk through setting them up for a BlockPI Hypernode (currently on testnet).
Installing Prometheus
We are installing it on Ubuntu Server running on a Raspberry Pi 4. We chose a Raspberry Pi 4 as the Prometheus/Grafana server that holds all the dashboards for our nodes because it is quite powerful, processor-wise, for such a low-energy-consuming device.
Let’s go with the commands to achieve the installation.
1.- Create user for Prometheus
sudo groupadd --system prometheus
sudo useradd -s /sbin/nologin --system -g prometheus prometheus
sudo mkdir /var/lib/prometheus
for i in rules rules.d files_sd; do sudo mkdir -p /etc/prometheus/${i}; done
2.- Update the system
sudo apt update
sudo apt -y install wget curl vim
mkdir -p /tmp/prometheus && cd /tmp/prometheus
3.- Download Prometheus for your server's OS architecture (ARM in our case, since we're on a Raspberry Pi 4)
# download for ARM
curl -s https://api.github.com/repos/prometheus/prometheus/releases/latest | grep browser_download_url | grep linux-armv7 | cut -d '"' -f 4 | wget -qi -
# download for AMD64
curl -s https://api.github.com/repos/prometheus/prometheus/releases/latest | grep browser_download_url | grep linux-amd64 | cut -d '"' -f 4 | wget -qi -
4.- Install the binary and create config file
tar xvf prometheus*.tar.gz
cd prometheus*/
sudo mv prometheus promtool /usr/local/bin/
prometheus --version
promtool --version
sudo mv prometheus.yml /etc/prometheus/prometheus.yml
5.- Create a service file for handling Prometheus execution
sudo tee /etc/systemd/system/prometheus.service<<EOF
[Unit]
Description=Prometheus
Documentation=https://prometheus.io/docs/introduction/overview/
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
User=prometheus
Group=prometheus
ExecReload=/bin/kill -HUP \$MAINPID
ExecStart=/usr/local/bin/prometheus \
--config.file=/etc/prometheus/prometheus.yml \
--storage.tsdb.path=/var/lib/prometheus \
--web.console.templates=/etc/prometheus/consoles \
--web.console.libraries=/etc/prometheus/console_libraries \
--web.listen-address=0.0.0.0:9090 \
--web.external-url=

SyslogIdentifier=prometheus
Restart=always

[Install]
WantedBy=multi-user.target
EOF
6.- Change directory permissions
for i in rules rules.d files_sd; do sudo chown -R prometheus:prometheus /etc/prometheus/${i}; done
for i in rules rules.d files_sd; do sudo chmod -R 775 /etc/prometheus/${i}; done
sudo chown -R prometheus:prometheus /var/lib/prometheus/
7.- Enable and start the service
sudo systemctl daemon-reload
sudo systemctl start prometheus
sudo systemctl enable prometheus
systemctl status prometheus
8.- Secure the service
sudo ufw allow 9090/tcp
sudo ufw enable
Check access by browsing to the server IP on port 9090; in our case http://192.168.1.60:9090/
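If you prefer a quick command-line check, Prometheus also exposes simple health and readiness endpoints; adjust the IP to your own server:
curl -s http://192.168.1.60:9090/-/healthy
curl -s http://192.168.1.60:9090/-/ready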
9.- Now that it’s running and accessible we configure it
vim /etc/prometheus/prometheus.yml
Set the scrape_configs section as follows:
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.

  # Default Prometheus job
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  # BlockPI node job (getting data from the BlockPI exporter)
  - job_name: "prometheus_blockpi"
    metrics_path: /node/metrics
    static_configs:
      - targets: ["82.223.55.221:8899"]
        labels:
          instance: "BlockPi"

  # BlockPI Node Exporter job (getting data from the Linux exporter)
  - job_name: "prometheus_blockpi_linux"
    static_configs:
      - targets: ["82.223.55.221:9100"]
        labels:
          instance: "BlockPi"
As you may have noticed, we've defined two jobs: one for the BlockPI hypernode metrics themselves (port 8899) and another for the usual Node Exporter, which we'll set up shortly, to get metrics from our server hardware and processes (port 9100).
After saving the config, we can validate it with the following command.
promtool check config /etc/prometheus/prometheus.yml
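If promtool reports the config as valid, apply it by reloading the service; since our unit file defines ExecReload as a SIGHUP, a plain reload is enough (a full restart works too):
sudo systemctl reload prometheus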
Installing Node Exporter for Linux monitoring
The Node Exporter is installed on the machine we want to monitor, so connect to the BlockPI hypernode server via SSH and run the commands of this section there.
1.- Set user
sudo whoami
sudo groupadd prometheus
sudo useradd --system -s /sbin/nologin -g prometheus prometheus
2.- Set firewall rules for both exporters (node exporter and blockpi hypernode)
sudo ufw allow 9100/tcp
sudo ufw allow 8899/tcp
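Assuming your hypernode is already running and its exporter is listening locally on port 8899 (the same /node/metrics path we used in the scrape config above), you can peek at it from the server itself to confirm it responds:
curl -s http://localhost:8899/node/metrics | head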
3.- Install Node Exporter binary
wget -q https://github.com/prometheus/node_exporter/releases/download/v1.3.1/node_exporter-1.3.1.linux-amd64.tar.gz
sudo tar --strip-components=1 -xf node_exporter-1.3.1.linux-amd64.tar.gz -C /usr/local/bin/
sudo chown -R root: /usr/local/bin/
4.- Create a service for handling its execution
sudo vim /etc/systemd/system/node_exporter.service
[Unit]
Description=Prometheus Node Exporter
Documentation=https://github.com/prometheus/node_exporter
Wants=network-online.target
After=network-online.target
[Service]
Type=simple
User=prometheus
Group=prometheus
ExecReload=/bin/kill -HUP $MAINPID
ExecStart=/usr/local/bin/node_exporter \
--collector.cpu \
--collector.diskstats \
--collector.filesystem \
--collector.loadavg \
--collector.meminfo \
--collector.filefd \
--collector.netdev \
--collector.stat \
--collector.netstat \
--collector.systemd \
--collector.uname \
--collector.vmstat \
--collector.time \
--collector.tcpstat \
--collector.hwmon \
--collector.arp \
--web.max-requests=40 \
--web.listen-address=0.0.0.0:9100 \
--web.telemetry-path=/metrics
SyslogIdentifier=prometheus
Restart=always
[Install]
WantedBy=multi-user.target
5.- After saving the service file, you should enable and start the service
sudo chown -R root: /etc/systemd/system/node_exporter.service
sudo chmod 0644 /etc/systemd/system/node_exporter.service
sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter
sudo systemctl status node_exporter
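Node Exporter serves its metrics over plain HTTP, so a quick local request is enough to confirm it is answering on the port we configured:
curl -s http://localhost:9100/metrics | head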
That’s it! Our Node Exporter should be running. If you need more metrics than the ones it provides, another interesting option is the Prometheus process exporter; its installation is quite similar to what we just did for the Node Exporter.
Since we already added the scrape jobs for both exporters to the Prometheus config in the previous section, our Prometheus server should now be gathering all the data from the BlockPI hypernode server.
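You can also ask Prometheus itself which targets it is scraping through its HTTP API; adjust the IP to your own Prometheus server. Each entry in the response reports the target's health as up or down:
curl -s http://192.168.1.60:9090/api/v1/targets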
Checking Prometheus Installation
To check that the installation has gone right, we can visit our Prometheus server's targets page in the browser. In our case, we visit:
http://192.168.1.60:9090/targets#pool-blockpi-node
We can see our exporters (the BlockPI and Linux Node exporters) are up and running.
If we click on the blockpi-node endpoint we'll see the metrics exposed by the BlockPI hypernode exporter. You can start thinking about which ones are interesting for your Grafana dashboard.
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 2.5908e-05
go_gc_duration_seconds{quantile="0.25"} 3.6188e-05
go_gc_duration_seconds{quantile="0.5"} 4.1278e-05
go_gc_duration_seconds{quantile="0.75"} 5.1296e-05
go_gc_duration_seconds{quantile="1"} 0.000214884
go_gc_duration_seconds_sum 0.002598749
go_gc_duration_seconds_count 56
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 165
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.18"} 1
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 1.4405408e+07
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 5.05168752e+08
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 1.4972e+06
# HELP go_memstats_frees_total Total number of frees.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 1.762769e+06
# HELP go_memstats_gc_cpu_fraction The fraction of this program's available CPU time used by the GC since the program started.
# TYPE go_memstats_gc_cpu_fraction gauge
go_memstats_gc_cpu_fraction 1.7660912430073086e-06
# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
# TYPE go_memstats_gc_sys_bytes gauge
go_memstats_gc_sys_bytes 1.703064e+07
# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.
# TYPE go_memstats_heap_alloc_bytes gauge
go_memstats_heap_alloc_bytes 1.4405408e+07
# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
# TYPE go_memstats_heap_idle_bytes gauge
go_memstats_heap_idle_bytes 3.19455232e+08
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 1.7629184e+07
# HELP go_memstats_heap_objects Number of allocated objects.
# TYPE go_memstats_heap_objects gauge
go_memstats_heap_objects 42071
# HELP go_memstats_heap_released_bytes Number of heap bytes released to OS.
# TYPE go_memstats_heap_released_bytes gauge
go_memstats_heap_released_bytes 3.1735808e+08
# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.
# TYPE go_memstats_heap_sys_bytes gauge
go_memstats_heap_sys_bytes 3.37084416e+08
# HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
# TYPE go_memstats_last_gc_time_seconds gauge
go_memstats_last_gc_time_seconds 1.6563351532684104e+09
# HELP go_memstats_lookups_total Total number of pointer lookups.
# TYPE go_memstats_lookups_total counter
go_memstats_lookups_total 0
# HELP go_memstats_mallocs_total Total number of mallocs.
# TYPE go_memstats_mallocs_total counter
go_memstats_mallocs_total 1.80484e+06
# HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures.
# TYPE go_memstats_mcache_inuse_bytes gauge
go_memstats_mcache_inuse_bytes 38400
# HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.
# TYPE go_memstats_mcache_sys_bytes gauge
go_memstats_mcache_sys_bytes 46800
# HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.
# TYPE go_memstats_mspan_inuse_bytes gauge
go_memstats_mspan_inuse_bytes 454648
# HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.
# TYPE go_memstats_mspan_sys_bytes gauge
go_memstats_mspan_sys_bytes 864960
# HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.
# TYPE go_memstats_next_gc_bytes gauge
go_memstats_next_gc_bytes 2.6079824e+07
# HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations.
# TYPE go_memstats_other_sys_bytes gauge
go_memstats_other_sys_bytes 3.989608e+06
# HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.
# TYPE go_memstats_stack_inuse_bytes gauge
go_memstats_stack_inuse_bytes 2.654208e+06
# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.
# TYPE go_memstats_stack_sys_bytes gauge
go_memstats_stack_sys_bytes 2.654208e+06
# HELP go_memstats_sys_bytes Number of bytes obtained from system.
# TYPE go_memstats_sys_bytes gauge
go_memstats_sys_bytes 3.63167832e+08
# HELP go_threads Number of OS threads created.
# TYPE go_threads gauge
go_threads 36
# HELP grpc_server_handled_total Total number of RPCs completed on the server, regardless of success or failure.
# TYPE grpc_server_handled_total counter
grpc_server_handled_total{grpc_code="Aborted",grpc_method="ServerReflectionInfo",grpc_service="grpc.reflection.v1alpha.ServerReflection",grpc_type="bidi_stream"} 0
grpc_server_handled_total{grpc_code="AlreadyExists",grpc_method="ServerReflectionInfo",grpc_service="grpc.reflection.v1alpha.ServerReflection",grpc_type="bidi_stream"} 0
grpc_server_handled_total{grpc_code="Canceled",grpc_method="ServerReflectionInfo",grpc_service="grpc.reflection.v1alpha.ServerReflection",grpc_type="bidi_stream"} 0
grpc_server_handled_total{grpc_code="DataLoss",grpc_method="ServerReflectionInfo",grpc_service="grpc.reflection.v1alpha.ServerReflection",grpc_type="bidi_stream"} 0
grpc_server_handled_total{grpc_code="DeadlineExceeded",grpc_method="ServerReflectionInfo",grpc_service="grpc.reflection.v1alpha.ServerReflection",grpc_type="bidi_stream"} 0
grpc_server_handled_total{grpc_code="FailedPrecondition",grpc_method="ServerReflectionInfo",grpc_service="grpc.reflection.v1alpha.ServerReflection",grpc_type="bidi_stream"} 0
grpc_server_handled_total{grpc_code="Internal",grpc_method="ServerReflectionInfo",grpc_service="grpc.reflection.v1alpha.ServerReflection",grpc_type="bidi_stream"} 0
grpc_server_handled_total{grpc_code="InvalidArgument",grpc_method="ServerReflectionInfo",grpc_service="grpc.reflection.v1alpha.ServerReflection",grpc_type="bidi_stream"} 0
grpc_server_handled_total{grpc_code="NotFound",grpc_method="ServerReflectionInfo",grpc_service="grpc.reflection.v1alpha.ServerReflection",grpc_type="bidi_stream"} 0
grpc_server_handled_total{grpc_code="OK",grpc_method="Ping",grpc_service="x.blockpi.RelayService",grpc_type="unary"} 7063
grpc_server_handled_total{grpc_code="OK",grpc_method="Relay",grpc_service="x.blockpi.RelayService",grpc_type="unary"} 672
grpc_server_handled_total{grpc_code="OK",grpc_method="ServerReflectionInfo",grpc_service="grpc.reflection.v1alpha.ServerReflection",grpc_type="bidi_stream"} 0
grpc_server_handled_total{grpc_code="OutOfRange",grpc_method="ServerReflectionInfo",grpc_service="grpc.reflection.v1alpha.ServerReflection",grpc_type="bidi_stream"} 0
grpc_server_handled_total{grpc_code="PermissionDenied",grpc_method="ServerReflectionInfo",grpc_service="grpc.reflection.v1alpha.ServerReflection",grpc_type="bidi_stream"} 0
grpc_server_handled_total{grpc_code="ResourceExhausted",grpc_method="ServerReflectionInfo",grpc_service="grpc.reflection.v1alpha.ServerReflection",grpc_type="bidi_stream"} 0
grpc_server_handled_total{grpc_code="Unauthenticated",grpc_method="ServerReflectionInfo",grpc_service="grpc.reflection.v1alpha.ServerReflection",grpc_type="bidi_stream"} 0
grpc_server_handled_total{grpc_code="Unavailable",grpc_method="ServerReflectionInfo",grpc_service="grpc.reflection.v1alpha.ServerReflection",grpc_type="bidi_stream"} 0
grpc_server_handled_total{grpc_code="Unimplemented",grpc_method="ServerReflectionInfo",grpc_service="grpc.reflection.v1alpha.ServerReflection",grpc_type="bidi_stream"} 0
grpc_server_handled_total{grpc_code="Unknown",grpc_method="ServerReflectionInfo",grpc_service="grpc.reflection.v1alpha.ServerReflection",grpc_type="bidi_stream"} 0
# HELP grpc_server_msg_received_total Total number of RPC stream messages received on the server.
# TYPE grpc_server_msg_received_total counter
grpc_server_msg_received_total{grpc_method="Ping",grpc_service="x.blockpi.RelayService",grpc_type="unary"} 7063
grpc_server_msg_received_total{grpc_method="Relay",grpc_service="x.blockpi.RelayService",grpc_type="unary"} 672
grpc_server_msg_received_total{grpc_method="ServerReflectionInfo",grpc_service="grpc.reflection.v1alpha.ServerReflection",grpc_type="bidi_stream"} 0
# HELP grpc_server_msg_sent_total Total number of gRPC stream messages sent by the server.
# TYPE grpc_server_msg_sent_total counter
grpc_server_msg_sent_total{grpc_method="Ping",grpc_service="x.blockpi.RelayService",grpc_type="unary"} 7063
grpc_server_msg_sent_total{grpc_method="Relay",grpc_service="x.blockpi.RelayService",grpc_type="unary"} 672
grpc_server_msg_sent_total{grpc_method="ServerReflectionInfo",grpc_service="grpc.reflection.v1alpha.ServerReflection",grpc_type="bidi_stream"} 0
# HELP grpc_server_started_total Total number of RPCs started on the server.
# TYPE grpc_server_started_total counter
grpc_server_started_total{grpc_method="Ping",grpc_service="x.blockpi.RelayService",grpc_type="unary"} 7063
grpc_server_started_total{grpc_method="Relay",grpc_service="x.blockpi.RelayService",grpc_type="unary"} 672
grpc_server_started_total{grpc_method="ServerReflectionInfo",grpc_service="grpc.reflection.v1alpha.ServerReflection",grpc_type="bidi_stream"} 0
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 10.55
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 65536
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 52
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 6.0424192e+07
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.65632879138e+09
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 3.673358336e+09
# HELP process_virtual_memory_max_bytes Maximum amount of virtual memory available in bytes.
# TYPE process_virtual_memory_max_bytes gauge
process_virtual_memory_max_bytes 1.8446744073709552e+19
# HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
# TYPE promhttp_metric_handler_requests_in_flight gauge
promhttp_metric_handler_requests_in_flight 1
# HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
# TYPE promhttp_metric_handler_requests_total counter
promhttp_metric_handler_requests_total{code="200"} 333
promhttp_metric_handler_requests_total{code="500"} 0
promhttp_metric_handler_requests_total{code="503"} 0
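To give an idea, here are a couple of PromQL expressions that make a reasonable starting point for panels. The first uses the grpc_server_handled_total counter shown above; the second assumes the standard node_cpu_seconds_total metric exposed by the Node Exporter (tweak the label selectors to your own job and instance names):
# Relay requests handled per second by the hypernode
rate(grpc_server_handled_total{grpc_method="Relay"}[5m])
# CPU usage (%) of the hypernode server, from the Node Exporter job
100 - avg(rate(node_cpu_seconds_total{instance="BlockPi", mode="idle"}[5m])) * 100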
Grafana Installation and Setup
Grafana is a tool that connects to many types of external data sources (Prometheus included), fetches data from them and builds powerful dashboards that display our metrics in a useful way (it's far nicer to watch charts and graphs than to read cryptic machine logs).
1.- Install Grafana server
For the installation of Grafana we'll refer you to the official guides, to avoid reinventing the wheel and making this article huge.
Once installed we'll be able to connect to the server and log in. In our case we visit the following URL in the browser: http://192.168.1.60:3000/login
By the way, we are installing our Grafana server on the same Raspberry Pi 4 we used for our Prometheus server.
2.- Configure the Prometheus data source
For that we'll need to click on the configuration gear icon in the left navigation bar, then click on "Add data source".
In the following step select the Prometheus data source.
On the following page we need to set a name for our data source and the URL of our Prometheus server; as we installed it on the same machine, it's reachable at http://localhost:9090
We can explore and play around with the many options it offers, but we'll leave that to you. Once configured, the new data source should appear on the data sources configuration page.
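As an aside, if you prefer keeping configuration in files rather than clicking through the UI, Grafana can also pick the data source up from a provisioning file. A minimal sketch, assuming a default package install that reads provisioning files from /etc/grafana/provisioning and runs as the grafana-server service:
sudo tee /etc/grafana/provisioning/datasources/prometheus.yml<<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
EOF
sudo systemctl restart grafana-server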
3.- Create a Dashboard
Now that our Grafana server has access to the metrics from the BlockPI hypernode and Node exporters, we can build a dashboard to our needs. There are countless ways to do this, but the fastest is to reuse one of your existing dashboards or search the dashboards already published (there are tons of them).
For this tutorial we're sharing our own dashboard, which you can download as a JSON file from our GitLab repo.
Once downloaded, import it into your Grafana site by clicking the import option under the dashboards icon in the left sidebar menu. Then add the JSON file either with the "Upload JSON file" button or by copy-pasting the content of the JSON file into the "Import via panel json" text area.
4.- Access your BlockPI hypernode dashboard
Now you can open your dashboard, watch all those fancy BlockPI node metrics, play around with it a bit, and tweak it to your needs.
And that's it for today's tutorial; we hope it helps you set up monitoring for your BlockPI Hypernode. Stay tuned to the BlockPI project, because it's a very interesting and strong one in the crypto space. RPCs are key blockchain infrastructure for developing distributed web3 dApps, so we expect a bright future for BlockPI as a provider of multi-chain distributed RPCs (for now these networks are going to be supported: Near, Polkadot, Polygon, Solana, Kusama, Flow, Heco, KCC, and more to come soon).
In the meantime, check out their website and see what their team is up to.