Netdata tutorial for Production Web Server (Nginx, PHP-fpm, MariaDB)
A while back I discovered Netdata and I was truly fascinated by its take on system monitoring, for two main reasons. It was extremely lightweight, something of great usefulness to me, an IoT enthusiast, as it proved to be a very useful tool to debug IoT devices such as the Raspberry Pi.
On a Raspberry Pi Zero W, it consumes as little as 10% of the CPU, without any optimization whatsoever and with an update frequency of 1 second. Crazy, huh?
Another aspect that I enjoyed as a newcomer to the world of DevOps was the intrinsic easiness of both the installation and the customization of the platform. It's truly a fire-and-forget application, configured and programmed in such a way that the default settings will be adequate for a large number of users, myself included.
This fascination, and the fact that I had just finished my Integrated Master's diploma (thus looking to break into the startup world), led me to contribute a bit of docs and code, as a good open source citizen.
Wanting to dig further, I was extremely fortunate to be introduced to a sys-admin at the University of Patras, named Spiros. Spiros and I installed and configured Netdata on a production server for the University of Patras. This post is an attempt to distill what we learned.
This post describes how to install and configure Netdata on a production server running a typical PHP web stack (Nginx, PHP-FPM, MariaDB).
It was originally written back in January, but it is published now for the first time.
Introduction
This post is intended to onboard a production web server to the Netdata platform, introducing a new era of metrics and performance monitoring. Netdata is built around 4 main principles:
- Per second data collection for all metrics.
- Collection & Visualization of metrics from all possible sources
- Meaningful presentation of metrics, optimised for visual anomaly detection
- Pre-configured, fire (install) and forget!
Netdata decentralizes monitoring completely. Each Netdata node is autonomous. It collects metrics locally, it stores them locally, it runs checks against them to trigger alarms locally, and it provides an API for the dashboards to visualize them. This is how Netdata scales horizontally.
So what does Netdata do?
- Metrics are auto-detected, so for 99% of the cases data collection works out of the box.
- Metrics are converted to human readable units, right after data collection, before storing them into the database.
- Metrics are structured, organized in charts, families and applications, so that they can be browsed.
- Dashboards are automatically generated, so all metrics are available for exploration immediately after installation.
- Dashboards are not just visualizing metrics; they are a tool, optimized for visual anomaly detection.
- Hundreds of pre-configured alarm templates are automatically attached to collected metrics.
How netdata collectors work:
Netdata uses internal and external plugins to collect data. Internal plugins run within the Netdata daemon, while external plugins are independent processes that send metrics to Netdata over pipes. There are also plugin orchestrators, which are external plugins with one or more data collection modules.
Requirements for the use-case
- History:
- Period: 1 year
- Granularity: 5 min (average)
- Data points: cpu, net traffic, ram (system-wide, per application)
- System: Ubuntu 18.04.3 LTS
- VM: Yes
- Applications:
- Nginx
- PHP-FPM
- Drupal
- MariaDB (MySQL, InnoDB style)
- Alarms:
- Critical: downtime (uptime monitor)
- Normal: uptime monitor, unusual scenarios (e.g. 70% CPU, 99% filesystem I/O)
- Medium: email
- Setup:
- Local registry
- 2 Linux web-server VMs: one for upatras.gr, one for various minor sites
- situated in the same secure network, so no need for VPN or authentication
Installation
In this section we go through the installation of the Netdata platform and all the required libraries, packages and third-party programs that will enable monitoring according to the requirements.
This installation will be replicated on both machines that host a web server; one will work as a slave that forwards metrics to the main machine, where all the processing and storage will be conducted. This is explained in the Streaming section.
Netdata
There are multiple ways to install Netdata, but the user is advised to use the one-line script provided by Netdata. According to the requirements, this is the optimal route.
With the one-liner below, you install Netdata and all its dependencies. You also agree that Netdata will aggregate anonymous statistics (such as OS version), which are necessary for rapid product development and discovery (prioritize OS binaries, decide features, etc.).
bash <(curl -Ss https://my-netdata.io/kickstart.sh) --no-updates --stable-channel
Comments:
--no-updates: prevents automatic updates of any kind.
--stable-channel: updates only on the release of new stable versions (thus installs the latest stable release).
To inspect the installation script, the user can download it from here.
Now that Netdata is (hopefully) installed, please head over to http://localhost:19999 to verify correct installation.
Firewall issues
If you can't connect, it's possible that you have a firewall installed and that you have to open port 19999. Below you can find relevant information and commands for Ubuntu & CentOS.
Ubuntu:
sudo ufw allow 19999/tcp
CentOS:
firewall-cmd --permanent --add-port 19999/tcp
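Note that --permanent only persists the rule across reboots; assuming firewalld is the active firewall on CentOS, the rule also needs to be applied to the running configuration:
firewall-cmd --reload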
Don't forget to enable the netdata service using sudo systemctl enable netdata
All Installation methods: https://docs.netdata.cloud/packaging/installer/
Installation using Package Manager
In a production environment, it is a good practice to use the Package Manager to install Netdata. Please do so by using the Package Manager of the distribution.
Before Installing Netdata from the Package Manager, we need to add the Netdata Repository from PackageCloud into the system package sources.
Run the following code to add the repository into your system's package sources:
curl -s https://packagecloud.io/install/repositories/netdata/netdata/script.deb.sh | sudo bash
Now we can install netdata (e.g. using apt-get in Ubuntu):
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install netdata
Nginx Proxy - Security Alert
For increased security, it is advised that you configure Netdata to be only accessible from localhost and proxy the connection through an NginX Reverse Proxy server. Please follow the guide to set up the reverse proxy.
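For orientation only, a minimal sketch of such a reverse-proxy server block is shown below; the server name netdata.example.com is an assumption, and Netdata itself should additionally be bound to 127.0.0.1 in netdata.conf as the linked guide describes:
# sketch of an Nginx reverse proxy in front of Netdata (assumed vhost name)
server {
    listen 80;
    server_name netdata.example.com;   # assumption: replace with your host
    location / {
        proxy_pass http://127.0.0.1:19999;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}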
Necessary Libraries
As per the production web-server scenario requirements, we need to install some external libraries so that the plugins that enable data collection (collectors) can be used by Netdata.
Although netdata supports plugins from 4 different ecosystems (Bash, Python, NodeJS, Golang), we will be using Python Plugins as they cover our needs and are the most battle-tested.
News bullet: Netdata is gradually porting all the Python modules to Golang, as Golang offers better performance (which is critical to minimise Netdata's footprint). Although Python plugins will still function as expected, after the transition new versions of the plugins (e.g. with more metrics) will likely appear as Go plugins.
Update/Install either Python 2.7+ or Python 3.1+
To check the currently installed Python version:
#python 2
python -V
#python 3
python3 -V
Important Note on Python Version
Currently, Netdata uses the system's default Python version. Thus, for most systems (e.g. Ubuntu), Netdata will use Python 2.7. If Python 2.7 is the default for your system, please install all the libraries for Python 2.7.
All systems have a certain Python version already pre-installed. It is advised to use that one for the rest of the tutorial and not install a new one.
Install pip (Python Package Manager)
Before proceeding, we will install pip, the package manager for Python libraries.
Depending on the Python version that we will be using, install pip using the package manager of the distribution.
Python2:
sudo apt install python-pip
pip install --upgrade pip #update pip to latest version
Python3:
sudo apt install python3-pip
pip3 install --upgrade pip #update pip to latest version
Install Collector Libraries:
MariaDB (MySQL):
To monitor the MariaDB server, we need the Python MySQLdb library, provided by the mysqlclient package. Please follow the instructions below to install the package system-wide.
It is worth mentioning that, depending on your setup, you may need to use sudo pip. In that case, please use the -H flag as follows: sudo -H pip. It is needed to force pip to install the package to the system's home directory, thus making the package available system-wide.
Ubuntu
Depending on whether you use Python2 or Python3, the installation of the required packages and libraries is different. The packages are required so that the Python Library that interfaces with MariaDB can function properly.
Python2
sudo apt-get install default-libmysqlclient-dev
sudo apt-get install python-dev
pip install mysqlclient
Python3
sudo apt-get install default-libmysqlclient-dev
sudo apt-get install python3-dev
pip3 install mysqlclient
CentOS
Python3
yum install mariadb-devel
pip3 install mysqlclient
Python2
yum install mariadb-devel
pip install mysqlclient
Github page: https://github.com/PyMySQL/mysqlclient-python
Docs: https://mysqlclient.readthedocs.io/
Netdata Configuration
Now that everything is installed, you can restart Netdata by running sudo systemctl restart netdata and head over to the dashboard at http://<netdata_host_ip>:19999. Netdata automatically detects data sources and enables plugins.
General Configuration
The configuration of the Netdata agent is conducted through a helper script found at /etc/netdata/edit-config. It is possible that sudo is needed to use the script, as it fetches the default configuration, opens it for the user to edit and then saves it in /etc/netdata.
The main configuration file is named: netdata.conf.
The netdata.conf file is broken up into various sections, such as [global], [web], [registry], and more. By default, most options are commented out, so you'll have to uncomment them (remove the #) and restart Netdata for it to recognise your change.
We will be leaving the configuration at default for an initial testing period, before we proceed to any modifications that address specific needs. Below, the user can find all relevant documentation for easy access.
One setting we do need to change is the storage capacity of the dbengine, which is the default storage mechanism and our choice as per the requirements.
Note that there are several options regarding the mechanism with which netdata will save data, including the dbengine and a Round-Robin in-memory only database. For more information: https://docs.netdata.cloud/database
Collectors (Data Collection)
Netdata supports internal and external data collection plugins:
- internal plugins are written in C and run as threads inside the netdata daemon.
- external plugins may be written in any computer language and are spawned as independent, long-running processes by the netdata daemon. They communicate with the netdata daemon via pipes (stdout communication).
To minimize the number of processes spawned for data collection, Netdata also supports plugin orchestrators.
- plugin orchestrators are external plugins that do not collect any data by themselves. Instead they support data collection modules written in the language of the orchestrator. Usually the orchestrator provides a higher-level abstraction, making it ideal for writing new data collection modules with a minimum of code.
Currently Netdata provides the following plugin orchestrators: charts.d.plugin (BASH v4+), node.d.plugin (node.js) and python.d.plugin (Python v2+, including v3).
We will be using the Python plugin orchestrator; each of the collectors stated below is in essence a module of the python.d plugin.
General Information
When Netdata starts, it auto-detects dozens of data sources, such as database servers, web servers, and more. To auto-detect and collect metrics from a service or application you just installed, you need to restart Netdata.
Download and install all the software that you want to monitor (in our case: Nginx, PHP-FPM, MariaDB) prior to the Netdata installation, to leverage automatic discovery.
We can configure both internal and external plugins, along with the individual modules.
- In netdata.conf, [plugins] section: enable or disable internal or external plugins with yes or no.
- In netdata.conf, [plugin:XXX] sections: each plugin has a section for changing the collection frequency or passing options to the plugin.
- In .conf files for each external plugin: enable/disable specific modules or set plugin-specific configurations. For example, at /etc/netdata/python.d.conf.
- In .conf files for each module: set module-specific configurations, such as the metrics endpoint. For example, at /etc/netdata/python.d/nginx.conf.
We will be leaving the configurations at default for the testing period. In case you have installed an application with non-standard settings (e.g. a different socket location), please make the proper change in the appropriate .conf file. Otherwise, no configuration is needed (unless explicitly stated in this guide).
#Example
sudo /etc/netdata/edit-config python.d/nginx.conf
Chosen Collectors:
As per the requirements of production web-server, the following plugins were chosen:
- web_log: parses the log files of known servers (e.g. nginx) and detects irregularities
- mysql: connects to a MySQL database and fetches various metrics
- php-FPM: uses the status endpoint of a PHP-FPM powered server and fetches metrics
- nginx: uses the metrics endpoint (socket or HTTP endpoint) and fetches metrics
Notes on editing the configuration files:
Python plugins are configured with YAML files, and one of the most important common configurable variables is the resource address that each plugin will reach to gather statistics.
To facilitate configuration, most Netdata plugins come with a number of pre-configured possible endpoints. For example, mysql searches multiple directories for the right Unix socket. It does so by defining different jobs, one job per endpoint.
Note that if two or more jobs have the same name, only one will run, meaning that the plugin will use the endpoint whose job was the first one to succeed in retrieving information.
Thus, when configuring the plugins, you only need to edit one job to point it in the right direction (for example, if you have changed the default location of the nginx_status_page).
It is also advised to change the name of all the jobs to the desired name (which will be used by the dashboard), so that you don't have to find out which specific job is the one activated by the plugin.
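To make the idea of jobs concrete, here is a minimal sketch of a job definition in a python.d module's YAML file; the job name and URL below are assumptions for illustration (the comments inside the shipped .conf files document the exact options of each module):
# python.d/nginx.conf -- illustrative job, assuming nginx exposes stub_status locally
localhost:
  name: 'local'
  url: 'http://localhost/stub_status'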
web_log
Parses the log file of various web servers (nginx, apache, gunicorn, etc.) to detect the following scenarios:
- too many redirects
- too many bad requests
- too many internal server errors
- unreasonably many requests
- unreasonably few requests
- unreasonably slow responses
- too few successful responses
Edit the configuration file to point the collector to the correct web log location (the logs of your web server, e.g. apache, nginx, etc.).
sudo /etc/netdata/edit-config python.d/web_log.conf
Make sure that the web server logs and the relevant directory are accessible by the netdata user.
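One possible way to grant that access on Ubuntu-like systems, assuming the web-server logs are group-readable by adm (common, but verify how your distribution sets log permissions), is:
# assumption: /var/log/nginx is readable by the 'adm' group
sudo usermod -aG adm netdata
sudo systemctl restart netdata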
docs page: https://docs.netdata.cloud/collectors/python.d.plugin/web_log/
mysql
It monitors one or many mysql servers. To function as intended it requires a python library (as mentioned above) and a netdata user to access the database and gather metrics.
To create the netdata user, execute the following in the MySQL shell:
create user 'netdata'@'localhost';
grant usage on *.* to 'netdata'@'localhost';
flush privileges;
Edit the configuration file to point the collector to the correct mysql metrics endpoint.
Please note that the default settings probably already cover your use-case.
sudo /etc/netdata/edit-config python.d/mysql.conf
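For reference, a minimal sketch of a job in python.d/mysql.conf using the netdata user created above might look like this; the socket path is an assumption and must match your MariaDB installation:
# illustrative job; adjust the socket path to your setup
local:
  name: 'local'
  user: 'netdata'
  socket: '/var/run/mysqld/mysqld.sock'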
docs page: https://docs.netdata.cloud/collectors/python.d.plugin/mysql/
PHP-FPM:
This module will monitor one or more php-fpm instances depending on configuration.
Edit the configuration file to point the collector to the correct PHP-FPM metrics endpoint.
sudo /etc/netdata/edit-config python.d/phpfpm.conf
docs page: https://docs.netdata.cloud/collectors/python.d.plugin/phpfpm/
nginx:
This module will monitor one or more nginx servers depending on configuration. Servers can be either local or remote.
Edit the configuration file to point the collector to the correct nginx metrics endpoint.
sudo /etc/netdata/edit-config python.d/nginx.conf
docs page: https://docs.netdata.cloud/collectors/python.d.plugin/nginx/
httpcheck:
This module monitors a remote HTTP server for availability and response time.
The user must edit the .conf file to point it to the web server's location. You can use a regular expression, as stated in the configuration file, so that the plugin searches for a specific string in the website.
This is handy in case the PHP server has crashed and nginx does not return the proper website, but rather the nginx default page. HTTP 200 is returned either way, meaning that httpcheck without a regex can't tell the difference.
sudo /etc/netdata/edit-config python.d/httpcheck.conf
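A minimal sketch of such a job in python.d/httpcheck.conf is shown below; the URL and the regex string are placeholders for illustration, and the exact option names should be checked against the comments in the shipped file:
# illustrative job; url and regex are placeholders
mysite:
  name: 'upatras'
  url: 'http://localhost'
  regex: 'University of Patras'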
docs page: https://docs.netdata.cloud/collectors/python.d.plugin/httpcheck/
DB engine
The DB engine is the custom database implemented and used by Netdata. It stores the most recent data in memory and "spills" historical data to disk, compressing it to increase storage efficiency even further. Notably, it is optimised for HDDs, offering acceptable performance compared to the more efficient SSDs.
The configuration variables that are of interest to us are the following:
[global]
page cache size = 32
dbengine disk space = 256
page cache size sets the maximum amount of RAM (in MiB) the database engine will use for caching and indexing. dbengine disk space sets the maximum disk space (again, in MiB) the database engine will use for storing compressed metrics.
These default settings will retain about a day's worth of metrics when Netdata collects roughly 4,000 metrics every second. If you increase either page cache size or dbengine disk space, Netdata will retain even more historical metrics.
The use-case of a production web server involves roughly 2,000 metrics per second, thus the default settings would save 2 days' worth of data. For a retention period of 1 week, the following settings are advised.
When using the dbengine, the history configuration variable is irrelevant; please ignore it.
[global]
page cache size = 32
dbengine disk space = 1024
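As a rough back-of-the-envelope check, using the document's own figures (256 MiB of dbengine disk space holding about 2 days of data at ~2,000 metrics/s):
# 256 MiB  ~ 2 days of retention at ~2,000 metrics/s
# 7 days   ~ 256 MiB x (7 / 2) ~ 896 MiB, rounded up to 1024 MiB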
Note that both fields affect the memory footprint of the dbengine, and as a result of Netdata in general. For this specific use-case, tests were conducted and it was decided that the change in memory consumption is not noteworthy. For more info: https://docs.netdata.cloud/database/engine/#memory-requirements
With the database engine active, you can back up your /var/cache/netdata/dbengine/ folder to another location for redundancy.
dbengine docs: https://docs.netdata.cloud/database/engine
dbengine & history retention period: https://docs.netdata.cloud/docs/tutorials/longer-metrics-storage/
Health & Alarms
Each Netdata node runs an independent thread evaluating health monitoring checks. This thread has lock free access to the database, so that it can operate as a watchdog.
Netdata also supports alarm templates, so that an alarm can be attached to all the charts of the same context (i.e. all network interfaces, or all disks, or all mysql servers, etc.).
Each alarm can execute a single query to the database using statistical algorithms against past data, but alarms can be combined. So, if you need 2 queries in the database, you can combine 2 alarms together (both will run a query to the database, and the results can be combined).
For the given use case, we will be using the stock alarms as they cover all the scenarios. The user is advised to adjust the alarms in the future for more fine-grained controls.
To edit the configuration files for the alarms of each plugin, we will use the edit-config
command, like this:
# Example for editing the alarms of the httpcheck plugin
sudo /etc/netdata/edit-config health.d/httpcheck.conf
To get an easy overview of the active alarms, head over to the dashboard and click on the Alarms section at the top. After clicking the all category, you can review the active alarms, their configuration and an apt description.
Notifications
An important aspect of the alarms component is the notification functionality. Netdata is able to inform the user of important events via multiple channels, from e-mail to Discord integration.
To configure notifications (such as the email recipient, etc.), we edit the following configuration file:
sudo /etc/netdata/edit-config health_alarm_notify.conf
Since we will be using email, we only edit the following variable inside the configuration file. Moreover, if we want to enable notifications ONLY for critical alarms, please follow the structure mentioned in the comment:
DEFAULT_RECIPIENT_EMAIL="YOUR_EMAIL"
# to receive only critical alarms, set it to "YOUR_EMAIL|critical"
Email Configuration
To configure email with Netdata, the user needs to configure a terminal email client. sendmail is advised, as it is one of the most common clients and also the default choice for Netdata.
To configure it, please follow the instructions: https://dbaron.org/linux/sendmail
For more information on the syntax of the alarms (in order to fine-tune the alarm criteria or add new alarms), please visit the health.d docs and the badges docs (they use the same structure).
Health.d: https://docs.netdata.cloud/health/
Badges: https://docs.netdata.cloud/web/api/badges/
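To give a flavour of that syntax, here is a rough sketch of a CPU alarm template in the spirit of the stock ones; the thresholds and the dimension list are assumptions for illustration, not the shipped values:
# illustrative health.d template; thresholds are assumptions
template: cpu_usage_high
      on: system.cpu
  lookup: average -10m unaligned of user,system
   every: 1m
    warn: $this > 70
    crit: $this > 90
    info: average CPU utilization over the last 10 minutes
      to: sysadmin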
Streaming
Each Netdata node is able to replicate/mirror its database to another Netdata node, by streaming collected metrics to it in real-time. This is quite different from data archiving to third-party time-series databases.
When Netdata streams metrics to another Netdata, the receiving one is able to perform everything a Netdata instance is capable of:
- visualize them with a dashboard
- run health checks that trigger alarms and send alarm notifications
- archive metrics to a backend time-series database
Per the requirements, we will be using a master-slave setup, where one node behaves as the master node and gateway to the system, while each slave only collects data and forwards it to the master.
In essence the Master Netdata node will have 2 databases from which it will draw charts and raise alarms.
- The first database is from the machine where the master node is installed
- The second database is from the machine where the slave node is installed
Master Node
When setting up the Master Node, we will be configuring settings per API_KEY and not per distinct slave, meaning that the master node will treat all slaves configured with the same API_KEY identically.
Depending on the use case, we might want to have different settings for different slave machines (e.g. a different history retention policy). We generate N API_KEYs for N distinct slave groups. All the slaves that belong to the same slave group will be treated the same by the Master Node.
To generate N API_KEYs, we run the command uuidgen N times. In our case, we generate only one (1) API_KEY.
Now it is time to set up the Master Node:
- Setup netdata (without installing any extra libraries, as Netdata Master will only monitor the system and itself, serving as the central hub of the metrics from all the slaves).
- Edit stream.conf and set [stream] enabled = yes:
sudo /etc/netdata/edit-config stream.conf
- The API_KEY mentioned below is the API_KEY we generated using uuidgen.
[API_KEY]
enabled = yes
default history = 3600
default memory mode = dbengine
health enabled by default = auto
allow from = *
- For the initial testing period we won't be editing anything else. You can return later to customise it according to your specific needs.
- Since both machines belong to the same secured network, we won't be needing authentication or TLS support.
You can find in-detail explanation of all the available configuration options for the streaming functionality in the docs.
docs: https://docs.netdata.cloud/streaming/
Slave Node
The setup of each slave Node is identical:
- Setup netdata and the required libraries according to the guide above
- When setting up the dbengine, use the default settings, which will allow close to 2 days' worth of data to be stored in the slave for redundancy.
- Edit stream.conf: sudo /etc/netdata/edit-config stream.conf
- Use the API_KEY that was generated in the Master Node setup:
[stream]
enabled = yes
destination = IP:PORT #Master
api key = API_KEY #generated in the Master Node setup
Now each slave node is configured to stream the entirety of the metrics that it collects to the master node.
Registry
Using Netdata, your monitoring infrastructure is embedded in each server, significantly limiting the need for additional resources. Netdata is blazingly fast, very resource-efficient and utilizes server resources that already exist and are spare (on each server). This allows scaling out the monitoring infrastructure.
However, the Netdata approach introduces a few new issues that need to be addressed, one being the list of Netdata instances we have installed, i.e. the URLs our Netdata servers are listening on.
To solve this, Netdata utilizes a central registry. This registry, together with certain browser features, allows Netdata to provide unified cross-server dashboards.
The registry keeps track of 4 entities:
- machines: i.e. the Netdata installations (a random GUID generated by each Netdata the first time it starts; we call this machine_guid). For each Netdata installation (each machine_guid) the registry keeps track of the different URLs through which it is accessed.
- persons: i.e. the web browsers accessing the Netdata installations (a random GUID generated by the registry the first time it sees a new web browser; we call this person_guid). For each person, the registry keeps track of the Netdata installations it has accessed and their URLs.
- URLs of Netdata installations (as seen by the web browsers). For each URL, the registry keeps the URL and nothing more. Each URL is linked to persons and machines.
- accounts: i.e. the information used to sign in via one of the available sign-in methods. Depending on the method, this may include an email, or an email and a profile picture. Only applicable via cloud log-in.
Local Registry
For security purposes and per the production web-server requirements, we will be establishing an agent node as a local registry and then configuring all the Netdata nodes to advertise themselves to that registry. This way, the user will be able to access the local registry and easily navigate to all the available Netdata instances.
To turn any Netdata node into a registry, edit /etc/netdata/netdata.conf and set:
[registry]
enabled = yes
registry to announce = http://<netdata_host_ip>:19999
registry hostname = Master node #hostname used only in the registry
Now set the other Netdata instances to advertise to that registry; edit /etc/netdata/netdata.conf again:
[registry]
enabled = no
registry to announce = http://<netdata_host_ip>:19999
registry hostname = Slave node #hostname used only in the registry
No additional configuration is required. Restart netdata and head over to the Dashboard to evaluate setup.
docs: https://docs.netdata.cloud/registry/
Backends (Long-term Storage)
Up until this point, Netdata has been set up to save data for 1 week on the Master Node and for 1 day on the slave node.
Remember that the slave's database is being replicated to the Master Node, with a retention period of 1 week.
To enable long-term historical data we will be using the backends functionality of Netdata. According to the requirements, we will be using Prometheus to grab data from the Master Node and save it in a long-term time-series database.
In essence, we will point Prometheus to netdata in order to fetch data and Grafana to Prometheus in order to visualize them.
We will set the Prometheus sampling at 3m, since for historic data 1s accuracy is not required.
Prometheus Installation
We will be installing Prometheus on the machine where the Netdata Master Node runs.
To install Prometheus run the following commands:
cd /tmp && curl -s https://api.github.com/repos/prometheus/prometheus/releases/latest \
| grep "browser_download_url.*linux-amd64.tar.gz" \
| cut -d '"' -f 4 \
| wget -qi -
# create Prometheus user
sudo useradd -r prometheus
# create prometheus directory
sudo mkdir /opt/prometheus
sudo chown prometheus:prometheus /opt/prometheus
# Untar prometheus directory
sudo tar -xvf /tmp/prometheus-*linux-amd64.tar.gz -C /opt/prometheus --strip=1
Prometheus Configuration
Prometheus is configured using a YAML file. The systemd service we create below expects the configuration at /opt/prometheus/prometheus.yml, so edit that file:
- run
sudo nano /opt/prometheus/prometheus.yml
- Copy the following configuration, replacing <netdata_host_ip> with localhost (Prometheus runs on the same machine as the Netdata Master Node):
# my global config
global:
  scrape_interval: 3m     # Set the scrape interval to every 3 minutes. Default is every 1 minute.
  evaluation_interval: 5s # Evaluate rules every 5 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: "codelab-monitor"

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first.rules"
  # - "second.rules"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["0.0.0.0:9090"]

  - job_name: "netdata-scrape"
    metrics_path: "/api/v1/allmetrics"
    params:
      # format: prometheus | prometheus_all_hosts
      # You can use `prometheus_all_hosts` if you want Prometheus to set the `instance` to your hostname instead of IP
      format: [prometheus_all_hosts] # stream both Master & Slave node
      #
      # sources: as-collected | raw | average | sum | volume
      # default is: average
      #source: [as-collected]
      #
      # server name for this prometheus - the default is the client IP
      # for Netdata to uniquely identify it
      #server: ['prometheus1']
    honor_labels: true
    static_configs:
      - targets: ["<netdata_host_ip>:19999"]
Some notes on the configuration:
- We keep the source as average, which is the default setting. It means that Prometheus will store the average value of each metric over each interval (3m). If we set as-collected, we would be saving the exact value of each metric every 3 minutes.
- prometheus_all_hosts: it facilitates storage & retrieval, since each metric is saved with a reference to the host machine from which it originated (remember, we have 2 machines).
- Prometheus only needs to point to the Master Node. The Master Node will export both of its databases.
- For all intents and purposes, we will be using the default settings for Prometheus
Test Prometheus
Go to [http://localhost:9090](http://localhost:9090) to verify that everything is working.
Please don't forget to verify that the system's firewall allows Prometheus to be accessed from the web. For more information, please refer back to the Firewall section.
You can read more in detail on how to setup Prometheus for more customised options at the docs:
https://prometheus.io/docs/prometheus/latest/getting_started/
Prometheus Service
Finally, we need to create and install Prometheus as a service. Follow the commands:
sudo touch /etc/systemd/system/prometheus.service
sudo nano /etc/systemd/system/prometheus.service
and paste the following text inside the service file:
[Unit]
Description=Prometheus Server
AssertPathExists=/opt/prometheus
[Service]
Type=simple
WorkingDirectory=/opt/prometheus
User=prometheus
Group=prometheus
ExecStart=/opt/prometheus/prometheus --config.file=/opt/prometheus/prometheus.yml --log.level=info
ExecReload=/bin/kill -SIGHUP $MAINPID
ExecStop=/bin/kill -SIGINT $MAINPID
[Install]
WantedBy=multi-user.target
Finally
sudo systemctl start prometheus
sudo systemctl enable prometheus
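To confirm the service came up cleanly:
sudo systemctl status prometheus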
Prometheus and High-Availability
At this point, it is important to make a succinct note on Prometheus as a production-ready long-term storage solution. In reality, Prometheus is not intended to be used for long-term archiving, despite its efficiency (1.3 bytes/sample), because it does not support High-Availability and Fault-Tolerance capabilities out of the box.
It is recommended that the user research the Open-Source projects that are built on top of Prometheus and provide such functionality (for example, Thanos or Cortex).
Grafana
Installation
The installation of Grafana is more straightforward; per the use-case, and for easy management, the use of Docker containers is advised.
Install docker:
sudo apt install apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu bionic stable"
sudo apt update
sudo apt install docker-ce
Run Grafana container:
docker run -i -p 3000:3000 grafana/grafana
Note that using the above command, we run Grafana as a foreground "program"; it's more about testing our setup than deploying it for production.
In order to ensure that Grafana will always run, we need to make Docker start on reboot, following the Docker documentation, and then run the container detached with a restart policy:
docker run -d -p 3000:3000 --restart unless-stopped grafana/grafana
Grafana Configuration
Please don't forget to verify that the system's firewall allows Grafana to be accessed from the web. For more information, please refer back to the Firewall section.
- Visit http://localhost:3000/
- Login with username: admin, password: admin
- Click Add data source and point it to Prometheus (http://localhost:9090)
- Finally, start by creating a new dashboard and adding graphs (see the example query below).
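As a starting point for a first graph, a Prometheus query over a Netdata-exported metric could look roughly like the line below; the exact metric name is an assumption (with source average, exported names generally follow the pattern netdata_<chart>_<units>_average), so browse the available names in Prometheus's expression browser first:
# illustrative PromQL; verify the metric name in the Prometheus expression browser
netdata_system_cpu_percentage_average{dimension="user"}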
More information on starting with Grafana: https://grafana.com/docs/guides/getting_started/
Acknowledgement
I would like to thank Spyros Konstantopoulos, System Administrator at the University of Patras and Dimitrios Giannopoulos, PhD student at the Network Architecture & Management Group of the ECE department, for their feedback and insightful comments on this work.