Monitoring Journey (2) - Prometheus, Loki, Grafana Installation & Connection

From direct installation to connection & visualization of Prometheus, Loki, Grafana A to Z

This post has been translated from Korean to English by Gemini CLI.

This post covers installing a monitoring stack (Loki, Prometheus, Grafana) directly on internal instances during a project. If anything is incorrect or there is a more convenient method, please leave a comment or contact me at joyson5582@gmail.com!

This post continues from the previous one.

Prerequisites

Everything here was done on EC2 Ubuntu (64-bit Arm architecture). Each service was installed on a separate EC2 instance, which makes things easy to change later.

  • All instances except Grafana are internal instances with no public IP.
  • I will not explain in detail how to access each internal instance. (You can hop to the internal instances by putting Key.pem on the public EC2 - a quick sketch follows.)
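
For example, a minimal sketch of that hop, where the key name and IPs are placeholders:

# Copy the key to the public EC2, then SSH inward from there.
scp -i key.pem key.pem ubuntu@<public-ec2-ip>:~/
ssh -i key.pem ubuntu@<public-ec2-ip>
# On the public instance:
ssh -i ~/key.pem ubuntu@<internal-ip>
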
# Tell systemd that a unit file has been added or changed, and reload it.
sudo systemctl daemon-reload

# Start the service.
sudo systemctl start prometheus.service

# Enable the service to start automatically at boot.
sudo systemctl enable prometheus.service

# Restart the service.
sudo systemctl restart prometheus.service

# Check the service status.
sudo systemctl status prometheus.service

Note: right after you run these commands, status can show the service as running even when it is about to fail. If you don't know this and just move on, you can wander around for a long time (🥲🥲), so wait a moment and run status once more.
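
A small trick (my suggestion, not from the original workflow): wait a few seconds and ask systemd again; is-active prints the current state.

sleep 5 && systemctl is-active prometheus.service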

So, what if the service file changes?

sudo systemctl daemon-reload
sudo systemctl restart prometheus.service

Reload the unit file, then restart.

What if only the configuration file changes?

sudo systemctl reload prometheus.service

You can just reload, but it’s better to restart.

Prometheus

To start, go to the instance where Prometheus will be installed.

wget https://github.com/prometheus/prometheus/releases/download/v2.54.1/prometheus-2.54.1.linux-arm64.tar.gz

tar -xvzf prometheus-2.54.1.linux-arm64.tar.gz

Go to the releases page, download the build that matches your platform, and decompress it. Then enter the folder and edit the settings with vi prometheus.yml.

global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ['localhost:9100']

 - job_name: "develop-database"
    static_configs:
      - targets: ['internal_server<IP:Port>']
        labels:
          alias: 'Dev DB'
          
  - job_name: 'develop-server'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['internal_server<IP:Port>']
        labels:
          alias: 'Dev Server'
  • scrape_configs is where you define the jobs to monitor. (If you are not going to expose Prometheus itself on the web, the job_name: "prometheus" entry is unnecessary, but I kept it to check via internal requests.)

  • static_configs is where you statically define the targets to scrape.

    • metrics_path: the path Prometheus requests metrics from.
    • You can attach extra labels to each target. (An AWS EC2 internal IP can change, so targets are identified by labels. Targets can also be discovered dynamically through service discovery, but that comes much later.) A label-based query sketch follows this list.
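
As a taste of what the labels buy you, here is a hedged example of querying a target by its alias label via Prometheus's HTTP API (the IP is a placeholder):

curl -G -s 'http://<prometheus-internal-ip>/api/v1/query' --data-urlencode 'query=up{alias="Dev DB"}' | jq .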

The folder also contains promtool.

./promtool check config prometheus.yml

Checking prometheus.yml
 SUCCESS: prometheus.yml is valid prometheus config file syntax

You can check the syntax of the config file through this.
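
A habit worth adopting (my suggestion, not from the post): chain the check and the restart so a broken config never reaches the running service.

./promtool check config prometheus.yml && sudo systemctl restart prometheus.service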

Internal settings are complete. Now let’s set up the service file.

Let's create a service file with sudo vi /etc/systemd/system/prometheus.service.

[Unit]
Description=Prometheus Server
Wants=network-online.target
After=network-online.target

[Service]
User=root
Group=root
Type=simple
ExecStart=/home/ubuntu/prometheus-2.54.1.linux-arm64/prometheus \
        --config.file=/home/ubuntu/prometheus-2.54.1.linux-arm64/prometheus.yml \
        --web.listen-address=":80"

[Install]
WantedBy=multi-user.target
  • Wants, After: the service wants the network (network-online) to be up before it starts, and it runs only after the network is fully configured.
  • User, Group: you can create a separate user, but that felt like over-engineering (it's an internal server + a key is needed anyway), so I specified root.
  • Type=simple: runs the command in ExecStart as a single main process.
  • ExecStart: the executable to run, plus its options.
  • WantedBy: makes the service start automatically at boot.

sudo systemctl daemon-reload
sudo systemctl start prometheus.service
sudo systemctl enable prometheus.service
sudo systemctl restart prometheus.service
sudo systemctl status prometheus.service

After typing the commands, check the status output: if it shows active (running), it's successful.

MySQL Exporter Installation

Why install?

Pulling the metrics you need directly out of the DB is hard. The exporter makes it easy to extract and expose that information.

Let's move to the internal instance where MySQL is installed.

wget https://github.com/prometheus/mysqld_exporter/releases/download/v0.14.0/mysqld_exporter-0.14.0.linux-arm64.tar.gz
tar xzvf mysqld_exporter-0.14.0.linux-arm64.tar.gz

Move into the folder and create the settings file with vi mysqld_exporter.cnf.

[client]
user=exporter
password=exporter_password

Then, connect to MySQL.

CREATE USER 'exporter'@'localhost' IDENTIFIED BY 'exporter_password' WITH MAX_USER_CONNECTIONS 2; 
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'localhost'; 
FLUSH PRIVILEGES; 
EXIT;

Create a user for collecting metrics and grant it the required permissions.
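
Before wiring up the service, it's worth a quick sanity check that the new account works (assuming the credentials above):

mysql -u exporter -p'exporter_password' -e 'SHOW GLOBAL STATUS LIKE "Uptime";'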

Create a service file through sudo vi /etc/systemd/system/mysqld_exporter.service.

[Unit]
Description=MySQL Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=root
Group=root
Type=simple
Restart=always
ExecStart=/home/ubuntu/mysqld_exporter-0.14.0.linux-arm64/mysqld_exporter \
--config.my-cnf /home/ubuntu/mysqld_exporter-0.14.0.linux-arm64/mysqld_exporter.cnf \
--collect.global_status \
--collect.info_schema.innodb_metrics \
--collect.auto_increment.columns \
--collect.info_schema.processlist \
--collect.binlog_size \
--collect.info_schema.tablestats \
--collect.global_variables \
--collect.info_schema.query_response_time \
--collect.info_schema.userstats \
--collect.info_schema.tables \
--collect.perf_schema.tablelocks \
--collect.perf_schema.file_events \
--collect.perf_schema.eventswaits \
--collect.perf_schema.indexiowaits \
--collect.perf_schema.tableiowaits \
--collect.slave_status

[Install]
WantedBy=multi-user.target
  • The --collect.* flags choose which metric sets the exporter gathers.

sudo systemctl daemon-reload
sudo systemctl start mysqld_exporter
sudo systemctl enable mysqld_exporter
sudo systemctl restart mysqld_exporter
sudo systemctl status mysqld_exporter

Reload, start, and enable the service, and mysqld_exporter is done!

Want to check that the exporter is collecting data properly? Run curl http://<DB server internal address:port>/metrics.
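
An even quicker check is the exporter's mysql_up gauge: 1 means its connection to MySQL works (9104 is mysqld_exporter's default port; adjust if you changed it):

curl -s http://<DB server internal address>:9104/metrics | grep '^mysql_up'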

Want to check that Prometheus is receiving the DB data? Run curl http://<Prometheus server internal address:port>/api/v1/targets | jq .

{
	"discoveredLabels": {
	  "__address__": "XXX",
	  "__metrics_path__": "/metrics",
	  "__scheme__": "http",
	  "__scrape_interval__": "15s",
	  "__scrape_timeout__": "10s",
	  "alias": "Dev DB",
	  "job": "dev-database"
	},
	"labels": {
	  "alias": "Dev DB",
	  "instance": "10.0.100.36:9104",
	  "job": "dev-database"
	},
	"scrapePool": "dev-database",
	"scrapeUrl": "XXX/metrics",
	"globalUrl": "XXX/metrics",
	"lastError": "",
	"lastScrape": "2024-09-08T07:48:50.058573713Z",
	"lastScrapeDuration": 0.086236764,
	"health": "up",
	"scrapeInterval": "15s",
	"scrapeTimeout": "10s"
}

If health is up like this, it is successfully fetching.
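
If you only care about per-target health, a jq filter keeps the output short (a sketch against the same /api/v1/targets response):

curl -s http://<Prometheus server internal address:port>/api/v1/targets \
  | jq '.data.activeTargets[] | {job: .labels.job, instance: .labels.instance, health: .health}'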

Grafana

Go to https://grafana.com/grafana/download and download the appropriate Grafana build.

wget https://dl.grafana.com/enterprise/release/grafana-enterprise_11.2.0_arm64.deb
sudo apt install ./grafana-enterprise_11.2.0_arm64.deb

apt installs the package together with its dependencies.

sudo systemctl start grafana-server.service 
sudo systemctl enable grafana-server.service 
sudo systemctl stop grafana-server.service 
sudo systemctl restart grafana-server.service
sudo systemctl status grafana-server.service
sudo netstat -ntap | grep LISTEN | grep 3000

This starts the service and enables it at boot; the last command confirms Grafana is listening on port 3000.

Want to change the port? Edit the service file with sudo vi /etc/systemd/system/grafana-server.service:

[Service]
User=root
Group=grafana
ExecStart=/usr/sbin/grafana-server --config=/etc/grafana/grafana.ini --homepath=/usr/share/grafana

Then, in the sudo vi /etc/grafana/grafana.ini file,

# The http port to use
http_port = 80

Change the port. (You also need User=root above, since binding to port 80 requires root privileges.)
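
If you would rather not run Grafana as root, one alternative (my suggestion, not from the post) is to grant the binary the capability to bind low ports; note this must be re-applied after package upgrades.

sudo setcap 'cap_net_bind_service=+ep' /usr/sbin/grafana-server
sudo systemctl restart grafana-server.service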

Loki

sudo apt-get update
sudo apt-get install -y wget unzip

Since the release comes as a .zip file, you need unzip.

wget https://github.com/grafana/loki/releases/download/v2.9.0/loki-linux-arm64.zip
unzip loki-linux-arm64.zip
chmod +x loki-linux-arm64
sudo mv loki-linux-arm64 /usr/local/bin/loki

Download the binary, make it executable, and move it into place.

sudo mkdir /etc/loki
sudo vi /etc/loki/loki-config.yaml

Write the configuration file.

auth_enabled: false

server:
  http_listen_port: 80

ingester:
  lifecycler:
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    final_sleep: 0s
  chunk_idle_period: 5m
  chunk_retain_period: 30s
  max_transfer_retries: 0

schema_config:
  configs:
    - from: 2024-09-05
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h

storage_config:
  boltdb_shipper:
    active_index_directory: /tmp/loki/index
    cache_location: /tmp/loki/boltdb-cache
    shared_store: filesystem
  filesystem:
    directory: /tmp/loki/chunks

limits_config:
  enforce_metric_name: false
  reject_old_samples: true
  reject_old_samples_max_age: 480h

chunk_store_config:
  max_look_back_period: 480h

table_manager:
  retention_deletes_enabled: false
  retention_period: 0s

Ingester

Loki receives log data and stores it in chunks; after a certain period, chunks are flushed to long-term storage. We decided that we don't need to keep logs in Loki permanently. Therefore:

  • replication_factor: keep only one replica.
  • kvstore: keep the key-value store (the ring) in memory.
  • max_transfer_retries: number of retries on failure.
  • chunk_retain_period: how long to keep flushed chunks in memory. Dig into these further and tune them if needed.

schema_config

This is the schema Loki uses when storing & indexing data.

  • from: the schema applies from this date.
  • store: log index storage method (boltdb-shipper stores indexes on the local file system and syncs them to central shared storage).
  • object_store: where the log data itself is stored.
  • schema: schema version (indexing and storage methods differ per version).
  • index.prefix: prefix attached to index file names.
  • index.period: how often a new index file is created.

storage_config

This is the storage configuration.

  • active_index_directory: directory for indexes on local disk.
  • cache_location: directory for the index cache.
  • shared_store: uses the file system as central storage.

limits_config

This is the configuration for ingestion limits.

  • reject_old_samples: reject old samples (if true, log data older than the configured age is dropped).
  • reject_old_samples_max_age: log samples older than this age are rejected.

chunk_store_config

This is the configuration for the chunk store.

  • max_look_back_period: the maximum period queries may look back (0s removes the limit).

The configuration file is done; now let's create the service, as before.

sudo vi /etc/systemd/system/loki.service

[Unit]
Description=Loki Log Aggregation System
After=network.target

[Service]
User=root
ExecStart=/usr/local/bin/loki --config.file /etc/loki/loki-config.yaml
Restart=on-failure

[Install]
WantedBy=multi-user.target

sudo systemctl daemon-reload
sudo systemctl restart loki
sudo systemctl status loki
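
Before querying, Loki's readiness endpoint makes a quick health check (it listens on port 80 as configured above):

# Prints "ready" once startup is complete.
curl -s http://localhost/ready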

curl -G -s "http://10.0.100.87/loki/api/v1/query" --data-urlencode 'query={app="corea"}' | jq .

If you send a request to Loki like this and values exist, it’s a success.

{
                    "level":"WARN",
                    "class":"c.exception.ExceptionResponseHandler",
                    "requestId":"c79d9efd-12cb-4c6a-9c17-8c01a30f53b0,
                    "message": "No Resource exception [errorMessage = No static resource .env., cause = null,error ={}]"
                    }
org.springframework.web.servlet.resource.NoResourceFoundException: No static resource .env.
	at org.springframework.web.servlet.resource.ResourceHttpRequestHandler.handleRequest(ResourceHttpRequestHandler.java:585)
	at org.springframework.web.servlet.mvc.HttpRequestHandlerAdapter.handle(HttpRequestHandlerAdapter.java:52)
...


Grafana, Loki, and Prometheus (+ MySQL Exporter) installation is complete; now let's learn how to feed in data and use it.

Spring - Prometheus

Spring Boot makes Prometheus very easy to use: add the dependencies & expose the actuator endpoints, and you're done.

implementation 'org.springframework.boot:spring-boot-starter-actuator'
implementation 'io.micrometer:micrometer-registry-prometheus'

Add the dependencies.

management:
  endpoints:
    web:
      exposure:
        include: health,prometheus
        exclude: threaddump, heapdump

This explicitly exposes the endpoints in application.yml. (If you are thinking about security, configure the Prometheus endpoint to allow internal traffic only, because thread pool status, resource usage, execution environment paths, etc. can be sensitive.)

management:
  server:
    port: 9091
  endpoints:
    web:
      exposure:
        include: prometheus
  metrics:
    export:
      prometheus:
        enabled: true

This way, the Prometheus endpoint alone can be served separately on port 9091.
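
You can verify the endpoint responds before pointing Prometheus at it (assuming the separate management port above):

curl -s http://localhost:9091/actuator/prometheus | head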

Spring - Loki

Loki receives log data. That means the Spring server needs to send data to Loki.
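
The appender below comes from the loki4j library, whose dependency isn't shown in the post; assuming Gradle, it looks roughly like this (the version is an assumption - check for the latest). ${LOKI_URL} should point at Loki's push endpoint, e.g. http://<loki-internal-ip>/loki/api/v1/push.

implementation 'com.github.loki4j:loki-logback-appender:1.5.2'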

<included>
    <appender name="LOKI" class="com.github.loki4j.logback.Loki4jAppender">
        <http>
            <url>${LOKI_URL}</url>
        </http>
        <format>
            <label>
                <pattern>app=Corea,host=${HOSTNAME},level=%level</pattern>
                <readMarkers>true</readMarkers>
            </label>
            <message>
                <pattern>
                    {
						"level":"%level",  
						"class":"%logger{36}",  
						"requestId":"%X{requestId}",  
						"time":"%date{yyyy-MM-dd'T'HH:mm:ss.SSSZ}",  
						"thread":"%thread",  
						"message": "%message"
                    }
                </pattern>
            </message>
        </format>
    </appender>
</included>

I specified the appender as above. The labels (indexes) are the application name (this can later be split into Corea-Prod/Corea-Dev), host (IP address), and level (log level).

The message consists of: Log level, originating class, request ID, time, executing thread, and body.

I judged this sufficient for now, so I configured it this way; configure whatever you need.

{
	"level":"DEBUG",
	"class":"c.room.controller.RoomController",
	"requestId":"29243afd-87c5-4059-9e55-af4c8b6236f7",
	"time":"2024-09-22T01:05:50.733+0900",
	"thread":"http-nio-8080-exec-11",
	"message": "return [time=2024-09-22T01:05:50.733171980, ip=58.143.138.249,url=http://api.code-review-area.com/rooms/opened, httpMethod=GET, class=corea.room.controller.RoomController, method=openedRooms, elapsedMillis=7
result=class corea.room.dto.RoomResponses]"
}

Logs arrive like this. Now let's put them to visual use.

Grafana

Data Source Connection

Now, it’s time to connect Prometheus and Loki that we configured above.

Connections - Add new connection - select the data source to connect.


Specify the server URL and connect. (The options below that feel like territory for experts with complex setups.) At the very bottom, click Save & test to save and verify the connection.
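
If you would rather not click through the UI, Grafana can also provision data sources from files; a minimal sketch (internal IPs are placeholders):

sudo tee /etc/grafana/provisioning/datasources/monitoring.yaml > /dev/null <<'EOF'
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://<prometheus-internal-ip>
  - name: Loki
    type: loki
    access: proxy
    url: http://<loki-internal-ip>
EOF
sudo systemctl restart grafana-server.service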

Grafana Dashboard Configuration

Now, we can customize the dashboard based on the data we received from the data source. However,


Doing this for the first time, knowing nothing, is very difficult. Fortunately there are many dashboards already built by others, and you can import them.

I used these dashboards first:

New -> Import -> Enter ID -> Load brings up existing dashboards configured by others.


As such, it shows meaningful data simply! 🙂

DB Connection

This is something I learned a while after the initial monitoring work, and it's very useful, so I'm writing it up. If you look at Connections - Add new connection, you'll see databases like MySQL.


Below that, fill in Connection (Host URL, Database name) and Authentication (Username, Password), then Save & test to connect to the DB!


This way you can compose queries easily through the builder and conveniently query & download data visually. It gives you access to a DB that is otherwise hard to reach from outside!

So, are there only advantages?

The database user should only be granted SELECT permissions on the specified database & tables you want to query. Grafana does not validate that queries are safe so queries can contain any SQL statement. For example, statements like USE otherdb; and DROP TABLE user; would be executed. To protect against this we Highly recommend you create a specific MySQL user with restricted permissions. Check out the docs for more information.

The User Permission section of the docs contains the paragraph above. In short: the DB user should be granted SELECT permissions only. Grafana does not validate that queries are safe, so a query can contain any SQL statement. Therefore, create and use a dedicated MySQL user with restricted permissions.
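
A minimal sketch of creating that restricted user (user name, password, and database are placeholders):

mysql -u root -p <<'EOF'
CREATE USER 'grafana_ro'@'%' IDENTIFIED BY 'choose_a_strong_password';
GRANT SELECT ON your_database.* TO 'grafana_ro'@'%';
FLUSH PRIVILEGES;
EOF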


Deletion is also possible like this 🚨🚨 But what's the big deal? (It's not just querying from outside, but also creating, deleting, and modifying? Lucky Vicky 🍀) It seems best to mind security: create a dedicated user and grant only query permissions. (If it's compromised, every DB can be destroyed.)


In this way, you can also create a data dashboard that is easy for frontend developers to view and use.

Conclusion

All settings are now complete. To briefly explain again:

  • Register the server instance and DB instance in the Prometheus settings.
    • The Spring server adds the Prometheus dependencies & exposes the actuator endpoint.
    • The DB server runs MySQL Exporter.
  • The Spring server sends logs to Loki via its HTTP API.
  • Grafana connects the Prometheus & Loki data sources for visualization.

Using PromQL in Prometheus and LogQL in Loki, you can shape the dashboard data however you like, but I don't think I'll study that part. (The learning curve is steep, and the defaults are sufficient.)

I don't know when it will be (the mission keeps me too busy…), but the next post will probably cover setting up alerts on top of this monitoring (slow queries, server overload & error rates, etc.).

References

[Wooteco 6th Level 3] Grafana, Loki, Prometheus - Log and Metric Monitoring
[Assignment] Building MySQL monitoring using prometheus and grafana

This post is licensed under CC BY 4.0 by the author.