Sunday 14 February 2021

Microservices and Observability

 

1. Log Aggregation - In a typical microservice architecture, hundreds of servers may be involved, and inspecting the logs on each relevant server to debug a failure is close to impossible. Logs from all servers are therefore typically shipped to a centralized logging service with search capability, such as the ELK stack (Elasticsearch, Logstash and Kibana). Namespaces can be used to identify the origin of each log entry.
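
As a rough illustration of tagging logs with their origin before they are shipped, here is a minimal sketch using Python's standard logging module. The JsonFormatter class and the "namespace" and "host" field names are assumptions made for this example, not part of ELK or any particular log shipper.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object, tagged with its origin.

    The namespace/host fields are illustrative names, chosen so that a
    centralized log store could filter entries by where they came from.
    """

    def __init__(self, namespace: str, host: str):
        super().__init__()
        self.namespace = namespace
        self.host = host

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "namespace": self.namespace,
            "host": self.host,
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter(namespace="orders", host="host-1"))
    logger = logging.getLogger("orders.api")
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.error("payment failed for order %s", 42)
```

One JSON object per line is a common choice because an aggregator can index the fields without parsing free-form text.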

2. Distributed Tracing - In a microservice architecture, a single request can pass through many microservices, so debugging an issue may mean checking the logs of several services.

A trace ID is typically generated by the gateway, passed as a request header to every service, and sent back in the response header. Anyone debugging can then take the trace ID from the response header, or from any log message, and use it to find the rest of that request's logs in the other services.
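
The flow described above can be sketched in a few functions. This is only an illustration: the header name X-Trace-Id and all function names are hypothetical, and real tracing systems carry more context than a single ID.

```python
import uuid

# Hypothetical header name, chosen for this example only.
TRACE_HEADER = "X-Trace-Id"


def ensure_trace_id(headers: dict) -> dict:
    """Gateway step: keep an incoming trace ID, or generate a new one."""
    out = dict(headers)
    if not out.get(TRACE_HEADER):
        out[TRACE_HEADER] = uuid.uuid4().hex
    return out


def downstream_headers(incoming: dict) -> dict:
    """Service step: copy the trace ID onto the outgoing request."""
    return {TRACE_HEADER: incoming[TRACE_HEADER]}


def log_line(service: str, headers: dict, message: str) -> str:
    """Prefix every log message with the trace ID of the current request."""
    return f"trace_id={headers[TRACE_HEADER]} service={service} {message}"


def find_by_trace(lines: list, trace_id: str) -> list:
    """Debugging step: collect the log lines that belong to one request."""
    return [line for line in lines if f"trace_id={trace_id} " in line]


if __name__ == "__main__":
    gateway = ensure_trace_id({})
    orders = downstream_headers(gateway)
    logs = [
        log_line("gateway", gateway, "request received"),
        log_line("orders", orders, "order created"),
    ]
    for line in find_by_trace(logs, gateway[TRACE_HEADER]):
        print(line)
```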

3. Metrics and Alarms - Application metrics can be published to a centralized metrics service, which can then drive alarm notifications so that problems get attention according to their severity. Tools like Grafana provide configurable dashboards that give an overall picture of how the application is performing. Typically, a canary test is also deployed alongside each service; when it fails, it triggers alarms, so the owner of the service is alerted before customers run into the issue.
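
To make the link between a metric and a severity-based alarm concrete, here is a minimal sketch of an error-rate alarm over a sliding window of recent results, such as canary runs. The ErrorRateAlarm class, its thresholds and its severity labels are assumptions for illustration; a real metrics service would evaluate rules like this on the server side.

```python
from collections import deque


class ErrorRateAlarm:
    """Track the most recent results and map the error rate to a severity."""

    def __init__(self, window: int, warn_at: float, critical_at: float):
        # Only the last `window` results are kept; older ones fall off.
        self.samples = deque(maxlen=window)
        self.warn_at = warn_at
        self.critical_at = critical_at

    def record(self, success: bool) -> None:
        """Store one result, for example the outcome of a canary run."""
        self.samples.append(0 if success else 1)

    def error_rate(self) -> float:
        if not self.samples:
            return 0.0
        return sum(self.samples) / len(self.samples)

    def severity(self) -> str:
        rate = self.error_rate()
        if rate >= self.critical_at:
            return "critical"
        if rate >= self.warn_at:
            return "warning"
        return "ok"


if __name__ == "__main__":
    alarm = ErrorRateAlarm(window=10, warn_at=0.2, critical_at=0.5)
    for ok in [True, True, False, True, False]:
        alarm.record(ok)
    print(alarm.error_rate(), alarm.severity())
```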

4. Audit Logging - User actions are typically logged for later auditing purposes such as compliance, security and customer support.

5. Health Check API - A microservice may expose a health check API, which typically checks infrastructure dependencies such as the DB connection pool and available disk space. A service registry or load balancer calls this API periodically to identify the healthy service instances.
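
A health check handler might aggregate several small checks into one status, roughly as sketched below. The function names, the 10% free-space threshold and the response shape are assumptions; the database check is a stand-in callable, since the real one depends on the connection pool in use.

```python
import shutil


def disk_space_ok(path: str = ".", min_free_fraction: float = 0.1) -> bool:
    """Return True if at least `min_free_fraction` of the disk is free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total >= min_free_fraction


def run_health_checks(checks: dict) -> tuple:
    """Run each named check and return (http_status, response_body).

    A check counts as failed if it returns a falsy value or raises.
    """
    results = {}
    for name, check in checks.items():
        try:
            results[name] = "up" if check() else "down"
        except Exception as exc:
            results[name] = f"down: {exc}"
    healthy = all(state == "up" for state in results.values())
    body = {"status": "up" if healthy else "down", "checks": results}
    return (200 if healthy else 503), body


if __name__ == "__main__":
    status, body = run_health_checks({
        "db": lambda: True,  # stand-in for a real connection pool check
        "disk": disk_space_ok,
    })
    print(status, body)
```

Returning 503 when any check fails lets a load balancer or registry decide from the status code alone, without parsing the body.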

6. Log Deployment Changes - Every deployment to production can be logged, making it possible to tell whether a particular issue started occurring after a certain production deployment.
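
Given such a deployment log, finding the deployment that immediately preceded an issue is a simple lookup. The sketch below assumes the log is a list of (timestamp, version) pairs sorted by time; the function name and the sample versions are made up for the example.

```python
from bisect import bisect_right
from datetime import datetime


def last_deployment_before(deployments: list, issue_time: datetime):
    """Return the latest (timestamp, version) at or before `issue_time`.

    `deployments` must be sorted by timestamp. Returns None if the issue
    predates every recorded deployment.
    """
    times = [timestamp for timestamp, _ in deployments]
    index = bisect_right(times, issue_time)
    return deployments[index - 1] if index else None


if __name__ == "__main__":
    # Sample data invented for the example.
    log = [
        (datetime(2021, 2, 1, 9, 0), "v1.4.0"),
        (datetime(2021, 2, 10, 15, 30), "v1.5.0"),
    ]
    print(last_deployment_before(log, datetime(2021, 2, 11, 8, 0)))
```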

7. Exception Tracking - All exceptions can be reported to a centralized exception tracking service which can notify the developers about failures.
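
A centralized tracker usually groups repeated exceptions so developers are not notified once per occurrence. Here is a minimal in-process sketch of that idea. The ExceptionTracker class, its grouping key (service, exception type, raising function and line) and the notify callback are all assumptions for illustration, not the behavior of any specific product.

```python
import traceback
from collections import Counter


class ExceptionTracker:
    """Count exceptions per group and notify only on the first occurrence."""

    def __init__(self, notify):
        self.notify = notify      # callable taking one message string
        self.counts = Counter()   # group key -> number of occurrences

    def report(self, service: str, exc: BaseException) -> tuple:
        # Use the innermost frame (where the exception was raised) so that
        # repeats of the same failure land in the same group.
        frames = traceback.extract_tb(exc.__traceback__)
        location = f"{frames[-1].name}:{frames[-1].lineno}" if frames else "unknown"
        key = (service, type(exc).__name__, location)
        self.counts[key] += 1
        if self.counts[key] == 1:
            self.notify(f"[{service}] new {type(exc).__name__} at {location}: {exc}")
        return key


if __name__ == "__main__":
    tracker = ExceptionTracker(notify=print)
    for _ in range(3):
        try:
            int("not a number")
        except ValueError as error:
            tracker.report("orders", error)
    print(dict(tracker.counts))
```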
