Debug service mesh events and errors with Consul proxy access logs
Consul helps you securely connect applications running in any environment, at any scale. Consul observability features enhance your service mesh capabilities with enriched metrics, logs, and distributed traces so you can improve performance and debug your distributed services with precision.
Consul proxy access logs give you detailed event and error information about your service mesh applications. This includes upstream and downstream connections, request status codes, errors, and other details you can use to diagnose and troubleshoot your distributed applications. Once you enable proxy access logs in Consul, you do not need to configure or instrument the applications in your service mesh to take advantage of them. Metrics provide a general overview of system health and performance. Logs provide the context and details you need to diagnose issues and identify the root cause of problems.
In this tutorial, you will enable proxy access logs for your Consul sidecars. You will use Grafana to explore dashboards that provide information about events, errors, and operations for your service mesh applications. In the process, you will learn how these features can speed up incident resolution, reduce operational overhead, and contribute to a more holistic view of your service mesh applications.
Scenario overview
HashiCups is a coffee shop demo application. It has a microservices architecture and uses Consul service mesh to securely connect the services. At the beginning of this tutorial, you will use Terraform to deploy the HashiCups microservices, a self-managed Consul cluster, and an observability suite on Elastic Kubernetes Service (EKS). The HashiCups public-api service will initially be in a broken state to show how proxy access logs help you diagnose and debug service mesh applications.
The Consul proxy sidecar container can emit access logs for all inbound and outbound network traffic in your service mesh. These logs contain response codes, protocols, timings, and additional diagnostic information. By applying a ProxyDefaults configuration entry, you can configure the proxies to emit these logs so Promtail can scrape them and push them to Loki for storage. You can then visualize the access logs with Grafana.
In this tutorial, you will:
- Deploy the following resources with Terraform:
  - Elastic Kubernetes Service (EKS) cluster
  - A self-managed Consul datacenter on EKS
  - Grafana, Loki, and Promtail on EKS
  - HashiCups demo application on EKS
- Perform the following procedures:
  - Review and enable proxy access logs
  - Explore the demo application (broken state)
  - Debug the demo application with Grafana dashboards
  - Restore the demo application to a working state
Prerequisites
The tutorial assumes that you are familiar with Consul and its core functionality. If you are new to Consul, refer to the Consul Getting Started tutorials collection.
For this tutorial, you will need:
- An AWS account configured for use with Terraform
- (Optional) An HCP account
- aws-cli >= 2.0
- terraform >= 1.0
- consul >= 1.16.0
- consul-k8s >= 1.2.0
- helm >= 3.0
- git >= 2.0
- kubectl > 1.24
- jq >= 1.6
Clone GitHub repository
Clone the GitHub repository containing the configuration files and resources.
$ git clone https://github.com/hashicorp-education/learn-consul-proxy-access-logs
Change into the directory that contains the complete configuration files for this tutorial.
$ cd learn-consul-proxy-access-logs/self-managed/eks
Review repository contents
This repository contains Terraform configuration to spin up the initial infrastructure and all files to deploy Consul, the demo application, and the observability suite resources.
The eks directory contains the following Terraform configuration files:
- aws-vpc.tf defines the AWS VPC resources
- eks-cluster.tf defines the Amazon EKS cluster deployment resources
- eks-consul.tf defines the self-managed Consul deployment
- eks-hashicups-with-consul.tf defines the HashiCups resources
- eks-observability.tf defines the Loki, Promtail, and Grafana resources
- outputs.tf defines the outputs you will use to authenticate and connect to your Kubernetes cluster
- providers.tf contains the AWS and Kubernetes provider definitions for Terraform
- variables.tf defines variables you can use to customize the tutorial
The directory also contains the following subdirectories:
- ../../dashboards contains the JSON configuration files for the example Grafana dashboards
- api-gw contains the Kubernetes configuration files for the Consul API gateway
- config contains the Kubernetes configuration files for the Consul proxy defaults
- hashicups contains the Kubernetes configuration files for HashiCups
- helm contains the Helm charts for Consul, Grafana, Loki, and Promtail
Deploy infrastructure and demo application
With these Terraform configuration files, you are ready to deploy your infrastructure.
Initialize your Terraform configuration to download the necessary providers and modules.
$ terraform init
Initializing the backend...
Initializing provider plugins...
## ...
Terraform has been successfully initialized!
## ...
Then, deploy the resources. Confirm the run by entering yes.
$ terraform apply
## ...
Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes
## ...
Apply complete! Resources: 94 added, 0 changed, 0 destroyed.
Note
The Terraform deployment could take up to 15 minutes to complete. Feel free to explore the next sections of this tutorial while waiting for the environment to complete initialization.
Connect to your infrastructure
Now that you have deployed the Kubernetes cluster, configure kubectl to interact with it.
$ aws eks --region $(terraform output -raw region) update-kubeconfig --name $(terraform output -raw kubernetes_cluster_id)
Ensure all services are up and running successfully
Check the pods in the consul, observability, and default namespaces to confirm they are running successfully.
$ kubectl get pods --namespace consul && kubectl get pods --namespace observability && kubectl get pods --namespace default
NAME                                          READY   STATUS    RESTARTS   AGE
consul-connect-injector-9b944b6c4-9zst8       1/1     Running   0          30h
consul-server-0                               1/1     Running   0          30h
consul-server-1                               1/1     Running   0          30h
consul-server-2                               1/1     Running   0          30h
consul-webhook-cert-manager-9d7cc8cc5-jz9hx   1/1     Running   0          30h
NAME                                          READY   STATUS    RESTARTS   AGE
grafana-7cc8f655bb-jrdjm                      1/1     Running   0          30h
loki-0                                        1/1     Running   0          30h
loki-canary-486w4                             1/1     Running   0          30h
loki-canary-4gbqq                             1/1     Running   0          30h
loki-canary-xqvbh                             1/1     Running   0          30h
loki-gateway-f9f9888c5-97t9g                  1/1     Running   0          30h
loki-grafana-agent-operator-d7c684bf9-mlm54   1/1     Running   0          30h
loki-logs-8m2jr                               2/2     Running   0          30h
loki-logs-jn9nq                               2/2     Running   0          30h
loki-logs-qsg5z                               2/2     Running   0          30h
promtail-7thvg                                1/1     Running   0          30h
promtail-9c8dg                                1/1     Running   0          30h
promtail-z87c8                                1/1     Running   0          30h
NAME                                          READY   STATUS    RESTARTS   AGE
api-gateway-85bfcc496-rvzl7                   1/1     Running   0          30h
frontend-65c5595d9b-8zcvg                     2/2     Running   0          30h
nginx-6bc8bbd795-7r2q8                        2/2     Running   0          30h
payments-54f65dc56-zvdpc                      2/2     Running   0          30h
product-api-db-5b8f8f9ddf-z88fz               2/2     Running   0          30h
product-api-f99cdbdb7-6lr8t                   2/2     Running   0          30h
public-api-6c86987789-b749n                   2/2     Running   0          30h
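You can also script this check instead of reading the table by eye. The sketch below inspects the JSON that `kubectl get pods --namespace <ns> -o json` prints. It reports any pod whose phase is not `Running` or whose containers are not all ready. It assumes the standard Kubernetes Pod schema (`status.phase`, `status.containerStatuses[].ready`). The function names `unready_pods` and `check_namespace` are hypothetical helpers, not part of the tutorial repository.

```python
import json
import subprocess


def unready_pods(pod_list: dict) -> list[str]:
    """Return names of pods that are not Running or have unready containers.

    `pod_list` is the parsed output of `kubectl get pods -o json`,
    a List object whose `items` are Pod objects.
    """
    problems = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        status = pod.get("status", {})
        statuses = status.get("containerStatuses", [])
        # A pod with no container statuses yet (for example, Pending) is not ready.
        all_ready = bool(statuses) and all(c.get("ready", False) for c in statuses)
        if status.get("phase") != "Running" or not all_ready:
            problems.append(name)
    return problems


def check_namespace(namespace: str) -> list[str]:
    """Run kubectl for one namespace and return its unready pods."""
    # Requires kubectl on PATH and a kubeconfig that points at the EKS cluster.
    out = subprocess.run(
        ["kubectl", "get", "pods", "--namespace", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return unready_pods(json.loads(out))
```

For example, `check_namespace("default")` should return an empty list once every HashiCups pod reports all of its containers ready.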
Enable Consul proxy access logs
In this section, you will review the parameters that enable Consul proxy access logs, apply them through your proxy defaults, and restart your service mesh sidecar proxies so they pick up the new configuration.
Review and enable access logs
The ProxyDefaults configuration entry lets you configure global defaults for all sidecar proxies in the Consul service mesh. The config/proxy-defaults.yaml file enables accessLogs for all of your Consul sidecar proxies. For more detailed information about proxy defaults, refer to the Consul proxy defaults documentation.
config/proxy-defaults.yaml
apiVersion: consul.hashicorp.com/v1alpha1
kind: ProxyDefaults
metadata:
  name: global
spec:
  accessLogs:
    enabled: true
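kubectl accepts JSON manifests as well as YAML, so you can also generate this ProxyDefaults entry from a script, for example when you template configuration in CI. The following is a minimal sketch that builds the manifest shown above as a Python dict and writes it as JSON. The helper names and the output filename `proxy-defaults.json` are hypothetical.

```python
import json


def proxy_defaults_manifest(access_logs_enabled: bool = True) -> dict:
    """Build the same ProxyDefaults entry as config/proxy-defaults.yaml."""
    return {
        "apiVersion": "consul.hashicorp.com/v1alpha1",
        "kind": "ProxyDefaults",
        # Consul expects a single cluster-wide proxy-defaults entry named "global".
        "metadata": {"name": "global"},
        "spec": {"accessLogs": {"enabled": access_logs_enabled}},
    }


def write_manifest(path: str = "proxy-defaults.json") -> None:
    """Write the manifest as JSON so it can be passed to `kubectl apply -f`."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(proxy_defaults_manifest(), f, indent=2)
```

Running `write_manifest()` and then `kubectl apply -f proxy-defaults.json` should have the same effect as applying the YAML file.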
Apply your proxy defaults configuration to enable access logs.
$ kubectl apply -f config/proxy-defaults.yaml
proxydefaults.consul.hashicorp.com/global created
Restart sidecar proxies
You need to restart your sidecar proxies to apply the updated proxy defaults. To do so, redeploy your HashiCups application.
$ kubectl rollout restart deployment --namespace default
deployment.apps/api-gateway restarted
deployment.apps/frontend restarted
deployment.apps/nginx restarted
deployment.apps/payments restarted
deployment.apps/product-api restarted
deployment.apps/product-api-db restarted
deployment.apps/traffic-generator restarted
Promtail will now begin scraping the stdout and stderr log streams for all proxy sidecars. Refer to the Consul proxy access logs documentation to learn more about customizing the logging format and the default logging parameters.
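Envoy writes access logs to the same stdout stream as its other output, so a log pipeline has to tell the two apart. This sketch assumes the JSON format configured on the Envoy listener, with one JSON object per access log line. It decodes each line and keeps only objects that carry access log fields such as `start_time`, `response_code`, and `upstream_cluster`. The helper name `parse_access_logs` is hypothetical. Promtail and Loki perform this kind of step with their own pipeline stages, not with this code.

```python
import json
from typing import Iterable, Iterator

# Keys assumed present in every entry, based on the json_format in the Envoy config.
REQUIRED_KEYS = {"start_time", "response_code", "upstream_cluster"}


def parse_access_logs(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded access log entries, skipping any other stdout output."""
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # plain-text Envoy or application output
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated or malformed line
        # issubset() iterates the dict's keys, so this checks field presence.
        if isinstance(entry, dict) and REQUIRED_KEYS.issubset(entry):
            yield entry
```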
Confirm sidecar configuration
Confirm that your sidecar proxy configuration has been successfully updated by viewing the Envoy admin interface. To connect to it, port-forward port 19000 from a service that has a sidecar proxy.
$ kubectl port-forward deploy/frontend 19000:19000
Open http://localhost:19000/config_dump in your browser to view the Envoy configuration. Search for Consul Listener Filter Log. This stanza defines the access log fields that Promtail will scrape and push to Loki.
"access_log": [
  {
    "name": "Consul Listener Filter Log",
    "typed_config": {
      "@type": "type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog",
      "log_format": {
        "json_format": {
          "response_code": "%RESPONSE_CODE%",
          "downstream_remote_address": "%DOWNSTREAM_REMOTE_ADDRESS%",
          "duration": "%DURATION%",
          "protocol": "%PROTOCOL%",
          "upstream_service_time": "%RESP(X-ENVOY-UPSTREAM-SERVICE-TIME)%",
          "x_forwarded_for": "%REQ(X-FORWARDED-FOR)%",
          "response_flags": "%RESPONSE_FLAGS%",
          "upstream_local_address": "%UPSTREAM_LOCAL_ADDRESS%",
          "user_agent": "%REQ(USER-AGENT)%",
          "request_id": "%REQ(X-REQUEST-ID)%",
          "route_name": "%ROUTE_NAME%",
          "upstream_host": "%UPSTREAM_HOST%",
          "method": "%REQ(:METHOD)%",
          "bytes_received": "%BYTES_RECEIVED%",
          "path": "%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%",
          "authority": "%REQ(:AUTHORITY)%",
          "start_time": "%START_TIME%",
          "requested_server_name": "%REQUESTED_SERVER_NAME%",
          "response_code_details": "%RESPONSE_CODE_DETAILS%",
          "downstream_local_address": "%DOWNSTREAM_LOCAL_ADDRESS%",
          "upstream_transport_failure_reason": "%UPSTREAM_TRANSPORT_FAILURE_REASON%",
          "upstream_cluster": "%UPSTREAM_CLUSTER%",
          "connection_termination_details": "%CONNECTION_TERMINATION_DETAILS%",
          "bytes_sent": "%BYTES_SENT%"
        }
      }
    }
  }
]
The presence of this stanza confirms that Consul has configured the Envoy sidecar to emit access logs. For more detailed information about each access log field, refer to the Envoy documentation.
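You can also run this check from a script. The sketch below assumes the `kubectl port-forward deploy/frontend 19000:19000` session from the previous step is still active. It fetches `http://localhost:19000/config_dump` with the Python standard library, then walks the JSON for any `access_log` entry named `Consul Listener Filter Log`. Only the endpoint and the logger name come from this tutorial. The function names are hypothetical.

```python
import json
import urllib.request
from typing import Any

LOGGER_NAME = "Consul Listener Filter Log"


def has_consul_access_log(node: Any) -> bool:
    """Recursively search a config_dump document for Consul's access logger."""
    if isinstance(node, dict):
        for entry in node.get("access_log") or []:
            if isinstance(entry, dict) and entry.get("name") == LOGGER_NAME:
                return True
        return any(has_consul_access_log(v) for v in node.values())
    if isinstance(node, list):
        return any(has_consul_access_log(v) for v in node)
    return False


def fetch_config_dump(url: str = "http://localhost:19000/config_dump") -> dict:
    """Download and decode the Envoy admin config dump."""
    # Requires the port-forward to the sidecar's admin port to be running.
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

After the sidecars restart with the new proxy defaults, `has_consul_access_log(fetch_config_dump())` should return `True`.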
Explore the demo application (broken state)
In this section, you will visit your demo application to explore the HashiCups UI.
Retrieve the Consul API gateway public DNS address.
$ export CONSUL_APIGW_ADDR=http://$(kubectl get svc/api-gateway -o json | jq -r '.status.loadBalancer.ingress[0].hostname') && echo $CONSUL_APIGW_ADDR
http://a4cc3e77d86854fe4bbcc9c62b8d381d-221509817.us-west-2.elb.amazonaws.com
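The `jq` filter `.status.loadBalancer.ingress[0].hostname` reads the DNS name that AWS assigns to the gateway's LoadBalancer Service. If you prefer not to depend on `jq`, the following sketch performs the same lookup in Python on the output of `kubectl get svc/api-gateway -o json`. The helper names `gateway_url` and `fetch_service` are hypothetical. While the load balancer is still provisioning, `ingress` is empty and the sketch returns `None`.

```python
import json
import subprocess
from typing import Optional


def gateway_url(service: dict) -> Optional[str]:
    """Return http://<hostname> for a LoadBalancer Service, or None if not ready."""
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    if not ingress or "hostname" not in ingress[0]:
        return None  # AWS has not assigned a DNS name yet
    return f"http://{ingress[0]['hostname']}"


def fetch_service(name: str = "api-gateway", namespace: str = "default") -> dict:
    """Return the parsed `kubectl get svc/<name> -o json` output."""
    # Requires kubectl on PATH and access to the EKS cluster.
    out = subprocess.run(
        ["kubectl", "get", f"svc/{name}", "--namespace", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)
```

You can reuse the same helper for Grafana. `gateway_url(fetch_service("grafana", "observability"))` gives the base URL to which the dashboard path `/d/access-logs-events-and-errors/` is appended.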
Open the Consul API gateway's URL in your browser and explore the HashiCups UI. Notice that HashiCups is in a broken state and unable to retrieve coffees from the backend services.
Explore events and errors dashboard
Consul proxy access logs let you see detailed event and error information regarding the connections and requests of your service mesh applications. In this section, you will use Grafana to see how this information provides diagnostic and troubleshooting capabilities for your distributed applications.
Retrieve the URL of the access logs events and errors dashboard, then open it in your browser.
$ export GRAFANA_ACCESS_LOGS_DASHBOARD=http://$(kubectl get svc/grafana --namespace observability -o json | jq -r '.status.loadBalancer.ingress[0].hostname')/d/access-logs-events-and-errors/ && echo $GRAFANA_ACCESS_LOGS_DASHBOARD
http://a20fb6f2d1d3e4be296d05452a378ad2-428040929.us-west-2.elb.amazonaws.com/d/access-logs-events-and-errors/
Note
The example dashboards take a few minutes to populate with data after the proxy access logs are enabled.
Notice that the example dashboard panels provide detailed event and error insights for HashiCups. For example, the Response code distribution pie chart shows the ratio of successful to failed HTTP requests in your service mesh during the selected time window. This pie chart includes 500 response codes, which indicate that requests to a service mesh application are failing.
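The pie chart is effectively a count of access log entries per `response_code` over the selected time window. Here is a minimal sketch of that aggregation. It assumes the entries are already decoded from JSON into dictionaries and that each one has a numeric `response_code`. The function names and sample data are made up for illustration.

```python
from collections import Counter


def response_code_distribution(entries: list[dict]) -> dict[int, float]:
    """Return each response code's share of requests, as a percentage."""
    counts = Counter(int(e["response_code"]) for e in entries)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {code: 100 * n / total for code, n in sorted(counts.items())}


def error_ratio(entries: list[dict]) -> float:
    """Fraction of requests answered with a 5xx status code."""
    if not entries:
        return 0.0
    errors = sum(1 for e in entries if 500 <= int(e["response_code"]) <= 599)
    return errors / len(entries)


sample = [{"response_code": 200}] * 3 + [{"response_code": 500}]
print(response_code_distribution(sample))  # {200: 75.0, 500: 25.0}
print(error_ratio(sample))  # 0.25
```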
Search for the string "response_code":500 in the search field.
This search applies a filter to the visualizations and raw logs, so they show only entries that contain that value. You can then focus on the error logs for further analysis and troubleshooting. Click one of the raw logs to view the full access log entry.
Notice that these 500 response codes are generated when nginx communicates with the /api path. In the HashiCups architecture, nginx forwards /api requests to the public-api service, so you can deduce that there is an error with the public-api service.
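You can express this filter-then-inspect workflow as a small aggregation. First keep only entries whose `response_code` is 500, then count them by the `path` and `upstream_cluster` fields from the access log format. A concentration of errors on one path and one upstream is the signal that points to `public-api`. This is a sketch over already-decoded entries with illustrative sample values. In a real cluster, `upstream_cluster` values are typically longer, Consul-generated names.

```python
from collections import Counter


def errors_by_route(entries: list[dict], code: int = 500) -> list[tuple[tuple[str, str], int]]:
    """Count entries with the given response code by (path, upstream_cluster).

    Returns the groups sorted from most to least frequent.
    """
    hits = Counter(
        (e.get("path", "-"), e.get("upstream_cluster", "-"))
        for e in entries
        # Compare as strings so both numeric and string response codes match.
        if str(e.get("response_code")) == str(code)
    )
    return hits.most_common()


entries = [
    {"response_code": 500, "path": "/api", "upstream_cluster": "public-api"},
    {"response_code": 500, "path": "/api", "upstream_cluster": "public-api"},
    {"response_code": 200, "path": "/", "upstream_cluster": "frontend"},
]
print(errors_by_route(entries))  # [(('/api', 'public-api'), 2)]
```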
Tip
Consul proxy access logs contain a large set of information that you can use to create custom dashboards for monitoring your service mesh applications according to your production environment’s unique requirements. Refer to the Consul proxy access logs documentation for a complete list of available log fields.
Restore HashiCups functionality
In this section, you will restore HashiCups functionality by using the insights you gained from the access logs dashboard.
Open the hashicups/public-api.yaml file and investigate the deployment resource configuration. Notice that the ERROR_RATE environment variable is set to 100. This developer testing variable causes the public-api container to return synthetic errors 100% of the time. Update this value to 0 and save your changes.
hashicups/public-api.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: public-api
  namespace: default
spec:
  ## ...
  template:
    ## ...
    spec:
      serviceAccountName: public-api
      containers:
        - name: public-api
          ## ...
          env:
            - name: BIND_ADDRESS
              value: ":8080"
            - name: PRODUCT_API_URI
              value: "http://product-api:9090"
            - name: PAYMENT_API_URI
              value: "http://payments:1800"
            - name: ERROR_RATE # update value to 0 to fix the problem
              value: "100"
## ...
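The sketch below shows why a value of `100` fails every request. It is a hypothetical illustration of an error-rate knob, not the actual `public-api` source. It reads `ERROR_RATE` as a percentage, which matches the "100% of the time" behavior described above, and returns a synthetic 500 for that share of requests. With a value of 100, every request fails. With 0, none do. The function names are made up.

```python
import os
import random
from typing import Optional


def error_rate_from_env(default: float = 0.0) -> float:
    """Read ERROR_RATE (a percentage, 0-100) from the environment."""
    raw = os.environ.get("ERROR_RATE")
    try:
        rate = float(raw) if raw is not None else default
    except ValueError:
        rate = default  # ignore unparsable values
    return min(max(rate, 0.0), 100.0)  # clamp to the valid range


def handle_request(error_rate: float, rng: Optional[random.Random] = None) -> int:
    """Return the HTTP status a handler with synthetic error injection would send."""
    if rng is None:
        rng = random.Random()
    # random() is in [0.0, 1.0), so a rate of 100 always fails and 0 never does.
    if rng.random() * 100 < error_rate:
        return 500
    return 200
```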
Redeploy your public-api deployment so public-api no longer creates synthetic errors.
$ kubectl apply -f hashicups/public-api.yaml --namespace default
service/public-api unchanged
serviceaccount/public-api unchanged
servicedefaults.consul.hashicorp.com/public-api unchanged
deployment.apps/public-api configured
Open the HashiCups URL in your browser and refresh the HashiCups UI.
$ echo $CONSUL_APIGW_ADDR
http://a4cc3e77d86854fe4bbcc9c62b8d381d-221509817.us-west-2.elb.amazonaws.com
Notice that the HashiCups UI functions correctly. You have successfully resolved the problem using Consul's proxy access logs feature.
Clean up resources
Destroy the Terraform resources to clean up your environment. Confirm the destroy operation by entering yes.
$ terraform destroy
## ...
Do you really want to destroy all resources?
  Terraform will destroy all your managed infrastructure, as shown above.
  There is no undo. Only 'yes' will be accepted to confirm.

  Enter a value: yes
## ...
Destroy complete! Resources: 0 added, 0 changed, 94 destroyed.
Note
Due to race conditions with the cloud resources in this tutorial, you may need to run the destroy operation twice to remove all the resources.
Next steps
In this tutorial, you enabled proxy access logs in the Consul service mesh to enhance the diagnostic, troubleshooting, and event auditing capabilities of your service mesh applications. You did not need to configure or instrument your applications to enable these features, which gives your service mesh applications a very quick time-to-value. This integration offers faster incident resolution, better understanding of your applications, and reduced operational overhead.
For more information about the topics covered in this tutorial, refer to the following resources: