A microservice architecture breaks a monolithic application into many smaller pieces and introduces new communication patterns between services, such as fault tolerance and dynamic routing. One of the major challenges in managing a microservices architecture is understanding how services are composed, how they are connected, and how the individual components operate, both from a global perspective and when drilling down into particular details.
Besides the advantages of breaking an application down into microservices (such as agility, scalability, increased reusability, better testability, and easier upgrades and versioning), this paradigm also makes security more complex: method calls that used to happen in-process become many separate network requests, each of which needs to be secured. Every new service you introduce needs to be protected from man-in-the-middle attacks and data leaks, and you need to manage access control and audit who is using which resources and when. On top of that, each service can be written in a different programming language. A Service Mesh like Istio provides traffic control and communication security capabilities at the platform level and frees application writers from those tasks, allowing them to focus on business logic.
But even though the Service Mesh offloads that extra coding, developers still need to observe and manage how the services communicate as they deploy an application. In OpenShift Service Mesh, Kiali is packaged along with Istio to make that task easier. In this post we will show how to use Kiali capabilities to observe and manage an Istio Service Mesh. We will use a reference demo application to demonstrate how Kiali can compare different service versions and how you can configure traffic routing using Istio config resources. Then we will add mutual TLS to all the demo components in order to make our deployment communications more secure. Kiali will assist in this process by helping to spot misconfigurations and unprotected communications.
How Does Kiali Work? Using A/B Testing as an Example
One common exercise that a Service Mesh is well suited for is A/B testing to compare application versions. With a microservices application, this can be more complex than with a single monolith. Let’s use a reference demo application to demonstrate how Kiali can compare different service versions and how you can configure traffic routing using Istio config resources.
Travel Agency Demo
Travel Agency is a demo microservices application that simulates a travel portal scenario. It creates two namespaces:
- The first namespace, travel-portal, deploys one application per city, each representing a personalized portal where users search for and book travel. In our example we will have three applications: paris-portal, rome-portal and london-portal. Every portal application has two services: the web service handles users from the web channel, while the vip service takes requests from priority channels such as special travel brokers. All portals consume a travel quote service hosted in the next namespace.
- The second namespace, travel-agency, hosts the services that calculate quotes for travel. A main travels service is the business entry point for the travel agency. It receives a city and a user as parameters and calculates all the elements that compose a travel budget: airfares, lodging, car reservation and travel insurance. Several services calculate the separate prices, and the travels service is responsible for aggregating them into a single response. Additionally, some users, such as vip users, can have access to special discounts, managed by a specific discounts service.
The interactions between all the services in the example are shown in the following picture:
In the next steps we are going to deploy a new version of every service in the travel-agency namespace that will run in parallel with the first version deployed. Let’s imagine that the next version adds new features that we want to test with live users so that we can compare the results. Obviously, in the real world this could be complex and highly dependent on the domain, but for our example we will focus on the response time that the portals get, assuming that a slower portal will cause our users to lose interest.
One of the first things we can do in Kiali is enable Response time labels on the Graph:
The graph helps us to identify services that could have problems. In our example everything is green and healthy, but the response times suggest that the new version 2 probably has some features that are slower than in version 1.
Our next stop will be to take a closer look into the travels application metrics:
The Inbound Metrics tab shows data about the portal calls, and Kiali can split these metrics by several criteria. Grouping by app shows that response time has increased for all portals since version 2 was deployed.
If we group the inbound metrics by app and version, we spot an interesting difference: response time has increased in general, but the portals that handle vip users behave worse.
We can also continue using Kiali to investigate and correlate these results with traces:
And, if more information is needed, with logs from the workloads:
Taking Action with Kiali
In our investigation phase we spotted slower response times from version 2, and even slower ones for vip user requests.
There can be multiple strategies from here, like undeploying the whole of version 2, deploying version 2 partially, service by service, limiting which users can access the new version, or a combination of all of those.
In our case, we are going to show how Kiali Actions can add Istio traffic routing to our example, which can help to implement some of these strategies.
A first action we can perform is to add Istio resources to route traffic coming from vip users to version 2 and the rest of the requests to version 1.
Kiali lets you create Istio configuration from a high-level user interface. From the actions located in the service details we can open the Matching Routing Wizard and discriminate requests using headers, as shown in the picture:
Kiali will create the specific VirtualService and DestinationRule resources for the service. As part of our strategy we will add similar rules for the suspected services: travels, flights, hotels, cars and insurances.
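To make the header-based routing more concrete, the following Python sketch builds a VirtualService and DestinationRule pair as plain dictionaries and prints the VirtualService as JSON. It is only an illustration: the header name `user`, its value `vip`, the `version` workload label and the helper name `matching_routing` are assumptions, and the resources Kiali actually generates may differ in naming and detail.

```python
import json


def matching_routing(service, namespace, header, value, match_subset, default_subset):
    """Build VirtualService and DestinationRule manifests as plain dicts."""
    destination_rule = {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "DestinationRule",
        "metadata": {"name": service, "namespace": namespace},
        "spec": {
            "host": service,
            # Assumption: workloads carry a "version" label equal to the subset name.
            "subsets": [
                {"name": name, "labels": {"version": name}}
                for name in (default_subset, match_subset)
            ],
        },
    }
    virtual_service = {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "VirtualService",
        "metadata": {"name": service, "namespace": namespace},
        "spec": {
            "hosts": [service],
            "http": [
                {
                    # Requests whose header matches exactly go to match_subset.
                    "match": [{"headers": {header: {"exact": value}}}],
                    "route": [
                        {"destination": {"host": service, "subset": match_subset}}
                    ],
                },
                {
                    # Everything else falls through to default_subset.
                    "route": [
                        {"destination": {"host": service, "subset": default_subset}}
                    ]
                },
            ],
        },
    }
    return virtual_service, destination_rule


# Hypothetical header name and value; check what the portals really send.
vs, dr = matching_routing("travels", "travel-agency", "user", "vip", "v2", "v1")
print(json.dumps(vs, indent=2))
```

Istio evaluates the `http` entries in order, which is why the rule with the header match comes before the catch-all route.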
When we have finished creating Matching Routing for our version 2 services we can check that Kiali has created the correct Istio configuration using the “Istio Config” section:
Once this routing configuration is applied we can see the results in the Response time edges of the Graph:
Now in our example, all traffic coming from vip portals is routed to version 2, while the rest of the traffic uses the previous version 1, which has returned to its normal response time. The graph also shows that vip user requests carry extra load because they need to access the discounts service.
If we examine the discounts service, we can see a big difference between the response times of version 1 and version 2:
Once we have spotted a clear cause for the slower response, we can decide to move most of the traffic to version 1 but keep some of the traffic on version 2 to get more data and observe the differences. This limits the impact on the overall performance of the app.
We can use the Weighted Routing Wizard to send 90% of the traffic to version 1 and keep only 10% for version 2:
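As a rough idea of what a 90/10 split looks like in Istio terms, here is a minimal Python sketch that builds a VirtualService with weighted routes. The helper name `weighted_virtual_service` is hypothetical, the subsets `v1` and `v2` are assumed to be defined in a matching DestinationRule, and the configuration Kiali writes may not be identical.

```python
import json


def weighted_virtual_service(service, namespace, weights):
    """Build a VirtualService dict that splits traffic across subsets by weight."""
    if sum(weights.values()) != 100:
        raise ValueError("weights must add up to 100")
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "VirtualService",
        "metadata": {"name": service, "namespace": namespace},
        "spec": {
            "hosts": [service],
            "http": [
                {
                    # One destination per subset, each with its share of requests.
                    "route": [
                        {
                            "destination": {"host": service, "subset": subset},
                            "weight": weight,
                        }
                        for subset, weight in weights.items()
                    ]
                }
            ],
        },
    }


# Assumption: subsets "v1" and "v2" exist in a DestinationRule for discounts.
discounts_vs = weighted_virtual_service(
    "discounts", "travel-agency", {"v1": 90, "v2": 10}
)
print(json.dumps(discounts_vs, indent=2))
```

The weight check is there because the weights of a single route are expected to add up to 100.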
Once the Istio configuration is created we can enable Request percentage in the graph and examine the discounts service:
Kiali also lets you suspend traffic partially or totally for a specific destination using the Suspend Traffic Wizard:
This action lets you stop traffic to a specific workload, which means its traffic is distributed among the rest of the connected workloads. You can also stop the whole service so that it returns a fast HTTP 503 error, implementing a “fail sooner” strategy that lets the application recover rather than letting slow requests flood it.
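The two behaviours described above can be sketched in Python as follows. This is an illustration under assumptions, not Kiali's implementation: the even redistribution of the suspended workload's share and the use of Istio fault injection (`fault.abort`) for the 503 response are one plausible way to express them, and the helper names `suspend_workload` and `abort_all_route` are hypothetical.

```python
def suspend_workload(weights, suspended):
    """Set one subset's weight to 0 and share its traffic evenly among the rest."""
    remaining = [name for name in weights if name != suspended]
    if not remaining:
        raise ValueError("no workload left to receive traffic")
    share, extra = divmod(weights[suspended], len(remaining))
    result = {suspended: 0}
    for index, name in enumerate(remaining):
        # The first `extra` workloads take one more point so the total is unchanged.
        result[name] = weights[name] + share + (1 if index < extra else 0)
    return result


def abort_all_route(service, status=503):
    """Build an Istio HTTP route that aborts every request with the given status."""
    return {
        "fault": {"abort": {"httpStatus": status, "percentage": {"value": 100}}},
        "route": [{"destination": {"host": service}}],
    }


# Suspending v2 moves its 10% to v1.
after_suspend = suspend_workload({"v1": 90, "v2": 10}, "v2")
print(after_suspend)

# Failing fast for the whole discounts service.
fail_fast = abort_all_route("discounts")
print(fail_fast)
```

The route returned by `abort_all_route` would go in the `http` list of a VirtualService for the service being suspended.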
Make the Service Mesh Work for You
Microservices scenarios demand good observability tooling and practices. In this post, we have shown how to combine Kiali capabilities to observe, correlate traces via Jaeger Integration, define strategies and perform actions on an Istio-based microservices application.
The Service Mesh is a great tool that solves complex problems introduced by the microservices paradigm. As a component of OpenShift Service Mesh, Kiali provides the key observability features you need to truly understand the telemetry and distributed tracing coming out of the Service Mesh. Kiali does the work of correlating and processing that data, which makes it easier to check the status of the Service Mesh quickly.
You no longer have to deal with separate consoles or understand how to configure special dashboards. You don’t have to learn how to fetch traces for each service or work out which rules are needed to apply an A/B test. Kiali builds a topology from traffic metrics and combines multiple namespaces in a single view. Using the graph's animations, users can spot a bottleneck because slow traffic is shown as a slow animation. And it is all seamlessly integrated under the OpenShift Service Mesh console.
With OpenShift Service Mesh and Kiali, developers have the tools they need to offload the complexity of creating and managing the inter-service communications that hold a microservices-based application together.