Gartner recently released its latest Magic Quadrant for Application Performance Monitoring (APM), and the report describes five key dimensions, two of which are Application Mapping and Transaction Profiling. These two dimensions are critical for identifying performance bottlenecks in distributed applications, whose architectures are typically based on SOA or Cloud concepts.
The point we’d like to emphasize in this post is this: to quickly find bottlenecks in distributed or SOA architectures, the user (troubleshooter) must be able to see these two dimensions simultaneously. OK, get ready, because we’re going to say something very “unvendor”: these two dimensions should be “features”, not separate “products”. Today, the only two APM products that combine both views in a single product are AppDynamics and dynaTrace.
Unfortunately, the rest of the APM solutions don’t work that way. Most of the APM vendors that claimed support for these two dimensions in the Magic Quadrant require customers to buy two or more distinct products, one of which is usually a re-branded CMDB tool. This is nowhere near as efficient, especially when the troubleshooter has to log into two or three different products and stitch the complete picture together in their own mind.
Two Quick Definitions:
Application Mapping: Real-time discovery and visualization of all the interactions your application has with its underlying app infrastructure.
Transaction Profiling: Distinguishing unique transactions (e.g. “Login” vs. “Checkout”) and tracking the flow and code execution of a single business transaction across the underlying distributed application infrastructure. The key here is that the latency of each hop must be captured, compared against a baseline and displayed so that bottlenecks can be found.
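To make the "compared against a baseline" part concrete, here is a minimal Python sketch of a per-hop check. The `Hop` type, the `flag_slow_hops` function, the 1.5x tolerance, and all node names and latencies are hypothetical illustrations, not any vendor's data model or API.

```python
from dataclasses import dataclass


@dataclass
class Hop:
    """One hop of a business transaction: a call from one node to the next."""
    source: str
    target: str
    latency_ms: float


def flag_slow_hops(hops, baseline_ms, tolerance=1.5):
    """Return the hops whose latency exceeds `tolerance` times their baseline.

    `baseline_ms` maps (source, target) -> typical latency in milliseconds.
    Hops with no recorded baseline are skipped rather than guessed at.
    """
    slow = []
    for hop in hops:
        base = baseline_ms.get((hop.source, hop.target))
        if base is not None and hop.latency_ms > tolerance * base:
            slow.append(hop)
    return slow


# Hypothetical "Checkout" transaction and baseline values.
checkout = [
    Hop("Web", "E-Commerce", 120),
    Hop("E-Commerce", "Inventory", 900),
    Hop("Inventory", "Database", 40),
]
baseline = {
    ("Web", "E-Commerce"): 100,
    ("E-Commerce", "Inventory"): 200,
    ("Inventory", "Database"): 50,
}

for hop in flag_slow_hops(checkout, baseline):
    base = baseline[(hop.source, hop.target)]
    print(f"{hop.source} -> {hop.target}: {hop.latency_ms} ms (baseline {base} ms)")
```

Only the E-Commerce to Inventory hop is flagged here, because it is the one hop running well above its usual latency.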
How does Application Mapping work?
To give you a simple analogy, imagine you took part in the Gumball Rally (a fast road trip) across America from the west coast to the east coast. Your journey would take you through many different states (physical servers) and cities (application nodes). Each competitor (user) would drive (transaction) across America, taking different routes (flows) to try to reach the finish line first. If you mapped out a few journeys, it might look something like this:
You can apply the same road trip analogy to your applications and infrastructure. By tracking every Business Transaction (journey) of an application across your infrastructure, you learn which servers (states) and application nodes (cities) it touches. From this data you’re able to construct a complete map of your application.
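A rough sketch of how such a map can be assembled from tracked transactions follows. For simplicity it assumes each traced transaction is a linear list of node names (real traces branch into trees), and the transaction names, node names and `build_app_map` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical traces: each business transaction is the ordered list of
# application nodes ("cities") it passed through.
traces = {
    "Login": ["Web", "Auth", "LDAP"],
    "Checkout": ["Web", "E-Commerce", "Inventory", "Database"],
    "Search": ["Web", "E-Commerce", "Database"],
}


def build_app_map(traces):
    """Merge every transaction's path into one directed graph.

    Returns {caller: {callee: set of transaction names using that edge}}.
    """
    graph = defaultdict(lambda: defaultdict(set))
    for name, path in traces.items():
        # Each consecutive pair of nodes is one interaction (edge).
        for caller, callee in zip(path, path[1:]):
            graph[caller][callee].add(name)
    return graph


app_map = build_app_map(traces)
for caller in sorted(app_map):
    for callee in sorted(app_map[caller]):
        users = ", ".join(sorted(app_map[caller][callee]))
        print(f"{caller} -> {callee}  ({users})")
```

Because each edge remembers which business transactions use it, the map reflects business activity rather than just which servers happen to talk to each other.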
For example, an application map could look like this:
Application Mapping is important because it lets the user see the bigger picture: how application infrastructure components interact to service user requests (aka business transactions). The user gains high-level visibility into the performance and health of the entire application, as opposed to looking at infrastructure silos individually, which can tell a completely different story. For example, if you look at your application from a JVM perspective, you’ll simply see the response time of each JVM rather than the response time of its infrastructure interactions (e.g. LDAP, message buses, databases, web services). Your business runs across your infrastructure, so your application monitoring and mapping need to reflect this.
Some APM vendors, like AppDynamics, provide this capability in real time by monitoring the flows of business transactions. Other vendors choose to monitor server configuration and port communication as a means to understand infrastructure dependencies. These approaches are completely different from one another. One reflects the true business and application activity via business transaction flows; the other reflects the infrastructure activity via server communication. If you truly want to map how your business runs across your IT infrastructure, then I’d recommend you go with the business transaction approach.
What should Transaction Profiling look like?
Using my road trip analogy, instead of recording just the time spent in each state and city, you’d record the time spent on every road (interfaces) as well as everything you did (functions) in each state and city. As a business transaction flows across the infrastructure, it can do many things along the way: sleeping, waiting, processing, and traveling to other servers and application nodes. It’s therefore important to have a profile of each business transaction’s execution so you can perform root cause analysis rapidly when problems occur.
For example, here is the transaction profile for an individual “Checkout” business transaction that took 10,032 milliseconds to execute:
A transaction profile like the one above provides the exact latency breakdown of how a single business transaction executed across the various servers and application nodes in the infrastructure. Once the source of latency is identified, the next step is to drill down into the relevant application node and look at its code execution profile.
For example, here is the code execution profile of a single business transaction as it executed across the E-Commerce and Inventory application nodes shown in the flow above:
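The arithmetic behind a latency breakdown is simple: sum the exclusive time of each segment, then rank segments by their share of the total. The sketch below uses made-up segment names and timings (they are not the figures from the “Checkout” profile) and assumes the segments don’t overlap; `latency_breakdown` is a hypothetical helper.

```python
def latency_breakdown(segments):
    """Return (total_ms, rows) with rows sorted slowest first.

    Each row is (segment name, milliseconds, percentage of total).
    Assumes segments are exclusive, i.e. no time is counted twice.
    """
    total = sum(ms for _, ms in segments)
    rows = sorted(segments, key=lambda s: s[1], reverse=True)
    return total, [(name, ms, 100.0 * ms / total) for name, ms in rows]


# Hypothetical exclusive-time segments of one "Checkout" execution:
# "roads" are network/interface calls, "cities" are local processing.
segments = [
    ("Web (processing)", 150),
    ("Web -> E-Commerce (network)", 30),
    ("E-Commerce (processing)", 400),
    ("E-Commerce -> Inventory (network)", 20),
    ("Inventory (processing)", 6800),
    ("Inventory -> Database (JDBC wait)", 1200),
]

total, rows = latency_breakdown(segments)
print(f"Total: {total} ms")
for name, ms, pct in rows:
    print(f"{name:<36}{ms:>6} ms {pct:5.1f}%")
```

In this made-up example, processing inside the Inventory node dominates, which is what tells you which node’s code execution profile to open next.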
E-Commerce Code Execution:
Inventory Code Execution:
Code execution profiles give you granular visibility into where latency was spent within a server and application node, right down to the individual line of code and the exact interface, class and method responsible.
Again, transaction profiling differs across APM vendors. For example, AppDynamics provides complete visibility of both transaction flow (across all tiers) and code execution (all interfaces/functions) for transactions that fail or slow down, using dynamic baselining analytics that trigger transaction profiling when required. Other vendors are forced to profile everything, regardless of whether a transaction runs fast or slow, because their agents lack the intelligence or analytics to decide whether they need to collect flow and code execution data for a given transaction. Collecting everything simply doesn’t scale, unless you want to manage a farm of APM management servers and databases that cost the earth to purchase and maintain.
When you see code execution details in an APM product, take a close look at the interfaces, classes and methods you’re being presented with. You may only see standard Java EE interfaces like servlets, EJB, web services, JDBC and JMS with their individual response times. You might also see a few custom interfaces that represent the bespoke classes and methods that make up your application. Whatever you see, be sure to check whether this information comes with a response time breakdown for each interface, class and method shown. A transaction profile with only partial code execution visibility significantly reduces a user’s ability to find the root cause of issues.
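One common way to pinpoint the responsible method in a code execution profile is to compute each method’s self time (its total time minus the time of the methods it calls) and sort by it. The call tree below is hypothetical: the `Call` type, the `hotspots` helper, and every class and method name are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Call:
    """One method invocation in a captured call tree (names are hypothetical)."""
    method: str
    total_ms: float
    children: list = field(default_factory=list)

    def self_ms(self):
        # Time spent in this method's own code, excluding its callees.
        return self.total_ms - sum(c.total_ms for c in self.children)


def hotspots(root):
    """Flatten the tree into (method, self time) pairs, slowest first."""
    out, stack = [], [root]
    while stack:
        call = stack.pop()
        out.append((call.method, call.self_ms()))
        stack.extend(call.children)
    return sorted(out, key=lambda pair: pair[1], reverse=True)


# Hypothetical profile captured on the Inventory node.
tree = Call("InventoryServlet.doPost", 7000, [
    Call("StockService.reserve", 6500, [
        Call("StockDao.lockRows (JDBC)", 1200),
        Call("StockService.recalculate", 5100),
    ]),
    Call("AuditLogger.write", 300),
])

for method, ms in hotspots(tree):
    print(f"{ms:>7.0f} ms  {method}")
```

Total time alone would point at the servlet entry point; self time points at the method whose own code consumed the latency.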
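The idea of letting a baseline decide when to profile can be sketched as a rolling mean plus a standard-deviation threshold per business transaction. This is an assumed, deliberately simple rule chosen for illustration; it is not AppDynamics’ actual baselining algorithm, and the `DynamicBaseline` class, window size and `k` multiplier are hypothetical.

```python
from collections import deque
from statistics import mean, pstdev


class DynamicBaseline:
    """Rolling response-time baseline for one business transaction.

    Assumption: "slow" means more than `k` population standard deviations
    above the mean of the last `window` observations.
    """

    def __init__(self, window=100, k=3.0, min_samples=10):
        self.samples = deque(maxlen=window)  # oldest samples drop off
        self.k = k
        self.min_samples = min_samples

    def should_profile(self, response_ms, failed=False):
        """Decide whether to capture full flow and code data for this call."""
        if failed:
            decision = True  # errors are always worth a profile
        elif len(self.samples) < self.min_samples:
            decision = False  # not enough history to judge yet
        else:
            threshold = mean(self.samples) + self.k * pstdev(self.samples)
            decision = response_ms > threshold
        self.samples.append(response_ms)
        return decision


b = DynamicBaseline(window=50, k=3.0)
for ms in [200, 210, 190, 205, 195, 200, 198, 202, 207, 193]:
    b.should_profile(ms)  # warm-up: never profiles
print(b.should_profile(205))               # -> False (within normal range)
print(b.should_profile(1500))              # -> True (far above baseline)
print(b.should_profile(180, failed=True))  # -> True (failures always profiled)
```

Only the outlier and the failure trigger a profile, so the agent collects detailed data for a small fraction of traffic. Note that this sketch folds every observation, slow ones included, into the baseline; a real design would have to decide how outliers should affect it.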
Conclusion:
Application Mapping and Transaction Profiling are key ingredients in quickly determining the root cause of bottlenecks in distributed environments like SOA and Cloud. In our view, these are not separate “products”; they are “features” that must work together to achieve a single purpose. Application support teams need a single solution and user interface to manage application performance, not several.
App Man.