Many of the enterprise systems we encounter are very complex. Unfortunately, the IT teams that develop and support these solutions often have inaccurate documentation of all of the elements. The design of the software is handled by one organization, whereas the actual implementation onto the underlying components ends up with other teams such as build, middleware, or operations. Additionally, these solutions are built upon layers of projects that stretch back over many years, so the people who held the knowledge are no longer available. Even though the teams own the code, the systems are effectively ‘black boxes’ to them.
In order to design a validation and troubleshooting process for new multi-tier applications, I typically go to the whiteboard. Then we start drawing logical maps or event sequence charts (ESC) to figure out how it works.
Then we translate this diagram into Visio to capture it. These documents can become obsolete quickly. In order to debug performance or multi-tier errors, these architectural diagrams are then correlated to a server-level diagram created by the operations teams.
Unfortunately, the application or widget defined at the top of the processing line in the ESC is not what is handed to the operations teams in charge of deployment. They usually receive a deployable artifact that embodies a different concept of the application, such as a Java WAR file with a totally different name.
Sometimes we need to review the Maven POM to map the architectural concepts from the ESC onto what the build team defined in the deployment package. Then we can ask the operations folks what was deployed onto a particular web server. If you have worked through these steps, then you know how long it takes.
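As a rough illustration of that POM review, here is a minimal Python sketch that reads a POM and reports the file name the build would hand to operations. The sample POM, its names (order-portal-web, orders-ui), and the describe_artifact helper are all hypothetical. The sketch relies on Maven's defaults (packaging of jar, final name of artifactId-version) and does not resolve parent POMs or ${...} property placeholders, so treat it as a starting point rather than a complete tool.

```python
import xml.etree.ElementTree as ET

# Namespace declared by Maven 4.0.0 POM files.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}

# Hypothetical POM for illustration only; every name here is invented.
SAMPLE_POM = """
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example.retail</groupId>
  <artifactId>order-portal-web</artifactId>
  <version>2.3.1</version>
  <packaging>war</packaging>
  <build>
    <finalName>orders-ui</finalName>
  </build>
</project>
"""


def describe_artifact(pom_xml: str) -> dict:
    """Return the artifact coordinates and the deployable file name from a POM.

    Assumes the values are literals in this POM: inherited values and
    ${...} placeholders are not resolved.
    """
    root = ET.fromstring(pom_xml.strip())

    def text(path, default=None):
        node = root.find(path, POM_NS)
        return node.text.strip() if node is not None and node.text else default

    artifact_id = text("m:artifactId")
    version = text("m:version")
    packaging = text("m:packaging", "jar")  # Maven defaults to jar
    # Maven's default final name is <artifactId>-<version>.
    final_name = text("m:build/m:finalName", f"{artifact_id}-{version}")
    return {
        "artifact_id": artifact_id,
        "version": version,
        "packaging": packaging,
        "deployable": f"{final_name}.{packaging}",
    }


if __name__ == "__main__":
    info = describe_artifact(SAMPLE_POM)
    print(f"{info['artifact_id']} is deployed as {info['deployable']}")
```

In this made-up case the build team's order-portal-web ships as orders-ui.war, which is exactly the kind of name mismatch that slows down the conversation with operations.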
What if the system is down while you work through all of this?
What if there are more layers in the staffing model or in the solution?
One time, we had 20 or more people in the room to define the end-to-end path before we could even begin to troubleshoot the downtime. As more and more people gathered at the whiteboard, I started calculating the lost revenue as well as the accuracy of the resulting model. By my analysis, the model became less accurate with each person added beyond two. I was further dismayed each time developers, testers, ops folks, or architects had to describe their role in the solution, because they had never collaborated before!
Imagine my excitement the first time I used AppDynamics and it automatically mapped the multi-tier application using its tagging technology. Not only did it map out what the architecture team believed to be the solution, but it also found eight more tiers within the application. The architects didn’t know the actual development had varied that much from their design principles.
This diagram and discussion happened on the very first day the solution was turned on. We were able to compare production to test as well. Instead of debating the accuracy of the diagram we were redesigning the solution for improvement. We removed the blame-storming and confusion from the room.
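The comparison itself, whether design versus discovered tiers or production versus test, boils down to a set difference over the calls between tiers. Here is a minimal Python sketch of that idea. The tier names and the diff_flow_maps helper are hypothetical, and the edge lists are typed in by hand; this is not the AppDynamics API or its data format.

```python
def diff_flow_maps(expected, observed):
    """Compare two flow maps given as iterables of (caller, callee) pairs."""
    expected, observed = set(expected), set(observed)
    return {
        # Calls in the reference map that were never seen.
        "missing": sorted(expected - observed),
        # Calls that were seen but are not in the reference map.
        "unexpected": sorted(observed - expected),
    }


# Hypothetical tiers for illustration only.
designed = [
    ("web", "order-service"),
    ("order-service", "orders-db"),
]
discovered = designed + [
    ("order-service", "legacy-pricing"),
    ("legacy-pricing", "mainframe-gateway"),
]

if __name__ == "__main__":
    result = diff_flow_maps(designed, discovered)
    for caller, callee in result["unexpected"]:
        print(f"undocumented call: {caller} -> {callee}")
```

Passing the test environment's edges as expected and production's as observed gives the same kind of report for a production-to-test comparison.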
This is a sample flowmap from the AppDynamics UI.
Now we can start to optimize! In the next post I will correlate the flowmap data to our UCML to define performance and capacity strategies.