TCS Deploys SRE Services to Cloud Application Testing

 

TCS recently briefed NelsonHall on its approach to site reliability engineering (SRE) in the context of quality engineering (QE).

SRE emerged almost a decade ago as part of the shift-right move, targeting production environments beyond traditional IT infrastructure activities such as service desk and monitoring. While no definition of SRE has fully emerged, TCS points out that SRE focuses on two topics, resiliency and reliability, with observability and AIOps, automation, and chaos engineering as key services.

TCS prioritizes cloud-hosted applications for its SRE services. Applications that have been migrated were not initially designed and configured for cloud or multi-cloud hosting, which increases the likelihood of application outages.

Generally, there has been very little SRE activity in QE, even though the industry has emphasized shift-right for several years. The shift-right notion in QE refers to feeding back production information to dev and test teams, breaking down the traditional silos between build and run activities. Activities such as application monitoring (relying on APM tools) and associated AI use cases (to make sense of APM-triggered events), the classification of defects found in production, and sentiment analysis have become common.

We think shift-right activities can still be improved, building on monitoring activities. Chaos engineering is a good example of a developing proactive service. More importantly, the feedback from production to dev and test needs to be improved, and we think SRE will help here.

Observability/Monitoring, AIOps, and Chaos Engineering

TCS' approach to SRE relies on application monitoring, AIOps, automation, and chaos engineering. Application monitoring ('observability') remains at the core of TCS' portfolio. For this, the company will deploy APM tools, collect logs and traces, and provide reporting. One of the challenges in application monitoring is that data is dispersed across different applications and databases. Accordingly, data centralization is a priority for TCS.

Once it has collected monitoring data, TCS deploys AI models (AIOps) to automate event detection and correlation and eventually move to a prediction phase. TCS' main AI use cases are predictive alerts, root cause analysis, event prioritization, and outage likelihood. The company will use third-party tools such as Dynatrace (combined with application monitoring) or deploy its own IP, depending on the client's tool usage.
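To make event correlation and prioritization concrete, here is a minimal sketch. It is not TCS' IP or Dynatrace's method: the `Event` fields, the 60-second window, and the severity scale are all assumptions chosen for illustration, and the rule-based grouping stands in for what would be an AI model in practice.

```python
from dataclasses import dataclass


@dataclass
class Event:
    service: str      # hypothetical name of the application emitting the alert
    timestamp: float  # seconds since an arbitrary start
    severity: int     # assumed scale: 1 (low) to 5 (critical)


def correlate(events, window=60.0):
    """Group events of the same service into one incident when each
    event follows the previous one within `window` seconds."""
    incidents = []
    open_incident = {}  # service -> incident currently being extended
    for ev in sorted(events, key=lambda e: e.timestamp):
        current = open_incident.get(ev.service)
        if current and ev.timestamp - current[-1].timestamp <= window:
            current.append(ev)
        else:
            current = [ev]
            open_incident[ev.service] = current
            incidents.append(current)
    return incidents


def prioritize(incidents):
    """Rank incidents by their highest severity, then by event count."""
    return sorted(
        incidents,
        key=lambda inc: (max(e.severity for e in inc), len(inc)),
        reverse=True,
    )
```

Feeding a handful of alerts through `correlate` and then `prioritize` yields a short, ranked list of incidents rather than a raw stream of events, which is the kind of noise reduction the event correlation and prioritization use cases aim for.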

For deployment and recoverability, its next step after AIOps, TCS will complement application deployment with automated rollbacks and ticket creation. At this stage, when facing application defects, the SRE team will also involve the development teams to conduct RCA and fix application defects.
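The deployment-and-recoverability step can be pictured with a small sketch. Everything here is hypothetical: the `Release` class, the `health_check` callback, and the in-memory ticket list are stand-ins for whatever deployment and ITSM tooling a client actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class Release:
    current: str                                  # version currently live
    tickets: list = field(default_factory=list)   # stand-in for a ticketing system


def deploy_with_rollback(release, new_version, health_check):
    """Deploy `new_version`; if the health check fails, restore the
    previous version and record a ticket for follow-up RCA."""
    previous = release.current
    release.current = new_version
    if health_check(new_version):
        return True
    # Health check failed: roll back automatically and open a ticket.
    release.current = previous
    release.tickets.append(
        f"Rollback {new_version} -> {previous}: health check failed"
    )
    return False
```

The ticket is the handover point to the development team: the rollback restores service immediately, while the defect itself is analyzed and fixed afterwards.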

TCS will also conduct chaos engineering. Chaos engineering complements performance engineering and testing in that it evaluates applications' behavior under more strenuous conditions. With chaos engineering, TCS will conduct attacks such as instance shutdown, increased CPU usage, and network black holes to assess how the applications under test behave. TCS has integrated tools such as Gremlin and Azure Chaos Studio into its DevOps portfolio to embed chaos engineering as part of continuous testing.
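As an illustration of an instance-shutdown attack, the sketch below runs the experiment against a simulated cluster. It does not call any real chaos tool's API: the `Cluster` class, the `min_healthy` threshold, and the steady-state check are assumptions made for the example.

```python
import random


class Cluster:
    """Simulated set of application instances (hypothetical)."""

    def __init__(self, instances):
        self.healthy = set(instances)

    def shutdown(self, instance):
        self.healthy.discard(instance)

    def serves_traffic(self, min_healthy=2):
        # Assumed steady-state rule: enough healthy instances remain.
        return len(self.healthy) >= min_healthy


def run_shutdown_experiment(cluster, rng, min_healthy=2):
    """Verify steady state, shut down one random instance, then
    report whether the cluster still meets the steady-state rule."""
    if not cluster.serves_traffic(min_healthy):
        raise RuntimeError("steady state not met before the attack")
    victim = rng.choice(sorted(cluster.healthy))
    cluster.shutdown(victim)
    return {"victim": victim, "survived": cluster.serves_traffic(min_healthy)}
```

Run as a scheduled step in a pipeline, an experiment like this turns resilience into a pass/fail check: a cluster of three instances survives one shutdown, while a cluster of two does not.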

Demand Is Still Nascent

TCS typically deploys SRE teams of six engineers for monitoring applications. It highlights that SRE adoption is still nascent, and it will lead such programs with marquee clients initially.

In broad terms, the future of SRE lies in DevOps and in becoming part of continuous testing, where all activities are scheduled and automated for each new build or release. TCS is an early mover in this area and is currently honing its tools and consulting capabilities. Platforms combining tools and targeting comprehensive services as part of continuous testing are the company's next step.
