Continuous verification is the process of monitoring an application for anomalies after deployment. An anomaly is any disruption in normal operations that could affect the application’s users.
The idea behind continuous verification is to collect data from historic deployments, analyze it via machine learning, and create a baseline of good deployments. This allows a continuous verification system to identify that something is not right and take corrective action—for example, roll back the application to a stable version. This should be done as quickly and smoothly as possible before customers become aware of the problem.
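The baseline idea can be illustrated with a minimal sketch. It swaps the machine learning step for a plain statistical baseline (mean and standard deviation of an error rate), and the function names, the sample figures, and the three-sigma cutoff are all assumptions made for illustration, not part of any particular CV product.

```python
from statistics import mean, stdev

def build_baseline(historic_error_rates):
    """Summarize error rates from past good deployments as (mean, stdev)."""
    return mean(historic_error_rates), stdev(historic_error_rates)

def is_anomalous(observed, baseline, max_sigma=3.0):
    """Flag a value more than max_sigma standard deviations above the mean."""
    mu, sigma = baseline
    if sigma == 0:
        # No variation in history: anything above the mean is suspicious.
        return observed > mu
    return (observed - mu) / sigma > max_sigma

def verify_deployment(observed, baseline):
    """Return the corrective action: roll back on anomaly, otherwise keep."""
    return "rollback" if is_anomalous(observed, baseline) else "keep"

# Hypothetical error rates observed after five earlier, healthy deployments.
historic = [0.010, 0.012, 0.009, 0.011, 0.010]
baseline = build_baseline(historic)

print(verify_deployment(0.011, baseline))  # within the normal range
print(verify_deployment(0.050, baseline))  # far above the baseline
```

A real system would likely track many metrics and use a learned model, but the shape is the same: learn what "good" looks like, compare the new deployment against it, and act on the difference.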
This is part of our series of articles about GitOps.
When different software engineers who write code separately have different expectations, the resulting code combination can create problems. If the development team can identify these issues quickly, they can prevent similar gaps in the next code they write. However, if the expectation gap remains unaddressed, each code write may diverge further, increasing the likelihood of undesirable results.
One way to uncover the diverging expectations of different developers is to combine their code and run it together. Agile methodologies such as continuous integration (CI) can help achieve this. Teams can use integration tests to test the specific functionalities of combined code features from separate developers.
CI pipelines require publishing all code edits to a common repository, where the development team can run integration tests to identify changes that break the application. The rapid feedback loops of CI make software changes more easily reversible—developers can quickly apply new changes, test them, and revert the changes if necessary. Reversibility is especially important for complex systems with multiple teams working on a joint project.
Continuous delivery (CD) introduces the advantages of CI to the next stage in the development cycle. CD focuses on automation to prepare code for deployment in the production environment. Software engineers can use CD tools to select builds that have passed the CI stage and promote them to production.
CD provides another feedback loop for developers, offering insights into how code changes perform when running in production. This approach enables frequent deployments, which can more easily address expectation gaps. Frequent deployments are also less liable to break, because they are more likely to catch additional expectation gaps early. CI/CD is now an industry norm.
Continuous verification (CV) can be seen as an extension of the established practices of CI/CD. It is a new practice that involves proactive experimentation, enabled by tools for verifying system behavior. Verification differs from validation—while the former deals with business outcomes, the latter emphasizes software correctness.
CV differs from traditional quality assurance practices, which favor reactive tests using software validation methodologies that look for known properties. Other common verification methods include monitoring, alerts, code tests and reviews, and site reliability engineering (SRE). While these practices remain useful (and common), CV provides additional capabilities for addressing the specific challenges of complex systems.
Important advantages of continuous verification include:
Complex systems are open-ended and constantly evolve, requiring a different approach to understanding known software properties (i.e., output constraints). Like CI/CD, continuous verification addresses the need to navigate complex computing systems.
Continuous verification does not introduce a new software engineering paradigm; it merely integrates existing development practices that pursue similar goals. The main difference is that CV changes how organizations think about application development and operations. Instead of validating the internal machinations of a system, DevOps teams can verify system outputs, focusing on performance rather than structure. This approach saves time and resources, which is increasingly important as systems grow more complex.
Cloud service providers (CSPs) typically offer various monitoring and alerting services to help detect misconfiguration issues and vulnerabilities. However, most monitoring systems provide retrospective alerts, which the security team might not see in time to prevent damage. They also might not provide adequate visibility into the root cause of security issues. When developers apply fixes, the fixes might be ineffective or might affect functionality or other cloud assets.
When using multiple CSPs, the incompatibility of the different cloud systems can exacerbate security issues. The security team might struggle to see the context or deal with floods of false positives. Continuous verification helps address these challenges by tracking assets from deployment.
CV requires proactively monitoring cloud assets after deployment to identify security issues throughout their lifecycle. Agents attached to cloud assets continuously check their performance and configuration against established policies to identify anomalous behaviors. This strategy mitigates cloud data breaches and provides full visibility into an organization’s assets across cloud environments.
Continuous verification treats security issues as part of an interconnected asset inventory, not isolated incidents. A problem with one asset could impact other assets. Security teams should combine CV with policy-based automation to apply predetermined responses to security issues. This setup can automatically revoke user access when it detects suspicious activity.
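The combination of policy checks and predetermined responses can be sketched roughly as follows. The policy names, asset fields, and response labels are hypothetical and do not correspond to any cloud provider's API; a real agent would read asset state from the provider and trigger the response through its own tooling.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Policy:
    name: str
    check: Callable[[dict], bool]  # returns True when the asset complies
    response: str                  # predetermined action on violation

# Hypothetical policies for illustration only.
POLICIES = [
    Policy("storage-not-public",
           lambda asset: not asset.get("public_access", False),
           "revoke_public_access"),
    Policy("encryption-enabled",
           lambda asset: asset.get("encrypted", False),
           "alert_security_team"),
]

def evaluate_asset(asset, policies=POLICIES):
    """Return (policy name, response) for every policy the asset violates."""
    return [(p.name, p.response) for p in policies if not p.check(asset)]

# Hypothetical asset snapshots, as an agent might report them.
exposed_bucket = {"id": "bucket-1", "public_access": True, "encrypted": True}
healthy_bucket = {"id": "bucket-2", "public_access": False, "encrypted": True}

for asset in (exposed_bucket, healthy_bucket):
    print(asset["id"], evaluate_asset(asset))
```

Keeping the response next to the policy is one way to make the automation predictable: the action taken for each violation is decided in advance rather than during an incident.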
CV can also integrate security processes into the development pipeline to accelerate deployment. It can identify vulnerabilities at earlier stages to enable faster and easier resolution.
SRE teams can apply continuous verification principles to their development pipeline to block the delivery of software that demonstrates quality issues during the development and integration stages. They can also use this approach to evaluate production health after new deployments and determine whether they want to roll out or roll back the new software version.
Here are some examples of CV strategies:
An artifact repository serves as a technical contract between the CI and CD pipelines. The CI pipeline provides a structured mechanism for producing artifacts, which the team can later store in the artifact repository. It is safe to assume that the application is fully built and packaged for any artifact stored in the repository. Stored artifacts must pass vulnerability checks and test coverage validation using static code analysis, unit tests, and the like.
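As a rough illustration of this contract, the sketch below admits an artifact only when its checks pass. The report fields and the 80% coverage minimum are assumptions chosen for the example; in practice the values would come from whatever scanners and test tools the CI pipeline runs.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ArtifactReport:
    name: str
    critical_vulnerabilities: int  # count reported by a vulnerability scan
    test_coverage: float           # fraction between 0 and 1
    unit_tests_passed: bool

def admit_to_repository(report, min_coverage=0.8):
    """Return (admitted, reasons); reasons lists every failed check."""
    reasons = []
    if report.critical_vulnerabilities > 0:
        reasons.append("critical vulnerabilities found")
    if report.test_coverage < min_coverage:
        reasons.append("test coverage below minimum")
    if not report.unit_tests_passed:
        reasons.append("unit tests failed")
    return (not reasons, reasons)

# Hypothetical CI results for two builds.
good = ArtifactReport("service-1.4.0", 0, 0.86, True)
bad = ArtifactReport("service-1.4.1", 2, 0.61, True)

print(admit_to_repository(good))
print(admit_to_repository(bad))
```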
Before a pipeline moves an artifact to production, it is important to ensure the transporting pipeline is reliable and can handle frequent deliveries. By increasing the overall coverage of automated verification tasks, the SRE team can accelerate the delivery process and ensure the software reaches the production environment on schedule.
Each stage in the pipeline should include verification. For example, the development stage should cover:
The data provided by event logs and visibility solutions typically feeds the quality gates (i.e., metrics inform quality enforcement). A machine learning algorithm or defined threshold provides the criteria for passing or failing each quality gate. Using machine learning algorithms to process log events and observability data allows analysts to compare the actual state of an application to the defined state (i.e., the normal baseline for performance, functionality, etc.).
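The defined-threshold variant of a quality gate might look like the following sketch. The metric names and limits are invented for illustration, and a missing metric is treated as a failure here, which is a design assumption rather than a rule from any specific tool.

```python
def evaluate_quality_gate(actual, thresholds):
    """Compare observed metrics against maximum allowed values.

    Returns (passed, violations), where violations maps each failing
    metric to its observed value (None if the metric was not reported).
    """
    violations = {}
    for metric, limit in thresholds.items():
        value = actual.get(metric)
        if value is None or value > limit:
            violations[metric] = value
    return (not violations, violations)

# Hypothetical defined state: upper limits for a healthy application.
thresholds = {"error_rate": 0.02, "p95_latency_ms": 300}

# Hypothetical actual state gathered from logs and observability data.
actual = {"error_rate": 0.01, "p95_latency_ms": 420}

passed, violations = evaluate_quality_gate(actual, thresholds)
print(passed, violations)
```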
This type of verification usually depends on the deployment strategy used. For example, in a canary deployment, it is useful to apply quality gates to identify quality regression in the new instance but not in old instances.
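For the canary case, a gate could compare the new instance against the instances still running the old version. The sketch below is one simple way to do that, assuming mean latency as the metric and a 10% tolerance; both choices are illustrative.

```python
from statistics import mean

def canary_regressed(canary_latencies_ms, stable_latencies_ms, tolerance=0.10):
    """True if the canary's mean latency exceeds the stable mean by more
    than the given tolerance (expressed as a fraction of the stable mean)."""
    return mean(canary_latencies_ms) > mean(stable_latencies_ms) * (1 + tolerance)

# Hypothetical latency samples from old instances and two canary runs.
stable = [100, 102, 98]
canary_ok = [105, 108, 103]
canary_slow = [130, 125, 140]

print(canary_regressed(canary_ok, stable))
print(canary_regressed(canary_slow, stable))
```

Because only the canary is measured against the old instances, a regression points at the new version and not at a change in overall load.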
During the early stages of the pipeline, the quality gates can validate the success of a database migration and application launch. In the integration stage, the quality gates should focus on testing against regression based on systematically applied loads—for example, automated performance or functional tests.
Each additional quality gate can throttle the pipeline’s throughput because more manual approvals are required. If there is a need to ship features to production quickly, it might be necessary to adjust or remodel the pipeline. For this approach to be effective, the SRE team must be highly skilled and the observability tools highly mature.