Job Title: Senior DevOps and Site Reliability Engineer
Location: Washington, DC (Onsite – No Remote)
Job Description
Randstad is seeking a Senior DevOps and Site Reliability Engineer (SRE) to support a key client in the DC Metro area. This is a high-impact, senior-level role responsible for enhancing the reliability, performance, security, and scalability of mission-critical production environments hosted on AWS.
The ideal candidate is a hands-on technical leader with deep expertise in DevOps, Infrastructure-as-Code (IaC), observability, and incident response. You will implement automation at scale, lead reliability engineering efforts, and help define SRE practices across cross-functional teams.
Key Responsibilities
Deployment & Automation
Build and maintain CI/CD pipelines (GitHub Actions, AWS CodePipeline, Jenkins).
Automate infrastructure provisioning using IaC tools (Terraform, CloudFormation, AWS CDK).
Develop automation scripts and self-service tools to streamline operations.
Use programming languages (Python, Go, Java) for automation and debugging.
Site Reliability Engineering
Lead incident response as an on-call engineer, including disaster recovery activities.
Conduct post-incident reviews and implement systemic improvements.
Define and monitor SLIs and SLOs, and manage error budgets.
Use observability tools (Dynatrace preferred; ELK Stack or AppDynamics also relevant) for monitoring and root cause analysis.
Implement distributed tracing and anomaly detection dashboards.
Performance, Capacity & Cost Optimization
Forecast system capacity needs and plan for scalability.
Lead cost optimization across cloud infrastructure.
Support on-call duties and work non-standard hours during critical incidents.
Information
Locations: Position open to locals only
Industry: Information Technology
Status: Open
Job Age: 50 days
Created Date: 10/10/2025
No. of Positions: 1
Duration: 12
Zip Code: