High Contract

Job ID: JB060313

  • Skills: AWS, cloud infrastru..
  • Visa Types: Green Card, US Citiz..
Job Title: Senior DevOps and Site Reliability Engineer
Location: Washington, DC (Onsite – No Remote)

Job Description
Randstad is seeking a Senior DevOps and Site Reliability Engineer (SRE) to support a key client in the DC Metro area. This is a high-impact, senior-level role responsible for enhancing the reliability, performance, security, and scalability of mission-critical production environments hosted on AWS.
The ideal candidate is a hands-on technical leader with deep expertise in DevOps, Infrastructure-as-Code (IaC), observability, and incident response. You will implement automation at scale, lead reliability engineering efforts, and help define SRE practices across cross-functional teams.

Key Responsibilities
Deployment & Automation
  • Build and maintain CI/CD pipelines (GitHub Actions, AWS CodePipeline, Jenkins).
  • Automate infrastructure provisioning using IaC tools (Terraform, CloudFormation, AWS CDK).
  • Develop automation scripts and self-service tools to streamline operations.
  • Use programming languages (Python, Go, Java) for automation and debugging.
Site Reliability Engineering
  • Lead incident response as an on-call engineer, including disaster recovery activities.
  • Conduct post-incident reviews and implement systemic improvements.
  • Define and monitor SLIs and SLOs, and manage error budgets.
  • Use observability tools (Dynatrace preferred; ELK Stack and AppDynamics also relevant) for monitoring and root cause analysis.
  • Implement distributed tracing and anomaly detection dashboards.
Performance, Capacity & Cost Optimization
  • Forecast system capacity needs and plan for scalability.
  • Lead cost optimization across cloud infrastructure.
  • Implement performance/resiliency testing frameworks.
  • Manage auto-scaling configurations for resource optimization.
Security & Governance
  • Investigate and respond to security incidents.
  • Automate compliance checks and security enforcement.
  • Drive adoption of zero-trust security models in cloud environments.
  • Apply ITIL principles using ITSM tools (ServiceNow preferred).

Required Qualifications
Education & Experience
  • Bachelor’s degree in Computer Science, Engineering, or a related field.
  • 5–8 years of experience in DevOps, SRE, or Platform Engineering roles.
  • 3+ years of experience supporting high-availability production systems.
  • Proven experience leading complex technical initiatives.
Technical Expertise
  • Strong expertise in AWS cloud infrastructure (certification a plus).
  • Mastery of IaC tools – Terraform, CloudFormation, AWS CDK.
  • Proficient in Python, Go, or Java.
  • Deep understanding of observability/APM tools – Dynatrace strongly preferred.
  • Familiarity with database technologies (relational, NoSQL, cloud-native).
Professional & Leadership Skills
  • Effective team mentor and cross-functional collaborator.
  • Strong documentation skills (e.g., RCAs, technical articles).
  • Willingness to support on-call duties and non-standard hours during critical incidents.