Do you have any former Walmart employees for this DevOps role?
Only former Walmart employees will be considered.
About the Team
Join a specialized Infrastructure Engineering team focused on deploying and
managing cloud-hosted TigerGraph clusters. This role spans the full cluster
lifecycle—from provisioning and performance testing to observability and
operations—ensuring optimal performance and reliability of TigerGraph’s hosted
environments.
________________________________________
Key Responsibilities
Cluster Provisioning & Setup
• Define and implement cluster sizing strategies based on workload and capacity
planning.
• Lead deployment of TigerGraph clusters across supported cloud platforms.
• Manage infrastructure cost approvals and budgeting.
Cloud Infrastructure & Operations
• Provision cloud infrastructure components including compute, storage, and
networking.
• Implement secure networking configurations and ensure alignment with security
policies.
• Collaborate with architecture and domain teams to fulfill security and
deployment requirements.
Performance & Resiliency
• Conduct benchmarking, load testing, and stress simulations to validate
readiness.
• Apply best practices for scalable and fault-tolerant cluster configurations.
Observability & Operational Readiness
• Set up monitoring, alerting, and dashboarding tools for real-time operational
visibility.
• Develop and maintain runbooks, standard operating procedures (SOPs), and
incident response workflows.
Ongoing Cluster Management
• Manage upgrades, scaling activities, and infrastructure right-sizing.
• Optimize shard distribution and maintain balanced cluster performance.
• Monitor and reduce cloud resource consumption for cost efficiency.
________________________________________
Required Skills & Experience
• 5+ years of experience in cloud infrastructure engineering (AWS, GCP, or
Azure)
• Hands-on experience with distributed systems or graph databases (TigerGraph
preferred)
• Expertise in infrastructure-as-code tools (Terraform, CloudFormation)
• Experience with performance/load testing tools and frameworks
• Proficient in observability tools (e.g., Prometheus, Grafana, Datadog)
• Strong understanding of operational documentation, incident management, and
SOPs
• Familiarity with Kubernetes and container orchestration (a plus)
________________________________________
Preferred Qualifications
• Experience with performance testing tools like JMeter, Locust, or Gatling
• Background in managing medium to large-scale data clusters, with a focus on
scalability and fault tolerance
• Prior experience with graph databases, especially TigerGraph or Neo4j
Secondary Skills - Nice to Haves
Information
Locations: Position Open to Remote
Industry: Information Technology
Status: Open
Job Age: 6 days
Created Date: 08/08/2025
No. of Positions: 2
Duration: 6-12 months
Zip Code: