Sr Site Reliability Engineer | Dayshift | Remote
Philippines
Full Time
Experienced
ZigZag is looking for a Sr Site Reliability Engineer to join our team!
Overview
As a Site Reliability Engineer, you’ll design, build, and maintain the infrastructure and automation that power our platform. Working closely with software engineering teams and SRE peers, you’ll embed reliability, performance, and compliance into the development lifecycle. Your focus will be on scalability, resilience, security, and operational efficiency across all environments.
Key Responsibilities
Reliability Engineering & Operational Excellence
- Design, implement, and continuously improve highly available, scalable, secure, and resilient cloud infrastructure and platform services.
- Define and evolve Service Level Indicators (SLIs), Service Level Objectives (SLOs), and operational metrics to drive measurable reliability outcomes.
- Lead incident response activities, major incident management, root cause analysis, and post-incident reviews focused on systemic improvement.
- Drive reduction of operational toil through automation, standardisation, and self-healing platform capabilities.
- Develop and maintain disaster recovery, backup, failover, and resilience strategies to meet defined RTO and RPO objectives.
- Conduct capacity planning, performance analysis, and proactive optimisation of infrastructure and application environments.
- Champion operational maturity and continuous improvement practices across engineering teams.
- Architect, build, and maintain scalable cloud-native infrastructure primarily within AWS environments.
- Develop and maintain infrastructure-as-code using tools such as Terraform and CloudFormation.
- Build reusable platform components and shared services that improve developer productivity and operational consistency.
- Develop automation tooling and operational frameworks using scripting and programming languages such as Python.
- Evaluate, implement, and optimise third-party infrastructure and platform tooling.
- Ensure infrastructure configurations, architecture decisions, and operational processes are thoroughly documented and auditable.
- Design and maintain comprehensive observability solutions covering metrics, logging, tracing, alerting, and dashboarding.
- Improve platform visibility and telemetry using tools such as AWS CloudWatch, Sumo Logic, Datadog, Grafana, or equivalent technologies.
- Develop actionable alerting strategies that reduce noise and improve incident response effectiveness.
- Analyse system behaviour and performance trends to proactively identify risks and optimisation opportunities.
- Drive adoption of observability best practices across engineering teams.
- Design and enhance robust CI/CD pipelines and deployment strategies that support safe, reliable, and low-risk software delivery.
- Enable engineering teams through self-service infrastructure and deployment capabilities.
- Improve software delivery efficiency through automation, standardisation, and platform engineering practices.
- Collaborate with engineering teams to embed reliability, scalability, performance, and security considerations into the SDLC.
- Support progressive delivery practices including blue/green deployments, canary releases, and zero-downtime deployments.
- Partner with security and engineering teams to maintain secure and compliant infrastructure environments.
- Support vulnerability management and remediation processes using tools such as Snyk, Lacework, Tenable Nessus, or equivalent platforms.
- Assist in maintaining compliance with frameworks and standards including PCI-DSS, ISO27001, SOC 2, and internal security controls.
- Contribute to security hardening, access management, audit readiness, and operational risk reduction initiatives.
- Ensure operational processes and infrastructure controls align with organisational governance requirements.
- Act as a technical leader and mentor within the SRE and broader engineering teams.
- Contribute to engineering standards, operational best practices, and platform strategy.
- Influence reliability-focused engineering culture across teams.
- Collaborate effectively with cross-functional stakeholders including Engineering, Product, Security, Architecture, and external vendors.
- Support continuous improvement initiatives and foster a culture of accountability, learning, and operational excellence.
Qualifications
- 5+ years of experience in Site Reliability Engineering, DevOps Engineering, Platform Engineering, or related infrastructure roles.
- Strong hands-on experience operating production workloads within AWS cloud environments.
- Deep experience with infrastructure-as-code tools such as Terraform and/or CloudFormation.
- Strong experience designing and supporting CI/CD pipelines and modern software delivery practices.
- Strong understanding of distributed systems, microservices architecture, networking, and cloud-native technologies.
- Experience implementing observability and monitoring solutions across complex environments.
- Strong scripting and automation experience using Python, Bash, or similar languages.
- Experience managing production incidents and conducting structured root cause analysis.
- Strong understanding of system reliability, scalability, security, and operational best practices.
- Excellent analytical, troubleshooting, and problem-solving capabilities.
- Strong communication and stakeholder engagement skills.
- Ability to work effectively in fast-paced, agile, and collaborative engineering environments.
- Experience with Kubernetes, container orchestration, and platform engineering practices.
- Experience with Buildkite, GitHub Actions, GitLab CI, or equivalent CI/CD tooling.
- Exposure to service mesh, event-driven architectures, and distributed tracing.
- Experience supporting regulated environments and compliance frameworks such as PCI-DSS, ISO27001, or SOC 2.
- Experience with FinOps, cloud cost optimisation, and infrastructure performance tuning.
- Familiarity with security engineering and DevSecOps practices.
- Experience mentoring engineers or leading technical initiatives.
ZigZag is committed to building a diverse, inclusive, and equitable workplace. We believe that talent knows no borders, and we welcome individuals from all backgrounds to help us shape the future of work. Guided by transparency and agility, we foster an environment where everyone is valued and empowered to thrive.
By submitting this application, you acknowledge that you have read and agree with the company’s Privacy Policy.