Lead Site Reliability Engineer

2 weeks ago


Tel Aviv, Tel Aviv, Israel Grubhub Full time

Why Work For Us

Grubhub, part of Wonder Group Inc, is all about connecting hungry diners with our network of over 375,000 merchants nationwide. Innovative technology, user-friendly platforms and streamlined delivery capabilities set us apart and make us an industry leader in the world of online food ordering. When you join our team, you become part of a community that works together to innovate, solve problems, grow, work hard and have a ton of fun in the process.

The Impact You Will Make:

This role is crucial for simplifying the dining experience for students across the US. You will architect resilient, self-healing solutions, manage AWS infrastructure, close observability gaps, design scaling approaches, and shape incident management processes. Your work will span the entire development lifecycle, including building and maintaining CI/CD pipelines. You will also help ensure the platform scales to support Grubhub's continuously expanding customer base, as evidenced by the addition of 30 new campuses and a 25% year-over-year increase in order volume.

Duties will include, but are not limited to:

  • Architecting resilient and self-healing solutions.
  • Managing AWS infrastructure.
  • Closing observability gaps.
  • Designing scaling approaches.
  • Shaping incident management processes.
  • Building and maintaining CI/CD pipelines.
  • Co-owning critical production service designs, ensuring their high reliability.
  • Actively driving improvements in reliability and observability using SLOs and telemetry data.
  • Developing and enhancing internal tools and automation software to effectively and safely maintain production services.
  • Leading reliability-focused practices, including Failure Analysis, Load and Capacity Planning, Service Reviews, Architecture Design, and Incident Postmortems.
  • Mentoring junior engineers.

What You Bring to the Table:

Experience:

5+ years of experience.

Technical Skills:

  • Deep knowledge of CI/CD tools (e.g., Jenkins, GitHub Actions).
  • Software engineering experience in Python, Go, or a similar object-oriented language.
  • Proficiency with datastores (MySQL, MongoDB, Cassandra, Redis) and message brokers (Kafka, SQS, RabbitMQ).
  • Experience with Microservice Architecture and Application Design.
  • Distributed monitoring experience, including SLOs, metrics, and tracing.
  • Working knowledge of Kubernetes-based software solutions and their ecosystem.
  • Working knowledge of Cloud technologies (AWS, Compute/Containers, Storage, Linux, networking).

Soft Skills:

  • Strong technical writing, documentation, and communication skills.
  • Experience with highly trafficked web-based services.

And Of Course, Perks

  • Private Health Insurance fully covered for the employee
  • New Parent Leave
  • 20 Vacation Days annually
  • 10Bis Card provided for office visits – includes a daily lunch allowance of 65 ILS

