Lead Site Reliability Engineer
2 weeks ago
Why Work For Us
Grubhub, part of Wonder Group Inc, is all about connecting hungry diners with our network of over 375,000 merchants nationwide. Innovative technology, user-friendly platforms and streamlined delivery capabilities set us apart and make us an industry leader in the world of online food ordering. When you join our team, you become part of a community that works together to innovate, solve problems, grow, work hard and have a ton of fun in the process.
The Impact You Will Make:
This role is crucial to simplifying the dining experience for students across the US. You will architect resilient, self-healing solutions, manage AWS infrastructure, close observability gaps, design scaling approaches, and shape incident management processes. Your work will span the entire development lifecycle, including building and maintaining CI/CD pipelines. You will also help ensure the platform scales to support Grubhub's continuously expanding customer base, as evidenced by the addition of 30 new campuses and a 25% year-over-year increase in order volume.
Duties will include, but are not limited to:
- Architecting resilient and self-healing solutions.
- Managing AWS infrastructure.
- Closing observability gaps.
- Designing scaling approaches.
- Shaping incident management processes.
- Building and maintaining CI/CD pipelines.
- Co-owning critical production service designs, ensuring their high reliability.
- Actively driving improvements in reliability and observability using SLOs and telemetry data.
- Developing and enhancing internal tools and automation software to effectively and safely maintain production services.
- Leading reliability-focused practices, including Failure Analysis, Load and Capacity Planning, Service Reviews, Architecture Design, and Incident Postmortems.
- Mentoring junior engineers.
What You Bring to the Table:
Experience:
5+ years of experience.
Technical Skills:
- Deep knowledge of CI/CD tools (e.g., Jenkins, GitHub Actions).
- Software engineering experience in Python, Go, or a similar object-oriented language.
- Proficiency with datastores (MySQL, Mongo, Cassandra, Redis) and message brokers (Kafka/SQS/RabbitMQ).
- Experience with Microservice Architecture and Application Design.
- Distributed monitoring experience, including SLOs, metrics, and tracing.
- Working knowledge of Kubernetes-based software solutions and their ecosystem.
- Working knowledge of Cloud technologies (AWS, Compute/Containers, Storage, Linux, networking).
Soft Skills:
- Strong technical writing, documentation, and communication skills.
- Experience with highly trafficked web-based services.
And Of Course, Perks
- Private Health Insurance fully covered for the employee
- New Parent Leave
- 20 Vacation Days annually
- 10Bis Card provided for office visits – includes a daily lunch allowance of 65 ILS
-
Site Reliability Engineering Manager
2 weeks ago
Tel Aviv, Tel Aviv, Israel JFrog Full time ₪120,000 - ₪180,000 per year
At JFrog, we're reinventing DevOps to help the world's greatest companies innovate -- and we want you along for the ride. This is a special place with a unique combination of brilliance, spirit and just all-around great people. Here, if you're willing to do more, your career can take off. And since software plays a central role in everyone's lives, you'll be...
-
Senior Site Reliability Engineer
4 days ago
Tel Aviv, Tel Aviv, Israel Aerospike Full time ₪900,000 - ₪1,200,000 per year
Aerospike is the real-time database for mission-critical use cases and workloads, including machine learning, generative, and agentic AI. Aerospike powers millions of transactions per second with millisecond latency, at a fraction of the total cost of ownership compared to other databases. Global leaders, including Adobe, Airtel, Barclays, Criteo, DBS Bank,...
-
Senior Site Reliability Engineer
2 days ago
Tel Aviv, Tel Aviv, Israel Aerospike Full time ₪120,000 - ₪180,000 per year
Aerospike is the real-time database for mission-critical use cases and workloads, including machine learning, generative, and agentic AI. Aerospike powers millions of transactions per second with millisecond latency, at a fraction of the total cost of ownership compared to other databases. Global leaders, including Adobe, Airtel, Barclays, Criteo, DBS...
-
Site Reliability Engineer
2 weeks ago
Tel Aviv, Tel Aviv, Israel Shavit Software Full time ₪90,000 - ₪120,000 per year
We're Hiring: Site Reliability Engineer. Responsibilities: Ensure availability, reliability, and performance of cloud-based systems. Monitor, troubleshoot, and investigate incidents. Improve deployment, scaling, and self-healing processes. Manage full lifecycle of applications and systems through code. Work with Kubernetes and microservices-based environments. Write and...
-
Site Reliability Engineer
2 days ago
Tel Aviv, Tel Aviv, Israel Cato Networks Full time ₪120,000 - ₪180,000 per year
Welcome to the future of cloud networking and security. Cato Networks is the first company to converge enterprise networking and security into one centralized and global service that is delivered by cloud. It is led by networking and security pioneer Shlomo Kramer (Check Point, Imperva) and early investor (Palo Alto Networks, Exabeam, Trusteer and more)....
-
Site Reliability Engineer
2 days ago
Tel Aviv, Tel Aviv, Israel Wiz Full time ₪90,000 - ₪120,000 per year
Come join the company that is reinventing cloud security and empowering businesses to thrive in the cloud. As the fastest-growing startup ever, Wiz is on a mission to help organizations secure cloud environments that will accelerate their businesses. Trusted by security teams all over the world, we have a proven track record of success and a culture that...
-
Site Reliability Engineer
2 weeks ago
Tel Aviv, Tel Aviv, Israel Finubit Full time ₪80,000 - ₪120,000 per year
About Finubit: Finubit is a fast-moving startup creating the bank's next-generation cloud platform: a modern, Kubernetes-native and AI-driven foundation that powers engineering for over a thousand developers. We're rethinking how banks build, deploy, and operate systems at scale, combining GitOps, ChatOps, and AI automation to enable...
-
Sr. Site Reliability Engineer
2 weeks ago
Tel Aviv, Tel Aviv, Israel Navan Full time $104,000 - $130,878 per year
At , we're building the next generation of AI-powered workforces. As a dedicated team within Navan, our mission is to advance the state of agentic AI. We are the builders of Navan Cognition: a multi-agent AI platform that has already transformed our internal operations by handling challenging, real-world business processes with a focus on reliability and...
-
Sr. Site Reliability Engineer
2 weeks ago
Tel Aviv, Tel Aviv, Israel Tripeur - a Navan company Full time $104,000 - $130,878 per year
At , we're building the next generation of AI-powered workforces. As a dedicated team within Navan, our mission is to advance the state of agentic AI. We are the builders of Navan Cognition: a multi-agent AI platform that has already transformed our internal operations by handling challenging, real-world business processes with a focus on reliability and...
-
Reliability Engineer
2 weeks ago
Tel Aviv, Tel Aviv, Israel Navan Full time ₪90,000 - ₪120,000 per year
At , we're building the next generation of AI-powered workforces. As a dedicated team within Navan, our mission is to advance the state of agentic AI. We are the builders of Navan Cognition: a multi-agent AI platform that has already transformed our internal operations by handling challenging, real-world business processes with a focus on reliability and...