Senior Site Reliability Engineer
6 days ago
Why Work For Us
Grubhub, part of Wonder Group Inc, is all about connecting hungry diners with our network of over 375,000 merchants nationwide. Innovative technology, user-friendly platforms and streamlined delivery capabilities set us apart and make us an industry leader in the world of online food ordering. When you join our team, you become part of a community that works together to innovate, solve problems, grow, work hard and have a ton of fun in the process.
About the Opportunity:
Grubhub, a leader in connecting diners with restaurants nationwide, is seeking a Senior Site Reliability Engineer to join our Campus and On-Site team. This role is crucial for simplifying the dining experience for students across the US. You will be instrumental in architecting resilient and self-healing solutions, managing AWS infrastructure, closing observability gaps, designing scaling approaches, and shaping incident management processes. Your contributions will span the entire development lifecycle, encompassing the building and maintenance of CI/CD pipelines. Collaboration with other SRE teams is vital for guidance, knowledge sharing, and fostering camaraderie.
About the Team:
Our On-Site SRE team is dedicated to building more resilient and self-healing solutions. You'll contribute to managing AWS infrastructure, addressing observability challenges, designing scalable systems, and refining incident management processes. We emphasize close collaboration with other SRE teams for mutual support, knowledge exchange, and team spirit. You will also partner with service owners to design and build robust CI/CD pipelines and contribute to the long-term architectural vision of our products.
The Day to Day:
As an SRE within the "Runtime Engineering" organization, you will co-own critical production service designs, ensuring their high reliability. You will actively drive improvements in reliability and observability using SLOs and telemetry data. Your responsibilities include developing and enhancing internal tools and automation software to effectively and safely maintain production services. You will also lead reliability-focused practices, including Failure Analysis, Load and Capacity Planning, Service Reviews, Architecture Design, and Incident Postmortems. As a senior engineer, you will also be responsible for mentoring junior engineers.
What You'll Need:
- Experience:
- Senior SRE: 4+ years of experience
- SRE II: 2+ years of experience
- Technical Skills:
- Deep knowledge of CI/CD tools (e.g., Jenkins, GitHub Actions).
- Software engineering experience in Python, Go, or a similar object-oriented language.
- Proficiency with datastores (MySQL, Mongo, Cassandra, Redis) and message brokers (Kafka/SQS/RabbitMQ).
- Experience with Microservice Architecture and Application Design.
- Distributed monitoring experience, including SLOs, metrics, and tracing.
- Working knowledge of Kubernetes-based software solutions and their ecosystem.
- Working knowledge of cloud technologies (AWS, Compute/Containers, Storage, Linux, networking).
- Soft Skills:
- Strong technical writing, documentation, and communication skills.
- Experience with highly trafficked web-based services.
About Our Tech:
The On-Site tech stack primarily uses Python, with some services written in Go, for tooling, automation, and service code. Django is our primary web framework. For monitoring, we use New Relic and Splunk. Our infrastructure is built with Infrastructure as Code (IaC) using Terraspace (wrapped around Terraform). Our services run on Kubernetes, deployed via Helm. Our cloud technologies encompass various AWS services, including EKS, S3, ElastiCache, and Lambda. Data technologies include MongoDB, MySQL (RDS), Redis (ElastiCache), RabbitMQ, and Kafka. CI/CD is managed through Jenkins. The On-Site tech stack handles a significant portion of Grubhub's daily orders and is rapidly growing. Your role will be pivotal in ensuring the platform's scalability to support our continuously expanding customer base, evidenced by the addition of 30 new campuses and a 25% year-over-year increase in order volume.
Perks:
We offer flexible PTO, comprehensive health programs, abundant opportunities for learning and career growth, and engaging events led by our Culture Crew. Grubhub is an equal opportunity employer committed to diversity and inclusion. We value innovation, problem-solving, calculated risk-taking, hard work, and, most importantly, having a lot of fun.