Sr. Site Reliability Engineer
2 weeks ago
We're building the next generation of AI-powered workforces. As a dedicated team within Navan, our mission is to advance the state of agentic AI. We are the builders of Navan Cognition: a multi-agent AI platform that has already transformed our internal operations by handling challenging, real-world business processes with a focus on reliability and accuracy. Now, we're taking the next step by opening this technology up to other companies.
Joining our team means joining the frontline of AI innovation, crafting the foundation for a rapidly unfolding, AI-powered business era.
What You'll Do:
- Design, build, and support tooling, automation, and infrastructure to maximize the reliability, scalability, and performance of Navan Cognition.
- Proactively identify, mitigate, and resolve issues, leveraging AI-driven insights and automation where possible.
- Develop robust monitoring, alerting, and incident response strategies; ensure actionable observability across all critical systems.
- Drive best practices in CI/CD, Infrastructure-as-Code, environment provisioning, and disaster recovery.
- Collaborate closely with engineering teams to build, deploy, and maintain highly available services in production.
- Take responsibility for uptime, reliability, and the operational excellence of Navan Cognition.
- Help define and measure SLOs/SLAs to ensure world-class service delivery.
What We're Looking For:
- 3+ years in Site Reliability, DevOps, or related Infrastructure Engineering roles in 24/7 production environments.
- Deep experience operating, automating, and supporting distributed systems on AWS or similar clouds.
- Experience with Infrastructure-as-Code (e.g., Terraform, CloudFormation) and CI/CD tooling (e.g., Jenkins, GitHub Actions).
- Strong skills in Python, Bash, or comparable scripting languages for automation.
- Hands-on experience with observability stacks (e.g., New Relic, Grafana, CloudWatch, Datadog) and incident response.
- Familiarity with microservices architectures and patterns for resilience/scalability (e.g., throttling, retries, circuit breakers).
- Experience with common data stores (MySQL/RDS, DocumentDB, Elasticsearch, Redis).
- Working knowledge of backends (bonus: performance optimization and monitoring); experience with Java, Python, or Go is a plus.
- Interest or experience in applying AI for infrastructure automation, monitoring, or optimization (a strong plus).
- A collaborative mindset with strong communication skills, able to work independently and comfortably across teams and disciplines.
- Thrives in a fast-paced, high-growth environment and is ready to tackle complex system challenges at scale.
- Data-driven, analytical thinker with the ability to dive into metrics, identify insights, and drive product improvements.
- Startup-ready: thrives in fast-paced, ambiguous environments, with a bias for learning, action, and innovation.
Tel Aviv, Tel Aviv, Israel | Tripeur - a Navan company | Full time | $104,000 - $130,878 per year