Senior Site Reliability Engineer
2 days ago
Aerospike is the real-time database for mission-critical use cases and workloads, including machine learning, generative, and agentic AI. Aerospike powers millions of transactions per second with millisecond latency, at a fraction of the total cost of ownership compared to other databases.
Global leaders, including Adobe, Airtel, Barclays, Criteo, DBS Bank, Experian, Grab, HDFC Bank, PayPal, Sony Interactive Entertainment, The Trade Desk, and Wayfair, rely on Aerospike for customer 360, fraud detection, real-time bidding, profile stores, recommendation engines, and other use cases.
At Aerospike, we dream big and deliver even bigger. Our mission is to unleash the power of the world's real-time data with a database built for infinite scale, speed, and sustainability.
If you're ready to shape the future of data, join us.
Senior Site Reliability Engineer
As a Senior Site Reliability Engineer (SRE) for Aerospike, you will be instrumental in designing, building, and optimizing a scalable, highly resilient cloud platform. You will focus on improving reliability, performance, and automation to ensure seamless delivery and operation of our cloud platform services. Your responsibilities will include developing robust infrastructure, implementing intelligent monitoring systems, and driving continuous improvement initiatives that enhance system efficiency, scalability, and overall platform stability.
Key Responsibilities
- Designing, deploying, and optimizing large-scale Aerospike cloud platform infrastructure and services across multiple environments
- Leading the development and enhancement of automation and infrastructure-as-code solutions to improve operational efficiency
- Building and maintaining monitoring, alerting, and observability implementations to proactively detect and resolve system issues
- Leading incident response activities, conducting post-mortems, and driving continuous improvement initiatives
- Designing and enforcing security best practices for cloud infrastructure and access control
- Collaborating with development teams to ensure reliable service delivery and alignment with SRE best practices
- Participating in the on-call rotation, responding to critical incidents, and minimizing downtime through proactive mitigation strategies
- Establishing documentation standards, runbooks, and system configurations for team knowledge sharing
- Leading capacity planning and performance optimization efforts
- Mentoring junior engineers and sharing knowledge to build team capabilities
 
Required Skills and Qualifications
- 6 years of experience in Site Reliability Engineering (SRE), DevOps, or related fields, with a focus on building scalable, resilient, and automated cloud-based systems
- Hands-on experience designing, deploying, and optimizing production-grade, business-critical systems in cloud environments
- Expertise with at least one major public cloud provider (AWS, Google Cloud, or Azure), including cloud-native services and architectures
- Strong proficiency in infrastructure-as-code (IaC) tools such as Terraform to enable automated and reproducible infrastructure
- Experience in CI/CD pipeline design and implementation, enabling seamless, automated software delivery and infrastructure updates
- Deep understanding of Linux/Unix systems, networking fundamentals, and distributed system architectures
- Proficiency in scripting and software development using Python, Bash, or Go to build automation, tooling, and infrastructure enhancements
- Experience with containerization and orchestration technologies such as Docker and Kubernetes for efficient service deployment and scaling
- Hands-on experience with monitoring, logging, and observability tools (e.g., Prometheus, Grafana, Datadog, Elasticsearch, Kibana) to drive data-driven system improvements
- Strong problem-solving skills with an engineering-first mindset for improving system reliability, scalability, and performance
- Experience implementing security best practices for cloud infrastructure, access control, and data protection
- Excellent English communication skills (verbal and written) to collaborate effectively across teams and document key processes
 
Preferred Skills and Qualifications
- Hands-on experience managing and optimizing database deployments and services in production environments, ensuring high availability and performance
- Familiarity with Aerospike or other distributed NoSQL databases
- Advanced understanding of security practices and implementation in cloud environments
- Relevant industry certifications, such as AWS Certified DevOps Engineer, AWS Certified Solutions Architect, Google Professional Cloud DevOps Engineer, or equivalent
- Kubernetes certifications such as Certified Kubernetes Administrator (CKA), Certified Kubernetes Application Developer (CKAD), or Certified Kubernetes Security Specialist (CKS)
- Proficiency with configuration management tools (Ansible, Terraform, or similar) in complex environments
- Experience leading collaborative development practices and advanced version control workflows
 
Aerospike is an Equal Opportunity Employer. We are committed to providing an environment free from discrimination on the basis of race, religion, color, sex, gender identity, sexual orientation, age, non-disqualifying physical or mental disability, national origin, veteran status, or any other basis covered by appropriate law.