Site Reliability Engineer
2 weeks ago
With more than 2,000 active customers, ControlUp is a leading digital employee experience improvement platform that offers an integrated monitoring, optimization, security, and compliance solution, transforming the way IT teams manage their environments and troubleshoot issues. Our solutions provide IT professionals with deep visibility and actionable insights, empowering them to proactively enhance end-user experiences while saving hundreds of thousands of dollars and valuable time for millions of active users.
Job Description:
We are seeking a highly skilled Site Reliability Engineer (SRE) to own production stability, system performance, financial operations (FinOps), and cost of goods sold (COGS) management in a large-scale environment. You will work closely with engineering, product, and customer teams to ensure our advanced technology stack is optimized to meet and exceed customer SLAs.
Key Responsibilities:
- Maintain and improve production stability across a large-scale infrastructure with thousands of Kubernetes nodes and instances.
- Monitor, analyze, and optimize system performance to ensure seamless user experience and SLA adherence.
- Implement and drive FinOps practices to manage cloud cost efficiency and cost of goods sold (COGS) effectively.
- Utilize ControlUp and other advanced monitoring/observability tools to proactively detect issues and ensure SLA compliance.
- Collaborate with development and operations teams to automate deployments, scaling, and incident response.
- Design and implement robust alerting, incident management, and post-mortem processes.
- Continuously evaluate and adopt cutting-edge technologies to improve reliability, performance, and cost efficiency.
- Provide technical guidance and best practices for infrastructure and application scalability.
- Participate in on-call rotations to respond to critical incidents and minimize downtime.
Requirements:
- Proven experience as an SRE or in a similar role in large-scale environments with thousands of Kubernetes nodes and instances.
- Strong expertise in Kubernetes, container orchestration, and cloud infrastructure (AWS, GCP, Azure, or similar).
- Solid understanding of performance tuning, monitoring, and observability tools (experience with ControlUp is a strong plus).
- Experience with FinOps principles and tools to manage cloud costs and optimize resource utilization.
- Deep knowledge of production incident management, root cause analysis, and SLA management.
- Proficiency in scripting and automation (Python, Go, Bash, etc.).
- Familiarity with CI/CD pipelines and infrastructure as code (Terraform, Helm, etc.).
- Excellent communication skills and ability to work collaboratively across teams.
-
IT Site Specialist
5 days ago
Azrieli Building G, Herzliya, Israel Samsung Electronics Full time ₪60,000 - ₪120,000 per year
Position Summary
Location: Samsung Research IL - SRIL - Herzliya
Department: IT
Reports To: Managing Director SRIL
Job Type: Full-Time, hired via a 3rd party outsource company.
Overview: The IT Site Specialist is the sole IT representative on-site, responsible for maintaining the reliability and performance of all local IT systems, infrastructure, and network...
-
Plant Electrical Engineer
7 days ago
Israel TAPI Full time ₪104,000 - ₪130,878 per year
Company Description
Join us for an exciting opportunity to take a key role in maintaining and operating the electrical infrastructure at TAPI, one of the most advanced active pharmaceutical ingredient (API) plants in the industry. In this role, you will be responsible for the reliable, safe, and efficient operation of complex electrical systems that support...
-
Manager, SRE Engage
5 days ago
Israel DocuSign Full time ₪120,000 - ₪240,000 per year
Company Overview
Docusign brings agreements to life. Over 1.5 million customers and more than a billion people in over 180 countries use Docusign solutions to accelerate the process of doing business and simplify people's lives. With intelligent agreement management, Docusign unleashes business-critical data that is trapped inside of documents. Until now,...
-
QA Test Automation Engineer – Engine Team
5 days ago
Israel Nayax Full time ₪60,000 - ₪120,000 per year
Join us at Nayax, a global fintech leader (NASDAQ; TASE: NYAX) revolutionizing the world of cashless payments, consumer engagement, and business management solutions, with more than 1,200 employees across 12 offices worldwide. At Nayax, you'll be part of a diverse and innovative community where your work makes a real impact and helps shape the future of...
-
Sr. Supplier Quality Engineer
2 weeks ago
Israel VAST Data Full time ₪120,000 - ₪180,000 per year
VAST Data is looking for a Senior Supplier Quality Engineer responsible for working with VAST's partners to ensure HW suppliers develop and implement world-class supplier quality programs. This is a great opportunity to be part of one of the fastest-growing infrastructure companies in history, an organization that is in the center of the hurricane being...
-
Director of Engineering
5 days ago
Naimi Park, Or Yehuda, Israel AudioCodes Full time ₪100,000 - ₪120,000 per year
Key Responsibilities:
- New Product Introduction (NPI) & R&D Transfer to production.
- Manage a team of ~10 engineers across Israel, China, US.
- Manage transition of hardware products from R&D to mass production, including documentation, HW readiness, SW readiness, test readiness, and supplier onboarding.
- Lead DFM reviews and feedback loops to improve...
-
Head of AI Solutions
2 weeks ago
Israel Kayhut Full time ₪120,000 - ₪180,000 per year
About The Position
About the Role
We are seeking a hands-on leader to drive the implementation of AI technologies that improve how developers work day to day. This role is at the intersection of technology, product, and organizational change. You will work closely with engineering managers and developers to embed AI tools into everyday workflows, integrate...
-
Data Center Chief Engineer
5 days ago
Israel Amazon Data Services Israel Full time $120,000 - $180,000 per year
DESCRIPTION
As Chief Engineer, you will be responsible for ensuring that all electrical, mechanical, and fire/life safety equipment within the data center is operating at peak efficiency. This involves planned preventative maintenance of equipment, daily corrective work, and emergency response. You are expected to be a singular focal point for all facility...
-
Production Engineer
4 days ago
Israel Spinomenal Full time ₪90,000 - ₪120,000 per year
Spinomenal is seeking a technically skilled, hands-on Production Engineer to own release success, system stability, and incident resolution in our fast‑paced iGaming environment. As the gatekeeper to production, you'll coordinate with QA, R&D, DevOps, and Support to ensure every launch is smooth, every issue is contained, and future problems are prevented....
-
QA Engineer
2 weeks ago
Israel West Pharmaceutical Services Full time ₪60,000 - ₪120,000 per year
Requisition ID: 71494
Date: Sep 29, 2025
Location: IL
Department: Quality
Description:
Job Summary
This is a backfill for maternity leave. In this role, you will be responsible to control all Quality functions including the expansion and maintenance of the Quality System, provide support for product and process improvements by collecting, compiling, and analyzing...