72 Condition Monitoring jobs in Ireland
Site Reliability Engineer

Posted 27 days ago
Job Description
**About Trellix:**
**Trellix, the trusted CISO ally, is redefining the future of cybersecurity and soulful work.** Our comprehensive, GenAI-powered platform helps organizations confronted by today's most advanced threats gain confidence in the protection and resilience of their operations. Along with an extensive partner ecosystem, we accelerate technology innovation through artificial intelligence, automation, and analytics to empower over 53,000 customers with responsibly architected security solutions.
We also recognize the importance of closing the 4-million-person cybersecurity talent gap. We aim to create a home for anyone seeking a meaningful future in cybersecurity and look for candidates across industries to join us in soulful work. More at .
**_Role Overview:_**
We are seeking a talented Site Reliability Engineer (SRE) to join the dynamic Trellix EDR Cloud DevOps and SRE team. As a key member of our engineering team, you will be responsible for designing, developing, and deploying robust and scalable software solutions. You will work closely with cross-functional teams (Engineering, DevOps, and SRE) to deliver innovative solutions that meet our business objectives. You will also build and deliver world-class CI/CD pipelines that support a highly scalable and secure cloud environment, and you will support cloud service operations, deployments, and security.
**Responsibilities:**
+ **Operations:**
  + Be part of a global team providing operational and escalation coverage, including event response and recovery efforts for critical services.
  + Periodically deploy features, patches, and hotfixes to maintain the security posture of our cloud services.
  + Take ownership of and responsibility for the high availability of production environments.
  + Work in shifts on a rotational basis and participate in on-call duties.
  + Assist with creating and updating runbooks and SOPs.
  + Contribute to the monitoring of systems, applications, and supporting data.
  + Report on system uptime and availability.
+ **Software Development:**
  + Design, develop, and maintain secure, high-quality CI/CD pipelines following industry best practices.
  + Design, develop, and maintain Infrastructure as Code (IaC) platforms for efficient build and management of infrastructure on public cloud platforms.
  + Build, deploy, and manage applications in public cloud environments using containerization technologies such as Docker and Kubernetes.
  + Contribute to the design and implementation of a microservices architecture, breaking down complex systems into smaller, independent services.
  + Possess a basic understanding of dashboard design principles and tools to visualize data effectively.
  + Experience using modern monitoring and alerting tools (Prometheus, Grafana, PagerDuty, etc.).
+ **Problem-Solving:**
  + Troubleshoot and debug complex software issues, leveraging your strong problem-solving skills.
  + Conduct thorough root cause analysis, analyze system logs, and collaborate with cross-functional teams to implement robust solutions.
+ **Collaboration & Learning:**
  + Work closely with product managers, designers, and other engineers to translate business requirements into technical solutions.
  + Work effectively with teams across the organization to deliver projects on time and within budget.
  + Stay up to date with the latest technologies and industry trends to drive innovation.
**Qualifications:**
+ **Technical Skills:**
  + 3-4 years of experience in software development, with strong proficiency in Python or Go.
  + Experience with cloud platforms (AWS or GCP) and containerization technologies (Docker, Kubernetes) is essential.
+ **Problem-Solving:** **A keen eye for detail and a knack for troubleshooting complex issues.**
+ **Communication:** **Excellent communication and collaboration skills to work effectively with diverse teams.**
+ **Learning Agility:** **A passion for learning and a drive to stay up-to-date with emerging technologies.**
**Preferred Qualifications:**
+ **Automation:** **Experience with CI/CD pipelines and automation tools (e.g., Jenkins, GitLab CI/CD).**
+ **Certifications:** **AWS and/or Kubernetes Certification**
+ **Database:** **Knowledge of database systems (SQL and NoSQL) for data storage and retrieval.**
+ **Security:** **Understanding of security best practices to build secure applications.**
+ **DevOps:** **Exposure to DevOps principles and practices to streamline development and deployment processes.**
**Education:** **Bachelor's degree in Computer Science or a related field.**
**_Company Benefits and Perks:_**
We believe that the best solutions are developed by teams who embrace each other's unique experiences, skills, and abilities. We work hard to create a dynamic workforce where we encourage everyone to bring their authentic selves to work every day. We offer a variety of social programs, flexible work hours and family-friendly benefits to all of our employees.
+ Retirement Plans
+ Medical, Dental and Vision Coverage
+ Paid Time Off
+ Paid Parental Leave
+ Support for Community Involvement
We're serious about our commitment to a workplace where everyone can thrive and contribute to our industry-leading products and customer support, which is why we prohibit discrimination and harassment based on race, color, religion, gender, national origin, age, disability, veteran status, marital status, pregnancy, gender expression or identity, sexual orientation or any other legally protected status.
Sr Site Reliability Engineer
Posted 3 days ago
Job Description
Site Reliability Engineers must be passionate about learning and evolving with current technology trends. They strive to innovate and are relentless in pursuing a flawless customer experience. They have an "automate everything" mindset, helping us bring value to our customers by deploying services with incredible speed, consistency, and availability.
**Job Responsibilities:**
+ Engage in and improve the lifecycle of services from conception to EOL, including system design consulting and capacity planning
+ Define and implement standards and best practices related to system architecture, service delivery, metrics, and the automation of operational tasks
+ Support services, product, and engineering teams by providing common tooling and frameworks to deliver increased availability and improved incident response
+ Improve system performance, application delivery, and efficiency through automation, process refinement, postmortem reviews, and in-depth configuration analysis
+ Collaborate closely with engineering professionals within the organization to deliver reliable services
+ Increase operational efficiency, effectiveness, and quality of services by treating operational challenges as a software engineering problem (reduce toil)
+ Guide junior team members and serve as a champion for Site Reliability Engineering
+ Actively participate in incident response, including on-call responsibilities
**Required Qualifications:**
+ Must have at least 3 years of hands-on experience working in Engineering or Cloud
+ Minimum 2 years' experience with public cloud platforms (e.g. GCP, AWS, Azure)
+ Minimum 2 years' experience in the configuration and maintenance of applications and/or systems infrastructure for a large-scale, customer-facing company
+ Experience coding in higher-level languages (e.g., Python, JavaScript, C++, or Java)
**Preferred Qualifications:**
+ Knowledge of Cloud based applications & Containerization Technologies
+ Demonstrated understanding of best practices in metric generation and collection and in log aggregation pipelines
+ Demonstrable fundamentals in 2 of the following: Computer Science, Cloud architecture, Security, or Network Design
**Where we're going**
UKG is on the cusp of something truly special. Worldwide, we already hold the #1 market share position for workforce management and the #2 position for human capital management. Tens of millions of frontline workers start and end their days with our software, with billions of shifts managed annually through UKG solutions today. Yet it's our AI-powered product portfolio designed to support customers of all sizes, industries, and geographies that will propel us into an even brighter tomorrow!
UKG is proud to be an equal opportunity employer and is committed to promoting diversity and inclusion in the workplace, including the recruitment process.
Disability Accommodation in the Application and Interview Process
For individuals with disabilities that need additional assistance at any point in the application and interview process, please email
NOTICE ON HIRING SCAMS
UKG will never ask you for a copy of your driver's license, social security card, or passport during a job interview. For new hires, we do not ask for payment for equipment purchase, cost for training, or to receive onboarding documents. UKG does not make job offers outside of our formal hiring process. To help protect yourself against potential hiring scams, learn more about our formal hiring process, outlined here.
ABOUT OUR JOB DESCRIPTIONS
All job descriptions are written to accurately reflect the open job and include general work responsibilities. They do not present a comprehensive, detailed inventory of all duties, responsibilities, and qualifications required for the job. Management reserves the right to revise the job or require that other or different tasks be performed if or when circumstances change.
It is the policy of Ultimate Software to promote and assure equal employment opportunity for all current and prospective Peeps without regard to race, color, religion, sex, age, disability, marital status, familial status, sexual orientation, pregnancy, genetic information, gender identity, gender expression, national origin, ancestry, citizenship status, veteran status, and any other legally protected status entitled to protection under federal, state, or local anti-discrimination laws. This policy governs all matters related to recruitment, advertising, and initial selection of employment. It shall also apply to all other aspects of employment, including, but not limited to, compensation, promotion, demotion, transfer, lay-offs, terminations, leave of absence, and training opportunities.
Staff Site Reliability Engineer

Posted 27 days ago
Job Description
This is an exciting opportunity for someone who is passionate about driving innovation, enhancing service reliability, and making a tangible impact on the organization's success.
**What you get to do in this role:**
+ Provide relief and sustainable resolution to issues within our infrastructure.
+ Use your knowledge and experience in software development, systems engineering, and networking to proactively prevent repeatable issues.
+ Lead internal stakeholders and partner teams to improve the reliability, scalability and performance of the infrastructure through improved system design.
+ Champion and contribute to a culture of intolerance to manual activity, which results in an automation environment delivering repeatable and scalable response to system issues.
**To be successful in this role you have:**
+ Experience in leveraging or critically thinking about how to integrate AI into work processes, decision-making, or problem-solving. This may include using AI-powered tools, automating workflows, analyzing AI-driven insights, or exploring AI's potential impact on the function or industry.
+ Excellent knowledge of Linux systems.
+ Comfortable designing, authoring, testing, and debugging code in a team setting in at least one language such as Python, Go, Java, or Ruby.
+ Experience working with systems at scale - supporting critical services with focus on automation, observability, availability, and performance.
+ Experience with MySQL and PostgreSQL database administration, troubleshooting, and performance tuning.
+ Develop and maintain telemetry and monitoring solutions using OpenTelemetry standards to gain deep insights into system behaviour, proactively address issues, optimise performance, and improve efficiency through comprehensive data collection, analysis, and visualisation.
+ Proven experience in defining and managing SLAs.
+ Collaborate with development teams to ensure new services align with architectural standards and best practices.
Good to have:
+ Expertise in Observability and Monitoring of applications, services, and networks at scale.
+ Experience with DevOps automation, CI/CD pipelines such as GitLab CI/CD, and agile methodologies.
+ Experience writing test specifications and an understanding of the fundamentals of test automation.
+ Experience working with Cloud technologies such as Azure and AWS.
+ Experience in configuration management of infrastructure using Ansible.
+ Experience with Kubernetes to orchestrate the deployment, scaling, and management of containers.
+ Hands-on experience with Microsoft Azure, Google Cloud (GCP) and Amazon Web Services (AWS), including designing, implementing, and maintaining reliable and scalable systems.
We also have pluses! They are not a 'must', but please highlight them on your resume if you have any of these: experience with cloud engineering, knowledge of core AI/ML techniques and algorithms, familiarity with implementing chaos engineering principles, experience with incident response processes, post-mortem practices, or service best-practice standards, and web application engineering.
**What you can expect from us:**
At ServiceNow, we make work better for everyone, including our own employees. We know that your best work happens when you live your best life and share your unique talents, so we do everything we can to make that possible for our employees. Win as a Team is part of our culture, and we aspire to wow our customers. We stay hungry and humble and focus on creating belonging. Sustainability, inclusivity, and diversity are key focus areas within our business framework so that we have transparency, equity, and accountability to deliver meaningful, measurable change. With our vision and dedication to a better future already underway, join us on this journey!
In addition to a competitive salary, supportive teams, and a real opportunity to progress in your career with a forward-thinking organisation, we provide resources to help you and your loved ones be well. From benefits plans and programs, to mental health resources that offer coaching and 24/7 support, to family support resources and parental leave programs, we want to help you take care of yourself and your loved ones. Below is a glimpse into even more of our offerings, or click here for a full list. Along with holidays, we have company-wide designated global well-being days where everyone is off and can spend time doing what matters most.
+ Good working culture to support the balance you need in both work and life.
+ Parental leave programs.
+ Childcare and caregiving benefits.
+ A learning experience platform built using our own technology, to support your learning and development goals as well as a tuition reimbursement program.
+ A global, cross-functional mentoring program.
+ We also have team building activities, various employee belonging groups, volunteering, and community outreach programs.
**Work Personas**
We approach our distributed world of work with flexibility and trust. Work personas (flexible, remote, or required in office) are categories that are assigned to ServiceNow employees depending on the nature of their work and their assigned work location. Learn more here. To determine eligibility for a work persona, ServiceNow may confirm the distance between your primary residence and the closest ServiceNow office using a third-party service.
**Equal Opportunity Employer**
ServiceNow is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, creed, religion, sex, sexual orientation, national origin or nationality, ancestry, age, disability, gender identity or expression, marital status, veteran status, or any other category protected by law. In addition, all qualified applicants with arrest or conviction records will be considered for employment in accordance with legal requirements.
**Accommodations**
We strive to create an accessible and inclusive experience for all candidates. If you require a reasonable accommodation to complete any part of the application process, or are unable to use this online application and need an alternative method to apply, please contact for assistance.
**Export Control Regulations**
For positions requiring access to controlled technology subject to export control regulations, including the U.S. Export Administration Regulations (EAR), ServiceNow may be required to obtain export control approval from government authorities for certain individuals. All employment is contingent upon ServiceNow obtaining any export license or other approval that may be required by relevant export control authorities.
From Fortune. ©2025 Fortune Media IP Limited. All rights reserved. Used under license.
Principal Site Reliability Engineer

Posted 27 days ago
Job Description
OCI Incident Response is the first line of defense for maintaining the high availability of Oracle's cloud. We make customer-impacting events shorter, less frequent, and less impactful by providing large-scale incident management. We are front-and-center in driving down event duration by utilizing our operational experience, knowledge of best practices, and ability to develop tools to automate incident management.
We are looking for a Principal SRE to join our OCI team. This role is part of a globally distributed team responsible for detecting, triaging, and mitigating OCI service-impacting events as quickly as possible. You will be a part of one of these regional teams and be responsible for minimizing the downtime of OCI services. You will achieve this through delivering excellent major incident management and by operating systems with high scalability, performance, and security that prevent incidents from occurring.
Oracle's Cloud is state-of-the-art and constantly evolving. When it experiences issues, your team will respond within minutes to ensure customer impact is mitigated. This experience will expose you to the inner workings of OCI's systems and organizations. You will interact with and influence leaders from across the Oracle business and will drive broad cross-organization programs meant to iteratively improve OCI-wide service availability. We are an agile team with significant impact. If you want to be a part of a fast-moving team breaking new ground, we would like to speak with you!
Career Level - IC4
**Responsibilities**
+ Solve complex problems related to infrastructure cloud services and automate common tasks to enable continuous availability with minimal human overhead
+ Command and coordinate SMEs and Service leaders to restore service as quickly as possible during Major Incidents while keeping accurate and timely data on the progress of such incidents
+ Utilize a deep understanding of cloud computing design patterns and their dependencies to mitigate complex Major Incidents.
+ Embed a methodical approach to troubleshoot large, complex, interconnected systems used in Incident Detection & Orchestration
+ Document pertinent information relating to incidents that aids process improvement, identifies deviations, and enables the creation of an Incident Knowledge Base
+ Monitor and evaluate high-level service and infrastructure dashboards and take action to address identified anomalies
+ Identify opportunities for, and take ownership of, automation and/or continuous improvement of Incident Management process steps and best practices
+ Define and document the technical architecture of large-scale distributed systems.
+ Understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of production services.
+ Own the design and delivery of the mission-critical stack, with a focus on security, resiliency, scale, and performance.
+ Partner with development teams in defining operational requirements for product roadmaps.
+ Articulate technical characteristics of services and technology areas and guide Development Teams to engineer and add premier capabilities to the Oracle Cloud service portfolio.
+ Act as ultimate escalation point for complex or critical issues that have not yet been documented as Standard Operating Procedures (SOPs).
**Minimum Qualifications**
+ Bachelor's degree or higher in Computer Science or relevant work experience.
+ 5+ years of experience in Site Reliability Engineering, DevOps, or Systems Engineering.
+ Must have public cloud operations experience (e.g., AWS, Azure, GCP, OCI).
+ Extensive experience with Major Incident Management in a cloud-based environment.
+ Demonstrate clear understanding of automation and orchestration principles.
+ Experience with at least one modern object-oriented programming language.
+ Experience with professional software engineering standard methodologies such as Agile project management, coding standards, code reviews, source control management, build processes, testing, and operations.
+ Familiarity with infrastructure automation tools such as Chef, Ansible, Jenkins, and Terraform.
+ Strong expertise with several of the following technologies: Infrastructure-as-a-Service, CI/CD systems, Docker, RESTful APIs, log analysis tools, and debugging tools.
**Preferred Qualifications**
+ Strong leadership, project planning, communication, and execution skills
+ Strong analytic and problem-solving skills.
+ Proven track record of leading high blast-radius Major Incidents in cloud-based platforms.
+ Ability to handle multiple competing priorities in a fast-paced environment.
+ Ability to communicate clearly with technical and non-technical stakeholders at all levels.
+ Confidence to drive and manage large conference calls.
+ Experience with distributed service-oriented architectures
**About Us**
As a world leader in cloud solutions, Oracle uses tomorrow's technology to tackle today's challenges. We've partnered with industry leaders in almost every sector, and continue to thrive after 40+ years of change by operating with integrity.
We know that true innovation starts when everyone is empowered to contribute. That's why we're committed to growing an inclusive workforce that promotes opportunities for all.
Oracle careers open the door to global opportunities where work-life balance flourishes. We offer competitive benefits based on parity and consistency and support our people with flexible medical, life insurance, and retirement options. We also encourage employees to give back to their communities through our volunteer programs.
We're committed to including people with disabilities at all stages of the employment process. If you require accessibility assistance or accommodation for a disability at any point, let us know by emailing or by calling +1 in the United States.
Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans' status, or any other characteristic protected by law. Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.
Senior Network Reliability Engineer

Posted 27 days ago
Job Description
**About Us:**
At Oracle Cloud Infrastructure (OCI), we're building the future of cloud technology for enterprises. As a team of innovative, diverse creators and engineers, we operate with the agility of a startup, but the scale and customer-first mindset of the leading enterprise software company in the world. We thrive on equity, inclusion, and respect for all, and are deeply committed to creating a positive impact in everything we do. Our values shape the way we work, from delivering excellent products to fostering an environment of continuous learning and career growth.
We are looking for passionate and driven professionals to join our dynamic team where autonomy and collaboration are key to delivering outstanding results. Here, you'll have the support and freedom to excel and push the boundaries of what's possible.
You will be part of a fast-paced, innovative team responsible for swiftly responding to network disruptions, identifying root causes, and collaborating with both internal and external stakeholders to restore services. Your work will also focus on automating daily operations, improving workflow efficiency, and optimizing network performance. With OCI's expansive global footprint, you will manage hundreds of thousands of network devices across a mix of dedicated backbone infrastructure, Clos networks, and the internet.
**Preferred Qualifications:**
+ **Education & Experience:**
  + Experience working in a large-scale **ISP** or **cloud provider** environment, supporting global network infrastructure.
  + Prior experience in a **network operations** role, with a proven track record of handling complex network events.
+ **Technical Skills:**
  + Strong proficiency in **network protocols** and services, including **MPLS, BGP, OSPF, IS-IS, TCP/IP, IPv4/IPv6, DNS, DHCP, VXLAN, and EVPN**.
  + Extensive experience with **network automation**, scripting, and data center design. **Python** is preferred, though expertise in other scripting or compiled languages is a plus.
  + Hands-on experience with **network monitoring and telemetry solutions**, with the ability to leverage these tools to drive improvements in network reliability.
  + Familiarity with **network modeling and programming**, including **YANG, OpenConfig, and NETCONF**.
+ **Problem-Solving and Collaboration:**
  + Ability to apply **engineering principles** to resolve complex network issues, collaborating across teams to deliver effective solutions.
  + Strong **communication skills**, both written and verbal, with the ability to present technical information clearly to both technical and non-technical stakeholders.
  + Demonstrated experience in influencing product roadmap decisions, priorities, and feature development through sound judgment and technical expertise.
Career Level - IC3
**Responsibilities**
**What You'll Do:**
+ **Support and Operate OCI's Global Network:** Design, deploy, and manage large-scale network solutions that power Oracle Cloud Infrastructure (OCI), ensuring reliability and performance at a global scale.
+ **Collaborate and Drive Change:** Use best practices and tools to develop and execute network changes safely. Work closely with cross-functional teams to continuously improve network performance.
+ **Incident Response and Troubleshooting:** Lead break-fix support for network events, provide escalation for complex issues, and perform post-event root cause analysis to prevent future disruptions.
+ **Automation and Efficiency:** Create and maintain scripts to automate routine network tasks, working with business units and teams to streamline operations and increase productivity.
+ **Mentorship and Knowledge Sharing:** Guide and mentor junior engineers, fostering a culture of collaboration, continuous learning, and technical excellence.
+ **Network Monitoring and Performance Analysis:** Collaborate with network monitoring teams to gather telemetry data, build dashboards, and set up alert rules to track network health and performance.
+ **Vendor Collaboration:** Work with network vendors and technical account teams to resolve network issues, qualify new firmware/operating systems, and ensure the network ecosystem's stability.
+ **On-Call Support:** Participate in the on-call rotation to provide after-hours support for critical network events, ensuring that operational excellence is maintained 24/7.
IBM Cloud Site Reliability Engineer
Posted 4 days ago
Job Description
Software Developers at IBM are the backbone of our strategic initiatives to design, code, test, and provide industry-leading solutions that make the world run today. At IBM, you will use the latest software development tools, techniques and approaches and work with leading minds in the industry to build solutions you can be proud of.
Are you passionate about technology? Do you love building new things? Do you want to develop the future of IBM's Cloud offerings? If you answered YES, then we have the right opportunity for you!
The shift toward the consumption of IT as a service, i.e., the cloud, is one of the most important changes to happen to our industry in decades. At IBM, we are driven to shift our technology to an as-a-service model and to help our clients transform themselves to take full advantage of the cloud. With industry leadership in analytics, security, commerce, and cognitive computing and with unmatched hardware and software design and enterprise reach, no other company is as well positioned to address the full opportunity of cloud computing.
We are looking for a dynamic Site Reliability Engineer who is responsive to market needs to join our Cloud IaaS Operations Team in Dublin, Ireland, and deliver value to our clients in a fast-changing cloud landscape. The SRE team is dedicated to ensuring that IBM Cloud is at the forefront of cloud technology, from data center design, storage and network architecture, and compute clusters to flexible infrastructure services. We are building IBM's next-generation cloud platform to deliver performance and predictability for our customers' most demanding workloads, at global scale and with leadership efficiency, resiliency, and security. It is an exciting time, and as a team we are driven by this incredible opportunity to thrill our clients.
This will be a shift rotation position - You will work Sunday to Thursday or Tuesday to Saturday rotation.
**Your role and responsibilities**
In this Site Reliability Engineer role, you will work closely with several Data Centers, the entire Cloud organization and IBM vendors to support, maintain and operationally improve the IBM cloud infrastructure. You will focus on the following key responsibilities:
· Monitor the health of production and test systems 24x7
· Respond promptly to production issues and alerts 24x7
· Execute changes in the production environment through automation
· Partner with other SRE teams and program managers to deliver mission-critical services to the market
· Manage major incidents and drive outages to resolution as quickly as possible.
· Support development of new and existing capabilities for our compute, storage and network infrastructure services
· Implement and automate infrastructure solutions that support IBM Cloud products and infrastructure
· Support the compliance and security integrity of the environment
· Work with Engineering to:
o Provide initial assessment and possible workarounds for production issues
o Troubleshoot and resolve production issues
· Work with Support and Development teams to:
o Identify and resolve issues
o Discuss and plan integration tasks
· Provide technical escalation support for other Infrastructure Operations teams
**Required technical and professional expertise**
· Excellent written and verbal communication skills
· Experience in hands-on production administration of large systems and environments
· Experience establishing and improving procedures within a mission critical environment
· Must be proficient in writing and debugging scripts
· Must be extremely comfortable using and navigating within a Linux environment
· Ability to do low level debugging and problem analysis by examining logs and running Unix commands
· 2+ years of experience in monitoring technologies, virtualization technologies, and automation/configuration management
o Monitoring technologies: Zabbix (preferred), Grafana, Nagios, ELK, Splunk, etc. (at least one)
o Virtualization technologies: Citrix Xen Hypervisor (preferred), KVM (also preferred), libvirt, VMware vSphere, etc. (at least one)
o Automation and configuration management tools/solutions: Ansible, Salt, Chef, Python, Bash, Puppet, Rundeck, etc. (at least one)
· Working knowledge of ServiceNow, JIRA, Confluence, and GitHub
· Working knowledge of container technologies: Kubernetes (preferred), Docker, etc.
**Preferred technical and professional experience**
· Working knowledge & experience with Networking/Storage/Databases in the Cloud
· Go Language experience.
IBM is committed to creating a diverse environment and is proud to be an equal-opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, gender, gender identity or expression, sexual orientation, national origin, caste, genetics, pregnancy, disability, neurodivergence, age, veteran status, or other characteristics. IBM is also committed to compliance with all fair employment practices regarding citizenship and immigration status.
Lead Site Reliability Engineer (AWS)
Posted today
Job Description
Site Reliability Engineer - Front End
Posted today
Job Description
E&I Maintenance & Reliability Engineer
Posted 3 days ago
Job Description
Reliability Engineer II - Cardiac Ablation Solutions
Posted 1 day ago
Job Description
**Medtronic**
At Medtronic, we value what makes you unique. Be part of a company that thinks differently to solve problems, make progress, and deliver meaningful innovations.
**Our Purpose**
Cardiac Ablation Solutions (CAS) offers cardiac mapping and ablation solutions to treat patients with abnormal heart rhythms. Our vision is to help patients worldwide by advancing innovation for the diagnosis and ablation of cardiac arrhythmias, enabling clinicians to perform procedures with superior outcomes. Our team supports the creation of our market-leading portfolio of innovations.
**_Come for a job, stay for a career!_**
**A Day in the Life:**
+ Works closely with Research & Development in the development of test methods to ensure that they are ready for the Test Method Validation (TMV) process.
+ Ensures all TMVs meet the required TMV procedure and standards.
+ Takes direction from the quality core team member (QCTM) in delivering day-to-day project deliverables as an extended QCTM.
+ Collaborates with engineering and manufacturing functions to ensure quality standards are in place.
+ Performs systematic reliability analysis against features, requirements, architecture, interfaces, and designs, through the appropriate application of reliability engineering techniques (e.g. fault tree analysis, failure trending and analysis, reliability forecasting) to understand product and process robustness.
+ Understands the risk management concepts used throughout the quality system to meet FDA, ANSI/AAMI/ISO 14971:2019, and ANSI/AAMI/ISO 13485:2016 requirements.
+ Leads strategies for test method validations, design verification, and shelf-life protocols and reports.
**Key Skills & Experience**
+ Requires advanced knowledge of job area combining breadth and depth, typically obtained through advanced education combined with experience.
+ Requires a minimum Level 8 degree in Engineering or other relevant discipline with minimum 2 years' relevant experience.
+ Experience in a highly regulated industry, preferably medical devices.
+ Experience solving complex issues by interacting with cross-functional groups.
+ Proven ability to operate in a matrix organization and navigate complex business systems, regulations, standards, and performance requirements.
+ Knowledge of reliability tools and practices that effectively support requirements, design, integration and verification, and validation.
+ Demonstrated critical thinking skills with focus on improved system performance outcomes and positive business impact.
+ Excellent communication skills and the ability to influence are critical to the role.
**Medtronic offers a competitive salary and flexible benefits package**
**Physical Job Requirements**
The above statements are intended to describe the general nature and level of work being performed by employees assigned to this position, but they are not an exhaustive list of all the required responsibilities and skills of this position.
**Benefits & Compensation**
A commitment to our employees lives at the core of our values. We recognize their contributions. They share in the success they help to create. We offer a wide range of benefits, resources, and competitive compensation plans designed to support you at every career and life stage.
This position is eligible for a short-term incentive called the Medtronic Incentive Plan (MIP).
**About Medtronic**
We lead global healthcare technology and boldly attack the most challenging health problems facing humanity by searching out and finding solutions.
Our Mission - to alleviate pain, restore health, and extend life - unites a global team of 95,000+ passionate people.
We are engineers at heart- putting ambitious ideas to work to generate real solutions for real people. From the R&D lab, to the factory floor, to the conference room, every one of us experiments, creates, builds, improves and solves. We have the talent, diverse perspectives, and guts to engineer the extraordinary.
Learn more about our business, mission, and our commitment to diversity here.
**We change lives**. Each team member, each day, helps to improve and redefine how the world treats the most pressing health conditions, from heart disease to diabetes. Our industry leadership comes from the passion and ingenuity of our people. That's who we are. Working alongside one another, we use science, medicine, and a profound understanding of the human body to build extraordinary technologies that can transform lives.
**We build extraordinary solutions as one team**. One Medtronic Mindset defines how we work. Speed and decisiveness run through our DNA. Diverse perspectives inspire our bold answers to any challenge that comes our way. And we deliver results the right way, breakthrough after patient breakthrough.
**This life-changing career is yours to engineer**. By bringing your ambitious ideas, unique perspective, and contributions, you will:
+ **Build** a better future, amplifying your impact on the causes that matter to you and the world
+ **Grow** a career reflective of your passion and abilities
+ **Connect** to a dynamic and inclusive culture that welcomes the challenge of life-long learning
These commitments set our team apart from the rest:
**Experiences that put people first**. Respect for people is the hallmark of our humanity. It fuels our team to positively impact even a single life. And it means we put our people first at Medtronic as well, creating a culture of belonging and always pushing to get you the career-building resources you need.
**Life-transforming technologies**. No matter your role, you contribute to technologies that transform lives. What we build empowers patients to live life on their terms.
**Better outcomes for our world**. Here, it's about more than the bottom line. Our Mission to improve human welfare drives us. We advance healthcare, society, and equity with every design, inside and outside our walls.
**Insight-driven care**. Fresh viewpoints. Cutting-edge AI, data, and automation. You're shaping the future of healthcare technology and defining the next generation of breakthroughs in care.
It is the policy of Medtronic to provide equal employment opportunity (EEO) to all persons regardless of age, color, national origin, citizenship status, physical or mental disability, race, religion, creed, gender, sex, sexual orientation, gender identity and/or expression, genetic information, marital status, status with regard to public assistance, veteran status, or any other characteristic protected by federal, state or local law. In addition, Medtronic will provide reasonable accommodations for qualified individuals with disabilities.
For sales reps and other patient facing field employees, going into a healthcare setting is considered an essential function of the job and we expect our employees to comply with all credentialing requirements at the hospitals or clinics they support.
This employer participates in the federal E-Verify program to confirm the identity and employment authorization of all newly hired employees. For further information about the E-Verify program, please click here.
For updates on job applications, please go to the candidate login page and sign in to check your application status.
If you need assistance completing your application, please email
To request removal of your personal information from our systems please email