110 Site Reliability jobs in Ireland

Site Reliability Engineer

Cork, Munster · Trellix

Posted 27 days ago

Job Description

**_Job Title:_**
Site Reliability Engineer
**About Trellix:**
**Trellix, the trusted CISO ally, is redefining the future of cybersecurity and soulful work.** Our comprehensive, GenAI-powered platform helps organizations confronted by today's most advanced threats gain confidence in the protection and resilience of their operations. Along with an extensive partner ecosystem, we accelerate technology innovation through artificial intelligence, automation, and analytics to empower over 53,000 customers with responsibly architected security solutions.
We also recognize the importance of closing the 4-million-person cybersecurity talent gap. We aim to create a home for anyone seeking a meaningful future in cybersecurity and look for candidates across industries to join us in soulful work. More at .
**_Role Overview:_**
We are seeking a talented Site Reliability Engineer (SRE) to join the dynamic Trellix EDR Cloud DevOps and SRE team. As a key member of our engineering team, you will be responsible for designing, developing, and deploying robust and scalable software solutions. You will work closely with cross-functional teams (Engineering, DevOps and SRE) to deliver innovative solutions that meet our business objectives. You will also be responsible for building and delivering world-class CI/CD pipelines to support a highly scalable and secure cloud environment, in addition to supporting cloud service operations, deployments, and security.
**Responsibilities:**
+ **Operations:**
  + **Part of a global team providing operational & escalation coverage, including event response and recovery efforts for critical services.**
  + **Periodic deployment of features, patches and hotfixes to maintain the security posture of our Cloud Services.**
  + **Have ownership and responsibility for high availability of Production environments**
  + **Ability to work in shifts on a rotational basis and participate in On-Call duties**
  + **Assist with creating and updating runbooks & SOPs**
  + **Input into the monitoring of systems, applications and supporting data.**
  + **Report on system uptime and availability**
+ **Software Development:**
  + **Design, develop, and maintain secure and high-quality CI/CD pipelines following industry best practices.**
  + **Design, develop and maintain Infrastructure as Code (IaC) platforms for efficient build and management of infrastructure on public cloud platforms.**
  + **Build, deploy and manage applications in public cloud environments using containerization technologies like Docker and Kubernetes.**
  + **Contribute to the design and implementation of microservices architecture, breaking down complex systems into smaller, independent services.**
  + **Possess a basic understanding of dashboard design principles and tools to visualize data effectively.**
  + **Experience using modern monitoring and alerting tools (Prometheus, Grafana, PagerDuty, etc.)**
+ **Problem-Solving:**
  + **Troubleshoot and debug complex software issues, leveraging your strong problem-solving skills.**
  + **Conduct thorough root cause analysis, analyze system logs, and collaborate with cross-functional teams to implement robust solutions.**
+ **Collaboration & Learning:**
  + **Work closely with product managers, designers, and other engineers to translate business requirements into technical solutions.**
  + **Work effectively with teams across the organization to deliver projects on time and within budget.**
  + **Stay up-to-date with the latest technologies and industry trends to drive innovation.**
**Qualifications:**
+ **Technical Skills:**
  + **3-4 years of experience in software development, with strong proficiency in Python or Go.**
  + **Experience with cloud platforms (AWS or GCP) and containerization technologies (Docker, Kubernetes) is essential.**
+ **Problem-Solving:** **A keen eye for detail and a knack for troubleshooting complex issues.**
+ **Communication:** **Excellent communication and collaboration skills to work effectively with diverse teams.**
+ **Learning Agility:** **A passion for learning and a drive to stay up-to-date with emerging technologies.**
**Preferred Qualifications:**
+ **Automation:** **Experience with CI/CD pipelines and automation tools (e.g., Jenkins, GitLab CI/CD).**
+ **Certifications:** **AWS and/or Kubernetes Certification**
+ **Database:** **Knowledge of database systems (SQL and NoSQL) for data storage and retrieval.**
+ **Security:** **Understanding of security best practices to build secure applications.**
+ **DevOps:** **Exposure to DevOps principles and practices to streamline development and deployment processes.**
**Education:** **Bachelor's degree in Computer Science or a related field.**
**_Company Benefits and Perks:_**
We believe that the best solutions are developed by teams who embrace each other's unique experiences, skills, and abilities. We work hard to create a dynamic workforce where we encourage everyone to bring their authentic selves to work every day. We offer a variety of social programs, flexible work hours and family-friendly benefits to all of our employees.
+ Retirement Plans
+ Medical, Dental and Vision Coverage
+ Paid Time Off
+ Paid Parental Leave
+ Support for Community Involvement
We're serious about our commitment to a workplace where everyone can thrive and contribute to our industry-leading products and customer support, which is why we prohibit discrimination and harassment based on race, color, religion, gender, national origin, age, disability, veteran status, marital status, pregnancy, gender expression or identity, sexual orientation or any other legally protected status.

Sr Site Reliability Engineer

Kilkenny, Leinster · UKG (Ultimate Kronos Group)

Posted 3 days ago

Job Description

Site Reliability Engineers at UKG are critical team members that have a breadth of knowledge encompassing all aspects of service delivery. They develop software solutions to enhance, harden and support our service delivery processes. This can include building and managing CI/CD deployment pipelines, automated testing, capacity planning, performance analysis, monitoring, alerting, chaos engineering and auto remediation.
Site Reliability Engineers must be passionate about learning and evolving with current technology trends. They strive to innovate and are relentless in pursuing a flawless customer experience. They have an "automate everything" mindset, helping us bring value to our customers by deploying services with incredible speed, consistency, and availability.
**Job Responsibilities:**
+ Engage in and improve the lifecycle of services from conception to EOL, including system design consulting and capacity planning
+ Define and implement standards and best practices related to system architecture, service delivery, metrics and the automation of operational tasks
+ Support services, product & engineering teams by providing common tooling and frameworks to deliver increased availability and improved incident response
+ Improve system performance, application delivery and efficiency through automation, process refinement, postmortem reviews, and in-depth configuration analysis
+ Collaborate closely with engineering professionals within the organization to deliver reliable services
+ Increase operational efficiency, effectiveness, and quality of services by treating operational challenges as a software engineering problem (reduce toil)
+ Guide junior team members and serve as a champion for Site Reliability Engineering
+ Actively participate in incident response, including on-call responsibilities
**Required Qualifications:**
+ Must have at least 3 years of hands-on experience working in Engineering or Cloud
+ Minimum 2 years' experience with public cloud platforms (e.g. GCP, AWS, Azure)
+ Minimum 2 years' experience in configuration and maintenance of applications and/or systems infrastructure for a large-scale, customer-facing company
+ Experience coding in higher-level languages (e.g., Python, JavaScript, C++, or Java)
**Preferred Qualifications:**
+ Knowledge of Cloud based applications & Containerization Technologies
+ Demonstrated understanding of best practices in metric generation and collection, and in log aggregation pipelines
+ Demonstrable fundamentals in 2 of the following: Computer Science, Cloud architecture, Security, or Network Design
**Where we're going**
UKG is on the cusp of something truly special. Worldwide, we already hold the #1 market share position for workforce management and the #2 position for human capital management. Tens of millions of frontline workers start and end their days with our software, with billions of shifts managed annually through UKG solutions today. Yet it's our AI-powered product portfolio designed to support customers of all sizes, industries, and geographies that will propel us into an even brighter tomorrow!
UKG is proud to be an equal opportunity employer and is committed to promoting diversity and inclusion in the workplace, including the recruitment process.
Disability Accommodation in the Application and Interview Process
For individuals with disabilities that need additional assistance at any point in the application and interview process, please email
NOTICE ON HIRING SCAMS
UKG will never ask you for a copy of your driver's license, social security card, or passport during a job interview. For new hires, we do not ask for payment for equipment purchase, cost for training, or to receive onboarding documents. UKG does not make job offers outside of our formal hiring process. To help protect yourself against potential hiring scams, learn more about our formal hiring process, outlined here.
ABOUT OUR JOB DESCRIPTIONS
All job descriptions are written to accurately reflect the open job and include general work responsibilities. They do not present a comprehensive, detailed inventory of all duties, responsibilities, and qualifications required for the job. Management reserves the right to revise the job or require that other or different tasks be performed if or when circumstances change.
It is the policy of Ultimate Software to promote and assure equal employment opportunity for all current and prospective Peeps without regard to race, color, religion, sex, age, disability, marital status, familial status, sexual orientation, pregnancy, genetic information, gender identity, gender expression, national origin, ancestry, citizenship status, veteran status, and any other legally protected status entitled to protection under federal, state, or local anti-discrimination laws. This policy governs all matters related to recruitment, advertising, and initial selection of employment. It shall also apply to all other aspects of employment, including, but not limited to, compensation, promotion, demotion, transfer, lay-offs, terminations, leave of absence, and training opportunities.

Staff Site Reliability Engineer

Dublin, Leinster · ServiceNow, Inc.

Posted 27 days ago

Job Description

It all started in sunny San Diego, California in 2004 when a visionary engineer, Fred Luddy, saw the potential to transform how we work. Fast forward to today - ServiceNow stands as a global market leader, bringing innovative AI-enhanced technology to over 8,100 customers, including 85% of the Fortune 500®. Our intelligent cloud-based platform seamlessly connects people, systems, and processes to empower organizations to find smarter, faster, and better ways to work. But this is just the beginning of our journey. Join us as we pursue our purpose to make the world work better for everyone.
This is an exciting opportunity for someone who is passionate about driving innovation, enhancing service reliability, and making a tangible impact on the organization's success.
**What you get to do in this role:**
+ Provide relief and sustainable resolution to issues within our infrastructure.
+ Use your knowledge and experience in software development, systems engineering, and networking to proactively prevent repeatable issues.
+ Lead internal stakeholders and partner teams to improve the reliability, scalability and performance of the infrastructure through improved system design.
+ Champion and contribute to a culture of intolerance to manual activity, which results in an automation environment delivering repeatable and scalable response to system issues.
**To be successful in this role you have:**
+ Experience in leveraging or critically thinking about how to integrate AI into work processes, decision-making, or problem-solving. This may include using AI-powered tools, automating workflows, analyzing AI-driven insights, or exploring AI's potential impact on the function or industry.
+ Excellent Knowledge of Linux systems.
+ Comfortable designing, authoring, testing, and debugging code in a team setting in at least one language such as Python, Go, Java, or Ruby.
+ Experience working with systems at scale - supporting critical services with focus on automation, observability, availability, and performance.
+ Experience with MySQL and PostgreSQL database administration, troubleshooting, and performance tuning.
+ Develop and maintain telemetry and monitoring solutions using OpenTelemetry standards to gain deep insights into system behaviour, proactively address issues, optimise performance, and improve efficiency through comprehensive data collection, analysis, and visualisation.
+ Proven experience in defining and managing SLAs.
+ Collaborate with development teams to ensure new services align with architectural standards and best practices.
**Good to have:**
+ Expertise in Observability and Monitoring of applications, services, and networks at scale.
+ Experience with DevOps automation, CI/CD pipelines (such as GitLab CI/CD), and agile methodologies.
+ Experience writing test specifications and an understanding of the fundamentals of test automation.
+ Experience working with Cloud technologies such as Azure and AWS.
+ Experience in configuration management of infrastructure using Ansible.
+ Experience with Kubernetes to orchestrate the deployment, scaling, and management of containers.
+ Hands-on experience with Microsoft Azure, Google Cloud (GCP) and Amazon Web Services (AWS), including designing, implementing, and maintaining reliable and scalable systems.
We also have pluses! They are not a 'must', but please highlight them on your resume if you have any of these: experience with cloud engineering, knowledge of core AI/ML techniques and algorithms, familiarity with implementing Chaos engineering principles, experience with incident response processes, post-mortem practices, or service best practice standards, and web applications engineering.
**What you can expect from us:**
At ServiceNow, we make work better for everyone - including our own employees. We know that your best work happens when you live your best life and share your unique talents, so we do everything we can to make that possible for our employees. Win as a Team is part of our culture, and we aspire to wow our customers. We stay hungry and humble and focus on creating belonging. Sustainability, inclusivity, and diversity are key focus areas within our business framework so that we have transparency, equity, and accountability to deliver meaningful, measurable change. With our vision and dedication for a better future already underway, join us on this journey!
In addition to a competitive salary, supportive teams, and a real opportunity to progress in your career with a forward-thinking organisation, we provide resources to help you and your loved ones be well. From benefits plans and programs, to mental health resources that offer coaching and 24/7 support, to family support resources and parental leave programs - we want to help you take care of yourself and your loved ones. Below is a glimpse into even more of our offerings, or click here for a full list. Along with holidays, we have company-wide designated global well-being days where everyone is off and can spend time doing what matters most.
+ Good working culture to support the balance you need in both work and life.
+ Parental leave programs.
+ Childcare and caregiving benefits.
+ A learning experience platform built using our own technology, to support your learning and development goals as well as a tuition reimbursement program.
+ A global, cross-functional mentoring program.
+ We also have team building activities, various employee belonging groups, volunteering, and community outreach programs.
**Work Personas**
We approach our distributed world of work with flexibility and trust. Work personas (flexible, remote, or required in office) are categories that are assigned to ServiceNow employees depending on the nature of their work and their assigned work location. Learn more here. To determine eligibility for a work persona, ServiceNow may confirm the distance between your primary residence and the closest ServiceNow office using a third-party service.
**Equal Opportunity Employer**
ServiceNow is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, creed, religion, sex, sexual orientation, national origin or nationality, ancestry, age, disability, gender identity or expression, marital status, veteran status, or any other category protected by law. In addition, all qualified applicants with arrest or conviction records will be considered for employment in accordance with legal requirements.
**Accommodations**
We strive to create an accessible and inclusive experience for all candidates. If you require a reasonable accommodation to complete any part of the application process, or are unable to use this online application and need an alternative method to apply, please contact for assistance.
**Export Control Regulations**
For positions requiring access to controlled technology subject to export control regulations, including the U.S. Export Administration Regulations (EAR), ServiceNow may be required to obtain export control approval from government authorities for certain individuals. All employment is contingent upon ServiceNow obtaining any export license or other approval that may be required by relevant export control authorities.
From Fortune. ©2025 Fortune Media IP Limited. All rights reserved. Used under license.

Principal Site Reliability Engineer

Oracle

Posted 27 days ago

Job Description

**Job Description**
OCI Incident Response is the first line of defense for maintaining the high availability of Oracle's cloud. We make customer-impacting events shorter, less frequent, and less impactful by providing large-scale incident management. We are front-and-center in driving down event duration by utilizing our operational experience, knowledge of best practices, and ability to develop tools to automate incident management.
We are looking for a Principal SRE to join our OCI team. This role is part of a globally distributed team responsible for detecting, triaging, and mitigating OCI service-impacting events as quickly as possible. You will be a part of one of these regional teams and be responsible for minimizing the downtime of OCI services. You will achieve this by delivering excellent major incident management and by operating systems with high scalability, performance, and security that prevent incidents from occurring.
Oracle's Cloud is state-of-the-art and constantly evolving. When it experiences issues, your team will respond within minutes to ensure customer impact is mitigated. This experience will expose you to the inner workings of OCI's systems and organizations. You will interact with and influence leaders from across the Oracle business and will drive broad cross-organization programs meant to iteratively improve OCI-wide service availability. We are an agile team with significant impact. If you want to be a part of a fast-moving team breaking new ground, we would like to speak with you!
Career Level - IC4
**Responsibilities**
+ Solve complex problems related to infrastructure cloud services and automate common tasks to enable continuous availability with minimal human overhead
+ Command and coordinate SMEs and Service leaders to restore service as quickly as possible during Major Incidents while keeping accurate and timely data on the progress of such incidents
+ Utilize a deep understanding of cloud computing design patterns and their dependencies to mitigate complex Major Incidents.
+ Embed a methodical approach to troubleshooting large, complex, interconnected systems used in Incident Detection & Orchestration
+ Document pertinent information relating to Incidents that aids process improvement, identifies deviations and enables the creation of an Incident Knowledge Base
+ Monitor and evaluate high-level service and infrastructure dashboards and take action to address identified anomalies
+ Identify opportunities and take ownership for automation and/or continuous improvement of Incident Management process steps and best practices
+ Define and document the technical architecture of large-scale distributed systems.
+ Understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of production services.
+ Responsible for the design and delivery of the mission critical stack, with focus on security, resiliency, scale, and performance.
+ Partner with development teams in defining operational requirements for product roadmaps.
+ Articulate technical characteristics of services and technology areas and guide Development Teams to engineer and add premier capabilities to the Oracle Cloud service portfolio.
+ Act as the ultimate escalation point for complex or critical issues that have not yet been documented as Standard Operating Procedures (SOPs).
**Minimum Qualifications**
+ Bachelor's degree or higher in Computer Science or relevant work experience.
+ 5+ years of experience in Site Reliability Engineering, DevOps or System Engineering.
+ Must have public cloud operations experience (e.g., AWS, Azure, GCP, OCI).
+ Extensive experience with Major Incident Management in a cloud-based environment.
+ Clear understanding of automation and orchestration principles.
+ Experience working in at least one modern object-oriented programming language.
+ Experience with professional software engineering standard methodologies such as Agile project management, coding standards, code reviews, source control management, build processes, testing, and operations.
+ Familiarity with infrastructure automation tools such as Chef, Ansible, Jenkins, Terraform
+ Excellent expertise with several of the following technologies: Infrastructure-as-a-Service, CI/CD systems, Docker, RESTful APIs, log analysis tools, debugging tools
**Preferred Qualifications**
+ Strong leadership, project planning, communication, and execution skills
+ Strong analytic and problem-solving skills.
+ Proven track record of leading high blast-radius Major Incidents in cloud-based platforms.
+ Ability to handle multiple competing priorities in a fast-paced environment.
+ Ability to communicate clearly with technical and non-technical stakeholders at all levels.
+ Confidence to drive and manage large conference calls.
+ Experience with distributed service-oriented architectures
**About Us**
As a world leader in cloud solutions, Oracle uses tomorrow's technology to tackle today's challenges. We've partnered with industry leaders in almost every sector, and continue to thrive after 40+ years of change by operating with integrity.
We know that true innovation starts when everyone is empowered to contribute. That's why we're committed to growing an inclusive workforce that promotes opportunities for all.
Oracle careers open the door to global opportunities where work-life balance flourishes. We offer competitive benefits based on parity and consistency and support our people with flexible medical, life insurance, and retirement options. We also encourage employees to give back to their communities through our volunteer programs.
We're committed to including people with disabilities at all stages of the employment process. If you require accessibility assistance or accommodation for a disability at any point, let us know by emailing or by calling +1 in the United States.
Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans' status, or any other characteristic protected by law. Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.

IBM Cloud Site Reliability Engineer

Mulhuddart, Leinster · IBM

Posted 4 days ago

Job Description

**Introduction**
Software Developers at IBM are the backbone of our strategic initiatives to design, code, test, and provide industry-leading solutions that make the world run today. At IBM, you will use the latest software development tools, techniques and approaches and work with leading minds in the industry to build solutions you can be proud of.
Are you passionate about technology? Do you love building new things? Do you want to develop the future of IBM's Cloud offerings? If you answered YES, then we have the right opportunity for you!
The shift toward the consumption of IT as a service, i.e., the cloud, is one of the most important changes to happen to our industry in decades. At IBM, we are driven to shift our technology to an as-a-service model and to help our clients transform themselves to take full advantage of the cloud. With industry leadership in analytics, security, commerce, and cognitive computing and with unmatched hardware and software design and enterprise reach, no other company is as well positioned to address the full opportunity of cloud computing.
We are looking for a dynamic Site Reliability Engineer, responsive to market needs, to join our Cloud IaaS Operations Team in Dublin, Ireland and deliver value to our clients in a fast-changing cloud landscape. The SRE team is dedicated to ensuring that the IBM Cloud is at the forefront of cloud technology, from data center design, Storage & Network architecture and compute clusters to flexible infrastructure services. We are building IBM's next generation cloud platform to deliver performance and predictability for our customers' most demanding workloads, at global scale and with leadership efficiency, resiliency and security. It is an exciting time, and as a team we are driven by this incredible opportunity to thrill our clients.
This is a shift rotation position: you will work a Sunday-to-Thursday or Tuesday-to-Saturday rotation.
**Your role and responsibilities**
In this Site Reliability Engineer role, you will work closely with several Data Centers, the entire Cloud organization and IBM vendors to support, maintain and operationally improve the IBM cloud infrastructure. You will focus on the following key responsibilities:
· Monitor the health of production and test systems 24x7
· Respond promptly to production issues and alerts 24x7
· Execute changes in the production environment through automation
· Partner with other SRE teams and program managers to deliver mission-critical services to the market
· Manage major incidents and control the path to resolving outages as quickly as possible.
· Support development of new and existing capabilities for our compute, storage and network infrastructure services
· Implement and automate infrastructure solutions that support IBM Cloud products and infrastructure
· Support the compliance and security integrity of the environment
· Work with Engineering to:
o Provide initial assessment and possible workarounds for production issues
o Troubleshoot and resolve production issues
· Work with Support and Development teams to:
o Identify and resolve issues
o Discuss and plan integration tasks
· Provide technical escalation support for other Infrastructure Operations teams
**Required technical and professional expertise**
· Excellent written and verbal communication skills
· Experience in hands-on production administration of large systems and environments
· Experience establishing and improving procedures within a mission critical environment
· Must be proficient in writing and debugging scripts
· Must be extremely comfortable using and navigating within a Linux environment
· Ability to do low level debugging and problem analysis by examining logs and running Unix commands
· 2+ years of experience in Monitoring Technologies, Virtualization Technologies and Automation / Configuration Management
o Monitoring technologies: Zabbix (preferred), Grafana, Nagios, ELK, Splunk, etc. (at least one)
o Virtualization technologies: Citrix Xen Hypervisor (preferred), KVM (also preferred), libvirt, VMware vSphere, etc. (at least one)
o Automation and configuration management tools/solutions: Ansible, Salt, Chef, Python, Bash, Puppet, Rundeck, etc. (at least one)
· Working knowledge of ServiceNow, JIRA, Confluence, and GitHub
· Working knowledge of container technologies: Kubernetes (preferred), Docker, etc.
**Preferred technical and professional experience**
· Working knowledge & experience with Networking/Storage/Databases in the Cloud
· Go Language experience.
IBM is committed to creating a diverse environment and is proud to be an equal-opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, gender, gender identity or expression, sexual orientation, national origin, caste, genetics, pregnancy, disability, neurodivergence, age, veteran status, or other characteristics. IBM is also committed to compliance with all fair employment practices regarding citizenship and immigration status.

Lead Site Reliability Engineer (AWS)

Dublin, Leinster · J.P MORGAN S.E Dublin Branch

Posted today

Job Description

As a Lead Site Reliability Engineer at JPMorgan Chase in the Commercial & Investment Bank's Digital & Platform Services division, you hold a leadership role in your team, demonstrate strong knowledge across multiple technical domains, and advise others on the technical and business issues facing them. You will lead resiliency design reviews, break complex problems down into digestible work for other engineers, act as technical lead for medium to large-sized products, and provide advice and mentoring to other engineers. The role involves designing, managing, and maintaining tools that automate operational processes on AWS. You will also collaborate with team members to identify comprehensive service level indicators (SLIs) and work with stakeholders and customers to establish reasonable service level objectives (SLOs) and error budgets.
**Job responsibilities:**
+ Manage incident response, coordinating cross-functional teams to swiftly mitigate business impact.
+ Serve as the primary point of contact during major incidents, quickly identifying and resolving issues to prevent financial losses.
+ Oversee, track, and validate all changes to the Production and Disaster Recovery environments.
+ Automate security controls, governance processes, and compliance validation on AWS.
+ Lead initiatives to enhance the reliability and stability of team applications and platforms, using data-driven analytics to improve service levels.
+ Document and share knowledge within the organization through internal forums and communities of practice.
+ Provide ongoing guidance, tools, and solutions to support the firm's growth.
+ Champion and demonstrate site reliability culture and practices, exerting technical influence throughout the team.
+ Exhibit a high level of technical expertise in one or more domains, proactively identifying and resolving technology-related bottlenecks.
+ Strive to become an expert on the applications and platforms under your purview, understanding their interdependencies and limitations.
**Required qualifications, capabilities, and skills:**
+ Formal training or certification in software engineering concepts, and advanced applied experience
+ Deep proficiency in reliability, scalability, performance, security, enterprise system architecture, toil reduction, and other site reliability best practices, with the ability to implement these practices within an application or platform
+ Fluency in at least one programming language (e.g., Python, Java Spring Boot, Go, shell scripting)
+ Deep knowledge of software applications and technical processes, with emerging depth in one or more technical disciplines
+ Proficiency and experience in observability, such as white-box and black-box monitoring, SLO alerting, and telemetry collection, using tools such as Grafana, Dynatrace, Prometheus, Datadog, and Splunk
+ Proficiency and experience with Cloud Platform (AWS) infrastructure and with setting up monitoring and observability for applications migrated to cloud platforms
+ Proficiency in continuous integration and continuous delivery tools (e.g., Jenkins, GitLab, Terraform)
+ Experience with containers and container orchestration (e.g., ECS, Kubernetes, Docker)
+ Experience troubleshooting common networking technologies and issues
+ Ability to identify and solve problems related to complex data structures, algorithms, and new technologies, and to self-educate on new technology when needed
+ Ability to expand and collaborate across different levels and stakeholder groups
**Preferred qualifications, capabilities, and skills:**
+ Ability to identify new technologies and relevant solutions to ensure design constraints are met by the software team
+ Ability to initiate and implement ideas to solve business problems
+ Experience building dashboards with products such as Grafana
+ Prior experience in both Systems Engineering and Software Development
+ AWS certification (Architect or DevOps)
**About Us**
J.P. Morgan is a global leader in financial services, providing strategic advice and products to the world's most prominent corporations, governments, wealthy individuals, and institutional investors. Our "first-class business in a first-class way" approach to serving clients drives everything we do. We strive to build trusted, long-term partnerships to help our clients achieve their business objectives.
We recognize that our people are our strength and that the diverse talents they bring to our global workforce are directly linked to our success. We are an equal opportunity employer and place a high value on diversity and inclusion at our company. We do not discriminate on the basis of any protected attribute, including race, religion, color, national origin, gender, sexual orientation, gender identity, gender expression, age, marital or veteran status, pregnancy or disability, or any other basis protected under applicable law. We also make reasonable accommodations for applicants' and employees' religious practices and beliefs, as well as mental health or physical disability needs. Visit our FAQs for more information about requesting an accommodation.
**About the Team**
J.P. Morgan's Commercial & Investment Bank is a global leader across banking, markets, securities services, and payments. Corporations, governments, and institutions throughout the world entrust us with their business in more than 100 countries. The Commercial & Investment Bank provides strategic advice, raises capital, manages risk, and extends liquidity in markets around the world.
To be considered for this role, you must complete the application process on our careers page.

Site Reliability Engineer - Front End

Dublin, Leinster J.P MORGAN S.E Dublin Branch

Posted today

Job Description

At JPMorgan Chase, we understand that customers seek exceptional value and a seamless experience from a trusted financial institution. That's why we launched Chase UK to transform digital banking with intuitive and enjoyable customer journeys. With a strong foundation of trust established by millions of customers in the US, we have been rapidly expanding our presence in the UK and soon across Europe. We have been building the bank of the future from the ground up, offering you the chance to join us and make a significant impact.
As a Site Reliability Engineer at JPMorgan Chase within the International Consumer Bank, you are at the heart of this venture, focused on getting smart ideas into the hands of our customers. You have a curious mindset, thrive in collaborative squads, and are passionate about new technology. By nature, you are also solution-oriented and commercially savvy, with a head for fintech. You thrive working in tribes and squads that focus on specific products and projects, and depending on your strengths and interests, you'll have the opportunity to move between them.
While we're looking for professional skills, culture is just as important to us. We understand that everyone is unique, and that diversity of thought, experience, and background is what makes a good team great. By bringing people with different points of view together, we can represent everyone and truly reflect the communities we serve. This way, there's scope for you to make a huge difference: on us as a company, and on our clients and business partners around the world.
**Job responsibilities:**
+ Manage the end-to-end process of developing UIs for the reliability applications.
+ Shape the design of the app and participate in its implementation.
+ Build and ship new features that create a functional and engaging experience that will delight our users.
+ Design, review, write, and test code. This is a hands-on engineering role, and you will be directly involved in every step of the process.
+ Build for resilience and reliability, with consideration for ongoing maintenance requirements.
**Required qualifications, capabilities, and skills:**
+ Formal training or certification in React concepts, and advanced applied experience
+ A deep understanding of mobile stability and monitoring
+ Strong experience writing clean, testable, high-quality code and designing highly scalable systems in production
+ Solid understanding of API architectural patterns
+ Experience writing and architecting tests at different levels
+ Familiarity with build pipelines and tools
**Preferred qualifications, capabilities, and skills:**
+ Experience in a full TypeScript codebase
+ Practical cloud-native experience
+ Front-end design experience
#ICBCareers #ICBEngineering
**About the Team**
Our professionals in Corporate Functions cover a diverse range of areas, from finance and risk to human resources and marketing. Our corporate teams are an essential part of our company, ensuring that we're setting our businesses, clients, customers, and employees up for success.

Staff Site Reliability Engineer, Infrastructure Security

MongoDB

Posted 27 days ago

Job Description

MongoDB's mission is to empower innovators to create, transform, and disrupt industries by unleashing the power of software and data. We enable organizations of all sizes to easily build, scale, and run modern applications by helping them modernize legacy workloads, embrace innovation, and unleash AI. Our industry-leading developer data platform, MongoDB Atlas, is the only globally distributed, multi-cloud database and is available in more than 115 regions across AWS, Google Cloud, and Microsoft Azure. Atlas allows customers to build and run applications anywhere, whether on premises or across cloud providers. With offices worldwide and over 175,000 new developers signing up to use MongoDB every month, it's no wonder that leading organizations like Samsung and Toyota trust MongoDB to build next-generation, AI-powered applications.
We are looking for an experienced Staff Engineer to join our SRE InfraSec team and guide the security of our cloud-based infrastructure. As a Staff SRE, you will be very hands-on technically while also mentoring a small team of SREs.
The InfraSec team collaborates closely with other engineering teams to ensure that our infrastructure adheres to the highest security standards. The team builds essential security infrastructure and implements controls that reinforce the platform's security posture.
This is an SRE team, which means you can expect a highly hands-on approach, tackling the technical challenges of implementing large-scale solutions. This team is deeply involved in the technical aspects of security and the nuances of its actual implementation.
**Responsibilities:**
Cloud Security Design and Implementation:
+ Help lead the design and deployment of security solutions for cloud platforms (AWS, Azure, GCP), including network and compute security, identity management, and cloud security posture management (CSPM)
Automation and Monitoring:
+ Build automated solutions for real-time security monitoring, logging, and alerting in cloud environments. Leverage native cloud services and third-party tools for runtime security monitoring and anomaly detection
Security Tooling:
+ Evaluate, implement, and manage cloud-native security tools and platforms for endpoint security, identity management (IAM), and CSPM
**Qualifications:**
Experience:
+ 7+ years of experience in SRE, infrastructure engineering, or a similar role with a strong focus on security, ideally including 2+ years in a senior or staff engineering role
Security Mindset:
+ A comprehensive understanding of all facets of cloud environment security, from the foundational OS and networking layers to cloud provider configurations. Proven experience leading projects in security-focused areas such as runtime scanning, security observability, and CSPM
Cloud Expertise:
+ Strong experience with at least one cloud platform (AWS, Azure, GCP), including expertise in IAM, VPC networking, security groups, and cloud security tools (e.g., GuardDuty, Security Hub, CloudTrail)
Coding/Automation:
+ Proficiency in at least one programming language (we use Golang but are language-agnostic when it comes to hiring) and experience with infrastructure-as-code tools (Terraform, CloudFormation, Ansible) to automate security configurations and processes
Linux and Networking:
+ Understanding of the underlying Linux and networking concepts, including low-level fundamentals, and how they work together in complex systems
Communication and Leadership Skills:
+ Strong ability to explain complex security concepts to both technical and non-technical teams. Ability to lead a small technical team and ensure its success, both in meeting team goals and in the personal growth of every team member
To drive the personal growth and business impact of our employees, we're committed to developing a supportive and enriching culture for everyone. From employee affinity groups to fertility assistance and a generous parental leave policy, we value our employees' wellbeing and want to support them along every step of their professional and personal journeys. Learn more about what it's like to work at MongoDB, and help us make an impact on the world!
MongoDB is committed to providing any necessary accommodations for individuals with disabilities within our application and interview process. To request an accommodation due to a disability, please inform your recruiter.
MongoDB is an equal opportunities employer.
_Req ID: 1263064630_

IBM Cloud Site Reliability Engineer - HPCS

Mulhuddart, Leinster IBM

Posted 3 days ago

Job Description

**Introduction**
* The IBM Cloud Site Reliability Engineering (SRE) team provides infrastructure and operations solutions that maintain scalable, highly reliable, and highly secure cloud-based software infrastructure, enabling our clients to meet their on-demand IT and security needs and to disrupt their industries (Financial, Manufacturing, Insurance, and more).
* Above all, we are looking for applicants who desire creative freedom and who will thrive in an open, vibrant, flexible, and collaborative environment.
**Your role and responsibilities**
As a Site Reliability Engineer, you will work in an agile, collaborative environment to build, deploy, configure, and maintain systems for the IBM client business. In this role, you will lead the problem resolution process for our clients, from analysis and troubleshooting to deploying the latest software updates and fixes.
Your primary responsibilities include:
- 24x7 Observability: Be part of a worldwide team that monitors the health of production systems and services around the clock, ensuring continuous reliability and an optimal customer experience.
- Cross-Functional Troubleshooting: Collaborate with engineering teams to provide initial assessments and possible workarounds for production issues, and troubleshoot and resolve those issues effectively.
- Deployment and Configuration: Use Continuous Integration/Continuous Delivery (CI/CD) tools to deploy services and configuration changes at enterprise scale.
- Security and Compliance Implementation: Implement security measures that meet or exceed industry standards for regulations such as GDPR, SOC 2, ISO 27001, PCI, HIPAA, and FBA.
- Maintenance and Support: Apply Couchbase security patches and upgrades, support Cassandra and MongoDB as part of the on-call (pager duty) rotation, and collaborate with Couchbase product support on issue resolution.
This is a shift-rotation position: you will work either a Sunday-to-Thursday or a Tuesday-to-Saturday rotation.
**Required technical and professional expertise**
* Design, develop, and own tooling and automation that monitor and improve the availability, scalability, latency, and efficiency of highly secure, confidential computing cloud services.
* Deploy and manage infrastructure and services in IBM's Cloud ecosystem.
* During your workday, as part of a global team using a follow-the-sun model, you will handle both real-time alerts and customer-reported problems.
* Participate in scrums, sprint planning, and retrospectives; be an active member of the team and provide feedback and improvement ideas.
* Work collaboratively with the extended IBM teams, learn new technologies and apply the skills learned.
* Respond with urgency to incidents, perform root cause analysis, and build a knowledge base to enable sharing with other teams.
**Preferred technical and professional experience**
* Bachelor's Degree in Computer Science or related field
* Experience using Linux, GitHub, Bash, Python, Node.js, Docker, Kubernetes, and Ansible
* Experience developing tests and reliable automation for common, repeated tasks
* Demonstrated experience with REST APIs and automation
* Proficient in cloud computing and services, specifically logging and monitoring
* Strong debugging, problem determination, and isolation skills
* Effective communication with global, cross-functional teams and customers
* Team player who can work collaboratively, innovate, and be a quick learner
IBM is committed to creating a diverse environment and is proud to be an equal-opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, gender, gender identity or expression, sexual orientation, national origin, caste, genetics, pregnancy, disability, neurodivergence, age, veteran status, or other characteristics. IBM is also committed to compliance with all fair employment practices regarding citizenship and immigration status.