POSITION SUMMARY STATEMENT:
The Principal II Site Reliability Engineer acts as a technical expert who applies engineering techniques to automate manual, repeatable operational work, partnering with Application Development and Infrastructure teams to architect and operate reliable, scalable, and performant software and services.
DETAILED RESPONSIBILITIES/DUTIES:
Level II
• Partner with application developers and solution architects to ensure services are built for scale and performance.
• Lead the definition of service-level objectives, agreements, and indicators (SLOs, SLAs, and SLIs) for the underlying service in collaboration with Application Development, Product, and Business Owners.
• Design and develop scripts, software, and tools that improve the reliability of production systems, including fixing issues, responding to incidents, and taking on-call responsibilities.
• Improve the overall resilience of a system and provide visibility into the health and performance of services across all applications and infrastructure.
• Improve service performance metrics such as latency, page load speed, and ETL performance, and help proactively identify performance issues across the system.
• Implement monitoring solutions and create dashboards and alerts based on the four golden signals of SRE (latency, traffic, errors, and saturation), providing a single source for determining the overall performance and availability of the supported services.
• Write, update, and use documentation, including runbooks and playbooks.
• Automate work, including infrastructure needs, testing, failover solutions, and failure mitigation.
• Use Chaos Engineering to test what you build under real-world conditions.
• Share information across DevOps and business teams, encouraging a blameless culture focused on workflow visibility and collaboration.
• Perform root-cause analysis of complex problems involving multiple parties, networks, hardware, and software that relate to scaling and performance.
• Serve as technical owner to ensure delivery of SRE initiatives.
• Perform deliverable reviews and coach the team in SRE areas of expertise.
• Provide continuous competitive and best-practices research, leverage industry resources and market trends, and liaise with internal stakeholders.
• Escalate risks and resolve issues to enable team delivery.
• Help foster a fun, collaborative, and supportive culture in which we can do career-defining work.
• Ensure the team delivers high-quality, accurate, viable, and reliable products.
QUALIFICATIONS:
Skills:
Required
• Experience working with Linux and Windows operating systems, along with scripting experience using PowerShell, Python, and Linux/Unix shell scripting
• Experience with Monitoring and Logging Tools – Splunk, Dynatrace, Azure Monitor, Datadog, Prometheus with Grafana
• Experience working with DevOps Automation tools - Azure DevOps, GitHub, GitHub Actions, SonarQube, Artifactory, Google Cloud Build, Cloud Deploy, Argo CD/Flux
• Experience with Public Cloud Platforms – Azure, GCP
• Experience with Docker, Kubernetes (AKS, GKE), Helm, Service Mesh
• Experience with Google Anthos, Apigee, Confluent Kafka, MongoDB, SQL and Oracle Databases
• Experience with Microservices Architectures
• Experience with Infrastructure as Code automation tools - Terraform, Ansible
• An understanding of programming languages such as C#, Ruby, Perl, Java, Go, Python and PHP
• Excellent written and verbal communication skills
• Ability to communicate effectively to technical and executive audiences
• Renowned within the company for technical expertise in one area within DevOps/SRE
• Provides subject matter expert (SME) support in area of expertise
• Creative problem solving and innovation
• Provide technical leadership and vision
Certificates / Training (Two or more):
• Azure / Google Cloud Certifications
• AZ-400: Designing and Implementing Microsoft DevOps Solutions
• Google Cloud Professional Cloud DevOps Engineer
• Certified Kubernetes Administrator (CKA) / Certified Kubernetes Application Developer (CKAD)
Preferred
• Good understanding of Application Security Architectures and Guidance
• Knowledge of threat modelling and risk assessment techniques
• Knowledge of cybersecurity threats, current best practices and latest software
• Experience in configuration of Web Application Firewall Rules using Akamai
Experience:
Level II
• 7+ years of experience in DevOps/Site Reliability Engineering, with deep expertise in one area
Education:
Required
• Bachelor's degree in Computer Science; an equivalent combination of experience may be considered in lieu of education.
Preferred
• Advanced Technical Degree
Principles & Related Competencies:
Ethical
• Complies with policies and procedures; Takes the high road and upholds our values; Maintains confidentiality; Acts with integrity, honesty and respect.
Leader
• Communicates the big picture whether remotely or in-person, connecting the dots globally and overcoming obstacles; Gives and receives frequent feedback, learns, teaches, encourages information sharing and cooperation among teams; Celebrates the individual and the team; Ability to clearly communicate.
Collaborative
• Communicates the big picture whether remotely or in-person, connecting the dots globally and overcoming obstacles; Gives and receives frequent feedback, learns, teaches, encourages information sharing and cooperation among teams; Celebrates the individual and the team; Ability to clearly communicate.
Looks Beyond Oneself
• (Team Leader) Demonstrates humility through servant leadership by asking, "What can I do as a leader to help you achieve your goals?"; Develops a vision (strategy) and sets goals and targets, fostering an environment that encourages achievement; Inspires and influences people to work together cohesively and engages with them enthusiastically; Welcomes a diversity of backgrounds and ideas; Values Distributors and teammates.
Drives Innovation
• Adds value through: Driving opportunities for all 3 types of innovation (incremental, evolutionary, or disruptive); Proposing ideas and creative solutions to employee, distributor, and/or customer challenges; Celebrating and learning from failures and successes; Being willing to experiment and take educated risks, making decisions based on facts and data; Welcoming others’ ideas and suggestions and acting on them.
Delivers Change
• Delivers Change Through: Experiencing and leading change; Understanding Herbalife Nutrition’s business; Creating a sense of urgency for delivering business benefits; Flexibility and openness to change.