Principal II, SRE - India

Job Locations: IN-KA-Bangalore
ID: 2024-14981
Category: Global Technology Services
Position Type: Regular Full-Time

Overview

POSITION SUMMARY STATEMENT:  

The Principal II Site Reliability Engineer acts as a technical expert, applying engineering techniques to automate manual, repeatable operational work and partnering with Application Development and Infrastructure teams to architect and operate reliable, scalable, and performant software and services.


DETAILED RESPONSIBILITIES/DUTIES:  

Level II
•   Partner with application developers and solution architects to ensure services are built for scale and performance.
•   Lead the definition of service-level objectives, agreements, and indicators (SLOs, SLAs, and SLIs) for the underlying service in collaboration with Application Development, Product, and Business Owners.
•   Design and develop scripts, software, and tools that improve the reliability of production systems, including fixing issues, responding to incidents, and taking on-call responsibilities.
•   Improve the overall resilience of a system and provide visibility to the health and performance of services across all applications and infrastructure
•   Improve service performance metrics such as latency, page load speed, and ETL, and help proactively identify performance issues across the system.
•   Implement monitoring solutions and create dashboards and alerts based on the four golden signals of SRE (latency, traffic, errors, and saturation), providing a single source for determining the overall performance and availability of the supported services.
•   Write, update, and use documentation, including runbooks and playbooks.
•   Automate work, including infrastructure needs, testing, failover solutions, and failure mitigation.
•   Use chaos engineering to test what you build under real-world conditions.
•   Share information across DevOps and business teams, encouraging a blameless culture focused on workflow visibility and collaboration.
•   Perform root-cause analysis of complex problems involving multiple parties, networks, hardware, and software that relate to scaling and performance.
•   Serve as technical owner to ensure delivery of SRE initiatives.
•   Perform deliverable reviews and coach the team in SRE areas of expertise.
•   Provide continuous competitive and best-practices research, leverage industry resources and market trends, and liaise with internal stakeholders.
•   Escalate risks and resolve issues to enable team delivery.
•   Help foster a fun, collaborative, and supportive culture in which we can do career-defining work.
•   Ensure the team delivers high-quality, accurate, viable, and reliable products.

QUALIFICATIONS:

Skills:

Required

•   Experience working with Linux and Windows operating systems, along with scripting experience in PowerShell, Python, and Linux/Unix shell
•   Experience with Monitoring and Logging Tools – Splunk, Dynatrace, Azure Monitoring, Datadog, Prometheus with Grafana
•   Experience working with DevOps Automation tools - Azure DevOps, GitHub, GitHub Actions, SonarQube, Artifactory, Google Cloud Build, Cloud Deploy, Argo CD/Flux
•   Experience with Public Cloud Platforms – Azure, GCP
•   Experience with Docker, Kubernetes (AKS, GKE), Helm, Service Mesh
•   Experience with Google Anthos, Apigee, Confluent Kafka, MongoDB, SQL and Oracle Databases
•   Experience with Microservices Architectures
•   Experience with Infrastructure as Code automation tools - Terraform, Ansible
•   An understanding of programming languages such as C#, Ruby, Perl, Java, Go, Python and PHP
•   Excellent written and verbal communication skills
•   Ability to communicate effectively to technical and executive audiences
•   Recognized across the company for technical expertise in one area within DevOps/SRE
•   Provides SME support in area of expertise
•   Creative problem solving and innovation
•   Provide technical leadership and vision

Certificates / Training (Two or more):
•   Azure / Google Cloud Certifications
•   AZ-400: Designing and Implementing Microsoft DevOps Solutions
•   Google Cloud Professional Cloud DevOps Engineer
•   Certified Kubernetes Administrator (CKA) / Certified Kubernetes Application Developer (CKAD)


Preferred 
•   Good understanding of Application Security Architectures and Guidance
•   Knowledge of threat modelling and risk assessment techniques
•   Knowledge of cybersecurity threats, current best practices and latest software
•   Experience in configuration of Web Application Firewall Rules using Akamai


Experience:

Level II
•   7+ years of experience in DevOps/Site Reliability Engineering with deep expertise in one area


Education:

Required
•   Bachelor's degree in Computer Science; an equivalent combination of experience may be considered in lieu of education.

Preferred
•   Advanced Technical Degree


Principles & Related Competencies:

Ethical
•   Complies with policies and procedures; Takes the high road and upholds our values; Maintains confidentiality; Acts with integrity, honesty and respect.

Leader
•   Communicates the big picture whether remotely or in-person, connecting the dots globally and overcoming obstacles; Gives and receives frequent feedback, learns, teaches, encourages information sharing and cooperation among teams; Celebrates the individual and the team; Ability to clearly communicate.

Collaborative
•   Communicates the big picture whether remotely or in-person, connecting the dots globally and overcoming obstacles; Gives and receives frequent feedback, learns, teaches, encourages information sharing and cooperation among teams; Celebrates the individual and the team; Ability to clearly communicate.

Looks Beyond Oneself
•   (Team Leader) Demonstrates humility through servant leadership by asking "What can I do as a leader to help you achieve your goals?"; Develops a vision (strategy) and sets goals and targets, fostering an environment which encourages achievement; Inspires and influences people to work together cohesively and engages with them enthusiastically; Welcomes a diversity of backgrounds and ideas; Values Distributors and teammates.

Drives Innovation
•   Adds value through: Driving opportunities for all three types of innovation (incremental, evolutionary, or disruptive); Proposing ideas and creative solutions to employee, distributor, and/or customer challenges; Celebrating and learning from failures and successes; Being willing to experiment and take educated risks, making decisions based on facts and data; Welcoming others’ ideas and suggestions and acting on them.

Delivers Change
•   Delivers Change Through: Experiencing and leading change; Understanding Herbalife Nutrition’s business; Creating a sense of urgency for delivering business benefits; Flexibility and openness to change.
