Why American Express?
There’s a difference between having a job and making a difference.
American Express has been making a difference in people’s lives for over 160 years, backing them in moments big and small, granting access, tools, and resources to take on their biggest challenges and reap the greatest rewards.
We’ve also made a difference in the lives of our people, providing a culture of learning and collaboration, and helping them with what they need to succeed and thrive. We have their backs as they grow their skills, conquer new challenges, or even take time to spend with their family or community. And when they’re ready to take on a new career path, we’re right there with them, giving them the guidance and momentum to move into the best future they envision.
Because we believe that the best way to back our customers is to back our people.
The powerful backing of American Express.
Don’t make a difference without it.
Don’t live life without it.
Purpose of the Role:
We're looking for a Site Reliability Engineer with 2-6 years of experience who will be responsible for the performance, availability and reliability of Java/J2EE-based web applications hosted on cloud and Linux platforms.
The role will drive a DevOps mindset, which strives to use software engineering to build and run better production systems. You will write software to optimize day-to-day work through better automation, monitoring, alerting, testing and deployment.
You'll be expected to work with several technology partners to identify areas of opportunity within the availability platform and to build solutions that automate monitoring for the next-generation platform, using new technology and continuous innovation to drive efficiencies. You will be responsible for implementing tracing, monitoring and tooling solutions to maximize the performance and availability of our web applications.
o Provide production support, respond to production incidents, and act as the first line of defense for the development team by analyzing outages, driving root cause analyses (RCAs), and delivering the product enhancements those RCAs identify.
o Introduce new and impactful technologies to the production support toolchain that minimize friction in production releases and enable quick diagnosis of, and recovery from, production incidents.
o Facilitate the resolution of non-application issues (third-party upstream issues, infrastructure, storage, database, network, file transfer, network certificates, etc.).
o Build monitoring and alerting tools that help SRE and Operations teams quickly pinpoint, isolate and resolve issues related to infrastructure, platform services and applications.
o Ability to collaborate with high-performing teams and individuals throughout the firm to accomplish common goals. Provide consultation and strategic recommendations by quickly assessing and remediating complex availability issues.
o Strong analysis, research, investigation and evaluation skills, with a structured approach to problem solving.
o Ability to work and prioritize effectively in a highly dynamic work environment with a global focus and an emphasis on improving automation, logging and monitoring.
o Drive monitoring requirements to ensure business-service-level visibility for all support teams.
o Produce weekly, monthly and quarterly uptime and status reports for production and critical internal infrastructure and applications.
o Ability to drive Incident Management, Change Management and Problem Management.
o Experience in the design and development of SRE capabilities such as self-healing, advanced predictive analytics and monitoring.
o Site Reliability Engineers are often required to take part in an on-call rotation and provide coverage during IST hours.
o Prior experience developing and maintaining applications built on Java/J2EE, and experience supporting multi-tier web application architectures. Well versed in Java concepts and MVC frameworks.
o Good understanding of databases and related concepts, preferably DB2 and Oracle.
o Proficiency and in-depth knowledge of one or more real-time analytics and monitoring solutions, such as Splunk, ELK, Prometheus, Grafana and Dynatrace.
o Proficiency in software engineering concepts, Unix shell scripting, Python, Java and other programming languages.
o Broad technical exposure, with preference for the following skills: cloud infrastructure, TCP/IP, HTTP, DNS, VMs, load balancers, Docker, Kubernetes, JVMs, web servers, app servers, caching technologies, databases, routing and switching, etc.
Experience: 2 - 7 years
Bonus points for:
o Knowledge of configuration management systems such as Puppet, Chef or Ansible.
o Knowledge of Kafka, Redis, APM tools, Couchbase and Node.js.
o Set The Agenda: Define What Winning Looks Like, Put Enterprise Thinking First, Lead with an External Perspective.
o Bring Others With You: Build the Best Team, Seek & Provide Coaching Feedback, Make Collaboration Essential.
o Do It The Right Way: Communicate Frequently, Candidly & Clearly, Make Decisions Quickly & Effectively, Live the Blue Box Values, Great Leadership Demands Courage.
o Familiarity with financial services and authorization systems.
o Understanding of Agile practices in operations teams.
A BE/B-Tech/MCA in Computer Science, Computer Engineering or another technical discipline, or equivalent work experience.
Schedule (Full-Time/Part-Time): Full-time
Date Posted: Jan 7, 2020, 6:08:42 AM