Senior Engineer - Cloud Operations (Docker, Kubernetes, OpenStack, VMware, Prometheus or Thanos)


Job Description

You Lead the Way. We’ve Got Your Back.


At American Express, we know that with the right backing, people and businesses have the power to progress in incredible ways. Whether we’re supporting our customers’ financial confidence to move ahead, taking commerce to new heights, or encouraging people to explore the world, our colleagues are constantly redefining what’s possible, and we’re proud to back each other every step of the way. When you join #TeamAmex, you become part of a diverse community of over 60,000 colleagues, all with a common goal to deliver an exceptional customer experience every day.

Job Description for CTAM Engineer:


American Express has embarked on an exciting Cloud, Big Data and Mobile transformation driven by an energetic new team of high performers. You will contribute to a rock star start-up engineering team called CTAM (Cloud Telemetry, Alerting, and Monitoring) that is building the next generation of alerting and self-healing systems.


Ever wondered what it takes to build a highly available, global-scale, enterprise-wide private PaaS/IaaS cloud platform with an open-source technology stack and achieve uptime SLAs on par with Amazon and Google? Then you should consider this innovative and disruptive opportunity, where you can be a key transformative contributor to a team that will ensure the stability of the next-generation enterprise application platform (PaaS/IaaS) for American Express.


The goal of the team is to minimize both the number and the duration of incidents, minimize their impact, and prevent incidents from occurring. This team will deliver the solution that ensures the Cloud Platform is reliable and timely, and will also provide solutions for the users of the Cloud Platform. You will help create a solution both for internal use and for customers. As we transition to supporting a hybrid cloud, this team will be critical in ensuring the reliability and timeliness of the platform.


You will support a variety of technologies in a highly available PaaS/IaaS platform implemented with technologies such as OpenStack and OpenShift. You will also support Kubernetes, Docker, Redis, Spark, Storm, and numerous other technologies and solutions. You will work with a variety of programming languages and tools, all while contributing to the Monitoring, Alerting and Self-Healing solution.



Responsibilities Include:


· Owns technical aspects of software development, focused on alerting, monitoring and recovery 

· Performs hands-on architecture, design and development of systems 

· Understands systems and architectures well enough to quickly identify potential problems

· Assists in implementing a solution for collecting millions of metrics and alerting in near real time, focused on ease of use and extensibility, across multiple cloud environments

· Contributes to predictive alerting

· Identifies opportunities to adopt innovative technologies 

· Provides continuous support for ongoing application availability 

· Works closely with product owners on feature sets that impact multiple platforms and products and ensures proper monitoring and metrics are available during design 

· Understands current incidents and provides solutions to detect them, recover from them, and prevent their recurrence


Minimum Qualifications

· Bachelor’s or master’s degree in Computer Science, Computer Engineering, or equivalent work experience

· 7+ years of industry experience, including 4+ years of software development experience in at least one programming language such as Java, Python, Go, or Node.js

· Experience with Grafana, Thanos, and Ansible is preferred

· Demonstrated support of production systems at scale, with experience in issue detection, root cause analysis, and incident prevention

· Ability to work with infrastructure and platforms, including IaaS, PaaS, cloud technologies, and Continuous Delivery (CD) tools

· In-depth understanding of Linux features, as well as solid experience with shell scripting

· Good understanding of container and orchestration technologies such as Docker, rkt (Rocket), Cloud Foundry, Kubernetes, and OpenShift

· Experience with public clouds such as AWS, GCP, and Azure

· Ability to effectively interpret technical and business objectives and challenges and articulate solutions 

· Experience with time-series databases such as Graphite, Prometheus, and InfluxDB

· Outstanding written and verbal communication skills


Employment eligibility to work with American Express in the U.S. is required, as the company will not pursue visa sponsorship for these positions.


American Express is an equal opportunity employer and makes employment decisions without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, protected veteran status, disability status, age, or any other status protected by law.


ReqID: 21006776
Schedule (Full-Time/Part-Time): Full-time
Date Posted: May 4, 2021, 1:44:37 PM