American Express Careers

SRE Engineer

Phoenix, Arizona
Digital Commerce Technology


Job Description

The Site Reliability Engineering portfolio consists of several mission-critical applications for americanexpress.com. The Mobile and Web Engineering (MWE) enterprise applications maintain high (~100%) availability in an extremely high-throughput transactional system with strict performance requirements. The MWE Site Reliability Engineering team works with Product teams, Staff Architects, Engineering Leaders and Engineering Teams across the Mobile and Web Engineering platform. The team's primary focus is to conceptualize, design, develop and implement frameworks and common components, and to instrument observability tools for the enterprise, in order to ensure high reliability, scalability, availability and performance of the Mobile and Web applications. The team is embarking on a transformation journey to implement a “Robotics First” approach in the Service Delivery and Site Reliability Engineering space.

 

The Sr. Engineer I (Site Reliability Engineer) role is a hands-on, Senior Architect-level position supporting American Express' MWE Site Reliability Engineering team. The ideal candidate has experience in full-stack engineering.

 
What you will be doing:
  • Conceptualize and implement machine-learning-driven Site Reliability Engineering frameworks and components to improve predictive monitoring and drive the SRE team’s journey towards a “Robotics First” approach.
  • Research the latest technologies and concepts, conceptualize solutions and develop proofs of concept that improve the resiliency and performance of the production infrastructure. Design and implement innovative solutions and frameworks that improve software engineering velocity, infrastructure resiliency and security, and data availability.
  • Develop common framework components (to be leveraged by enterprise applications) and define standards for configuration, monitoring, reliability and performance engineering.
  • Work with the operations team to resolve major incidents.
  • Continuously improve automated remediation tasks to ensure the highest levels of availability.

Qualifications

  • A BS degree in Computer Science, Computer Engineering or another technical discipline, or equivalent work experience.
  • 3+ years of experience in Python, with emphasis on machine learning.
  • Hands-on experience with Spark, Splunk, pandas, NumPy and scikit-learn.
  • Experience in designing mission-critical, highly available enterprise applications.
  • Hands-on experience with performance testing framework design and tuning Java applications.
  • Experience managing NoSQL databases such as Couchbase and MongoDB.
  • Strong knowledge of Linux internals and experience managing Linux systems in high-traffic environments.
  • Strong knowledge of Data Science/Machine Learning, mathematical modeling, and statistics.
  • Strong interpersonal communication skills and the ability to work well in a diverse team-focused environment.
 

Employment eligibility to work with American Express in the U.S. is required, as the company will not pursue visa sponsorship for these positions.


ReqID: 19015307
Schedule (Full-Time/Part-Time): Full-time
Date Posted: Sep 2, 2019, 3:55:45 PM