American Express Careers

Infrastructure Engineer – Logging & Monitoring

Bangalore, India
Digital Commerce Technology


Job Description



Looking to invest your time and energy in innovative engineering projects for a global IT services organization? Then join the Enterprise Monitoring, Tooling and Engineering (EMTE) team at American Express. Be part of the team responsible for introducing and supporting technology that helps improve the availability, performance and efficiency of American Express’ IT operations.


EMTE seeks an Infrastructure Engineer with the ideas, knowledge, and strengths to help us deliver a world-class monitoring platform. This individual will contribute to EMTE’s efforts to raise the bar of operational excellence and best practices using Application Performance Management (APM), Time Series Metrics, Logging and Automation tools. This Infrastructure Engineer will align all designs with American Express’ enterprise architectural standards and promote the adoption of monitoring/automation best practices. Success in this role will be measured, in part, by the engineer’s ability to:

            design, produce, support and continuously improve EMTE’s monitoring tools

            increase the operational stability and efficiency of EMTE’s monitoring platforms


            create greater visibility of AET performance and availability

            collaborate with team members, technology partners and other stakeholders to create innovative solutions that achieve personal goals and those set by organizational leaders and the team.





As a Logging & Monitoring Engineer you will:

            Deploy, support and improve Logging & Monitoring tool usage and adoption (e.g., Dynatrace, Splunk, Elastic, Time Series, etc.)

            Create materials to assist stakeholders in the use of Logging & Monitoring tools and best practices


            Assist Logging & Monitoring users in analyzing application performance and availability trends and conducting root cause analysis of performance issues

            Monitor, maintain and improve the availability and performance of EMTE’s monitoring tool platforms and service offerings


            Develop, implement and support efforts to “Monitor the Monitor” – creating greater visibility into the system and application health of EMTE monitoring tools; improving stability, alert notifications and related KPIs for MTTx

            Perform administration for EMTE tools/platforms (e.g., APM, Enterprise Logging as a Service, system/node monitoring, Event Correlation, etc.)


            Monitor environment and computing resources for reporting and capacity planning.


            Evaluate changes/updates to EMTE tools to determine potential impacts on production systems and coordinate with all appropriate stakeholders as needed


            Assist with the administration/support of other EMTE platforms as necessary


            Be available to provide on-call support for monitoring and automation tools during business hours, nights and weekends




Required Experience

            IT working experience in the areas of Application Performance Management, application monitoring, network administration, system administration, performance engineering / testing, or Java/.NET development

            High technical comfort with Linux administration (Red Hat Enterprise Linux 6 and 7)


            Fundamental knowledge of TCP/IP networking, subnetting and routing concepts, and distributed computing concepts

            Ability to write scripts in one or more languages (shell, Perl, Python, Ruby, etc.)


            1+ years in software engineering, object-oriented programming (OOP), and web programming (JavaScript, AJAX, and JavaScript frameworks)

            1+ years of experience on an operations team or NOC


            Experience with application technologies (J2EE, .NET, Citrix, Micro-services)


            Degree in Computer Science, Computer Engineering or Information Technology



Preferred Experience







            Knowledge of Splunk and/or Elastic administration and maintenance.

            Knowledge of Splunk and/or Elastic application development and optimization.


            Experience in monitoring, time series metric data and/or logging tools (e.g., Dynatrace, AppDynamics, ICINGA, Prometheus, Graphite/Grafana, etc.)


            Working knowledge of application development workflows and Agile methods

            Demonstrated understanding of code-level and container-level performance tuning, troubleshooting, and architecture best practices for enterprise applications


            Comprehension of basic object-oriented fundamentals, web application infrastructure, memory management, garbage collection, and threading

            Inherent and proven problem-solving nature, with an engaged and proactive attitude


            Ability to automate processes using automation tools (e.g., Ansible, Puppet, etc.)

            Experience working in a team/workgroup setting


            Experience working with Scrum or Kanban-related tools and concepts (e.g., Jira, Rally, Epics, Stories, estimating story points, etc.)



Professional and Leadership Qualities for Success







            Must be a highly motivated, energetic self-starter who excels in fast-paced, dynamic team environments and is committed to getting results

            Strong technical acumen, passionate about learning and trying new technology


            Ability to self-direct personal activities to achieve goals and meet commitments

            Strong analytical and logical reasoning skills


            Ability to solve problems quickly and independently

            Excellent organizational/time management skills, able to manage multiple tasks


            Strong interpersonal skills and strong written/verbal communication skills (e.g., presentations, documentation, emails, reports, etc.)

            Innovates through experimentation, failing fast, and continuous improvement


            Seeks and offers constructive feedback, willing to learn from mistakes

ReqID: 19014146
Schedule (Full-Time/Part-Time): Full-time
Date Posted: Jul 29, 2019, 11:17:11 AM