Site Reliability Engineer / Infrastructure Developer

Req #: 180013261
Location: Hyderabad, AP, IN
Job Category: Technology
Job Description:
JPMorgan Chase & Co. (NYSE: JPM) is a leading global financial services firm with assets of $2.6 trillion and operations in more than 60 countries. The firm is a leader in investment banking, financial services for consumers, small business and commercial banking, financial transaction processing, asset management, and private equity.
Global Technology Infrastructure (GTI) is the technology infrastructure organization for the firm, delivering a wide range of products and services, and partnering with all lines of business to provide high-quality service delivery, exceptional project execution, and financially disciplined approaches and processes in the most cost-effective manner.
The objective of GTI is to balance both business alignment and the centralized delivery of core products and services. GTI is designed to address the unique infrastructure needs of specific lines of business and the demand to leverage economies of scale across the firm.
The Global Technology Infrastructure Core Foundation Services team (GTI CFS) is responsible for providing end-to-end support for critical technologies that are used across the company. This includes Configuration and Orchestration, Identity Management, Name Services, Enterprise Monitoring Solutions, and the automation tools used to manage these technologies.
The CFS team is looking for a Site Reliability Engineer to join our growing engineering team and build out the next generation of automation technology for our platform. The ideal candidate is a hands-on automation specialist with significant experience in automating infrastructure platforms. We are looking for someone who understands the end-to-end stack of what it takes to run an application:
  • Configuration Management
  • No manual tasks, full automation of all infrastructure components
  • Operational and Production Support
  • Deep understanding of operating systems, web, middleware, and networking
  • Understanding of how to build a highly resilient application
As part of the Configuration Management and Automation Services (CMAS) team, you will help support the next generation of automation frameworks to be adopted by all technology teams, frameworks that transform how JPMorgan Chase approaches resiliency, operational efficiencies, and adoption of modern DevOps practices.
In the role of Site Reliability Engineer, you will be responsible for:
  • Automating operational tasks – analyzing existing BAU tasks and ensuring 80% of BAU tasks are automated through self-healing
  • Automating incident resolution – reviewing the root cause analysis of all major incidents and ensuring that 60% are resolved by automation after the first occurrence
  • Developing and maintaining runbooks that fully enable automated failover of an application
  • Analyzing the system using frameworks like ChaosMonkey to identify weak points in the architecture and working with development and engineering teams to handle all incident situations
  • Developing a Continuous Integration and Delivery model with products like Jenkins to ensure delivery of new content is fully automatable, with pipelines that perform automated integration and functional testing
  • Leveraging application performance monitoring (APM) products such as AppDynamics, Dynatrace, Splunk, etc. to analyze production code, and partnering with engineering and development teams to proactively identify gaps in production releases to prevent incidents
  • Partnering with application and engineering teams to identify all monitoring requirements and ensure coverage is achieved at all layers of the application stack (UX, Web, App, Middleware, DB) using the firm's standard monitoring solutions
  • Working with capacity management tools to proactively monitor performance and identify gaps, ensuring tuning or capacity increases are achieved before incidents occur
  • Providing 24x5 real coverage, with on-call support on weekends, as part of a follow-the-sun model for support and remediation of all major incidents
  • Partnering with application owners to understand all audit/CSA/RCSA requirements and deliver solutions proactively and reactively to ensure satisfactory and compliant results
Site Reliability Engineer Qualifications:
  • Software expertise with the ability to code and script in multiple languages (PowerShell, Python, Ansible, Puppet, Shell, etc.) and to work with other open source technologies and tools
  • Hands-on experience with, and a strong understanding of, infrastructure (operating systems, web, middleware, and networking)
  • Background in an engineering or support role managing infrastructure
  • Strong knowledge of Continuous Integration and Continuous Delivery
  • Experience with Scrum/Agile development methodologies
  • Capable of delivering on multiple competing priorities with little supervision
  • Excellent verbal and written communication skills
  • Computer Science or similar degree with experience in the following software/tools:
    • Infrastructure automation technologies: Ansible, Puppet, Chef, etc.
    • Declarative programming languages: YAML, Ruby, etc.
    • Scripting languages: Python, Perl, Shell, etc.
    • Build automation technologies: Maven, Jenkins, etc.
    • Monitoring technologies: Nagios, Tivoli, SCOM, etc.
    • Use of APIs and services using REST, SOAP, etc.
Other Information

Apply Using LinkedIn

You can also apply using your LinkedIn® profile. It may save you some time because your information will be automatically transferred into our system. Just click on the LinkedIn logo when you get to the application screen and follow the directions.

Submit an Updated Résumé

During the application process, be sure you have an up-to-date copy of your résumé, your cover letter, and any other documentation you would like to submit.