Site Reliability Engineer

Req #: 170113151
Location: Jersey City, NJ, US
Job Category: Technology
Job Description:

Global Technology Infrastructure (GTI) is the technology infrastructure organization for the firm, delivering a wide range of products and services, and partnering with all lines of business to provide high quality service delivery, exceptional project execution and financially disciplined approaches and processes in the most cost effective manner. The Global Service Operations (GSO) organization within GTI plays a critical role in providing operational support of enterprise systems that host applications at the core of JPMorgan Chase's business.

 

This position is responsible for the global operations team that provides 24x7x365 support of the JPMorgan Chase Integrated Compute Platform. This platform provides converged compute, storage and network capabilities.

 

The ideal candidate is an experienced technologist with an agile software development mindset and a proven track record of managing large-scale compute, storage and network environments. The candidate should have extensive experience in automation technologies and excellent technical, communication and collaboration skills.

 

RESPONSIBILITIES

  • Accountable for end-to-end operations of the firm’s Integrated Compute Platform, including compute, hypervisor, storage and network. This includes incident, change and problem management, vulnerability management, platform hygiene, monitoring, etc.
  • Serve in a Site Reliability Engineer role (Google model) for the IaaS platform, i.e. the Integrated Compute Platform, with ownership of infrastructure automation and microservices operations, driving the agility of the platform to achieve end-to-end stateless provisioning and operations.
  • Triage, mitigate, and drive problems across multiple teams to resolution in an expedient manner.
  • Identify and drive stability, operational efficiency and risk reduction efforts for the Integrated Compute Platform.
  • Work closely with Engineering and LOB partners to define, drive, and report on strategic programs that continuously improve stability, efficiency, and performance of the infrastructure.
  • Play a key role in defining SLAs and managing the processes that enable meeting or exceeding those SLAs, and participate in any post-mortem documentation and reporting activities relating to outages and missed SLAs.
  • Drive vendor performance including operational excellence and innovation in partnership with GTI Engineering Groups.
  • Ensure processes, procedures and controls are auditable, with risk management embedded into all aspects of operations. Conduct risk self-assessments and escalate issues as identified.

 

QUALIFICATIONS

  • Bachelor's degree or the equivalent in Computer Science, Engineering or related technical field.
  • 7-10 years of experience in large-scale, high-availability environments
  • Strong knowledge of infrastructure operations, engineering and agile software development
  • Motivated to work in a high pressure, dynamic operations environment
  • Deep understanding of compute, storage and network operations
  • Awareness of and experience with operations in a DevOps / SRE / Agile operations model. Strong working knowledge of orchestration, monitoring and instrumentation solutions
  • Experience with Enterprise Incident, Problem and Capacity Management practices
  • Experience transitioning to virtualized, data driven, software defined environments
  • Proven ability to innovate creative and practical solutions to deliver business value
  • Excellent analytical, problem solving and critical thinking skills
  • Extensive hands-on experience as an Infrastructure Ops / Engineer (spanning OS, third-party virtualization and automation products, storage, networks, etc.)
  • Experience with any of the following: VMware ESX, vSphere 4/5, scripting (PowerShell, JavaScript, etc.), RHEL, Windows.
  • Strong understanding of storage and network functions and capabilities
  • Strong hands-on experience with installation and configuration of IT infrastructure components in large and complex environments.
  • Experience writing scripts and/or workflows to automate routine product management admin tasks in shell, Perl and/or Python
  • Demonstrated ability to deliver solutions that are resilient, repeatable and scalable 
  • Experience with vendor provided APIs
  • Ability to work under pressure and multi-task, appropriately prioritizing deliverables against timeframes
  • Excellent verbal and written communication skills, with the ability to communicate effectively at both a technical and a business level
  • Excellent attention to detail and ability to analyze detailed business requirements, raise questions and seek resolution to outstanding clarifications required for test case identification
  • Excellent work ethic
  • Must demonstrate strong time management skills and the ability to prioritize one's own work