The Production Engineer is a technical Subject Matter Expert who is accountable for the technical stability of the applications in the production environment. Staff provide engineering expertise to facilitate operational availability and performance of internally and externally developed systems. They design, manage and implement non-functional capabilities for solutions in physical, virtual and cloud environments. Their solutions support areas including Automation, Monitoring, Alerting, Logging, Performance, Infrastructure, Configuration, and Ongoing Operations.
The Production Engineer has a full understanding of the hardware and software architecture of the applications within the end-to-end business flow and is responsible for driving down technical debt. This includes working with the application development team to identify and remediate chronic technical issues, automating manual activities, and ensuring the health of the platform is properly maintained. Staff work closely with Operations, Application Development and Infrastructure teams.
Gain a thorough understanding of the technical architecture of the end-to-end product platform, through knowledge sharing sessions and personal investigation.
Create and maintain end-to-end product flow diagrams, and ensure all architecture components are well documented.
Raise concerns about the technical product design that could put the stability and resiliency of the platform at risk.
Be the point of escalation during technically complex P1 issues to assist in resolution and restoring service, as required.
Actively participate in root cause analysis of S1/S2 incidents and contribute to the resolution thereof, eliminating repeat issues and driving down MTTR.
Proactively identify technical improvements that address recurring themes in production tickets of any severity; reduce or eliminate false or non-actionable alerts and event storms.
Identify automation opportunities and partner with Infrastructure and Application Development (AD) teams to implement them, driving down toil and reducing technical debt.
Identify and implement tooling enhancements, ensuring full visibility into the environment and driving proactive identification and resolution of issues.
In partnership with AD, participate in architectural design reviews to ensure non-functional requirements are included from design through implementation. These include:
- Standards, best practices and Compensating Controls
- Backups
- Performance Requirements, Capacity Management, Application scalability
- Monitoring and alerting
- Resiliency
- Service and Operational Level Agreements, Application Knowledge Support Artifacts
Drive resiliency of the application product suite, implementing the appropriate Continuity Management plans according to defined business requirements. This includes Disaster Recovery (DR), High Availability (HA), and/or Sustained Resiliency (SR) planning and automated, seamless execution.
*This position is anticipated to require the use of one or more High Security Access (HSA) systems. Users of these systems are subject to enhanced screening which includes both criminal and credit background checks, and/or other enhanced screening at the time of accepting the position and on an annual basis thereafter. The enhanced screening will need to be successfully completed prior to commencing employment or assignment.
7+ years of Application Development work experience throughout the full project delivery lifecycle.
3+ years of Application Production Management and Support experience, with proven success resolving the actual root cause of issues.
5+ years of experience in:
- log analysis and monitoring tools such as Splunk and Dynatrace
- infrastructure engineering, especially as it relates to Digital and Web channels
- Java/J2EE technology stack and web technologies
- databases such as DB2 (relational) and Cassandra (NoSQL)
- queue technologies such as MQ/JMS/Kafka
Resolving hardware and software hygiene breaks
Understanding of networking and cloud technologies, for example security, load balancing, and network routing protocols.
Provide occasional 24/7 on-call escalation support as required.
Proven experience in incident/problem management.
Capable of executing and delivering on a multitude of concurrent initiatives in a dynamic environment.
Proven experience as a people manager: driving employee satisfaction, developing the skills of the team, helping staff achieve career objectives, and driving the team to achieve organizational objectives.
Excellent verbal and written communication skills, with the ability to contribute to large conference calls. Comfortable providing clear problem descriptions in terminology understood by a broad audience in a time-sensitive environment.
Proactive and naturally inquisitive, with a strong bias for action and for continuous improvement of practices and processes.
Experience working with cross-line-of-business teams, Outside Service Providers and Partner Organizations.
Experience with Incident Management methodologies, systems and tools.
Outstanding interpersonal skills and ability to establish strong relationships with all levels of management.
Strong knowledge of relevant applications and development life cycle methodologies, including Agile.
Experience working with geographically distributed and culturally diverse work-groups.
Ability to work independently as a self-starter and within a team environment. Comfortable in a fast-paced, dynamic environment, with an ability to handle multiple tasks simultaneously.
Working knowledge of Microsoft products, including Outlook, Word, Excel, and PowerPoint.
Experience in a DevOps model a plus.
Banking and/or financial services industry experience a plus.