Site Reliability Engineer, Cloud Big Data, Associate

J.P.Morgan
in Singapore
Permanent, Full time
Competitive
Site Reliability Engineer, Cloud Big Data, Associate
As a Site Reliability Engineer (SRE), you'll help build a meaningful engineering discipline, combining software and systems to develop creative engineering solutions to operations problems. Much of our support and software development focuses on optimizing existing systems, building infrastructure and reducing work through automation. You'll join a team of curious problem solvers with a diverse set of perspectives who are thinking big and taking risks. In this environment, you'll take the lead on relevant projects, supported by an organization that provides the support and mentorship you need to learn and grow. As an SRE, you'll be focused on running better production applications and systems.

JPMC has begun investing significantly in next-generation core infrastructure, Cloud, Big Data and AI/ML technology, with the goal of accelerating the delivery and adoption of the Global Technology Vision and enabling the firm's Global Technology teams to deliver faster and with more impact for customers and clients. As a Senior Site Reliability Engineer, you will work with the JPMC Big Data team on production support in the public cloud. You'll work with AI/ML and cloud engineers to build the platform, pipeline, and monitoring systems that ensure the application landscape is designed to take full advantage of JPMC's global cloud solution.

This role requires a wide variety of strengths and capabilities, including:
  • Deep understanding of SRE philosophy, technologies, platforms and tools, SLA management, incident resolution, and automation
  • Mastery of application, data and infrastructure architecture disciplines
  • Command of architecture, design and business processes
  • Keen understanding of financial control and budget management
  • Expertise in working in partnership with colleagues throughout the firm, and in leading collaborative teams to achieve common goals
  • Hands-on experience managing operations of large-scale internet-centric production environments for application or infrastructure services serving tens to millions of end users
  • Prior experience in large-scale internet companies/technologies, where uptime and continuous availability were core to the business
  • Work with Architecture to design reusable patterns to deploy to applications, provide governance around adoption, and influence application development teams on roadmaps and designs
  • Identify and partner with Infrastructure teams and AD teams to implement automation opportunities to drive down toil and reduce technical debt
  • Apply standards of cloud compliance to application design to achieve reliability
  • Understanding of networking and cloud technologies, for example security, load balancing, and network routing protocols
Responsibilities:
  • Implements SRE frameworks to support global multi-cloud environments, and ensures the highest SLA levels through operational excellence
  • Provides failure analysis / root cause analysis when required
  • Provides support to develop & improve the quality of technical engineering documentation
  • Provides support to drive the maturity of the software development lifecycle
  • Provides quality control of engineering deliverables
  • Provides technical consultation to product management
  • Performs deployment, administration, management, configuration, testing, and integration tasks related to the big data platforms in cloud environment
  • Helps to develop new cloud engineering strategies and implementations for the firm
  • Champions a DevOps model so that services are automated and elastic across all platforms
  • Helps coach and mentor less experienced team members
  • Writes operational documentation and maintains a knowledge base of known issues with solutions
  • Participates in 24x7 SRE on-call rotations and escalation workflows, which may include the occasional weekend when needed
Required Skills:
  • Bachelor's degree in Computer Science, Information Technology, or equivalent technical field
  • 6 or more years relevant engineering experience
  • 2 or more years of Enterprise Cloud infrastructure experience (AWS, Azure or GCP) in a mission critical environment
  • In-depth OS experience (RHEL, Ubuntu, Windows Server) with strong debugging, troubleshooting, and problem-solving skills
  • Site reliability engineering experience using one of the following languages: Python or Java
  • Hands-on experience with cloud-based technologies and tools, especially in deployment, monitoring and operations, such as Datadog, Prometheus, Splunk, Elasticsearch or Grafana
  • Strong working knowledge of modern development technologies and tools such as Agile, CI/CD, Git, Terraform and Jenkins
Additional Preferred Skills:
  • AWS certification is highly desirable
  • Experience in Go, PowerShell or shell scripting
  • Deep knowledge of Internet protocols and web services technologies such as HTTP, DNS, TCP/UDP, SOAP, JSON and REST
  • Good understanding of networking protocols and cybersecurity best practices in cloud environments