SRE/Cloud Engineer - Production Engineering
Do you like to engineer large-scale, mission-critical solutions that are Always On? Do you like to contribute where cutting-edge infrastructure and software development, global-scale system architecture and enterprise systems administration intersect? Are you curious about how things break, so that you can eliminate these failure modes? Are you driven to eliminate toil by designing and adopting better tools and automation techniques? If so, we would love to have you in our Production Engineering team.
As a senior leader in Production Engineering, you will combine your years of proven expertise with a never-ending quest to create innovative technology. With your deep knowledge of design, architecture, infrastructure development, testing and application design, your team will raise its game even more, meeting your standards as well as satisfying both business and functional requirements. In addition to creating solutions in partnership with firmwide technology and business teams, you'll also work with public cloud providers as a key part of our hybrid cloud strategy.
This role requires a wide variety of strengths and capabilities, including:
- Mastery of application, data and infrastructure architecture disciplines
- Command of architecture, design and business processes
- Keen understanding of financial control and budget management
- Expertise in working in partnership with colleagues throughout the firm, and in leading collaborative teams to achieve common goals
- Deep understanding of Site Reliability Engineering (SRE) philosophy, Chaos Engineering, technologies, platforms and tools, SLA management, incident resolution, and automation
- 10 years of experience engineering distributed systems (compute and storage) and managing operations of large-scale, internet-centric production environments for application or infrastructure services serving tens to millions of end users
- 5 years of experience in development and/or site reliability engineering in one of the following languages: Java (J2EE technology stack and web technologies), Python, Go, Perl, Ruby or shell scripting (Unix/Linux)
- Hands-on experience with cloud-based technologies and tools, especially in deployment, monitoring and operations, such as Kubernetes/Docker, Pivotal Cloud Foundry, Prometheus, Fluentd, Slack, Elasticsearch, Grafana, Kibana, etc.
- 7 years of experience in:
  - Managing and/or influencing infrastructure services to ensure application service uptime and user experience
  - Developing and managing engineering practices leveraging key event streaming, messaging and DB services such as Kafka, Cassandra, Aurora, RDS, Cloud SQL, Bigtable, DynamoDB, MongoDB, Cloud Spanner, Kinesis, Cloud Pub/Sub, etc.
  - Working with Infrastructure as Code to automate design patterns and configuration management through the full lifecycle, using tools such as Terraform, Ansible, Puppet and AWS CloudFormation
  - Working with Architecture to design reusable patterns to deploy to applications, provide governance around adoption, and influence application development teams on roadmaps and designs
  - Applying standards of cloud compliance to application design to achieve reliability
- Understanding of networking and cloud technologies, for example security, load balancing and network routing protocols