USACares Jobs

Job Information, Inc Cloud Site Reliability Manager in Singapore, Singapore

Job Category

Products and Technology

Job Details

About the job

Imagine being part of a vibrant team where your ideas have the potential to shape the direction of a new organization. Picture yourself working on new transformational technologies. Envision yourself in a team solving thought-provoking technical problems and driving our customers’ success. Please come join us as we look to begin a new SRE journey for Salesforce!

The Public Cloud Site Reliability Engineering (Cloud SRE) team is a brand new organization within Production Engineering, with an exciting mission to bootstrap adoption of the industry’s leading-edge SRE principles and best practices at Salesforce. Working closely with counterparts in the Infrastructure and Engineering organizations, this Cloud SRE group owns the reliable delivery of service to Salesforce engineering teams and customers running on public cloud infrastructure. This organization provides round-the-clock, follow-the-sun situational awareness and leadership in the swift repair of any service-impacting issues, driving customer success.

We are looking for a manager to lead our growing team within our Singapore office, focused on hiring and retaining the best SRE talent, partnering with distributed development teams to drive reliability and excellence in service delivery, and leading internal development projects to improve our own efficiency and quality of service delivery.

Cloud SRE balances proactive automation with reactive operations, and targets 50%+ of time spent on improving service design for reliability, extending monitoring and operational automation, driving self-healing and resiliency initiatives, and running game day exercises. As the leader of a Cloud SRE team, you will be the guardian of both service delivery excellence and your team’s ability to keep 50%+ of their time focused on this proactive work.


In this role, you will be responsible for managing and supervising the day-to-day responsibilities of front-line Site Reliability Engineers. The ideal candidate combines software engineering management experience in an agile development process with excellent cross-functional communication and organization skills and experience managing enterprise-scale Internet services. As a technical leader, you will both create the strategy for your team’s role in a larger movement toward DevOps principles within Salesforce and set the tactical direction across multiple teams as you drive incident investigations. This position will involve fostering and maintaining strong relationships with other connected areas of the business, ensuring the SRE team is a vital stakeholder within a continuous cycle of engineering and process improvements.

  • Incident Management - Act in key support roles during major incidents (e.g., Sev0, Sev1, Sev2).

  • Problem Management - Populate and participate in RCAs and partner with multiple teams to drive permanent resolution of complex issues.

  • Engineering Leadership - Drive the team as well as partner product teams to be proactive in design, management, and improvement of high-quality customer-facing services, with a focus on automation, reliability, and observability.

  • Process-Minded - Create and improve processes that help SREs respond to and mitigate incidents against quantitative goals.

  • Collaborative and Influential - Work successfully with other cross-cloud service owners (Developers, DBAs, Network, etc.), maintaining positive relationships while exerting influence.

  • Data-Driven - We want a leader who will use data to solve underlying problems in our systems.


Required qualifications:

  • 10+ years of Infrastructure Engineering, DevOps, or Technical Operations experience

  • 5+ years managing Site Reliability Engineering, Operations, or Software Development teams, preferably in globally distributed environments

  • Experience with management and troubleshooting of Internet services running on traditional data centers and Public Cloud (AWS, GCP, or Azure) infrastructure

  • Past experience in Incident Management, strong understanding of ITIL processes, and Scrum agile development methodologies

  • Expertise with enterprise observability and monitoring systems, such as Prometheus, OpenTSDB, and Splunk

  • Passion for: Teamwork and collaboration, Adaptability, Customer Focus, Results, and Innovation.

  • Entrepreneurial-spirited with strong Aloha spirit. Passionate about employee development with experience successfully coaching individuals to achieve goals

  • Strong communication, organizational, analytical, and problem-solving skills, and attention to detail

  • Passionate about engineering productivity, service ownership, and customer success

  • Experience designing, developing, debugging, and operating resilient distributed systems that run across thousands of compute nodes in multiple data centers

Preferred qualifications:

  • A good understanding of, and practical experience with, large-scale distributed systems

  • Experience in designing and deploying high-performance production services with extensive monitoring and logging practices

  • CI/CD automation experience, including understanding of key open source technologies like Spinnaker, Vault, Jenkins, and Docker

  • Experience with immutable infrastructure via Terraform/CloudFormation or other similar approaches across large footprints and distributed teams


  • MS in Computer Science or related field, or

  • BS in Computer Science or related field plus relevant job-related experience


Accommodations - If you require assistance due to a disability when applying for open positions, please contact the Recruiting Department.

Posting Statement - Salesforce is an Equal Employment Opportunity and Affirmative Action Employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender perception or identity, national origin, age, marital status, protected veteran status, or disability status. Headhunters and recruitment agencies may not submit resumes/CVs through this Web site or directly to managers. Salesforce does not accept unsolicited headhunter and agency resumes and will not pay fees to any third-party agency or company that does not have a signed agreement with Salesforce.

Founded in 1999, Salesforce is the global leader in Customer Relationship Management (CRM). Companies of every size and industry are using Salesforce to transform their businesses, across sales, service, marketing, commerce, and more by connecting with customers in a whole new way. We harness technologies that can revolutionize companies, careers, and, hopefully, our world.

Salesforce is built on a set of four core values: Trust, Customer Success, Innovation, and Equality. By making technology more accessible, we're helping create a future with greater opportunity and equality for all. This has taken our company to great heights, including being ranked by Fortune as one of the “Most Admired Companies in the World” and one of the “100 Best Companies to Work For” eleven years in a row, and named “Innovator of the Decade” and one of the “World’s Most Innovative Companies” eight years in a row by Forbes.

There are those who choose to work with the best and brightest. And then, there are those who want to do more than just a job. They are the ones improving lives, not only their careers. Having an impact now instead of later. Doing something that’s so much bigger than themselves, an industry, and their company.

We believe everyone can be a Trailblazer. Join Salesforce and discover a future of new opportunities.