SRE Tech Lead - Big Data Compute Platform

TikTok
Singapore
Full time
6 days ago
About TikTok
TikTok is the leading destination for short-form mobile video. At TikTok, our mission is to inspire creativity and bring joy. TikTok's global headquarters are in Los Angeles and Singapore, and its offices include New York, London, Dublin, Paris, Berlin, Dubai, Jakarta, Seoul, and Tokyo.

Why Join Us
Creation is the core of TikTok's purpose. Our products are built to help imaginations thrive. This is doubly true of the teams that make our innovations possible. Together, we inspire creativity and enrich life - a mission we work toward every day. To us, every challenge, no matter how ambiguous, is an opportunity: to learn, to innovate, and to grow as one team. Status quo? Never. Courage? Always. At TikTok, we create together and grow together. That's how we drive impact for ourselves, our company, and the users we serve. Join us.

About the team
Our Compute Platform SRE team supports all Big Data services and products across the company. We are a newly established team, and we are looking for talented people like you to help shape its future. We are responsible for the reliability of all the company's major data warehouse products, services, and query engines. We serve business needs across domains within TikTok. We look forward to welcoming you to the team.

Responsibilities:

- Lead a global SRE team for TikTok's Data Platform, distributed across the US and Singapore. Responsible for the reliability of all TikTok's major data warehouse products, services, and query engines, such as ClickHouse, Spark, Presto, Doris, etc.

- Uphold Service Level Agreements (SLAs): Ensure that all service level objectives and agreements for ByteDance's Data Platform services are met. Lead team members to respond promptly to any system outages or issues.

- Continuous Performance Optimization: Lead the team to deeply analyze service performance and reliability patterns to identify potential performance bottlenecks. Implement proactive measures to prevent service disruptions. Work with development teams to optimize application performance, ensuring that services run efficiently and that resources are utilized effectively.

- Incident Management: Build a robust incident management mechanism. Lead efforts to troubleshoot and resolve service incidents and to conduct postmortems. Coordinate with cross-functional teams to manage and mitigate service-impacting events.

- Infrastructure Automation: Lead the team to develop highly efficient toolchains covering end-to-end deployment and reliability assurance operations. Automate infrastructure provisioning, scaling, and management processes to reduce manual intervention and improve service quality. Develop and enhance system capabilities such as automatic failure detection, auto-healing, and chaos engineering, and perform systematic disaster drills.

- Collaboration: Engage with product and development teams to integrate reliability and performance considerations into the software lifecycle.

- Capacity and Demand Planning: Assess and forecast infrastructure needs based on growth patterns and upcoming initiatives.

- Stay Updated: Keep current with industry trends, best practices, and emerging technologies related to site reliability and infrastructure engineering.