Responsibilities
The team is responsible for infrastructure systems, including storage, computing, and databases. We aim to be the leading SRE team in the industry. On the SRE team, you will have the opportunity to manage the complex challenges of scale while applying expertise in coding, algorithms, complexity analysis, and large-scale system design. We embrace a culture of diversity, intellectual curiosity, openness, and problem-solving. We also encourage ownership, self-governance, and independence in working on various projects, and we provide the support and mentorship needed to learn and grow as an engineer.
What you will be doing:
1. Reliability: Ensuring the reliability and efficiency of our core infrastructure, focusing on system capacity and stability; setting up reliability standards and recovery SOPs.
2. Reliability: Troubleshooting and locating technical issues; analyzing bottlenecks; managing the transformation and upgrading of high-availability system architecture.
3. Efficiency: Building automated operation solutions for large-scale systems; partnering with system development teams for system iteration.
4. Efficiency: Designing and implementing software platforms and monitoring frameworks for efficient, automated, and intelligent service-oriented architecture (SOA) governance.
5. Cost: With millions of CPUs to manage, building delivery standards, along with monitoring and budgeting systems, to optimize the company's costs.
6. Compliance: Designing and setting up new IDCs; designing and implementing data protection plans to meet the required standards.
Qualifications
Minimum Qualifications:
- Bachelor's or Master's degree in Computer Science or a related major, with at least 5 years of relevant experience.
- Solid foundational knowledge of computer software, with an understanding of the Linux operating system, storage, network I/O, and related principles.
- Familiarity with one or more programming languages, such as Python, Go, or Java. Knowledge of design patterns and coding principles is required.
Preferred Qualifications:
1. Experience with storage, and relevant system experience with the following: KV, Table, Graph, Redis, MySQL, MongoDB, MQ, and Kafka.
2. Experience with computing and big data, and system experience with the following: Kubernetes, Docker/containers, AIOps, Spark, Flink, Function as a Service (FaaS), RPC frameworks, and service mesh.
Job Information
About Us
Founded in 2012, ByteDance's mission is to inspire creativity and enrich life. With a suite of more than a dozen products, including TikTok, Lemon8, CapCut and Pico as well as platforms specific to the China market, including Toutiao, Douyin, and Xigua, ByteDance has made it easier and more fun for people to connect with, consume, and create content.
Why Join ByteDance
Inspiring creativity is at the core of ByteDance's mission. Our innovative products are built to help people authentically express themselves, discover and connect, and our global, diverse teams make that possible. Together, we create value for our communities, inspire creativity and enrich life, a mission we work towards every day.
As ByteDancers, we strive to do great things with great people. We lead with curiosity, humility, and a desire to make an impact in a rapidly growing tech company. By constantly iterating and fostering an "Always Day 1" mindset, we achieve meaningful breakthroughs for ourselves, our company, and our users. When we create and grow together, the possibilities are limitless. Join us.
Diversity & Inclusion
ByteDance is committed to creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives. Our platform connects people from across the globe and so does our workplace. At ByteDance, our mission is to inspire creativity and enrich life. To achieve that goal, we are committed to celebrating our diverse voices and to creating an environment that reflects the many communities we reach. We are passionate about this and hope you are too.