Our mission is to simplify the acquisition and utilization of unstructured/unlabeled data. The team acts as a data modeling factory, analyzing large-scale data to surface insights that drive business growth.
About the Role:
We are looking for experienced data scientists to join our team and apply advanced analytics and machine learning techniques, including Prompt Engineering (PE), multi-modal large language models (LLMs), computer vision (CV), natural language processing (NLP), and audio signal processing, to optimize intelligent labeling workflows and data products within TikTok's ecosystem. Your work will help improve user experience, enhance content integrity, and support data-driven strategic decision-making. You will collaborate closely with cross-functional teams across product, operations, and algorithms to build scalable, end-to-end Prompt Engineering and LLM workflows for intelligent content moderation and labeling applications.
Key Responsibilities:
• Collaborate with cross-functional stakeholders to gather and refine requirements for data labeling projects and identify opportunities for optimization through data-driven solutions.
• Design and manage the full lifecycle of data labeling and policy testing workflows, from alignment with business needs through deployment, iteration, and monitoring.
• Establish and maintain a centralized knowledge base for Retrieval-Augmented Generation (RAG) systems, incorporating both structured (e.g., SOPs, guidelines) and unstructured (e.g., annotations, case logs) data to support LLM-based policy QA and labeling efforts.
• Operationalize intelligent labeling pipelines leveraging Prompt Engineering, agent-based workflows, and labeling models to ensure availability of high-quality data for model training and policy evolution.
• Translate complex policy documents into machine- and human-readable formats, support agent and PE strategy development, and keep the handling of nuanced policy edge cases in sync with fast-changing regulatory and platform dynamics.
• Apply multi-modal LLM techniques to extract latent signals from content that inform moderation strategies and highlight policy gaps.
• Lead applied ML and data science research and experimentation to solve business-critical use cases.
• Own the model lifecycle from data sourcing and preprocessing to training, deployment, and post-launch maintenance.