Fresh faces, bold results: DeepSeek’s rise in AI

Written by 36Kr English · 4 mins read

DeepSeek’s young team defies convention, and its latest V3 model signals that the bold approach is yielding results.

Luo Fuli, a prodigious talent born in the mid-1990s and personally scouted by Xiaomi’s Lei Jun, serves as a window into the young talents shaping DeepSeek’s future. This team, described as “unfathomable geniuses” by Jack Clark, OpenAI’s former policy director, developed the groundbreaking DeepSeek-V3 model on a modest budget of USD 6 million. Early evaluations suggest it outperforms GPT-4o and Claude 3.5 Sonnet.

DeepSeek’s founder, Liang Wenfeng, sketched a broad profile of his team during an interview with 36Kr: “They include fresh graduates from top universities, doctoral interns in their fourth or fifth year, and early-career professionals just a few years out of school.” Yet, building such a cohort of brilliance is but the first step in DeepSeek’s pursuit of artificial general intelligence (AGI).

Effective management of these talents is pivotal to DeepSeek’s mission, according to interviews conducted by 36Kr. Since its founding in May 2023, the company has maintained a team of approximately 150 members, structured with a flat hierarchy that emphasizes collaboration in resource allocation and research topic selection.

Innovation at DeepSeek thrives on the untested synergy of young talent working within a nontraditional organizational framework.

Over 100 talents, no hierarchies

Hiring seasoned veterans is a common strategy in the artificial intelligence sector. For instance, Wang Xiaochuan brought his 20-year Sogou core team to Baichuan Intelligence, while Jiang Daxin staffed his AI firm, Stepfun, with former colleagues from Microsoft Research Asia. The founding team of 01.AI included Huang Wenhao, Pan Xin, and Li Xiangang, who brought high-profile credentials from Microsoft, Google Brain, and ByteDance, respectively.

DeepSeek, however, prefers untested talent. A headhunter familiar with the company’s hiring practices told 36Kr, “DeepSeek does not recruit senior tech professionals. The upper limit is around three to five years of experience, and those with over eight years are often rejected outright.”

For instance, three core authors of DeepSeekMath, Zhu Qihao, Shao Zhihong, and Wang Peiyi, completed their research during doctoral internships. Another team member, Dai Damai, did not receive his doctorate from Peking University until 2024.

Beyond educational credentials, DeepSeek evaluates candidates on their academic and competitive achievements. Third-party collaborators revealed that competition results weigh heavily: anything below a gold medal is typically grounds for exclusion.

The resume of a DeepSeek team member shared online reveals a degree from Peking University, three ACM International Collegiate Programming Contest (ICPC) gold medals, and six research papers, two as co-first author, mostly published in top-tier conferences.

By the time DeepSeek officially launched in May 2023, hedge fund High-Flyer, its backer, had already assembled a team of nearly 100 AI engineers. Today, DeepSeek’s Beijing office alone houses over 100 engineers, excluding an infrastructure group based in Hangzhou. Acknowledgment sections from technical reports reveal that 139 engineers contributed to the DeepSeek-V3 project.

Though DeepSeek’s team size pales in comparison to the thousands employed by ByteDance or Baidu, its focus on talent density stands out, with some describing it as an elite force in AI innovation.

To retain this pool of talent, DeepSeek employs two strategies. First, it offers highly competitive compensation. “DeepSeek’s salaries match ByteDance’s R&D offers, often exceeding them,” an insider shared with 36Kr. Furthermore, once Liang deems a technical proposal promising, he essentially provides an “unlimited” amount of computational resources.

Second, DeepSeek adopts a flat, academic-style management structure. Members work in project-based groups without fixed roles or strict hierarchies. Each person takes on tasks matching their expertise, with challenges addressed through group discussions or advice from other experts.

Liang described this as a bottom-up approach with natural division of labor during his interview with 36Kr: “Everyone brings unique experiences and ideas. They don’t need to be pushed. Once an idea shows potential, we reallocate resources from the top down.”

This flat structure resonates with broader trends in innovation. Wang Huiwen, founder of Light Year, emphasized the importance of egalitarian communication for building learning organizations. Similarly, OpenAI co-founder Greg Brockman highlighted OpenAI’s avoidance of job titles, instead using the all-encompassing “member of technical staff.”

One outcome of this collaborative approach is the multi-head latent attention (MLA) architecture, which significantly reduced the cost of training the DeepSeek-V3 model. Liang noted that MLA stemmed from a young researcher’s personal interest, leading to the creation of a dedicated team that succeeded after several months.
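For context on what that architecture changes: MLA’s core trick is to cache one small shared latent vector per token instead of full per-head keys and values, shrinking attention’s memory footprint. The PyTorch sketch below is a minimal illustration under our own assumptions; the class name, dimensions, and simplifications (no causal masking, no decoupled rotary embeddings) are illustrative and do not reflect DeepSeek-V3’s actual implementation.

import torch
import torch.nn as nn

# Minimal sketch of the latent-KV idea behind MLA. All names, dimensions, and
# omissions (no causal mask, no decoupled rotary embeddings) are illustrative
# assumptions, not DeepSeek-V3's actual design.
class LatentKVAttention(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_latent=64):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.down_kv = nn.Linear(d_model, d_latent, bias=False)  # compress once per token
        self.up_k = nn.Linear(d_latent, d_model, bias=False)     # expand to keys at use time
        self.up_v = nn.Linear(d_latent, d_model, bias=False)     # expand to values at use time
        self.w_q = nn.Linear(d_model, d_model, bias=False)
        self.w_o = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x, cache=None):
        b, t, d = x.shape
        latent = self.down_kv(x)  # (b, t, d_latent): the only KV state that gets cached
        if cache is not None:
            latent = torch.cat([cache, latent], dim=1)
        split = lambda z: z.view(b, -1, self.n_heads, self.d_head).transpose(1, 2)
        q = split(self.w_q(x))
        k, v = split(self.up_k(latent)), split(self.up_v(latent))
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, t, d)
        return self.w_o(out), latent  # the latent doubles as the updated cache

In this toy configuration, each cached token costs 64 floats of latent state rather than the 1,024 floats (512 for keys plus 512 for values) a standard multi-head attention cache would hold, which is the kind of saving the technique is built around.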

Breaking with convention

AI companies in China often focus on proven professionals validated by job titles, track records, and product influence. However, a shift in hiring since 2024 reflects the increasing prominence of young, unproven talent.

For example, Sora’s Aditya Ramesh highlighted at the 2024 BAAI Conference that OpenAI prioritizes high-potential individuals who may lack formal academic accolades. Similarly, diffusion transformer (DiT) author Xie Saining noted that many successful researchers have limited traditional research training.

At DeepSeek, many recruits lack prior experience in model training or even a computer science background. One physicist shared how self-teaching and designing solutions in uncharted areas served as an entry point into AI. Another operations engineer was said to be a complete novice before joining.

“Innovation requires breaking free from inertia,” an industry insider said. Most Chinese AI firms follow OpenAI’s proven frameworks, relying on the transformer architecture and scaling law to minimize risks. However, few remember that these methods were once seen as audacious before GPT-3’s success.

Without commercial pressure or rigid key performance indicators, DeepSeek’s members can avoid simply imitating OpenAI. One engineer said that MLA emerged because DeepSeek challenged default architectures from the start. “Other companies might replicate MLA, but they wouldn’t question the original assumptions.”

However, DeepSeek’s audacity depends on abundant resources. “DeepSeek’s entire focus is model training. It saves by avoiding advertising or branching into other businesses,” the insider said.

“DeepSeek does not recruit renowned names because they rarely have the motivation to innovate,” a headhunter working with DeepSeek added. “Big-name veterans carry the baggage of past successes and are afraid to fail. True breakthroughs require fresh minds.”

KrASIA Connection features translated and adapted content that was originally published by 36Kr. This article was written by Zhou Xinyu for 36Kr.
