The lowdown on NLP developments in China: Insights from 2024

Natural language processing (NLP) is a branch of artificial intelligence that bridges the gap between computers and human language. Its core objective is to enable machines to comprehend, generate, and process human communication in both textual and spoken formats. As an interdisciplinary field, NLP draws on computer science, AI, and linguistics.

NLP’s evolution can be mapped across four distinct stages:

Initial stage (1950s and 1960s): NLP research began with machine translation efforts, which grew out of computational codebreaking during the Second World War. Early systems were rudimentary, relying on word-level translation and simple rule-based processes typical of first-generation machine translation systems. However, a limited understanding of the structure of human language, AI, and machine learning, combined with constrained computational power and scarce data, kept progress incremental.
Rule-based stage (1970s and 1980s): This era saw the rise of manually constructed rule-based systems, which grew in complexity and depth. These systems incorporated grammar rules and reference handling, enabling applications such as database queries. Advances in linguistics and knowledge-based AI introduced frameworks that separated declarative language knowledge from processing mechanisms, leading to more sophisticated language understanding capabilities.
Statistical learning stage (1990s to 2012): The proliferation of digital text shifted focus to algorithmic research. Early methods relied on extracting models from available online text, though simplistic word counts offered limited linguistic insight. Efforts soon turned to building annotated language resources and applying supervised machine learning techniques to tasks like word tagging, named entity recognition, and parsing grammatical structures. This statistical approach reshaped NLP, setting the stage for the deep learning era.
Deep learning stage (2013 to present): Deep learning has revolutionized NLP workflows, with notable progress between 2013 and 2018 in tasks involving contextual and semantic similarity using vector-based representations. From 2018 onward, NLP embraced large-scale self-supervised learning models, including transformer-based architectures like BERT and GPT. These innovations have dramatically improved NLP performance, driving its integration across industries and ushering in an era of unprecedented capabilities.
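
To make the idea of vector-based semantic similarity from the deep learning stage concrete, here is a minimal sketch using cosine similarity. The three-dimensional vectors are hand-picked toy values for illustration only, not output from any trained model; real embeddings typically have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


# Toy "embeddings" (assumed values, chosen so related words point the same way).
embeddings = {
    "king": [0.8, 0.6, 0.1],
    "queen": [0.7, 0.7, 0.1],
    "apple": [0.1, 0.2, 0.9],
}

related = cosine_similarity(embeddings["king"], embeddings["queen"])
unrelated = cosine_similarity(embeddings["king"], embeddings["apple"])
print(f"king~queen: {related:.2f}, king~apple: {unrelated:.2f}")
```

Words whose vectors point in similar directions score close to 1, which is how such representations let a system treat related terms as near neighbors.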

What’s driving NLP development in China?

Regulation and policy support

In China, NLP has thrived under robust government support, proactive policies, and stringent regulations. Initiatives such as the Digital China plan underscore the integration of AI technologies across industries, providing a macro-level strategy that drives both innovation and practical application. For instance, government-backed programs have encouraged businesses and research institutions to leverage NLP for enhancing digital services and streamlining workflows.

At the same time, guidelines from the Cyberspace Administration of China (CAC) on AI-generated content (AIGC) ensure compliance with ethical standards, content review protocols, and data security norms. This dual approach fosters standardized, large-scale industry growth while ensuring accountability.

August 2022: China’s Ministry of Science and Technology (MOST) introduced a policy to support the development of next-generation AI application scenarios. The initiative prioritized areas such as smart driving, aiming to create replicable and scalable models that are “usable and effective,” promoting wider adoption across industries.
December 2022: The Chinese central government unveiled a strategic roadmap focused on expanding domestic demand and advancing technological innovation. The plan emphasized accelerating the development of innovative products, integrating AI technologies, and prioritizing strategic projects in next-generation IT, new materials, and biomedicine.
February 2023: The central government outlined the “2522” framework for building a Digital China. The framework strengthens two foundations: digital infrastructure, such as internet networks and platforms, and systems for managing and utilizing data resources. It integrates digital technologies into five areas: the economy, governance, culture, society, and the environment. It enhances two critical capabilities: innovation in digital technology and robust cybersecurity. Finally, it seeks to improve two environments for digital development: the domestic one within China and the international one.
July 2023: The CAC introduced interim measures to regulate generative AI services. This policy encouraged independent innovation in foundational technologies, set guidelines for service operations and data usage, clarified stakeholder responsibilities, and promoted innovative applications of generative AI.
August 2023: China’s Ministry of Industry and Information Technology (MIIT), in partnership with three other ministries, launched a plan to support industrial transformation and employment through emerging technologies. The initiative emphasized integrating emerging and future industries, establishing standards for AI and digital technologies, and adopting region-specific approaches to enhance foundational capabilities in generative AI.
August 2023: The MIIT and the Ministry of Finance issued an action plan to stabilize growth in electronics and information manufacturing. The plan focused on fostering advancements in computing, increasing investment in AI infrastructure, and strengthening the electronics industry’s foundation.
January 2024: The National Data Administration, along with 17 other ministries, introduced a three-year plan for data-driven innovation. This plan highlighted 12 key application scenarios designed to enhance the environment for large-scale AI deployment and accelerate the exploration of AI models in critical industries.
February 2024: The MIIT, alongside nine other ministries, issued guidelines to drive future industrial development. The policy proposed leveraging AI to identify and cultivate high-potential industries and supporting initiatives to promote industrial innovation and new forms of industrialization.
June 2024: The MIIT and related departments released a framework for integrating AI into national industries and advancing industrialization. This plan set a target for 2026 to deepen AI’s role in industrial technology innovation, foster multi-stakeholder collaboration, and establish standardized frameworks for high-quality AI industry development.

Traditional industries drive demand

The rapid digital transformation of industries such as finance, healthcare, and law has fueled the need for large-scale data processing and workflow optimization.

In finance, NLP tools enhance research efficiency and risk management by enabling capabilities like news classification, sentiment analysis, automated summarization, and personalized content recommendations. These tools allow analysts to identify market trends and investment opportunities with greater precision.

In healthcare, NLP streamlines and automates medical recordkeeping, alleviating administrative burdens on practitioners and improving the overall organization of patient data.

In the legal sector, professionals leverage NLP tools to expedite legal document drafting, contract review, case retrieval, and analysis. These solutions improve efficiency and accuracy, reduce labor costs, and minimize errors, transforming traditional workflows.

The increasing demand across these sectors creates extensive application scenarios and significant market opportunities for NLP, driving its continued development and adoption.

Industry and market state

The NLP industry chain comprises three layers: the upstream infrastructure layer, the midstream technology layer, and the downstream application layer.

Upstream (infrastructure layer)

This foundational layer includes hardware, data services, open-source models, and cloud services:

Hardware: High-performance servers, GPUs, TPUs, and other specialized chips support large-scale data processing and the complex training of NLP models.
Data services: These services rely on diverse sources, such as web crawlers and voice sensors, paired with rigorous data cleaning processes to ensure quality. Professional data annotation tailored to tasks like part-of-speech tagging and semantic analysis provides the high-quality training materials essential for optimizing models.
Open-source models: Models like BERT accelerate innovation by offering accessible starting points for research and development.
Cloud services: Flexible computing, storage, and networking resources lower adoption barriers for NLP, enabling scalable deployment.

Midstream (technology layer)

This layer focuses on NLP technology and product development.

Advanced technologies: Innovations include deep learning-based neural networks, such as recurrent neural networks (RNNs), long short-term memory (LSTM) networks, attention mechanisms, and transformer architectures.
Key players: Internet companies utilize their extensive ecosystems and customer bases to develop consumer-facing applications, while AI-focused companies apply their technical expertise to create customized solutions for niche markets.
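
As a rough illustration of the attention mechanisms listed under advanced technologies, the sketch below implements scaled dot-product attention in plain Python. It is a teaching sketch under simplifying assumptions (a single head, no learned projections, no masking), not how production frameworks implement it.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention; each argument is a list of vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values))
             for j in range(len(values[0]))]
        )
    return outputs


# Two identical keys receive equal weight, so the output averages the values.
result = attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [1.0, 0.0]],
    values=[[0.0, 2.0], [2.0, 0.0]],
)
print(result)
```

Transformer architectures stack many such attention operations, each with its own learned projections of the inputs.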

Downstream (application layer)

This layer consists of NLP applications tailored to specific use cases and industries. Examples include:

Voice assistants: NLP-powered systems that enable voice recognition and synthesis, commonly integrated into smartphones and smart home devices.
Customer service: Intelligent systems that improve user satisfaction while reducing operational costs.
Risk control: Applications that analyze financial data, news sentiment, and other inputs to proactively mitigate risks.
Regulatory compliance: Tools that process policy documents and reports, enhancing governance efficiency in industries such as finance and market supervision.

Market size

With advancements in AI and the acceleration of digital transformation across industries, NLP technologies have rapidly gained traction due to their unique capabilities in language comprehension, generation, and interaction.

Applications such as intelligent customer service in e-commerce and finance, as well as writing assistants in media and advertising, highlight the commercial value of NLP. According to CCID Consulting, the NLP market is projected to reach RMB 30.85 billion (USD 4.3 billion) by 2024 and expand further to RMB 210.5 billion (USD 29.5 billion) by 2030, reflecting a compound annual growth rate (CAGR) of 36.5%.
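
The growth arithmetic behind such projections can be checked with a short sketch. It assumes the quoted figures are endpoints of a six-year window (2024 to 2030); under that assumption the implied rate works out to roughly 37.7%, slightly above the cited 36.5%, which suggests the source may have calculated its CAGR over a different base period.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value: float, rate: float, years: int) -> float:
    """Value after compounding `rate` annually for `years` years."""
    return start_value * (1 + rate) ** years


# Figures quoted above, in RMB billion; the 2024-2030 window is an assumption.
implied_rate = cagr(30.85, 210.5, 2030 - 2024)
at_cited_rate = project(30.85, 0.365, 2030 - 2024)

print(f"Implied CAGR, 2024 to 2030: {implied_rate:.1%}")
print(f"2030 value at the cited 36.5%: RMB {at_cited_rate:.1f} billion")
```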

Industry trends

Multimodal integrations

As technology evolves, NLP is advancing beyond text processing to integrate with other modalities, such as images and audio. For example, future smart home systems could combine NLP-based voice commands with computer vision to better understand user contexts and execute tasks. A command to “turn off the light where someone is sitting” might prompt the system to use image recognition to identify the occupied area and control the appropriate light.
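
The smart home scenario can be sketched as a two-step pipeline: a language component extracts the intent, and a vision component supplies the occupancy context. Everything below is hypothetical. The keyword parser stands in for a real intent model, and the `occupancy` dictionary stands in for the output of an image recognition system.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Intent:
    action: str            # "turn_off" or "turn_on"
    device: str            # e.g. "light"
    needs_occupancy: bool  # True when the command refers to where someone is


def parse_command(text: str) -> Optional[Intent]:
    """Toy keyword parser standing in for a real NLP intent model."""
    lowered = text.lower()
    if "turn off" in lowered:
        action = "turn_off"
    elif "turn on" in lowered:
        action = "turn_on"
    else:
        return None
    if "light" not in lowered:
        return None
    return Intent(action, "light", "where someone is" in lowered)


def resolve_targets(intent: Intent, occupancy: Dict[str, bool]) -> List[str]:
    """Pick occupied rooms if the command refers to a person, else all rooms."""
    if intent.needs_occupancy:
        return [room for room, occupied in occupancy.items() if occupied]
    return list(occupancy)


# Hypothetical vision output: room name -> whether a person was detected.
occupancy = {"living_room": True, "kitchen": False, "study": False}
intent = parse_command("Turn off the light where someone is sitting")
targets = resolve_targets(intent, occupancy) if intent is not None else []
print(targets)
```

Keeping the language and vision steps separate means either component could be swapped for a stronger model without changing the other.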

In edtech, multimodal NLP can create immersive learning environments by enriching text-based content with visuals and audio explanations. These systems can dynamically respond to student progress and queries, blending voice, text, and visual feedback to enhance engagement and learning outcomes.

Model optimization and customization

NLP models are evolving toward lightweight designs and tailored solutions. To address the constraints of mobile devices and edge computing, model compression techniques and algorithmic optimizations are reducing computational and storage requirements. This enables high-performance NLP functionalities, such as smart voice assistants, to operate efficiently on smartphones and wearables, improving responsiveness and minimizing energy consumption.
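
One common compression technique is quantization, which stores model weights as small integers instead of 32-bit floats. The sketch below shows symmetric 8-bit quantization on a handful of made-up weights. It is a simplified illustration, assuming one scale factor for the whole list; real toolchains typically quantize per layer or per channel and calibrate on data.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] using one symmetric scale."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from the integer codes."""
    return [q * scale for q in quantized]


# Illustrative weights (assumed values, not from any real model).
weights = [0.82, -1.27, 0.003, 0.45, -0.61]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("int8 codes:", codes)
print(f"scale: {scale:.4f}, max round-trip error: {max_error:.4f}")
```

Each weight now needs one byte instead of four, at the cost of a rounding error bounded by half the scale factor, which is the kind of trade-off that makes on-device NLP practical.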

Simultaneously, businesses are increasingly adopting NLP models tailored to specific industries. For instance, healthcare providers are developing custom models for medical terminology and R&D, while financial institutions design models that align with risk management and investment strategies. This dual emphasis on model efficiency and domain-specific customization empowers industries to drive deeper digital transformation and innovation.