NVIDIA Launches NIM Microservices for Generative AI in Japan, Taiwan

Nations worldwide are pursuing sovereign AI to produce artificial intelligence using their computing infrastructure, data, workforce, and business networks to ensure AI systems align with local values, laws, and interests.

In support of these efforts, NVIDIA announced the availability of four new NVIDIA NIM microservices, enabling developers to build and deploy high-performing generative AI applications more efficiently.

The microservices support popular community models tailored to meet regional needs. They enhance user interactions through accurate understanding and improved responses based on local languages and cultural heritage.

According to ABI Research, generative AI software revenue in the Asia-Pacific region alone is expected to reach $48 billion by 2030, up from $5 billion this year.

Llama-3-Swallow-70B, trained on Japanese data, and Llama-3-Taiwan-70B, trained on Mandarin data, are regional language models that provide a deeper understanding of local laws, regulations, and other customs.

The RakutenAI 7B models, built on Mistral-7B, were trained on English and Japanese datasets and are available as two different NIM microservices for Chat and Instruct. Rakuten’s foundation and instruct models have achieved leading scores among open Japanese large language models, landing the top average score in the LM Evaluation Harness benchmark carried out from January to March 2024.

Training a large language model (LLM) on regional languages enhances the effectiveness of its outputs by ensuring more accurate and nuanced communication, as it better understands and reflects cultural and linguistic subtleties.

The models offer leading performance for understanding Japanese and Mandarin languages, regional legal tasks, question-answering, and language translation and summarization compared with base LLMs like Llama 3.

Nations worldwide — from Singapore, the United Arab Emirates, South Korea, and Sweden to France, Italy, and India — are investing in sovereign AI infrastructure.

The new NIM microservices allow businesses, government agencies, and universities to host native LLMs in their environments, enabling developers to build advanced copilots, chatbots, and AI assistants.

Developing Applications With Sovereign AI NIM Microservices

Developers can quickly deploy sovereign AI models, packaged as NIM microservices, into production while achieving improved performance.
The microservices, available with NVIDIA AI Enterprise, are optimized for inference with the NVIDIA TensorRT-LLM open-source library.

NIM microservices for Llama 3 70B, the base model for the new Llama-3-Swallow-70B and Llama-3-Taiwan-70B NIM microservices, can provide up to 5x higher throughput. This lowers the total cost of running the models in production and provides better user experiences by decreasing latency.

The new NIM microservices are available today as hosted application programming interfaces (APIs).
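NIM hosted APIs follow the familiar OpenAI-style chat-completions format, so calling one is a standard HTTPS POST. The sketch below, using only the Python standard library, shows the general shape; the endpoint URL and the model identifier used in the usage note are assumptions, so check the NVIDIA API catalog for the exact values:

```python
# Minimal sketch of calling a NIM microservice's OpenAI-compatible
# chat endpoint. The endpoint URL is an assumption based on NVIDIA's
# hosted API catalog; verify it and the model identifier before use.
import json
import urllib.request

NIM_ENDPOINT = "https://integrate.api.nvidia.com/v1/chat/completions"  # assumed

def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Assemble an OpenAI-style chat-completions payload for a NIM model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def chat(api_key: str, model: str, prompt: str) -> str:
    """POST the payload to the hosted endpoint and return the reply text."""
    payload = build_chat_request(model, prompt)
    req = urllib.request.Request(
        NIM_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # OpenAI-style responses nest the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With a valid API key, a call might look like `chat(api_key, "tokyotech-llm/llama-3-swallow-70b-instruct-v0.1", "...")`, where the model identifier here is a hypothetical example. The same request shape works against a self-hosted NIM container by pointing the endpoint at the local service instead.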

Tapping NVIDIA NIM for Faster, More Accurate Generative AI Outcomes

The NIM microservices accelerate deployments, enhance overall performance, and provide security for organizations across global industries, including healthcare, finance, manufacturing, education, and legal.

The Tokyo Institute of Technology fine-tuned Llama-3-Swallow-70B using Japanese-language data.

“LLMs are not mechanical tools that provide the same benefit for everyone. They are rather intellectual tools that interact with human culture and creativity.

The influence is mutual where not only are the models affected by the data we train on, but also our culture and the data we generate will be influenced by LLMs,” said Rio Yokota, professor at the Global Scientific Information and Computing Center at the Tokyo Institute of Technology. “Therefore, developing sovereign AI models that adhere to our cultural norms is paramount.

The availability of Llama-3-Swallow as an NVIDIA NIM microservice will allow developers to easily access and deploy the model for Japanese applications across various industries.”

For instance, Preferred Networks, a Japanese AI company, uses the model to develop Llama3-Preferred-MedSwallow-70B, a healthcare-specific model trained on a unique corpus of Japanese medical data that has achieved top scores on the Japan National Examination for Physicians.

Chang Gung Memorial Hospital (CGMH), one of the leading hospitals in Taiwan, is building a custom-made AI Inference Service (AIIS) to centralize all LLM applications within the hospital system. Using Llama-3-Taiwan-70B improves the efficiency of frontline medical staff with more nuanced medical language that patients can understand.

“By providing instant, context-appropriate guidance, AI applications built with local-language LLMs streamline workflows and serve as a continuous learning tool to support staff development and improve the quality of patient care,” said Dr. Changfu Kuo, director of the Center for Artificial Intelligence in Medicine at CGMH, Linko Branch.

“NVIDIA NIM is simplifying the development of these applications, allowing for easy access and deployment of models trained on regional languages with minimal engineering expertise.”

Taiwan-based Pegatron, a maker of electronic devices, will adopt the Llama-3-Taiwan-70B NIM microservice for internal- and external-facing applications. The company has integrated the microservice with its PEGAAi Agentic AI System to automate processes, boosting efficiency in manufacturing and operations.

Llama-3-Taiwan-70B NIM is also being used by global petrochemical manufacturer Chang Chun Group, world-leading printed circuit board company Unimicron, technology-focused media company TechOrange, online contract service company LegalSign.ai, and generative AI startup APMIC. These companies are also collaborating on the open model.

Creating Custom Enterprise Models With NVIDIA AI Foundry

While regional AI models can provide culturally nuanced and localized responses, enterprises must fine-tune them for their business processes and domain expertise.

NVIDIA AI Foundry is a platform and service that includes popular foundation models, NVIDIA NeMo for fine-tuning, and dedicated capacity on NVIDIA DGX Cloud. It provides developers a full-stack solution for creating a customized foundation model packaged as a NIM microservice.

Additionally, developers using NVIDIA AI Foundry have access to the NVIDIA AI Enterprise software platform, which provides security, stability, and support for production deployments.

NVIDIA AI Foundry gives developers the tools to build and deploy their custom, regional language NIM microservices more quickly and efficiently to power AI applications, ensuring culturally and linguistically appropriate user results.
 