How Michael Feil Is Growing AI With Long-Context Language Models

Artificial intelligence (AI) has touched and transformed almost every industry, but widespread adoption is still incomplete, and many challenges remain. A significant one lies in the highly specialized field of building long-context large language models (LLMs).

These models let machines process, interpret, and respond to large amounts of text in a human-like way. Their potential benefits are far-reaching and include writing software code, analyzing protein structures, and serving as chatbots and AI assistants.

Long-context language models have transformed many vital industries, including tech, healthcare, customer service, marketing, legal, and banking. They streamline operations, automate tedious tasks, process vast amounts of data, support extensive customization, work across languages, and generate new content based on a business's individual needs.

Yet, despite these game-changing benefits, issues still need to be resolved.

The Issue with Long-Context Models

Long-context language models often struggle to maintain coherence and relevance over text sequences that extend beyond their training length. The result is degraded performance, which can have severe consequences in situations that require both deep understanding and the generation of lengthy content.

Researchers and engineers have been working to overcome this challenge through innovative solutions.

An Innovator to Watch: Michael Feil

Michael Feil, an AI Engineer at Gradient.ai, is an exciting young innovator. With an extensive background in machine learning (ML) and AI, Feil has emerged as an expert in language model inference and the open-source generative AI ecosystem. His journey began with an internship at Bosch Research in Chicago in 2019, where he worked on ML models, sensors, and the Internet of Things (IoT), with a focus on anomaly detection for CNC systems. This internship resulted in a new patented sensor system and received rave reviews in several publications.

Feil then completed his postgraduate studies in Robotics and AI at TU Munich, after which he worked at Bosch and at the Max Planck Institute for Intelligent Systems in Tübingen, Germany, the country's most cited computer science lab.

Feil's work at these institutions garnered him an excellent reputation in reinforcement learning, applied AI, and bringing AI to manufacturing.

An Industry Trailblazer

Feil was working on LLMs for code generation when ChatGPT was released in November 2022. The infrastructure for running these models was far less advanced than it is today. Feil contributed to LLM inference projects like CTranslate2, and his work was featured in the presentation of Starcoder-1 by Hugging Face and ServiceNow. Hugging Face develops tools for building applications with machine learning, and ServiceNow is a software company whose cloud computing platforms help companies manage digital workflows. Starcoder-1 was the best-performing open LLM at the time of its release.

Feil integrated Starcoder-1 into the novel inference engine "vLLM," becoming the first person outside UC Berkeley to contribute. Today, vLLM is widely used by companies like AWS, Databricks, Google Cloud, and Anyscale. As a result of this accomplishment, several companies tried to hire Feil, but he ultimately signed on as the first employee outside the United States at Gradient, a San Francisco-based AI company that builds AI agents for enterprises.

Feil's trailblazing efforts continue. In 2023, Feil realized that most LLM projects were not ready for production or, worse, offered a terrible developer experience. While people were building inference engines for decoder-only LLMs, there was little traction in optimized deployments for encoder-only models, which are the backbone of vector embeddings, reranking, and text classification.

Recognizing this gap, Feil built Infinity, an inference service that enables high-throughput deployment of encoder models. Hugging Face and Qdrant, two major companies in the field of generative AI, soon announced their own open-source products in response, but Feil's project remained a standout in the field.

Feil didn't rest on his laurels. Instead, he continued his pioneering work by building other popular infrastructure tools for generative AI, like embed and hf-hub-ctranslate2.

How Feil is Solving the "Major Issue" of Long-Context Language Models

In his current role at Gradient.ai, Feil is tackling the challenge of keeping model output consistent and relevant over long pieces of text. He led a project to create models that can handle vast amounts of text (up to 4 million words at a time), allowing the models to better understand and reason across long documents.

Feil also worked on a customized version of the Ring Attention technique, making it easier to train models on long text sequences. Ring Attention breaks down complex calculations into smaller parts that can be handled by multiple Graphics Processing Units (GPUs) in parallel. This approach reduces the memory needed for each GPU and speeds up the process.
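To make the idea concrete, here is a minimal single-process sketch of the general Ring Attention pattern in Python with NumPy. It is not Gradient's customized version: the "devices" are simulated with plain lists, the function names are hypothetical, and causal masking and real GPU communication are left out. Each simulated device keeps one block of queries, the key/value blocks rotate around the ring, and a running softmax lets each device combine the partial results without ever holding the full attention matrix.

```python
import numpy as np


def full_attention(q, k, v):
    """Reference: ordinary softmax attention over the whole sequence."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return (weights / weights.sum(axis=-1, keepdims=True)) @ v


def ring_attention_sim(q, k, v, num_devices):
    """Simulate ring-style blockwise attention on one machine (illustrative only)."""
    d = q.shape[-1]
    # Split the sequence into one block per simulated device.
    q_blocks = np.array_split(q, num_devices)
    k_blocks = np.array_split(k, num_devices)
    v_blocks = np.array_split(v, num_devices)

    # Running state per device: unnormalized output, row-wise max, softmax denominator.
    outs = [np.zeros((qb.shape[0], v.shape[-1])) for qb in q_blocks]
    maxes = [np.full((qb.shape[0], 1), -np.inf) for qb in q_blocks]
    sums = [np.zeros((qb.shape[0], 1)) for qb in q_blocks]

    for step in range(num_devices):
        for dev in range(num_devices):
            # At each step, a device sees the key/value block passed along the ring.
            src = (dev + step) % num_devices
            scores = q_blocks[dev] @ k_blocks[src].T / np.sqrt(d)

            # Online softmax: rescale earlier partial results to the new row max.
            new_max = np.maximum(maxes[dev], scores.max(axis=-1, keepdims=True))
            scale = np.exp(maxes[dev] - new_max)
            probs = np.exp(scores - new_max)

            outs[dev] = outs[dev] * scale + probs @ v_blocks[src]
            sums[dev] = sums[dev] * scale + probs.sum(axis=-1, keepdims=True)
            maxes[dev] = new_max

    # Normalize each device's block and stitch the sequence back together.
    return np.concatenate([o / s for o, s in zip(outs, sums)])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q, k, v = (rng.standard_normal((12, 8)) for _ in range(3))
    print(np.allclose(ring_attention_sim(q, k, v, num_devices=4), full_attention(q, k, v)))
```

In this sketch, each simulated device only ever multiplies its own query block against one key/value block at a time, which is where the per-GPU memory savings described above come from. In a real multi-GPU setup, the inner loop would run in parallel and the blocks would be exchanged between neighboring devices.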

This method was developed through a collaborative effort and resulted in a model that gained significant attention in May 2024. At release, it ranked #2 on the Hugging Face trending leaderboard, despite several releases from Meta and a new model from Apple at the beginning of May. It garnered over 500,000 views on Twitter and was featured by outlets such as the Latent Space Podcast and VentureBeat.

Recognition and Impact

Michael Feil's contributions to the AI field have not gone unnoticed, and his work on Infinity has been particularly impactful. Infinity quickly garnered attention from major players in the AI industry and has been adopted by companies like SAP, Vast, Freshworks, and RunPod. Industry experts such as Chip Huyen have also featured it in lists of top generative AI infrastructure.

Feil's work has had a far-reaching impact on the AI community. His contributions to long-context language models have opened up new possibilities for applications that require deep understanding and the generation of extensive text. These models are now being used in a wide range of industries, including customer service, content creation, and research.

In addition, Feil's open-source projects have democratized access to advanced AI technologies. By making these tools freely available, he has opened the door for a broader range of developers and organizations to benefit from the power of AI. This has led to increased innovation and to the development of new applications built on long-context language models.

Benefiting the Next Generation: A Commitment to Giving Back

Michael Feil's innovative approach to AI extends beyond his technical contributions. He is committed to mentoring and leading other professionals in the field and has spoken at numerous industry events and on podcasts, sharing his insights and experiences with a broader audience. His appearances on the Run.ai Podcast and at the NVIDIA x Munich NLP Meetup are notable examples of his efforts to inspire and educate others.

Feil's leadership skills are evident in his ability to collaborate with other experts and integrate their work into his projects. For example, his involvement in the Starcoder-1 project showcased his ability to work with leading researchers and engineers to develop state-of-the-art AI models. This collaborative spirit has been a hallmark of Feil's career, resulting in far more significant advancements in the AI industry than he could have accomplished alone.

Future Goals

In the future, Feil envisions open-source AI projects becoming an integral component of countless industries and a driver of innovation and transparency. However, he also recognizes the challenges of monetizing open-source projects. To address this, Feil plans to create a marketplace where open-source AI projects can collaborate with companies, one that could become the next central platform for AI development.

Feil's ongoing projects at Gradient.ai include raising the bar in long-context language models and developing new tools for AI inference. He is committed to pushing the boundaries of what is possible with AI by continuously exploring new techniques and technologies to enhance the performance and capabilities of language models.

Feil's Legacy on the Industry and World

Feil's impact on the AI community and beyond will undoubtedly grow as he continues to innovate and lead in the field. His dedication to improving AI technologies and making them accessible to a broader audience reflects his commitment to driving progress.

Through his efforts, Michael Feil is not only advancing AI's capabilities but also inspiring a new generation of researchers and engineers to explore the limitless possibilities of this unique technology.
