Every day, hundreds of LLMs are published online, with HuggingFace being the biggest open repository (see HFCommunity for some stats). Most of them (probably around 99%) are not created from scratch, but derived from pre-existing LLMs that are modified, improved or refined for specific domains or tasks. As an example, this LLM (hfl/llama-3-chinese-8b) is a fine-tuned version of the well-known Llama model (meta-llama/Meta-Llama-3-8B): the authors took the LLM and trained it further with 120 GB of Chinese text corpora to improve its quality when understanding and generating Chinese text.

By the end of 2023, we started to see combinations of LLMs topping the leaderboards. Multiple approaches exist to combine LLMs in order to capture and combine their knowledge or skills, or simply create “smarter” LLMs, according to several benchmarks.

We created a no-code tool based on a Software Product Line (SPL) to define combinations of LLMs. Our tool comes with a configurable Feature Model, where you can define the characteristics you want for your combined ML model, and a generator able to take that configuration and create the combined model for you, e.g. by using a Mixture of Experts approach.

SPL for AI overview

Tool overview

This work has been accepted as a demo paper at the ACM SPLC 2024 conference (DOI link). Before the conference takes place, you can read the preprint version here.

Machine Learning Model Combination: State-of-the-art

There are multiple techniques for the combination of Machine Learning models. Here, we will mention some of the most popular alternatives for LLM merging (since the tool currently focuses on LLMs). The main advantage of model merging, compared to training from scratch, is that it requires very little hardware.

Model Merging

Model merging consists of combining two or more pre-trained LLMs into a single unified one. This can be done in different ways, for instance by averaging the models' weights or by applying task arithmetic (illustrated below).

A quick search on HuggingFace reveals the high number of merged LLMs already available. The Open LLM Leaderboard can be explored to discover the best LLMs at the moment, some of which are merges (note that LLM leaderboards change really fast, and the best LLM can be “defeated” in a matter of days).

Task vectors

An illustration of task vectors and addition arithmetic operation. A task vector is obtained by subtracting the weights of a pre-trained model from the weights of the same model after fine-tuning. Adding task vectors improves the performance of the pre-trained model on the tasks under consideration.
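
To make the arithmetic concrete, here is a minimal sketch with PyTorch state dicts (checkpoint paths are placeholders; for full-size LLMs you would rely on a dedicated tool such as Mergekit rather than doing this by hand):

```python
import torch

def task_vector(pretrained_sd, finetuned_sd):
    """Task vector: fine-tuned weights minus pre-trained weights."""
    return {k: finetuned_sd[k] - pretrained_sd[k] for k in pretrained_sd}

def apply_task_vectors(pretrained_sd, task_vectors, scaling=1.0):
    """Add (scaled) task vectors on top of the pre-trained weights."""
    merged = {k: v.clone() for k, v in pretrained_sd.items()}
    for tv in task_vectors:
        for k in merged:
            merged[k] = merged[k] + scaling * tv[k]
    return merged

# Usage sketch (paths are placeholders; state dicts come from real checkpoints):
# base = torch.load("pretrained.pt")        # pre-trained weights
# ft_a = torch.load("finetuned_task_a.pt")  # fine-tuned on task A
# ft_b = torch.load("finetuned_task_b.pt")  # fine-tuned on task B
# merged = apply_task_vectors(
#     base, [task_vector(base, ft_a), task_vector(base, ft_b)], scaling=0.5)
```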

Mixture of Experts

This methodology was presented in 1991 as a novel supervised learning procedure for systems composed of many independent neural networks (see the original paper here). These networks would be trained to solve specific tasks and then combined with a special gating network in charge of deciding, for a given input, which networks (i.e., experts) would generate the output. This Mixture of Experts (MoE) has recently gained a lot of new attention thanks to its application to Transformer models (the foundational architecture of most of the current state-of-the-art LLMs).
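
The following toy PyTorch sketch illustrates the idea: a small gating network scores the experts and each input is routed to its top-scoring ones (the dimensions, the number of experts and the top-k choice are illustrative, not tied to any real LLM):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    """Toy Mixture of Experts: a gate decides which experts answer each input."""
    def __init__(self, dim=32, n_experts=4, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_experts)])
        self.gate = nn.Linear(dim, n_experts)  # scores every expert for a given input
        self.top_k = top_k

    def forward(self, x):
        scores = self.gate(x)                           # (batch, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # keep the k best experts per input
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for b in range(x.size(0)):                      # naive routing loop, for clarity
            for slot in range(self.top_k):
                expert = self.experts[idx[b, slot].item()]
                out[b] += weights[b, slot] * expert(x[b])
        return out

moe = TinyMoE()
y = moe(torch.randn(8, 32))  # 8 inputs, each handled by its 2 best experts
```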

Some of the most popular MoE LLMs are those produced by MistralAI (e.g., Mixtral-8x7B). Note that MoEs like Mixtral are trained from scratch (and, eventually, each component becomes an expert in a specific domain or task), which differs from MoEs created by combining a set of pre-trained LLMs. The latter are the ones our tool lets you define, and they can be created with minimal hardware requirements!
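
As a hedged illustration of how pre-trained LLMs can be declared as experts, here is a configuration in the spirit of Mergekit's MoE script, written from Python and dumped to YAML (all model names and prompts are placeholders, and the exact schema should be checked against the Mergekit version you use):

```python
import yaml  # pip install pyyaml

# Hypothetical MoE composition: existing pre-trained LLMs become the experts,
# and short "positive prompts" hint the gate about which expert should handle
# which kind of input. All model names below are placeholders.
moe_config = {
    "base_model": "org/general-purpose-7b",
    "gate_mode": "hidden",      # route inputs using hidden-state similarity to the prompts
    "dtype": "bfloat16",
    "experts": [
        {"source_model": "org/code-expert-7b",
         "positive_prompts": ["write a Python function", "fix this bug"]},
        {"source_model": "org/math-expert-7b",
         "positive_prompts": ["solve this equation", "compute the integral"]},
    ],
}

with open("moe_config.yaml", "w") as f:
    yaml.safe_dump(moe_config, f, sort_keys=False)
```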

Using SPLs: A Model-Driven Approach

We propose a Software Product Line approach for the emerging field of combinations of LLMs. More specifically, we introduce a feature model to characterize the dimensions and variability aspects of this domain, together with a code generation approach able to transform a feature configuration into an actual merged model. We rely on Mergekit for the generation phase and provide our own tool support for the whole process.

Generative Software Development is crucial for outlining the properties, requirements, commonalities, and variabilities in system families. Nevertheless, we still lack modeling approaches, low-code tools, and domain-specific languages (DSLs) for developing intelligent systems in a reusable and maintainable way. The complexity of software development is increasing with more variables and smart components. Feature models, which capture all possible products in an SPL, can be used to define software families of intelligent systems, but using SPLs to characterize ML systems, like LLM compositions, remains underexplored.

A Feature Model proposal

A Feature Model for configuring combinations of machine learning models

The Feature Model (FM) aims to capture all the possible combinations of LLMs. Since there is no single standard feature modelling notation, we combine several syntaxes to express our complex FM: modularity and compositionality in FMs, feature attributes (each consisting of a name, a domain and a value), and cardinality-based features.

We distinguish two types of compositions, Merges and MoEs, as each has specific configuration properties. The Feature Model is, indeed, strongly influenced by Mergekit, since it is currently the reference tool for model merging. That is not a limiting factor, though, and this first definition could evolve over time.
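
For instance, a merge could be described with a SLERP-style Mergekit configuration like the sketch below, here generated from Python (model names, layer ranges and the interpolation factor are placeholders loosely based on Mergekit's documented examples):

```python
import yaml  # pip install pyyaml

# Hypothetical SLERP merge between two models sharing the same architecture.
merge_config = {
    "merge_method": "slerp",
    "base_model": "org/model-A-7b",   # placeholder model names
    "slices": [{
        "sources": [
            {"model": "org/model-A-7b", "layer_range": [0, 32]},
            {"model": "org/model-B-7b", "layer_range": [0, 32]},
        ],
    }],
    "parameters": {"t": 0.5},         # interpolation factor between the two models
    "dtype": "float16",
}

with open("merge_config.yaml", "w") as f:
    yaml.safe_dump(merge_config, f, sort_keys=False)
```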

Our tool (available on GitHub) lets the user create their own feature configurations from the feature model.

User Interface of the tool for the Combination of ML through a SPL approach

User Interface of the tool for the configuration of the feature model

From a Feature Configuration to a Composite LLM: The Generation Step

After being evaluated, the configuration is passed as input to the Mergekit generator. The process consists of reading the feature configuration and using its content to generate and run the Mergekit scripts (i.e., no rocket science involved). To generate the new LLM, the base LLMs defined in the feature configuration are downloaded locally from HuggingFace.
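
A minimal sketch of that step, assuming the feature configuration has already been translated into a Mergekit YAML file such as the one above (the `mergekit-yaml <config> <output-dir>` calling convention comes from Mergekit's documentation; our tool drives this for you):

```python
import subprocess

# Run Mergekit on the generated configuration. The base LLMs referenced in the
# YAML file are downloaded from HuggingFace (and cached locally) the first time
# the merge runs; the resulting model is written to ./merged-llm.
subprocess.run(["mergekit-yaml", "merge_config.yaml", "./merged-llm"], check=True)
```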

The duration of the generation process will vary depending on the available resources, the merging technique being used and the sizes of the LLMs. Once the process finishes, the generated LLM is stored in a local directory containing all the model weights and other configuration files. At this point, the LLM is ready to be used and, optionally, deployed to a cloud environment. Our tool comes with an implementation to automatically push the generated LLMs to HuggingFace and make them available to everyone.
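
For the optional publishing step, here is a sketch with the huggingface_hub library (the repository name is a placeholder, and it assumes you are already authenticated, e.g., via `huggingface-cli login`):

```python
from huggingface_hub import HfApi

api = HfApi()
repo_id = "your-username/my-merged-llm"  # placeholder repository name

# Create the target repository (no-op if it already exists) and upload the
# local directory with the merged model: weights, tokenizer and config files.
api.create_repo(repo_id=repo_id, repo_type="model", exist_ok=True)
api.upload_folder(folder_path="./merged-llm", repo_id=repo_id, repo_type="model")
```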

Conclusions and Future Work

We presented a no-code tool for the definition of combinations of LLMs based on Software Product Lines, aiming at capturing similarities and variability among the possible combinations and generating all valid configurations based on a well-defined Feature Model. We covered Model Merging and Mixture of Experts methodologies for composite LLMs.

Like any other kind of software, AI, ML and, ultimately, intelligent systems (which often involve more variables and increase the complexity of the development process) need methodologies to properly define the requirements of our systems, abstracting us from the underlying processes so we can focus on what really matters.

As future work, we plan to improve the process of evaluating feature configurations (we identified that some restrictions not embedded in the feature model can lead to invalid configurations, due to the “freedom” our syntax gives with feature cardinality and attributes). Extensions to the code generator to target new libraries will also be implemented when needed.

Finally, we also plan to cover other domains, such as composite Computer Vision models, or other phases of an MLOps workflow, for instance the composition of datasets for optimal model training, taking into account responsible AI concerns such as the provenance and gathering process of the data.
