Fluree Blog Post, by Arnaud Cassaigne, 12.19.24

Are Large Language Models Too Dominant?

Explore the rise of Large Language Models (LLMs), their groundbreaking capabilities, limitations, and how hybrid AI approaches like knowledge graphs are shaping a smarter, more reliable future

In 2024, it’s clear that Large Language Models (LLMs) have revolutionized and democratized the field of artificial intelligence (AI). They clearly surpass previous systems in terms of performance, versatility and ease of use. They are able to offer functionalities such as text generation (summarization, translation, computer code), information extraction from heterogeneous content, image analysis and relatively complex problem solving. Their capabilities are based on very large-scale neural architectures of the Transformer type, combined with massive training corpora. LLMs are currently the main focus of AI research, and a number of new technological offerings have emerged, competing fiercely to offer the best model: OpenAI’s GPT-4, Meta’s Llama 3, Anthropic’s Claude, Google’s Gemini, and so on.

However, it’s legitimate to ask a few questions about the apparent technological dominance of LLMs. Are they truly intelligent tools, or simply powerful calculators based on statistical correlations? Will they replace all other AI techniques, or does their success mask limitations and the need for alternatives or complements? This article takes a critical look at LLMs: their strengths, weaknesses, possible alternatives and future developments.

The LLM revolution

LLMs are based on the Transformer architecture introduced in 2017, which marked a real breakthrough in natural language processing (NLP). Unlike previous models (RNNs, LSTMs), Transformers exploit attention mechanisms, making it possible to contextualize every word in a sentence, regardless of its position.
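
To make the attention idea concrete, here is a minimal, illustrative sketch of scaled dot-product attention, the core operation of the Transformer, in plain Python. Real models add learned projection matrices and many parallel attention heads, which are omitted here.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention(queries, keys, values):
    """Scaled dot-product attention: each query scores every key,
    then outputs a weighted average of the values."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [dot(q, k) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(values[0]))])
    return out

# Two tokens with 2-dimensional embeddings: the query aligns with the first key,
# so the output leans toward the first value vector.
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
print(attention(Q, K, V))
```

Because the weights come from a softmax over all positions, every token can influence every other token, which is what lets the model contextualize a word wherever it appears.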

Since then, successive iterations, such as BERT in 2018, GPT-3 in 2020 and GPT-4 in 2023, have multiplied capabilities. These models, with billions of parameters, are trained on massive corpora, covering an impressive diversity of subjects and domains.

An LLM is capable of understanding and generating text in several languages, solving complex mathematical problems, analyzing data or even explaining scientific concepts in detail. As a result, LLMs can be adapted to a wide range of tasks and use cases: extracting and synthesizing information to assist in the use of techniques or machines, generating and writing content in different languages and in any desired form (editorial article, school presentation, comparison…), automatically generating code in any programming language, etc.

Multimodal models, such as Google’s Gemini, combine several types of data: text, image and audio. A multimodal LLM can interpret an image, explain its content and then answer related questions. This enables innovative applications in medicine (analysis of medical images), artistic creation (automated description of paintings) or product recognition in e-commerce.

LLMs are very easy to use, as all you have to do is describe your request in a prompt, attaching any associated documents (images, etc.). There’s no need for advanced knowledge of AI or data science, as was the case with previous approaches: training a Machine Learning model, creating decision rules, and so on.

These LLMs are made available either through a simple user interface (such as OpenAI’s ChatGPT) or via an API, under paid or open-source licensing, depending on the publisher and its business model. On the provider side, LLMs require very substantial storage and compute infrastructure, which is expensive.

In terms of performance, LLMs clearly outperform previous technologies when compared through standardized evaluations such as:

  • SuperGLUE: general language understanding (the more demanding successor to the GLUE benchmark)
  • MMLU (Massive Multitask Language Understanding): knowledge and reasoning tasks across a wide range of subjects
  • HumanEval: code generation and programming tasks

It even appears that for certain tasks, an LLM can approach or even exceed human performance. For example, LLMs have been shown to achieve very good results on various university-level academic tests (the American SAT, bar exams, medical exams, etc.).

LLM limitations and alternatives

Despite their impressive capabilities, we feel it’s important to remember that LLMs suffer from significant limitations, which may restrict their use in certain contexts.

Truthfulness and completeness of information

LLMs frequently hallucinate, meaning they generate false or invented answers. For example, an LLM might confidently assert that a famous personality received an award they never won, simply because the claim is statistically plausible given the model’s training corpus.

Technological alternatives that don’t hallucinate:

  • Semantic models: a knowledge graph structures data by linking it to reliable sources, guaranteeing factual verification
  • Rule-based systems: in fields with strict frameworks (finance, law), these systems offer guaranteed accuracy thanks to explicit rules
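
As an illustration of the knowledge-graph idea, here is a toy sketch in Python: facts live in a curated triple store, and a claim is confirmed only if it can actually be looked up. The facts shown are a hypothetical example; a real system like FlureeDB would query RDF data at scale.

```python
# A toy triple store: each fact is a (subject, predicate, object) tuple
# drawn from a curated, trusted source.
FACTS = {
    ("Marie Curie", "won", "Nobel Prize in Physics"),
    ("Marie Curie", "won", "Nobel Prize in Chemistry"),
}

def verify(subject, predicate, obj):
    """Return True only if the claim exists in the curated graph --
    unlike an LLM, this system cannot 'invent' a plausible-sounding fact."""
    return (subject, predicate, obj) in FACTS

print(verify("Marie Curie", "won", "Nobel Prize in Chemistry"))  # True
print(verify("Marie Curie", "won", "Nobel Prize in Literature")) # False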

Lack of consistency and explainability

LLMs are highly context-sensitive: changing a name or a single word in a question can produce a different, or even inconsistent, answer. What’s more, their operation remains a black box for the user: it is difficult to explain why the model produced one answer rather than another. Finally, because responses are generated statistically, asking the same question several times produces answers that differ in form, and sometimes in content, every time!

Explainable alternatives that offer reliable relevance:

  • Specialized encoders (BERT, RoBERTa): these models offer more stable and traceable results
  • Symbolic systems: the use of formal logic enables explainable, comprehensible reasoning
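
A minimal sketch of what "explainable reasoning" means in the symbolic approach: a forward-chaining engine that derives conclusions only from explicit rules, so every answer can be traced back to the rules that produced it. The rules and facts below are purely illustrative.

```python
def forward_chain(facts, rules):
    """Apply 'if all premises hold, then conclusion' rules repeatedly
    until no new fact can be derived. Every derived fact is traceable
    to an explicit rule, unlike an LLM's statistical output."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if all(p in derived for p in premises) and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived

# Illustrative rule base: (set of premises, conclusion)
rules = [
    ({"is_mammal"}, "is_animal"),
    ({"is_animal", "has_fur"}, "is_warm_blooded"),
]
facts = {"is_mammal", "has_fur"}
print(forward_chain(facts, rules))
```

Given the same facts and rules, this engine returns the same conclusions every time, and each conclusion can be justified by citing the rule chain that produced it.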

Reasoning limits and “non-intelligence”

Rest assured: despite their impressive output, LLMs do not possess intelligence in the human sense of the word. They manipulate statistical correlations, without any real understanding. For example, a model might fail to solve a basic mathematical problem if it has not already seen a similar solution during its training.

Example: if an LLM is asked to solve a complex equation after the variables have been replaced with random names, it is likely to fail, generating a solution based on false assumptions.

Complements for more robust reasoning:

  • Symbolic systems: these systems, combined with LLMs, can offer a more robust logical reasoning capability
  • Hybrid models: integrating explicit databases and reasoning engines
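
To illustrate the hybrid idea, here is a hedged sketch in which arithmetic sub-questions are routed to an exact symbolic evaluator instead of being left to the model’s statistical guesswork. The routing heuristic and the `answer` function are hypothetical simplifications of the "tool use" pattern.

```python
import ast
import operator
import re

# Supported binary operators for the exact evaluator.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def safe_eval(expr):
    """Evaluate an arithmetic expression exactly by walking its AST --
    the kind of step an LLM should delegate rather than 'guess'."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("unsupported expression")
    return walk(ast.parse(expr, mode="eval"))

def answer(question):
    """Route arithmetic to the exact solver; everything else would go
    to the (hypothetical) LLM in a real hybrid system."""
    m = re.search(r"\d[\d\.\s\+\-\*/\(\)]*", question)
    if m and any(op in m.group() for op in "+-*/"):
        return safe_eval(m.group().strip())
    return "delegate to LLM"

print(answer("What is 12.5 * 8 + 3?"))  # 103.0
```

The division of labor is the point: the symbolic component guarantees correctness on its narrow domain, while the LLM handles open-ended language.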

Bias and ethical issues

LLMs’ training data often contain biases, which can amplify stereotypes in their responses. For example, a model may associate certain jobs with a particular gender, or give inappropriate answers in sensitive contexts (gender, ethnicity, religion, etc.). It should also be noted that training corpora are sometimes compiled without the authors’ consent, which raises questions not only about copyright but also about the interpretation and contextualization of the original statements. In addition, using LLMs via cloud APIs raises confidentiality issues for the personal data supplied to the model: in some cases, this data may be used to improve the model or future versions of it.

Worse still, the phenomenon of hallucination in responses can appear as a risk for the propagation of fake news. Despite numerous safeguards during training, there is always a risk of generating an inappropriate and/or dangerous response if the end-user is not sufficiently vigilant.

Proposals to reduce these risks:

  • Models trained on specific corpora: limiting training to selected data reduces bias
  • Ethical ontologies: impose explicit constraints on responses, based on pre-established rules

High costs and energy consumption

The cost of training and operating LLMs is colossal. Training GPT-4, for example, is estimated to have cost around $80 million. The infrastructure is pushed to its limits, with dedicated hardware (AI GPUs with large memory and processing capacity) consuming a great deal of energy. By some estimates, a single LLM query can consume an order of magnitude more energy than a search engine query. Query execution times are also significantly higher than for previous technologies.

Alternatives:

  • Lighter-weight models: DistilBERT and TinyBERT offer similar performance for specific tasks with a reduced energy footprint
  • Traditional approaches: simple algorithms are sometimes sufficient for tasks such as keyword searches

Complementarity and the future of LLMs

LLMs represent a spectacular breakthrough in AI, but their long-term future depends on their ability to complement other approaches. Rather than dominating alone, these technologies should contribute to creating an ecosystem of specialized artificial intelligences. For example:

  • Alternative approaches: prefer simpler techniques (e.g. trained Transformer encoder models) over LLMs when they are more efficient
  • Retrieval-Augmented Generation (RAG): retrieve documents from a knowledge base to ground and improve the quality of the LLM’s response
  • Knowledge graphs + LLMs: verify information generated by LLMs against structured data
  • Symbolic systems + machine learning: use logical rules to increase the reliability and explainability of models
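
A toy sketch of the RAG pattern described above, with bag-of-words cosine similarity standing in for the learned embeddings a real system would use (the document corpus and scoring are hypothetical): the most relevant document is retrieved and injected into the prompt before the LLM is called.

```python
import math
from collections import Counter

# A hypothetical mini-corpus standing in for a real knowledge base.
DOCS = [
    "Fluree stores data as RDF triples in a ledger.",
    "Knowledge graphs link entities through typed relationships.",
    "Transformers use attention to contextualize every token.",
]

def vectorize(text):
    """Crude bag-of-words vector; real RAG uses learned embeddings."""
    return Counter(text.lower().split())

def cosine(a, b):
    num = sum(a[t] * b[t] for t in a)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def retrieve(query, k=1):
    """Rank documents by similarity to the query and keep the top k."""
    q = vectorize(query)
    ranked = sorted(DOCS, key=lambda d: cosine(q, vectorize(d)), reverse=True)
    return ranked[:k]

def build_prompt(query):
    """Ground the (hypothetical) LLM call in retrieved context."""
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("How do knowledge graphs link entities?"))
```

The generated prompt constrains the model to retrieved, verifiable content, which is precisely how RAG reduces hallucination.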

LLMs are constantly evolving to overcome some of their limitations. Here are a few current research areas:

  • Improving performance: by increasing the size of the model or corpus, refining the internal architecture or training techniques
  • Smaller models: less expensive and locally usable
  • Expert models: specialists in a particular field or task (biology, mathematics, etc.)
  • Fusion with explanatory systems: integrate logical reasoning modules and symbolic algorithms
  • Dynamic and contextual memory: enabling models to adapt in real time to changing contexts
  • Towards artificial general intelligence (AGI): current attempts aim to unify several modalities and task types in a single intelligent entity

The Fluree approach

At Fluree, we offer a comprehensive platform for semantic terminology/knowledge graph management (ITM, FlureeDB) and structured and unstructured information extraction (Sense, CAM). The strength of our solution lies in the complementary nature of the AI tools we implement in our processing methods. 

We combine different approaches where they make the most sense, to optimize the quality, cost and explainability of these processes, and to adapt to the specific needs of our customers. A single process can thus combine rule-based detection steps, linguistic or trained ML models for classification and information extraction, LLMs, and semantic repositories and knowledge graphs.

References

Attention Is All You Need, by Vaswani et al.

The best large language models (LLMs) in 2024

Understanding Self-Attention and Transformer Network Architecture, by LM Po, Medium, Oct 2024

AI now surpasses humans in almost all performance benchmarks

Emerging Research Trends in LLM’s

Future of Large Language Models: Generalized, Specialized, and Orchestrator Models | by Navveen Balani