The C4 dataset, a starting point for many LLMs, is 750 GB of text data. That’s 805,306,368,000 bytes – a lot of information. This data can include books, articles, websites, forums, comment sections, and other sources.

The more varied and comprehensive the data, the better the model’s understanding and generalization capabilities.
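That byte count assumes the binary definition of a gigabyte (2^30 bytes per GB). A one-line check:

```python
print(750 * 2**30)  # 805306368000 -- 750 GB expressed in bytes
```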
While the basic transformer architecture remains the foundation, LLMs have a significantly larger number of parameters. GPT-3, for example, has 175 billion parameters. In this case, parameters refer to the weights and biases in the neural network that are learned during the training process.
In deep learning, a model is trained to make predictions by adjusting these parameters to reduce the difference between its predictions and the actual outcomes.
The process of adjusting these parameters is called optimization, which uses algorithms like gradient descent.

- **Weights:** These are values in the neural network that transform input data within the network’s layers. They are adjusted during training to optimize the model’s output. Each connection between neurons in adjacent layers has an associated weight.
- **Biases:** These are also values in the neural network that are added to the output of a layer’s transformation. They provide an additional degree of freedom to the model, allowing it to fit the training data better. Each neuron in a layer has an associated bias.

This scaling allows the model to store and process more intricate patterns and relationships in the data.
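To make weights, biases, and gradient descent concrete, here is a toy sketch that learns a single weight and bias. It is nothing like production scale, but the principle is the same: nudge each parameter in the direction that reduces the error.

```python
# Minimal sketch of gradient descent on one weight and one bias.
# Toy example, not how production LLMs are trained: real models update
# billions of parameters at once using libraries like PyTorch.

# Training data: we want the model to learn y = 2x + 1
data = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

weight, bias = 0.0, 0.0   # parameters start at arbitrary values
learning_rate = 0.01

for step in range(1000):
    for x, target in data:
        prediction = weight * x + bias           # forward pass
        error = prediction - target              # how wrong were we?
        # Gradients of the squared error with respect to each parameter
        weight -= learning_rate * 2 * error * x  # adjust the weight
        bias   -= learning_rate * 2 * error      # adjust the bias

print(f"weight={weight:.2f}, bias={bias:.2f}")   # approaches 2.00 and 1.00
```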
The large number of parameters also means that the model requires significant computational power and memory for training and inference. This is why training such models is resource-intensive and typically uses specialized hardware like GPUs or TPUs.
The model is trained to predict the next word in a sequence using powerful computational resources. It adjusts its internal parameters based on the errors it makes, continuously improving its predictions.
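A simplified look at what “adjusting based on errors” means: the model’s predicted probabilities are scored against the actual next word, and that score (the loss) drives the parameter updates. The vocabulary and numbers below are invented for illustration:

```python
import numpy as np

# Toy next-word prediction step. The vocabulary and probabilities are
# made up; a real LLM has a vocabulary of ~50,000+ tokens.
vocab = ["the", "cat", "sat", "on", "mat"]

# The model's predicted probabilities for the word after "the cat sat on the"
predicted = np.array([0.1, 0.1, 0.1, 0.1, 0.6])

# The actual next word in the training text was "mat" (index 4)
target_index = vocab.index("mat")

# Cross-entropy loss: low when the model assigned high probability
# to the correct word, high when it didn't.
loss = -np.log(predicted[target_index])
print(f"loss = {loss:.3f}")  # 0.511; a perfect prediction would score 0.0

# Training nudges every weight and bias in the direction that reduces
# this loss, one batch of text at a time.
```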
Attention mechanisms like the ones we’ve discussed are pivotal for LLMs. They allow the model to focus on different parts of the input when generating output.
By weighing the importance of different words in a context, attention mechanisms enable the model to generate coherent and contextually relevant text. Doing this at massive scale is what enables LLMs to work the way they do.
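Here is a toy sketch of the scaled dot-product attention at the heart of this, using three made-up token vectors. Real models use learned query/key/value projections and thousands of dimensions; this keeps the projections trivial to show the mechanism itself:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Three token embeddings, 4 dimensions each (tiny made-up numbers)
tokens = np.array([
    [1.0, 0.0, 1.0, 0.0],   # "the"
    [0.0, 1.0, 0.0, 1.0],   # "cat"
    [1.0, 1.0, 0.0, 0.0],   # "sat"
])

# Self-attention: every token scores its relevance to every other token
Q, K, V = tokens, tokens, tokens          # identity projections for simplicity
scores = Q @ K.T / np.sqrt(Q.shape[-1])   # scaled dot-product
weights = softmax(scores)                 # each row sums to 1: "where to look"
output = weights @ V                      # context-aware token representations

print(np.round(weights, 2))  # each row: how much a token attends to the others
```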
## How does a transformer predict text?

Transformers predict text by processing input tokens through multiple layers, each equipped with attention mechanisms and feed-forward networks.
After processing, the model produces a probability distribution over its vocabulary for the next word in the sequence. The word with the highest probability is typically selected as the prediction.
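Here is a minimal sketch of that final step, with a made-up five-word vocabulary and made-up scores (real vocabularies run to tens of thousands of tokens):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Made-up final-layer scores ("logits") for a five-word toy vocabulary,
# produced after the input has passed through all transformer layers.
vocab = ["mat", "dog", "moon", "sofa", "banana"]
logits = np.array([3.1, 1.2, 0.4, 2.5, -1.0])

probs = softmax(logits)        # probability distribution over the vocabulary
for word, p in zip(vocab, probs):
    print(f"{word:>7}: {p:.2%}")

next_word = vocab[int(np.argmax(probs))]  # greedy pick: highest probability
print("prediction:", next_word)           # "mat"
# Real systems often sample from this distribution rather than always
# taking the top word; that is what "temperature" settings control.
```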
## How is a large language model built and trained?

Building an LLM involves gathering data, cleaning it, training the model, fine-tuning the model, and vigorous, continuous testing.

The model is initially trained on a vast corpus to predict the next word in a sequence. This phase lets the model learn connections between words: patterns in grammar, relationships that can represent facts about the world, and connections that feel like logical reasoning. These connections also mean it picks up biases present in the training data.
After pre-training, the model is refined on a narrower dataset, often with human reviewers following guidelines.
Fine-tuning is a crucial step in building LLMs. It involves training the pre-trained model on a more specific dataset or task. Let’s take ChatGPT as an example.
If you’ve played with GPT models, you know that prompting is less “write this thing” and more like:
- **Prompt:** Once upon a time
  - **Continuation:** There was an evil wizard at the top of a tower.
- **Prompt:** Why did the chicken join a band?
  - **Continuation:** Because it had the drumsticks!

Getting from that point to ChatGPT involves a lot of low-paid labor. Those workers create immense corpora that put a thumb on the scale of GPT’s responses and expected behaviors, producing tons of prompt/continuation texts like:
- **Prompt:** Finish this story: “Once upon a time..”
  - **Continuation:** Sure! Once upon a time, in a land far, far away, there was a small village nestled between two majestic mountains.
- **Prompt:** Tell me a joke about a chicken.
  - **Continuation:** Why did the chicken join a band? Because it had the drumsticks!
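Pairs like these typically end up in a simple machine-readable training file. A hypothetical sketch (the field names and format are illustrative, not any vendor’s actual schema):

```python
import json

# Hypothetical fine-tuning examples in JSONL style: one JSON object per
# line, each pairing a prompt with the desired continuation.
examples = [
    {
        "prompt": "Finish this story: \"Once upon a time..\"",
        "continuation": "Sure! Once upon a time, in a land far, far away...",
    },
    {
        "prompt": "Tell me a joke about a chicken.",
        "continuation": "Why did the chicken join a band? Because it had the drumsticks!",
    },
]

with open("fine_tune_data.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")  # the model is then trained on these pairs
```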
This fine-tuning process is essential for several reasons:

- **Specificity:** While pre-training gives the model a broad understanding of language, fine-tuning narrows its knowledge and behavior to align more with specific tasks or domains. For instance, a model fine-tuned on medical data will better answer medical questions.
- **Control:** Fine-tuning gives developers more control over the model’s outputs. Developers can use a curated dataset to guide the model to produce desired responses and avoid unwanted behaviors.
- **Safety:** It helps in reducing harmful or biased outputs. By using guidelines during the fine-tuning process, human reviewers can ensure the model doesn’t produce inappropriate content.
- **Performance:** Fine-tuning can significantly improve the model’s performance on specific tasks. For example, a model that’s been fine-tuned for customer support will be much better at it than a generic model.

You can tell ChatGPT has been fine-tuned in some particular ways.
For example, “logical reasoning” is something LLMs tend to struggle with. ChatGPT’s best logical reasoning model – GPT-4 – has been trained intensively to explicitly recognize patterns in numbers.
Instead of something like this:
- **Prompt:** What’s 2+2?
- **Process:** Oftentimes in math textbooks for children, 2+2=4. Occasionally there are references to “2+2=5,” but there is usually more context to do with George Orwell or Star Trek when that is the case. If this were in that context, the weight would be more in favor of 2+2=5. But that context doesn’t exist, so in this instance the next token is likely 4.
- **Response:** 2+2=4

The training does something like this:
- training: 2+2=4
- training: 4/2=2
- training: half of 4 is 2
- training: 2 of 2 is four

…and so on.
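A sketch of how that kind of drilling data could be generated programmatically. This is purely illustrative, since how GPT-4’s training data was actually assembled is not public:

```python
# Purely illustrative: generate many phrasings of the same arithmetic
# facts, the kind of drilling described above.
pairs = []
for a in range(1, 10):
    for b in range(1, 10):
        c = a + b
        pairs.append(f"training: {a}+{b}={c}")
        pairs.append(f"training: {c}-{b}={a}")
        pairs.append(f"training: the sum of {a} and {b} is {c}")
        if a == b:
            pairs.append(f"training: half of {c} is {a}")

for line in pairs[:5]:
    print(line)
```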
This means for those more “logical” models, the training process is more rigorous and focused on ensuring that the model understands and correctly applies logical and mathematical principles.
The model is exposed to various mathematical problems and their solutions, ensuring it can generalize and apply these principles to new, unseen problems.
The importance of this fine-tuning process, especially for logical reasoning, cannot be overstated. Without it, the model might provide incorrect or nonsensical answers to straightforward logical or mathematical questions.
## Image models vs. language models

While both image and language models might use similar architectures like transformers, the data they process is fundamentally different:
### Image models

These models deal with pixels and often work in a hierarchical manner, analyzing small patterns (like edges) first, then combining them to recognize larger structures (like shapes), and so on until they understand the entire image.
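A toy version of that lowest level: detecting a vertical edge with a classic 3x3 filter. The filter values are a textbook example, not taken from any particular model:

```python
import numpy as np

# A tiny 4x4 grayscale "image": dark on the left, bright on the right
image = np.array([
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
], dtype=float)

# Classic vertical-edge filter: responds where brightness changes left-to-right
kernel = np.array([
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
], dtype=float)

# Slide the 3x3 filter across the image (a convolution, no padding)
out = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        out[i, j] = (image[i:i+3, j:j+3] * kernel).sum()

print(out)  # large values mark the vertical edge down the middle
```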
### Language models

These models process sequences of words or characters. They need to understand the context, grammar, and semantics to generate coherent and contextually relevant text.
## How prominent generative AI interfaces work

### Dall-E + Midjourney

Dall-E is a variant of the GPT-3 model adapted for image generation. It’s trained on a vast dataset of text-image pairs. Midjourney is another image generation software that is based on a proprietary model.
- **Input:** You provide a textual description, like “a two-headed flamingo.”
- **Processing:** These models encode this text into a series of numbers and then decode these vectors, finding relationships to pixels, to produce an image. The model has learned the relationships between textual descriptions and visual representations from its training data.
- **Output:** An image that matches or relates to the given description.

**Fingers, patterns, problems**

Why can’t these tools consistently generate hands that look normal? These tools work by looking at pixels next to each other.
You can see how this works when comparing earlier or more primitive generated images with more recent ones: earlier models look very fuzzy, while more recent models are a lot crisper.
These models generate images by predicting the next pixel based on the pixels they have already generated. This process is repeated millions of times over to produce a complete image.
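A deliberately simplified sketch of that sequential, neighbor-by-neighbor view. The `predict_next_pixel` stand-in below is invented, and production image generators are far more sophisticated:

```python
import random

# Deliberately simplified sketch of pixel-by-pixel prediction.
WIDTH, HEIGHT = 8, 8
image = []

def predict_next_pixel(previous_pixels):
    # Stand-in for the model: real systems score candidate values using
    # everything generated so far plus the text prompt's encoding.
    if not previous_pixels:
        return random.randint(0, 255)
    # Bias toward the last pixel's value, so neighbors tend to agree --
    # which is also why fine structures like fingers are easy to smear.
    return max(0, min(255, previous_pixels[-1] + random.randint(-20, 20)))

for _ in range(WIDTH * HEIGHT):
    image.append(predict_next_pixel(image))

print(image[:8])  # first "row" of generated pixel values
```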
Hands, especially fingers, are intricate and have a lot of details that need to be captured accurately.

Each finger’s positioning, length, and orientation can vary greatly in different images.

When generating an image from a textual description, the model has to make many assumptions about the exact pose and structure of the hand, which can lead to anomalies.
### ChatGPT

ChatGPT is based on the GPT-3.5 architecture, a transformer-based model designed for natural language processing tasks.
- **Input:** A prompt or a series of messages to simulate a conversation.
- **Processing:** ChatGPT uses its vast knowledge from diverse internet texts to generate responses. It considers the context provided in the conversation and tries to produce the most relevant and coherent reply.
- **Output:** A text response that continues or answers the conversation.

**Specialty**

ChatGPT’s strength lies in its ability to handle various topics and simulate human-like conversations, making it ideal for chatbots and virtual assistants.
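A rough sketch of how a conversation might be packaged into a single input sequence. The message format below is illustrative, not OpenAI’s actual internal representation:

```python
# Illustrative only: not OpenAI's actual internal representation.
conversation = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is a transformer?"},
    {"role": "assistant", "content": "A neural network architecture..."},
    {"role": "user", "content": "How big is GPT-3?"},
]

# The whole history is flattened into one text sequence, so earlier
# turns become context the model attends over when replying.
prompt = "\n".join(f"{m['role']}: {m['content']}" for m in conversation)
prompt += "\nassistant:"  # the model continues from here

print(prompt)
```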
### Bard + Search Generative Experience (SGE)

While specific details might be proprietary, Bard is based on transformer AI techniques, similar to other state-of-the-art language models. SGE is based on similar models but weaves in other ML algorithms Google uses.

SGE likely generates content using a transformer-based generative model and then fuzzily extracts answers from ranking pages in search. (This may not be true. It’s just a guess based on how it seems to work from playing with it. Please don’t sue me!)
- **Input:** A prompt/command/search
- **Processing:** Bard processes the input and works the way other LLMs do. SGE uses a similar architecture but adds a layer where it searches its internal knowledge (gained from training data) to generate a suitable response. It considers the prompt’s structure, context, and intent to produce relevant content.
- **Output:** Generated content that can be a story, answer, or any other type of text.

## Applications of generative AI (and their controversies)

### Art and design

Generative AI can now create artwork, music, and even product designs. This has opened up new avenues for creativity and innovation.
**Controversy**

The rise of AI in art has sparked debates about job losses in creative fields.
Additionally, there are concerns about:
- Labor violations, especially when AI-generated content is used without proper attribution or compensation.
- Executives threatening to replace writers with AI, one of the issues that spurred the writers’ strike.

### Natural language processing (NLP)

AI models are now widely used for chatbots, language translation, and other NLP tasks.
Outside the dream of artificial general intelligence (AGI), this is the best use for LLMs, since they are close to a “generalist” NLP model.
**Controversy**

Many users find chatbots to be impersonal and sometimes annoying.
Moreover, while AI has made significant strides in language translation, it often lacks the nuance and cultural understanding that human translators bring, leading to translations that are impressive yet flawed.
### Medicine and drug discovery

AI can quickly analyze vast amounts of medical data and generate potential drug compounds, speeding up the drug discovery process. Many doctors already use LLMs to write notes and patient communications.
**Controversy**

Relying on LLMs for medical purposes can be problematic. Medicine requires precision, and any errors or oversights by AI can have serious consequences.
Medicine also already has biases that only get further baked in when LLMs are used. There are also similar issues, as discussed below, with privacy, efficacy, and ethics.
### Gaming

Many AI enthusiasts are excited about using AI in gaming: they say that AI can generate realistic gaming environments, characters, and even entire game plots, enhancing the gaming experience. NPC dialogue can be enhanced using these tools.
**Controversy**

There’s a debate about intentionality in game design.
While AI can generate vast amounts of content, some argue it lacks the deliberate design and narrative cohesion that human designers bring.
Watch Dogs 2 had programmatic NPCs, which did little to add to the narrative cohesion of the game as a whole.
### Marketing and advertising

AI can analyze consumer behavior and generate personalized advertisements and promotional content, making marketing campaigns more effective.
LLMs have context from other people’s writing, making them useful for generating user stories or more nuanced programmatic ideas. Instead of recommending TVs to someone who just bought a TV, LLMs can recommend accessories someone might want instead.
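A hedged sketch of what such a recommendation prompt could look like; the product data and wording are invented for illustration:

```python
# Invented example data; a real system would pull this from purchase history.
recent_purchase = "55-inch OLED TV"
candidate_products = ["HDMI 2.1 cable", "soundbar", "another 55-inch TV", "wall mount"]

prompt = (
    f"A customer just bought a {recent_purchase}. "
    f"From this list, suggest accessories they might actually want next, "
    f"and exclude anything redundant: {', '.join(candidate_products)}."
)
print(prompt)
# An LLM with broad context about how people shop can infer that a second
# TV is redundant, while cables, mounts, and soundbars complement the purchase.
```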
**Controversy**

The use of AI in marketing raises privacy concerns. There’s also a debate about the ethical implications of using AI to influence consumer behavior.
**Dig deeper:** *How to scale the use of large language models in marketing*

### Continuing issues with LLMs

**Contextual understanding and comprehension of human speech**

- **Limitation:** AI models, including GPT, often struggle with nuanced human interactions, such as detecting sarcasm, humor, or lies.
- **Example:** In stories where a character is lying to other characters, the AI might not always grasp the underlying deceit and might interpret statements at face value.

**Pattern matching**

- **Limitation:** AI models, especially those like GPT, are fundamentally pattern matchers. They excel at recognizing and generating content based on patterns they’ve seen in their training data. However, their performance can degrade when faced with novel situations or deviations from established patterns.
- **Example:** If a new slang term or cultural reference emerges after the model’s last training update, it might not recognize or understand it.

**Lack of common sense understanding**

- **Limitation:** While AI models can store vast amounts of information, they often lack a “common sense” understanding of the world, leading to outputs that might be technically correct but contextually nonsensical.