How Large Language Models Learn, Part 2


I'm a technologist in love with almost all things tech, from my daily job in the Cloud to my Master's in Cybersecurity and the journey all along.

In Part 1 of this series we explored the friendly foundations of machine learning:

  • Classification

  • Regression

  • Clustering

  • Neural networks

  • Training

  • Testing

  • Overfitting

  • Decision trees

  • Reinforcement learning

  • Data cleaning

  • Feature engineering

  • Model evaluation.

That first journey helped turn technical vocabulary into visual stories.

Part 2 continues the same idea, but the focus moves from “What is this machine learning concept?” to “How do we build machine learning systems that are useful, reliable, and responsible?” This is where the story becomes more practical.

A machine learning model is not only a clever algorithm. It is part of a larger process that includes:

  1. data collection

  2. quality checks

  3. fairness

  4. evaluation

  5. deployment

  6. monitoring

  7. privacy

  8. human judgment.

Machine learning is a way for computers to learn patterns from data and use those patterns to make predictions, recommendations, or decisions.

  • In supervised learning, models learn from examples that include inputs and correct answers.

  • In unsupervised learning, models look for structure without provided labels.

  • In reinforcement learning, an agent learns by taking actions and receiving feedback or rewards. [1] [2]

1. The Machine Learning Pipeline: From Data to Prediction

A machine learning pipeline is the full sequence of steps used to build and use a model. It usually runs through these steps:

  1. collecting data

  2. cleaning that data

  3. choosing useful features

  4. training a model

  5. testing it

  6. evaluating it

  7. deploying it

  8. monitoring it after launch.

In this Trufa-and-Paula version, Trufa shows Paula a friendly “learning factory” for a toy robot. Paula first collects toy cards, then cleans messy cards, chooses clues, trains the robot, checks its answers, and finally lets it help with real toys. The pipeline reminds us that machine learning is not a single magic button. It is a process.

The machine learning pipeline answers the question: “What steps turn raw data into a useful prediction?”

This topic is important because beginners often jump straight to the model. In real projects, the model is only one part of the system. If the data is poor, the features are confusing, the testing is weak, or the model is not monitored, the final prediction may not be reliable.

| Pipeline stage | What happens | Trufa-and-Paula example |
| --- | --- | --- |
| Collect | Gather examples. | Paula collects toy cards from different boxes. |
| Prepare | Clean and organize the data. | Trufa helps remove duplicates and fix missing clues. |
| Train | Let the model learn patterns. | The robot practices with example cards. |
| Test | Check the model on new examples. | Paula gives the robot surprise cards. |
| Deploy | Use the model in the real world. | The robot starts helping in Paula’s room. |
| Monitor | Keep checking performance. | Trufa gives the robot regular checkups. |
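
To make the stages above concrete, here is a minimal sketch of a pipeline in plain Python. Everything in it is an assumption made for illustration: the toy cards, the single "lights up" clue, and the tiny lookup "model" that simply remembers the most common label for each clue. A real project would use a proper learning library and far more data.

```python
from collections import Counter, defaultdict

def collect():
    # Hypothetical toy cards: (lights_up, label). None marks a missing clue.
    return [
        (True, "ready"), (True, "ready"), (False, "needs repair"),
        (False, "needs repair"), (True, "ready"), (None, "ready"),
        (False, "needs repair"), (True, "ready"),
    ]

def prepare(cards):
    # Prepare: drop cards whose clue is missing.
    return [card for card in cards if card[0] is not None]

def train(cards):
    # Train: remember the most common label seen for each clue value.
    votes = defaultdict(Counter)
    for clue, label in cards:
        votes[clue][label] += 1
    return {clue: counts.most_common(1)[0][0] for clue, counts in votes.items()}

def predict(model, clue):
    return model.get(clue, "unknown")

def evaluate(model, cards):
    # Test: fraction of surprise cards the model labels correctly.
    correct = sum(predict(model, clue) == label for clue, label in cards)
    return correct / len(cards)

cards = prepare(collect())
train_cards, test_cards = cards[:5], cards[5:]   # hold back surprise cards
model = train(train_cards)
pipeline_accuracy = evaluate(model, test_cards)
print(f"accuracy on surprise cards: {pipeline_accuracy:.2f}")
```

Each function maps to one row of the table, which makes it easier to see that the model is only one small piece of the whole process.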

2. Data Collection: Collecting Good Examples First

Data collection is the step where we gather the examples that a machine learning model will learn from. The model can only learn from what it is shown, so the examples should match the problem we want the model to solve. If the model will help identify toy problems, it needs examples of many toy situations:

  • ready toys

  • charging toys

  • broken toys

  • toys with lights

  • toys without lights

  • toys with wheels

  • toys without wheels.

Paula walks around with a basket collecting toy cards from different shelves. Trufa reminds her not to collect only one kind of toy. A useful collection should include variety, because the robot needs to learn from enough examples to understand the real task.

Data collection answers the question: “Do we have the right examples for the model to learn from?”

Good data collection is not just about quantity. More data can help, but only if the data is relevant and meaningful. A huge pile of repeated examples may be less useful than a smaller but more balanced set of examples that covers the real situations the model will face.
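
One simple way to check for variety is to count how many examples were collected for each situation. The sketch below is only an illustration: the labels, the counts, and the `MIN_PER_LABEL` threshold are all assumed values, not a recommended standard.

```python
from collections import Counter

# Hypothetical collection: lots of ready toys, very few broken ones.
collected_labels = ["ready"] * 40 + ["charging"] * 8 + ["broken"] * 2

EXPECTED_LABELS = {"ready", "charging", "broken"}
MIN_PER_LABEL = 5  # assumed minimum; the right number depends on the task

label_counts = Counter(collected_labels)
underrepresented = sorted(
    label for label in EXPECTED_LABELS
    if label_counts.get(label, 0) < MIN_PER_LABEL
)
print("need more examples of:", underrepresented)
```

A count like this will not prove the collection is good, but it quickly shows when Paula's basket is full of only one kind of card.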

3. Data Quality: Good Data vs. Bad Data

| Data problem | What it means | Toy-card example |
| --- | --- | --- |
| Missing data | A clue is absent. | The battery level box is blank. |
| Duplicate data | The same example appears too many times. | Paula copied one toy card five times. |
| Wrong label | The answer is incorrect. | A broken toy is labeled “ready.” |
| Inconsistent format | Similar data is written in different ways. | “Low,” “low battery,” and “needs charge” mean the same thing. |
| Outdated data | Old examples no longer match reality. | New toys use different batteries. |
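
Three of the problems in the table above (missing data, duplicates, and inconsistent formats) can be caught with a small cleaning step. The following is a rough sketch under stated assumptions: the card fields, the `BATTERY_ALIASES` mapping, and the choice to set incomplete cards aside for review are all hypothetical.

```python
RAW_CARDS = [
    {"toy": "car", "battery": "Low", "label": "charging"},
    {"toy": "car", "battery": "Low", "label": "charging"},           # duplicate
    {"toy": "drone", "battery": "needs charge", "label": "charging"},
    {"toy": "robot", "battery": None, "label": "ready"},             # missing clue
    {"toy": "train", "battery": "full", "label": "ready"},
]

# Assumed mapping from the different spellings to one canonical value.
BATTERY_ALIASES = {"low": "low", "low battery": "low", "needs charge": "low"}

def clean(cards):
    seen, cleaned, needs_review = set(), [], []
    for card in cards:
        if card["battery"] is None:
            needs_review.append(card)      # set aside instead of guessing
            continue
        raw = card["battery"].strip().lower()
        fixed = dict(card, battery=BATTERY_ALIASES.get(raw, raw))
        key = tuple(sorted(fixed.items()))
        if key in seen:
            continue                       # skip exact duplicates
        seen.add(key)
        cleaned.append(fixed)
    return cleaned, needs_review

cleaned_cards, review_cards = clean(RAW_CARDS)
print(len(cleaned_cards), "clean cards,", len(review_cards), "need review")
```

Wrong labels and outdated data are harder to catch automatically and usually need a person to look at the cards.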

4. Bias and Fairness: Teaching Models with Many Kinds of Examples

Bias in machine learning can happen when the data, design, or use of a model leads to unfair or unbalanced results. Fairness means thinking carefully about whether a system works well for different people, groups, or situations, especially when decisions can affect real lives.

In this Trufa-and-Paula story, Paula teaches the robot using only blue toys. The robot becomes very good at blue toys, but it gets confused when it sees red, green, or yellow toys. Trufa explains that the robot did not become unfair on purpose. It simply learned from an incomplete set of examples.

Bias and fairness answer the question: “Did we teach the model with enough variety and care?”

This is a gentle way to introduce a serious topic. In real machine learning systems, biased data can create biased predictions. Responsible teams need to examine what data was used, who may be affected, and whether the model behaves differently across important groups or situations.
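
One practical check is to measure performance separately for each group instead of looking at a single overall score. The sketch below uses made-up results for the blue-toy story; the colors, labels, and predictions are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical results: (toy color, true label, predicted label)
results = [
    ("blue", "ready", "ready"), ("blue", "repair", "repair"),
    ("blue", "ready", "ready"), ("blue", "repair", "repair"),
    ("red", "ready", "repair"), ("red", "repair", "repair"),
    ("green", "ready", "repair"), ("green", "repair", "ready"),
]

def accuracy_by_group(rows):
    hits, totals = defaultdict(int), defaultdict(int)
    for group, truth, predicted in rows:
        totals[group] += 1
        hits[group] += truth == predicted   # True counts as 1
    return {group: hits[group] / totals[group] for group in totals}

group_accuracy = accuracy_by_group(results)
for color, score in group_accuracy.items():
    print(f"{color}: {score:.2f}")
```

In this made-up example the robot looks perfect on blue toys and much weaker on the colors it rarely saw, which is exactly the gap an overall score would hide.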

5. Explainability: Can We Understand Why the Model Chose That?

Explainability is the ability to understand, describe, or inspect why a machine learning model made a decision. Some models are easier to explain than others. A decision tree may show a clear path of yes/no questions, while a large neural network may be harder to interpret.

In the infographic, Paula asks the robot, “Why did you choose the repair tool?” Trufa helps the robot point to clues: the toy did not light up, the battery was full, and a wheel was loose. Paula can then understand the decision instead of simply accepting it.

Explainability answers the question: “Can we see the reasons behind the prediction?”

Explainability matters because people need confidence in systems that make recommendations or decisions. If a model makes a mistake, explanations can help us diagnose what went wrong. If a model is used in an important setting, explanations can support review, accountability, and trust.
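
For simple rule-based systems, an explanation can be as direct as returning the clues alongside the decision. The sketch below is a hypothetical set of rules inspired by the repair-tool example, not a general explainability technique for complex models such as large neural networks.

```python
def decide_with_clues(toy):
    """Return a decision plus the observed clues (hypothetical rules)."""
    clues = []
    if not toy["lights_up"]:
        clues.append("the toy did not light up")
    if toy["battery_full"]:
        clues.append("the battery was full")
    if toy["wheel_loose"]:
        clues.append("a wheel was loose")

    # Assumed rules: a dark toy with a full battery cannot be fixed by
    # charging, and a loose wheel always needs the repair tool.
    if toy["wheel_loose"] or (not toy["lights_up"] and toy["battery_full"]):
        decision = "repair"
    elif not toy["lights_up"]:
        decision = "charge"
    else:
        decision = "ready"
    return decision, clues

decision, clues = decide_with_clues(
    {"lights_up": False, "battery_full": True, "wheel_loose": True}
)
print(decision, "because", "; ".join(clues))
```

Because the clues travel with the answer, Paula can inspect them and disagree if one of them looks wrong.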

6. Accuracy vs. Real-World Usefulness: A High Score Is Not Always Enough

Accuracy is a common evaluation metric that measures how often a model predicts correctly. However, a model can have a high score and still be unhelpful if the score does not match the real-world goal.

In our Trufa-and-Paula example, the robot gets many easy toy cards correct, so its score looks high. But when Paula asks it to help with the tricky toys she actually cares about, the robot struggles. Trufa explains that the model’s score is only useful if it measures the right kind of success.

Accuracy versus usefulness answers the question: “Is the model good at the problem that actually matters?”

This is one of the most practical ideas in machine learning. A model should be judged against the real purpose of the system. If the model is supposed to find broken toys, then missing broken toys may be more serious than making a few false alarms. The best metric depends on the goal.
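
A small made-up example shows how a high score can hide an unhelpful model. Assume 95 of 100 toys are ready and only 5 are broken, and imagine a lazy robot that always answers "ready". The numbers are invented for illustration.

```python
# Hypothetical data: broken toys are rare.
truth = ["ready"] * 95 + ["broken"] * 5
always_ready = ["ready"] * 100          # a "model" that never says broken

accuracy = sum(t == p for t, p in zip(truth, always_ready)) / len(truth)
broken_found = sum(
    t == "broken" and p == "broken" for t, p in zip(truth, always_ready)
)

print(f"accuracy: {accuracy:.2f}")
print(f"broken toys found: {broken_found} of {truth.count('broken')}")
```

The accuracy is high, yet the robot finds none of the broken toys, which is the one job Paula actually cares about.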

7. Precision and Recall: False Alarms and Missed Problems

Precision and recall are two evaluation metrics that help us understand different kinds of classification performance. Precision asks, “When the model says something is positive, how often is it correct?” Recall asks, “Of all the real positive cases, how many did the model find?”

In this toy-robot example, Paula wants the robot to identify toys that need repair. High precision means that when the robot says “needs repair,” it is usually right. High recall means that the robot finds most of the toys that really need repair, even if it sometimes raises a false alarm.

Precision and recall answer the question: “Are we more worried about false alarms or missed problems?”

Both metrics matter, but they matter differently depending on the situation. If repairs are expensive, Paula may want high precision so the robot does not send too many healthy toys to the repair table. If missing a broken toy is a bigger problem, she may want high recall so fewer broken toys are overlooked.

| Metric | Simple question | Toy example |
| --- | --- | --- |
| Precision | When the robot says “repair,” how often is it right? | Avoid sending working toys to repair. |
| Recall | Of all toys that need repair, how many did the robot find? | Avoid missing broken toys. |
| Accuracy | How often was the robot correct overall? | Count all correct predictions. |
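
The two questions above translate into two short formulas: precision is true positives divided by everything the model flagged, and recall is true positives divided by everything that was really positive. Here is a minimal sketch with invented labels, where "repair" is treated as the positive class.

```python
def precision_recall(truth, predicted, positive="repair"):
    pairs = list(zip(truth, predicted))
    tp = sum(t == positive and p == positive for t, p in pairs)  # correct alarms
    fp = sum(t != positive and p == positive for t, p in pairs)  # false alarms
    fn = sum(t == positive and p != positive for t, p in pairs)  # missed problems
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical cautious robot: it only says "repair" when it is very sure.
true_labels = ["repair"] * 4 + ["ok"] * 6
robot_labels = ["repair", "repair", "ok", "ok"] + ["ok"] * 6

precision, recall = precision_recall(true_labels, robot_labels)
print(f"precision: {precision:.2f}, recall: {recall:.2f}")
```

This cautious robot never raises a false alarm, so its precision is perfect, but it misses half of the toys that need repair, so its recall is low.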

8. Confusion Matrix: A Map of Model Mistakes

A confusion matrix is a table that compares a model’s predicted labels with the correct labels. It helps show not only how many predictions were right or wrong, but also which categories the model confused with each other.

In the infographic, Paula creates a colorful grid with labels such as “ready,” “charging,” and “needs repair.” Trufa helps her fill in the grid. When the robot correctly predicts “ready,” Paula adds a green check in the right square. When the robot confuses “charging” with “needs repair,” Paula marks the mistake in the grid.

A confusion matrix answers the question: “Where exactly is the model getting confused?”

This is more informative than a single score. A model might have decent accuracy but still make one type of mistake too often. The confusion matrix turns those mistakes into a visible map, making it easier to improve the model.
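
Here is a minimal sketch of how such a grid can be built by counting (true label, predicted label) pairs. The labels come from the example above, while the predictions are invented for illustration. Rows are the correct labels and columns are the robot's predictions.

```python
from collections import Counter

LABELS = ["ready", "charging", "needs repair"]

def confusion_matrix(truth, predicted):
    counts = Counter(zip(truth, predicted))
    # Row = correct label, column = predicted label.
    return [[counts[(t, p)] for p in LABELS] for t in LABELS]

correct_labels = ["ready", "ready", "charging", "charging", "charging",
                  "needs repair", "needs repair"]
robot_guesses = ["ready", "ready", "charging", "needs repair", "needs repair",
                 "needs repair", "charging"]

matrix = confusion_matrix(correct_labels, robot_guesses)
for label, row in zip(LABELS, matrix):
    print(f"{label:>12}: {row}")
```

Numbers on the diagonal are correct predictions. Everything off the diagonal is a mistake, and its position shows which two labels the robot mixed up.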

9. Model Drift: When the World Changes After Training

Model drift happens when a model’s performance degrades over time because the real-world data changes. A model trained on old patterns may become less accurate if new situations appear after training.

In Paula’s world, the robot learned from older toys. Later, new toys arrive with different batteries, new buttons, and new charging lights. The robot still follows its old lessons, but those lessons no longer match every toy. Trufa explains that the model may need new examples and retraining.

Model drift answers the question: “Has the world changed since the model learned?”

This is why machine learning does not end at deployment. A model that works well today may need monitoring and updates tomorrow. New products, new user behavior, new data sources, and changing environments can all affect performance.
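
One rough way to notice drift is to compare how often each value appeared in the training data with how often it appears in recent data. The sketch below uses total variation distance (half the sum of the absolute differences between the two sets of proportions) on an invented battery-type clue. The data and the `DRIFT_THRESHOLD` value are assumptions, and real systems often use other statistics.

```python
from collections import Counter

def proportions(values):
    counts = Counter(values)
    return {key: n / len(values) for key, n in counts.items()}

def total_variation(old, new):
    p, q = proportions(old), proportions(new)
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

# Hypothetical clue values: battery type on each toy card.
training_batteries = ["AA"] * 8 + ["AAA"] * 2
recent_batteries = ["AA"] * 3 + ["AAA"] * 2 + ["USB-C"] * 5

DRIFT_THRESHOLD = 0.3  # assumed cut-off for "the world has changed"

drift_score = total_variation(training_batteries, recent_batteries)
drift_detected = drift_score > DRIFT_THRESHOLD
print(f"drift score: {drift_score:.2f}, retrain? {drift_detected}")
```

A score of 0 means the two collections look the same and 1 means they share nothing, so a rising score is a hint that new examples and retraining may be needed.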

10. Human-in-the-Loop AI: Why People Still Matter

Human-in-the-loop AI means keeping people involved in reviewing, guiding, correcting, or approving model decisions. This is especially important when the decision is uncertain, sensitive, expensive, or potentially harmful.

In the infographic, Paula lets the robot handle easy toy decisions, but tricky cases go to Trufa for review. If the robot is unsure whether a toy needs repair or just charging, Trufa checks the clues before Paula acts. The human helper improves safety and learning.

Human-in-the-loop AI answers the question: “When should a person review the model’s decision?”

This idea helps beginners understand that AI systems do not have to replace human judgment. In many good systems, models assist people. Humans provide context, responsibility, and common sense, while the model provides speed and pattern recognition.
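
A common way to decide when a person should step in is to route low-confidence predictions to a review queue. This is a minimal sketch: the confidence scores and the `REVIEW_THRESHOLD` value are assumptions, and a real system would choose the threshold based on how costly mistakes are.

```python
REVIEW_THRESHOLD = 0.8  # hypothetical cut-off; tune it for the real task

def route(confidence):
    """Let confident predictions through and send the rest to a person."""
    if confidence >= REVIEW_THRESHOLD:
        return "auto"
    return "human_review"

# Hypothetical robot outputs: (predicted label, confidence)
robot_outputs = [("ready", 0.97), ("needs repair", 0.55), ("charging", 0.81)]

review_queue = [
    (label, confidence) for label, confidence in robot_outputs
    if route(confidence) == "human_review"
]
print("send to Trufa:", review_queue)
```

Only the uncertain "needs repair" prediction lands in the queue, so Trufa spends time on the tricky case instead of re-checking every toy.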

11. Responsible AI: Building Models We Can Trust

Responsible AI is the practice of designing, building, and using AI systems with attention to safety, fairness, privacy, transparency, accountability, and human oversight. The National Institute of Standards and Technology (NIST) describes trustworthy AI in terms such as valid and reliable, safe, secure, accountable, transparent, explainable, privacy-enhanced, and fair. [3]

In this Trufa-and-Paula version, Trufa gives Paula a “safe AI checklist” before the robot is allowed to help. Did Paula collect good data? Did she check for unfair gaps? Did she test the robot? Can someone understand its decisions? Is private information protected? Is a human available for tricky cases?

Responsible AI answers the question: “Can we trust how this model was built and used?”

Responsible AI is not one single step. It is a mindset across the whole pipeline. It affects what data we collect, how we test, what we measure, who reviews the system, and how we respond when something goes wrong.

12. Deploying a Model: From Practice Table to Real Life

Deployment is the step where a trained model moves from development or practice into real use. A deployed model might run inside an app, a website, a device, a cloud service, or an internal business workflow.

In the infographic, the toy robot leaves the practice mat and starts helping Paula sort toys in her room. Trufa explains that this is a big step. During practice, mistakes were easy to fix. In real use, the robot needs clear instructions, monitoring, and a way to handle uncertainty.

Deployment answers the question: “How do we safely use the model outside the practice space?”

Deployment is where machine learning becomes part of a product or process. That means the model must work with real inputs, real users, real constraints, and real consequences. Good deployment planning includes performance, security, reliability, monitoring, and rollback options if something breaks.

13. Monitoring and Improving Models: Why Models Need Checkups

Model monitoring means watching a model after deployment to see whether it continues to perform well. Model improvement means updating data, features, thresholds, or training when performance slips or the task changes.

In this Trufa-and-Paula story, Trufa gives the robot regular checkups. Paula tracks how many predictions are correct, where mistakes happen, and whether new toys are confusing the robot. When the robot starts slipping, Trufa helps Paula add new examples and retrain.

Monitoring answers the question: “Is the model still working well after launch?”

This topic connects directly to model drift. A model that is never checked can quietly become less useful. Monitoring gives teams early warning signs so they can improve the system before mistakes become serious.

| What to monitor | Why it matters | Toy-robot example |
| --- | --- | --- |
| Prediction quality | Checks whether answers are still correct. | The robot starts confusing charging and repair. |
| Data changes | Shows whether inputs look different over time. | New toys have new battery indicators. |
| User feedback | Captures real-world corrections. | Paula marks a robot answer as wrong. |
| Error patterns | Reveals repeated mistakes. | The robot misses toys with loose wheels. |
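
Prediction quality can be tracked with a rolling window over the most recent answers. The class below is a hypothetical sketch: the name `AccuracyMonitor`, the window size, and the alert level are all assumptions chosen to keep the example small.

```python
from collections import deque

class AccuracyMonitor:
    """Track accuracy over the most recent predictions (illustrative only)."""

    def __init__(self, window=5, alert_below=0.6):
        self.outcomes = deque(maxlen=window)   # old results fall off the end
        self.alert_below = alert_below

    def record(self, predicted, actual):
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_attention(self):
        # Only raise an alert once the window is full.
        full = len(self.outcomes) == self.outcomes.maxlen
        return full and self.accuracy() < self.alert_below

monitor = AccuracyMonitor()
for predicted, actual in [("ready", "ready")] * 5:
    monitor.record(predicted, actual)
healthy_check = monitor.needs_attention()

# New toys arrive and the robot starts confusing charging with repair.
for predicted, actual in [("charging", "needs repair")] * 3:
    monitor.record(predicted, actual)
slipping_check = monitor.needs_attention()

print("alert while healthy:", healthy_check)
print("alert after slipping:", slipping_check)
```

The `actual` values here stand in for feedback such as Paula marking an answer as wrong, which is why user feedback appears in the table as its own thing to monitor.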

14. Privacy in Machine Learning: Using Data Carefully

Privacy in machine learning means using data in ways that respect people, reduce unnecessary exposure, and protect sensitive information. Even when data is useful for learning, teams should think carefully about what they collect, how long they keep it, who can access it, and whether personal information is really needed.

In the infographic, Paula writes toy notes on cards, but Trufa reminds her not to include personal secrets or unnecessary details. The robot only needs toy clues, not private information. This keeps the learning task focused and safer.

Privacy answers the question: “Are we using only the data we need, and are we protecting it properly?”

For beginners, the key idea is simple: useful data should still be handled respectfully. Machine learning teams should avoid collecting extra sensitive information “just in case.” They should also protect stored data and think about privacy from the beginning of the project, not only at the end.
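
One way to avoid collecting extra details is to keep an explicit list of the fields the model needs and drop everything else before the data is stored. The field names below are invented for illustration, and this sketch covers only data minimization, not other protections such as access control or encryption.

```python
# Assumed allowlist: the only clues the robot needs for its task.
ALLOWED_FIELDS = {"lights_up", "battery", "wheel_loose"}

def minimize(card):
    """Keep only the fields on the allowlist."""
    return {key: value for key, value in card.items() if key in ALLOWED_FIELDS}

raw_card = {
    "lights_up": False,
    "battery": "low",
    "wheel_loose": True,
    "owner_name": "Paula",            # not needed to judge the toy
    "home_address": "12 Toy Street",  # hypothetical private detail
}

safe_card = minimize(raw_card)
print(safe_card)
```

An allowlist is used rather than a blocklist so that any new, unexpected field is dropped by default.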

15. Generative AI vs. Predictive AI: Creating vs. Predicting

Predictive AI uses patterns in data to make predictions, classifications, recommendations, or estimates. Generative AI creates new content, such as text, images, audio, code, or designs, based on patterns learned from training data.

In the Trufa-and-Paula infographic, one robot predicts whether a toy needs charging, while another robot draws a new toy design. Trufa explains that both use learned patterns, but they do different jobs. One is mainly answering a question about existing information. The other is generating something new.

Generative AI versus predictive AI answers the question: “Is the system predicting an answer or creating new content?”

This distinction is useful because many people now hear “AI” and immediately think of chatbots or image generators. Those are important, but they are not the whole story. Many machine learning systems still focus on predictions: detecting fraud, recommending products, forecasting demand, sorting messages, or estimating risk.

| AI type | What it does | Simple example |
| --- | --- | --- |
| Predictive AI | Predicts, classifies, scores, or recommends. | The robot predicts whether a toy needs repair. |
| Generative AI | Creates new content. | The robot draws a new toy robot design. |
| Both | Learn from patterns in data. | Trufa shows Paula that both systems need examples. |

Conclusion

The big idea is that machine learning is not only about making a model. It is about building a system that learns from the right data, answers the right question, performs well on new examples, stays useful over time, and is used responsibly.

Part 1 introduced the concepts that help us understand machine learning models. There, Trufa and Paula learned that models can classify, predict numbers, find groups, learn through layers, and improve through rewards. Part 2 shows that machine learning becomes more meaningful when we look beyond the model itself.

| Learning stage | Part 1 focus | Part 2 focus |
| --- | --- | --- |
| Understand the basics | Classification, regression, clustering, neural networks | Predictive AI vs. generative AI |
| Prepare the data | Features, labels, data cleaning, feature engineering | Data collection, data quality, privacy |
| Train and test | Training data, testing data, overfitting, underfitting | Pipeline thinking, accuracy versus usefulness |
| Measure performance | Model evaluation | Precision, recall, confusion matrix |
| Use responsibly | Basic model behavior | Bias, fairness, explainability, responsible AI |
| Run in the real world | How models learn | Deployment, monitoring, drift, human review |

Data collection gives the model examples. Data quality helps those examples teach the right lesson. Bias and fairness remind us to check who and what the model may affect. Explainability helps people understand decisions. Precision, recall, and confusion matrices show different views of performance. Drift, monitoring, and deployment show that the work continues after launch. Privacy and responsible AI remind us that useful systems should also be careful systems.

With Trufa as the guide and Paula as the curious learner, machine learning becomes a story about practice, questions, feedback, responsibility, and improvement. That is a strong foundation for anyone beginning their AI journey.

References

[1] DigitalOcean: Types of Machine Learning: Supervised, Unsupervised and More

[2] Google for Developers: Machine Learning Glossary

[3] NIST: Artificial Intelligence Risk Management Framework