Multi-agent systems: orchestrators and sub-agents.
~60 min
✓
Hands-on: Build a simple research agent using Claude or LangChain.
~3 hrs
// Module 4 Quiz
🤖
AI Agents Quiz
1. What is the key characteristic that makes a language model an "agent"?
A. It has a personality and a name
B. It can generate images as well as text
C. It can take actions in the world (use tools, call APIs, browse the web) and reason about multi-step goals
D. It runs locally on your computer without internet
2. In the ReAct framework, what does the loop look like?
A. Reason about what to do → Take an action → Observe the result → Repeat until done
B. Receive input → Generate output → Done
C. Read documentation → Ask a human → Execute code
D. Retrieve data → Act immediately → Never reconsider
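The correct loop (option A) can be sketched in a few lines. Everything below is a toy stand-in: `decide` plays the role of the model's reasoning step, and the two tools are fake functions rather than real APIs.

```python
# Minimal ReAct-style loop: Reason -> Act -> Observe -> repeat until done.

def search(query):
    # Toy tool: pretend to look something up on the web.
    return f"result for '{query}'"

def calculator(expr):
    # Toy tool: evaluate a simple arithmetic expression.
    return str(eval(expr, {"__builtins__": {}}))

TOOLS = {"search": search, "calculator": calculator}

def decide(goal, observations):
    # Stand-in for the LLM's reasoning step: choose an action or finish.
    if not observations:
        return ("search", goal)          # first, gather information
    if len(observations) == 1:
        return ("calculator", "2 + 2")   # then act on what was found
    return ("finish", observations[-1])  # enough evidence: stop

def react(goal, max_steps=5):
    observations = []
    for _ in range(max_steps):
        action, arg = decide(goal, observations)   # Reason
        if action == "finish":
            return arg
        result = TOOLS[action](arg)                # Act
        observations.append(result)                # Observe
    return observations[-1]

print(react("population of France"))  # → 4
```

A real agent would replace `decide` with a model call that reads the observation history and emits the next tool invocation.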
3. Why do agents need "long-term memory" beyond just the context window?
A. To make them smarter than base models
B. So they can run without an internet connection
C. Context windows are limited in size, so persistent storage (like vector databases) lets agents retain and recall information across many sessions
D. Long-term memory is just a marketing term with no real technical meaning
05
The Science of LLMs
Training, RLHF, fine-tuning, and how these models are actually built from the ground up.
⏱ ~8 hrs · Days 26–30
// Weekly Breakdown
Days 26–27
Pre-Training
How models learn from the internet
~3 hrs
Days 28–29
RLHF & Alignment
Making models helpful and safe
~3 hrs
Day 30
Fine-tuning & The Future
Customizing models for your use case
~2 hrs
// Learning Topics
✓
Pre-training: how models learn from massive text corpora via next-token prediction.
~90 min
✓
Supervised fine-tuning (SFT): adapting a base model to follow instructions.
~60 min
✓
RLHF (Reinforcement Learning from Human Feedback): how models learn to be helpful.
~90 min
✓
Constitutional AI & model alignment approaches (Anthropic's approach).
~60 min
✓
Fine-tuning vs. prompt engineering: when each approach makes sense.
~60 min
// Module 5 Quiz
⚗️
LLM Science Quiz
1. During pre-training, what task do LLMs primarily learn to perform?
A. Answering trivia questions correctly
B. Predicting the next token in a sequence, learning language patterns from enormous amounts of text
C. Translating between languages
D. Classifying text as positive or negative sentiment
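The pre-training objective in answer B can be shown numerically. The probability table below is a made-up stand-in for a real network's softmax output over its vocabulary.

```python
import math

# Pre-training loss: cross-entropy of the model's next-token prediction.

def cross_entropy(predicted_probs, true_token):
    # Loss is -log p(true next token); lower means a better prediction.
    return -math.log(predicted_probs[true_token])

# Context "the cat sat on the" -> hypothetical model guesses for the next token:
probs = {"mat": 0.7, "dog": 0.2, "sky": 0.1}

loss_good = cross_entropy(probs, "mat")  # confident and correct: low loss
loss_bad = cross_entropy(probs, "sky")   # unlikely continuation: high loss
print(round(loss_good, 3), round(loss_bad, 3))
```

Training drives the average of this loss down across trillions of tokens, which is how language patterns get learned without any labels.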
2. What is RLHF and why is it important?
A. A faster training algorithm that reduces compute costs
B. A technique where human raters score model outputs to train a reward model, which is then used to fine-tune the LLM to produce more helpful, harmless responses
C. A way to compress large models for faster inference
D. A retrieval method for finding relevant documents
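The reward-model step from answer B can be sketched as a pairwise preference loss. The scalar scores below are hypothetical reward-model outputs, not from any real system.

```python
import math

# A human rater prefers output A over output B. The reward model is
# trained to score the preferred output higher, via -log sigmoid(r_A - r_B).

def preference_loss(score_chosen, score_rejected):
    # Shrinks toward 0 when the reward model ranks the human-preferred
    # output higher; grows when it ranks the rejected output higher.
    margin = score_chosen - score_rejected
    return -math.log(1 / (1 + math.exp(-margin)))

# Reward model agrees with the rater (chosen scored higher): small loss.
print(round(preference_loss(2.0, 0.5), 3))
# Reward model disagrees: large loss, driving a bigger update.
print(round(preference_loss(0.5, 2.0), 3))
```

The trained reward model then stands in for the human raters when fine-tuning the LLM itself with reinforcement learning.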
3. When does fine-tuning a model make more sense than prompt engineering?
A. Always — fine-tuned models are always better
B. When you want the model to answer questions about recent events
C. When you have a specific, consistent task with many labeled examples and need to reduce prompt length or improve performance at scale
D. Never — prompt engineering is always sufficient
🎓 30-Day Final Certification Test
Complete all 5 modules, then take the comprehensive 14-question exam covering every topic in the curriculum.
Final Certification Test
14 questions covering all 5 modules. You need 80% to pass.
1. Approximately how many tokens is 1,000 words of English text?
A. ~500 tokens
B. ~750 tokens
C. ~2,000 tokens
D. ~5,000 tokens
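The ~750-token answer comes from a common rule of thumb: roughly 0.75 tokens per English word (equivalently, about 4 characters per token). A quick estimator, assuming that heuristic:

```python
# Back-of-envelope token count for English text: ~0.75 tokens per word.
# Real tokenizers vary by model and by text, so treat this as an estimate.

def estimate_tokens(word_count, tokens_per_word=0.75):
    return round(word_count * tokens_per_word)

print(estimate_tokens(1000))  # → 750
print(estimate_tokens(4000))  # a short essay: roughly 3000 tokens
```

For exact counts you would use the tokenizer that ships with the specific model, since tokenization differs between models.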
2. Embeddings allow models to understand that "king" and "queen" are related. This is because embeddings…
A. Store a dictionary definition of every word
B. Represent words as vectors in a high-dimensional space where similar meanings cluster together
C. Search Wikipedia for the word each time
D. Use keyword frequency analysis
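The clustering in answer B is measurable with cosine similarity. The 3-dimensional vectors below are invented for illustration; real embeddings have hundreds or thousands of dimensions.

```python
import math

# Toy embeddings: similar meanings get nearby vectors.
EMBEDDINGS = {
    "king":   [0.90, 0.80, 0.10],
    "queen":  [0.88, 0.82, 0.12],
    "banana": [0.10, 0.20, 0.95],
}

def cosine(a, b):
    # Cosine similarity: 1.0 for identical directions, ~0 for unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Related words score far closer than unrelated ones.
print(cosine(EMBEDDINGS["king"], EMBEDDINGS["queen"]) >
      cosine(EMBEDDINGS["king"], EMBEDDINGS["banana"]))  # → True
```

This same similarity measure is what vector databases use to retrieve relevant chunks in RAG systems.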
3. What does "attention" do in a Transformer model?
A. It prevents the model from generating offensive content
B. It allows the model to weigh how much each word in the input should influence the interpretation of every other word
C. It selects which training data to focus on during pre-training
D. It compresses the prompt to save compute
4. You want to extract structured data (name, date, amount) from invoices reliably. The BEST prompting approach is…
A. Ask with no examples and hope for the best
B. Few-shot prompting with 2-3 examples of correctly extracted invoice data in the exact format you want
C. Set temperature to 1.0 for creative output
D. Use a very long, elaborate system prompt with no examples
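Answer B can be made concrete with a prompt builder. The example invoices, field names, and output schema below are hypothetical; the point is showing the model 2-3 worked examples in the exact format you want back.

```python
# Few-shot extraction prompt: worked examples first, then the new input.

EXAMPLES = [
    ("Invoice from Acme Corp dated 2024-03-01 for $1,200.00",
     '{"name": "Acme Corp", "date": "2024-03-01", "amount": "1200.00"}'),
    ("Billed by Globex on 2024-04-15, total $89.50",
     '{"name": "Globex", "date": "2024-04-15", "amount": "89.50"}'),
]

def build_prompt(invoice_text):
    parts = ["Extract name, date, and amount as JSON.\n"]
    for source, extracted in EXAMPLES:
        parts.append(f"Invoice: {source}\nJSON: {extracted}\n")
    parts.append(f"Invoice: {invoice_text}\nJSON:")
    return "\n".join(parts)

print(build_prompt("Payment due to Initech, 2024-05-02, $450.00"))
```

Ending the prompt at `JSON:` nudges the model to continue in exactly the format the examples established.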
5. Which of the following is the BEST prompt for solving a multi-step word problem?
A. "Answer: [problem]"
B. "Solve this step by step, showing your reasoning for each step before giving the final answer: [problem]"
C. "Give a one-word answer to: [problem]"
D. "You are a calculator. [problem]"
6. A model confidently tells you a specific study was published in Nature in 2019 by Dr. Jane Smith, but when you search, the study doesn't exist. This is an example of…
A. The model being out of date
B. A hallucination — the model generated plausible but entirely fabricated information
C. A context window limitation
D. Temperature being set too low
7. You want to build a chatbot that can answer questions about your company's internal 500-page knowledge base. The best architecture is…
A. Paste all 500 pages into every prompt
B. RAG — chunk the knowledge base, store it in a vector database, retrieve relevant sections per query, and pass those to the model
C. Fine-tune the model on the documents and use no retrieval
D. Set temperature to 0 and hope the model knows your documents
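The RAG architecture in answer B can be sketched end to end. The policy snippets are made up, and word-overlap scoring here stands in for the embedding similarity a real vector database would use.

```python
# Minimal RAG sketch: index chunks, retrieve the most relevant one per
# query, and ground the prompt in only that retrieved context.

DOCS = [
    "Vacation policy: employees accrue 1.5 days of leave per month.",
    "Expense policy: meals under $50 need no receipt.",
    "Security policy: rotate passwords every 90 days.",
]

def score(query, chunk):
    # Toy relevance score: count of shared lowercase words.
    # Real systems compare embedding vectors instead.
    return len(set(query.lower().split()) & set(chunk.lower().split()))

def retrieve(query, k=1):
    return sorted(DOCS, key=lambda c: score(query, c), reverse=True)[:k]

def build_prompt(query):
    context = "\n".join(retrieve(query))
    return f"Answer using ONLY this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("How many vacation days do employees accrue?"))
```

The "ONLY this context" instruction is also the standard lever for reducing hallucinations, since it ties the answer to retrieved sources.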
8. Which of the following is NOT typically a component of an AI agent?
A. A planning / reasoning loop
B. Tools the agent can invoke (search, code execution, APIs)
C. Memory to retain information across steps
D. A generative adversarial network (GAN) for image synthesis
9. In a multi-agent system, what is an "orchestrator" agent?
A. An agent that breaks down a high-level goal and delegates subtasks to specialized sub-agents
B. An agent that only handles database queries
C. The user-facing chatbot in a multi-agent system
D. An agent trained on music data
10. During RLHF, human raters are asked to…
A. Manually write all the training data for the model
B. Rank or compare model outputs for quality, which trains a reward model to guide further fine-tuning
C. Define the model architecture and hyperparameters
D. Test the model for security vulnerabilities
11. What is the primary objective during LLM pre-training?
A. Teaching the model to follow instructions from humans
B. Minimizing the prediction error for the next token across enormous amounts of text data
C. Fine-tuning the model for a specific business task
D. Maximizing model response length
12. You need a model that always responds in JSON with a very specific schema. It will be called millions of times. Fine-tuning is preferred over prompt engineering because…
A. Fine-tuning makes the model smarter overall
B. Fine-tuning can bake the output format into the model weights, reducing prompt length, cost, and improving consistency at scale
C. Prompt engineering doesn't work for JSON outputs
D. Fine-tuning is always cheaper than prompting
13. What is the best way to reduce a model's tendency to hallucinate when answering factual questions?
A. Lower the temperature to 0
B. Use RAG to provide the model with retrieved source documents, and instruct it to only answer based on those sources
C. Ask the model to be more confident
D. Use a larger model
14. Which combination of prompt elements tends to produce the most reliable, high-quality results for complex tasks?
A. A single sentence question with no context
B. A very long prompt with every possible instruction
C. A clear system prompt setting context + a specific task description + a few-shot example + chain-of-thought instruction
D. Repeating the question three times in the same prompt