Liftline
Certificate of Completion
Thrmal Payments · Liftline Education
This certifies that
Student Name
has successfully completed
AI & LLM Mastery — Level 1
Score: 14/14 Issued: Jan 1, 2025
Liftline
Director of Education
🎓
Certified
Thrmal Payments
liftline.ai

LLM Mastery — Level 2 Practitioner
// Prerequisite: Level 1 Complete — Advancing to Practitioner Level

LEVEL 02
PRACTITIONER

Hands-on. Code-first. Production-grade. This curriculum takes you from conceptual understanding to building real AI systems — culminating in a live risk-modeling agent for payment processing.

90 Days
9 Modules
~120 Hours
4 Real Projects
1 Capstone System
6–7 Target Level
Phase 1 — Python for AI
Days 1–20 · ~24 hrs
01
Python Foundations
Python Foundations for AI
Variables, functions, loops, files, APIs — only the Python you actually need to build AI systems. No fluff.
⏱ ~8 hrs DAYS 1–7
// Learning Topics
Why Python + Essential Data Types
→ Read
Functions, Loops & Control Flow
→ Read
Strings, Files & Package Management
→ Read
AI-Specific Python Patterns
→ Read
What Breaks and Why: Debugging
→ Read
Checkpoint Quiz
1. In Python, what does this code do? data = {"name": "Alice", "score": 95} then print(data["score"])
A. Prints the entire dictionary
B. Prints 95 — it accesses the value associated with the key "score"
C. Throws an error because dictionaries aren't printable
D. Prints "score"
2. Why should you store your API key in a .env file instead of directly in your code?
A. .env files load faster than hardcoded values
B. So the key isn't accidentally committed to GitHub or shared with others — it stays secret on your machine
C. API keys don't work when hardcoded
D. It's a Python requirement for all variables
3. You have a list reviews = ["great", "okay", "bad"] and you try reviews[10]. What happens?
A. Python returns None
B. Python raises IndexError
C. Python loops to the beginning
D. Python extends the list automatically
4. You want to extract only items from a list where a condition is true. What Python pattern is best?
A. A for loop with if statements
B. A list comprehension with filtering
C. Using map()
D. Using filter() with lambda
5. Your API call might fail due to temporary network issues. How should you handle this?
A. Ignore the error and hope it doesn't happen
B. Use try/except to catch the error and retry with exponential backoff
C. Write the error to a log file
D. Restart the entire application
6. You install a package with pip install anthropic. Which statement is true?
A. It installs globally and affects all Python projects
B. It installs only in your current virtual environment
C. It requires admin privileges
D. It modifies your system PATH
7. You have a string prompt = " hello world " and you want to remove the extra spaces. What method do you use?
A. prompt.trim()
B. prompt.strip()
C. prompt.replace(" ", "")
D. prompt.clean()
8. You want to build a prompt dynamically with variables. Which approach is most Pythonic?
A. "prompt: " + system + ", user: " + user
B. f"prompt: {system}, user: {user}"
C. {}.format("prompt: ", system, user)
D. "prompt: " % system
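The patterns this quiz tests can be seen together in a few lines. A minimal sketch — the variable names (`system`, `user`, `reviews`) are illustrative, not from the course materials:

```python
# Dictionary access by key (question 1)
data = {"name": "Alice", "score": 95}
print(data["score"])  # prints 95

# f-strings are the idiomatic way to build prompts from variables (question 8)
system = "You are a helpful assistant."
user = "Summarize this review."
prompt = f"prompt: {system}, user: {user}"

# List comprehension with filtering (question 4)
reviews = ["great", "okay", "bad", "great"]
positives = [r for r in reviews if r == "great"]  # ["great", "great"]

# .strip() removes leading/trailing whitespace -- Python has no .trim() (question 7)
raw = "  hello world  "
clean = raw.strip()  # "hello world"
```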
02
Python + AI APIs
Calling LLM APIs Like a Pro
Claude API, OpenAI API, streaming responses, structured outputs, managing conversation history in code.
⏱ ~8 hrs DAYS 8–14
// Learning Topics
API Authentication & the Anthropic SDK
→ Read
Multi-Provider: OpenAI SDK & Wrappers
→ Read
Streaming Responses
→ Read
Cost Management & Token Counting
→ Read
Error Handling & Resilience
→ Read
Checkpoint Quiz
1. Where should you store your API key for security?
A. Hardcoded in your Python file
B. In a .env file and loaded via environment variables
C. In a README file for easy sharing
D. In a public GitHub repository
2. You call the Anthropic API and receive a response. Where is the generated text?
A. message.text
B. message.content[0].text
C. message.response
D. message.output
3. You need to process 10,000 documents quickly and cheaply. Which model choice makes sense?
A. Always use Sonnet for best quality
B. Always use Haiku for lowest cost
C. Use Opus regardless of cost
D. Rotate between models randomly
4. You want to build an application that can switch between Claude and GPT-4. What pattern is best?
A. Hardcode which provider to use
B. Create a wrapper function that routes based on a provider parameter
C. Use environment variables for each provider separately
D. Rewrite the entire app for each API
5. An API call fails with a 429 status (rate limit). What should you do?
A. Immediately give up and report failure
B. Wait with exponential backoff and retry
C. Switch to a different API immediately
D. Increase your request rate
6. You want real-time response display in your application. What technique should you use?
A. Make a regular API call and display once it completes
B. Stream the response to display tokens as they arrive
C. Use webhooks for delayed updates
D. Split the response into chunks manually
7. You're estimating the cost of processing 5,000 documents with Sonnet ($0.003 per 1K input). Average 100 tokens per doc. What's the estimated cost?
A. $0.015
B. $1.50
C. $0.15
D. $15.00
8. Your system calls both Claude and GPT-4. Claude fails but GPT-4 succeeds. This is an example of:
A. Redundancy
B. Graceful degradation and failover
C. Load balancing
D. Caching
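Questions 5 and 7 above are worth sketching. The retry helper below is a generic pattern, not the Anthropic SDK's built-in behavior; `call` stands in for any flaky API call:

```python
import random
import time

def call_with_backoff(call, max_retries=5):
    """Retry a flaky zero-argument call (e.g. one hitting a 429 rate
    limit) with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries, surface the error
            # 2^attempt seconds, plus jitter so clients don't retry in lockstep
            time.sleep(2 ** attempt + random.random())

# Back-of-envelope cost estimate from question 7:
# 5,000 docs x 100 tokens = 500,000 input tokens
# 500,000 / 1,000 * $0.003 = $1.50
docs, tokens_per_doc, price_per_1k = 5_000, 100, 0.003
cost = docs * tokens_per_doc / 1_000 * price_per_1k
print(f"${cost:.2f}")  # $1.50
```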
03
RAG Foundations
RAG Systems — Foundations
Embeddings, vector databases, retrieval, and generation. How to make LLMs know your data.
⏱ ~10 hrs DAYS 15–21
// Learning Topics
The Knowledge Gap Problem & RAG Architecture
→ Read
Embeddings in Practice
→ Read
Vector Databases: Options & Setup
→ Read
The Chunking Problem
→ Read
Build Your First RAG Pipeline
→ Read
Checkpoint Quiz
1. Why is RAG better than fine-tuning for adding new knowledge to an LLM?
A. Fine-tuning is always slower
B. RAG is dynamic (add documents without retraining), fine-tuning requires retraining and is expensive
C. RAG is cheaper
D. Fine-tuning doesn't work
2. You index 1,000 document chunks. A user asks a question. How many chunks should you retrieve to pass to the model?
A. All 1,000 (provide complete context)
B. Top-5 (balance relevance and context)
C. Just 1 (the most similar)
D. Top-100 (be comprehensive)
3. Your chunk size is 2,000 tokens. Retrieval quality is poor. What's the likely cause?
A. Chunks are too large and mix multiple topics, reducing retrieval precision
B. Chunks are too small
C. Your embedding model is bad
D. You have too many chunks
4. Cosine similarity between two embeddings is 0.95. What does this mean?
A. The texts are completely unrelated
B. The texts are very semantically similar
C. The texts are 95% identical
D. The embeddings are corrupted
5. You're building a RAG system for 10 million customer support documents. Which vector database is appropriate?
A. Chroma (local, in-memory)
B. pgvector in Postgres or cloud vector DBs like Pinecone
C. A simple Python list
D. Excel spreadsheet
6. You retrieve 5 chunks but the model's answer is wrong. What's the likely failure mode?
A. Retrieval failed (wrong chunks retrieved)
B. Generation failed (model ignored the chunks and hallucinated)
C. The model is broken
D. The user's question was bad
7. Your chunk overlap is 0 (no overlap between chunks). A key sentence spans two chunks. What happens?
A. The sentence is complete in both chunks
B. The sentence is incomplete in one or both chunks, reducing relevance
C. Python automatically merges the chunks
D. The embedding model reconstructs it
8. You embed a user question and retrieve the 5 most similar chunks. This is stage __ of RAG.
A. Indexing
B. Retrieval
C. Generation
D. Evaluation
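The retrieval stage from question 8 can be sketched end to end. This uses toy 2-D vectors and a linear scan — real systems use a vector database and embeddings with hundreds of dimensions:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, index, k=5):
    """Return the top-k chunks most similar to the query embedding.
    `index` is a list of (chunk_text, embedding) pairs."""
    scored = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in scored[:k]]

# Toy index: two refund-related chunks and one about shipping
index = [
    ("refund policy", [0.9, 0.1]),
    ("shipping times", [0.1, 0.9]),
    ("returns and refunds", [0.8, 0.2]),
]
print(retrieve([1.0, 0.0], index, k=2))  # the two refund chunks rank first
```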
04
Applied RAG
Applied RAG — Production Patterns
Multi-stage retrieval, reranking, evaluation metrics, and deploying RAG systems that scale.
⏱ ~10 hrs DAYS 22–28
// Learning Topics
Why RAG Breaks in Production
→ Read
Hybrid Search & Reranking
→ Read
RAG Evaluation: Measuring Quality
→ Read
Deploying RAG as an API
→ Read
Upgrade Exercise & Future Paths
→ Read
Checkpoint Quiz
1. Vector search alone can miss what type of retrieval?
A. Semantic similarity
B. Exact phrase matches and codes
C. Typos in documents
D. Long-form text
2. Reciprocal Rank Fusion (RRF) is used to:
A. Speed up vector search
B. Combine rankings from multiple retrieval methods on a common scale
C. Compress embeddings
D. Detect duplicate documents
3. Reranking uses which approach to score chunk-query pairs?
A. Cosine similarity of embeddings
B. Cross-encoder neural networks trained on relevance judgments
C. BM25 keyword matching
D. Random sampling
4. In the RAGAS framework, Context Precision measures:
A. Whether the answer is faithful to context
B. Whether retrieved chunks are relevant to the query
C. How fast retrieval is
D. The size of the vector database
5. You're deploying a RAG system to production. What's essential besides the core logic?
A. Just put the Python script on a server
B. Authentication, rate limiting, logging, monitoring, error handling
C. A fancy UI
D. A database for every file
6. The law firm case study achieved what improvement?
A. 100% automation, zero human review needed
B. Contract review time reduced from 4 hours to 22 minutes per contract
C. Eliminated the need for lawyers
D. No improvement, just a demo
7. Contextual compression helps by:
A. Making the vector database smaller
B. Extracting relevant sentences and removing noise from chunks
C. Compressing files on disk
D. Reducing the number of tokens in embeddings
8. When deploying with FastAPI, what prevents abuse from unlimited requests?
A. Hope users are nice
B. Implement rate limiting middleware
C. Shut down the server
D. Only serve friends
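Reciprocal Rank Fusion from question 2 is only a few lines. Each document scores the sum of 1/(k + rank) across the ranked lists it appears in, with k=60 as the conventional constant; the document IDs here are made up for illustration:

```python
from collections import defaultdict

def rrf(rankings, k=60):
    """Merge several ranked lists (e.g. vector hits and keyword hits)
    onto one common scale. Ranks are 1-based."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# A doc ranked well by BOTH retrieval methods beats one that tops a single list
vector_hits  = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_b", "doc_d", "doc_a"]
print(rrf([vector_hits, keyword_hits]))  # doc_b first: strong in both lists
```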
05
Agent Architecture
Agent Architecture
ReAct loops, tool use, multi-agent systems, and designing agents that think before they act.
⏱ ~12 hrs DAYS 29–35
// Learning Topics
From Answering to Acting: Agent Fundamentals
→ Read
Tool Definition & Tool Use
→ Read
Memory Architecture
→ Read
Multi-Agent Systems
→ Read
Human-in-the-Loop Design + Build Exercise
→ Read
Checkpoint Quiz
1. In a ReAct loop, what is the correct sequence of steps?
A. Think → Select tool → Execute tool → Observe result → Loop
B. Select tool → Execute tool → Think → Observe result
C. Execute tool → Think → Observe
D. Observe → Think → Execute
2. Why is the max_iterations limit important in a ReAct agent?
A. It ensures the agent finishes quickly
B. It prevents runaway loops and limits token costs
C. It improves accuracy
D. It's required by the API
3. How does the model decide which tool to use?
A. It randomly selects from available tools
B. It uses the tool descriptions in the system prompt to match the current task
C. The user tells it which tool to use
D. It uses the first tool every time
4. In a multi-agent system, what is the primary role of the orchestrator agent?
A. It executes all the work itself
B. It breaks the goal into subtasks and dispatches them to specialist agents
C. It just passes messages
D. It stores the results
5. What is the purpose of an agent's short-term memory?
A. To store facts between sessions
B. To track the history of thoughts and observations within a single run
C. To cache API responses
D. To remember previous conversations
6. When should human approval be required for an agent action?
A. Always, for every action
B. Only for high-stakes actions (approval, spending >$1000, etc.)
C. Never, agents should be fully autonomous
D. Only for debugging
7. What is the risk of storing long-term agent memory without hygiene practices?
A. It uses too much storage space
B. Stale or incorrect information can corrupt future decisions
C. The agent forgets everything
D. It breaks the model
8. Why is writing good tool descriptions in the schema critical?
A. It makes the code easier to read
B. It helps the model understand when and how to use each tool
C. It improves performance
D. It's just for documentation
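The ReAct sequence from question 1 and the `max_iterations` cap from question 2 fit in one skeleton. This is an illustrative sketch, not a specific framework's API: `model_step` stands in for an LLM call and returns either a final answer or a tool request.

```python
def react_loop(model_step, tools, max_iterations=10):
    """Think -> select tool -> execute -> observe -> loop.

    `model_step(history)` returns ("final", answer) or
    ("tool", tool_name, tool_input). `tools` maps names to functions.
    """
    history = []  # short-term memory: thoughts and observations for THIS run
    for _ in range(max_iterations):  # hard cap prevents runaway loops and token costs
        step = model_step(history)
        if step[0] == "final":
            return step[1]
        _, name, tool_input = step
        observation = tools[name](tool_input)  # execute the chosen tool
        history.append((name, tool_input, observation))
    raise RuntimeError("max_iterations reached without a final answer")

# Fake "model": look something up once, then answer from the observation
def fake_model(history):
    if not history:
        return ("tool", "lookup", "capital of France")
    return ("final", history[-1][2])

print(react_loop(fake_model, {"lookup": lambda q: "Paris"}))  # Paris
```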
06
Production Agents
Production Agents
Reliability, observability, security, cost control. Hardening agents for real-world use.
⏱ ~12 hrs DAYS 36–42
// Learning Topics
The Production Gap & Async Architecture
→ Read
Observability: Logging & Monitoring
→ Read
Error Handling, Retry Logic & Security
→ Read
Cost Control & Deployment Architecture
→ Read
Hands-On: Harden the Research Agent
→ Read
Checkpoint Quiz
1. What is the primary benefit of using async/await in agent systems?
A. It makes the code easier to read
B. It allows parallel execution of independent tool calls, reducing total latency
C. It prevents errors
D. It reduces memory usage
2. In structured logging, what is the purpose of a run_id?
A. To identify which user submitted the request
B. To track a single agent run across multiple log entries, enabling full request tracing
C. To count errors
D. To debug the logger
3. Why is exponential backoff with jitter better than immediate retry?
A. It reduces the total number of retries needed
B. It prevents the thundering herd problem where all clients retry at once, overwhelming the server
C. It always succeeds
D. It's faster
4. What is prompt injection and how is it mitigated?
A. When the agent uses too many tools. Mitigate by limiting tools.
B. When user input contains commands trying to override the agent. Mitigate by sanitizing input before passing to the model.
C. When the API is slow
D. When tokens run out
5. What is the "$1,000 rule" for agent authorization?
A. Agents can approve any action under $1,000
B. Actions costing >$1,000 or taking >1 hour to reverse require human approval
C. Agents always need approval
D. $1,000 is the maximum any agent can spend
6. How should you manage API keys in production?
A. Hardcode them in the code for easy debugging
B. Store them in environment variables or a secrets manager, never log them
C. Share them in Slack
D. Put them in a README
7. What is model tiering and why is it useful?
A. Running multiple models in sequence for accuracy
B. Routing simple tasks to cheaper models (Haiku) and hard tasks to expensive models (Sonnet/Opus)
C. Changing model names
D. Hiding models from users
8. What does a health check endpoint do?
A. Monitors agent accuracy
B. Provides a simple way to verify the service is running and its dependencies are healthy
C. Checks user health
D. Monitors token usage
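The `run_id` tracing idea from question 2 can be sketched as JSON-lines structured logging. This is a hand-rolled illustration, not a specific logging library; field names like `latency_ms` are made up:

```python
import json
import uuid

def make_logger(run_id):
    """Return a logger that stamps every entry with the same run_id,
    so one agent run can be traced across many log lines."""
    def log(event, **fields):
        entry = {"run_id": run_id, "event": event, **fields}
        print(json.dumps(entry))  # one JSON object per line
        return entry
    return log

run_id = str(uuid.uuid4())
log = make_logger(run_id)
first = log("tool_call", tool="search", latency_ms=120)
second = log("tool_result", tool="search", ok=True)
# Both entries share the run_id, so grepping for it reconstructs the whole run
```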
07
Pre-Capstone
Pre-Capstone — System Design
Design your capstone project: a Payment Risk Modeling Agent.
⏱ ~8 hrs DAYS 43–49
// Learning Topics
The Capstone Revealed: Requirements & Architecture
→ Read
Data, Technology Selection & Build Plan
→ Read
Hands-On: Write the Requirements Document
→ Read
Checkpoint Quiz
1. What is the primary purpose of the Payment Risk Modeling Agent?
A. To process payments from merchants
B. To assess the risk of a merchant and recommend approval, review, or decline
C. To store payment data
D. To display bank statements
2. In the layered architecture, which layer retrieves relevant risk guidelines from the vector store?
A. Layer 1: Input Validation
B. Layer 2: Risk Guidelines Vector Store
C. Layer 3: Risk Scoring
D. Layer 4: Output
3. The five sample merchant applications are designed to test what?
A. The FastAPI server's performance
B. The Pydantic model validation and end-to-end agent logic
C. Database queries
D. API authentication
4. Why is auditability critical in a payment risk system?
A. It makes the system faster
B. Underwriters need to understand why decisions were made, especially for declined or high-risk merchants
C. It's required by law
D. It improves accuracy
5. What does Pydantic do in this architecture?
A. It stores embeddings of the risk guidelines
B. It validates input and output data against schemas
C. It manages the database
D. It trains the model
6. In the risk scoring model, why combine rule-based scoring and LLM reasoning?
A. To use more tokens and test the system
B. Rules provide consistency and auditability; reasoning catches edge cases and context
C. Rules are faster
D. LLM reasoning is always better
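The course uses Pydantic for the schema validation in question 5; the same catch-invalid-data-early idea can be sketched with a stdlib dataclass. Field names here are illustrative, not the capstone's actual schema:

```python
from dataclasses import dataclass

@dataclass
class MerchantApplication:
    """Input schema for the risk agent. A __post_init__ check stands in
    for Pydantic's validators: reject bad data at the boundary, before
    the agent runs."""
    business_name: str
    monthly_volume_usd: float
    chargeback_rate: float  # a fraction, e.g. 0.012 = 1.2%

    def __post_init__(self):
        if self.monthly_volume_usd < 0:
            raise ValueError("monthly volume cannot be negative")
        if not 0.0 <= self.chargeback_rate <= 1.0:
            raise ValueError("chargeback rate must be between 0 and 1")

app = MerchantApplication("LowRiskSoftware", 50_000, 0.002)  # passes validation
try:
    MerchantApplication("BadData", -10, 0.002)  # caught early
except ValueError as e:
    print(e)
```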
08
Capstone Build
Capstone Build — Payment Risk Agent
Build and deploy a production-grade payment risk assessment system.
⏱ ~20 hrs DAYS 50–60
// Learning Topics
Day One: Foundations & Risk Guidelines Vector Store
→ Read
The Risk Assessment Agent & Audit Logging
→ Read
FastAPI Wrapper, Integration Testing & Deployment
→ Read
Checkpoint Quiz
1. In the Payment Risk Agent, what is the purpose of the Pydantic models (MerchantApplication and RiskAssessment)?
A. To define the risk scoring algorithm
B. To enforce input/output schemas and catch invalid data early
C. To store data in the database
D. To train the LLM
2. How is the agent's risk score calculated?
A. Purely from LLM reasoning
B. A combination of rule-based signals (volume, chargeback rate) and LLM judgment
C. Purely from rule-based calculations
D. Randomly
3. What is stored in the audit log and why?
A. Just the final risk score
B. The merchant application, retrieved guidelines, agent reasoning, and final assessment — for transparency and debugging
C. Only errors
D. User credentials
4. Which of the five test merchants should result in a DECLINE recommendation?
A. LowRiskSoftware
B. SuspiciousNew (high volume + vague description)
C. MediumRiskEcommerce
D. EstablishedRetail
5. Why is the health check endpoint (/health) important?
A. It checks the agent's accuracy
B. It allows load balancers to detect if the service is alive and dependencies are healthy
C. It monitors users
D. It runs the capstone tests
6. What does "SuspiciousNew" merchant application test?
A. The agent's handling of high-volume merchants
B. The agent's handling of new businesses with high volume and vague descriptions
C. The database performance
D. The API rate limiting
7. How does the agent retrieve relevant risk guidelines for a merchant?
A. It asks the user which guidelines apply
B. It embeds the merchant data and retrieves similar guidelines via vector similarity
C. It uses hard-coded rules
D. It randomly samples guidelines
8. What is the role of the FastAPI wrapper (main.py)?
A. It contains the risk scoring algorithm
B. It exposes the agent as HTTP endpoints and handles authentication, validation, and logging
C. It manages the database
D. It trains the embeddings
9. Why deploy using Docker and Render instead of running on your laptop?
A. Docker and Render are cheaper
B. Docker ensures consistency across environments; Render provides global availability and automatic scaling
C. Your laptop can't run Docker
D. It's faster
10. What does Level 2 completion mean?
A. You understand AI theory
B. You can build production-grade AI agents that solve real business problems
C. You passed a test
D. You watched videos
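The blended scoring from question 2 can be sketched in a few lines. The weights, thresholds, and point values below are illustrative, not the capstone's actual model: rules give consistency and auditability, while the LLM's judgment catches context the rules miss.

```python
def rule_score(volume_usd, chargeback_rate, months_in_business):
    """Deterministic risk signals: consistent and auditable."""
    score = 0
    if chargeback_rate > 0.01:
        score += 40  # chargebacks above 1% are a strong signal
    if volume_usd > 100_000 and months_in_business < 6:
        score += 30  # high volume at a brand-new business
    return score

def combined_score(rule_points, llm_points, rule_weight=0.6):
    """Blend rule-based points with an LLM's judgment (both 0-100)."""
    return rule_weight * rule_points + (1 - rule_weight) * llm_points

# A "SuspiciousNew"-style merchant: high volume, new, and an LLM that
# flagged the vague business description with 80 risk points
score = combined_score(rule_score(250_000, 0.015, 2), llm_points=80)
recommendation = "DECLINE" if score >= 70 else "REVIEW" if score >= 40 else "APPROVE"
print(score, recommendation)
```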
09
Developer Tools
Developer Tools & Agent Platforms
Claude Code, Codex, OpenClaw, ClawHub — tools to build agents faster.
⏱ ~10 hrs DAYS 61–67
// Learning Topics
Claude Code: AI-Powered Software Development
→ Read
OpenAI Codex: Cloud-Based Agent Orchestration
→ Read
OpenClaw: Building Agent Skills
→ Read
Integrating the Tools + Hands-On Exercise
→ Read
Checkpoint Quiz
1. Claude Code is best described as:
A. An autocomplete plugin
B. A command-line tool that gives Claude access to your full codebase for agentic development
C. A website builder
D. A code formatter
2. When should you use Claude Code over Codex?
A. Always use Claude Code
B. When you need interactive, real-time development where you can guide the process
C. When you need parallel task execution
D. When Codex is broken
3. Codex's primary advantage for team workflows is:
A. Lower cost
B. Parallel execution of independent tasks in isolated sandboxes
C. Better code quality
D. Easier to use
4. An OpenClaw skill consists of:
A. Just a Python script
B. Metadata, capability definitions, execution logic, and configuration options
C. Only a description
D. A video tutorial
5. ClawHub functions as:
A. A code hosting platform like GitHub
B. A skill marketplace with versioning and one-command installation — like npm for agent skills
C. A training platform
D. A documentation wiki
6. After completing Modules 01-08, developer tools like Claude Code provide acceleration because:
A. They replace the need to understand fundamentals
B. They amplify your understanding with faster iteration and larger scope
C. They do all the work for you
D. They're required to deploy
7. Three parallel Codex agents working on independent features is an implementation of:
A. Prompt chaining from Module 02
B. The multi-agent orchestration pattern from Module 05
C. RAG from Module 03
D. Streaming from Module 02
8. The 'Meeting Follow-Up' OpenClaw skill example demonstrates:
A. Basic chatbot functionality
B. A complete agent workflow — trigger, reasoning, action, memory — packaged as a reusable modular capability
C. Just a summarization tool
D. Calendar integration