Professional Development @ Boulder
Learning Multi-Agentic AI Systems
This Spring, I’ve reached the midpoint of my program in computer science at the University of Colorado Boulder (UCB) and am starting to branch out to more interesting courses outside the core curriculum. One of these is CSPB 3112, Professional Development in Computer Science, where I can choose any CS topic and work on it during the semester for one credit.
Therefore, I’m going all in on agentic AI systems. Here’s what I’m proposing to work on during the next 13 weeks.
Project Proposal
This project is about learning how to design, build and deploy multi-agentic AI systems.
Vision statement
This project will add to my skills in AI system design and development following the different projects implemented during the CSPB program:
CSPB 1300: C++ Implementation of a Basic Recurrent Neural Network.
CSPB 4830: n8n AI automation workflow for construction costing.
CSPB 3308: AI Technical Interview Assistant.
Personal Project: AI Grader & Rich Feedback for continuous education programs.
Motivation
My moonshot goal is to obtain a PhD in Computer Science, and I’m studying different CS topics such as HCI, AI, and Social Computing and their application in adult learning (andragogy), upskilling, reskilling, self-fulfillment, and career change.
Specific and measurable goals (learning objectives) for the project: the ability to understand, explain, and implement the following:
Agent Design Concepts (Chain of Thought - CoT and Reasoning/Acting Framework - ReAct)
Multi Agent Design Patterns (Router, Parallel, Serial, Orchestrator)
Adding Guardrails to Agent output (Programmatic and LLM Judging)
Use of Short and Long Term Memory
Retrieval Augmented Generation (RAG) and vector databases (Pinecone, PgVector, etc.)
Integration with Tools (Model Context Protocol)
Risks to project completion, possibly including:
Busy with life and work commitments.
Mitigation strategy for the risks listed above
Take a structured approach to learning by following the Udacity Agentic AI Curriculum.
Implement the 4 projects in the Udacity Agentic AI course and have them graded.
Review the GitHub Actions courses on the Microsoft certification page and LinkedIn Learning.
Blog about my learnings on a weekly basis.
Choose a capstone project with publicly available data.
Follow this loosely designed schedule:
[x] Week 01: Prepare project proposal & research topics
[x] Week 02: Udacity: Course 1 - Prompting For Effective Reasoning & Planning - Submit P1
[x] Week 03: Udacity: Course 2 - Agentic Workflows - Submit P2
[x] Week 04: Udacity: Course 3 - Building Agents
[x] Week 05: Udacity: Course 3 - Building Agents
[x] Week 06: Udacity: Course 3 - Submit P3
[x] Week 07: Udacity Course 4 - Multi-Agent Systems
[ ] Week 08: Udacity Course 4 - Multi-Agent Systems
[ ] Week 09: Udacity Course 4 - Submit P4 & Obtain Certificate
[ ] Week 10: GitHub Actions - Review MS official course
[ ] Week 11: GitHub Actions - Review LinkedIn Learning Course - Take Exam
[ ] Week 12: Capstone Project: Graduate Record Exam (GRE) AI Essay Tutor
[ ] Week 13: Capstone Project: Graduate Record Exam (GRE) AI Essay Tutor
Project assessments: evaluation criteria for the project
Obtain the Udacity Agentic AI Nanodegree.
Deploy an agent using a CI/CD pipeline.
Bonus: Take the GitHub Actions Certification Exam.
Project portfolio link: https://starterpad.com
Week 07 [Mar 2 - Mar 9, 2026]
What did you do last week?
Started Course #4 of the program. It is all about applying the bits and pieces we learned in previous weeks to multi-agent systems.
First concept is about multi-agent architecture design and answering the following questions to carefully design multiple agents that can effectively work with each other:
Who does what? (What does each agent specialize in?)
How do they talk to each other? (directly, through a manager/orchestrator, or not at all)
What are the rules of engagement? (resolving conflicts and recovering from failures)
How do we save and manage agent state, and how does data flow from agent to agent?
Second concept is the introduction of Smolagents as a multi-agentic framework.
Smolagents is a Python framework for building multi-agent systems powered by large language models (LLMs). It is designed to make it easy to create, orchestrate, and manage agents that can use tools (Python functions), maintain state, and collaborate to solve complex tasks, combining language understanding with real-world actions and stateful workflows.
Third concept is about organizing agents around an orchestrator that acts as a Project Manager that:
Delegates tasks to specialized agents providing them with the necessary information to do their job.
Handles response from the specialized agents.
Manages state (keeping track of progress and retaining information).
Recovers from errors and failures.
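As a rough sketch of these orchestrator responsibilities, here is a minimal plain-Python version; the agent names are hypothetical, and simple functions stand in for LLM-backed specialists:

```python
# Sketch of an orchestrator acting as "project manager": delegate tasks,
# handle responses, retain state, and recover from failures.

def research_agent(task: str) -> str:
    if "unknown" in task:
        raise ValueError("no data found")        # simulated agent failure
    return f"notes on {task}"

def writing_agent(task: str, notes: str) -> str:
    return f"report on {task} using {notes}"

class Orchestrator:
    def __init__(self):
        self.state = {}                          # retained information

    def run(self, task: str) -> str:
        try:
            notes = research_agent(task)         # delegate to a specialist
        except ValueError:
            notes = "no notes available"         # recover from the failure
        self.state[task] = notes                 # manage state
        return writing_agent(task, notes)        # pass data to the next agent

orchestrator = Orchestrator()
report = orchestrator.run("solar batteries")
```

In a real system each stub would be replaced by an LLM call, but the control flow (delegate, recover, record, forward) stays the same.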
Fourth concept is handling routing between agents and how that can happen using:
Content-Based Routing: route by looking at keywords or data types.
Round-Robin Routing: divide tasks equally, usually among specialized agents of the same kind.
Priority-Based Routing: check each task’s priority and accelerate the execution of high-priority tasks.
In addition, we need to manage the data flow between agents, especially if we are mixing structured and unstructured data, so that each agent receives information in a format it understands.
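The three routing strategies above can be sketched in a few lines of Python; the agent names and keyword rules here are made up for illustration:

```python
from itertools import cycle

# Content-based routing: inspect keywords in the task.
def route_by_content(task: str) -> str:
    if "invoice" in task.lower():
        return "billing_agent"
    if "bug" in task.lower():
        return "support_agent"
    return "general_agent"

# Round-robin routing: spread tasks evenly over identical workers.
_workers = cycle(["worker_1", "worker_2", "worker_3"])
def route_round_robin() -> str:
    return next(_workers)

# Priority-based routing: execute high-priority tasks first.
def order_by_priority(tasks: list[dict]) -> list[dict]:
    return sorted(tasks, key=lambda t: t["priority"], reverse=True)
```

A production router would typically let an LLM (or a classifier) make the content-based decision rather than hard-coded keywords.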
What do you plan to do this week?
This week, I intend to complete the fourth course by exploring the following three concepts:
State management in multi-agent systems, and the difference between conversation-level state (short-term memory) and system-level state (long-term memory).
How to coordinate state between different agents, and the orchestrator’s role in doing so.
How to use RAG to empower multiple agents to work together to solve complex tasks.
Are there any impediments in your way?
There shouldn’t be any problems or anticipated delays this week.
Reflection on the process you used last week — how can you make the process work better?
I put in additional effort last week to stay on track, and I plan to allocate more time during Spring Break to finish the course so I can spend more time on the capstone project.
Week 06 [Feb 23 - Mar 2, 2026]
Mid Project Update
My progress so far is going nearly as planned, with a couple of days of delay that I anticipate being able to make up during Spring Break at the latest.
1. Learning Objectives
2. Completed Weekly Tasks
3. Remaining Weekly Tasks
4. Reflection
This project has been a good stepping stone into the world of Agentic AI. Although the progress of the field is phenomenal (with moltbots now on the loose), I believe I have understood the building blocks of AI agents and am excited to apply my learnings to the specified capstone project. Taking this course alongside the Intensive Programming Workshop is turning out to be a bit of a chore, which sometimes affects my overall progress.
Week 05 [Feb 15 - Feb 23, 2026]
What did you do last week?
Completed course #3 of the Udacity Agentic AI program. The concepts were really interesting and involved:
Function calling with tools which involves having the LLM trained to recognize function call requests in its prompt.
Structuring output using Pydantic so that it can be easily validated.
Modelling agentic workflows as state machines, where each step has inputs that are processed to produce outputs (e.g. LLM processing, tool calling, termination, etc.).
Short-term memory can be modelled as the collection of interactions that are fed back to the LLM during one session (each session may involve multiple prompts/interactions).
Sometimes the agent needs to query up-to-date (past its training-data cutoff) or unstructured data, so it can use an API to execute searches on the web and return structured output.
Agents also need to query data from databases such as SQL, NoSQL, or vector databases. For SQL databases, the agent has to convert text to SQL.
Retrieval Augmented Generation (RAG) is an important framework for retrieving knowledge relevant to a specific query. This is done by querying a knowledge base first, which doesn’t necessarily need to be a vector database like Pinecone or Chroma.
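As a toy illustration of the retrieval step, the sketch below “embeds” documents as word-count vectors and returns the one most similar to the query; a real system would use learned embeddings and a vector store such as Pinecone or Chroma:

```python
import math
from collections import Counter

# Toy RAG retrieval: bag-of-words "embeddings" + cosine similarity.
def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str]) -> str:
    q = embed(query)
    return max(docs, key=lambda d: cosine(q, embed(d)))

docs = [
    "FIFA 21 was developed by EA Vancouver",
    "God of War Ragnarok was released in 2022",
]
```

The retrieved passage would then be prepended to the prompt as context before the LLM generates its answer.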
Long Term memory is needed to personalize agents across sessions and there are three types:
Semantic Memory: Facts and Knowledge
Storing important facts, like the user’s name, that can be retrieved using similarity searches.
Episodic Memory: Events and Experiences
Provides few-shot examples and summaries of previous interactions showing how the agent dealt with similar cases before.
Procedural Memory: Behavior and Patterns
Allows the agent to adapt its rules and prompts to specific requests, such as maintaining a certain tone of voice. With each interaction, the agent adapts to its users.
Finally, evaluating the agent’s output for:
Task Completion
Quality Control
Tool Interaction
System Metrics
We can evaluate agents using:
Final Response Evaluation: After a task, look at the final output.
Single-Step Evaluation: Evaluate a single decision.
Trajectory Evaluation: Trace the entire path the agent took.
So to build an effective evaluation, we need inputs, outputs, reference data, and evaluators (such as an LLM judge, for example).
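A minimal sketch of final-response evaluation might look like the following, with a stub agent and an exact-match evaluator standing in for an LLM judge (all names are hypothetical):

```python
# Final-response evaluation: run the agent on test cases and score the
# final output against reference data with a pluggable evaluator.

def exact_match_evaluator(output: str, reference: str) -> bool:
    return output.strip().lower() == reference.strip().lower()

def evaluate(cases, agent, evaluator):
    results = []
    for case in cases:
        output = agent(case["input"])
        results.append({
            "input": case["input"],
            "passed": evaluator(output, case["reference"]),
        })
    return results

# Hypothetical stub standing in for a full agent/LLM call.
def toy_agent(question: str) -> str:
    return "Santa Monica Studio" if "god of war" in question.lower() else "unknown"

cases = [{"input": "Who developed God of War?", "reference": "Santa Monica Studio"}]
results = evaluate(cases, toy_agent, exact_match_evaluator)
```

Swapping `exact_match_evaluator` for an LLM-judge function gives the same harness a qualitative scoring mode; trajectory evaluation would additionally log every intermediate step.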
What do you plan to do this week?
I intend to work on the third project, which involves developing an assistant called UdaPlay. Executives, analysts, and gamers want to ask natural-language questions like:
Who developed FIFA 21?
When was God of War Ragnarok released?
What platform was Pokémon Red launched on?
What is Rockstar Games working on right now?
The agent should:
Attempt to answer the question from internal knowledge (about a pre-loaded list of companies and games)
If the information is not found or confidence is low, search the web
Parse and persist the information in long-term memory
Generate a clean, structured answer/report
Are there any impediments in your way?
There shouldn’t be any problems or anticipated delays this week.
Reflection on the process you used last week — how can you make the process work better?
I was expecting a more detailed explanation and implementation of Anthropic’s Model Context Protocol (MCP) as a standard way to use external tools and APIs, but the course glossed over it quickly, so I will be looking to supplement my knowledge with more information.
Week 04 [Feb 8 - Feb 15, 2026]
What did you do last week?
I started with course #3 as part of the Udacity Agentic AI course. This course goes deeper into using tools, formatting structured output, interacting with databases, using RAG and adding short and long term memory support. I’m about 60% done with the nanodegree.
I managed to review the first part, about utilizing tools, and will continue with the course content today. I squeezed in some work between traveling last week and working on the large first project for the Intensive Programming Workshop.
I also managed to watch Ben Snyder’s AI talk and especially enjoyed the intersection of sociology with computer science which involves a lot of graph and network theory.
What do you plan to do this week?
I intend on completing the course videos and coding exercises to allow some time to work on the course project next week.
Are there any impediments in your way?
There shouldn’t be any problems or anticipated delays this week.
Reflection on the process you used last week — how can you make the process work better?
I would say reviewing and marking the tough weeks ahead of time and adding a bit of buffer beforehand, and putting the work in, even for 20 or 30 minutes during the week regardless of other commitments, to push forward a little during the tough weeks.
Week 03 [Feb 1 - Feb 8, 2026]
Last Week:
I worked on the second project in the Udacity Agentic AI course, where I got into more detail about the different agentic patterns, including running agents in parallel or in sequence and having an intelligent orchestrator route prompts to the most relevant agent. I also studied two agentic workflows:
The first is the evaluator-optimizer workflow, where we use LLM judging to assess whether the output of the LLM meets our passing criteria. If the evaluation fails, another iteration happens, with feedback generated from the failure fed back as a prompt to the workflow.
The second is the orchestrator-worker workflow, where specialist agents handle very specific tasks and the orchestrator routes the tasks to the best-suited agents before terminating with the final response. We need the tasks first, and that’s where Chain of Thought is important: it divides our prompt into actionable tasks.
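The evaluator-optimizer loop can be sketched in plain Python, with stub functions standing in for the LLM generator and judge (the length-based passing criterion is only a placeholder):

```python
# Evaluator-optimizer loop: generate a draft, judge it, and fold failure
# feedback into the next prompt until it passes or we hit a retry limit.

def generate(prompt: str) -> str:
    # Stand-in for an LLM call; the prompt (including feedback) shapes the draft.
    return f"draft for: {prompt}"

def judge(draft: str):
    # Stand-in for LLM judging; here the passing criterion is just length.
    if len(draft) < 30:
        return False, "too short, add more detail"
    return True, ""

def evaluator_optimizer(task: str, max_iters: int = 3) -> str:
    prompt = task
    for _ in range(max_iters):
        draft = generate(prompt)
        passed, feedback = judge(draft)
        if passed:
            return draft
        prompt = f"{task}\nFeedback: {feedback}"   # feed the failure back in
    return draft
```

A real implementation would replace `generate` and `judge` with LLM calls and state the passing criteria in the judge’s prompt.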
This project is also really interesting and is based on setting up an agentic software project management workflow to help build a sprint backlog for development from a set of customer requirements.
Here’s an excerpt from the final result:
**Feature 1: Automated and Efficient Email Handling**
- **Task ID: T001**
- **Task: Develop Email Response Automation Module**
- **User Story Reference**: US001 - As a user, I want routine inquiries to be automatically handled so that I can focus on more complex tasks.
- **Detailed Description**: Build logic to identify routine inquiries by analyzing email content patterns. Develop algorithms to generate automated responses using natural language processing. Integrate a library of standardized responses into the system to ensure consistency.
- **Acceptance Criteria**: Routine inquiries are correctly identified and responded to with a 95% accuracy rate. Automated responses are generated within 2 seconds.
- **Estimated Effort**: 40 hours
- **Dependencies**: Completion of the standardized response library (T002).
- **Task ID: T002**
- **Task: Design and Implement Standardized Response Library**
- **User Story Reference**: US002 - As a user, I want a library of standardized responses to ensure consistent communication.
- **Detailed Description**: Create a database to store common inquiries and responses. Develop a system for easy access and retrieval of these responses, ensuring they are up-to-date and relevant.
- **Acceptance Criteria**: The library contains at least 100 standardized responses and can be accessed in under 1 second.
- **Estimated Effort**: 30 hours
- **Dependencies**: None
- **Task ID: T003**
- **Task: Testing and Validation**
- **User Story Reference**: US003 - As a user, I want to ensure the accuracy of automated responses to maintain customer satisfaction.
- **Detailed Description**: Perform unit and integration testing to ensure response accuracy. Validate the consistency and accuracy of the automated responses through user testing and feedback.
- **Acceptance Criteria**: Automated responses pass all test cases with a 95% success rate.
- **Estimated Effort**: 20 hours
- **Dependencies**: Completion of T001 and T002.

This is pretty amazing, and I’m only using GPT-4o as the LLM processor.
So the project is in review and will keep you updated with the outcome.
You can check the source code for the second project here.
This Week:
I will start studying course #3 which is building agents with access to memory (short/long term), databases and tools.
Impediments:
I travelled to attend a startup conference in Dubai, and although I finished the work a bit early, I couldn’t submit my progress until today. I’m also taking the Intensive Programming Workshop, and we have a deadline for its first project next week, which may slow down my professional development progress a bit until I’m able to submit that project.
Reflections / Improved Process:
I’m still not sticking to a strict schedule but am doing fine up till now with my schedule being a bit fluid around priorities, work, life and other course requirements. I’ll be monitoring this closely to make sure I don’t slip on the way to the finish line.
Week 02 [Jan 26 - Feb 1, 2026]
What did you do last week?
I successfully completed Course #1 and passed the first project. One thing I really like about Udacity is that their projects are reviewed by humans to provide detailed feedback and code reviews (I don’t know if they’re still using humans or have shifted to AI grading after the Accenture acquisition).
I added the review above to summarize my takeaways from the lesson. Agentic AI is all about system design and less about actual machine learning. LLMs are treated as black boxes with prompts as inputs to the system and assistant responses as outputs. This distinction will become important in Course #2 as I work on workflow design.
First step to optimize the system is to work on refining the input. In other words, how do we construct a good prompt?
The course suggests dividing the prompt into five sections, as follows:
[Role]: The persona the LLM should adopt (e.g., “Act as a high school teacher.”).
[Task]: The specific instruction or question (e.g., “find a solution to the following trigonometric identity.”).
[Output Format]: How the response should be structured (e.g., “One sentence answer”).
[Examples]: Sample input/output pairs (e.g., “Q: What is the value of \(\sin^2(x) + \cos^2(x)\)? A: It is 1.”).
[Context]: Additional information needed for the task (e.g., the current date, if asking for the date).
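As a sketch, the five sections can be assembled into a single prompt string; the section contents below are illustrative:

```python
# Assemble the five prompt sections into one prompt string.
def build_prompt(role: str, task: str, output_format: str,
                 examples: str, context: str) -> str:
    return "\n".join([
        f"[Role]: {role}",
        f"[Task]: {task}",
        f"[Output Format]: {output_format}",
        f"[Examples]: {examples}",
        f"[Context]: {context}",
    ])

prompt = build_prompt(
    role="Act as a high school teacher.",
    task="Find a solution to the following trigonometric identity.",
    output_format="One sentence answer.",
    examples="Q: What is the value of sin^2(x) + cos^2(x)? A: It is 1.",
    context="Today's date is 2026-02-04.",
)
```

Keeping the sections in a fixed order like this makes prompts easier to review and to vary one section at a time.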
Use Chain of Thought (CoT), a process that divides a task into smaller sub-tasks executed one after the other. This is accomplished by explicitly instructing the LLM in the prompt to “reason step by step.”
ChatGPT explains this quite well:
CoT is the agent’s internal deliberation / scratchpad that helps it:
break a goal into steps (“what sub-problems do I need to solve?”)
choose the next action (“which tool/API should I call?”)
keep state (“what have I learned so far?”)
handle errors (“that tool failed—what’s the fallback?”)
Ensure that we specify a schema for the output and validate the output. This can be done by using Pydantic models and asserting that the LLM output adheres to the specified schema.
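As a minimal sketch (my own example, not the documentation snippet), assuming a hypothetical `EventSuggestion` schema for the LLM’s response:

```python
import json
from pydantic import BaseModel, ValidationError

# Hypothetical schema for a structured LLM response.
class EventSuggestion(BaseModel):
    name: str
    date: str
    price_usd: float

# A well-formed LLM output parses and validates cleanly.
raw = '{"name": "Jazz Night", "date": "2026-02-06", "price_usd": 25.0}'
event = EventSuggestion.model_validate(json.loads(raw))

# A malformed output (missing fields) raises ValidationError,
# signalling that the LLM call should be rejected or retried.
try:
    EventSuggestion.model_validate({"name": "Jazz Night"})
    valid = True
except ValidationError:
    valid = False
```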
Pydantic’s model_validate() class method performs this validation, raising a ValidationError when the output doesn’t conform.
Use ReAct (Reasoning/Acting framework)
ReAct is a pattern where the LLM can alternate between thinking and taking actions:
Think: a step where an LLM can decide what’s missing and what to do next (e.g. “I need to get the current date to find suitable events”)
Act: perform an action or call a tool (e.g. “use tool get_current_date with no arguments“ )
Observe: Read the result and use it to choose the next step (e.g. “Today is February 4, 2026, now I can filter events for today.”)
Final Answer: This loop repeats until the model has enough information, then it produces a Final Answer.
The purpose is to reduce hallucinations by checking real world information.
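The Think/Act/Observe loop can be sketched with a scripted “model” in place of a real LLM; the `get_current_date` tool and the events question mirror the examples above, and all names are hypothetical:

```python
import datetime

# ReAct loop sketch: Think (pick an action), Act (call a tool),
# Observe (record the result), repeat until a Final Answer.

def get_current_date() -> str:
    return datetime.date(2026, 2, 4).isoformat()   # fixed date for the example

TOOLS = {"get_current_date": get_current_date}

def scripted_model(observations: dict) -> dict:
    # Stand-in for the LLM's Think step.
    if "date" not in observations:
        return {"action": "get_current_date"}
    return {"final_answer": f"Events for {observations['date']}"}

def react_loop(max_steps: int = 5):
    observations = {}
    for _ in range(max_steps):
        step = scripted_model(observations)        # Think
        if "final_answer" in step:
            return step["final_answer"]            # Final Answer
        result = TOOLS[step["action"]]()           # Act
        observations["date"] = result              # Observe
    return None
```

A real agent would let the LLM choose among many tools and parse its free-text "thoughts" into actions, but the loop structure is the same.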
I applied these concepts to a travel AI agent that plans a trip to the wonderful city of AgentsVille! You can check the code here.
What do you plan to do this week?
This week, I’ll be continuing with Course 2 from the Udacity Nanodegree, looking at agentic workflow patterns such as prompt chaining, parallelization, and routing, and then I’ll review two important agentic workflows:
Evaluator-Optimizer workflow
Orchestrator-Worker workflow
Are there any impediments in your way?
Currently no problems in sight.
Reflection on the process you used last week, how can you make the process work better?
To be truthful, I didn’t stick to my slotted time and had to delay my scheduled Friday sessions to the next available times on Monday and Tuesday. I reviewed the course content and completed the project on one day, then worked on my blog post and updates on the second. I still have my next session scheduled for Friday, and I hope I can stick to it this time. I feel that spacing out my work between sessions helps me process and recall concepts better.
Week 01 [Jan 20 - Jan 26, 2026]
I’ll be providing Agile style weekly standup updates and here’s the first one:
What did you do last week?
Last week was focused on narrowing my topic of interest, and since I’m on a trajectory to develop my AI skills, I went for Agentic AI. It’s a trendy topic and I’d love to gain a better understanding.
I’ve been experimenting with construction cost estimation on several projects, using line-item descriptions and a basic knowledge base of two previously priced construction projects with ChatGPT in Thinking mode (i.e. using agents), but with mixed results. Here’s a sample of a project that was recently estimated using AI agents vs. a real cost estimator for mechanical works in a five-star hotel:
I’m working on refining the AI system architecture for better use of short and long term memory as well as vector databases for Retrieval Augmented Generation (RAG). Therefore, I’ve identified the Udacity Agentic AI course as the source for my learning throughout the next 13 weeks.
What do you plan to do this week?
I’ve already started with the first module of the Udacity Agentic AI course and will be completing and submitting the first project this week. I’ll then write my first article summarizing my learnings from the week.
Are there any impediments in your way?
Currently no problems in sight.
Reflection on the process you used last week, how can you make the process work better?
I didn’t really test the process last week, but with the plan set in the proposal, I’ll start timing my commitment against the slotted weekly study time.