Apache Kafka is a powerful, open-source stream processing platform that enables businesses to process and analyze data in real time. This course introduces learners to the core concepts and architecture of Apache Kafka. It is designed for aspiring data engineers, software developers interested in data processing, and IT professionals looking to diversify into …
Computer Science
Designing Larger Python Programs for Data Science
Modern programs are complicated structures, with hundreds to thousands of lines of code, but how do you efficiently move from smaller programs to more robust, complicated programs? How do data scientists simulate the randomness of real-world problems in their programs? What techniques and best practices can you leverage to design pieces of software that can efficiently handle …
VSCode for Developers: Set up a professional environment
Upgrade your development workflow and start writing code like a professional! This project aims to empower software developers to leverage advanced features within Visual Studio Code to enhance their development workflow. Throughout this project-based course, participants will explore various advanced functionalities of Visual Studio Code, including advanced code navigation, …
Pretraining LLMs
In Pretraining LLMs, you’ll explore the first step of training large language models using a technique called pretraining. You’ll learn the essential steps to pretrain an LLM, understand the associated costs, and discover how starting with smaller, existing open-source models can be more cost-effective. Pretraining involves teaching an LLM to predict the next token using vast …
Building Applications with Vector Databases
Vector databases use embeddings to capture the meaning of data, gauge the similarity between different pairs of vectors, and navigate large datasets to identify the most similar vectors. In the context of large language models, the primary use of vector databases is retrieval-augmented generation (RAG), where text embeddings are stored and retrieved for specific queries. …