
Creator: DeepLearning.AI
Category: Software > Computer Software > Educational Software
Topic: Data Analysis, Data Science
Tag: clean, Data, databases, real, web
Price: USD 49.00
This course focuses on collecting and preprocessing real-world data, moving beyond the clean datasets that learners have encountered in earlier coursework. The core narrative is about handling data as it exists “in the wild”: messy, inconsistent, and drawn from many sources. The course has three modules, each covering a different data source: web scraping, APIs, and databases. It begins with web scraping, including how to extract data from websites using tools like Pandas and Beautiful Soup while considering the ethical implications. You’ll also learn how to clean and preprocess text data, the primary type of data you’ll encounter on the web.
Module 2 introduces APIs, a method of getting real-time data from online sources. You’ll learn how to parse JSON data and authenticate your access with API keys. You’ll also explore broader data cleaning concepts, particularly for numerical data, including normalization and other techniques. The final module focuses on databases. You’ll learn how databases are structured and how to access them using SQL queries, and when to choose SQL versus Python for different data cleaning tasks. You’ll also cover the core join operations that combine database tables, a common topic in interview questions. The course aims to prepare you for real-world scenarios where data rarely comes in a perfect, analysis-ready format.
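
To give a feel for the join operations mentioned in the databases module, here is a minimal sketch using Python's built-in sqlite3 module. It is not taken from the course materials: the `customers` and `orders` tables, their columns, and the sample rows are all hypothetical, chosen only to show how an inner join differs from a left join.

```python
import sqlite3

# Hypothetical in-memory database; table names, columns, and rows are
# illustrative assumptions, not course data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.5);
""")

# INNER JOIN keeps only customers that have at least one matching order.
inner_rows = conn.execute("""
    SELECT c.name, o.total
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.id
    ORDER BY o.id
""").fetchall()

# LEFT JOIN keeps every customer, including those with no orders;
# COUNT(o.id) ignores the NULLs produced for unmatched rows.
left_rows = conn.execute("""
    SELECT c.name, COUNT(o.id) AS n_orders
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id
    ORDER BY c.id
""").fetchall()

conn.close()

print(inner_rows)
print(left_rows)
```

In this sketch the customer with no orders is absent from the inner join result but appears in the left join result with a count of zero, which is the kind of distinction join questions usually probe.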