Machine Learning Hero, Chapter 81

Chapter 3: Data Preprocessing and Feature Engineering

Section 2 of 4

What is the purpose of data cleaning in data preprocessing?
- a) To improve model performance by transforming features
- b) To identify and handle missing data, remove duplicates, and correct errors
- c) To scale data to a consistent range
- d) To reduce the dimensionality of the dataset
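
As a refresher on what a cleaning pass can involve, here is a minimal plain-Python sketch. The function name `clean_rows`, the `(name, city)` record layout, and the choice to drop incomplete rows are illustrative assumptions, not something the lesson prescribes.

```python
def clean_rows(rows):
    """Drop rows with missing fields, normalize text, and remove duplicates."""
    seen = set()
    cleaned = []
    for name, city in rows:
        # Handle missing data (here: by dropping the incomplete row).
        if name is None or city is None:
            continue
        # Correct formatting errors: stray whitespace and inconsistent casing.
        record = (name.strip().title(), city.strip().title())
        # Remove duplicates, keeping the first occurrence.
        if record in seen:
            continue
        seen.add(record)
        cleaned.append(record)
    return cleaned


raw = [("ana", "Lima"), ("Ana ", "lima"), ("Ben", None), ("Cy", "Oslo")]
cleaned = clean_rows(raw)
print(cleaned)  # [('Ana', 'Lima'), ('Cy', 'Oslo')]
```

Dropping incomplete rows is only one way to handle missing values; filling them in is another.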

Which technique is typically used for handling missing data?
- a) One-hot encoding
- b) Data augmentation
- c) Imputation
- d) PCA
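
To make one of the terms above concrete: imputation replaces missing entries with an estimate derived from the observed values. The sketch below uses the column mean; the helper name `impute_mean` and the use of `None` to mark a missing value are assumptions made for illustration.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


ages = [22.0, None, 30.0, 26.0, None]
print(impute_mean(ages))  # [22.0, 26.0, 30.0, 26.0, 26.0]
```

The mean is a simple default; the median or the most frequent value are common alternatives when a column is skewed or categorical.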

Feature engineering involves which of the following?
- a) Creating new features from existing ones
- b) Reducing noise from the data
- c) Increasing the number of samples in the dataset
- d) Both a and b
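
For a concrete picture of deriving a feature from existing columns, the sketch below computes a body mass index from height and weight. The field names `height_cm`, `weight_kg`, and `bmi`, and the helper `add_bmi`, are hypothetical and chosen only for this example.

```python
def add_bmi(person):
    """Return a copy of the record with a derived 'bmi' feature added."""
    height_m = person["height_cm"] / 100
    bmi = person["weight_kg"] / height_m ** 2
    # Keep the original fields and append the new one, rounded for readability.
    return {**person, "bmi": round(bmi, 1)}


record = {"height_cm": 180, "weight_kg": 81}
enriched = add_bmi(record)
print(enriched["bmi"])  # 25.0
```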

Why is it important to scale numerical features?
- a) To remove outliers from the dataset
- b) To ensure features with different ranges contribute equally to model performance
- c) To increase the size of the dataset
- d) To remove noise from the dataset
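
To see what scaling does to features measured in very different units, here is a minimal min-max scaling sketch in plain Python. Min-max is just one scaling method, and the helper name `min_max_scale` and the sample columns are assumptions for illustration.

```python
def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map every entry to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


incomes = [30_000, 45_000, 60_000]  # raw range spans tens of thousands
ages = [20, 35, 50]                 # raw range spans tens

print(min_max_scale(incomes))  # [0.0, 0.5, 1.0]
print(min_max_scale(ages))     # [0.0, 0.5, 1.0]
```

After scaling, both columns occupy the same 0 to 1 range, so a distance-based or gradient-based model no longer sees income as numerically dominant.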

What is a train-test split used for?
- a) Creating synthetic data samples
- b) Separating data into training and testing sets for model validation
- c) Increasing the number of features in the dataset
- d) Standardizing features to the same scale
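
As a refresher on the mechanics of a train-test split, the sketch below shuffles the rows and holds a fraction of them out. The helper `split_train_test`, the 20% default, and the fixed seed are illustrative assumptions; in practice a library routine such as scikit-learn's `train_test_split` is the usual choice.

```python
import random


def split_train_test(rows, test_fraction=0.2, seed=0):
    """Shuffle rows reproducibly and hold out a fraction as the test set."""
    rng = random.Random(seed)  # local generator, so global random state is untouched
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


train, test = split_train_test(range(10))
print(len(train), len(test))  # 8 2
```

Every row ends up in exactly one of the two sets, so the model is scored on examples it never saw during training.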