Data Analysis Foundations with Python

16.3 Predictive Modeling


After understanding our data through EDA and visualization, the next sensible step is to make some predictions based on this understanding. Predictive modeling enables us to anticipate future trends and outcomes by using algorithms and statistical models. This is a crucial step in the data analysis process, as it allows us to make informed decisions and plan for the future.

By building a predictive model, we can estimate future sales trends and patterns. This can help us identify areas for improvement, allocate resources, and make informed business decisions. A model is no magic wand, but when it is backed by data it can surface insights that we might otherwise miss.

In this section of our Sales Data Analysis case study, we'll build a simple predictive model step by step. We'll scale the features, train a linear regression model, evaluate its performance with RMSE and R-squared, and then use it to make predictions about future sales.

So, are you ready to take your data analysis skills to the next level? Let's dive in and explore the fascinating world of predictive modeling!

16.3.1 Preprocessing for Predictive Modeling

Before we proceed with building a model, let's make sure our data is in the right format. We've already cleaned our data in the previous section, so here we'll just select the columns we plan to use and scale them. StandardScaler standardizes each column to zero mean and unit variance.

    from sklearn.preprocessing import StandardScaler

    # Create a new DataFrame for modeling
    df_for_modeling = df_monthly_sales[['Quantity', 'TotalSales']]

    # Scaling the features
    scaler = StandardScaler()
    df_scaled = scaler.fit_transform(df_for_modeling)
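To make the scaling step concrete, here is a minimal sketch of what StandardScaler computes. The demo_sales DataFrame and its values are invented stand-ins for df_monthly_sales, not data from the case study.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for df_monthly_sales (values invented for illustration)
demo_sales = pd.DataFrame({
    'Quantity': [100, 150, 200, 250, 300],
    'TotalSales': [1000.0, 1600.0, 1900.0, 2600.0, 2900.0],
})

demo_scaler = StandardScaler()
demo_scaled = demo_scaler.fit_transform(demo_sales)  # returns a NumPy array

# StandardScaler computes (x - mean) / std per column,
# using the population standard deviation (ddof=0)
manual_scaled = (demo_sales - demo_sales.mean()) / demo_sales.std(ddof=0)

# The two results should agree
print(np.allclose(demo_scaled, manual_scaled.to_numpy()))

# After scaling, each column has mean 0 and standard deviation 1
print(demo_scaled.mean(axis=0).round(6), demo_scaled.std(axis=0).round(6))
```

Note that fit_transform returns a plain NumPy array rather than a DataFrame, which is why later steps select columns by position instead of by name.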

16.3.2 Model Selection and Training

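For our sales data, we'll use a simple linear regression model to predict TotalSales based on Quantity. We first split the scaled data into training and test sets, holding out 20% of the rows for testing, and then fit the model on the training set.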

    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LinearRegression

    # Splitting the data into training and test sets
    X = df_scaled[:, 0].reshape(-1, 1)
    y = df_scaled[:, 1]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Initialize and train the model
    model = LinearRegression()
    model.fit(X_train, y_train)
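The following sketch shows the same split-and-fit pattern on invented, noise-free data so that the result is easy to check. The names X_demo, y_demo, and demo_model are hypothetical and not part of the case study.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical, noise-free data following y = 3x + 5 (invented for illustration)
X_demo = np.arange(20, dtype=float).reshape(-1, 1)
y_demo = 3.0 * X_demo.ravel() + 5.0

# Hold out 20% of the rows for testing; random_state makes the split repeatable
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

demo_model = LinearRegression()
demo_model.fit(X_tr, y_tr)

# 20 rows split into 16 training rows and 4 test rows
print(X_tr.shape, X_te.shape)

# The fitted slope and intercept should recover roughly 3 and 5
print(demo_model.coef_[0], demo_model.intercept_)
```

Inspecting coef_ and intercept_ in this way is a quick sanity check that the fitted line matches what we expect from the data.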

16.3.3 Model Evaluation

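Let's assess how well our model performs on the test set using metrics like RMSE and R-squared. Because TotalSales was standardized, the RMSE here is expressed in scaled units (standard deviations of TotalSales), not in the original sales units.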

    import numpy as np
    from sklearn.metrics import mean_squared_error, r2_score

    # Making predictions
    y_pred = model.predict(X_test)

    # Calculate the performance metrics
    rmse = np.sqrt(mean_squared_error(y_test, y_pred))
    r2 = r2_score(y_test, y_pred)

    print(f'RMSE: {rmse}')
    print(f'R-squared: {r2}')
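To see what these two metrics measure, the sketch below computes them by hand on a few invented values and compares the results with scikit-learn. The arrays y_true_demo and y_pred_demo are hypothetical.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical true and predicted values (invented for illustration)
y_true_demo = np.array([3.0, 5.0, 7.0, 9.0])
y_pred_demo = np.array([2.5, 5.5, 6.5, 9.5])

residuals = y_true_demo - y_pred_demo

# RMSE: square root of the mean squared residual
rmse_manual = np.sqrt(np.mean(residuals ** 2))

# R-squared: 1 minus (residual sum of squares / total sum of squares)
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true_demo - y_true_demo.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

# The manual values should match scikit-learn's
rmse_sklearn = np.sqrt(mean_squared_error(y_true_demo, y_pred_demo))
r2_sklearn = r2_score(y_true_demo, y_pred_demo)

print(rmse_manual, rmse_sklearn)
print(r2_manual, r2_sklearn)
```

In this example every prediction is off by 0.5, so the RMSE is 0.5, and the R-squared works out to about 0.95. A lower RMSE and an R-squared closer to 1 both indicate a better fit.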

16.3.4 Making Future Predictions

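Now that our model is trained and evaluated, let's make some future sales predictions for quantities of 1200, 1400, and 1600. Because the model was trained on scaled values, we scale the new quantities first and then convert the predictions back to actual sales values.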

    import numpy as np

    # Make future predictions
    future_quantity = np.array([1200, 1400, 1600]).reshape(-1, 1)

    # The scaler was fitted on two columns (Quantity, TotalSales), so it cannot
    # transform a single column; scale Quantity with the statistics of column 0
    future_quantity_scaled = (future_quantity - scaler.mean_[0]) / scaler.scale_[0]
    future_sales_scaled = model.predict(future_quantity_scaled)

    # Undo the scaling of TotalSales (column 1) to get actual sales values
    future_sales = future_sales_scaled * scaler.scale_[1] + scaler.mean_[1]
    print(f"Predicted Future Sales: {future_sales}")
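An alternative to handling the scaler separately is to scale only the feature inside a scikit-learn pipeline, so that predictions come out directly in the original sales units. This is a sketch of a different approach from the one used above, and the data and names (quantity_demo, sales_demo, pipeline_demo) are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical monthly data where sales are exactly 10 per unit (invented)
quantity_demo = np.array([800, 900, 1000, 1100, 1300], dtype=float).reshape(-1, 1)
sales_demo = 10.0 * quantity_demo.ravel()

# Scale only the feature; the target stays in its original units
pipeline_demo = make_pipeline(StandardScaler(), LinearRegression())
pipeline_demo.fit(quantity_demo, sales_demo)

# The pipeline applies the fitted scaler to new inputs automatically
future_quantity_demo = np.array([1200, 1400, 1600], dtype=float).reshape(-1, 1)
future_sales_demo = pipeline_demo.predict(future_quantity_demo)
print(future_sales_demo)
```

With this design there is no inverse transform to remember, and if the pipeline is fitted on the training split only, the scaler never sees the test data.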

And voilà! You now have a predictive model for your sales data, ready to guide you in your future endeavors. Isn’t that exciting?

Feel empowered, because understanding the past and present through EDA, and peeking into the future with predictive modeling, can be the keys to your business success!