In Python machine learning, how do I batch process data?

In Python machine learning, batch processing lets you train models on datasets that are too large to process comfortably in memory. The idea is to split the data into smaller, manageable chunks (batches) and update the model incrementally, one batch at a time. In scikit-learn, this requires an estimator that implements `partial_fit`, such as `SGDClassifier`; calling plain `fit` in a loop would retrain from scratch on each batch and discard everything learned from the earlier ones.

```python
# Import necessary libraries
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import SGDClassifier

# Sample dataset generation
data = np.random.rand(1000, 10)                 # 1000 samples, 10 features
labels = np.random.randint(0, 2, size=(1000,))  # Binary labels

# Splitting the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(data, labels, test_size=0.2)

# Standardizing the features (fit on train only, then apply to test)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Batch processing with incremental learning.
# Note: LogisticRegression.fit() retrains from scratch on every call, so
# looping over batches with it would keep only the last batch. SGDClassifier
# with log loss is a logistic-regression model trained by stochastic gradient
# descent, and its partial_fit() updates the model one batch at a time.
batch_size = 32
model = SGDClassifier(loss="log_loss")
classes = np.unique(y_train)  # partial_fit needs the full class list up front

for i in range(0, len(X_train), batch_size):
    X_batch = X_train[i:i + batch_size]
    y_batch = y_train[i:i + batch_size]
    model.partial_fit(X_batch, y_batch, classes=classes)

# Model evaluation
accuracy = model.score(X_test, y_test)
print("Accuracy:", accuracy)
```
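In practice you usually make several passes (epochs) over the training set and reshuffle between passes, since SGD-style updates benefit from randomized batch order. Here is a minimal sketch of that pattern; the `iter_batches` helper and the epoch count of 5 are illustrative choices, not part of scikit-learn, and the sketch reuses `model`, `classes`, `X_train`, and `y_train` from the snippet above:

```python
import numpy as np

def iter_batches(X, y, batch_size, rng):
    """Yield (X_batch, y_batch) pairs in a freshly shuffled order."""
    indices = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch_idx = indices[start:start + batch_size]
        yield X[batch_idx], y[batch_idx]

# Several epochs over the data, reshuffling each time (5 is arbitrary here).
rng = np.random.default_rng(0)
for epoch in range(5):
    for X_batch, y_batch in iter_batches(X_train, y_train, 32, rng):
        model.partial_fit(X_batch, y_batch, classes=classes)
```

Shuffling by index rather than copying the arrays keeps the memory overhead to a single permutation array, which matters when the dataset is large.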

Tags: python, machine-learning, batch-processing, datasets, memory-management, model-training