[Introduction]
Use XGBoost-Regression-Time-Series to perform regression analysis of time series data.
[Interface functions and descriptions]
• Dataset
The drop-down menu shows the datasets that can be analyzed.
• Open the folder location of the dataset
You can quickly edit and add datasets.
• Documentation and instructional videos
Open the official website to view the documentation and instructional videos.
• Program flow
Set the parameters of each step and run the steps in order.
[Operation steps and instructions]
1. Select dataset
From the drop-down menu, select the dataset you want to analyze.
Introduction to the datasets:
• sales-forecast-airline
Airline passenger forecasting.
• stock
Stock price prediction. The inputs are the opening price, closing price, high, low, and trading volume; the output is the closing price five days later.
Dataset preparation:
• Training dataset
File name: train_data.csv
File content:
The first column is the data index, or the time and date of the time series data. This column is automatically ignored during analysis.
The next N columns are the inputs, and the last column is the output (the value to predict).
The following figure is an example of train_data.csv: 1 marks the data index, 2 marks the inputs, and 3 marks the output.
• Testing dataset
File name: inference_data.csv
File content:
Same as the training dataset.
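As a rough illustration of this layout, the sketch below builds a tiny CSV in memory with Python's standard csv module and splits it into inputs and outputs, dropping the index column. The header names (date, x1, x2, y), the presence of a header row, and all values are assumptions made up for the example; they are not taken from the tool or its sample datasets.

```python
import csv
import io

# Hypothetical rows: index column, two input columns, one output column.
# Header names and values are illustrative only.
ROWS = [
    ["date", "x1", "x2", "y"],
    ["2020-01-01", "1.0", "10.0", "0.5"],
    ["2020-01-02", "2.0", "20.0", "0.7"],
    ["2020-01-03", "3.0", "30.0", "0.9"],
]


def split_columns(text):
    """Split CSV text into (inputs, outputs), ignoring the first column."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row (assumed to be present)
    inputs, outputs = [], []
    for row in reader:
        values = [float(v) for v in row[1:]]  # drop the index column
        inputs.append(values[:-1])            # middle columns are inputs
        outputs.append(values[-1])            # last column is the output
    return inputs, outputs


# Write the rows to an in-memory CSV, then parse them back.
buffer = io.StringIO()
csv.writer(buffer).writerows(ROWS)
inputs, outputs = split_columns(buffer.getvalue())
print(inputs)
print(outputs)
```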
2. In the program flow area, under 1. Prepare Train Data, set the time sequence parameter and press Run. The input train_data.csv is augmented according to the time sequence setting, and the result is written to train_data_time_series.csv.
Parameters setting:
• Time Sequence
The length of the time series data.
Results:
• Display the size of the training data (train_data.csv) after data augmentation in the console
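The tool's exact augmentation is not described here, so the following is only a sketch of one common way to turn a time series into training samples with a sliding window. It assumes that each sample concatenates the inputs of Time Sequence consecutive rows and that its target is the output of the last row in the window; the function name make_windows is hypothetical.

```python
def make_windows(inputs, outputs, time_sequence):
    """Build one sample per window of `time_sequence` consecutive rows.

    Assumption: each sample concatenates the inputs of the rows in the
    window, and its target is the output of the window's last row.
    """
    if time_sequence < 1:
        raise ValueError("time_sequence must be at least 1")
    features, targets = [], []
    for end in range(time_sequence, len(inputs) + 1):
        window = inputs[end - time_sequence:end]
        # Flatten the window into a single feature vector.
        features.append([value for row in window for value in row])
        targets.append(outputs[end - 1])
    return features, targets


# Five rows with one input column each (made-up values).
inputs = [[1.0], [2.0], [3.0], [4.0], [5.0]]
outputs = [10.0, 20.0, 30.0, 40.0, 50.0]
features, targets = make_windows(inputs, outputs, time_sequence=3)

# Size of the windowed data: number of samples and features per sample.
print(len(features), len(features[0]))
```

Under these assumptions, a longer time sequence yields wider feature vectors and fewer samples, which is why the size of the data changes after this step.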
3. In the program flow, under 2. Train, edit the training parameters and press Run to start training.
Parameters setting:
• Estimator
The number of gradient boosted trees (default is 1000).
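For readers who want to reproduce a similar setting outside the tool, the sketch below assumes that Estimator corresponds to the n_estimators argument of XGBRegressor in the xgboost Python package. That mapping is an assumption, not something this manual confirms, and the helper name train_and_predict is hypothetical.

```python
def train_and_predict(features, targets, n_estimators=1000):
    """Fit an XGBoost regressor and return its predictions on the same data.

    Assumption: the tool's Estimator setting plays the role of
    `n_estimators` (the number of gradient boosted trees).
    """
    # Imported lazily so this sketch can be loaded without xgboost installed.
    import numpy as np
    from xgboost import XGBRegressor

    X = np.asarray(features, dtype=float)
    y = np.asarray(targets, dtype=float)
    model = XGBRegressor(n_estimators=n_estimators)
    model.fit(X, y)
    return model.predict(X).tolist()
```

In general, more trees mean longer training and a closer fit to the training data, which can lead to overfitting on small datasets.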
Results:
• Display the Root Mean Squared Error and R-squared of the trained model against the training dataset (train_data_time_series.csv) in the console
• The comparison graph of the predicted value and the true value of the training dataset (train_data_time_series.csv).
• The scatter plot of the predicted value and the true value of the training dataset (train_data_time_series.csv).
• Output predicted value (train_data_time_series_prediction.csv)
In train_data_time_series_prediction.csv, the last column contains the predicted values for the training dataset (train_data_time_series.csv).
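The two metrics reported in the console can be checked by hand. The sketch below uses the standard textbook definitions of Root Mean Squared Error and R-squared; whether the tool computes them in exactly this way is an assumption, and the sample values are made up.

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the mean squared residual."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true, y_pred):
    """R-squared: 1 minus residual sum of squares over total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up true and predicted values.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
print(rmse(y_true, y_pred))
print(r_squared(y_true, y_pred))
```

A lower RMSE and an R-squared closer to 1 indicate that the predictions track the true values more closely.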
4. In the program flow, under 3. Prepare Inference Data, set the time sequence parameter and press Run. The input inference_data.csv is augmented according to the time sequence setting, and the result is written to inference_data_time_series.csv.
Parameters setting:
• Time Sequence
The length of the time series data (This parameter must be the same as the length of the time series during training).
Results:
• Display the size of the testing data (inference_data.csv) after data augmentation in the console
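One way to see why the time sequence must match between training and inference: if each sample is a flattened window of rows (an assumption about the tool's augmentation), the number of features per sample depends on the time sequence, and a trained model expects the same number at inference. The helper names below are hypothetical.

```python
def feature_width(num_input_columns, time_sequence):
    """Features per sample, assuming each window of rows is flattened."""
    return num_input_columns * time_sequence


def check_time_sequence(train_time_sequence, inference_time_sequence):
    """Raise an error if the two settings differ."""
    if train_time_sequence != inference_time_sequence:
        raise ValueError(
            "Time Sequence mismatch: trained with "
            f"{train_time_sequence}, inference uses {inference_time_sequence}"
        )


# With 5 input columns, different settings give different feature counts.
print(feature_width(5, 3))
print(feature_width(5, 4))
check_time_sequence(3, 3)  # matching settings pass silently
```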
5. In the program flow, under 4. Inference, press Run to execute the inference.
Parameters setting:
• Estimator
The number of gradient boosted trees (default is 1000).
Results:
• Display the Root Mean Squared Error and R-squared of the trained model against the testing dataset (inference_data_time_series.csv) in the console
• The comparison graph of the predicted and true values of the testing dataset (inference_data_time_series.csv)
• The scatter plot of the predicted and true values of the testing dataset (inference_data_time_series.csv)
• Output predicted value (inference_data_time_series_prediction.csv)
In inference_data_time_series_prediction.csv, the last column contains the predicted values for the testing dataset (inference_data_time_series.csv).
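To post-process the predictions outside the tool, the last column can be pulled out with Python's standard csv module. This is a minimal sketch: the sample text, the header names, and the presence of a header row are assumptions for illustration, and in practice the text would come from the prediction file itself.

```python
import csv
import io


def last_column(text, has_header=True):
    """Return the last column of CSV text as a list of floats."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    if has_header:
        rows = rows[1:]  # skip the header row (assumed to be present)
    return [float(row[-1]) for row in rows]


# Made-up stand-in for the contents of a prediction file.
SAMPLE = "x1,x2,y,prediction\n1.0,2.0,3.0,2.9\n2.0,3.0,4.0,4.2\n"
predictions = last_column(SAMPLE)
print(predictions)
```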