Inventory Forecasting with AI

9. Forecasting Short-History Items

Summary

In the previous chapter, I trained the model on the entire assortment for the first time – not only SKUs with a full 36-month history, but also those with just a few months of data, or even none at all. I compared three training variants:

  • Only items with a full history
  • Items with at least one month of history
  • The entire dataset, including items with no history at all

The variant requiring at least one month of history delivered the highest stability – the model used as much data as possible without training on “empty” rows. However, detailed validation revealed that the biggest weakness remains items with less than 6 months of history:

  • Items with at least 6 months of history are handled relatively reliably.
  • Below this threshold – especially for brand-new SKUs – errors increase sharply.
  • The most common problem: the model fails to estimate even the basic scale – it doesn’t know whether to forecast 10, 100, or 1,000 units.

Feature Engineering III – Group Features

In chapter five, I showed how additional features can turn raw numbers into a “map” that allows the neural network to orient itself and predict demand with useful accuracy.
This time, however, I’m targeting items with short history, which simply don’t contain enough information for the model to make accurate predictions on their own.

The goal

  • Find similar items – those sharing the same category, seasonality, and promotion behavior.
  • Use group embeddings – each segment receives its own vector capturing demand similarity.
  • Calculate group averages – if an item has no history, the model can use averages from its group.
  • Add these values as new features – the network learns that it can “transfer” information from rich series to sparse ones.
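
The group averages and ratio features described above can be sketched in a few lines of pandas. This is a minimal illustration only – the toy frame, the `category` grouping key, and the column names are my assumptions, not the production pipeline:

```python
import pandas as pd

# Hypothetical toy frame: monthly sales with a category grouping.
df = pd.DataFrame({
    "sku":      ["A", "A", "B", "B", "C", "C"],
    "category": ["tools", "tools", "tools", "tools", "elec", "elec"],
    "month":    [1, 2, 1, 2, 1, 2],
    "sales":    [10.0, 12.0, 30.0, 28.0, 5.0, 7.0],
})

# Group average per (category, month) – this exists even for a SKU
# with no history of its own, so the model always has a fallback level.
df["group_mean"] = df.groupby(["category", "month"])["sales"].transform("mean")

# Relative (ratio) feature: how far the item sits from its group.
df["sales_to_group"] = df["sales"] / df["group_mean"]
```

The key point is `transform`, which broadcasts the group statistic back onto every row, so sparse and rich series receive the same group-level columns.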

Why it works

  • A new product in the “tools” category can immediately benefit from the history of “electrical accessories” if they share a long-term demand pattern.
  • The model can better distinguish one-off sales spikes from normal seasonal patterns.
  • Short series stop collapsing to zero forecasts without harming the predictions of items with long history.

New group features

The same types of features I first used at the individual sales level are now applied at the group level:

  • Relative (ratio) features – comparing an item to its group average.
  • Lag & rolling windows – delayed values and moving averages to capture trends.
  • Wavelet signals – detecting periodic patterns.
  • Trend indicators – slope and direction of sales changes.
  • Absolute and log-transformed values – better scaling across volume levels.


For each item, I also calculate ratio-based log-features, giving the model a finer-grained measure of how far the item deviates from its group – exactly the kind of signal short-history products are missing.
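
A sketch of such a log-ratio feature, assuming plain NumPy arrays for the item's sales and its group mean (the values are illustrative):

```python
import numpy as np

item_sales = np.array([8.0, 120.0, 950.0])
group_mean = np.array([10.0, 100.0, 1000.0])

# log1p keeps the feature finite when sales are zero and compresses
# scale differences, so 10-vs-100 and 100-vs-1000 deviations look comparable.
log_ratio = np.log1p(item_sales) - np.log1p(group_mean)
```

A negative value means the item sells below its group level, a positive one above it, on a scale-free axis.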

Results by history length

To get a detailed view, I again split validation into five segments by history length.
Validation was run on a model trained on items with at least one month of history plus the added group features.
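
The segmentation and the WAPE metric reported in the tables below can be sketched as follows. The exact bin boundaries and the `wape` helper are my assumptions – the source only names the five segments:

```python
import numpy as np
import pandas as pd

# Hypothetical months-of-history per SKU.
history = pd.Series({"A": 36, "B": 20, "C": 9, "D": 3, "E": 0})

# Assumed boundaries matching the five named segments.
bins = [-1, 0, 6, 12, 35, np.inf]
labels = ["no history", "1-6 months", "6-12 months", "12-35 months", "36 months"]
segment = pd.cut(history, bins=bins, labels=labels)

def wape(actual, forecast):
    # Weighted absolute percentage error in percent, computed per segment.
    return 100.0 * np.abs(actual - forecast).sum() / actual.sum()
```

Unlike MAPE, WAPE weights errors by sales volume, so a few high-volume SKUs dominate the aggregate number.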

| Metric | Full dataset (ALL) | Full dataset (Item) | 36 months (ALL) | 36 months (Item) | 12–35 months (ALL) | 12–35 months (Item) |
|--------|--------------------|---------------------|-----------------|------------------|---------------------|----------------------|
| WAPE   | 32.4045 | 55.2969 | 30.3056 | 40.9703 | 35.6533 | 41.0825 |
| RMSE   | 39.9062 | 20.7024 | 41.4421 | 21.1611 | 25.4482 | 14.6151 |
| R²     | 0.8745  | –       | 0.8862  | –       | 0.7558  | –       |
| MAPE   | 60.6599 | –       | 55.216  | –       | 62.2879 | –       |
| ROBUST | 0.5071  | 69.6946 | 0.5078  | 55.8221 | 0.567   | 61.2863 |
| STABLE | 0.3969  | –       | 0.3695  | –       | 0.4645  | –       |
| Metric | 6–12 months (ALL) | 6–12 months (Item) | 1–6 months (ALL) | 1–6 months (Item) | No history (ALL) | No history (Item) |
|--------|--------------------|---------------------|-------------------|--------------------|-------------------|--------------------|
| WAPE   | 43.6437 | 58.8135 | 53.8679 | 59.4594 | 95.8109 | 522.2663 |
| RMSE   | 22.9386 | 15.9583 | 48.4724 | 33.3359 | 44.6723 | 27.5461  |
| R²     | 0.7551  | –       | 0.647   | –       | -0.0372 | –        |
| MAPE   | 63.4811 | –       | 73.2619 | –       | 373.445 | –        |
| ROBUST | 0.5038  | 64.5602 | 0.5628  | 75.3369 | 0.3612  | 550.8374 |
| STABLE | 0.4537  | –       | 0.4594  | –       | 1.0963  | –        |

  • Full history (36 months): Metrics remain virtually unchanged, group features did not harm the model.
  • Medium-length series (12–35 months): Results are comparable to the baseline, with no drop in performance.
  • Short series (≤ 6 months): Noticeable improvement in R² and MAPE, the model estimates the sales scale more accurately.
  • No history (0 months): Improved from absolute disaster to “still unusable,” but the model now shows some ability to infer curve shape from embeddings and group context.

Synthetic Data – When to (Not) Include It in the Model

Short and zero histories are the biggest challenge for forecasting – the model often fails to estimate even the basic scale. To give these items at least a hint of “history,” I replaced missing sales values with synthetic values.

These values are calculated as a weighted average of sales from similar items, where:

  • Similarity weights are derived from embeddings across a combination of groups (category × season × …).
  • Unlike the feature-engineering approach, where multiple separate values are created for different groups, here only a single final value is generated for a given time point.
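
A minimal sketch of that similarity-weighted average, assuming cosine similarity over hypothetical 2-d embeddings and three donor items; the real embeddings, their dimensionality, and the donor selection are not shown in the source:

```python
import numpy as np

# Hypothetical embeddings for three donor items and one new item.
donor_emb = np.array([[1.0, 0.0],
                      [0.9, 0.1],
                      [0.0, 1.0]])
new_emb = np.array([1.0, 0.0])

# Donor sales at one time point.
donor_sales = np.array([100.0, 120.0, 10.0])

# Cosine similarity between the new item and each donor acts as the weight.
sim = donor_emb @ new_emb / (np.linalg.norm(donor_emb, axis=1) * np.linalg.norm(new_emb))
weights = np.clip(sim, 0.0, None)   # ignore dissimilar (negative) donors
weights /= weights.sum()

# A single synthetic value per time point, as described above.
synthetic = float(weights @ donor_sales)
```

Because the dissimilar third donor gets near-zero weight, the synthetic value lands between the two similar donors' sales rather than being dragged toward the outlier.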

The next question: When should synthetic data be used?

1️⃣ Synthetic data already in training

Advantages:

  • Model immediately learns the scale of the new item → less tendency to collapse to zero.
  • More “complete” series = better stability.
  • Reduced risk of noise from empty items.

Disadvantages:

  • Trains on values that never actually existed → risk of noise.
  • Predefined patterns may persist even after real sales start to behave differently (overfitting to synthetic history).

2️⃣ Synthetic data only at validation

Advantages:

  • Training remains “clean,” without noise risk.
  • Easy to test what synthetic data actually brings.
  • Synthetic rules can be swapped anytime without retraining.

Disadvantages:

  • Model hasn’t seen these patterns during training → may ignore synthetic data or scale it incorrectly.
  • Possible confusion (why were there zeros before, and now there aren’t).
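
In code, the two variants amount to deciding which rows enter the training set. A toy sketch with a hypothetical `is_synthetic` flag (the fill value 7.5 stands in for the similarity-weighted average above):

```python
import pandas as pd

# Hypothetical data: SKU "B" has no real sales.
real = pd.DataFrame({"sku": ["A", "B"], "sales": [10.0, None], "is_synthetic": False})

# Fill the missing history with a synthetic value and flag those rows.
missing = real["sales"].isna()
filled = real.copy()
filled.loc[missing, "sales"] = 7.5
filled.loc[missing, "is_synthetic"] = True

# Variant 1: synthetic rows enter training as if they were real.
train_v1 = filled

# Variant 2: training stays clean; synthetic values appear only at validation.
train_v2 = filled[~filled["is_synthetic"]]
val_v2 = filled
```

Keeping the flag in the data makes it cheap to switch between the variants (or feed the flag to the model as a feature) without regenerating anything.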

For evaluation, I again split the results into five history length segments. All models already included the group-features extension.

Variant 1 – Synthetic Data in Training
| Metric | Full dataset (ALL) | Full dataset (Item) | 33 months (ALL) | 33 months (Item) | 13–32 months (ALL) | 13–32 months (Item) |
|--------|--------------------|---------------------|-----------------|------------------|---------------------|----------------------|
| WAPE   | 32.1862 | 71.2019 | 30.0079 | 43.2471 | 43.5696 | 51.4367 |
| RMSE   | 39.8141 | 20.1656 | 41.7923 | 20.6518 | 37.0207 | 18.3537 |
| R²     | 0.8751  | –       | 0.8843  | –       | 0.7773  | –       |
| MAPE   | 71.3488 | –       | 62.8897 | –       | 66.1173 | –       |
| ROBUST | 0.5136  | 87.4698 | 0.5182  | 64.0748 | 0.4673  | 65.5932 |
| STABLE | 0.3973  | –       | 0.3686  | –       | 0.544   | –       |
| Metric | 7–12 months (ALL) | 7–12 months (Item) | 1–6 months (ALL) | 1–6 months (Item) | No history (ALL) | No history (Item) |
|--------|--------------------|---------------------|-------------------|--------------------|-------------------|--------------------|
| WAPE   | 66.8159  | 1721.7288 | 52.1569 | 55.7443 | 109.8849 | 788.7963 |
| RMSE   | 42.6149  | 22.1131   | 47.9948 | 32.8273 | 43.0087  | 31.3547  |
| R²     | 0.4999   | –         | 0.6155  | –       | 0.0386   | –        |
| MAPE   | 271.8153 | –         | 79.4936 | –       | 605.8361 | –        |
| ROBUST | 0.4346   | 1021.1267 | 0.4775  | 86.1018 | 0.4294   | 853.5104 |
| STABLE | 0.804    | –         | 0.6387  | –       | 1.2805   | –        |
  • Almost all metrics worsened, including those for items with longer history.
  • The model started treating synthetic values as seriously as real data → noise from synthetic history overshadowed genuine signals.
  • For items with no history, there was a slight improvement in the prediction shape, but at the cost of degraded performance for the rest of the assortment.
Variant 2 – Synthetic Data Only at Validation
| Metric | Full dataset (ALL) | Full dataset (Item) | 33 months (ALL) | 33 months (Item) | 13–32 months (ALL) | 13–32 months (Item) |
|--------|--------------------|---------------------|-----------------|------------------|---------------------|----------------------|
| WAPE   | 32.48   | 58.1319 | 30.3187 | 40.9103 | 40.6533 | 46.0825 |
| RMSE   | 40.0808 | 20.7711 | 39.4421 | 22.8784 | 41.4482 | 19.6151 |
| R²     | 0.8734  | –       | 0.8864  | –       | 0.7558  | –       |
| MAPE   | 61.2626 | –       | 53.188  | –       | 62.8778 | –       |
| ROBUST | 0.5068  | 72.6901 | 0.5083  | 52.8141 | 0.5639  | 60.4654 |
| STABLE | 0.3977  | –       | 0.3608  | –       | 0.471   | –       |
| Metric | 7–12 months (ALL) | 7–12 months (Item) | 1–6 months (ALL) | 1–6 months (Item) | No history (ALL) | No history (Item) |
|--------|--------------------|---------------------|-------------------|--------------------|-------------------|--------------------|
| WAPE   | 42.8685 | 58.8836 | 56.0964 | 58.5911 | 118.9845  | 885.2588 |
| RMSE   | 22.6509 | 15.7021 | 57.7511 | 36.3162 | 44.7058   | 34.277   |
| R²     | 0.7587  | –       | 0.6433  | –       | -368.2597 | –        |
| MAPE   | 63.7576 | –       | 67.9023 | –       | 632.3944  | –        |
| ROBUST | 0.5029  | 64.9712 | 0.5526  | 70.3521 | 0.3759    | 937.0026 |
| STABLE | 0.4461  | –       | 0.4816  | –       | 1.3776    | –        |
  • Metrics were practically the same as with pure group-features.
  • For items with no history, scores actually worsened slightly – synthetic data during validation seemed to confuse the model, as it had never encountered such values during training.

Visual Analysis

After the purely numerical comparison, the question arises: Does the combination of group-features and synthetic data actually add value, or is it just noise within normal stochastic variation? The answer can only come from visual analysis – and that’s exactly where interesting details appear that are easy to miss in aggregated metrics.

Models shown in the charts:

  • 🔴 Red – model with only item-level (ID) features.
  • 🟢 Green – model with added group features.
  • 🟣 Purple – model with synthetic data applied only during validation.
  • 🟠 Orange – model with synthetic data used during both training and validation.

Long sales history of the item

  • For items with complete history, the impact of group-features is minimal – the model already has enough information from the item’s own data.
  • The benefit of group-level information increases proportionally as history shortens – the fewer own data points, the more important the knowledge of patterns across the group becomes.

Short sales history of the item

  • Group-features (🟢 green) keep the level and trend much closer to real sales than the pure ID-model (🔴 red).
  • Synthetic data only during validation (🟣 purple) has a similar benefit to pure group-features, but sometimes scales less accurately.
  • Training with synthetic data (🟠 orange) more often pulls predictions back toward the group average, suppressing deviations.

No sales history of the item

  • Even here, group-features keep the trend and scale much closer to real sales than the red model.
  • Without them, the model falls back to the group average, which for some SKUs means dramatic underestimation or overestimation.
  • Synthetic data in training can provide a curve shape but loses real variability – results appear “smoothed” and react less to potential fluctuations.
  • Overall, predictions for items with no history remain unusable – without a single real sales data point, the model cannot estimate the correct sales scale.

Group-features bring the greatest benefit for short histories, where they help keep both trend and scale close to reality. Synthetic data in training, on the other hand, often “smooths out” predictions and reduces variability. Without a single real sales data point, however, predictions remain unusable – the model has nothing to base an accurate sales scale on.