When I first thought of using a Transformer model for stock price prediction, my approach was simple: treat time series data like NLP data. After all, in NLP, words are converted into vector embeddings where similar words have high dot products, which is what drives the attention scores at the heart of a Transformer. Could I apply a similar principle to financial time series data?
That’s where Time2Vec⏳ came in: a technique that enriches a linear time axis with periodic patterns, much like positional encodings in NLP. My goal was to make stock prices more “Transformer-friendly,” enabling the model to effectively capture long-term dependencies in financial trends. The journey wasn’t straightforward, though: challenges in data preprocessing, feature engineering, model architecture, and loss function design all had to be overcome before the model performed well.
Let’s break down the entire process, the obstacles faced, and how I solved them.
In NLP, words are transformed into embeddings (like Word2Vec) and combined with positional encodings so that Transformers can understand the sequence of words. Similarly, for stock price prediction, we needed an approach that could capture both linear time variations and periodic market patterns.
🔹Time2Vec solves this problem by adding periodic properties to the time series using sinusoidal functions with learnable frequencies, alongside a linear term that preserves the overall trend.
🔹Time steps that align with the peaks of these learned sine waves are emphasized, allowing the model to capture cyclical trends in stock movements.
🔹Essentially, Time2Vec acts like a combination of word embeddings and positional encoding for time series data.
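In the original formulation, t2v(τ)[0] = ω₀τ + φ₀ is a linear term and t2v(τ)[i] = sin(ωᵢτ + φᵢ) for i = 1…k are the periodic terms, with every ωᵢ and φᵢ learned from data. The article doesn’t show its implementation, so here is a minimal PyTorch sketch of such a layer (shapes and initialization are my assumptions):

```python
import torch
import torch.nn as nn

class Time2Vec(nn.Module):
    """Time2Vec: one linear term plus k periodic (sine) terms,
    with learnable frequencies w and phase shifts b."""
    def __init__(self, k: int):
        super().__init__()
        self.w = nn.Parameter(torch.randn(k + 1))  # index 0 = linear term
        self.b = nn.Parameter(torch.randn(k + 1))

    def forward(self, tau: torch.Tensor) -> torch.Tensor:
        # tau: (batch, seq_len, 1) raw time steps
        v = self.w * tau + self.b  # broadcasts to (batch, seq_len, k + 1)
        # keep element 0 linear, apply sine to the remaining k elements
        return torch.cat([v[..., :1], torch.sin(v[..., 1:])], dim=-1)
```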
📌The final input to our Transformer model consists of:
🔹Time2Vec temporal embeddings (the linear + periodic time features)
🔹The scaled prices and engineered financial indicators
This way, we ensure that the model gets a rich set of features that combine temporal dependencies and financial indicators.
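Building on the Time2Vec sketch above, assembling that input might look like this (the batch size, window length, feature count, and k are all illustrative):

```python
batch, seq_len, n_features = 32, 60, 8
# Raw time indices, shared across the batch
tau = torch.arange(seq_len, dtype=torch.float32).reshape(1, seq_len, 1).expand(batch, -1, -1)
feats = torch.randn(batch, seq_len, n_features)  # stand-in for scaled prices + indicators

time_emb = Time2Vec(k=7)(tau)                        # (32, 60, 8)
model_input = torch.cat([time_emb, feats], dim=-1)   # (32, 60, 16): temporal + financial
```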
Data preprocessing is one of the most crucial parts of stock price prediction. Poor preprocessing can significantly degrade the model’s performance. Here’s what I learned the hard way:
Many stocks start with very low prices and gradually increase over time. This creates a right-skewed price distribution: most observations sit at low values, with a long tail of higher ones. Applying a standard scaler to this kind of data forces most values near the mean, which distorts the stock’s actual price movements. As a result, the model fails to predict rising stock prices accurately.
Below is a comparison of how Word2Vec + Positional Encoding works in NLP versus Time2Vec + Stock Prices in time series:

🛑Solution: Instead of using StandardScaler or MinMaxScaler directly, I applied a combination of:
🔹A log transformation to compress the skewed price range
🔹A quantile transformation to map the log prices onto a normal distribution
This solved the issue and made the data more suitable for Transformers, which prefer normally distributed inputs. After prediction, the transforms are inverted, with an exponential function undoing the log step.
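The exact pipeline isn’t shown in the article, so treat the following as a minimal sketch of the log + quantile idea, with scikit-learn’s QuantileTransformer standing in for whatever the original code used:

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

# Synthetic right-skewed series standing in for a real price history
rng = np.random.default_rng(0)
prices = 50 * np.cumprod(1 + rng.normal(0.001, 0.02, 2000))

# Step 1: the log transform compresses the long right tail
log_prices = np.log(prices).reshape(-1, 1)

# Step 2: map log prices onto a normal distribution (in a real setup,
# fit on the training split only, to avoid leaking test statistics)
qt = QuantileTransformer(output_distribution="normal", n_quantiles=1000)
scaled = qt.fit_transform(log_prices)

# ... train on `scaled`, obtain model outputs `scaled_pred` ...
scaled_pred = scaled  # placeholder for actual predictions

# Inverse: undo the quantile map first, then exponentiate to undo the log
recovered = np.exp(qt.inverse_transform(scaled_pred))
```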
Below is a comparison of predicted prices using Standard Scaler versus Log + Quantile Scaling:

Figure 3: Predicted Prices with Standard Scaler.

Figure 4: Predicted Prices with Log + Quantile Scaling.
Stock prices alone don’t provide enough information, so I manually added technical indicators on top of the raw prices. These features help capture market momentum and trends, improving prediction accuracy.
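The original indicator list didn’t survive formatting, so the snippet below uses common momentum and trend features (moving averages, MACD, RSI) purely as illustrative stand-ins:

```python
import pandas as pd

def add_indicators(df: pd.DataFrame) -> pd.DataFrame:
    """Add typical momentum/trend features to a DataFrame with a 'close' column."""
    out = df.copy()
    # Simple and exponential moving averages capture trend direction
    out["sma_20"] = out["close"].rolling(20).mean()
    out["ema_12"] = out["close"].ewm(span=12, adjust=False).mean()
    out["ema_26"] = out["close"].ewm(span=26, adjust=False).mean()
    # MACD: fast EMA minus slow EMA (momentum)
    out["macd"] = out["ema_12"] - out["ema_26"]
    # RSI(14): relative strength of recent gains vs. losses
    delta = out["close"].diff()
    gain = delta.clip(lower=0).rolling(14).mean()
    loss = (-delta.clip(upper=0)).rolling(14).mean()
    out["rsi_14"] = 100 - 100 / (1 + gain / loss)
    return out.dropna()
```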
The architecture consists of a Time2Vec layer that produces temporal embeddings, followed by a stack of Transformer encoder blocks and a dense output layer.
⚙Hyperparameters used:
🔹Model dimension (d_model): 64
📌The loss function was calculated on the close price feature.
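Only d_model = 64 and the close-price loss are stated above; the head count, layer count, and last-step output scheme in this PyTorch sketch (which reuses the Time2Vec layer from earlier) are my assumptions:

```python
import torch
import torch.nn as nn

class StockTransformer(nn.Module):
    """Time2Vec embeddings concatenated with indicator features,
    projected to d_model and fed through a Transformer encoder."""
    def __init__(self, n_features: int, k: int = 7, d_model: int = 64,
                 n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        self.t2v = Time2Vec(k)  # the layer sketched earlier
        self.proj = nn.Linear(n_features + k + 1, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, n_layers)
        self.head = nn.Linear(d_model, 1)  # scaled close-price prediction

    def forward(self, tau: torch.Tensor, feats: torch.Tensor) -> torch.Tensor:
        x = torch.cat([self.t2v(tau), feats], dim=-1)  # temporal + financial
        x = self.encoder(self.proj(x))
        return self.head(x[:, -1])  # predict from the final time step
```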
I tested the model on Tata Motors, Crude Oil, Apple, and Tesla, and here’s what I observed:
🔹For some stocks, MinMaxScaler sometimes works better than the log + quantile pipeline.
🔹Predictions were accurate for stable and declining stocks but slightly underestimated sharp upward trends.
Below are the actual vs. predicted price plots for four different stocks:

Figure 5: Tata Motors - Actual vs. Predicted Prices.
Test MSE: 871.0430
Test RMSE: 29.5134
Test MAPE: 3.73%

Figure 6: Crude Oil - Actual vs. Predicted Prices.

Figure 7: Apple - Actual vs. Predicted Prices.
Test MSE: 367.5187
Test RMSE: 19.1708
Test MAPE: 8.58%

Figure 8: Tesla - Actual vs. Predicted Prices.
Test MSE: 137.8093
Test RMSE: 11.7392
Test MAPE: 3.40%
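The error values above use the standard definitions of MSE, RMSE, and MAPE, computed after inverse-transforming predictions back to the actual price scale; a small reference helper:

```python
import numpy as np

def evaluate(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """MSE / RMSE / MAPE (in percent) on the actual price scale."""
    err = y_true - y_pred
    mse = float(np.mean(err ** 2))
    return {
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        "MAPE": float(np.mean(np.abs(err / y_true)) * 100),
    }
```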
This project was an exciting experiment in adapting NLP techniques for time series forecasting. By introducing Time2Vec and refining data preprocessing, I was able to significantly improve the Transformer model’s performance in stock price prediction.
🚀 Key Takeaways:
✔ Time2Vec effectively converts time series data into a Transformer-friendly format
✔ Log + Quantile transformations improve skewed stock data prediction
✔ Feature engineering is essential for capturing market trends
✔ Transformer models need a lot of data; small datasets perform poorly
While the model performs well in many scenarios, predicting sharp upward trends remains a challenge. Moving forward, hybrid models and modified attention mechanisms could further enhance its predictive power.
If you’re interested in exploring the code and results, check out my repository!
Would love to hear your thoughts! Feel free to drop a comment or reach out for collaborations! 🚀