Spectrogram Analysis of Trading Data — Feasibility

Parent: ideas. Assesses the idea of applying a time-frequency transform to market data and feeding the resulting spectrograms to a CNN/ANN for pattern recognition.

Verdict

Plausible and an active research area. The transform is mechanically sound; the binding constraint is the weak, non-stationary, low-SNR nature of the signal — not the method. The strongest spectral structure lives in trading activity (volume, trade arrivals), not in price/returns, so this should be approached as a multi-channel problem with volume/order-flow channels, not a price spectrogram alone.

Prior Work

Two strands, and the idea sits at their intersection:

1. Image-encoding of price series for CNNs: well established. Techniques include Gramian Angular Fields (GAF, in its summation and difference variants GASF/GADF), recurrence plots, and raw candlestick/bar-chart images. E.g. Sezer & Ozbayoglu label points Buy/Sell/Hold and train a 2-D CNN on the resulting images (Dow Jones 30 stocks + ETFs).

2. Actual spectrogram/wavelet → CNN: the direct match. Du et al. (2020) ran discrete and continuous wavelet transforms on five signals chosen via the Maximal Information Coefficient (MIC), converted them to spectrograms, and used a CNN to forecast intraday Up/Down. Key finding: shallow networks beat deep ones, which the authors attribute to financial data's low SNR and the scarcity of price-prediction data. This inverts the audio/vision norm and is a design warning.

A 2025 variant (GADF + Spectral Relevance Analysis on BTC) adds relevance-based clustering: trade only when the current window's relevance map matches a high-performance cluster.

Where the Signal Actually Lives

Robust spectral structure exists in trading activity, far less in price. Wu, Zhang & Dai (Spectral Volume Models, Management Science 2025) recover persistent, universal high-frequency periodicities in intraday volume across US and Chinese markets; dominant frequencies explain a significant fraction of intraday volume variance, likely reflecting algorithms issuing repeated, regular instructions. There are also the trivial annual cycle (roughly 252 trading days) and the intraday volume U-shape.
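To make "dominant frequencies explain a fraction of variance" concrete, here is a minimal periodogram sketch. It is not the Spectral Volume Model of Wu, Zhang & Dai, only a crude Parseval-based illustration. The 390-minute session, the 5-minute periodicity, and the function name top_frequency_variance_share are assumptions made up for the example.

```python
import numpy as np

def top_frequency_variance_share(x, k=3):
    """Fraction of sample variance carried by the k strongest non-DC bins
    of the one-sided periodogram (via Parseval's identity)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    power = np.abs(np.fft.rfft(x - x.mean())) ** 2
    weights = np.full(power.size, 2.0)  # interior bins occur twice in the full DFT
    weights[0] = 1.0                    # DC bin (zero after demeaning)
    if n % 2 == 0:
        weights[-1] = 1.0               # Nyquist bin occurs once when n is even
    energy = weights * power
    top = np.sort(energy[1:])[::-1][:k]
    return top.sum() / energy[1:].sum()

# Synthetic per-minute volume over a hypothetical 390-minute session with an
# assumed 5-minute periodic component plus Gaussian noise.
rng = np.random.default_rng(0)
minutes = np.arange(390)
volume = 1000 + 300 * np.cos(2 * np.pi * minutes / 5) + rng.normal(0, 100, 390)
share = top_frequency_variance_share(volume, k=1)
print(f"share of variance in the strongest frequency: {share:.2f}")
```

On real volume the intraday U-shape would need removing first, otherwise the lowest bins absorb most of the variance.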

Implication: feed volume / order-flow channels, not just a price spectrogram. Multi-channel input (price, volume, spread as separate spectrogram channels) is the right instinct.
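A minimal sketch of the multi-channel input, assuming three uniformly sampled series (log returns, volume, spread) and one STFT per series stacked into a (channels, frequency, time) array. The synthetic data, the window settings, and the helper name multichannel_spectrogram are illustrative assumptions, not a recommended configuration.

```python
import numpy as np
from scipy.signal import stft

def multichannel_spectrogram(channels, fs=1.0, nperseg=64, noverlap=48):
    """Stack one log-magnitude STFT per input series into a (C, F, T) array."""
    specs = []
    for x in channels:
        x = np.asarray(x, dtype=float)
        # Standardise each channel so no series dominates purely by scale.
        x = (x - x.mean()) / (x.std() + 1e-12)
        _, _, Z = stft(x, fs=fs, nperseg=nperseg, noverlap=noverlap)
        specs.append(np.log1p(np.abs(Z)))
    return np.stack(specs, axis=0)

# Synthetic stand-ins for real bar data (assumed, for illustration only).
rng = np.random.default_rng(0)
n = 1024
t = np.arange(n)
log_returns = rng.normal(0, 1e-3, n)
volume = 1000 + 200 * np.sin(2 * np.pi * t / 16) + rng.normal(0, 50, n)
spread = np.abs(rng.normal(0.01, 0.002, n))

X = multichannel_spectrogram([log_returns, volume, spread])
print(X.shape)  # (channels, frequency bins, time frames)
```

The default scipy.signal.stft settings used here centre and zero-pad the windows, which is acceptable for inspecting structure but not for building training features with labels.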

Data Granularity

Availability: fine-grained data exists down to individual trades and full limit-order-book updates. Tick-by-tick ultra-high-frequency (UHF) data is increasingly available as storage/gathering costs fall. Cost is the gate, not existence: paid vendors include LSEG/Refinitiv, Polygon, Databento, AlgoSeek and dxFeed, while genuine tick data is available free from crypto exchange APIs.

The subtlety that breaks naive spectrograms: tick data is irregularly sampled, whereas STFT/FFT assume uniform sampling. The standard fix, resampling onto a fixed grid, is exactly what the microstructure literature warns against: resampling/interpolating UHF data introduces spurious data points (on top of existing microstructure noise) and loses information. Use a transform built for irregular sampling (Lomb–Scargle periodogram/FT) or switch to event-time / volume-clock bars.
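A minimal sketch of the Lomb–Scargle route using scipy.signal.lombscargle, which takes the raw irregular timestamps directly and expects angular frequencies. The Poisson-like arrival times, the 10-second periodic component, and the frequency grid are synthetic assumptions chosen so the result can be checked.

```python
import numpy as np
from scipy.signal import lombscargle

rng = np.random.default_rng(1)
# Hypothetical irregular trade timestamps in seconds (exponential gaps).
t = np.cumsum(rng.exponential(scale=0.5, size=4000))
# Synthetic per-trade quantity with an assumed 10-second periodicity plus noise.
true_period = 10.0
y = 1.0 + 0.5 * np.sin(2 * np.pi * t / true_period) + rng.normal(0, 0.5, t.size)

# Scan periods from 30 s down to 2 s on a uniform frequency grid (Hz).
freqs_hz = np.linspace(1 / 30, 1 / 2, 2000)
omega = 2 * np.pi * freqs_hz           # lombscargle expects rad/s
power = lombscargle(t, y - y.mean(), omega)

best_period = 1.0 / freqs_hz[np.argmax(power)]
print(f"strongest period: {best_period:.2f} s")
```

No resampling step appears anywhere: the periodogram is evaluated at the actual trade times, which is the point of using it on tick data.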

Candlesticks vs Individual Trades

Candlesticks (OHLC bars) — pre-aggregated, uniformly spaced, FFT-ready and convenient, but aggregation has already destroyed the micro-timing where the algo-periodicity signal lives. Fine for a first prototype; weak for the real hypothesis.

Individual trades — preserve the arrival-time structure carrying the strongest spectral signal, but you must respect irregular sampling (Lomb–Scargle, or volume/event-clock bars rather than wall-clock bars) or you will be imaging resampling artefacts rather than market structure.
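A minimal sketch of volume-clock bars, assuming trades arrive as parallel arrays of prices and sizes. A bar closes once cumulative size reaches a threshold. As a simplification, a trade is never split across bars, so bar volumes can overshoot the threshold, and a trailing partial bar is dropped. The function name volume_bars and the toy trades are illustrative.

```python
import numpy as np

def volume_bars(prices, sizes, bar_volume):
    """Group consecutive trades into bars that close once cumulative traded
    size reaches bar_volume. Returns rows of (open, high, low, close, volume)."""
    prices = np.asarray(prices, dtype=float)
    sizes = np.asarray(sizes, dtype=float)
    bars = []
    start, acc = 0, 0.0
    for i, s in enumerate(sizes):
        acc += s
        if acc >= bar_volume:
            p = prices[start:i + 1]
            bars.append((p[0], p.max(), p.min(), p[-1], acc))
            start, acc = i + 1, 0.0
    return np.array(bars)  # any unfinished trailing bar is discarded

# Toy trades (made up): six prints, bars of at least 100 units each.
prices = [10.0, 11.0, 9.0, 12.0, 13.0, 12.5]
sizes = [40, 70, 30, 90, 10, 50]
bars = volume_bars(prices, sizes, bar_volume=100)
print(bars)
```

Bars built this way are uniformly spaced in traded volume rather than wall-clock time, so an ordinary FFT/STFT can be applied along the bar index without interpolating between ticks.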

Pragmatic Approach (ladder)

1. Prototype on minute bars with scipy.signal.stft to validate the end-to-end pipeline.

2. Move to volume-clock or tick bars; add volume and spread channels.

3. For the activity-periodicity signal, go to trade-level with Lomb–Scargle and order-flow channels.

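4. Keep the network shallow (Du et al.) and guard against lookahead bias: the STFT window must use only data at or before the label timestamp. Note that scipy.signal.stft zero-pads the signal and centres each window on its reported time by default, so half of every window looks ahead unless boundary padding is disabled and each column is indexed by its last input sample.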
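The lookahead guard in step 4 can be sketched as follows, assuming a uniformly sampled series. With boundary=None and padded=False, scipy.signal.stft column k covers samples k*hop through k*hop + nperseg - 1, so the index of the last sample it depends on can be computed and compared against the label time. The helper name causal_stft and the window settings are assumptions.

```python
import numpy as np
from scipy.signal import stft

def causal_stft(x, nperseg=64, hop=16):
    """STFT without boundary padding. Also returns, for each column, the index
    of the last input sample that column depends on."""
    f, _, Z = stft(x, nperseg=nperseg, noverlap=nperseg - hop,
                   boundary=None, padded=False)
    last_idx = np.arange(Z.shape[1]) * hop + nperseg - 1
    return f, last_idx, Z

rng = np.random.default_rng(2)
x = rng.normal(size=512)
f, last_idx, Z = causal_stft(x)

# Sanity check: changing samples strictly after last_idx[k] must leave
# columns 0..k untouched. A label for column k should be taken from a
# timestamp later than last_idx[k].
k = 10
x_future = x.copy()
x_future[last_idx[k] + 1:] += 5.0
_, _, Z_future = causal_stft(x_future)
print(np.allclose(Z[:, :k + 1], Z_future[:, :k + 1]))
```

The same perturbation test is a cheap regression check to keep in the pipeline whenever window or hop settings change.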

Note on “Even” Convolutions

Intended meaning is even-symmetric (zero-phase) kernels, e.g. the real/cosine part of a Gabor filter, not even-sized kernels (2×2, 4×4). Even-sized kernels are usually avoided in vision because they have no centre element, so the output is offset by half a pixel and spatial alignment breaks. Even-symmetric kernels are a defensible inductive bias for spectrogram input (phase-coherent, no directional bias), but they are non-standard and would need validation.
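A minimal sketch of what "even-symmetric" means in practice, assuming odd-sized 2-D kernels: the cosine part of a Gabor filter satisfies k[i, j] == k[-i, -j] about its centre, and any learned kernel can be projected onto its even-symmetric part by averaging it with its 180-degree rotation. The function names and parameter values are illustrative assumptions.

```python
import numpy as np

def gabor_cosine_kernel(size=7, sigma=2.0, wavelength=4.0, theta=0.0):
    """Real (cosine) part of a Gabor filter on an odd-sized grid.
    Even-symmetric about the centre: k[i, j] == k[-i, -j]."""
    if size % 2 != 1:
        raise ValueError("size must be odd so the kernel has a centre element")
    half = size // 2
    yy, xx = np.mgrid[-half:half + 1, -half:half + 1]
    xr = xx * np.cos(theta) + yy * np.sin(theta)   # coordinate along the wave
    envelope = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * xr / wavelength)

def symmetrize(kernel):
    """Project a kernel onto its even-symmetric part: (k + rot180(k)) / 2."""
    return 0.5 * (kernel + kernel[::-1, ::-1])

k = gabor_cosine_kernel()
# Zero phase: with the centre moved to index (0, 0), the 2-D DFT is purely real.
response = np.fft.fft2(np.fft.ifftshift(k))
print(np.abs(response.imag).max())

# Constraining an arbitrary (e.g. learned) 5x5 kernel to be even-symmetric.
w = np.random.default_rng(3).normal(size=(5, 5))
w_even = symmetrize(w)
```

In a trainable layer the same projection could be applied to the weights after each update, which is one possible way to impose the constraint, not an established recipe.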

References

• Du et al. (2020), Image Processing Tools for Financial Time Series Classification, arXiv:2008.06042 — wavelet→spectrogram→CNN; shallow beats deep.

• Wu, Zhang & Dai (2025), Spectral Volume Models: Universal High-Frequency Periodicities in Intraday Trading Activities, Management Science; SSRN 4230610.

• Sezer & Ozbayoglu (2018), Algorithmic Financial Trading with Deep CNN: Time Series to Image Conversion Approach.

• Giampaoli, Ng & Constantinou (2009), Analysis of Ultra-High-Frequency Financial Data Using Advanced Fourier Transforms, Finance Research Letters; ScienceDirect S1544612308000603 (Lomb–Scargle for irregular tick data).

• GADF + Spectral Relevance Analysis (BTC, 2025), Expert Systems with Applications, ScienceDirect S0957417425031707.

• Major Issues in High-Frequency Financial Data Analysis: A Survey (2025), MDPI Mathematics 13(3):347; wavelet preprocessing for HFT.

version 1  ·  created 2026-05-31  ·  updated 2026-05-31  ·  tags ['ideas', 'ml', 'signal-processing', 'trading']