Acoustics Behind the Angst: Fourier Analysis of Mitski’s Horror-Inspired Single
waves · signal-processing · music-physics


Unknown
2026-02-24
10 min read

Use Mitski’s 'Where’s My Phone?' to learn the Fourier transform, spectral analysis, and psychoacoustics with practical labs and modern 2026 tools.

Hook: From student confusion to aural intuition — learn Fourier through Mitski’s anxiety

Struggling with abstract concepts like the Fourier transform, wave superposition, or why some sounds make your skin crawl? You’re not alone. Many students and teachers hit a wall when math and perception separate: equations explain one thing, while the listening experience feels completely different. This article uses Mitski’s 2026 horror-tinged single "Where’s My Phone?" as a focused, practical case study to bridge that gap. By the end you’ll be able to map what you hear to measurable spectral features, reproduce core sound-design tricks in a DAW, and explain the psychoacoustic mechanisms that produce anxiety.

Executive summary — what you’ll learn (most important first)

  • How wave superposition and interference create beating, comb filters, and phase-induced textures in the track.
  • How to apply the Fourier transform and spectrograms to isolate harmonic vs. inharmonic content and transient events.
  • Which psychoacoustic cues (roughness, masking, timbral instability, spatial cues) Mitski’s team uses to evoke dread.
  • Step-by-step analysis methods (audio extraction, STFT parameters, peak-picking) with practical tasks you can run in Python or a DAW.
  • 2025–2026 trends in sound design and tools (real-time spectral editing, ML-based source separation, spatial audio) and how they change pedagogy.

Why analyze a pop single for physics and acoustics in 2026?

Mitski’s "Where’s My Phone?" is a compact, modern example where narrative, visual direction (a horror-inflected video), and carefully curated sound design converge to manipulate listener emotion. In late 2025 and early 2026, sound design workflows have increasingly blended classical spectral tools with machine learning and real-time spectral editors. Using a current commercial release gives students exposure to contemporary techniques and ethical discussions about creativity and technological augmentation. This case study is practical, reproducible, and directly relevant to exam-focused learners who need both conceptual clarity and portfolio-ready projects.

"No live organism can continue for long to exist sanely under conditions of absolute reality." — quoted in promotional materials for Mitski’s single (attributed to Shirley Jackson)

Spotlight on the track: key sound-design elements to analyze

Listen once for story, a second time for texture. In "Where’s My Phone?" pay attention to:

  • Layered noise and tone: breathy vocal sibilance sits with low rumble and ringing metallic partials.
  • Close-up vs distant mics: spatial cues alternate rapidly, creating an uncanny closeness/distance effect.
  • Micro-modulation: subtle amplitude and frequency modulation that produces roughness and beats.
  • Spectral smearing: reverbs and convolution tails that smear harmonic detail while preserving eerie high-frequency energy.
  • Transient emphasis: clicks and percussive hits placed to interrupt the listener’s expectation.

Wave superposition: the audible math of interference

At its simplest, superposition says that multiple waves add together. If two tones of similar frequency are present, listeners perceive beats. Mathematically we can write a two-tone sum as:
x(t) = A1 cos(2π f1 t + φ1) + A2 cos(2π f2 t + φ2)

When f1 and f2 are close, the envelope beats at |f1 − f2|. For example, a 440 Hz and a 442 Hz sine pair produces a beat frequency of 2 Hz; the waveform’s amplitude waxes and wanes twice per second. Producers use slightly detuned layers (vocal doubles, synth voices) to create temporal instability — this is a core ingredient of the track’s tension.
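The two-tone sum above can be checked numerically. A minimal NumPy/SciPy sketch: synthesize the 440/442 Hz pair, then recover the 2 Hz beat rate from the amplitude envelope (the sample rate and duration are arbitrary choices for the demo).

```python
import numpy as np
from scipy.signal import hilbert

sr = 8000                                  # modest rate keeps the demo fast
t = np.arange(0, 4.0, 1 / sr)
x = np.cos(2 * np.pi * 440 * t) + np.cos(2 * np.pi * 442 * t)

# Amplitude envelope via the analytic signal; it oscillates at |f1 - f2| = 2 Hz
env = np.abs(hilbert(x))
spectrum = np.abs(np.fft.rfft(env - env.mean()))
freqs = np.fft.rfftfreq(len(env), 1 / sr)
beat_freq = freqs[spectrum.argmax()]
print(f"measured beat frequency: {beat_freq:.2f} Hz")
```

Swapping detuned vocal stems in place of the sines gives the same measurement pipeline for real material.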

Comb filtering and phase interference

Comb filtering occurs when a direct sound and a delayed copy interfere: spectral notches appear at regular intervals across the spectrum. When Mitski’s vocal is mixed with a delayed, filtered phone- or hallway-like sample, comb filtering creates hollow, unnatural timbres. The notches are spaced 1/delay apart (for a direct-plus-delayed sum they sit at odd multiples of 1/(2·delay)), so measuring the spacing of spectral minima in a spectrogram gives an estimate of the delay time.

Practical micro-lab: hearing beats and comb filters

  1. Create two sine waves: 440 Hz (A4) and 442 Hz, equal amplitude. Play them together and note the 2 Hz beating.
  2. Record the sum and compute its Fourier transform — you’ll see two close peaks at 440 and 442 Hz.
  3. Now take one signal, delay by 5 ms, and sum with the original. Produce a spectrogram or listen for comb filtering (regular notches appear). The notch spacing is 200 Hz (1 / 0.005 s).
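Step 3 can also be checked analytically: the delay-and-sum has frequency response H(f) = 1 + e^(−j2πfτ), and its minima reproduce the 200 Hz notch spacing. A sketch, with the sample rate chosen so the 5 ms delay is a whole number of samples:

```python
import numpy as np

sr = 48000
D = 240                                    # 5 ms delay in samples at 48 kHz

# Magnitude response of y[n] = x[n] + x[n - D]
f = np.linspace(0, 2000, 20001)            # 0.1 Hz grid up to 2 kHz
mag_db = 20 * np.log10(np.abs(1 + np.exp(-2j * np.pi * f * D / sr)) + 1e-12)

# Pick local minima: notches at odd multiples of 100 Hz, spaced 200 Hz apart
interior = (mag_db[1:-1] < mag_db[:-2]) & (mag_db[1:-1] < mag_db[2:])
notches = f[1:-1][interior]
print(notches[:3], "spacing:", notches[1] - notches[0])
```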

From time to frequency: applying the Fourier transform and STFT

The Fourier transform decomposes signals into sinusoids. For audio, the Short-Time Fourier Transform (STFT) gives time-varying spectral content — the spectrogram. Important parameters determine what you can see and hear:

  • Window length: longer windows improve frequency resolution (useful to separate close harmonics) but hurt time resolution (smearing transients).
  • Hop size (overlap): controls temporal sampling of the spectrogram and artifacting.
  • Window type: Hann, Hamming, Blackman affect sidelobes and peak leakage.
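The trade-off is concrete for the two window sizes used in the analysis below, assuming 44.1 kHz audio; a quick arithmetic check:

```python
sr = 44100
for n_fft in (2048, 512):
    freq_res = sr / n_fft                  # FFT bin spacing in Hz
    time_res = 1000 * n_fft / sr           # window span in ms
    print(f"{n_fft}-pt window: {freq_res:.1f} Hz bins over {time_res:.1f} ms")
```

The 2048-point window resolves frequency four times more finely; the 512-point window localizes transients four times more tightly in time.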

Step-by-step spectral analysis for Mitski’s track

  1. Extract the audio from the video (use respectful, legal means — short clips for analysis fall under many educational fair-use policies; check local policy).
  2. Normalize and high-pass at 20–30 Hz to remove subsonic rumble.
  3. Compute STFT with multiple resolutions: e.g., 2048-sample window at 44.1 kHz for fine frequency, and 512-sample window for transients.
  4. Plot both spectrograms (linear and log frequency) and a spectral centroid trace.
  5. Use peak-picking to extract harmonic series of voice and metallic elements; label harmonic vs. inharmonic partials.

Minimal Python example (librosa)

Below is a compact recipe you can paste into a notebook. Install librosa and matplotlib in your environment first.

import librosa
import librosa.display
import numpy as np
import matplotlib.pyplot as plt

# Load at a fixed analysis rate; use a local copy of your (legally obtained) clip
y, sr = librosa.load('wheres_my_phone.wav', sr=44100)

# 2048-point STFT: ~21.5 Hz bins over ~46 ms windows (the "fine frequency" setting)
D = librosa.stft(y, n_fft=2048, hop_length=512, window='hann')
S_db = librosa.amplitude_to_db(np.abs(D), ref=np.max)

plt.figure(figsize=(10, 4))
librosa.display.specshow(S_db, sr=sr, hop_length=512, x_axis='time', y_axis='log')
plt.colorbar(format='%+2.0f dB')
plt.title('Log-frequency spectrogram (2048-pt)')
plt.tight_layout()
plt.show()

Interpreting spectral features: harmonics, noise, and inharmonic partials

Once you have a spectrogram, look for:

  • Harmonic stacks — evenly spaced peaks (fundamental f0 and integer multiples). Vocal harmonics will show as slanted stacks when pitch changes.
  • Inharmonic energy — metallic ringing or processed phone tones show up as partials that aren't integer multiples; these add grit and unpredictability.
  • Transient bursts — short high-energy events that alter perception even if they contain little sustained energy.
  • Noise bands — broadband energy centered in sibilant or breathy regions; crucial for perceived closeness and breath.
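The harmonic-vs-inharmonic labelling can be automated. A sketch on a synthetic stand-in (a 200 Hz harmonic stack plus one "metallic" partial at 1130 Hz; all values are illustrative): pick spectral peaks, then test whether each lies near an integer multiple of the lowest peak.

```python
import numpy as np

sr = 22050
t = np.arange(0, 1.0, 1 / sr)
# Harmonic stack on f0 = 200 Hz plus one inharmonic "metallic" partial
x = sum(np.sin(2 * np.pi * f * t) for f in (200, 400, 600))
x = x + 0.8 * np.sin(2 * np.pi * 1130 * t)

spec = np.abs(np.fft.rfft(x * np.hanning(len(x))))
freqs = np.fft.rfftfreq(len(x), 1 / sr)

# Peak-pick: local maxima that rise above a tenth of the strongest peak
is_peak = (spec[1:-1] > spec[:-2]) & (spec[1:-1] > spec[2:]) & (spec[1:-1] > spec.max() / 10)
peaks = freqs[1:-1][is_peak]

f0 = peaks.min()
labels = {}
for p in peaks:
    ratio = p / f0
    labels[round(p)] = "harmonic" if abs(ratio - round(ratio)) < 0.02 else "inharmonic"
print(labels)   # the 1130 Hz partial should come out inharmonic
```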

Psychoacoustics: why the sounds induce anxiety

Psychoacoustics links measurable signal features to perceived emotion. Mitski’s single leverages several well-known mechanisms:

  • Roughness: Amplitude or frequency modulation at rates of roughly 20–150 Hz creates roughness, a sensation linked to urgency and discomfort. Subtle tremolo on vocal layers sits squarely in this band.
  • Unresolved harmonics: When partials shift unpredictably, the ear cannot lock onto a stable pitch, producing unease.
  • Masking: Sudden broadband events mask expected vocal timbres, causing a perceptual “loss” that raises tension.
  • Spatial ambiguity: Rapid alternation between near and far cues (dry vs wet, binaural panning) breaks the listener’s model of space and agency.
  • Predictive violation: Psychoacoustic models (and human predictive coding) suggest that the brain attunes to expected patterns; deliberate interruption (a click, a filtered cough) creates a prediction error felt as alarm.

Measured correlates you can extract

  • Spectral centroid: correlates with perceived brightness; spikes often align with moments of alarm.
  • Spectral flux: tracks change rate in the spectrum; high flux indicates sudden timbral change.
  • RMS envelope modulation depth: quantify amplitude modulation rates to see roughness.
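Centroid and flux are straightforward to compute from an STFT. A NumPy/SciPy sketch on a synthetic stand-in (a steady 220 Hz tone that gains a bright noise burst halfway through, a crude "moment of alarm"):

```python
import numpy as np
from scipy.signal import stft

sr = 22050
t = np.arange(0, 1.0, 1 / sr)
rng = np.random.default_rng(0)

# Steady 220 Hz tone; a bright noise burst enters at the halfway point
x = np.sin(2 * np.pi * 220 * t)
x[sr // 2:] += 0.5 * rng.standard_normal(sr - sr // 2)

f, frames, Z = stft(x, fs=sr, nperseg=1024)
mag = np.abs(Z)

# Spectral centroid per frame: magnitude-weighted mean frequency (brightness)
centroid = (f[:, None] * mag).sum(axis=0) / (mag.sum(axis=0) + 1e-12)
# Spectral flux per frame: rectified frame-to-frame spectral change
flux = np.maximum(mag[:, 1:] - mag[:, :-1], 0).sum(axis=0)

print(f"centroid before: {centroid[:5].mean():.0f} Hz, after: {centroid[-5:].mean():.0f} Hz")
```

On real audio the same two traces, plotted against time, line up well with perceived moments of alarm.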

Reproducing the sound-design tricks in a DAW — practical recipes

Try these concrete recreations to internalize the physics:

  1. Beating vocal double: Duplicate the vocal, detune by 1–3 Hz, pan slightly opposite, low-pass the duplicate and mix at −6 to −12 dB. Listen for beating and instability.
  2. Comb-filtered phone tone: Take a short tone, add 5–20 ms delay, use a bandpass around 800–3000 Hz, and sweep the delay to create moving spectral notches.
  3. Micro-modulation for roughness: Apply low-frequency AM at 30–80 Hz with shallow depth to a sub-bass or vocal layer for tension without pitch change.
  4. Spectral smear reverb: Convolution with an impulse response that emphasizes high-frequency tails and adds inharmonic ringing. Alternatively, freeze a short sound and granular-scrub it into a long tail.
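Recipe 3 can be verified offline: apply shallow AM in the roughness band to a tone, then recover the modulation depth from the Hilbert envelope (the carrier, rate, and depth values here are illustrative).

```python
import numpy as np
from scipy.signal import hilbert

sr = 22050
t = np.arange(0, 1.0, 1 / sr)
rate, depth = 50.0, 0.3                    # 50 Hz AM sits in the roughness band

# Shallow amplitude modulation on a 220 Hz carrier (recipe 3 in miniature)
x = (1 + depth * np.sin(2 * np.pi * rate * t)) * np.sin(2 * np.pi * 220 * t)

# Recover the modulation depth from the Hilbert envelope: (max - min)/(max + min)
env = np.abs(hilbert(x))
env = env[sr // 10 : -sr // 10]            # trim transform edge effects
measured = (env.max() - env.min()) / (env.max() + env.min())
print(f"measured AM depth: {measured:.2f}")
```

The same measurement applied to a bounced DAW track lets you confirm that your tremolo settings actually land in the 20–150 Hz roughness band.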

Practice problems and assessment tasks

These tasks are classroom-ready and exam-aligned. Each includes a clear success metric.

  1. Compute the STFT of a 10-second clip and identify at least three frequency bands where spectral flux exceeds a chosen threshold. Success: correctly time-stamp 80% of high-flux events.
  2. Create a 5-second audio patch that produces a perceptible beating rate of 4 Hz. Success: FFT shows two peaks separated by 4 Hz and blind listeners report beating at ~4 Hz.
  3. Design a comb filter with notches at roughly 400 Hz, 800 Hz, and 1200 Hz. Success: measure notch frequencies and verify spacing equals 400 Hz (±5%).
  4. Measure the RMS modulation depth of a vocal segment and relate it to perceived roughness (use a small listening panel). Success: find a correlation between higher modulation depth and higher roughness ratings.
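For problem 3, evenly spaced notches at 400, 800, and 1200 Hz correspond to a polarity-inverted comb (direct minus delayed copy), which nulls every multiple of 1/delay. A sketch verifying the design, assuming a 48 kHz sample rate so the 2.5 ms delay is a whole number of samples:

```python
import numpy as np

sr = 48000
D = 120                                    # 2.5 ms delay -> notch spacing sr/D = 400 Hz

# Polarity-inverted comb y[n] = x[n] - x[n - D]:
# H(f) = 1 - e^{-j 2 pi f D / sr}, zero at every multiple of sr/D
f = np.arange(0, 1601, dtype=float)        # 1 Hz grid up to 1.6 kHz
mag = np.abs(1 - np.exp(-2j * np.pi * f * D / sr))

notch_freqs = f[mag < 1e-9]
print(notch_freqs)                         # notches at 0, 400, 800, 1200, 1600 Hz
```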

2025–2026 trends in sound design and tools

As of early 2026, workflows are changing rapidly:

  • Real-time spectral editing: Desktop tools now allow isolated attenuation or boosting of spectral components in real time, enabling surgical alteration of timbre without traditional EQ artifacts.
  • Machine learning source separation: Improvements in separation models (late 2025 releases) let educators extract stems with higher fidelity for analysis, making it easier to study isolated vocal harmonics or ambience.
  • Generative timbre transfer: Style-transfer models can impose the inharmonic ringing of a metallic object onto a vocal in seconds — a powerful creative and pedagogical tool (use responsibly and cite original creators).
  • Spatial audio and binauralization: Dolby Atmos and personal binaural renderers are now mainstream for streaming platforms. Teaching spatial cues and HRTF effects is critical for modern psychoacoustics curricula.

Teaching notes: how to run this case study in a classroom or tutoring session

Allocate a 2–3 hour block broken into:

  1. 15 min: guided listening and hypothesis formation (what makes the piece scary?).
  2. 30–45 min: hands-on STFT and spectrogram generation (two resolution settings).
  3. 45 min: DAW lab reproducing one effect (beating or comb filter).
  4. 30 min: psychoacoustic measurement and brief listening tests.
  5. 15–20 min: wrap-up, interpretation, and discussion of ethical and creative implications.

Use short clips for analysis under fair-use where applicable, always attribute sources, and avoid redistributing full stems unless you have rights. When using AI tools to recreate textures, be transparent about generative methods and respect artists' intent.

Actionable takeaways — what to do next (apply these immediately)

  • Download a short clip of the song for class (check fair-use) and generate two spectrograms: long-window and short-window.
  • Reproduce a beating pair and a comb-filtered tone in your DAW; document the parameter values (detune, delay) and the measured spectral features.
  • Measure spectral centroid and flux across a dramatic moment in the track and correlate those measures with perceived intensity using a 5-person listening panel.
  • Experiment with a modern ML source separation tool to isolate vocal texture — compare pre- and post-separation spectrograms to study leakage and model artifacts.

Final notes: connecting physics, perception, and creativity in 2026

Using Mitski’s "Where’s My Phone?" as a teaching case brings together concrete acoustics theory, reproducible signal-processing steps, and modern sound-design practice. Students learn more than formulas: they develop an auditory intuition that maps mathematics to emotion. As tools evolve (real-time spectral editing, ML separation, spatial audio), instructors must emphasize both technical literacy and ethical stewardship. This case study is intentionally modular — adapt it to AP, A-level, undergraduate, or studio courses.

Call to action

Ready to practice? Download the guided lab packet and starter Python notebooks at studyphysics.online/labs/acoustics-mitski (free for students and teachers). Join our next live workshop where we do a full in-class breakdown of the single, reproduce the core effects in REAPER/Ableton, and run listening tests with peers. Sign up now — spots fill fast for our 2026 workshops.
