Anime Lyrics Visualizer App

Khalid - "Talk"
Khalid - "Talk"

Category:

Coding Projects

Company:

Self


Anime Lyrics Visualizer

Vision

This app started as a side quest during a hackathon focused around another idea I had for enriching song data with lyrics analysis. As I was wrapping up the main project a day ahead of schedule, I kept SZA's "SOS" lyrics visualizers playing on my secondary screen. After watching them loop for the 50th time, I thought to myself: "I wonder how hard it would be to build this."

This project became my "what if" experiment - a Go app that takes in an artist name and track title and transforms it into a fully-produced anime-style music video with sync'd, stylized lyrics and 90s VHS effects on a CRT screen. I only had one day to build this before demoing it alongside my main hackathon project.


Multi-API Orchestration
  • Spotify API for searching tracks based on a given artist name and track title which returned track metadata

  • Lyrics API for fetching lyrics based on a given artist name and track title

  • OpenAI API for generating a sequence of scenes based on a pre-defined prompt

  • DALL-E for creating images from the given scene descriptions


Prompt Engineering

One of the interesting problems for this projects was teaching GPT-4o to think like a 90s anime director.


prompt := fmt.Sprintf(`You are a visionary creative director hired to craft a
  cinematic storyboard for a music video inspired by the raw emotion and dreamlike
  beauty of 1990s Japanese anime.

  Your task is to analyze the following lyrics and develop a vivid, emotionally-charged
  sequence of 7 scenes. Each scene should feel like a painterly frame
  from an arthouse film — poetic, symbolic, and drenched in feeling. Evoke deep moods
  like longing, joy, heartbreak, euphoria, and transformation. Let emotion drive the
  imagery more than narrative.

  ...

  Base this on the lyrics to "%s" by %s: %s`, track, artist, lyrics)

reqBody := ChatRequest{
    Model: "gpt-4o",
    Messages: []ChatEntry{
        {
            Role: "user", 
            Content: prompt},
        },
}

bodyBytes, err := json.Marshal(reqBody)
if err != nil {
    return nil, fmt.Errorf("failed to marshal request: %w", err)
}

req, err := http.NewRequestWithContext(ctx, http.MethodPost,
    "https://api.openai.com/v1/chat/completions", bytes.NewBuffer(bodyBytes


Video Processing with FFmpeg

The heart of this project is a series of 7 FFmpeg video processing operations wrapped in Go to turn high-res static images into a grainy VHS-like video.


// 1. Dynamic Zoom-Pan: each image processed concurrently using goroutines
zoompan := `zoompan=z='zoom+0.0005':x='iw/2-(iw/zoom/2)+on/500':y='ih/2-(ih/zoom/2)+on/520':d=540:fps=30,scale=720:480`

cmd := exec.Command("ffmpeg",
    "-loop", "1",
    "-i", input,
    "-vf", zoompan,
    "-t", "9",
    "-c:v", "libx264",
    output,
)

// 2. Crossfade slideshow chaining multiple transitions
for i := range mp4Paths {
    filter.WriteString(fmt.Sprintf("[%d:v]setpts=PTS-STARTPTS[v%d];", i, i))
}

prev := "[v0]"
for i := 1; i < len(mp4Paths); i++ {
    offset := (8.0 - 1.0) * float64(i) 
    filter.WriteString(fmt.Sprintf(
        "%s[v%d]xfade=transition=fade:duration=1.0:offset=%.1f[v%dout];",
        prev, i, offset, i))
    prev = fmt.Sprintf("[v%dout]", i)
}

cmd := exec.Command("ffmpeg", append(inputs,
    "-filter_complex", filter.String(),
    "-map", prev,
    "-c:v", "libx264",
    outputPath)...)

// 3. Loop to variable song duration with fade-out applied
exec.Command("ffmpeg",
    "-stream_loop", "-1",
    "-i", inputPath,
    "-t", fmt.Sprintf("%d", durationSeconds+1),
    "-c:v", "libx264",
    tempPath,
)

fadeStart := durationSeconds - 1
exec.Command("ffmpeg",
    "-i", tempPath,
    "-vf", fmt.Sprintf("fade=t=out:st=%d:d=1", fadeStart),
    "-t", fmt.Sprintf("%d", durationSeconds),
    outputPath,
)

// 4. Lyrics burn-in with LRC to Advanced SubStation Alpha (ASS) conversion and custom animations
for i, line := range lines {
    start := parseASSTimeToSeconds(line.StartTime)
    end := parseASSTimeToSeconds(lines[i+1].StartTime) - 0.1 

    fmt.Sprintf(
        `Dialogue: 0,%s,%s,TikTok,,0,0,0,,{\an2\move(960,1020,960,1000)\fad(50,100)}%s`,
        formatASS(start), formatASS(end), strings.ToUpper(line.Text),
    )
}

cmd := exec.Command("ffmpeg",
    "-i", inputVideo,
    "-vf", fmt.Sprintf("ass=%s", assPath),
    "-c:a", "copy",
    outputVideo,
)

// 5. Add audio track to slideshow using stream copying
cmd := exec.Command("ffmpeg",
    "-i", videoPath,
    "-i", audioPath,
    "-c:v", "copy",   
    "-c:a", "aac",
    "-shortest",
    outputPath,
)

// 6. VHS effect filter chain (11 layered filters) to time-travel to my childhood
filter := `format=yuv420p,` +
    `lutrgb=r='val*1.15':g='val*0.85':b='val*1.05',` +
    `chromashift=cbh=2:cbv=-1:crh=-2:crv=1,` +        
    `boxblur=1:1,` +                                  
    `noise=alls=30:allf=t,` +                         
    `tmix=frames=3:weights='1 1 1',` +                
    `vignette,` +                                     
    `crop=in_w:in_h-10:0:'mod(t*12\,in_h)',` +        
    `hflip,hflip,` +                                  

    // Glitch bursts every 16s (0.3s duration)
    `chromashift=cbh=10:crv=10:crh=-10:cbv=-10:` +
    `enable='gte(t\,2)*lt(mod(t-2\,16)\,0.3)',` +
    // Create random glitch lines on the Y-axis
    `drawbox=x=0:y='mod(t*37\,ih)':w=iw:h=5:[email protected]:t=fill:` +
    `enable='gte(t\,2)*lt(mod(t-2\,16)\,0.3)',` +
    // CRT screen curvature
    `lenscorrection=k1=0.15:k2=0.15`                            

cmd := exec.Command("ffmpeg",
    "-i", inputPath,
    "-vf", filter,
    "-c:v", "libx264",
    "-c:a", "copy",
    outputPath,
)

// 7. Web-optimized compression
cmd := exec.Command("ffmpeg",
    "-i", inputPath,
    "-vf", "scale=-2:480",     
    "-c:v", "libx264",
    "-preset", "slow",         
    "-crf", "23",              
    "-c:a", "aac",
    "-b:a", "160k",
    "-movflags", "+faststart", 
    outputPath


Concurrent Image to Video Conversion

Each zoom-pan operation is independent, so I was able to leverage the worker pool pattern.


wg := sync.WaitGroup{}
errChan := make(chan error, len(images))

for i, image := range images {
    wg.Add(1)

    go func(i int, image, outputFile string) {
        defer wg.Done()

        _, err := ffmpeg.CreateZoomPanVideo(image, outputFile)
		if err != nil {
            errChan <- fmt.Errorf("failed to create zoom-pan for %s: %w", image, err)
        }
    }(i, image, outputFile)
}

wg.Wait()
close(errChan)

errs := make([]error, 0, len(images))
for err := range errChan {
    errs = append(errs, err)
}

if len(errs) > 0 {
    return fmt.Errorf("video generation failed: %v", errs)
}


Results

$ go run ./cmd/app/lyricsvisualizer/main.go -artist="khalid" -track="talk"

The system programmatically generates lyrics visualizers in less than 10 minutes. A sequence of 7 unique AI-generated scenes, sync'd animated lyrics, and a 90s VHS aesthetic with dynamically programmatic glitch effects.

The one-day constraint meant that I didn't get to implement all the features I wanted to, but maybe one day I'll dust this one off, deploy it, and use it to build my own lyrics visualizer YouTube channel like the ones that have inspired me.