Convert SRT subtitles to plain text in JavaScript with regex

Convert SRT to plain text (.txt) in JavaScript: a regex that removes index and timestamp lines, a line-skip variant, a block split for multi-line cues, and Node readFileSync/writeFileSync with utf8 (no npm install fs). Includes expected sample output.

Reviewed by Deepak Prasad

SubRip (.srt) subtitles are plain text, but they are structured: each block has a numeric index, a start → end time line, then the caption lines, and blocks are separated by blank lines. When you convert SRT to text (or save it as .txt), you usually want only the spoken lines, with no cue numbers and no timestamps, whether for search indexing, LLM input, or a simple export in the browser or Node.js.

This article shows a regex-first way to strip cue headers, a line-skipping variant, and a small block parser that behaves well when a cue spans multiple lines. It also shows how to use the built-in fs module in Node without installing extra packages.

Tested on: Node.js v20.18.2. After each runnable block, a short note describes the plain-text dialogue you should see, including where blank lines appear.


Quick reference

Use this table as a summary of the steps in a regex-based SRT-to-text pipeline in JavaScript.

Step | Detail
Read file | fs.readFileSync(path, "utf8") in Node
Strip cue start | Regex on index + HH:MM:SS,mmm --> ... + newline
Multi-line cues | Split blocks on blank lines; drop first two lines per block
Save SRT to .txt | fs.writeFileSync("out.txt", plain, "utf8")

1. Strip cue headers with String.prototype.replace

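Match the start of each cue: a numeric index, a newline, the timestamp line, and another newline. Writing each newline as \r?\n also accepts Windows line endings. Use the g flag to replace every cue header and the m flag so that ^ matches at the start of each line, not only at the start of the string.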

javascript
const srt = `1
00:00:51,916 --> 00:00:54,582
London in the 1960s.

2
00:00:54,708 --> 00:00:57,124
Everyone had a story about the Krays.
`;

// In Node.js you can load the same string with:
// const srt = require("node:fs").readFileSync("./captions.srt", "utf8");

const cueHeader =
  /^\d+\r?\n\d{1,2}:\d{2}:\d{2},\d{3} --> \d{1,2}:\d{2}:\d{2},\d{3}\r?\n/gm;

const plain = srt
  .replace(cueHeader, "")
  .replace(/\n{3,}/g, "\n\n")
  .trim();

console.log(plain);
Output

You should see only the two dialogue lines, separated by a blank line—no cue indices or timestamps.

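\d{1,2} for the hour field covers one- or two-digit hours such as 0:, 00:, and 01:; it does not match an hour field of three or more digits. After stripping, /\n{3,}/g collapses any run of three or more newlines into a single blank line, so the result reads like paragraphs. That pattern matches bare \n only, so a file with \r\n endings needs its line endings normalized first.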
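The chain above collapses blank lines by counting \n characters, so it is worth seeing how it can be adapted for Windows line endings. The following is a minimal sketch, assuming the input uses \r\n throughout; the helper name stripCueHeaders and the sample string crlfSrt are illustrative and not part of the original code.

```javascript
// Hypothetical helper: same regex chain as above, with CRLF normalized first.
function stripCueHeaders(srt) {
  // After normalization every line ends in \n, so \r? is no longer needed.
  const cueHeader =
    /^\d+\n\d{1,2}:\d{2}:\d{2},\d{3} --> \d{1,2}:\d{2}:\d{2},\d{3}\n/gm;

  return srt
    .replace(/\r\n/g, "\n") // Windows line endings become plain \n
    .replace(cueHeader, "") // drop the index line and timestamp line of each cue
    .replace(/\n{3,}/g, "\n\n") // collapse runs of blank lines
    .trim();
}

// Sample input with \r\n endings (assumed for illustration).
const crlfSrt =
  "1\r\n00:00:51,916 --> 00:00:54,582\r\nLondon in the 1960s.\r\n\r\n" +
  "2\r\n00:00:54,708 --> 00:00:57,124\r\nEveryone had a story about the Krays.\r\n";

// JSON.stringify makes the remaining newlines visible.
console.log(JSON.stringify(stripCueHeaders(crlfSrt)));
```

Normalizing once at the top keeps the later patterns simple, at the cost of always emitting \n in the output regardless of the input's line endings.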


2. Line loop (skip index, time, and empty lines)

Another way to convert srt to text is to split lines and continue on index-only lines, timestamp-only lines, and blanks. Dialogue prints in cue order; blank lines between cues are removed here, so spacing differs from §1.

javascript
const srt = `1
00:00:51,916 --> 00:00:54,582
London in the 1960s.

2
00:00:54,708 --> 00:00:57,124
Everyone had a story about the Krays.
`;

const timeLine =
  /^\d{1,2}:\d{2}:\d{2},\d{3} --> \d{1,2}:\d{2}:\d{2},\d{3}$/;

for (const line of srt.split(/\r?\n/)) {
  if (/^\d+$/.test(line)) continue;
  if (timeLine.test(line)) continue;
  if (line.trim() === "") continue;
  console.log(line);
}
Output

Each dialogue line prints on its own line, with no blank line between cues, because the loop skips empty lines.
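If you need the result as a string rather than console output, the same skip rules can be wrapped in a function. This is a rough sketch; srtLinesToText and sample are hypothetical names introduced here for illustration.

```javascript
// Hypothetical helper: the same skip rules as the loop above, returned as a string.
function srtLinesToText(srt) {
  const timeLine =
    /^\d{1,2}:\d{2}:\d{2},\d{3} --> \d{1,2}:\d{2}:\d{2},\d{3}$/;
  const kept = [];

  for (const line of srt.split(/\r?\n/)) {
    if (/^\d+$/.test(line)) continue; // cue index (or any digits-only line)
    if (timeLine.test(line)) continue; // timestamp line
    if (line.trim() === "") continue; // blank separator
    kept.push(line);
  }
  return kept.join("\n");
}

const sample = [
  "1",
  "00:00:51,916 --> 00:00:54,582",
  "London in the 1960s.",
  "",
  "2",
  "00:00:54,708 --> 00:00:57,124",
  "Everyone had a story about the Krays.",
  "",
].join("\n");

console.log(srtLinesToText(sample));
```

Note that the first check cannot tell a cue index from a dialogue line that consists only of digits (for example a caption reading 1960), so such a line would be dropped as well.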


3. Block split for multi-line cues

If a cue has two dialogue lines, a regex that only removes the header still keeps both lines, which is what you want. A naive “skip two lines per block” approach can go wrong, however, if the lines are not first grouped into cues. Splitting the file into blank-line-separated blocks is easy to reason about when cues span multiple lines:

javascript
function srtToPlain(srt) {
  return srt
    .replace(/\r\n/g, "\n")
    .trim()
    .split(/\n\s*\n/)
    .map((block) => block.trim().split("\n").slice(2).join("\n"))
    .filter(Boolean)
    .join("\n\n");
}

const srt = `1
00:00:00,000 --> 00:00:02,000
Hello there
General Kenobi

2
00:00:02,100 --> 00:00:04,000
Second cue
`;

console.log(srtToPlain(srt));
Output

You should see a block with Hello there and General Kenobi, a blank line, then Second cue—multi-line cues stay grouped.
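The block split above always drops the first two lines of a block, which assumes every cue has both an index and a timestamp. As a sketch of a more defensive variant, assuming some cues may be missing their index line, the hypothetical srtToPlainChecked below removes a leading line only when it actually looks like a header. The function name, the two patterns, and the malformed sample are illustrative.

```javascript
// Hypothetical variant of srtToPlain: drop a leading line only when it looks like a header.
function srtToPlainChecked(srt) {
  const indexLine = /^\d+$/;
  const timeLine = /^\d{1,2}:\d{2}:\d{2},\d{3} --> \d{1,2}:\d{2}:\d{2},\d{3}/;

  return srt
    .replace(/\r\n/g, "\n")
    .trim()
    .split(/\n\s*\n/)
    .map((block) => {
      const lines = block.trim().split("\n");
      // Remove the index line if present, then the timestamp line if present.
      if (lines.length && indexLine.test(lines[0].trim())) lines.shift();
      if (lines.length && timeLine.test(lines[0].trim())) lines.shift();
      return lines.join("\n");
    })
    .filter(Boolean)
    .join("\n\n");
}

// The second cue is missing its index line (assumed malformed input).
const malformed = `1
00:00:00,000 --> 00:00:02,000
Hello there
General Kenobi

00:00:02,100 --> 00:00:04,000
Second cue
`;

console.log(srtToPlainChecked(malformed));
```

With this input, an unconditional slice(2) would discard Second cue along with its timestamp, while the checked version keeps it. The trade-off is the same ambiguity as in the line loop: a cue whose first dialogue line is only digits is still mistaken for an index if the real index is missing.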


4. Node.js file I/O (built-in fs)

For scripts that convert an SRT file to a text file, read UTF-8 text with the built-in node:fs module. Do not run npm install fs; fs is part of Node. Apply the same cueHeader chain as in §1 to the string returned by readFileSync(path, "utf8"), then write the result with writeFileSync("out.txt", plain, "utf8").
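Put together, the read, strip, and write steps might look like the sketch below. The helper name convertSrtFile and the temporary-directory demo are assumptions added so the example runs without an existing captions file; only built-in Node modules are used.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

const cueHeader =
  /^\d+\r?\n\d{1,2}:\d{2}:\d{2},\d{3} --> \d{1,2}:\d{2}:\d{2},\d{3}\r?\n/gm;

// Hypothetical helper: read an .srt file, strip cue headers, write a .txt file.
function convertSrtFile(inputPath, outputPath) {
  const srt = fs.readFileSync(inputPath, "utf8");
  const plain = srt
    .replace(cueHeader, "")
    .replace(/\n{3,}/g, "\n\n")
    .trim();
  fs.writeFileSync(outputPath, plain, "utf8");
  return plain;
}

// Demo in a temporary directory so the sketch does not depend on existing files.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "srt-demo-"));
const inputPath = path.join(dir, "captions.srt");
const outputPath = path.join(dir, "out.txt");

fs.writeFileSync(
  inputPath,
  "1\n00:00:51,916 --> 00:00:54,582\nLondon in the 1960s.\n\n" +
    "2\n00:00:54,708 --> 00:00:57,124\nEveryone had a story about the Krays.\n",
  "utf8"
);

convertSrtFile(inputPath, outputPath);
console.log(fs.readFileSync(outputPath, "utf8"));
```

In a real script you would replace the demo paths with your own, for example convertSrtFile("./captions.srt", "./out.txt").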


Summary

  • Regex replace removes standard cue headers; check the result on malformed or WebVTT inputs, which the pattern may not match.
  • The line loop is the simplest option but drops cue boundaries; the block split keeps each cue's lines grouped.
  • Always pass utf8 when reading and writing; do not install fs from npm.


Frequently Asked Questions

1. How do I convert srt to text in JavaScript?

Read the file as a UTF-8 string, then remove each cue header (the numeric index line and the HH:MM:SS,mmm --> HH:MM:SS,mmm line). You can do that with a multiline regex replace or by splitting on blank lines and dropping the first two lines of each block.

2. Do I need npm install fs for Node.js?

No. The fs module is built into Node.js. Use require('node:fs') or require('fs'); installing fs from npm is unnecessary and confusing.

3. Can one regex convert srt to txt for every file?

Simple patterns work for standard SubRip cues. They can fail on malformed files, WebVTT (.vtt) headers, unusual timestamp shapes, or dialogue lines that look exactly like a timestamp or a lone number.
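One way to catch some of these cases early is a quick pre-check before running the regex. The sketch below is an assumption-laden heuristic, not a validator: looksLikeSrt is a hypothetical name, and it only rejects files that begin with the WEBVTT signature and confirms that at least one comma-style SRT timestamp line exists.

```javascript
// Hypothetical pre-check: reject WebVTT and confirm at least one SRT timestamp line.
function looksLikeSrt(text) {
  const body = text.replace(/^\uFEFF/, ""); // ignore a leading byte order mark
  if (/^WEBVTT/.test(body)) return false; // WebVTT files start with this signature
  // SRT timestamps use a comma before the milliseconds; WebVTT uses a period.
  return /^\d{1,2}:\d{2}:\d{2},\d{3} --> \d{1,2}:\d{2}:\d{2},\d{3}/m.test(body);
}

console.log(looksLikeSrt("1\n00:00:01,000 --> 00:00:02,000\nHi\n")); // true
console.log(looksLikeSrt("WEBVTT\n\n00:00:01.000 --> 00:00:02.000\nHi\n")); // false
```

A file that passes this check can still contain individual malformed cues, so it complements rather than replaces checking the converted output.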

4. How do I handle multi-line subtitle cues?

A block-based approach—split on blank lines, skip the first two lines of each block (index and time), join the rest—handles multiple dialogue lines per cue more predictably than ad-hoc line skipping alone.

5. What encoding should I use when I convert an SRT file to text?

Use utf8 when reading and writing. SRT files are plain text but may include Unicode letters; wrong encoding corrupts characters.
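One related detail that the utf8 option does not handle: some editors save UTF-8 files with a leading byte order mark, and readFileSync leaves it in the string as the character U+FEFF. That character sits in front of the first cue index, so a pattern anchored with ^\d+ will not match the first cue. A minimal sketch of a workaround, with stripBom and the sample string as hypothetical names:

```javascript
// Hypothetical helper: remove a leading byte order mark (U+FEFF) if present.
function stripBom(text) {
  return text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
}

// Sample string with a BOM in front of the first cue index (assumed input).
const withBom = "\uFEFF1\n00:00:01,000 --> 00:00:02,000\nHi\n";

const firstLine = (s) => s.split("\n")[0];
console.log(/^\d+$/.test(firstLine(withBom))); // false
console.log(/^\d+$/.test(firstLine(stripBom(withBom)))); // true
```

Calling stripBom on the string right after reading it keeps the rest of the pipeline unchanged.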

6. Is srt to text the same as srt to txt?

Usually yes: both mean plain dialogue without timing, saved as .txt or used as a string. The on-disk extension is a convention, not a different format.
Olorunfemi Akinlua

Boasting over five years of experience in JavaScript, specializing in technical content writing and UX design. With a keen focus on programming languages, he crafts compelling content and designs …