Bash / Linux: count string or word occurrences in a file (`grep`, `wc`, `awk`)

Count how many times a word or substring appears in a file: count lines and words with wc, match whole words with grep -E, and build a frequency table with awk or sort/uniq.

Reviewed by Deepak Prasad

Counting how often a string appears in a file with Bash usually means one of three jobs: count the lines that match, count non-overlapping occurrences of a substring, or count whole words (tokens) and maybe rank them. wc counts lines and words for a whole file; grep counts matches of a pattern; awk (or sort/uniq) builds a frequency table of every word in the file.

The commands below were tested on Ubuntu 25.04, kernel 6.14.0-37-generic, Bash 5.2.37, using a small demo file, /tmp/glc-count-demo.txt, built from the lines shown in the first code block.


Demo file used for the samples

Save this as /tmp/glc-count-demo.txt (or any path) so the numbers in the snippets match what you see when you paste them:

text
count occurrences of word in file linux
shell script to count number of words in a file
count occurrences of all words in file linux
shell script to count number of lines in a file without using wc command
shell script to counts number of lines and words in a file
find count of string in file linux
shell script to counting number of lines words and characters in a file
count number of lines in a file linux
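If you would rather create the file from the shell than from an editor, a quoted here-document is one option. This is a minimal sketch that assumes you are happy to overwrite /tmp/glc-count-demo.txt if it already exists:

```bash
# The quoted 'EOF' stops the shell from expanding anything inside the block.
cat > /tmp/glc-count-demo.txt <<'EOF'
count occurrences of word in file linux
shell script to count number of words in a file
count occurrences of all words in file linux
shell script to count number of lines in a file without using wc command
shell script to counts number of lines and words in a file
find count of string in file linux
shell script to counting number of lines words and characters in a file
count number of lines in a file linux
EOF
```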

Linux bash count lines in file and bash count words in file (wc)

Count the lines in a file:

bash
wc -l </tmp/glc-count-demo.txt
text
8

Count the words in a file (whitespace-separated tokens across the whole file):

bash
wc -w </tmp/glc-count-demo.txt
text
79

wc is the right first tool when you need a word count for an entire file and do not care which words they are.
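The redirection (<) in the commands above matters when you want to reuse the number: wc then reads standard input and prints no file name. A small sketch of capturing both counts in variables follows; the temporary sample file and the variable names are made up for illustration:

```bash
# Hypothetical sample file so the snippet runs on its own.
file=$(mktemp)
printf '%s\n' 'one two three' 'four five' > "$file"

# Reading from stdin (<) makes wc print only the number, not the file name.
lines=$(wc -l < "$file")
words=$(wc -w < "$file")

# $(( )) also strips the padding that some wc implementations add.
echo "lines=$((lines)) words=$((words))"   # prints: lines=2 words=5

rm -f "$file"
```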


Grep word count: lines vs occurrences

grep -c counts lines with at least one match. grep -o prints one line per match, so piping to wc -l counts occurrences.

Lines containing the whole word count (\< and \> are GNU grep word boundaries):

bash
grep -E -c '\<count\>' /tmp/glc-count-demo.txt
text
6

Lines containing the substring count (this also matches inside counts, counting, …):

bash
grep -E -c 'count' /tmp/glc-count-demo.txt
text
8

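Here eight lines contain the substring count somewhere (including inside counts and counting). That is a line count, not an occurrence count. To count every match, print each one on its own line with grep -o and count those lines; in this file each line contains count exactly once, so the result is also 8: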

bash
grep -o 'count' /tmp/glc-count-demo.txt | wc -l
text
8
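Keep in mind that grep -o reports non-overlapping matches: after each match it resumes searching at the end of that match. A one-line illustration, using a made-up input string:

```bash
# "aaaa" contains "aa" at offsets 0, 1 and 2, but grep -o resumes after
# each match, so only the matches at offsets 0 and 2 are reported.
n=$(printf 'aaaa\n' | grep -o 'aa' | wc -l)
echo "$((n))"   # prints: 2
```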

Tiny file where line count and occurrence count differ (save as /tmp/glc-count-demo2.txt):

text
count count on one line
single count here
bash
grep -E -c '\<count\>' /tmp/glc-count-demo2.txt   # lines with a whole word "count"
grep -oE '\<count\>' /tmp/glc-count-demo2.txt | wc -l
text
2
3
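To see where the extra match comes from, you can print the number of matches on each line. The sketch below uses awk, whose gsub() returns how many replacements it made; replacing each match with itself ("&") leaves the text unchanged. It matches the plain substring count rather than the whole word, because \< and \> are GNU extensions that not every awk supports:

```bash
# Same two lines as /tmp/glc-count-demo2.txt, fed in directly.
per_line=$(printf '%s\n' 'count count on one line' 'single count here' |
  awk '{ n = gsub(/count/, "&"); print NR ": " n }')
printf '%s\n' "$per_line"
# prints:
# 1: 2
# 2: 1
```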

So before you count occurrences of a string in a file, decide whether you need a per-line count (-c) or a per-match count (-o … | wc -l). Prefer grep -E over egrep: GNU grep 3.8 and later print an obsolescence warning for egrep and fgrep.
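A few other grep options combine with -o in the same way. The sketch below shows -i (ignore case), -w (whole-word shorthand) and -F (treat the pattern as a literal string); the sample file and its contents are invented for the example:

```bash
# Hypothetical sample file so the snippet runs on its own.
sample=$(mktemp)
printf '%s\n' 'Count count COUNT counts' 'price is 1.5 not 105' > "$sample"

# -i ignores case; -w only accepts matches that form a whole word.
ci=$(grep -oiw 'count' "$sample" | wc -l)

# -F treats the pattern as a literal string, so "." is not a wildcard.
lit=$(grep -oF '1.5' "$sample" | wc -l)
# Without -F the dot matches any character, so "105" matches as well.
re=$(grep -o '1.5' "$sample" | wc -l)

echo "$((ci)) $((lit)) $((re))"   # prints: 3 1 2
rm -f "$sample"
```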


Linux count occurrences of string in file with tr and grep

Splitting on spaces and then filtering is another way to count whole words:

bash
tr ' ' '\n' </tmp/glc-count-demo.txt | grep -E '\<count\>' | wc -l
text
6

This splits on spaces only, so tabs do not separate tokens and punctuation stays attached unless you strip it. That makes it weaker than the grep -Eo '\<…\>' approach used for the frequency table below.
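If you want to stay with tr, one way to strip punctuation is to turn every run of non-letters into a single newline and then match whole lines. A sketch on an invented input line:

```bash
# tr -c complements the set and -s squeezes repeats, so every run of
# non-letters becomes one newline; grep -x then matches whole lines only
# and -c counts them.
n=$(printf '%s\n' 'count, count. (count) counts' |
  tr -cs '[:alpha:]' '\n' |
  grep -cx 'count')
echo "$n"   # prints: 3
```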


Bash count words in string

Counting the words in a string is the same wc idea applied to a variable:

bash
s='one two three two'
printf '%s\n' "$s" | wc -w
text
4

In Bash you can also split the string into an array (for example with read -ra) and take its length, but wc -w is hard to beat for a quick check.
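For reference, the array variant might look like the sketch below. It avoids starting an external process, but read consumes only the first line of its input, so it suits single-line strings:

```bash
s='one two three two'

# read -a splits on $IFS (whitespace by default) into an array;
# -r keeps backslashes literal.
read -ra words <<< "$s"

echo "${#words[@]}"   # prints: 4
```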


Linux count specific word in file: frequency table (grep + sort + uniq)

To rank every alphabetic token, pull words then aggregate:

bash
grep -Eo '\<[[:alpha:]]+\>' /tmp/glc-count-demo.txt | sort | uniq -c | sort -nr | head -n 12
text
      8 of
      8 in
      8 file
      6 count
      5 number
      5 a
      4 words
      4 to
      4 shell
      4 script
      4 linux
      4 lines

That is a compact way to count a specific word in a file when “word” means a letters-only token. For heavier parsing (hyphens, apostrophes, UTF-8), move to awk or a small Python script.
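The pipeline above is case-sensitive, so Linux and linux would be ranked separately. If that is not what you want, one option is to lowercase the input first. A sketch on invented input, with the result kept in a hypothetical variable named table:

```bash
# Fold everything to lower case before tokenizing and counting.
table=$(printf '%s\n' 'Linux linux LINUX file' 'File count' |
  tr '[:upper:]' '[:lower:]' |
  grep -Eo '\<[[:alpha:]]+\>' |
  sort | uniq -c | sort -nr)
printf '%s\n' "$table"
```

On this input the table lists linux three times, file twice and count once.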


Script: frequency of words in one file (awk)

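The script below wraps the same pipeline in slightly safer Bash: it quotes the path, uses grep -E, and exits with status 1 and a usage message when the argument count is wrong. Because of set -o pipefail, it also exits non-zero when grep finds no alphabetic tokens.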

bash
#!/usr/bin/env bash
set -euo pipefail

if [[ $# -ne 1 ]]; then
  echo "Usage: $0 filename" >&2
  exit 1
fi

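# Print the header first so that sort does not reorder it with the data rows.
printf '%-14s %s\n' "Word" "Count"

# awk tallies each token; sort orders by count (descending), then by word.
grep -Eo '\<[[:alpha:]]+\>' "$1" |
  awk '{ c[$0]++ }
       END { for (w in c) printf "%-14s %d\n", w, c[w] }' |
  sort -k2,2nr -k1,1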

Pair that with the awk tutorial when you need custom field separators instead of “letters bounded by word edges”.
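As a rough illustration of the custom-separator idea, awk can do the tokenizing itself if the field separator is a regular expression. The sketch below assumes ASCII letters only (the [^A-Za-z]+ separator is a simplification, not a general word definition) and runs on invented input:

```bash
# Every run of non-letters separates fields, so each field is one word.
freq=$(printf '%s\n' 'count,count;file' 'file count' |
  awk -F '[^A-Za-z]+' '
    { for (i = 1; i <= NF; i++) if ($i != "") c[$i]++ }
    END { for (w in c) print c[w], w }' |
  sort -k1,1nr -k2,2)
printf '%s\n' "$freq"
# prints:
# 3 count
# 2 file
```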


Bash for count: loop counter

When people search for “bash for count”, they often want a loop index or running total:

bash
total=0
for word in $(grep -Eo '\<[[:alpha:]]+\>' /tmp/glc-count-demo.txt); do
  [[ $word == count ]] && total=$((total + 1))
done
echo "$total"
text
6

For big inputs, do not build a giant $(…) word list; stream the matches through a while read -r loop instead.
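A streaming version of the same counter could look like this sketch. It feeds the loop through process substitution (a Bash feature) so that the loop runs in the current shell and total keeps its value afterwards; the sample file is invented for the example:

```bash
# Hypothetical sample file so the snippet runs on its own.
file=$(mktemp)
printf '%s\n' 'count the count' 'counts do not count' > "$file"

total=0
# One token per line arrives from grep; nothing is held in memory at once.
while IFS= read -r word; do
  if [[ $word == count ]]; then
    total=$((total + 1))
  fi
done < <(grep -Eo '\<[[:alpha:]]+\>' "$file")

echo "$total"   # prints: 3
rm -f "$file"
```

Piping grep into the loop (grep … | while read …) would also stream, but in Bash the loop would then run in a subshell by default and total would still be 0 after it.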



Summary

Use wc -l and wc -w to count the lines and words in a whole file. To count occurrences of a string, use grep -o 'pattern' | wc -l when you need every match, and grep -E -c '\<word\>' when the number of lines containing the whole word is enough. For a quick frequency table of how often tokens appear, combine grep -Eo with sort | uniq -c; awk scales better when token rules get messy. printf '%s\n' "$s" | wc -w counts the words in a string, and a for loop with a counter covers the running-total case; prefer streaming with while read on large files.

Deepak Prasad
