This guide is a practical FAISS Python API reference: you build NumPy float32 embedding matrices, pick an index type, add / search, optionally attach IDs and remove_ids, train IVF indexes, tune nprobe, and save or load indexes on disk. It is not a generic vector-database survey; it stays close to dense vectors, shapes, dtypes, and the calls that break in real projects.
FAISS (Facebook AI Similarity Search) is optimized for similarity search and clustering over dense vectors—typical inputs are embedding rows from models (image/text/audio). Think vectors in, (distances, neighbor indices) out; optional custom IDs when you must map neighbors back to rows in your database. For ML taxonomy context, see types of machine learning; for NumPy matrix shaping before indexing, see combine two column matrices in Python.
Tested on: Python 3.10+ with NumPy and faiss-cpu (PyPI); Linux x86_64.
What is FAISS in Python?
FAISS stores d-dimensional vectors and answers k-nearest-neighbor queries under a metric (L2 or inner product). The Python API wraps the C++ core: you work with numpy.ndarray objects, call faiss.index_factory or constructors such as IndexFlatL2, IndexIVFFlat, then add, train (when required), and search. For RAG or semantic search, embeddings from a model become rows of a matrix (num_vectors, dim); FAISS does not run the model—it only searches vectors you pass in.
Install FAISS for Python
For CPU-only machines, install faiss-cpu from PyPI (this is the supported CPU wheel name; the bare faiss package name on PyPI is not the one you want for routine installs):
pip install faiss-cpu numpy
GPU builds depend on CUDA version, driver, and platform; Conda-forge or Facebook’s build instructions often fit GPU stacks better than a one-line pip. If pip install faiss-gpu fails on your machine, treat GPU setup as a separate environment task and stay on faiss-cpu until the stack matches upstream wheels.
Quick sanity check:
python -c "import faiss, numpy as np; print(faiss.__version__, np.__version__)"
FAISS Python API basic workflow
End-to-end pattern:
- Build database vectors xb as float32, shape (N, d).
- Create an index for dimension d (e.g. IndexFlatL2(d)).
- index.add(xb) (or add_with_ids on an IndexIDMap).
- Prepare queries xq as float32, shape (Q, d).
- D, I = index.search(xq, k): distances D and neighbor positions or IDs I, both shape (Q, k).
import numpy as np
import faiss
d = 64
nb = 1000
nq = 10
np.random.seed(0)
xb = np.random.random((nb, d)).astype("float32")
xq = np.random.random((nq, d)).astype("float32")
index = faiss.IndexFlatL2(d)
index.add(xb)
k = 4
D, I = index.search(xq, k)
assert D.shape == (nq, k) and I.shape == (nq, k)
NumPy array requirements for FAISS
This section targets IndexFlatL2 + NumPy float32 issues that show up often in search logs.
- 2-D only: xb.shape == (N, d) and xq.shape == (Q, d). A single query must still be (1, d), not (d,).
- d matches the index: constructor faiss.IndexFlatL2(d) fixes d; add / search last dimension must equal index.d.
- dtype: use numpy.float32 (astype("float32")). float64 can error or be rejected depending on build and call path.
- C-contiguous: after slicing or transposing, call np.ascontiguousarray(x, dtype=np.float32) if you see errors about non-contiguous buffers.
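The checks in the list above can be folded into one small coercion step. The sketch below uses only NumPy; as_faiss_matrix is a hypothetical helper name introduced for illustration, not part of FAISS.

```python
import numpy as np

def as_faiss_matrix(x, d):
    """Hypothetical helper: coerce x to a C-contiguous float32 array of shape (n, d)."""
    arr = np.asarray(x)
    if arr.ndim == 1:
        arr = arr.reshape(1, -1)          # a single vector becomes one row
    if arr.ndim != 2 or arr.shape[1] != d:
        raise ValueError(f"expected shape (n, {d}), got {arr.shape}")
    return np.ascontiguousarray(arr, dtype=np.float32)

d = 8
single = np.random.rand(d)              # float64, shape (d,)
transposed = np.random.rand(d, 5).T     # shape (5, d), not C-contiguous

q = as_faiss_matrix(single, d)
xb = as_faiss_matrix(transposed, d)
print(q.shape, q.dtype, xb.flags["C_CONTIGUOUS"])
```

Raising on a dimension mismatch, rather than silently reshaping, keeps transposed (d, N) inputs from slipping through when N happens to differ from d.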
import numpy as np
d = 8
bad_row = np.random.rand(d) # shape (d,): wrong for add
good = np.random.rand(1, d).astype("float32") # shape (1, d): OK for one query
Create a simple IndexFlatL2 index
IndexFlatL2 is the exact L2 (Euclidean) index: at search time it compares the query to all stored vectors. It is ideal for learning the API and for small or medium N where brute force is acceptable.
import faiss
d = 128
index = faiss.IndexFlatL2(d)
Add vectors to a FAISS index
index.add(xb) requires xb shape (N, d), float32, same d as the index. Typical failures: wrong dtype, 1-D array, transposed shape (d, N), or d mismatch.
import numpy as np
import faiss
d = 32
index = faiss.IndexFlatL2(d)
xb = np.random.random((500, d)).astype("float32")
index.add(xb)
print(index.ntotal) # 500
Search vectors with FAISS
search(xq, k) returns D, I: for each of the Q queries, the k smallest distances (for L2) and the indices of neighbors in the index (row order for a flat index without ID mapping). If k exceeds the number of vectors the index can return (for example on a small or empty flat index), FAISS does not raise; it pads the missing slots of I with -1, so filter out -1 before mapping neighbors back to rows.
import numpy as np
import faiss
d = 32
index = faiss.IndexFlatL2(d)
xb = np.random.random((500, d)).astype("float32")
index.add(xb)
k = 5
D, I = index.search(xb[:3], k)
assert D.shape == (3, k)
IndexFlatL2 vs IndexFlatIP
| Index | Metric | Typical use |
|---|---|---|
| IndexFlatL2 | Squared L2 distance | Euclidean nearest neighbors |
| IndexFlatIP | Inner product | Maximum dot-product; for cosine similarity, L2-normalize rows to unit length then use IP (or dedicated cosine preprocessing from the wiki) |
import faiss
d = 16
index_l2 = faiss.IndexFlatL2(d)
index_ip = faiss.IndexFlatIP(d)
Use IDs with IndexIDMap
Plain IndexFlatL2 assigns implicit sequential IDs 0 .. ntotal-1. To attach your own int64 IDs (database primary keys, chunk ids), wrap the base index with faiss.IndexIDMap and use add_with_ids.
import numpy as np
import faiss
d = 8
nb = 100
xb = np.random.random((nb, d)).astype("float32")
ids = (np.arange(nb) + 1000).astype("int64") # custom IDs
base = faiss.IndexFlatL2(d)
index = faiss.IndexIDMap(base)
index.add_with_ids(xb, ids)
k = 3
D, I = index.search(xb[:2], k) # I contains custom IDs where applicable
add_with_ids is only valid on indexes that store IDs (ID-mapped wrappers such as IndexIDMap; IVF indexes also accept it natively); otherwise use add and track the mapping yourself.
Remove vectors with remove_ids
remove_ids deletes vectors whose IDs match the selector. On large flat structures, removal can scan storage; the wiki notes removal patterns and performance depend on index family. Always check ntotal after removal to confirm.
import numpy as np
import faiss
d = 4
xb = np.random.random((20, d)).astype("float32")
ids = np.arange(200, 220, dtype="int64")
index = faiss.IndexIDMap(faiss.IndexFlatL2(d))
index.add_with_ids(xb, ids)
index.remove_ids(np.array([210], dtype="int64"))
print(index.ntotal)
If remove_ids appears to “do nothing,” confirm you used IndexIDMap, passed int64 IDs, and that IDs exist.
Create an IndexIVFFlat index
IndexIVFFlat is an IVF (inverted file) approximate index: vectors are partitioned into nlist lists for faster search at scale. Construction needs:
- a quantizer (often IndexFlatL2(d)),
- dimension d,
- nlist (number of clusters / lists).
import numpy as np
import faiss
d = 16
nlist = 10
quantizer = faiss.IndexFlatL2(d)
index_ivf = faiss.IndexIVFFlat(quantizer, d, nlist)
Train IndexIVFFlat before adding vectors
IVF indexes must learn the partition structure. Call train(x_train) on a representative sample with the same d, usually float32, before add. Skipping train is a common runtime error.
import numpy as np
import faiss
d = 16
nlist = 8
xb = np.random.random((5000, d)).astype("float32")
quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFFlat(quantizer, d, nlist)
index.train(xb)
index.add(xb)
Tune nprobe for IVF search
At query time, index.nprobe controls how many IVF lists are visited (the default is 1). Higher nprobe improves recall but costs more distance computations and time; setting nprobe equal to nlist scans every list. Start small (for example max(1, nlist // 32)) and increase until recall is acceptable on a held-out query set.
import numpy as np
import faiss
d, nlist = 16, 8
xb = np.random.random((2000, d)).astype("float32")
index = faiss.IndexIVFFlat(faiss.IndexFlatL2(d), d, nlist)
index.train(xb)
index.add(xb)
index.nprobe = 4
D, I = index.search(xb[:5], 10)
Save and load a FAISS index
Persist to disk with write_index / read_index (paths are local; encrypt or ACL-protect sensitive embedding stores).
import faiss
import tempfile
import os
d = 8
index = faiss.IndexFlatL2(d)
fd, path = tempfile.mkstemp(suffix=".index")
os.close(fd)
faiss.write_index(index, path)
loaded = faiss.read_index(path)
os.remove(path)
assert loaded.d == d
Common FAISS Python errors
| Symptom | Likely cause | What to check |
|---|---|---|
| dtype / type error on add | float64 or object dtype | xb.astype("float32"), np.ascontiguousarray |
| add: dimension mismatch | Wrong trailing dimension | xb.shape[1] == index.d |
| search: wrong shape | Query passed as 1-D | Reshape to (1, d) or (Q, d) |
| IVF assert / train error | train not called or too few points | Call train before add; ensure enough vectors vs nlist |
| add_with_ids unsupported | Base index not wrapped | Use IndexIDMap / supported stack |
| remove_ids surprising result | Wrong ID type or ID not present | int64 IDs, verify membership |
| pip install faiss confusion | Wrong package name on PyPI | Prefer faiss-cpu for CPU wheels |
# Before add or search (vectors shape (N, d), float32):
assert vectors.shape[1] == index.d, (vectors.shape[1], index.d)
Which FAISS index should you use?
| Use case | Good starting index |
|---|---|
| Learning, exact L2, smaller N | IndexFlatL2 |
| Cosine-like with unit vectors | Normalize rows, then IndexFlatIP |
| Larger N, approximate search | IndexIVFFlat (train + nprobe) |
| Custom vector IDs | IndexIDMap / IndexIDMap2 + add_with_ids |
| Delete by custom ID | IDMap + remove_ids (check supported combinations) |
| Memory pressure at scale | PQ / IVFPQ and other compressed indexes (see wiki) |
FAISS Python API cheat sheet
| Task | Typical call |
|---|---|
| Install (CPU) | pip install faiss-cpu numpy |
| Flat L2 index | faiss.IndexFlatL2(d) |
| Add vectors | index.add(xb) with xb (N,d) float32 |
| Search | D, I = index.search(xq, k) |
| Inner product | faiss.IndexFlatIP(d) (often with normalized vectors) |
| Custom IDs | faiss.IndexIDMap(base) + add_with_ids |
| Remove IDs | index.remove_ids(...) |
| IVF index | faiss.IndexIVFFlat(quantizer, d, nlist) |
| Train IVF | index.train(x_train) before add |
| IVF recall / speed | Tune index.nprobe |
| Save / load | faiss.write_index, faiss.read_index |
Official references: FAISS GitHub, Wiki, Getting started.
Summary
This article positions FAISS as a Python API for dense vector search: float32 NumPy arrays shaped (n, d), IndexFlatL2 for exact L2 search, IndexFlatIP when inner product (often with normalized embeddings) matches your metric, IndexIVFFlat with train then add and nprobe for approximate IVF search, IndexIDMap with add_with_ids and remove_ids for application-level IDs, and write_index / read_index for persistence. The troubleshooting table maps common mistakes—dtype, shape, untrained IVF, and ID semantics—to quick checks. For broader ML context, see types of machine learning; for NumPy shaping habits, see NumPy column stacking.

