Stay free if you only need basic features. Upgrade if you need advanced features. Most solo builders can start free.
Embeddings convert text or images into dense vectors that you store in a vector database for approximate nearest-neighbor retrieval; this is the first-stage recall step. The reranker is a cross-encoder that takes a query and a shortlist of candidate documents (typically the top 50 to 100 from vector search) and scores each query-document pair jointly, producing a much more accurate final ordering. Most production RAG pipelines use both: embeddings for fast recall, reranker for precision before passing context to the LLM.
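The two-stage flow can be sketched roughly as follows. This is a minimal illustration, not Jina's implementation: `embed` is a toy bag-of-words stand-in for a real embedding model, `toy_pair_score` is a hypothetical stand-in for a cross-encoder, and all function names are assumptions made for the example.

```python
import math
from typing import Callable

def embed(text: str) -> dict[str, float]:
    """Toy stand-in for an embedding model: a unit-length bag-of-words vector."""
    counts: dict[str, float] = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0.0) + 1.0
    norm = math.sqrt(sum(v * v for v in counts.values())) or 1.0
    return {word: v / norm for word, v in counts.items()}

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Dot product of two unit-length sparse vectors."""
    return sum(v * b.get(word, 0.0) for word, v in a.items())

def recall(query: str, docs: list[str], k: int) -> list[str]:
    """Stage 1: embed query and documents separately, keep the k nearest."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def rerank(query: str, candidates: list[str],
           score_pair: Callable[[str, str], float], top_n: int) -> list[str]:
    """Stage 2: score each (query, document) pair jointly and reorder the shortlist."""
    return sorted(candidates, key=lambda d: score_pair(query, d), reverse=True)[:top_n]

def toy_pair_score(query: str, doc: str) -> float:
    """Hypothetical cross-encoder stand-in: fraction of query words found in the doc."""
    query_words = query.lower().split()
    doc_words = set(doc.lower().split())
    return sum(w in doc_words for w in query_words) / max(len(query_words), 1)
```

In a real pipeline the first stage would be a vector database query and the second a call to a reranker model; only the shortlist, never the full corpus, goes through the more expensive pairwise scoring.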
Reader (r.jina.ai) is purpose-built to produce LLM-friendly output: it renders JavaScript, strips navigation, ads, cookie banners, and boilerplate, then returns clean Markdown with preserved structure (headings, lists, links, tables). Traditional scrapers return raw HTML that wastes context tokens and confuses models. Reader also handles PDF extraction and image captioning via vision models, and it can be called with a single GET request: just prefix any URL with r.jina.ai/.
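As a rough sketch of the prefix pattern, using only the Python standard library: the `https://` scheme, the `User-Agent` value, the helper names, and the assumption that the response body is UTF-8 text are all illustrative choices, not details confirmed by this article.

```python
from urllib.request import Request, urlopen

# Assumption: the Reader endpoint is reached over HTTPS at this prefix.
READER_PREFIX = "https://r.jina.ai/"

def reader_url(target_url: str) -> str:
    """Build the Reader URL by prefixing the page you want converted."""
    return READER_PREFIX + target_url

def fetch_markdown(target_url: str, timeout: float = 30.0) -> str:
    """Issue a single GET request and return the response body as text."""
    request = Request(reader_url(target_url), headers={"User-Agent": "reader-example"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_markdown("https://example.com/post")` would request `https://r.jina.ai/https://example.com/post`; check the official docs for authentication and rate limits before relying on it.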
Yes. Most Jina embedding and reranker models are released with open weights on Hugging Face under Apache 2.0 or CC-BY-NC licenses (check each model card). You can run them locally with sentence-transformers, vLLM, or Text Embeddings Inference. The hosted API still tends to be cheaper than self-hosting for small to mid-scale workloads once you factor in GPU costs, but self-hosting is the right choice for air-gapped or strict-data-residency deployments.
DeepSearch is an agentic endpoint that takes a complex research question and autonomously runs multiple search-read-reason iterations until it produces a cited, grounded answer — similar in concept to Perplexity Pro or OpenAI Deep Research. Use it for questions requiring synthesis across multiple sources (market research, technical comparisons, fact-checking) rather than simple lookups. For single-shot queries, the Search API (s.jina.ai) is faster and cheaper.
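The search-read-reason loop described above can be pictured with a small conceptual sketch. This is not DeepSearch's actual implementation or API: the function `deep_search_sketch`, its `search`, `read`, and `reason` callables, and the `max_steps` budget are all hypothetical names introduced for illustration.

```python
from typing import Callable, Optional

def deep_search_sketch(
    question: str,
    search: Callable[[str], list[str]],          # query -> list of source URLs
    read: Callable[[str], str],                  # URL -> extracted text
    reason: Callable[[str, list[tuple[str, str]]], tuple[Optional[str], Optional[str]]],
    max_steps: int = 5,
) -> dict:
    """Iterate search -> read -> reason until an answer appears or the budget runs out."""
    notes: list[tuple[str, str]] = []  # (source URL, extracted text) gathered so far
    query = question
    for _ in range(max_steps):
        for url in search(query):
            notes.append((url, read(url)))
        # reason() returns (answer, None) when done, or (None, follow_up_query) otherwise.
        answer, follow_up = reason(question, notes)
        if answer is not None:
            return {"answer": answer, "sources": [url for url, _ in notes]}
        query = follow_up or question
    return {"answer": None, "sources": [url for url, _ in notes]}
```

The multiple iterations are why such a call costs more than a single-shot search: each loop adds search, reading, and reasoning work before an answer is returned.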
Jina uses a unified token-based credit system: you purchase tokens and they are consumed by whichever endpoint you call, at different rates per service (embeddings are cheapest, DeepSearch most expensive per call due to multi-step reasoning). New API keys receive 10 million free tokens with no credit card required. Beyond that, you top up pay-as-you-go without monthly commitments, which is unusual in the embeddings market where most competitors require enterprise contracts at scale.
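To get a feel for how far the free allowance goes, a back-of-the-envelope calculation helps. The sketch below takes only the 10 million figure from the text; the per-call token counts and the usage breakdown are assumptions you would replace with your own measurements, and actual per-service rates should be checked against the pricing page.

```python
FREE_TOKENS = 10_000_000  # free allowance for a new API key, per the text above

def calls_covered(tokens_per_call: int, budget: int = FREE_TOKENS) -> int:
    """How many calls of a given (assumed) average token cost fit in the budget."""
    if tokens_per_call <= 0:
        raise ValueError("tokens_per_call must be positive")
    return budget // tokens_per_call

def remaining_after(usage: dict[str, int], budget: int = FREE_TOKENS) -> int:
    """Tokens left after subtracting per-service usage (service names are illustrative)."""
    return budget - sum(usage.values())
```

For example, if an average embedding request consumed 500 tokens, `calls_covered(500)` gives 20,000 requests before the free allowance is exhausted.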
Last verified March 2026