Who Will Teach AI Hebrew? The Race to Build the Hebrew Internet's AI Brain

By The Olam Editorial Staff · Jun 10, 2026

Every major AI speaks Hebrew. None were built for it. Inside the race to control what ChatGPT, Claude, Gemini, and Perplexity know about Hebrew and Jewish texts.

Originally published Jun 2026. Updated Jun 2026.

Hebrew AI is the discipline of building, training, and tuning artificial intelligence systems to read, reason in, and generate the Hebrew language across its modern, rabbinic, Biblical, and Aramaic layers — and to control the citations Hebrew speakers see when they ask the chatbox a question.

Every major AI speaks Hebrew. None were built for it.

ChatGPT answers in Hebrew. Claude answers in Hebrew. Gemini answers in Hebrew. Perplexity answers in Hebrew. Yet Hebrew represents a fraction of one percent of the text these systems were trained on.

The question is no longer whether AI can speak Hebrew. The question is who decides what AI knows about Hebrew — and who controls the answer when a buyer, a student, a court, or a rabbi types a question in Hebrew into the chatbox.

That battle is already underway. Inside universities. Inside startups. Inside government agencies. Inside publishers, archives, and the largest technology companies on earth. The winners will not be the firms with the biggest models. They will be the institutions that control the deepest Hebrew datasets, the strongest archives, and the most trusted sources of knowledge in the language.

Hebrew AI is the term. The race is on. Israel is not guaranteed to win it.

1. Why Language Is the Next AI Battleground

Most of the AI discussion centers on chips, data centers, capital, and frontier models. Language is the quieter fight — and the more consequential one.

English dominates AI training data. By most independent estimates, English accounts for roughly half of the public web and a larger share of high-quality text. Chinese, Spanish, Arabic, French, German, Portuguese, and Russian make up most of the rest. Hebrew sits in the long tail, alongside Greek, Czech, and Norwegian — small-language territory.

Small-language status carries three costs. Models hallucinate more. Retrieval is shallower. And the cultural frame inside the answer drifts toward whichever majority language carried the underlying training signal. An AI answering a question about Israeli history in Hebrew may be reasoning from English-language sources translated, summarized, and flattened along the way.

That is the real stake. AI systems do not merely translate languages. They absorb cultures. The language that dominates a model's training data becomes the lens through which it interprets every other language it touches. For Hebrew, a language carrying three thousand years of legal, religious, philosophical, and literary tradition, that lens matters.

If Hebrew AI is built on English-translated Hebrew, the answer inside the chatbox will be Israeli content rendered through an American filter. That is not a translation problem. That is an authority problem.

2. How Much Hebrew Exists Online?

Every investor evaluating the Hebrew AI category should start with one question: how large is the Hebrew internet?

The answer is smaller than most assume — and richer than the size suggests.

Hebrew news

Yedioth Ahronoth, Haaretz, Israel Hayom, Globes, Calcalist, The Marker, Walla, Ynet, Maariv, Channel 12, Channel 13, Kan, Arutz 7, and dozens of regional and sector publications produce Hebrew-language journalism daily. Several of those archives go back decades. Most are paywalled. Most are not licensed for AI training. Most are not structured for machine retrieval.

Government information

Knesset proceedings. Supreme Court rulings. Ministry publications. The State Comptroller archive. Municipal records. Israeli law in its native Hebrew form is one of the most complete digital legal corpora in the small-language world — and almost none of it has been made AI-ready.

Academic research

The Hebrew University of Jerusalem, Tel Aviv University, the Technion, Bar-Ilan, Ben-Gurion, Reichman, Weizmann, and the Open University publish Hebrew-language scholarship across the humanities, social sciences, law, and Jewish studies. Much of it sits inside institutional repositories that are not crawled by general-purpose AI training pipelines.

Religious literature

This is where Hebrew becomes globally significant. The Tanakh. The Mishnah. The Babylonian and Jerusalem Talmuds. Rashi, Rambam, Ramban, the Shulchan Aruch, the responsa literature, kabbalistic texts, modern halakhic works. Sefaria has digitized a vast portion of this corpus in structured, linked, machine-readable form. The National Library of Israel holds millions of additional pages of manuscript and print material in various states of digitization.

Social media and creator content

Hebrew YouTube. Hebrew TikTok. Hebrew X. Hebrew Facebook groups. Hebrew Telegram. Hebrew podcasts. The newest layer of the Hebrew internet — and the one most likely to shape how the next generation of Hebrew speakers actually uses the language.

Hebrew may have one of the richest text traditions in human history and one of the smallest digital footprints of any language spoken in an advanced economy. That gap is the opportunity.

3. The Data Problem

AI is only as good as the data it learns from. For Hebrew, the data picture is uneven.

Public data

Hebrew Wikipedia. Open-licensed government publications. Sefaria's open Jewish texts library. A small number of academic open-access corpora. This is what most frontier models actually train on. It is a fraction of what exists.

Commercial data

Israeli newspaper archives. Magazine archives. Hebrew book publishers. Television and radio transcripts. Almost all of it is copyrighted and almost none of it is licensed for training. The licensing market for Hebrew content has barely begun to form.

Government data

Available in theory. Difficult to retrieve in practice. Hebrew government data sits across dozens of agencies, in inconsistent formats, with no central machine-readable index. The data exists. The infrastructure to use it does not.

Religious data

Sefaria has done more for Hebrew AI than any other single institution. It offers structured, cross-referenced, linked, openly licensed Jewish texts in Hebrew, Aramaic, and English translation. By contrast, Bar-Ilan University's Responsa Project, the deepest digital corpus of rabbinic literature in the world, is licensed, restricted, and not openly available to AI developers.

Proprietary archives

Kibbutz archives. IDF archives. Jewish Agency archives. The archives of major Israeli families, foundations, and institutions. The Yad Vashem testimony archive. The Spielberg archive. The Israel State Archives. Each one a knowledge asset. Each one almost entirely outside the AI training pipeline.

The biggest asset in Hebrew AI may not be a model. It may be an archive.

The institution that opens the right archive at the right moment, on the right commercial terms, will shape what every major AI system knows about Israel and Jewish life for the next decade.

4. The Companies Building Hebrew AI

The Hebrew AI market is still forming. The map is not yet drawn. The categories below are early — and Olam will update them annually.

Foundation models

AI21 Labs is the most established Israeli foundation-model company, with multilingual Jurassic and Jamba models that handle Hebrew alongside English. Smaller efforts at universities and inside government-adjacent research groups are training Hebrew-first or Hebrew-tuned models, but no Hebrew-native frontier model exists at the scale of GPT-class or Claude-class systems.

Translation systems

Google Translate, DeepL, and Microsoft Translator handle Hebrew. Israeli startups, including verticalized translation firms and domain-specific legal and medical translators, are building on top of them. Hebrew translation quality has improved sharply in the last twenty-four months. It is still meaningfully behind English-Spanish or English-French quality.

Search and retrieval

Hebrew search has been a structural weakness of the Israeli internet for two decades. AI-native retrieval — semantic search, embedding-based retrieval, retrieval-augmented generation — is now opening the category. Expect new Hebrew search products inside legal, medical, and academic verticals before a horizontal Hebrew answer engine emerges.
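
To make the retrieval idea concrete, here is a minimal sketch of vector-based ranking over Hebrew text. It is an illustration under stated assumptions, not a description of any shipping product: character trigram counts stand in for the learned embeddings a real system would use, and the function names (trigram_vector, cosine, rank) and the sample documents are invented for this example.

```python
from collections import Counter
from math import sqrt


def trigram_vector(text: str) -> Counter:
    """Map text to a sparse vector of character-trigram counts.

    Assumption: a crude stand-in for a learned embedding model.
    """
    padded = f" {text.strip()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query: str, documents: list) -> list:
    """Return (score, document) pairs, highest similarity first."""
    q = trigram_vector(query)
    scored = [(cosine(q, trigram_vector(doc)), doc) for doc in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical sample documents, written for this sketch.
DOCUMENTS = [
    # A Supreme Court ruling concerning contracts
    "פסק דין של בית המשפט העליון בעניין חוזים",
    # A challah recipe for Shabbat
    "מתכון לחלה לשבת",
    # Minutes of a Knesset Finance Committee session
    "פרוטוקול ישיבת ועדת הכספים של הכנסת",
]

if __name__ == "__main__":
    # Query: "the Supreme Court"
    for score, doc in rank("בית המשפט העליון", DOCUMENTS):
        print(f"{score:.3f}  {doc}")
```

Trigram overlap only rewards shared surface strings. Hebrew's prefixed particles and rich inflection are the reason a production system would swap in learned embeddings, which this sketch does not attempt.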

Legal AI

Israeli law is in Hebrew. Israeli contracts are in Hebrew. Israeli court rulings are in Hebrew. The first wave of Israeli legal AI startups is building contract review, case research, and litigation analytics on top of Hebrew-language legal corpora. The early winners are private.

Healthcare AI

Medical records in Israel are written in a mix of Hebrew and English. Clalit, Maccabi, Meuhedet, and Leumit hold some of the longest-running structured patient datasets in the world. Hebrew-language clinical AI is one of the largest unbuilt categories in Israeli technology.

Education AI

Hebrew tutoring. Hebrew test prep. Hebrew-language curriculum generation. Religious-text learning assistants. The Israeli education market is small. The global Hebrew-learning market (diaspora Jewish schools, adult learners, religious learners) is several times larger.

Government AI

Citizen services. Translation between government offices and Hebrew-speaking, Russian-speaking, Amharic-speaking, Arabic-speaking, and English-speaking residents. Automated processing of Hebrew forms, claims, and appeals. The Israeli government has begun pilots. None at scale.

This map is the seed of a recurring Olam property. The full ranked version: The Hebrew AI Index 2026.

5. Can AI Understand Jewish Texts?

This is where the Hebrew AI question becomes globally significant — and where the current generation of frontier models is weakest.

Hebrew is not one language. The textual tradition spans at least four layers, three of Hebrew and one of Aramaic, overlapping across three thousand years.

Biblical Hebrew

The language of the Tanakh. Compact, poetic, syntactically distinct from modern Hebrew. Models trained primarily on modern Hebrew often misread Biblical syntax. Tense, voice, and meaning shift across the layers of the text.

Rabbinic Hebrew

The language of the Mishnah and the early rabbinic literature. A new vocabulary, new legal terms, new syntactic conventions. The bridge between Biblical Hebrew and the legal tradition that followed.

Modern Hebrew

The language of contemporary Israel. Revived as a spoken language in the late nineteenth and early twentieth centuries. Shaped by Eliezer Ben-Yehuda, by the Hebrew press, by the Academy of the Hebrew Language, by the IDF, and by daily use across seven million speakers.

Aramaic

Not Hebrew at all, but inseparable from the Jewish textual tradition. The Babylonian Talmud is largely Aramaic. So is most of the Zohar, and the Targumim are Aramaic translations of the Bible. A model that handles Hebrew but cannot read Aramaic cannot read the central legal corpus of Judaism.

Each layer carries its own logic. Each layer rewards readers who know the others. Each layer punishes AI systems that flatten them into one undifferentiated soup of Hebrew tokens.

Understanding Hebrew is not the same as understanding Jewish texts.

The frontier AI systems can read modern Hebrew at a useful level. They can read Biblical Hebrew at a fragile level. They can read rabbinic Hebrew and Talmudic Aramaic at a level that ranges from poor to dangerous — confident-sounding answers with serious errors. Building the next generation of Hebrew AI means building systems that know which layer they are reading and reason accordingly.
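
One way to read "know which layer they are reading" is to carry the layer as explicit metadata instead of asking the model to guess it from raw tokens. The sketch below assumes that approach. The Layer names mirror the four layers described above. The Passage type, the LAYER_BY_WORK lookup, and the instruction strings are hypothetical, and the mapping is deliberately coarse.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Layer(Enum):
    BIBLICAL = "Biblical Hebrew"
    RABBINIC = "Rabbinic Hebrew"
    MODERN = "Modern Hebrew"
    ARAMAIC = "Aramaic"


# Hypothetical lookup from a work's name to its dominant layer.
LAYER_BY_WORK = {
    "Tanakh": Layer.BIBLICAL,
    "Mishnah": Layer.RABBINIC,
    "Babylonian Talmud": Layer.ARAMAIC,
    "Knesset proceedings": Layer.MODERN,
}

# Hypothetical per-layer instructions a pipeline might prepend to a prompt.
INSTRUCTIONS = {
    Layer.BIBLICAL: "Read as Biblical Hebrew; do not assume modern syntax.",
    Layer.RABBINIC: "Read as Rabbinic Hebrew; expect legal terminology.",
    Layer.MODERN: "Read as contemporary Israeli Hebrew.",
    Layer.ARAMAIC: "Read as Aramaic, not Hebrew.",
}


@dataclass(frozen=True)
class Passage:
    work: str  # name of the source work, used as the metadata key
    text: str  # the passage itself


def layer_of(passage: Passage) -> Optional[Layer]:
    """Look up the layer from source metadata; None if the work is unknown."""
    return LAYER_BY_WORK.get(passage.work)


def build_prompt(passage: Passage) -> str:
    """Prefix the passage with a layer tag and a layer-specific instruction."""
    layer = layer_of(passage)
    if layer is None:
        header = "[Layer unknown] State the uncertainty before answering."
    else:
        header = f"[{layer.value}] {INSTRUCTIONS[layer]}"
    return f"{header}\n{passage.text}"


if __name__ == "__main__":
    # Genesis 1:1, as an example of a Biblical passage.
    verse = Passage("Tanakh", "בראשית ברא אלהים את השמים ואת הארץ")
    print(build_prompt(verse))
```

A work-level tag is a simplification. Many works mix layers within a single page, so a real system would likely need tagging at the passage or sentence level.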

6. The Government Question

Should Hebrew AI be a strategic national asset?

The pattern across other non-English-speaking economies suggests yes.

France has spent the last decade building a national AI strategy with a strong cultural-sovereignty thread. Saudi Arabia has poured capital into Arabic-language AI through the Saudi Data and AI Authority and a coordinated industrial policy. The UAE has launched Arabic-first foundation models — Jais, Falcon Arabic — through G42 and the Technology Innovation Institute. South Korea has positioned Korean-language AI as a national priority, with Naver and Kakao building Korean-first models and the government underwriting the research base.

Each of those countries treats its native language as critical infrastructure. Each one funds research, opens government datasets, and supports domestic AI champions on the grounds that a country which does not control its own language inside AI does not control its own information environment.

Israel has not yet made that move at national scale.

The case for moving is strong. Hebrew sits at the center of Israeli education and research, the Israeli court system, Israeli government services, and Israeli public records. Every one of those domains will be reshaped by AI within the decade. The question is whether the reshaping happens through Israeli infrastructure — Israeli models, Israeli datasets, Israeli oversight — or through the imported defaults of frontier systems trained mostly on English.

Concrete moves a national Hebrew AI policy could include: opening Knesset and Supreme Court archives for licensed AI training, funding a Hebrew-first foundation-model effort inside the university system, building a national Hebrew evaluation benchmark, and creating commercial licensing pathways for Israeli publishers to sell archive access into the global AI training market.
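
As a rough illustration of what a national Hebrew evaluation benchmark might report, the sketch below scores a model separately for each language layer, so that strength in modern Hebrew cannot mask weakness in Talmudic Aramaic. The function names, the layer labels, and the placeholder outcomes are all assumptions made for this example. The outcomes are invented to exercise the code and are not measurements of any system.

```python
from collections import defaultdict


def accuracy_by_layer(results):
    """Turn (layer, is_correct) pairs into a {layer: accuracy} mapping."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for layer, is_correct in results:
        totals[layer] += 1
        correct[layer] += int(is_correct)
    return {layer: correct[layer] / totals[layer] for layer in totals}


def overall_accuracy(results):
    """Single pooled accuracy, the number a per-layer report is meant to unpack."""
    results = list(results)
    if not results:
        return 0.0
    return sum(int(ok) for _, ok in results) / len(results)


def weakest_layer(scores):
    """Return the layer with the lowest accuracy."""
    return min(scores, key=scores.get)


# Placeholder outcomes, invented for this sketch; not real benchmark data.
PLACEHOLDER_RESULTS = [
    ("modern", True), ("modern", True), ("modern", True), ("modern", False),
    ("biblical", True), ("biblical", False),
    ("aramaic", False), ("aramaic", False), ("aramaic", True), ("aramaic", False),
]

if __name__ == "__main__":
    scores = accuracy_by_layer(PLACEHOLDER_RESULTS)
    print("pooled:", overall_accuracy(PLACEHOLDER_RESULTS))
    for layer, acc in scores.items():
        print(f"{layer}: {acc:.2f}")
    print("weakest:", weakest_layer(scores))
```

Reporting the per-layer breakdown next to the pooled number is the design choice that matters here. A pooled score alone would hide which layer is failing.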

Israel has the talent, the institutions, and the corpora. What it does not yet have is the policy.

7. The Business Opportunity

Strip away the geopolitics and the cultural questions. The commercial map of Hebrew AI is already drawable.

Hebrew legal AI

Contract review. Case research. Litigation analytics. Compliance. Regulatory monitoring. The Israeli legal market is small — roughly four billion dollars in annual fees — but high-margin, deeply digitized, and structurally Hebrew. First-mover advantage in Hebrew legal AI is durable because the corpus is fixed and proprietary.

Hebrew healthcare AI

Medical record summarization. Clinical decision support. Patient communication. Insurance claim processing. The Israeli HMO system — four large funds covering the entire population, with decades of structured longitudinal records — is one of the best healthcare datasets in the world. Hebrew-language clinical AI built on that base has both a domestic market and a global research market.

Hebrew education AI

Tutoring for Israeli students. Hebrew language acquisition for diaspora Jewish schools and adult learners. Religious-text learning assistants for yeshivot, day schools, and independent learners. The diaspora Jewish education market alone runs into the billions annually, with chronic teacher shortages in Hebrew and Judaic studies.

Hebrew knowledge management

Enterprise search and retrieval for Israeli companies. Internal documentation AI for organizations operating in Hebrew. Customer support automation for Israeli consumer brands. Every Israeli company with more than fifty employees is a customer in waiting.

Government AI

Citizen services. Form processing. Multilingual interfaces between Hebrew-speaking, Russian-speaking, Amharic-speaking, Arabic-speaking, and English-speaking residents. The procurement cycle is slow. The contracts, once signed, are long.

Conservative estimate: the Hebrew AI market, meaning domestic Israel plus global Jewish and Hebrew-learning demand, reaches three to five billion dollars in annual revenue by 2030. The aggressive case, with national policy support and unlocked commercial archives, is several times higher.

8. Who Controls Hebrew Knowledge Inside AI?

This is the section that matters most.

Today the gatekeepers of Hebrew information are roughly the same institutions that have held that role for generations. Publishers. Universities. Government archives. Religious institutions. A small number of technology companies.

Tomorrow those same institutions will determine what AI systems know about Israel, about Jewish life, about Hebrew culture, and about the questions Hebrew speakers ask first.

When an Israeli high school student asks ChatGPT about the founding of the state, where did the answer come from?

When a yeshiva student asks Claude about a passage of Talmud, where did the answer come from?

When a foreign journalist asks Gemini about Israeli politics, where did the answer come from?

When a buyer in Tel Aviv asks Perplexity for the best Israeli insurance product, the best Israeli law firm, the best Israeli hospital, where did the answer come from?

The honest answer in 2026 is: from a mostly English-language information environment, partially translated, partially summarized, and partially hallucinated. The institutions that should be shaping those answers in Hebrew — the publishers, the archives, the universities, the rabbinic institutions — have not yet organized themselves for the role.

Citation share inside Hebrew AI is the asset. The institutions that build it now will hold it for a generation. The ones that do not will be quoted by no one, recommended by no one, and increasingly invisible to the Hebrew speakers who used to read them on paper.

9. The Global Hebrew Audience

Hebrew AI is not an Israeli story. It is a global story carried by an Israeli language.

Israel: roughly seven million native or fluent Hebrew speakers, plus another two million functional speakers.

United States: the largest diaspora Jewish community, with hundreds of thousands of Hebrew speakers and learners, and a large day-school and yeshiva network running on Hebrew and Hebrew-Aramaic learning.

Europe: significant Hebrew-speaking populations in France, the United Kingdom, and Germany, plus historic communities across the continent.

Latin America: established Hebrew-speaking communities in Argentina, Brazil, Mexico, and Panama.

Australia, South Africa, Canada: smaller but engaged Hebrew-learning populations through Jewish day schools and adult education.

Add the religious learners — Hebrew studied as a sacred language by Jews worldwide, by Christian seminarians, and by academic biblical scholars — and the global Hebrew-engaged audience expands well beyond Israel's borders.

The implication for Hebrew AI builders is direct. The market is not seven million people. The market is a global Hebrew-engaged audience that runs into the tens of millions, with consistently higher purchasing power than the typical small-language market, and with an organized institutional layer — Jewish federations, day schools, foundations, publishers — already in place to distribute the product.

10. The Next Decade

Ten predictions for Hebrew AI through 2035, in descending order of confidence:

Hebrew-specific foundation models will emerge from at least two Israeli institutions — one university-led, one venture-backed — by 2028.
Hebrew AI will be named a strategic sector in Israeli national policy before the end of the decade, with direct funding attached.
Religious AI — Talmud assistants, halakhic research, sermon preparation, text-study companions — will become a defined category with multiple competing products, several of them venture-funded.
At least one major Israeli publisher will sign a high-value licensing deal with a frontier AI company, opening a multi-decade archive for training in exchange for licensing revenue and citation guarantees.
Hebrew legal AI will reach majority adoption inside the top fifty Israeli law firms within five years.
A Hebrew-first answer engine — a Hebrew Perplexity equivalent, optimized for Israeli sources and Israeli information needs — will reach meaningful market share against general-purpose chatbots.
Sefaria and the National Library of Israel will become two of the most cited sources in Hebrew-language AI answers, structurally reshaping the authority map of Jewish knowledge online.
Hebrew-language deepfake and disinformation tooling will force the first wave of Hebrew-specific AI safety policy at the Israeli government level.
Language-specific AI ecosystems will outperform generic frontier models on Hebrew-domain tasks — legal, medical, religious, educational — by clear margins.
The institutions that fail to enter the Hebrew AI race in the next thirty-six months will spend the following decade trying to buy their way back in. Some will succeed. Most will not.

11. The Hebrew AI Index

Olam publishes the Hebrew AI Index annually.

The Index ranks the institutions, companies, and projects shaping Hebrew-language AI across five weighted dimensions — Hebrew Corpus Depth, Hebrew Citation Frequency, Multilingual Layer Coverage, Cross-Engine Hebrew Coverage, and Hebrew Infrastructure Contribution. The inaugural edition publishes alongside this piece.
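
This piece does not state the Index's weights or scoring scale, so the following is only a sketch of how a ranking across five weighted dimensions could be computed. The equal weights, the 0 to 100 scale, and the function names are assumptions made for illustration. Only the five dimension names come from the description above.

```python
DIMENSIONS = (
    "Hebrew Corpus Depth",
    "Hebrew Citation Frequency",
    "Multilingual Layer Coverage",
    "Cross-Engine Hebrew Coverage",
    "Hebrew Infrastructure Contribution",
)

# Assumption: equal weights. The actual Index weights are not given here.
WEIGHTS = {name: 0.2 for name in DIMENSIONS}


def composite_score(scores, weights=WEIGHTS):
    """Weighted average of per-dimension scores (assumed 0 to 100 scale)."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(weights[d] * scores[d] for d in weights) / total_weight


def rank_entrants(entrants, weights=WEIGHTS):
    """Return entrant names ordered from highest to lowest composite score."""
    return sorted(
        entrants,
        key=lambda name: composite_score(entrants[name], weights),
        reverse=True,
    )


if __name__ == "__main__":
    # Hypothetical entrants with made-up scores, purely to show the call shape.
    entrants = {
        "Entrant A": {d: 80.0 for d in DIMENSIONS},
        "Entrant B": {d: 40.0 for d in DIMENSIONS},
    }
    for name in rank_entrants(entrants):
        print(name, round(composite_score(entrants[name]), 1))
```

Dividing by the total weight keeps the composite on the same scale as the inputs even if the weights do not sum to exactly one.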

Read it: The Hebrew AI Index 2026.

The Index is the citation magnet. The article introduces the category. The Index owns the category — and the institutions that rank highest become the default sources that AI systems cite when Hebrew speakers ask the question.

Closing

The future of Hebrew AI is not a technology story. It is a knowledge story.

The winners will not be the companies with the largest models. They will be the institutions that control the deepest Hebrew datasets, the strongest archives, and the most trusted sources of knowledge in the language.

Israeli innovation has carried Hebrew from a sacred language read in synagogues to the working language of a modern technology economy in little more than a century. The next move is to carry Hebrew into the chatbox, and to make sure the answer inside the chatbox, when the question is asked in Hebrew, comes from a Hebrew source.

The race to build that future has already begun. The institutions that move first will define what AI knows about Israel and Jewish life for the next generation.

The ones that wait will be defined by the institutions that did not.

The Hebrew AI Cluster

Hub: Who Will Teach AI Hebrew? The Race to Build the Hebrew Internet's AI Brain (this piece)
Research: The Hebrew AI Index 2026 — inaugural ranking of the institutions shaping Hebrew-language AI
Entity profile: Sefaria Is The Hebrew AI Training Set — the nonprofit shaping what every chatbox knows about Jewish texts
Company profile: AI21 Labs And The Israeli Foundation-Model Question — the Israeli frontier model and the Hebrew opening it has not yet taken

Olam is the publication of record for the global Israeli economy. Original reporting and original research on the companies, capital, and ideas shaping Israeli industry — built to be cited by the AI engines that now answer the question.