Google Scholar — the largest and most comprehensive scholarly search engine — turns 20 this week. Over its two decades, some researchers say, the tool has become one of the most important in science. But in recent years, competitors that use artificial intelligence (AI) to improve the search experience have emerged, as have others that allow users to download their data.
The impact that Google Scholar — which is owned by web giant Google in Mountain View, California — has had on science is remarkable, says Jevin West, a computational social scientist at the University of Washington in Seattle who uses the database daily. But “if there was ever a moment when Google Scholar could be overthrown as the main search engine, it might be now, because of some of these new tools and some of the innovation that’s happening in other places,” West says.
Many of Google Scholar’s advantages — free access, breadth of information and sophisticated search options — “are now being shared by other platforms”, says Alberto Martín-Martín, a bibliometrics researcher at the University of Granada in Spain.
AI-powered chatbots such as ChatGPT and other tools that use large language models have become go-to applications for some scientists when it comes to searching, reviewing and summarizing the literature. And some researchers have swapped Google Scholar for them. “Up until recently, Google Scholar was my default search,” says Aaron Tay, an academic librarian at Singapore Management University. It’s still top of his list, but “recently, I started using other AI tools”.
Still, given Google Scholar’s size and how deeply entrenched it is in the scientific community, “it would take a lot to dethrone”, adds West.
Anurag Acharya, co-founder of Google Scholar at Google, says he welcomes all efforts to make scholarly information easier to find, understand and build on. “The more we can all do, the better it is for the advancement of science.”
Biggest and broadest
Google Scholar came onto the literature-search scene in 2004 and changed everything. At the time, researchers used libraries to find information or searched for academic papers by accessing paid online services such as the science-citation database Web of Science. Another paid service launched the same month as Google Scholar — Elsevier’s Scopus, a large database of scientific references and abstracts.
Google Scholar crawled the web for scholarly work of any kind, such as book chapters, reports, preprints and web documents — including those in languages other than English. The goal was “to make researchers of the world more effective, to help make it possible for everybody to be able to stand on a common frontier of science”, says Acharya.
Google Scholar’s agreements with publishers give it unrivalled access to the full text of articles behind paywalls — not just titles and abstracts, which is what most search engines offer. It ranks papers by how relevant they are to a search query — typically bringing the most-cited articles to the top — and suggests further queries. Its depth of coverage facilitates highly specific searches.
Google declined to share usage data for the service, but according to the web-traffic meter Similarweb, Google Scholar receives more than 100 million visits per month.
The database is also very good at pointing people to free versions of an article, says Martín-Martín. This promotes the open-access movement, says José Luis Ortega, a bibliometrician at the Institute of Advanced Social Studies, Spanish National Research Council in Córdoba.
But in other ways, Google Scholar is opaque. Among the key concerns is a lack of insight into what content, including what journals, it searches and the algorithm it uses to recommend articles. It also restricts bulk downloads of its search results, which could be used for bibliometric analyses among other things. “We don’t have a lot of insight into one of the most valuable tools that we have in science,” says West.
Acharya says Google Scholar is chiefly a search tool and its main goal is to help scholars find the most useful research.
Updated engines
In the past few years, competitors have emerged that offer this kind of bibliometrics data, although none can beat Google Scholar’s size and access to full-text articles behind paywalls. One notable example is the index OpenAlex, which launched in 2022. The previous year, Microsoft Academic Graph, which crawled the web for scholarly information, had been discontinued and its entire data set released. OpenAlex builds on this and other open sources of scholarly data. Users can search the content it catalogues by authors, institutions and citations and also download its entire set of records for free. “They are doing what we wanted Google Scholar to do,” says Martín-Martín.
Another popular research tool, Semantic Scholar, launched in 2015, uses AI to create readable summaries of papers and identify their most relevant citations. Another tool, Consensus, launched in 2022, relies on Semantic Scholar’s database to find answers to questions informed by research (West is an adviser for Consensus). One of Tay’s favourites is Undermind, which uses a more sophisticated agent-based search, in which an autonomous entity scans the scientific literature the way a human would, adapting the search based on the content it finds. It takes a few minutes — as opposed to seconds for Google Scholar — to spit out results, but Tay says the wait is worth it. “I find the quality of the results that come back are better than Google Scholar.”
Acharya says Google Scholar also uses AI to rank articles, suggest further search queries and recommend related articles. And earlier this month, the company introduced AI-generated article outlines to its PDF reader. Acharya also says the search tool tries to understand the intent and context behind a query. This semantic search approach is based on language models and has been in use for about two years, he says.
One thing Google Scholar does not yet do is include AI-generated overviews of answers to a searched query, similar to those that are now found at the top of a typical Google search. Acharya says that summarizing conclusions from multiple papers in a way that is succinct and includes important context is challenging. “We haven’t yet seen an effective solution to this challenge,” he says.