News
The news endpoint allows you to perform low-latency searches on our enriched news vector database, which is updated every 5 minutes.
The API reference provides the most detailed information.
Building your query
AskNews is built on a flexible vector database (Qdrant) that has two types of indices: dense vectors (semantic search) and sparse vectors (keyword search). In both cases, you can use natural language as your query; the difference is how your query is treated in our databases. The beauty of Qdrant is that we can combine these in a hybrid search, add constraints to them (with filters such as string_guarantee, categories, etc.), and do it all in under a second (most of the time under 100 milliseconds).
Your query can be any phrase, keyword, question, or paragraph. method='kw' is the most common and most effective option; it is a sparse representation of keyword matching with term expansion. You can use method='kw' with full paragraphs and you will get great results. We also support more semantic matching with method='nl', which matches general semantics and meaning instead of direct terms. If you want the best of both worlds, use method='both' for a re-ranked retrieval. In all cases, your query can be as long as a paragraph of natural English text. You read that right: there is no special query language to learn for AskNews, just plain English, which is why AskNews excels at LLM interactions!
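As a minimal sketch of the three methods described above, the snippet below assembles search parameters and rejects any method other than 'kw', 'nl', or 'both'. The helper build_search_params is hypothetical (it is not part of AskNews), and its default of 'kw' is chosen only because the text calls it the most common option.

```python
from typing import Any, Dict

# The three retrieval methods named in the text above.
VALID_METHODS = {"kw", "nl", "both"}


def build_search_params(query: str, method: str = "kw") -> Dict[str, Any]:
    """Hypothetical helper: assemble keyword arguments for a news search."""
    if method not in VALID_METHODS:
        raise ValueError(
            f"method must be one of {sorted(VALID_METHODS)}, got {method!r}"
        )
    return {"query": query, "method": method}


# Keyword matching with term expansion.
keyword_params = build_search_params("Tesla stock price")
# Semantic matching on meaning rather than direct terms.
semantic_params = build_search_params(
    "Tesla's stock price is soaring", method="nl"
)
# Hybrid, re-ranked retrieval; the query can be a full sentence or paragraph.
hybrid_params = build_search_params(
    "Why is Tesla's stock price moving so sharply this week?", method="both"
)

# Assumption: with the AskNews Python SDK these would be passed as keyword
# arguments, roughly ask.news.search_news(**hybrid_params). That call needs
# the SDK and credentials, so it is not executed here.
print(keyword_params)
```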
Please note that method='kw' is keyword matching, so most of the time it gets you the closest match in the database to your query. But if you have a strict requirement to run your search only on documents that must contain (or must not contain) a specific keyword or set of keywords, you can use the parameter string_guarantee (or reverse_string_guarantee for exclusion), as demonstrated below.
You can also use AskNews without any similarity search, by foregoing query and simply defining your filters (outlined below). In this case, you scroll through all results associated with your filters. This is great if you want all articles associated with a set of filters, and you do not want to run any similarity search.
Example usage
You can also guarantee that your search runs on documents containing an explicit string by using a list of strings in the string_guarantee field. (There is also a reverse_string_guarantee to search on articles that do not contain any of your list of strings.)
But if you want to get the latest news articles semantically similar to "Tesla's stock price is soaring", you could use method='nl' with that sentence as your query. And you could, of course, use string_guarantee with your nl search as well.
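The following sketch shows how these parameters might be combined. The parameter names come from the text above; the values are illustrative, and check_string_list is a hypothetical helper that guards against passing a bare string where a list of strings is expected.

```python
from typing import Any, Dict, List


def check_string_list(value: Any) -> List[str]:
    """Hypothetical guard: the guarantee fields take a list of strings."""
    if isinstance(value, str):
        raise TypeError("pass a list of strings, e.g. ['Tesla'], not a bare string")
    if not all(isinstance(item, str) for item in value):
        raise TypeError("every entry must be a string")
    return list(value)


# Semantic search that only runs on documents containing "Tesla".
must_contain: Dict[str, Any] = {
    "query": "Tesla's stock price is soaring",
    "method": "nl",
    "string_guarantee": check_string_list(["Tesla"]),
}

# Keyword search that skips documents containing any of the listed strings.
must_not_contain: Dict[str, Any] = {
    "query": "electric vehicle sales",
    "method": "kw",
    "reverse_string_guarantee": check_string_list(["Tesla", "Elon Musk"]),
}

print(must_contain)
print(must_not_contain)
```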
Scrolling on documents without similarity search
You can forego similarity search entirely by not defining a query and simply defining your filters. This is great if you want all articles associated with a set of filters, and you do not want to run any similarity search.
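A filter-only request could look roughly like the sketch below. The filter values are illustrative, and is_scroll_request is a hypothetical helper that simply checks that no query was supplied.

```python
from typing import Any, Dict

# Filters only: there is no "query" key, so no similarity search is requested.
scroll_params: Dict[str, Any] = {
    "categories": ["Politics"],
    "string_guarantee": ["Bundestag"],
    "n_articles": 10,
}


def is_scroll_request(params: Dict[str, Any]) -> bool:
    """Hypothetical check: a scroll request is one that defines no query."""
    return "query" not in params


print(is_scroll_request(scroll_params))
```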
Controlling the time period of your search
In most cases, you are looking to search through the latest news. By default, the search covers the most recent 48 hours. However, if you want more fine-grained control over the search period, you can do so using start_timestamp and end_timestamp.
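As a sketch, a relative search window can be computed with the standard library. The helper last_hours_window is hypothetical, and it assumes the API expects integer Unix timestamps in seconds; confirm the expected format in the API reference.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional


def last_hours_window(hours: float, now: Optional[datetime] = None) -> Dict[str, int]:
    """Return start/end timestamps covering the last `hours` hours."""
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(hours=hours)
    # Assumption: integer Unix timestamps in seconds.
    return {
        "start_timestamp": int(start.timestamp()),
        "end_timestamp": int(end.timestamp()),
    }


# Restrict a keyword search to the last three hours.
params = {"query": "Tesla stock price", "method": "kw", **last_hours_window(3)}
print(params)
```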
We maintain two databases: one is hot and holds the latest 48 hours of news; the other is an archive and goes back to 2023. If you want to search the archive, you can set historical=True. You can use start_timestamp and end_timestamp to control the period of the search in the archive in the same way.
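An archive search over a fixed window might be parameterized as in this sketch. The query and dates are illustrative, and the timestamps are again assumed to be integer Unix seconds.

```python
from datetime import datetime, timezone

# An absolute one-week window, well outside the 48-hour hot database.
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
end = datetime(2024, 1, 8, tzinfo=timezone.utc)

archive_params = {
    "query": "German federal budget",
    "historical": True,  # search the archive instead of the hot database
    "start_timestamp": int(start.timestamp()),  # assumption: Unix seconds
    "end_timestamp": int(end.timestamp()),
}

print(archive_params)
```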
Choosing your return type
You can choose to return the search results in three ways: string, dicts, or both. The string format is prompt-optimized and ready to be immediately injected into any prompt. The dicts format is a list of structured dictionaries, containing structured information and additional metadata (like a classic news API).
String return object
The .as_string object is a string that is optimized for injecting directly into your prompt. It is structured like this:
output truncated for brevity
Dicts return object
Meanwhile, the .as_dicts object is a list of dictionaries structured like this:
Detailed and updated response structures are always available in the API reference.
You can also choose to return both formats. This may be useful if you want to ask your LLM to start making citations.
The script would output:
Citation key: [1] with link https://www.sueddeutsche.de/meinung/kommentar-bundesverfassungsgericht-schutz-demokratie-1.6501233
Citation key: [2] with link https://www.welt.de/politik/deutschland/plus250797354/Friedrich-Merz-Mehrheit-weiss-im-tiefsten-Inneren-wie-es-um-das-Land-steht.html
Citation key: [3] with link https://www.welt.de/politik/deutschland/plus250802278/Russland-Kurs-SPD-entfernt-sich-von-ihrer-Kernwaehlerschaft-wird-von-Funktionaeren-dominiert.html
Citation key: [4] with link https://www.tichyseinblick.de/meinungen/erstwaehler-konservativ
Citation key: [5] with link https://www.sueddeutsche.de/politik/demonstrationen-rechtsextremismus-afd-remigration-1.6497936
Citation key: [6] with link https://www.gmx.net/magazine/politik/politische-talkshows/frust-staat-kommunalpolitiker-klagen-markus-lanz-ampel-regierung-39488786
Citation key: [7] with link https://web.de/magazine/politik/us-politik/gruenen-politiker-schaefer-transatlantische-verhaeltnis-pflegt-39480726
Citation key: [8] with link https://www.saarbruecker-zeitung.de/nachrichten/politik/inland/baerbock-neuwahl-vorstoss-ist-parteipolitisches-spielchen_aid-109767673
Citation key: [9] with link https://www.gazeta.ru/business/news/2024/03/29/22661113.shtml
Citation key: [10] with link https://www.lefigaro.fr/conjoncture/la-panne-de-l-economie-allemande-pese-lourdement-sur-l-europe-20240328
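Lines of that shape could be produced by a loop like the one sketched here, which also maps citation markers in an LLM answer back to their links. The field names as_string_key and article_url are assumptions about the article dictionaries (check the API reference), and the two sample entries reuse the first two links from the output above.

```python
import re
from typing import Dict, List

# Sample data shaped like a list of article dictionaries.
# Assumption: the fields are named "as_string_key" and "article_url".
articles: List[Dict[str, str]] = [
    {
        "as_string_key": "[1]",
        "article_url": "https://www.sueddeutsche.de/meinung/kommentar-bundesverfassungsgericht-schutz-demokratie-1.6501233",
    },
    {
        "as_string_key": "[2]",
        "article_url": "https://www.welt.de/politik/deutschland/plus250797354/Friedrich-Merz-Mehrheit-weiss-im-tiefsten-Inneren-wie-es-um-das-Land-steht.html",
    },
]


def citation_lines(items: List[Dict[str, str]]) -> List[str]:
    """Format one line per article, pairing its citation key with its link."""
    return [
        f"Citation key: {a['as_string_key']} with link {a['article_url']}"
        for a in items
    ]


def resolve_citations(answer: str, items: List[Dict[str, str]]) -> Dict[str, str]:
    """Map each [n] marker found in an LLM answer to the matching article link."""
    links = {a["as_string_key"]: a["article_url"] for a in items}
    found = re.findall(r"\[\d+\]", answer)
    return {key: links[key] for key in found if key in links}


for line in citation_lines(articles):
    print(line)

answer = "There are doubts about Germany's internal security [1]."
print(resolve_citations(answer, articles))
```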
Response from GPT3.5
The current political situation in Germany involves several key developments:
- Internal Security Concerns: There are doubts about Germany's internal security, with an increasing focus on strengthening the Federal Constitutional Court. This indicates a growing sense of insecurity about both external threats and internal vulnerabilities within the state [1].
- Budget and Defense: The CDU is facing financial challenges, with Party leader Friedrich Merz questioning the feasibility of spending 40 billion euros on citizen's income while maintaining defense capabilities. This could influence the Union's campaign strategy for the upcoming federal elections [2].
- Shifts in Political Positions: The SPD is facing criticism for moving away from its core voter base and reviving old policies, particularly concerning Russia. This has led to concerns about the party leadership's approach and its impact on foreign policy [3].
- Youth Political Leanings: A survey found that a significant percentage of first-time voters trust far-right parties like the AfD to solve European problems. However, there is also a trend towards left-green and far-left views among young people, especially on university campuses, where protests and discussions on various social and political issues are prevalent [4].
- Political Protests: There have been protests against right-wing extremism and the AfD in Germany, with a notable decrease in participation in recent demonstrations. The reasons for this decline are being investigated [5].
LLM output truncated for brevity
Controlling similarity
If you would like to be particular about your search results, you can control the similarity score threshold. This is a value between 0 and 1, where 1 is an exact match and 0 is a very loose match. By default, the similarity score threshold is set to 0.5.
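A small sketch of setting a stricter threshold follows. The parameter name similarity_score_threshold is the one used elsewhere in this guide; with_similarity_threshold is a hypothetical helper that enforces the 0 to 1 range before the value is sent.

```python
from typing import Any, Dict


def with_similarity_threshold(params: Dict[str, Any], threshold: float = 0.5) -> Dict[str, Any]:
    """Return a copy of params with a validated similarity threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError(f"threshold must be between 0 and 1, got {threshold}")
    return {**params, "similarity_score_threshold": threshold}


base = {"query": "Tesla stock price", "method": "kw"}
# A higher value keeps only closer matches; a lower value loosens the search.
strict = with_similarity_threshold(base, 0.8)
print(strict)
```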
Searching on categories
We have classified all the articles in our database into the following categories: "Business", "Crime", "Politics", "Science", "Sports", "Technology", "Military", "Health", "Entertainment". You can filter your search results by setting the categories parameter.
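Because the category names are a fixed list, it can help to validate them before sending a request. This sketch uses the nine names listed above; validate_categories is a hypothetical helper, and whether the API itself is case-sensitive is not stated here.

```python
from typing import Iterable, List

# The nine categories listed in the text above.
CATEGORIES = (
    "Business", "Crime", "Politics", "Science", "Sports",
    "Technology", "Military", "Health", "Entertainment",
)


def validate_categories(categories: Iterable[str]) -> List[str]:
    """Return the categories as a list, rejecting any name not in CATEGORIES."""
    requested = list(categories)
    unknown = [c for c in requested if c not in CATEGORIES]
    if unknown:
        raise ValueError(f"unknown categories: {unknown}")
    return requested


params = {
    "query": "chip export controls",
    "categories": validate_categories(["Technology", "Politics"]),
}
print(params)
```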
Pagination
If you want to paginate through the search results, you can set the offset parameter. This is the number of articles to skip to get to your page of interest. For example, if you want to get page 3 with an n_articles page size of 10, you would set offset to 20.
We conveniently return response.offset
in every response object, which can be injected into your next call to obtain the next page of results.
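The offset arithmetic can be sketched as follows. page_offset is a hypothetical helper that uses 1-based page numbers, matching the page 3 example above.

```python
def page_offset(page: int, page_size: int) -> int:
    """Number of articles to skip to reach a 1-based page of a given size."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must both be at least 1")
    return (page - 1) * page_size


n_articles = 10
params = {
    "query": "Tesla stock price",
    "n_articles": n_articles,
    "offset": page_offset(3, n_articles),
}
print(params["offset"])  # 20
```

When walking through pages sequentially, the offset returned with each response can be fed into the next call instead of recomputing it.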
Controlling the document delimiters
Different LLMs may work better with different document delimiters in the prompt. You can control the document start and end delimiters by setting the doc_start_delimiter and doc_end_delimiter parameters. The default values for these are "<doc>" and "</doc>", respectively.
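To illustrate what the delimiters do inside a prompt, the sketch below wraps plain document strings locally. It is not the AskNews formatter, and the exact layout of the returned string may differ; wrap_documents is a hypothetical helper whose keyword names mirror the two parameters.

```python
from typing import List


def wrap_documents(
    docs: List[str],
    doc_start_delimiter: str = "<doc>",
    doc_end_delimiter: str = "</doc>",
) -> str:
    """Wrap each document in start/end delimiters and join them for a prompt."""
    return "\n\n".join(
        f"{doc_start_delimiter}\n{doc}\n{doc_end_delimiter}" for doc in docs
    )


docs = ["First article summary.", "Second article summary."]
print(wrap_documents(docs))
# The same documents with custom delimiters for a model that prefers them.
print(wrap_documents(docs, doc_start_delimiter="<<<", doc_end_delimiter=">>>"))
```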
Diversifying sources
If you want to ensure a diverse representation of sources is returned with your search, you can use the diversify_sources parameter. This parameter tells AskNews to cast a wider net during the search, analyze the source distribution of the results, and then return a set of articles whose source distribution matches that of the wider net. Warning: this will add some latency to the call.
Filtering by Page Rank or Domain
You can filter the search results by the page rank of the articles or by the domain of the articles. This can be useful if you want to ensure that only high-quality articles are returned, or if you want to restrict the search to a specific set of sources.
Meanwhile, if you want to filter by domain, you can do so like this:
Filtering by domain restricts the search quite a bit, so you will want to also increase your similarity_score_threshold, and likely also look deeper into the past using the start_timestamp and end_timestamp parameters.
Filtering on Entities, Countries, Continents, and Languages
AskNews trained and released a state-of-the-art entity extraction model, which has been used to extract entities from all our articles. You can leverage these to guarantee that your search results contain any of our long list of entities.
Returning Knowledge Graphs
AskNews maintains the largest news knowledge graph on the planet, and you can leverage the underlying data with a simple parameter setting. This adds the key entity_relation_graph to the article dictionary, with a structure that looks like:
    "entity_relation_graph": {
        "nodes": [
            {
                "detailed_type": "sporting event",
                "id": "Olympic Games",
                "ner_type": "Event",
                "type": "event"
            },
            {
                "detailed_type": "magazine",
                "id": "Hérodote",
                "ner_type": "Media",
                "type": "organization"
            },
            {
                "detailed_type": "historical figure",
                "id": "Baron Pierre de Coubertin",
                "ner_type": "Person",
                "type": "person"
            },
            {
                "detailed_type": "geographer",
                "id": "Béatrice Giblin",
                "ner_type": "Person",
                "type": "person"
            },
            ...
        ],
        "edges": [
            {
                "from": "Olympic Games",
                "label": "reported by",
                "to": "Hérodote"
            },
            {
                "from": "Olympic Games",
                "label": "compared to",
                "to": "Baron Pierre de Coubertin"
            },
            {
                "from": "Olympic Games",
                "label": "commented by",
                "to": "Béatrice Giblin"
            },
            ...
        ]
    }
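Once a graph of this shape is in hand, it can be walked with ordinary dictionary code. The sketch below copies the nodes and edges shown above (without the elided entries) and groups outgoing relations by source entity; outgoing_relations is a hypothetical helper, not part of AskNews.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

# The sample graph from above, without the elided entries.
graph: Dict[str, List[Dict[str, Any]]] = {
    "nodes": [
        {"detailed_type": "sporting event", "id": "Olympic Games", "ner_type": "Event", "type": "event"},
        {"detailed_type": "magazine", "id": "Hérodote", "ner_type": "Media", "type": "organization"},
        {"detailed_type": "historical figure", "id": "Baron Pierre de Coubertin", "ner_type": "Person", "type": "person"},
        {"detailed_type": "geographer", "id": "Béatrice Giblin", "ner_type": "Person", "type": "person"},
    ],
    "edges": [
        {"from": "Olympic Games", "label": "reported by", "to": "Hérodote"},
        {"from": "Olympic Games", "label": "compared to", "to": "Baron Pierre de Coubertin"},
        {"from": "Olympic Games", "label": "commented by", "to": "Béatrice Giblin"},
    ],
}


def outgoing_relations(g: Dict[str, List[Dict[str, Any]]]) -> Dict[str, List[Tuple[str, str, str]]]:
    """Group edges by source node as (label, target id, target type) tuples."""
    node_types = {node["id"]: node["type"] for node in g["nodes"]}
    relations: Dict[str, List[Tuple[str, str, str]]] = defaultdict(list)
    for edge in g["edges"]:
        target = edge["to"]
        relations[edge["from"]].append(
            (edge["label"], target, node_types.get(target, "unknown"))
        )
    return dict(relations)


for label, target, target_type in outgoing_relations(graph)["Olympic Games"]:
    print(f"Olympic Games --{label}--> {target} ({target_type})")
```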
Advanced filtering
There are many other filtering options available, including enforcing maximum page rank, filtering by the domain url of interest, provocative rating, reporting voice types, and more. You can find all the available options in the API reference.