Astra DB (Cassandra)
DataStax Astra DB is a serverless vector-capable database built on Cassandra and made conveniently available through an easy-to-use JSON API.
In this walkthrough, we'll demo the SelfQueryRetriever with an Astra DB vector store.
Creating an Astra DB vector store
First we'll want to create an Astra DB VectorStore and seed it with some data. We've created a small demo set of documents that contain summaries of movies.
NOTE: The self-query retriever requires you to have lark installed (pip install lark). We also need the astrapy package.
%pip install --upgrade --quiet lark astrapy langchain-openai
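As a quick sanity check after installing, the sketch below reports which of the required packages cannot be found in the current environment. The helper name missing_packages is hypothetical and not part of any of these libraries; it relies only on the standard library's importlib.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Module names assumed from the pip packages installed above
# (langchain-openai is imported as langchain_openai).
print(missing_packages(["lark", "astrapy", "langchain_openai"]))
```

An empty list means all three modules can be located; any name printed still needs to be installed.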
We want to use OpenAIEmbeddings, so we need an OpenAI API key.
import os
from getpass import getpass
from langchain_openai.embeddings import OpenAIEmbeddings
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass("OpenAI API Key:")
embeddings = OpenAIEmbeddings()
Create the Astra DB VectorStore:
- the API Endpoint looks like https://01234567-89ab-cdef-0123-456789abcdef-us-east1.apps.astra.datastax.com
- the Token looks like AstraCS:6gBhNmsk135....
ASTRA_DB_API_ENDPOINT = input("ASTRA_DB_API_ENDPOINT = ")
ASTRA_DB_APPLICATION_TOKEN = getpass("ASTRA_DB_APPLICATION_TOKEN = ")
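Since both values are pasted in by hand, a light format check can catch copy-paste mistakes before any request is sent. The sketch below is only a heuristic: the pattern is inferred from the two example values shown above, not from an official specification, and the helper names looks_like_astra_endpoint and looks_like_astra_token are hypothetical.

```python
import re

# Assumed shape, inferred only from the example endpoint above:
# https://<database-uuid>-<region>.apps.astra.datastax.com
ENDPOINT_PATTERN = re.compile(
    r"^https://[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}"
    r"-[a-z0-9-]+\.apps\.astra\.datastax\.com/?$"
)


def looks_like_astra_endpoint(value: str) -> bool:
    """Heuristic check that the value resembles the example API endpoint."""
    return ENDPOINT_PATTERN.match(value.strip()) is not None


def looks_like_astra_token(value: str) -> bool:
    """Heuristic check that the value carries the 'AstraCS:' prefix."""
    return value.strip().startswith("AstraCS:")


example = "https://01234567-89ab-cdef-0123-456789abcdef-us-east1.apps.astra.datastax.com"
print(looks_like_astra_endpoint(example))
print(looks_like_astra_token("AstraCS:6gBhNmsk135...."))
```

A failed check does not prove the value is wrong, since real endpoints may use shapes this pattern does not anticipate; treat it as a prompt to double-check the pasted string.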
from langchain_community.vectorstores import AstraDB
from langchain_core.documents import Document
docs = [
    Document(
        page_content="A bunch of scientists bring back dinosaurs and mayhem breaks loose",
        metadata={"year": 1993, "rating": 7.7, "genre": "science fiction"},
    ),
    Document(
        page_content="Leo DiCaprio gets lost in a dream within a dream within a dream within a ...",
        metadata={"year": 2010, "director": "Christopher Nolan", "rating": 8.2},
    ),
    Document(
        page_content="A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea",
        metadata={"year": 2006, "director": "Satoshi Kon", "rating": 8.6},
    ),
    Document(
        page_content="A bunch of normal-sized women are supremely wholesome and some men pine after them",
        metadata={"year": 2019, "director": "Greta Gerwig", "rating": 8.3},
    ),
    Document(
        page_content="Toys come alive and have a blast doing so",
        metadata={"year": 1995, "genre": "animated"},
    ),
    Document(
        page_content="Three men walk into the Zone, three men walk out of the Zone",
        metadata={
            "year": 1979,
            "director": "Andrei Tarkovsky",
            "genre": "science fiction",
            "rating": 9.9,
        },
    ),
]
vectorstore = AstraDB.from_documents(
    docs,
    embeddings,
    collection_name="astra_self_query_demo",
    api_endpoint=ASTRA_DB_API_ENDPOINT,
    token=ASTRA_DB_APPLICATION_TOKEN,
)
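Note that the demo documents do not all carry the same metadata keys: some have no rating, others no director or genre. A self-query retriever works by turning a question into a structured filter over this metadata, so it helps to see what such a filter selects. The sketch below is a purely local illustration using plain dicts as stand-ins for the metadata above; it does not call Astra DB or LangChain, and the helper filter_by_min_rating is hypothetical.

```python
# Plain-dict stand-ins for the metadata of the demo documents above.
movies = [
    {"year": 1993, "rating": 7.7, "genre": "science fiction"},
    {"year": 2010, "director": "Christopher Nolan", "rating": 8.2},
    {"year": 2006, "director": "Satoshi Kon", "rating": 8.6},
    {"year": 2019, "director": "Greta Gerwig", "rating": 8.3},
    {"year": 1995, "genre": "animated"},
    {
        "year": 1979,
        "director": "Andrei Tarkovsky",
        "genre": "science fiction",
        "rating": 9.9,
    },
]


def filter_by_min_rating(items, threshold):
    """Keep entries that have a 'rating' strictly greater than the threshold."""
    return [m for m in items if m.get("rating", float("-inf")) > threshold]


highly_rated = filter_by_min_rating(movies, 8.5)
print([m["year"] for m in highly_rated])  # [2006, 1979]

sci_fi = [m for m in movies if m.get("genre") == "science fiction"]
print([m["year"] for m in sci_fi])  # [1993, 1979]
```

In this local sketch, entries that lack the filtered field (for example the 1995 film with no rating) simply drop out of the result.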