title: Building Your Own Database Agent date: 2024-07-16 tags:

llm updated: 2024-07-16 up:

Building Your Own Database Agent

https://learn.deeplearning.ai/courses/building-your-own-database-agent/lesson/1/introduction ![[Database Agent 1.png]] ![[Artificial Intelligence.png]] ![[Fine tuning.png]] ![[Database Agents.png]]

from langchain.agents.agent_types import AgentType

from langchain_experimental.agents.agent_toolkits import create_pandas_dataframe_agent

agent = create_pandas_dataframe_agent(llm=model,df=df,verbose=True)

agent.invoke("how many rows are there?")

> Entering new AgentExecutor chain...

Thought: To find out the number of rows in a pandas DataFrame, I can use the shape attribute which returns a tuple with the number of rows and columns. The first element of the tuple will give me the number of rows.

Action: python_repl_ast

Action Input: df.shape[0]

Observation: 20780

Thought:I now know the final answer.

Final Answer: There are 20780 rows in the dataframe df.

> Finished chain.

{'input': 'how many rows are there?',

'output': 'There are 20780 rows in the dataframe df.'}

CSV_PROMPT_PREFIX = """

First set the pandas display options to show all the columns,

get the column names, then answer the question.

"""

CSV_PROMPT_SUFFIX = """

ALWAYS before giving the Final Answer, try another method.

Then reflect on the answers of the two methods you did and ask yourself

if it answers correctly the original question.

If you are not sure, try another method.

If the methods tried do not give the same result,reflect and

try again until you have two methods that have the same result.

If you still cannot arrive to a consistent result, say that

you are not sure of the answer.

If you are sure of the correct answer, create a beautiful

and thorough response using Markdown.

**DO NOT MAKE UP AN ANSWER OR USE PRIOR KNOWLEDGE,

ONLY USE THE RESULTS OF THE CALCULATIONS YOU HAVE DONE**.

ALWAYS, as part of your "Final Answer", explain how you got

to the answer on a section that starts with: "\n\nExplanation:\n".

In the explanation, mention the column names that you used to get

to the final answer.

"""

QUESTION = "How may patients were hospitalized during July 2020"

"in Texas, and nationwide as the total of all states?"

"Use the hospitalizedIncrease column"

agent.invoke(CSV_PROMPT_PREFIX + QUESTION + CSV_PROMPT_SUFFIX) ![[Database.png]] ![[Pasted Graphic 7.png]]

Path to your SQLite database file

database_file_path = "./db/test.db"

Create an engine to connect to the SQLite database

SQLite only requires the path to the database file

engine = create_engine(f'sqlite:///{database_file_path}')

file_url = "./data/all-states-history.csv"

df = pd.read_csv(file_url).fillna(value = 0)

df.to_sql(

'all_states_history',

con=engine,

if_exists='replace',

index=False

)

MSSQL_AGENT_PREFIX = """

You are an agent designed to interact with a SQL database.

Instructions:

Given an input question, create a syntactically correct {dialect} query

to run, then look at the results of the query and return the answer.

Unless the user specifies a specific number of examples they wish to

obtain, ALWAYS limit your query to at most {top_k} results.

You can order the results by a relevant column to return the most

interesting examples in the database.

Never query for all the columns from a specific table, only ask for

the relevant columns given the question.

You have access to tools for interacting with the database.
You MUST double check your query before executing it.If you get an error

while executing a query,rewrite the query and try again.

DO NOT make any DML statements (INSERT, UPDATE, DELETE, DROP etc.)

to the database.

DO NOT MAKE UP AN ANSWER OR USE PRIOR KNOWLEDGE, ONLY USE THE RESULTS

OF THE CALCULATIONS YOU HAVE DONE.

Your response should be in Markdown. However, **when running a SQL Query

in "Action Input", do not include the markdown backticks**.

Those are only for formatting the response, not for executing the command.

ALWAYS, as part of your final answer, explain how you got to the answer

on a section that starts with: "Explanation:". Include the SQL query as

part of the explanation section.

If the question does not seem related to the database, just return

"I don't know" as the answer.

Only use the below tools. Only use the information returned by the

below tools to construct your query and final answer.

Do not make up table names, only use the tables returned by any of the

tools below.

Tools:

"""

MSSQL_AGENT_FORMAT_INSTRUCTIONS = """

Use the following format:

Question: the input question you must answer.

Thought: you should always think about what to do.

Action: the action to take, should be one of [{tool_names}].

Action Input: the input to the action.

Observation: the result of the action.

... (this Thought/Action/Action Input/Observation can repeat N times)

Thought: I now know the final answer.

Final Answer: the final answer to the original input question.

Example of Final Answer:

<=== Beginning of example

Action: query_sql_db

Action Input:

SELECT TOP (10) [death]

FROM covidtracking

WHERE state = 'TX' AND date LIKE '2020%'

Observation:

[(27437.0,), (27088.0,), (26762.0,), (26521.0,), (26472.0,), (26421.0,), (26408.0,)]

Thought:I now know the final answer

Final Answer: There were 27437 people who died of covid in Texas in 2020.

Explanation:

I queried the covidtracking table for the death column where the state

is 'TX' and the date starts with '2020'. The query returned a list of tuples

with the number of deaths for each day in 2020. To answer the question,

I took the sum of all the deaths in the list, which is 27437.

I used the following query


SELECT [death] FROM covidtracking WHERE state = 'TX' AND date LIKE '2020%'"

===> End of Example

"""

llm = AzureChatOpenAI(

openai_api_version="2023-05-15",

azure_deployment="gpt-4-1106",

azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),

temperature=0,

max_tokens=500

)

db = SQLDatabase.from_uri(f'sqlite:///{database_file_path}')

toolkit = SQLDatabaseToolkit(db=db, llm=llm)

QUESTION = """How may patients were hospitalized during October 2020

in New York, and nationwide as the total of all states?

Use the hospitalizedIncrease column

"""

agent_executor_SQL = create_sql_agent(

prefix=MSSQL_AGENT_PREFIX,

format_instructions = MSSQL_AGENT_FORMAT_INSTRUCTIONS,

llm=llm,

toolkit=toolkit,

top_k=30,

verbose=True

) ![[Function Calling.png]]

def get_current_weather(location, unit="fahrenheit"):

"""Get the current weather in a given location.

The default unit when not specified is fahrenheit"""

if "new york" in location.lower():

return json.dumps(

{"location": "New York", "temperature": "40", "unit": unit}

)

elif "san francisco" in location.lower():

return json.dumps(

{"location": "San Francisco", "temperature": "50", "unit": unit}

)

elif "las vegas" in location.lower():

return json.dumps(

{"location": "Las Vegas", "temperature": "70", "unit": unit}

)

else:

return json.dumps(

{"location": location, "temperature": "unknown"}

)

get_current_weather("New York")

messages = [

{"role": "user",

"content": """What's the weather like in San Francisco,

New York, and Las Vegass?"""

}

]

tools = [

{

"type": "function",

"function": {

"name": "get_current_weather",

"description": """Get the current weather in a given

location.The default unit when not

specified is fahrenheit""",

"parameters": {

"type": "object",

"properties": {

"location": {

"type": "string",

"description": """The city and state,

e.g. San Francisco, CA""",

"unit": {

"type": "string",

"default":"fahrenheit",

"enum": [ "fahrenheit", "celsius"],

"description": """The messuring unit for

the temperature.

If not explicitly specified

the default unit is

fahrenheit"""

"required": ["location"],

}

]

response = client.chat.completions.create(

model="gpt-4-1106",

messages=messages,

tools=tools,

tool_choice="auto",

)

response_message = response.choices[0].message

tool_calls = response_message.tool_calls

if tool_calls:

print (tool_calls)

available_functions = {

"get_current_weather": get_current_weather,

}

messages.append(response_message)

for tool_call in tool_calls:

function_name = tool_call.function.name

function_to_call = available_functions[function_name]

function_args = json.loads(tool_call.function.arguments)

function_response = function_to_call(

location=function_args.get("location"),

unit=function_args.get("unit"),

)

messages.append(

{

"tool_call_id": tool_call.id,

"role": "tool",

"name": function_name,

"content": function_response,

}

)

print (messages)

second_response = client.chat.completions.create(

model="gpt-4-1106",

messages=messages,

)

print (second_response) ![[Building Your Own Database.png]] ![[for scenarios like e-commerce..png]] ![[Code Interpreter.png]]

from sqlalchemy import create_engine

import pandas as pd

df = pd.read_csv("./data/all-states-history.csv").fillna(value = 0)

response = client.chat.completions.create(

model="gpt-4-1106",

messages=messages,

tools=tools_sql,

tool_choice="auto",

)

response_message = response.choices[0].message

tool_calls = response_message.tool_calls

if tool_calls:

print (tool_calls)

available_functions = {

"get_positive_cases_for_state_on_date": get_positive_cases_for_state_on_date,

"get_hospitalized_increase_for_state_on_date":get_hospitalized_increase_for_state_on_date

}

messages.append(response_message)

for tool_call in tool_calls:

function_name = tool_call.function.name

function_to_call = available_functions[function_name]

function_args = json.loads(tool_call.function.arguments)

function_response = function_to_call(

state_abbr=function_args.get("state_abbr"),

specific_date=function_args.get("specific_date"),

)

messages.append(

{

"tool_call_id": tool_call.id,

"role": "tool",

"name": function_name,

"content": str(function_response),

}

)

print(messages)

second_response = client.chat.completions.create(

model="gpt-4-1106",

messages=messages,

)

print (second_response)

client = AzureOpenAI(

api_key=os.getenv("AZURE_OPENAI_KEY"),

api_version="2024-02-15-preview",

azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")

)

I) Create assistant

assistant = client.beta.assistants.create(

instructions="""You are an assistant answering questions

about a Covid dataset.""",

model="gpt-4-1106",

tools=Helper.tools_sql)

II) Create thread

thread = client.beta.threads.create()

print(thread)

III) Add message

message = client.beta.threads.messages.create(

thread_id=thread.id,

role="user",

content="""how many hospitalized people we had in Alaska

the 2021-03-05?"""

)

print(message)

messages = client.beta.threads.messages.list(

thread_id=thread.id

)

print(messages.model_dump_json(indent=2))

IV) Run assistant on thread

run = client.beta.threads.runs.create(

thread_id=thread.id,

assistant_id=assistant.id,

)

import time

from IPython.display import clear_output

start_time = time.time()

status = run.status

while status not in ["completed", "cancelled", "expired", "failed"]:

time.sleep(5)

run = client.beta.threads.runs.retrieve(

thread_id=thread.id,run_id=run.id

)

print("Elapsed time: {} minutes {} seconds".format(

int((time.time() - start_time) // 60),

int((time.time() - start_time) % 60))

)

status = run.status

print(f'Status: {status}')

if (status=="requires_action"):

available_functions = {

"get_positive_cases_for_state_on_date": get_positive_cases_for_state_on_date,

"get_hospitalized_increase_for_state_on_date":get_hospitalized_increase_for_state_on_date

}

tool_outputs = []

for tool_call in run.required_action.submit_tool_outputs.tool_calls:

function_name = tool_call.function.name

function_to_call = available_functions[function_name]

function_args = json.loads(tool_call.function.arguments)

function_response = function_to_call(

state_abbr=function_args.get("state_abbr"),

specific_date=function_args.get("specific_date"),

)

print(function_response)

print(tool_call.id)

tool_outputs.append(

{ "tool_call_id": tool_call.id,

"output": str(function_response)

}

)

run = client.beta.threads.runs.submit_tool_outputs(

thread_id=thread.id,

run_id=run.id,

tool_outputs = tool_outputs

)

messages = client.beta.threads.messages.list(

thread_id=thread.id

)

print(messages)

print(messages.model_dump_json(indent=2))

file = client.files.create(

file=open("./data/all-states-history.csv", "rb"),

purpose='assistants'

)

assistant = client.beta.assistants.create(

instructions="""You are an assitant answering questions about

a Covid dataset.""",

model="gpt-4-1106",

tools=[{"type": "code_interpreter"}],

file_ids=[file.id])

thread = client.beta.threads.create()

print(thread)

message = client.beta.threads.messages.create(

thread_id=thread.id,

role="user",

content="""how many hospitalized people we had in Alaska

the 2021-03-05?"""

)

print(message)

run = client.beta.threads.runs.create(

thread_id=thread.id,

assistant_id=assistant.id,

)

status = run.status

start_time = time.time()

while status not in ["completed", "cancelled", "expired", "failed"]:

time.sleep(5)

run = client.beta.threads.runs.retrieve(

thread_id=thread.id,

run_id=run.id

)

print("Elapsed time: {} minutes {} seconds".format(

int((time.time() - start_time) // 60),

int((time.time() - start_time) % 60))

)

status = run.status

print(f'Status: {status}')

clear_output(wait=True)

messages = client.beta.threads.messages.list(

thread_id=thread.id

)

print(messages.model_dump_json(indent=2))

alanhc-til