Prof. Frenzel
9 min read · Oct 5, 2024
#KB API — Data Retrieval

Dear fellow Data Scientists,

As businesses increasingly depend on data for decision-making, quick access to real-time information has become more important than ever. One of the most effective tools for automating data collection is the Application Programming Interface (API), which allows users to directly retrieve data from external sources. APIs enable users to gather diverse datasets — ranging from social media analytics to financial figures — directly into their systems, allowing for timely analysis and forecasting.

👋Michael Morrison, the author of this guide, provides a clear and practical overview of how to work with APIs for data retrieval. This article covers the basics of API interaction, using Python and the ChatGPT API to demonstrate practical applications for integrating external data into your projects.

What Are APIs and Their Uses?

Formally, an API is a set of protocols and tools that allows different software systems to communicate with each other. In practice, APIs enable the exchange of data between a client (e.g., a Python script) and a server (e.g., a web service) through requests and responses. I personally like to think of APIs as a way of using all the commands available through a website in my code instead of through the original interface.

Where automation and real-time data integration are key, APIs are indispensable. In business analytics, they simplify data collection and integration, allowing companies to automate the retrieval of essential metrics. They allow organizations to streamline their workflows, reducing manual data entry and providing real-time access to important information. In my experience, they are especially useful when working with geographical data. For instance, you can determine the distance to the coast from a property using coordinates, which can be helpful in a house-pricing algorithm.

Key Concepts

APIs communicate using the HTTP protocol through requests and responses. Each request includes a method (such as GET, POST, PUT, DELETE) that defines the action to be performed. Below is a breakdown of each method:

  • GET: This method retrieves data from the server. For example, to get weather data, a GET request is sent to a weather API, which responds with the requested information, often in a structured format like JSON or XML.
  • POST: Used to send new data to the server. This method is typically employed to add new information to a database, such as submitting a form or uploading a file.
  • PUT: This method updates existing data on the server. It replaces the current data with new information, often used when you need to edit a record that already exists.
  • DELETE: As the name suggests, this method removes data from the server. It’s used when you want to delete a specific resource identified by the URL.
HTTP requests and responses

The client sends an HTTP request to the server, which processes the request. The server may interact with a database (DB) or a third-party service to gather the necessary data or perform actions. The server then sends an HTTP response back to the client. This simple cycle is the core of how APIs work.
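
To make this cycle concrete, here is a minimal sketch using Python's requests library. The URL, parameters, and payload below are placeholders rather than a real service; substitute the endpoint and fields from your API's documentation.

import requests

# Hypothetical endpoint -- replace with the real URL from your API's documentation
url = "https://api.example.com/v1/weather"

# GET: retrieve data; query parameters are passed in the URL
response = requests.get(url, params={"city": "Boston", "units": "metric"}, timeout=10)

# The status code indicates whether the request succeeded (200 means OK)
if response.status_code == 200:
    data = response.json()  # Most APIs return structured JSON
    print(data)
else:
    print(f"Request failed with status {response.status_code}")

# POST: send new data to the server in the request body
payload = {"station": "BOS-01", "reading": 21.5}
response = requests.post(url, json=payload, timeout=10)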

Most public APIs require authentication to ensure that only authorized users can access the data. API keys are the most common method of authentication and must be included in the header or parameters of the request. This allows the API provider to monitor usage and prevent abuse. While many APIs charge a subscription or monthly fee for a key, many also offer a free tier that permits a limited number of calls per month.
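
How the key is attached varies from service to service; the two most common patterns are an Authorization header and a query parameter. Both are sketched below with placeholder names (the URL and the apikey parameter are illustrative, not any particular API's format):

import requests

API_KEY = "your-api-key-here"  # Placeholder -- keep real keys out of source code
url = "https://api.example.com/v1/data"

# Pattern 1: key sent as a bearer token in a request header
response = requests.get(url, headers={"Authorization": f"Bearer {API_KEY}"}, timeout=10)

# Pattern 2: key sent as a query parameter (used by some simpler APIs)
response = requests.get(url, params={"apikey": API_KEY}, timeout=10)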

Understanding how to use an API starts with reading its documentation. Each API service has its own set of instructions for accessing and using data properly. Fully understanding the documentation is key to successful data retrieval, as the specifics can vary slightly from one API to another.

Hands-on Coding: An Example Using ChatGPT

To demonstrate how to retrieve data using APIs, I’ll walk you through a practical example using the OpenAI API. The first step is obtaining an API key, which requires registering on the platform. After registration, I was able to generate a unique key that is essential for interacting with the API.

Once I had the API key, I used it to authenticate my requests to the API and access the required data. In this example, I used Python to send a request to the OpenAI API to summarize text from a file. The text I chose for this demonstration is from Alice’s Adventures in Wonderland by Lewis Carroll.

Below is the code example for making an API call using Python:

import openai  # Note: this example uses the pre-1.0 interface of the openai library

# Set your OpenAI API key
openai.api_key = "your-api-key-here"

# Open a text file to summarize
with open("alice copy.txt", "r") as file:
    content = file.read()

# Define your prompt for the API request
messages = [
    {"role": "system", "content": "You are a helpful assistant."},  # Optional: sets the assistant's behavior
    {"role": "user", "content": f"Summarize the following text:\n\n{content}"}
]

# Make a request to the OpenAI API using the chat completion endpoint
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",  # A fast, low-cost chat model
    messages=messages,
    max_tokens=100,   # Limit the number of tokens in the response
    temperature=0.7,  # Adjust the creativity level
)

# Extract and print the response
print(response.choices[0].message["content"])

Output:
The text is an introduction to the Project Gutenberg eBook of
"Alice's Adventures in Wonderland" by Lewis Carroll. It provides information
about the book's availability, author, release date, language, and credits.
The first chapter of the book, "Down the Rabbit-Hole," is also included,
describing how Alice follows a White Rabbit into a rabbit-hole and begins
her adventures in Wonderland.

This example demonstrates the basic workflow of interacting with an API using Python. I started by importing the openai library and setting my API key for authentication. Then, I defined the content I wanted the API to summarize and made a request using the ChatCompletion method. Finally, I printed the response, which generated a concise summary of the provided text.

Case Study: Sentiment Analysis of Customer Feedback

Problem Statement

Imagine a scenario where a business needs to analyze customer reviews to understand the overall sentiment — whether it’s positive, negative, or neutral — towards their products or services. Such insights can help the business make data-driven decisions to improve customer satisfaction, adjust marketing strategies, or enhance product features. However, manually analyzing thousands of reviews can be time-consuming and inefficient, which is where sentiment analysis using APIs comes in handy.

Data Retrieval

For this case study, I used a customer feedback dataset available on Kaggle. The dataset includes various product reviews, from which I extracted the comments to perform sentiment analysis. This will help us determine whether the feedback is generally positive, negative, or mixed.

Implementation

To perform the sentiment analysis, I used the OpenAI API to classify the customer feedback into three categories: “Good,” “Bad,” or “Mixed.” Below is the implementation, starting with loading the dataset, applying the sentiment classification using the API, and finally visualizing the results.

First, I loaded the dataset and defined a function to get sentiment values for each customer comment using the OpenAI API. The function sends each comment to the API, which classifies the sentiment and returns the result.

import matplotlib.pyplot as plt
import pandas as pd
import openai

# Set your OpenAI API key, as in the previous example
openai.api_key = "your-api-key-here"

# Load the CSV file
df = pd.read_csv("redmi6.csv", encoding="ISO-8859-1")

# Function to get sentiment from the OpenAI API
def get_sentiment(text):
    try:
        # Use the ChatCompletion endpoint for gpt-3.5-turbo
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": "You are a helpful assistant that provides sentiment analysis."},
                {"role": "user", "content": f"Please classify the sentiment of the following comment as Good, Bad, or Mixed:\n\n{text}"}
            ]
        )
        # Extract and normalize the response content
        sentiment = response.choices[0].message["content"].strip().lower()
        if "good" in sentiment:
            return "Good"
        elif "bad" in sentiment:
            return "Bad"
        else:
            return "Mixed"
    except Exception:
        # Network or API failures are recorded as "Error"
        return "Error"

# Apply the function to the Comments column and create a new Sentiment column
df["Sentiment"] = df["Comments"].apply(get_sentiment)

In this code, I loaded the customer feedback dataset and passed each comment to the get_sentiment function. The function uses the OpenAI API to classify the sentiment of the comment as "Good," "Bad," or "Mixed" based on the API’s response.

Results

The pie chart below illustrates the sentiment distribution of customer feedback. The majority of the reviews are classified as Good (59.3%), indicating a generally positive reception of the products. However, a notable portion of the feedback is Mixed (22.9%), and a smaller but still significant share is classified as Bad (17.9%), suggesting there may be room for improvement in customer satisfaction.
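
One way to produce such a chart is directly from the new Sentiment column, reusing the matplotlib import from the snippet above (the title and label choices here are mine, not part of the original figure):

# Count the sentiment labels and plot them as a pie chart
counts = df["Sentiment"].value_counts()
counts.plot.pie(autopct="%1.1f%%", startangle=90)
plt.title("Sentiment Distribution of Customer Feedback")
plt.ylabel("")  # Hide the default axis label
plt.show()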

Sentiment Analysis Results

Common Issues

When working with APIs, there are a few recurring issues that can affect the reliability and efficiency of data retrieval:

  • Rate Limits: Many APIs set limits on the number of requests that can be made within a certain time frame. If these limits are exceeded, access may be temporarily restricted. It’s important to keep track of how many requests you are making and ensure that you stay within the permitted range, which is typically outlined in the API’s documentation.
  • API Key Expiration: API keys can expire or become inactive over time. This can cause requests to fail. It’s a good practice to regularly verify that your API key is still valid and renew it when necessary to avoid any interruptions in your data collection.
  • Token Limits: APIs often impose limits on the amount of data that can be processed in a single request. Large datasets or lengthy text inputs may exceed these limits, especially with APIs like OpenAI's, which enforce token restrictions. If this happens, you may need to truncate the data or split it into smaller pieces before sending it, as sketched after this list.
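
A minimal sketch of the splitting approach, using character count as a rough stand-in for tokens (precise limits depend on the model's tokenizer), with a hypothetical get_summary helper standing in for the API call from earlier:

def chunk_text(text, max_chars=4000):
    # Split text into pieces of at most max_chars characters.
    # Character count is only a rough proxy for tokens.
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

# Summarize each chunk separately, then combine the partial summaries
chunks = chunk_text(content)
partial_summaries = [get_summary(chunk) for chunk in chunks]  # get_summary: hypothetical wrapper around the API call
summary = " ".join(partial_summaries)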

Best Practices and Lessons Learned

When using APIs, applying certain strategies can help improve the efficiency and reliability of your requests:

  1. Batching and Caching Requests: Instead of making separate API calls for each data point, try combining multiple requests into one. Caching frequently requested data can also reduce the need to repeatedly call the API. These approaches help minimize the number of API requests you make and can help avoid hitting rate limits (a combined caching-and-retry sketch follows this list).
  2. Monitor API Usage: Every time you call an API, it uses up a portion of your allotted actions. To avoid exhausting your quota or exceeding rate limits, be mindful of when and how often you make requests. Keeping track of your API usage can also help you identify areas where you can reduce unnecessary calls.
  3. Error Handling: It’s common to encounter errors while using APIs — whether due to network timeouts, invalid requests, or server-side issues. Including error handling in your code ensures that these problems are managed effectively. For example, in Python, you can use try-except blocks to catch errors and allow your program to retry the request or handle the issue appropriately. Without proper error handling, diagnosing and fixing problems can become far more difficult; see the sketch after this list.
  4. Handling Specific API Errors: Different APIs come with their own unique error codes and messages. Familiarizing yourself with the documentation will help you understand how to deal with these specific errors. Tailoring your error handling to the API you’re working with can lead to smoother performance and quicker fixes.
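
To illustrate points 1 and 3 together, here is a minimal sketch that wraps an API call in a simple in-memory cache and a retry loop with exponential backoff. The make_api_call function is a placeholder for whatever client call you actually use:

import time

_cache = {}  # Simple in-memory cache keyed by the request text

def call_api_with_retry(text, max_retries=3):
    # Caching: serve repeated requests from memory instead of the API
    if text in _cache:
        return _cache[text]

    # Error handling: retry transient failures with exponential backoff
    for attempt in range(max_retries):
        try:
            result = make_api_call(text)  # Placeholder for your real API call
            _cache[text] = result
            return result
        except Exception as e:
            if attempt == max_retries - 1:
                raise  # Give up after the final attempt
            wait = 2 ** attempt  # 1s, 2s, 4s, ...
            print(f"Request failed ({e}); retrying in {wait}s...")
            time.sleep(wait)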