Recursion for Data Scientists. Recursively search JSON API responses… | by Jake | May, 2022


Recursively search JSON API responses to accelerate your productivity

For data scientists, the temptation is real to bypass computer science fundamentals and software engineering best practices in pursuit of tool mastery (Pandas & SKLearn to name two) and start adding value to your organization immediately.

Many fundamental CS concepts really are worth the hassle, as they’ll enable you to build concise, scalable logic when wrangling unwieldy data. Throughout this post, we’ll work our way up to defining a recursive function that can search Python dictionaries for target values. Without further ado, let’s dive in!

Recursion in layman’s terms is a self-referencing function. Simply put, the function can call itself! But won’t this result in an infinite loop? The trick is to define a base case, which terminates the function, through use of control flow (if/elif/else logic.)

Pseudo code

def recursive_function(argument):
    if is_base_case(argument):
        return terminal_action(argument)
    else:
        new_argument = intermediate_action(argument)
        # return the recursive call so its result reaches the original caller
        return recursive_function(new_argument)

A few notes on the above:

  1. We need to specify a “fork in the road.” Either the base case is satisfied and we exit the recursive logic or we use the function recursively to reach the base case.
  2. A terminal function in the base case is totally optional. You might only want to return the argument as is, in which case your terminal function would boil down to lambda x: x.
  3. In order to not get stuck in an infinite loop, we can’t pass the same argument to the recursive function over and over and over again; we need to modify the argument using an intermediate action. Unlike the terminal action, the intermediate action cannot be passive (boil down to lambda x: x.)
  4. Side effects, or actions non-essential to the recursive logic or base case, can be added as desired.
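To make the template concrete, here is a minimal sketch that fills in each piece. The function name count_down_sum and its task (summing the integers from n down to 0) are assumptions chosen for illustration.

```python
def count_down_sum(n):
    """Sum the integers n + (n - 1) + ... + 0 recursively (n must be >= 0)."""
    # base case: nothing left to add
    if n == 0:
        return 0  # terminal action: return a fixed value
    # recursive logic
    smaller = n - 1  # intermediate action: move the argument toward the base case
    return n + count_down_sum(smaller)  # combine this step with the recursive result


total = count_down_sum(4)
print(total)  # 4 + 3 + 2 + 1 + 0 = 10
```

Note that the recursive call is returned, so the value computed at the base case travels back up to the first caller.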

Illustrative Example

This example, blastoff, is the helloWorld of recursive functions.

def blastoff(x):
    # base case
    if x == 0:
        print("blastoff!")  # side effect
        return  # terminal action: None/passive
    # recursive logic
    else:
        print(x)  # side effect
        blastoff(x - 1)  # intermediate action: decrementing x by 1

blastoff(5)
>>> 5
4
3
2
1
blastoff!

Now, let’s step up our game to handle a real-world example: searching for a given value in a nested dictionary.

Why a nested dictionary? Because dictionaries are the Python-native object analogous to JSON objects. In fact, converting JSON into dictionaries is readily supported by Python’s built-in json module. Additionally, API responses are typically delivered in JSON (some still use XML, though this is decreasing in popularity).
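As a quick illustration of that conversion, the sketch below parses a small JSON string with the standard json module. The payload is made up for this example.

```python
import json

# hypothetical JSON payload, similar in shape to an API response
raw = '{"id": 1, "tags": ["a", "b"], "rating": {"rate": 3.9, "count": null}}'

parsed = json.loads(raw)

print(type(parsed).__name__)          # dict
print(type(parsed["tags"]).__name__)  # list
print(parsed["rating"]["count"])      # None (JSON null becomes Python None)
```

JSON objects become dictionaries and JSON arrays become lists, which is why the search function needs to handle both kinds of container.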

So next time you need to find out if a given value is in your API request results, you can simply (1) convert the results from JSON to a Python dictionary then (2) use the below recursive function!

def obj_dfs(obj, target, results=None):
    # create the shared results list only on the first (outermost) call
    if results is None:
        results = []
    # recursive logic
    if isinstance(obj, dict):
        for key, val in obj.items():
            results.append(obj_dfs(key, target, results))
            results.append(obj_dfs(val, target, results))
    elif isinstance(obj, (list, tuple)):
        for elem in obj:
            results.append(obj_dfs(elem, target, results))
    # base case
    else:
        return obj == target
    return any(results)

Let’s break this down into sections!

Base case

The base case is an object which is not a dictionary, list or tuple. In other words, the base case is valid if the object cannot contain nested dictionaries, tuples or lists. These are the values we’re actually searching for — strings, integers, floats, etc.

When we detect a base case, we simply evaluate the question “did we find the target?” and return True or False.

Recursive logic

If the base case is not valid, such that the object can contain nested dictionaries, lists, or tuples, we recursively call the function on those very nested dictionaries, lists, or tuples.

I’ve used an if block and an elif block to handle dictionaries in one way and lists and tuples in another way. This is necessary because dictionaries are composed of (key, value) tuples whereas lists and tuples simply contain objects.

`isinstance` is a built-in Python function that checks whether an object is an instance of a given type x or, when given a tuple such as (x, y), an instance of any type in that tuple.
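A short sketch of how these checks behave on typical values; the helper name is_container is an assumption used only to group the checks.

```python
def is_container(x):
    """True for the types the recursive branches descend into."""
    return isinstance(x, (dict, list, tuple))


print(is_container({"a": 1}))  # True: a dict
print(is_container([1, 2]))    # True: a list
print(is_container((1, 2)))    # True: a tuple
print(is_container("abc"))     # False: a string is treated as a leaf value
print(isinstance(True, int))   # True: bool is a subclass of int
```

The last line is a reminder that isinstance also accepts subclasses, so it is slightly broader than comparing types for equality.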

Results

We first check if a results object exists — if it’s None, which is the default behavior, we define an empty list. In subsequent recursive calls, the same results object is updated.

It might seem unusual to append a recursive function call to a list, but the function itself is never appended. What gets appended is the call’s return value: True or False from a base case, or the result of any(results) from a nested dictionary, list, or tuple. Thus, results is only ever populated with True or False values.

Lastly, the function returns `any(results)`, which in Python returns True if at least one element of results is truthy. Thus, if the target is found in at least one of the detected base cases, the top-level call returns True.
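Accumulating into a shared list is not the only way to express this search. As a point of comparison, here is a minimal sketch of a variant that drops the results list and lets any() consume the recursive calls directly. The name contains_value and the sample data are assumptions made for this example.

```python
def contains_value(obj, target):
    """Return True if target equals any key or leaf value nested in obj."""
    if isinstance(obj, dict):
        # search both keys and values, mirroring the dictionary branch above
        return any(
            contains_value(k, target) or contains_value(v, target)
            for k, v in obj.items()
        )
    if isinstance(obj, (list, tuple)):
        return any(contains_value(elem, target) for elem in obj)
    # base case: a leaf value such as a string or number
    return obj == target


# made-up sample data
sample = {"user": {"name": "ada", "langs": ["python", "sql"]}, "active": True}

print(contains_value(sample, "sql"))    # True
print(contains_value(sample, "langs"))  # True (keys are searched too)
print(contains_value(sample, "java"))   # False
```

Because any() stops at the first True it sees, this variant can return early instead of visiting every leaf.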

Room for Extension

Say you’re interested not in exact matches but in fuzzy matches, or perhaps you have your own context-specific logic. This is fine: we can pass a function as an argument to our recursive function, to be used in the base case.

obj = {
    'chicago': [{'coffee shops': 'hero'}, {'bars': 'la vaca'}],
    'san francisco': [
        {'restaurants': 'el techo'},
        {'sight seeing': 'golden gate bridge'},
    ],
}

def base(x):
    # guard against non-string leaves such as numbers or None
    return isinstance(x, str) and 'golden' in x

def obj_dfs(obj, f_base, results=None):
    # create the shared results list only on the first (outermost) call
    if results is None:
        results = []
    # recursive logic
    if isinstance(obj, dict):
        for key, val in obj.items():
            results.append(obj_dfs(key, f_base, results))
            results.append(obj_dfs(val, f_base, results))
    elif isinstance(obj, (list, tuple)):
        for elem in obj:
            results.append(obj_dfs(elem, f_base, results))
    # base case
    else:
        return f_base(obj)
    return any(results)
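For the fuzzy-match case, one possible predicate can be sketched with the standard library’s difflib module. The factory name make_fuzzy_matcher and the 0.8 threshold are assumptions for illustration; tune the threshold to your data.

```python
from difflib import SequenceMatcher


def make_fuzzy_matcher(target, threshold=0.8):
    """Build a predicate that is True for strings similar enough to target."""
    def matcher(x):
        # non-string leaves (numbers, None, ...) never match
        if not isinstance(x, str):
            return False
        ratio = SequenceMatcher(None, x.lower(), target.lower()).ratio()
        return ratio >= threshold
    return matcher


is_golden_gate = make_fuzzy_matcher("golden gate bridge")

print(is_golden_gate("Golden Gate Bridge"))  # True: identical after lowercasing
print(is_golden_gate("goldn gate bridge"))   # True: one missing letter
print(is_golden_gate("la vaca"))             # False: too dissimilar
print(is_golden_gate(42))                    # False: not a string
```

A predicate built this way could be passed as f_base, for example obj_dfs(obj, is_golden_gate).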

Lastly, to use this function on an actual JSON-formatted API response, convert the response to a dictionary. For a response saved to a file, the following code does the conversion.

import json

with open('data.json') as json_file:
    data = json.load(json_file)

You can reference this free template API to get a JSON response to test this code out! https://fakestoreapi.com/

import requests
import json

data = requests.get('https://fakestoreapi.com/products/1')
response = json.loads(data.text)

def obj_dfs(obj, target, f_base, results=None):
    # target is unused in this variant; matching is delegated to f_base
    if results is None:
        results = []
    # recursive logic
    if isinstance(obj, dict):
        for key, val in obj.items():
            results.append(obj_dfs(key, target, f_base, results))
            results.append(obj_dfs(val, target, f_base, results))
    elif isinstance(obj, (list, tuple)):
        for elem in obj:
            results.append(obj_dfs(elem, target, f_base, results))
    # base case
    else:
        return f_base(obj)
    # return the first non-False result, if there is one
    if any(results):
        return list(filter(lambda x: x != False, results))[0]
    else:
        return False

def find_url(x):
    x = str(x)
    if 'http' in x:
        return x
    else:
        return False

obj_dfs(obj=response, target=None, f_base=find_url)
>>> 'https://fakestoreapi.com/img/81fPKd-2AYL._AC_SL1500_.jpg'

I made a small alteration to the code; specifically, instead of returning any(results) I’ve returned the first non-False element in results, which here is the image URL found in the JSON API response.
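If the first match is not enough, the same traversal can be adapted to collect every match. The sketch below is one possible variant rather than the code above: find_all is a hypothetical generator, and sample_response is hand-written data shaped loosely like a product record, so no network call is needed.

```python
def find_all(obj, f_base):
    """Yield every non-False result of f_base over the keys and leaves of obj."""
    if isinstance(obj, dict):
        for key, val in obj.items():
            yield from find_all(key, f_base)
            yield from find_all(val, f_base)
    elif isinstance(obj, (list, tuple)):
        for elem in obj:
            yield from find_all(elem, f_base)
    else:
        # base case: keep whatever the predicate returns unless it is False
        result = f_base(obj)
        if result is not False:
            yield result


def find_url(x):
    x = str(x)
    return x if 'http' in x else False


# hypothetical data, written by hand for this example
sample_response = {
    "id": 1,
    "image": "https://example.com/img/1.jpg",
    "links": [{"docs": "https://example.com/docs"}, {"note": "n/a"}],
}

print(list(find_all(sample_response, find_url)))
# ['https://example.com/img/1.jpg', 'https://example.com/docs']
```

Using a generator means the caller decides how many matches to consume: next() for the first one, list() for all of them.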

I hope you feel empowered by just how versatile recursion really is with this new tool in your belt. You can modify this code to your needs!

I hope this has been helpful — if so, please subscribe to my blog!

