
Image by Author
# Introduction
Working with JSON in Python is often challenging. The basic json.loads() only gets you so far.
API responses, configuration files, and data exports often contain JSON that’s messy or poorly structured. You need to flatten nested objects, safely extract values without KeyError exceptions, merge multiple JSON files, or convert between JSON and other formats. These tasks come up constantly in web scraping, API integration, and data processing. This article walks you through five practical functions for handling common JSON parsing and processing tasks.
You can find the code for these functions on GitHub.
# 1. Safely Extracting Nested Values
JSON objects often nest several levels deep. Accessing deeply nested values with bracket notation gets awkward fast. If any key is missing, you get a KeyError.
Here’s a function that lets you access nested values using dot notation, with a fallback for missing keys:
```python
def get_nested_value(data, path, default=None):
    """
    Safely extract nested values from JSON using dot notation.

    Args:
        data: Dictionary or JSON object
        path: Dot-separated string like "user.profile.email"
        default: Value to return if the path doesn't exist

    Returns:
        The value at the path, or default if not found
    """
    keys = path.split('.')
    current = data

    for key in keys:
        if isinstance(current, dict):
            # .get() returns None for a missing key instead of raising KeyError
            current = current.get(key)
            if current is None:
                return default
        elif isinstance(current, list):
            # Treat the key as a list index, e.g. "posts.0"
            try:
                index = int(key)
                current = current[index]
            except (ValueError, IndexError):
                return default
        else:
            # Reached a scalar before the path ended
            return default

    return current
```
Let’s test it with a complex nested structure:
```python
# Sample JSON data
user_data = {
    "user": {
        "id": 123,
        "profile": {
            "name": "Allie",
            "email": "allie@example.com",
            "settings": {
                "theme": "dark",
                "notifications": True
            }
        },
        "posts": [
            {"id": 1, "title": "First Post"},
            {"id": 2, "title": "Second Post"}
        ]
    }
}

# Extract values
email = get_nested_value(user_data, "user.profile.email")
theme = get_nested_value(user_data, "user.profile.settings.theme")
first_post = get_nested_value(user_data, "user.posts.0.title")
missing = get_nested_value(user_data, "user.profile.age", default=25)

print(f"Email: {email}")
print(f"Theme: {theme}")
print(f"First post: {first_post}")
print(f"Age (default): {missing}")
```
Output:
```
Email: allie@example.com
Theme: dark
First post: First Post
Age (default): 25
```
The function splits the path string on dots and walks through the data structure one key at a time. At each level, it checks whether the current value is a dictionary or a list. For dictionaries, it uses .get(key), which returns None for missing keys instead of raising an error. For lists, it tries to convert the key to an integer index and returns the default if the conversion fails or the index is out of range.
The default parameter provides a fallback when any part of the path doesn’t exist. This prevents your code from crashing when dealing with incomplete or inconsistent JSON data from APIs.
This pattern is especially useful when processing API responses where some fields are optional or only present under certain conditions.
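One consequence of relying on .get() is that a key explicitly set to null in the JSON (None in Python) looks the same as a missing key: both return the default. If that distinction matters, a sentinel object can stand in for "missing". The sketch below is a hypothetical variant named `get_path`, not part of the article’s code; otherwise it walks dicts and lists the same way.

```python
_MISSING = object()  # sentinel that cannot be confused with any JSON value, including None


def get_path(data, path, default=None):
    """Hypothetical variant of get_nested_value: a stored JSON null is returned as None."""
    current = data
    for key in path.split('.'):
        if isinstance(current, dict):
            current = current.get(key, _MISSING)
            if current is _MISSING:
                return default  # only a truly absent key triggers the default
        elif isinstance(current, list):
            try:
                current = current[int(key)]
            except (ValueError, IndexError):
                return default
        else:
            return default
    return current


record = {"user": {"nickname": None, "tags": ["a", "b"]}}

print(get_path(record, "user.nickname", default="n/a"))  # None (explicit null is kept)
print(get_path(record, "user.age", default="n/a"))       # n/a (key is missing)
print(get_path(record, "user.tags.1"))                   # b
```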
# 2. Flattening Nested JSON into Single-Level Dictionaries
Machine learning models, CSV exports, and database inserts often need flat data structures. But API responses and configuration files use nested JSON. Converting nested objects to flat key-value pairs is a common task.
Here’s a function that flattens nested JSON with customizable separators:
```python
def flatten_json(data, parent_key='', separator="_"):
    """
    Flatten nested JSON into a single-level dictionary.

    Args:
        data: Nested dictionary or JSON object
        parent_key: Prefix for keys (used in recursion)
        separator: String to join nested keys

    Returns:
        Flattened dictionary with concatenated keys
    """
    items = []

    if isinstance(data, dict):
        for key, value in data.items():
            new_key = f"{parent_key}{separator}{key}" if parent_key else key

            if isinstance(value, dict):
                # Recursively flatten nested dicts
                items.extend(flatten_json(value, new_key, separator).items())
            elif isinstance(value, list):
                # Flatten lists with indexed keys
                for i, item in enumerate(value):
                    list_key = f"{new_key}{separator}{i}"
                    if isinstance(item, (dict, list)):
                        items.extend(flatten_json(item, list_key, separator).items())
                    else:
                        items.append((list_key, item))
            else:
                items.append((new_key, value))
    elif isinstance(data, list):
        # Lists nested directly inside lists, e.g. [[1, 2], [3]]
        for i, item in enumerate(data):
            list_key = f"{parent_key}{separator}{i}" if parent_key else str(i)
            items.extend(flatten_json(item, list_key, separator).items())
    else:
        items.append((parent_key, data))

    return dict(items)
```
Now let’s flatten a complex nested structure:
```python
# Complex nested JSON
product_data = {
    "product": {
        "id": 456,
        "name": "Laptop",
        "specs": {
            "cpu": "Intel i7",
            "ram": "16GB",
            "storage": {
                "type": "SSD",
                "capacity": "512GB"
            }
        },
        "reviews": [
            {"rating": 5, "comment": "Excellent"},
            {"rating": 4, "comment": "Good value"}
        ]
    }
}

flattened = flatten_json(product_data)

for key, value in flattened.items():
    print(f"{key}: {value}")
```
Output:
```
product_id: 456
product_name: Laptop
product_specs_cpu: Intel i7
product_specs_ram: 16GB
product_specs_storage_type: SSD
product_specs_storage_capacity: 512GB
product_reviews_0_rating: 5
product_reviews_0_comment: Excellent
product_reviews_1_rating: 4
product_reviews_1_comment: Good value
```
The function uses recursion to handle arbitrary nesting depth. When it encounters a dictionary, it processes each key-value pair, building up the flattened key by concatenating parent keys with the separator.
For lists, it uses the index as part of the key. This preserves the order and structure of array elements in the flattened output. The pattern reviews_0_rating tells you this is the rating from the first review.
The separator parameter lets you customize the output format. Use dots for dot notation, underscores for snake_case, or slashes for path-like keys, depending on your needs.
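Because flattened keys are built by joining names with the separator, a source key that already contains the separator can collide with a nested path. With "_", both {"user_id": ...} and {"user": {"id": ...}} map to user_id, and dict(items) silently keeps only the last one. A minimal sketch of a collision-checking variant (the name `flatten_checked` is hypothetical) that raises instead:

```python
def flatten_checked(data, parent_key="", separator="_", out=None):
    """Flatten dicts and lists like flatten_json, but raise if two paths map to one key."""
    if out is None:
        out = {}

    if isinstance(data, dict):
        pairs = data.items()
    elif isinstance(data, list):
        pairs = enumerate(data)  # list indices become key segments
    else:
        # Leaf value: refuse to overwrite a key produced by a different path
        if parent_key in out:
            raise ValueError(f"Key collision on {parent_key!r}")
        out[parent_key] = data
        return out

    for key, value in pairs:
        new_key = f"{parent_key}{separator}{key}" if parent_key else str(key)
        flatten_checked(value, new_key, separator, out)
    return out


ambiguous = {"user_id": 1, "user": {"id": 2}}

print(flatten_checked(ambiguous, separator="."))  # {'user_id': 1, 'user.id': 2}

try:
    flatten_checked(ambiguous)  # default "_" separator
except ValueError as err:
    print(err)  # Key collision on 'user_id'
```

Picking a separator that never appears in your source keys avoids the clash; the check just makes a bad choice fail loudly.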
flatten_json is particularly useful when you need to convert JSON API responses into DataFrames or CSV rows where each column needs a unique name.
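For the CSV case, records rarely share exactly the same fields, so the header should be the union of all flattened keys. Here is a minimal sketch using the standard library’s csv.DictWriter. The compact `flatten` helper is a stand-in written for this example (it follows flatten_json’s underscore naming), and `records_to_csv` is a hypothetical name.

```python
import csv
import io


def flatten(data, parent_key="", sep="_"):
    """Compact flattener in the spirit of flatten_json (handles dicts and lists)."""
    if isinstance(data, dict):
        pairs = data.items()
    elif isinstance(data, list):
        pairs = enumerate(data)
    else:
        return {parent_key: data}
    items = {}
    for key, value in pairs:
        new_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        items.update(flatten(value, new_key, sep))
    return items


def records_to_csv(records):
    """Flatten each record and write all of them as CSV text."""
    rows = [flatten(r) for r in records]
    # Union of keys in first-seen order, so every record fits one header
    fieldnames = list(dict.fromkeys(k for row in rows for k in row))
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


orders = [
    {"id": 1, "customer": {"name": "Ana", "city": "Lisbon"}},
    {"id": 2, "customer": {"name": "Ben"}, "tags": ["rush"]},
]
print(records_to_csv(orders))
# id,customer_name,customer_city,tags_0
# 1,Ana,Lisbon,
# 2,Ben,,rush
```

The restval argument fills the cells a record doesn’t have. If pandas is already a dependency, pandas.json_normalize covers similar ground for DataFrames.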
# 3. Deep Merging Multiple JSON Objects
Configuration management often requires merging multiple JSON files containing default settings, environment-specific configs, user preferences, and more. A simple dict.update() only handles the top level. You need deep merging that recursively combines nested structures.
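To see the top-level limitation concretely, here is a small illustration with made-up values: updating with an override that only sets database.host drops database.port entirely.

```python
defaults = {"database": {"host": "localhost", "port": 5432}}
override = {"database": {"host": "prod-db.example.com"}}

# dict.update() replaces the whole "database" value instead of merging into it
shallow = dict(defaults)
shallow.update(override)
print(shallow)  # {'database': {'host': 'prod-db.example.com'}}

# The {**a, **b} unpacking syntax behaves the same way
print({**defaults, **override} == shallow)  # True
```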
Here’s a function that deep merges JSON objects:
```python
import copy


def deep_merge_json(base, override):
    """
    Deep merge two JSON objects, with override taking precedence.

    Args:
        base: Base dictionary
        override: Dictionary with values to override/add

    Returns:
        New dictionary with merged values (neither input is modified)
    """
    # Deep copy so nested dicts in the result are not shared with base
    result = copy.deepcopy(base)

    for key, value in override.items():
        if key in result and isinstance(result[key], dict) and isinstance(value, dict):
            # Recursively merge nested dictionaries
            result[key] = deep_merge_json(result[key], value)
        else:
            # Override or add the value (copied so later edits don't leak back)
            result[key] = copy.deepcopy(value)

    return result
```
Let’s try merging sample configuration data:
```python
import json

# Default configuration
default_config = {
    "database": {
        "host": "localhost",
        "port": 5432,
        "timeout": 30,
        "pool": {
            "min": 2,
            "max": 10
        }
    },
    "cache": {
        "enabled": True,
        "ttl": 300
    },
    "logging": {
        "level": "INFO"
    }
}

# Production overrides
prod_config = {
    "database": {
        "host": "prod-db.example.com",
        "pool": {
            "min": 5,
            "max": 50
        }
    },
    "cache": {
        "ttl": 600
    },
    "monitoring": {
        "enabled": True
    }
}

merged = deep_merge_json(default_config, prod_config)
print(json.dumps(merged, indent=2))
```
Output:
```
{
  "database": {
    "host": "prod-db.example.com",
    "port": 5432,
    "timeout": 30,
    "pool": {
      "min": 5,
      "max": 50
    }
  },
  "cache": {
    "enabled": true,
    "ttl": 600
  },
  "logging": {
    "level": "INFO"
  },
  "monitoring": {
    "enabled": true
  }
}
```
The function recursively merges nested dictionaries. When both the base and the override contain dictionaries at the same key, it merges those dictionaries instead of replacing them entirely. This preserves values that aren’t explicitly overridden. Any other value, including a list, simply replaces the base value.
Notice how database.port and database.timeout remain from the default configuration, while database.host gets overridden. The pool settings merge at the nested level, so min and max both get updated.
The function also adds new keys that don’t exist in the base config, like the monitoring section in the production override.
You can chain multiple merges to layer configurations:
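```python
# user_preferences is a placeholder for a third config layer, e.g.:
user_preferences = {"logging": {"level": "DEBUG"}}

final_config = deep_merge_json(
    deep_merge_json(default_config, prod_config),
    user_preferences
)
```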
This pattern is common in application configuration where you have defaults, environment-specific settings, and runtime overrides.
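With more than two layers, nested calls get hard to read. A minimal sketch, assuming every layer is a plain dict: functools.reduce folds a list of layers left to right, so later layers win. Here `deep_merge` is a compact restatement of deep_merge_json so the block runs on its own, and `merge_layers` is a hypothetical helper name.

```python
import copy
from functools import reduce


def deep_merge(base, override):
    """Recursive merge in the style of deep_merge_json; inputs are not mutated."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(result.get(key), dict) and isinstance(value, dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def merge_layers(*layers):
    """Merge config layers left to right, starting from an empty dict; later layers win."""
    return reduce(deep_merge, layers, {})


defaults = {"logging": {"level": "INFO", "format": "plain"}, "cache": {"ttl": 300}}
production = {"cache": {"ttl": 600}}
runtime = {"logging": {"level": "DEBUG"}}  # hypothetical runtime overrides

config = merge_layers(defaults, production, runtime)
print(config)  # {'logging': {'level': 'DEBUG', 'format': 'plain'}, 'cache': {'ttl': 600}}
```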
# 4. Filtering JSON by Schema or Whitelist
APIs often return more data than you need. Large JSON responses make your code harder to read. Sometimes you only want specific fields, or you need to remove sensitive data before logging.
Here’s a function that filters JSON to keep only specified fields:
```python
def filter_json(data, schema):
    """
    Filter JSON to keep only fields specified in schema.

    Args:
        data: Dictionary or JSON object to filter
        schema: Dictionary defining which fields to keep.
                Use True to keep a field, a nested dict for nested filtering.

    Returns:
        Filtered dictionary containing only the specified fields
    """
    if not isinstance(data, dict) or not isinstance(schema, dict):
        return data

    result = {}

    for key, value in schema.items():
        if key not in data:
            continue

        if value is True:
            # Keep this field as-is
            result[key] = data[key]
        elif isinstance(value, dict):
            # Recursively filter nested object
            if isinstance(data[key], dict):
                filtered_nested = filter_json(data[key], value)
                if filtered_nested:
                    result[key] = filtered_nested
            elif isinstance(data[key], list):
                # Filter each item in the list
                filtered_list = []
                for item in data[key]:
                    if isinstance(item, dict):
                        filtered_item = filter_json(item, value)
                        if filtered_item:
                            filtered_list.append(filtered_item)
                    else:
                        filtered_list.append(item)

                if filtered_list:
                    result[key] = filtered_list

    return result
```
Let’s filter a sample API response:
```python
import json

# Sample API response
api_response = {
    "user": {
        "id": 789,
        "username": "Cayla",
        "email": "cayla@example.com",
        "password_hash": "secret123",
        "profile": {
            "name": "Cayla Smith",
            "bio": "Software developer",
            "avatar_url": "https://example.com/avatar.jpg",
            "private_notes": "Internal notes"
        },
        "posts": [
            {
                "id": 1,
                "title": "Hello World",
                "content": "My first post",
                "views": 100,
                "internal_score": 0.85
            },
            {
                "id": 2,
                "title": "Python Tips",
                "content": "Some tips",
                "views": 250,
                "internal_score": 0.92
            }
        ]
    },
    "metadata": {
        "request_id": "abc123",
        "server": "web-01"
    }
}

# Schema defining what to keep
public_schema = {
    "user": {
        "id": True,
        "username": True,
        "profile": {
            "name": True,
            "avatar_url": True
        },
        "posts": {
            "id": True,
            "title": True,
            "views": True
        }
    }
}

filtered = filter_json(api_response, public_schema)
print(json.dumps(filtered, indent=2))
```
Output:
```
{
  "user": {
    "id": 789,
    "username": "Cayla",
    "profile": {
      "name": "Cayla Smith",
      "avatar_url": "https://example.com/avatar.jpg"
    },
    "posts": [
      {
        "id": 1,
        "title": "Hello World",
        "views": 100
      },
      {
        "id": 2,
        "title": "Python Tips",
        "views": 250
      }
    ]
  }
}
```
The schema acts as a whitelist. Setting a field to True includes it in the output. Using a nested dictionary lets you filter nested objects, and the function applies the schema recursively to nested structures.
For arrays, the schema applies to each item. In the example, the posts array gets filtered so each post only includes id, title, and views, while content and internal_score are excluded.
Notice how sensitive fields like password_hash and private_notes don’t appear in the output. This makes the function useful for sanitizing data before logging it or sending it to frontend applications.
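The schema approach is an allowlist: anything not named is dropped. A different technique, a denylist, keeps the structure intact and masks only known-sensitive keys, which can be handy for debug logs. The trade-off is that a newly added sensitive field slips through until someone adds it to the list, so the allowlist remains the safer default for data that leaves your system. A minimal sketch; the `redact` name, the key set, and the mask string are all assumptions.

```python
SENSITIVE_KEYS = {"password_hash", "private_notes"}  # assumed denylist; adjust per API


def redact(data, sensitive=SENSITIVE_KEYS, mask="***"):
    """Return a copy of data with values under sensitive keys masked, at any depth."""
    if isinstance(data, dict):
        return {
            key: mask if key in sensitive else redact(value, sensitive, mask)
            for key, value in data.items()
        }
    if isinstance(data, list):
        return [redact(item, sensitive, mask) for item in data]
    return data  # scalars pass through unchanged


event = {"user": {"id": 789, "password_hash": "secret123",
                  "profile": {"private_notes": "Internal notes"}}}
print(redact(event))
# {'user': {'id': 789, 'password_hash': '***', 'profile': {'private_notes': '***'}}}
```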
You can create different schemas for different use cases, such as a minimal schema for list views, a detailed schema for single-item views, and an admin schema that includes everything.
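Nested schema dicts get verbose as they grow. One option, sketched here on the assumption that a flat list of dot-separated paths is easier to maintain, is a small hypothetical helper that builds the nested schema filter_json expects:

```python
def schema_from_paths(paths):
    """Build a filter_json-style schema from dot-separated field paths (hypothetical helper)."""
    schema = {}
    for path in paths:
        *parents, leaf = path.split(".")
        node = schema
        for part in parents:
            node = node.setdefault(part, {})  # create intermediate levels as needed
        node[leaf] = True
    return schema


list_view = schema_from_paths(["user.id", "user.username"])
detail_view = schema_from_paths(
    ["user.id", "user.username", "user.profile.name", "user.posts.title"]
)
print(detail_view)
# {'user': {'id': True, 'username': True, 'profile': {'name': True}, 'posts': {'title': True}}}
```

A list-valued field such as posts only needs its item fields listed, since filter_json applies the nested schema to every item. Avoid listing a path together with one of its own prefixes (such as user and user.id): depending on the order, the helper either raises a TypeError or silently keeps the whole user object.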
# 5. Changing JSON to and from Dot Notation
Some systems use flat key-value stores, but you want to work with nested JSON in your code. Converting between flat dot-notation keys and nested structures bridges the two.
Here’s a pair of functions for bidirectional conversion.
## Converting JSON to Dot Notation
```python
def json_to_dot_notation(data, parent_key=''):
    """
    Convert nested JSON to a flat dot-notation dictionary.

    Args:
        data: Nested dictionary
        parent_key: Prefix for keys (used in recursion)

    Returns:
        Flat dictionary with dot-notation keys
    """
    items = {}

    if isinstance(data, dict):
        for key, value in data.items():
            new_key = f"{parent_key}.{key}" if parent_key else key

            if isinstance(value, dict):
                items.update(json_to_dot_notation(value, new_key))
            else:
                items[new_key] = value
    else:
        items[parent_key] = data

    return items
```
## Converting Dot Notation to JSON
```python
def dot_notation_to_json(flat_data):
    """
    Convert a flat dot-notation dictionary to nested JSON.

    Args:
        flat_data: Dictionary with dot-notation keys

    Returns:
        Nested dictionary
    """
    result = {}

    for key, value in flat_data.items():
        parts = key.split('.')
        current = result

        # Walk (and create) every nesting level except the last
        for part in parts[:-1]:
            if part not in current:
                current[part] = {}
            current = current[part]

        # Assign the value at the final key
        current[parts[-1]] = value

    return result
```
Let’s test the round-trip conversion:
```python
import json

# Original nested JSON
config = {
    "app": {
        "name": "MyApp",
        "version": "1.0.0"
    },
    "database": {
        "host": "localhost",
        "credentials": {
            "username": "admin",
            "password": "secret"
        }
    },
    "options": {
        "analytics": True,
        "notifications": False
    }
}

# Convert to dot notation (e.g. for environment variables)
flat = json_to_dot_notation(config)
print("Flat format:")
for key, value in flat.items():
    print(f"  {key} = {value}")

print("\n" + "=" * 50 + "\n")

# Convert back to nested JSON
nested = dot_notation_to_json(flat)
print("Nested format:")
print(json.dumps(nested, indent=2))
```
Output:
```
Flat format:
  app.name = MyApp
  app.version = 1.0.0
  database.host = localhost
  database.credentials.username = admin
  database.credentials.password = secret
  options.analytics = True
  options.notifications = False

==================================================

Nested format:
{
  "app": {
    "name": "MyApp",
    "version": "1.0.0"
  },
  "database": {
    "host": "localhost",
    "credentials": {
      "username": "admin",
      "password": "secret"
    }
  },
  "options": {
    "analytics": true,
    "notifications": false
  }
}
```
The json_to_dot_notation function flattens the structure by recursively walking through nested dictionaries and joining keys with dots. Unlike the earlier flatten_json function, this one doesn’t handle arrays (a list is stored as a single value); it’s designed for configuration data that’s purely key-value.
The dot_notation_to_json function reverses the process. It splits each key on dots and builds up the nested structure by creating intermediate dictionaries as needed. The inner loop handles every part except the last one, creating the nesting levels; then the value is assigned to the final key.
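The round trip is lossless only under some conditions: keys must not themselves contain dots, and empty dictionaries disappear because they produce no leaf keys. A compact sketch makes both cases visible; `to_dots` and `from_dots` are trimmed stand-ins for the two functions above so the block runs on its own.

```python
def to_dots(data, parent_key=""):
    """Same idea as json_to_dot_notation: join nested dict keys with dots."""
    items = {}
    for key, value in data.items():
        new_key = f"{parent_key}.{key}" if parent_key else key
        if isinstance(value, dict):
            items.update(to_dots(value, new_key))
        else:
            items[new_key] = value
    return items


def from_dots(flat):
    """Same idea as dot_notation_to_json, using setdefault for intermediate dicts."""
    result = {}
    for key, value in flat.items():
        *parents, leaf = key.split(".")
        node = result
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return result


plain = {"app": {"name": "MyApp"}, "debug": False}
print(from_dots(to_dots(plain)) == plain)  # True

dotted = {"versions": {"v1.2": "stable"}}
print(from_dots(to_dots(dotted)))  # {'versions': {'v1': {'2': 'stable'}}}

empty = {"plugins": {}}
print(from_dots(to_dots(empty)))  # {}
```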
Converting between the two forms keeps your configuration readable and maintainable in code while still working within the constraints of flat key-value systems.
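Environment variables are a common flat key-value system, but shells generally don’t accept dots in variable names. A minimal sketch assuming a made-up convention: a MYAPP prefix, uppercase names, and double underscores in place of dots. The function names are hypothetical.

```python
def to_env_vars(flat, prefix="MYAPP"):
    """Map dot-notation keys to env-var style names, e.g. database.host -> MYAPP__DATABASE__HOST."""
    return {
        f"{prefix}__" + key.replace(".", "__").upper(): str(value)
        for key, value in flat.items()
    }


def from_env_vars(environ, prefix="MYAPP"):
    """Collect matching variables back into lowercase dot-notation keys (values stay strings)."""
    start = f"{prefix}__"
    return {
        name[len(start):].lower().replace("__", "."): value
        for name, value in environ.items()
        if name.startswith(start)
    }


settings = {"database.host": "localhost", "database.port": 5432}
env = to_env_vars(settings)
print(env)  # {'MYAPP__DATABASE__HOST': 'localhost', 'MYAPP__DATABASE__PORT': '5432'}

# In a real program you would pass os.environ here instead of env
print(from_env_vars(env))  # {'database.host': 'localhost', 'database.port': '5432'}
```

Every value comes back as a string, so types such as 5432 or True need separate parsing, and keys with uppercase letters or double underscores won’t round-trip under this convention.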
# Wrapping Up
JSON processing goes beyond basic json.loads(). In most projects, you’ll need tools to navigate nested structures, transform shapes, merge configurations, filter fields, and convert between formats.
The techniques in this article transfer to other data processing tasks as well. You can adapt these patterns for XML, YAML, or custom data formats.
Start with the safe access function to prevent KeyError exceptions in your code. Add the others as you run into specific needs. Happy coding!
Bala Priya C is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she’s working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.