
# Introduction
Working with JSON in Python is often challenging. The basic json.loads() only gets you so far.
API responses, configuration files, and data exports often contain JSON that’s messy or poorly structured. You need to flatten nested objects, safely extract values without KeyError exceptions, merge multiple JSON files, or convert between JSON and other formats. These tasks come up constantly in web scraping, API integration, and data processing. This article walks you through five practical functions for handling common JSON parsing and processing tasks.
You can find the code for these functions on GitHub.
# 1. Safely Extracting Nested Values
JSON objects often nest several levels deep. Accessing deeply nested values with bracket notation gets tricky fast. If any key is missing, you get a KeyError.
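To make the failure mode concrete, here is a small sketch. The response dictionary and its keys are invented for illustration; the chained .get() calls show the usual manual workaround, which works but becomes verbose quickly.

```python
# Hypothetical API response where the "settings" object is absent
response = {"user": {"profile": {"name": "Allie"}}}

# Bracket notation raises KeyError as soon as one key is missing
try:
    theme = response["user"]["profile"]["settings"]["theme"]
except KeyError as exc:
    theme = None
    missing_key = exc.args[0]  # the key that could not be found

# The manual workaround: chain .get() calls with empty-dict fallbacks
theme_via_get = (
    response.get("user", {})
    .get("profile", {})
    .get("settings", {})
    .get("theme", "light")
)
```

Every extra level adds another .get() call and another fallback, which is the repetition a path-based helper removes.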
Here’s a function that lets you access nested values using dot notation, with a fallback for missing keys:
def get_nested_value(data, path, default=None):
    """
    Safely extract nested values from JSON using dot notation.

    Args:
        data: Dictionary or JSON object
        path: Dot-separated string like "user.profile.email"
        default: Value to return if path doesn't exist

    Returns:
        The value at the path, or default if not found
    """
    keys = path.split('.')
    current = data

    for key in keys:
        if isinstance(current, dict):
            current = current.get(key)
            if current is None:
                return default
        elif isinstance(current, list):
            try:
                index = int(key)
                current = current[index]
            except (ValueError, IndexError):
                return default
        else:
            return default

    return current
Let’s test it with a complex nested structure:
# Sample JSON data
user_data = {
    "user": {
        "id": 123,
        "profile": {
            "name": "Allie",
            "email": "allie@example.com",
            "settings": {
                "theme": "dark",
                "notifications": True
            }
        },
        "posts": [
            {"id": 1, "title": "First Post"},
            {"id": 2, "title": "Second Post"}
        ]
    }
}

# Extract values
email = get_nested_value(user_data, "user.profile.email")
theme = get_nested_value(user_data, "user.profile.settings.theme")
first_post = get_nested_value(user_data, "user.posts.0.title")
missing = get_nested_value(user_data, "user.profile.age", default=25)

print(f"Email: {email}")
print(f"Theme: {theme}")
print(f"First post: {first_post}")
print(f"Age (default): {missing}")
Output:
Email: allie@example.com
Theme: dark
First post: First Post
Age (default): 25
The function splits the path string on dots and walks through the data structure one key at a time. At each level, it checks whether the current value is a dictionary or a list. For dictionaries, it uses .get(key), which returns None for missing keys instead of raising an error. For lists, it tries to convert the key to an integer index.
The default parameter provides a fallback when any part of the path doesn’t exist. This prevents your code from crashing when dealing with incomplete or inconsistent JSON data from APIs.
This pattern is especially useful when processing API responses where some fields are optional or only present under certain conditions.
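One edge case worth knowing: because the function checks whether the current value is None, a key that is present but holds JSON null is treated the same as a missing key. If that distinction matters in your data, a sentinel object is one way to handle it. The sketch below is a hypothetical variant; the names get_nested_strict and _MISSING are assumptions for illustration, not part of the article’s code.

```python
_MISSING = object()  # unique sentinel that can never appear in parsed JSON


def get_nested_strict(data, path, default=None):
    """Dot-path lookup that only falls back when the path is truly absent."""
    current = data
    for key in path.split("."):
        if isinstance(current, dict):
            current = current.get(key, _MISSING)
        elif isinstance(current, list):
            try:
                current = current[int(key)]
            except (ValueError, IndexError):
                current = _MISSING
        else:
            # Scalars and None cannot be traversed any further
            current = _MISSING
        if current is _MISSING:
            return default
    return current


# Invented record: "nickname" exists but is null, "age" does not exist
record = {"user": {"nickname": None, "tags": ["a", "b"]}}

explicit_null = get_nested_strict(record, "user.nickname", default="n/a")
truly_missing = get_nested_strict(record, "user.age", default="n/a")
second_tag = get_nested_strict(record, "user.tags.1")
```

Here explicit_null stays None because the key exists, while truly_missing falls back to the default. Whether null should count as missing depends on the API you are consuming, so treat this as one option rather than a fix.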
# 2. Flattening Nested JSON into Single-Level Dictionaries
Machine learning models, CSV exports, and database inserts often need flat data structures. But API responses and configuration files use nested JSON. Converting nested objects to flat key-value pairs is a common task.
Here’s a function that flattens nested JSON with customizable separators:
def flatten_json(data, parent_key='', separator="_"):
    """
    Flatten nested JSON into a single-level dictionary.

    Args:
        data: Nested dictionary or JSON object
        parent_key: Prefix for keys (used in recursion)
        separator: String to join nested keys

    Returns:
        Flattened dictionary with concatenated keys
    """
    items = []

    if isinstance(data, dict):
        for key, value in data.items():
            new_key = f"{parent_key}{separator}{key}" if parent_key else key

            if isinstance(value, dict):
                # Recursively flatten nested dicts
                items.extend(flatten_json(value, new_key, separator).items())
            elif isinstance(value, list):
                # Flatten lists with indexed keys
                for i, item in enumerate(value):
                    list_key = f"{new_key}{separator}{i}"
                    if isinstance(item, (dict, list)):
                        items.extend(flatten_json(item, list_key, separator).items())
                    else:
                        items.append((list_key, item))
            else:
                items.append((new_key, value))
    else:
        items.append((parent_key, data))

    return dict(items)
Now let’s flatten a complex nested structure:
# Complex nested JSON
product_data = {
    "product": {
        "id": 456,
        "name": "Laptop",
        "specs": {
            "cpu": "Intel i7",
            "ram": "16GB",
            "storage": {
                "type": "SSD",
                "capacity": "512GB"
            }
        },
        "reviews": [
            {"rating": 5, "comment": "Excellent"},
            {"rating": 4, "comment": "Good value"}
        ]
    }
}

flattened = flatten_json(product_data)

for key, value in flattened.items():
    print(f"{key}: {value}")
Output:
product_id: 456
product_name: Laptop
product_specs_cpu: Intel i7
product_specs_ram: 16GB
product_specs_storage_type: SSD
product_specs_storage_capacity: 512GB
product_reviews_0_rating: 5
product_reviews_0_comment: Excellent
product_reviews_1_rating: 4
product_reviews_1_comment: Good value
The function uses recursion to handle arbitrary nesting depth. When it encounters a dictionary, it processes each key-value pair, building up the flattened key by concatenating parent keys with the separator.
For lists, it uses the index as part of the key. This lets you preserve the order and structure of array elements in the flattened output. The pattern reviews_0_rating tells you this is the rating from the first review.
The separator parameter lets you customize the output format. Use dots for dot notation, underscores for snake_case, or slashes for path-like keys depending on your needs.
This function is particularly useful when you need to convert JSON API responses into dataframes or CSV rows where each column needs a unique name.
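As a rough illustration of the CSV use case, the sketch below flattens a few records and writes them with the standard library’s csv.DictWriter. The flatten helper is a condensed, slightly simplified stand-in for the function above, and the orders records are invented sample data.

```python
import csv
import io


def flatten(data, parent_key="", separator="_"):
    """Condensed stand-in for flatten_json: dicts and lists become flat keys."""
    flat = {}
    pairs = data.items() if isinstance(data, dict) else enumerate(data)
    for key, value in pairs:
        new_key = f"{parent_key}{separator}{key}" if parent_key else str(key)
        if isinstance(value, (dict, list)):
            flat.update(flatten(value, new_key, separator))
        else:
            flat[new_key] = value
    return flat


# Invented sample records with slightly different shapes
orders = [
    {"id": 1, "customer": {"name": "Allie", "city": "Pune"}, "items": ["pen"]},
    {"id": 2, "customer": {"name": "Cayla"}, "items": ["ink", "pad"]},
]

rows = [flatten(order, separator=".") for order in orders]

# Collect every column seen in any row, keeping first-seen order
columns = list(dict.fromkeys(key for row in rows for key in row))

# Write to an in-memory buffer; missing cells are left empty via restval
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
```

Because records can differ in shape, the column list is built from all rows rather than just the first one, and restval fills the gaps for rows that lack a column.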
# 3. Deep Merging Multiple JSON Objects
Configuration management often requires merging multiple JSON files containing default settings, environment-specific configs, user preferences, and more. A simple dict.update() only handles the top level. You need deep merging that recursively combines nested structures.
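A quick sketch of the problem, using made-up configuration values: dict.update() replaces the entire nested dictionary, so settings that were not mentioned in the override are lost.

```python
# Invented defaults and overrides for illustration
defaults = {"database": {"host": "localhost", "port": 5432}}
overrides = {"database": {"host": "prod-db.example.com"}}

shallow = dict(defaults)   # copy the top level so defaults stays untouched
shallow.update(overrides)  # replaces the whole "database" dict

# The port from the defaults did not survive the update
port_survived = "port" in shallow["database"]
```

After the update, shallow["database"] contains only the host, which is rarely what you want when layering configuration.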
Here’s a function that deep merges JSON objects:
def deep_merge_json(base, override):
    """
    Deep merge two JSON objects, with override taking precedence.

    Args:
        base: Base dictionary
        override: Dictionary with values to override/add

    Returns:
        New dictionary with merged values
    """
    result = base.copy()

    for key, value in override.items():
        if key in result and isinstance(result[key], dict) and isinstance(value, dict):
            # Recursively merge nested dictionaries
            result[key] = deep_merge_json(result[key], value)
        else:
            # Override or add the value
            result[key] = value

    return result
Let’s try merging sample configuration data:
import json

# Default configuration
default_config = {
    "database": {
        "host": "localhost",
        "port": 5432,
        "timeout": 30,
        "pool": {
            "min": 2,
            "max": 10
        }
    },
    "cache": {
        "enabled": True,
        "ttl": 300
    },
    "logging": {
        "level": "INFO"
    }
}

# Production overrides
prod_config = {
    "database": {
        "host": "prod-db.example.com",
        "pool": {
            "min": 5,
            "max": 50
        }
    },
    "cache": {
        "ttl": 600
    },
    "monitoring": {
        "enabled": True
    }
}

merged = deep_merge_json(default_config, prod_config)
print(json.dumps(merged, indent=2))
Output:
{
  "database": {
    "host": "prod-db.example.com",
    "port": 5432,
    "timeout": 30,
    "pool": {
      "min": 5,
      "max": 50
    }
  },
  "cache": {
    "enabled": true,
    "ttl": 600
  },
  "logging": {
    "level": "INFO"
  },
  "monitoring": {
    "enabled": true
  }
}
The function recursively merges nested dictionaries. When both the base and override contain dictionaries at the same key, it merges those dictionaries instead of replacing them entirely. This preserves values that aren’t explicitly overridden.
Notice how database.port and database.timeout remain from the default configuration, while database.host gets overridden. The pool settings merge at the nested level, so min and max both get updated.
The function also adds new keys that don’t exist in the base config, like the monitoring section in the production override.
You can chain multiple merges to layer configurations:
final_config = deep_merge_json(
    deep_merge_json(default_config, prod_config),
    user_preferences
)
This pattern is common in application configuration where you have defaults, environment-specific settings, and runtime overrides.
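When there are more than two or three layers, nesting the calls gets hard to read. One option is to fold a list of layers with functools.reduce. In the sketch below, deep_merge is a condensed stand-in for the function above, and the layers list is invented sample data.

```python
from functools import reduce


def deep_merge(base, override):
    """Condensed stand-in for deep_merge_json (override wins on conflicts)."""
    result = dict(base)
    for key, value in override.items():
        if isinstance(result.get(key), dict) and isinstance(value, dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = value
    return result


# Hypothetical configuration layers, lowest priority first
layers = [
    {"cache": {"enabled": True, "ttl": 300}, "logging": {"level": "INFO"}},
    {"cache": {"ttl": 600}},
    {"logging": {"level": "DEBUG"}, "theme": "dark"},
]

# Fold the layers left to right, starting from an empty config
final_config = reduce(deep_merge, layers, {})
```

Because each merge copies only the levels it touches, nested dictionaries that are never overridden are shared with the inputs. If you plan to mutate the merged result, consider passing it through copy.deepcopy first.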
# 4. Filtering JSON by Schema or Whitelist
APIs often return more data than you need. Large JSON responses make your code harder to read. Sometimes you only want specific fields, or you need to remove sensitive data before logging.
Here’s a function that filters JSON to keep only specified fields:
def filter_json(data, schema):
    """
    Filter JSON to keep only fields specified in schema.

    Args:
        data: Dictionary or JSON object to filter
        schema: Dictionary defining which fields to keep
                Use True to keep a field, nested dict for nested filtering

    Returns:
        Filtered dictionary containing only specified fields
    """
    if not isinstance(data, dict) or not isinstance(schema, dict):
        return data

    result = {}

    for key, value in schema.items():
        if key not in data:
            continue

        if value is True:
            # Keep this field as-is
            result[key] = data[key]
        elif isinstance(value, dict):
            # Recursively filter nested object
            if isinstance(data[key], dict):
                filtered_nested = filter_json(data[key], value)
                if filtered_nested:
                    result[key] = filtered_nested
            elif isinstance(data[key], list):
                # Filter each item in the list
                filtered_list = []
                for item in data[key]:
                    if isinstance(item, dict):
                        filtered_item = filter_json(item, value)
                        if filtered_item:
                            filtered_list.append(filtered_item)
                    else:
                        filtered_list.append(item)
                if filtered_list:
                    result[key] = filtered_list

    return result
Let’s filter a sample API response:
import json

# Sample API response
api_response = {
    "user": {
        "id": 789,
        "username": "Cayla",
        "email": "cayla@example.com",
        "password_hash": "secret123",
        "profile": {
            "name": "Cayla Smith",
            "bio": "Software developer",
            "avatar_url": "https://example.com/avatar.jpg",
            "private_notes": "Internal notes"
        },
        "posts": [
            {
                "id": 1,
                "title": "Hello World",
                "content": "My first post",
                "views": 100,
                "internal_score": 0.85
            },
            {
                "id": 2,
                "title": "Python Tips",
                "content": "Some tips",
                "views": 250,
                "internal_score": 0.92
            }
        ]
    },
    "metadata": {
        "request_id": "abc123",
        "server": "web-01"
    }
}

# Schema defining what to keep
public_schema = {
    "user": {
        "id": True,
        "username": True,
        "profile": {
            "name": True,
            "avatar_url": True
        },
        "posts": {
            "id": True,
            "title": True,
            "views": True
        }
    }
}

filtered = filter_json(api_response, public_schema)
print(json.dumps(filtered, indent=2))
Output:
{
  "user": {
    "id": 789,
    "username": "Cayla",
    "profile": {
      "name": "Cayla Smith",
      "avatar_url": "https://example.com/avatar.jpg"
    },
    "posts": [
      {
        "id": 1,
        "title": "Hello World",
        "views": 100
      },
      {
        "id": 2,
        "title": "Python Tips",
        "views": 250
      }
    ]
  }
}
The schema acts as a whitelist. Setting a field to True includes it in the output. Using a nested dictionary lets you filter nested objects. The function recursively applies the schema to nested structures.
For arrays, the schema applies to each item. In the example, the posts array gets filtered so each post only includes id, title, and views, while content and internal_score are excluded.
Notice how sensitive fields like password_hash and private_notes don’t appear in the output. This makes the function useful for sanitizing data before logging or sending to frontend applications.
You can create different schemas for different use cases, such as a minimal schema for list views, a detailed schema for single-item views, and an admin schema that includes everything.
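One way to organize several schemas is a dictionary keyed by view name. The sketch below is illustrative only: keep_fields is a condensed stand-in for the filter function above, and the SCHEMAS mapping, view names, and post record are all assumptions.

```python
def keep_fields(data, schema):
    """Condensed stand-in for filter_json: keep only keys the schema allows."""
    result = {}
    for key, rule in schema.items():
        if key not in data:
            continue
        value = data[key]
        if rule is True:
            result[key] = value
        elif isinstance(rule, dict) and isinstance(value, dict):
            result[key] = keep_fields(value, rule)
        elif isinstance(rule, dict) and isinstance(value, list):
            # Apply the same nested schema to every dict in the list
            result[key] = [
                keep_fields(item, rule) if isinstance(item, dict) else item
                for item in value
            ]
    return result


# Hypothetical schemas for two different views of the same record
SCHEMAS = {
    "list": {"id": True, "title": True},
    "detail": {"id": True, "title": True, "author": {"name": True}, "tags": True},
}

post = {
    "id": 7,
    "title": "Python Tips",
    "internal_score": 0.92,
    "author": {"name": "Cayla", "email": "cayla@example.com"},
    "tags": ["python", "json"],
}

list_view = keep_fields(post, SCHEMAS["list"])
detail_view = keep_fields(post, SCHEMAS["detail"])
```

Keeping the schemas in one place makes it easier to review exactly which fields each view exposes.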
# 5. Converting JSON to and from Dot Notation
Some systems use flat key-value stores, but you want to work with nested JSON in your code. Converting between flat dot-notation keys and nested structures helps achieve this.
Here’s a pair of functions for bidirectional conversion.
// Converting JSON to Dot Notation
def json_to_dot_notation(data, parent_key=''):
    """
    Convert nested JSON to flat dot-notation dictionary.

    Args:
        data: Nested dictionary
        parent_key: Prefix for keys (used in recursion)

    Returns:
        Flat dictionary with dot-notation keys
    """
    items = {}

    if isinstance(data, dict):
        for key, value in data.items():
            new_key = f"{parent_key}.{key}" if parent_key else key

            if isinstance(value, dict):
                items.update(json_to_dot_notation(value, new_key))
            else:
                items[new_key] = value
    else:
        items[parent_key] = data

    return items
// Converting Dot Notation to JSON
def dot_notation_to_json(flat_data):
    """
    Convert flat dot-notation dictionary to nested JSON.

    Args:
        flat_data: Dictionary with dot-notation keys

    Returns:
        Nested dictionary
    """
    result = {}

    for key, value in flat_data.items():
        parts = key.split('.')
        current = result

        # Create intermediate dictionaries for every part except the last
        for part in parts[:-1]:
            if part not in current:
                current[part] = {}
            current = current[part]

        current[parts[-1]] = value

    return result
Let’s test the round-trip conversion:
import json

# Original nested JSON
config = {
    "app": {
        "name": "MyApp",
        "version": "1.0.0"
    },
    "database": {
        "host": "localhost",
        "credentials": {
            "username": "admin",
            "password": "secret"
        }
    },
    "features": {
        "analytics": True,
        "notifications": False
    }
}

# Convert to dot notation (for environment variables)
flat = json_to_dot_notation(config)
print("Flat format:")
for key, value in flat.items():
    print(f"  {key} = {value}")

print("\n" + "="*50 + "\n")

# Convert back to nested JSON
nested = dot_notation_to_json(flat)
print("Nested format:")
print(json.dumps(nested, indent=2))
Output:
Flat format:
  app.name = MyApp
  app.version = 1.0.0
  database.host = localhost
  database.credentials.username = admin
  database.credentials.password = secret
  features.analytics = True
  features.notifications = False

==================================================

Nested format:
{
  "app": {
    "name": "MyApp",
    "version": "1.0.0"
  },
  "database": {
    "host": "localhost",
    "credentials": {
      "username": "admin",
      "password": "secret"
    }
  },
  "features": {
    "analytics": true,
    "notifications": false
  }
}
The json_to_dot_notation function flattens the structure by recursively walking through nested dictionaries and joining keys with dots. Unlike the earlier flatten function, this one doesn’t expand arrays (a list is stored as a single value); it’s optimized for configuration data that’s purely key-value.
The dot_notation_to_json function reverses the process. It splits each key on dots and builds up the nested structure by creating intermediate dictionaries as needed. The loop handles all parts except the last one, creating nesting levels. Then it assigns the value to the final key.
This approach keeps your configuration readable and maintainable while working within the constraints of flat key-value systems.
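The example mentions environment variables as one such flat system. As a hedged sketch of how dot-notation keys could map to environment-style names, the code below assumes a convention of uppercase names with double underscores in place of dots. The to_env and from_env helpers and the MYAPP prefix are hypothetical, not an established standard.

```python
def to_env(flat, prefix="MYAPP"):
    """Map dot-notation keys to environment-style names (values become strings)."""
    return {
        f"{prefix}__{key.replace('.', '__').upper()}": str(value)
        for key, value in flat.items()
    }


def from_env(env, prefix="MYAPP"):
    """Reverse the mapping; only variables carrying the prefix are considered."""
    marker = f"{prefix}__"
    return {
        name[len(marker):].replace("__", ".").lower(): value
        for name, value in env.items()
        if name.startswith(marker)
    }


# Invented flat config, as produced by a dot-notation flattener
flat = {"database.host": "localhost", "database.port": 5432}

env_vars = to_env(flat)

# Unrelated variables such as PATH are ignored on the way back
round_trip = from_env({**env_vars, "PATH": "/usr/bin"})
```

The round trip is lossy by design: every value comes back as a string, and keys are lowercased, so keys that contain uppercase letters or double underscores would not survive unchanged. Type conversion would need to happen after loading.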
# Wrapping Up
JSON processing goes beyond basic json.loads(). In most projects, you’ll need tools to navigate nested structures, transform shapes, merge configurations, filter fields, and convert between formats.
The techniques in this article transfer to other data processing tasks as well. You can adapt these patterns for XML, YAML, or custom data formats.
Start with the safe access function to prevent KeyError exceptions in your code. Add the others as you run into specific needs. Happy coding!
Bala Priya C is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she’s working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.
