Mastering JSON Manipulation in Cortex XQL - Beyond the Basics

A comprehensive guide to JSON extraction in Cortex XQL. Learn how to parse complex nested structures, handle arrays, and use advanced XQL operators with real-world data samples.
In modern security operations, data is rarely flat. In Cortex XDR, the most valuable insights—like asset ownership, vulnerability details, and cloud metadata—are often stored as JSON strings within fields like xdm.issue.extended_fields or raw_log.
To be a master threat hunter, you must know how to peel back these layers. This guide provides a deep dive into JSON manipulation with realistic data samples and detailed XQL queries.
The Sample Dataset
Throughout this guide, we will refer to a hypothetical extended_fields JSON object that follows the standard Cortex CSM (Cloud Security Management) structure:
{
  "cve_id": "CVE-2024-1234",
  "xdm_assets": [
    {
      "xdm__asset__name": "SEC-SVR-01",
      "xdm__asset__realm": "Cloud-Prod-West",
      "xdm__asset__type": "Virtual Machine",
      "owner": {
        "owner_name": "DevSecOps Team",
        "email": "[email protected]"
      }
    }
  ],
  "user_context": {
    "department": "Finance",
    "login_geo": "US"
  },
  "metrics": {
    "sent_bytes": "15728640",
    "usage_pct": "87.5"
  },
  "indicators": ["192.168.1.100", "8.8.8.8", "malicious-site.com"]
}
1. The Workhorse: json_extract_scalar
This function pulls a single value (string, number, or boolean) and returns it as an XQL-native string.
The Goal: Identify the Department
We want to extract the department from the user_context object to audit finance-related activity.
The Query:
dataset = xdr_data
| alter dept = json_extract_scalar(additional_data, "$.user_context.department")
| filter dept == "Finance"
| comp count() as finance_activity by action_process_name
Detailed Breakdown:
- $.user_context.department: The $ represents the root of the JSON. We then navigate through keys using dot notation.
- Result: The function returns "Finance".
- Limitation: If you tried to extract $.user_context, the function would return null because user_context is an object, not a scalar value.
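The scalar-versus-object distinction is easier to see side by side. The following is a minimal sketch, assuming the same hypothetical xdr_data dataset and additional_data field as the query above; the column names (dept, geo, ctx_scalar, ctx_object) are illustrative only.

```xql
dataset = xdr_data
// Two scalar leaves pulled from the same parent object
| alter dept = json_extract_scalar(additional_data, "$.user_context.department"),
        geo  = json_extract_scalar(additional_data, "$.user_context.login_geo")
// Same path, two functions: the scalar variant is expected to be null
// (per the limitation noted above), while json_extract keeps the object
| alter ctx_scalar = json_extract_scalar(additional_data, "$.user_context"),
        ctx_object = json_extract(additional_data, "$.user_context")
| filter dept = "Finance"
| fields _time, dept, geo, ctx_scalar, ctx_object
```

Comparing ctx_scalar and ctx_object in the results is a quick way to confirm whether a given path points at a leaf value or at a nested object.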
2. Handling Nested Structures: json_extract
When you need to extract a whole sub-section (like an entire array or object) to process it later, use json_extract.
The Goal: Isolate Asset Metadata
In vulnerability management, you often need to grab the entire asset record to perform multiple extractions from it.
The Query:
dataset = issues
| alter first_asset = json_extract(xdm.issue.extended_fields, "$.xdm_assets[0]")
| alter asset_type = json_extract_scalar(first_asset, "$.xdm__asset__type")
| filter asset_type != null
Detailed Breakdown:
- $.xdm_assets[0]: Uses array indexing to grab the first item in the asset list.
- Return Value: Unlike json_extract_scalar, this returns the entire stringified JSON: {"xdm__asset__name": "SEC-SVR-01", ...}.
- Why use this? It saves you from writing $.xdm_assets[0] over and over again in subsequent alter stages.
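The payoff of extracting the asset once shows up when several fields are needed from it. This sketch reuses the first_asset pattern from the query above against the sample dataset; the output column names are assumptions chosen for illustration.

```xql
dataset = issues
// Extract the first asset object once...
| alter first_asset = json_extract(xdm.issue.extended_fields, "$.xdm_assets[0]")
// ...then read several values from it with short, relative paths
| alter asset_name  = json_extract_scalar(first_asset, "$.xdm__asset__name"),
        asset_realm = json_extract_scalar(first_asset, "$.xdm__asset__realm"),
        owner_name  = json_extract_scalar(first_asset, "$.owner.owner_name")
| filter asset_realm = "Cloud-Prod-West"
| fields asset_name, asset_realm, owner_name
```

Note that the paths in the second alter stage start from the root of first_asset, not from the root of the original extended_fields document.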
3. Dealing with Lists: Array Functions
Handling arrays of IP addresses or indicators is a common SOC requirement.
The Goal: Find a Specific Malicious IP
We want to check if the indicators list contains a known malicious IP.
The Query:
dataset = cloud_logs
| alter ioc_list = json_extract_array(raw_log, "$.indicators")
| filter array_contains(ioc_list, "192.168.1.100")
| fields _time, ioc_list, action
Detailed Breakdown:
- json_extract_array: Converts the JSON string ["192.168.1.100", ...] into a native XQL array.
- array_contains: This function only works on native arrays, making the previous step mandatory.
- json_extract_scalar_array: If your only goal is to display the indicators cleanly in a report without quotes, use this function instead.
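When a single membership check is not enough, it can help to break the list into one row per indicator and aggregate. The sketch below assumes the arrayexpand stage is available in your environment (it is not used elsewhere in this guide) and reuses the hypothetical cloud_logs dataset and raw_log field from the query above.

```xql
dataset = cloud_logs
// Native array of unquoted scalar values
| alter ioc = json_extract_scalar_array(raw_log, "$.indicators")
// Assumption: arrayexpand emits one row per array element
| arrayexpand ioc
// Count how often each indicator appears across events
| comp count() as hits by ioc
| sort desc hits
```

Ranking indicators this way is useful for spotting values that recur across many events, which a per-row membership filter would not surface.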
4. Advanced: The "Swiss Army Knife" (JSONPath)
For complex extractions, Cortex supports recursive descent and wildcards through json_path_extract.
The Goal: Find the Owner Name Anywhere
If your JSON structure changes (e.g., owner is sometimes in asset and sometimes in project), you can search for the key globally.
The Query:
dataset = issues
| alter owner = json_path_extract(xdm.issue.extended_fields, "$..owner_name")
// The $.. syntax triggers a recursive search
Syntactic Sugar: The Operators
Cortex provides two extremely helpful operators for cleaner code:
- ->: Shortcut for json_extract.
- ->->: Shortcut for json_extract_scalar.
Modernized Query:
dataset = issues
| alter name = xdm.issue.extended_fields ->-> "$.xdm_assets[0].xdm__asset__name"
5. Final Step: Data Type Casting
Extracted JSON data is always a string by default. To do math or time analysis, you must cast it.
The Goal: Filter by Traffic Volume (Bytes)
In our sample data, sent_bytes is "15728640". Because it is stored as a string, it cannot be compared numerically until it is cast.
The Query:
dataset = network_logs
| alter bytes = to_integer(json_extract_scalar(raw_payload, "$.metrics.sent_bytes"))
| filter bytes > 10485760 // Greater than 10MB
| bin _time span = 1h // Group events into 1-hour buckets
| comp sum(bytes) as total_outbound by _time
| Cast Function | Use Case |
|---|---|
| to_integer() | Count of issues, byte sizes, port numbers. |
| to_float() | Percentages (like usage_pct), risk scores. |
| to_timestamp() | Custom event times within JSON logs. |
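The same pattern applies to decimal values. As a minimal sketch, the query below casts usage_pct from the sample data with to_float(); the network_logs dataset and raw_payload field are the same hypothetical names used above, and the 80.0 threshold is an arbitrary example.

```xql
dataset = network_logs
// usage_pct arrives as the string "87.5"; cast it before comparing
| alter usage = to_float(json_extract_scalar(raw_payload, "$.metrics.usage_pct"))
// Drop rows where the key was missing or the cast did not produce a number
| filter usage != null and usage > 80.0
| fields _time, usage
```

Filtering out null first keeps rows with a missing or malformed value from silently skewing later aggregations.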
Best Practices Summary
- Casing: Always double-check your casing. $.Owner ≠ $.owner.
- Validation: Use to_json_string() if your extraction returns null on a field you know is there; the field might not be properly typed as JSON yet.
- Visualization: Use json_extract_scalar_array for dashboard tables; it removes the brackets and quotes that often clutter UI widgets.
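The validation tip can be turned into a quick debugging query. This is a sketch under the assumption that xdm.issue.extended_fields may not already be a JSON string in your dataset; the raw and cve column names are illustrative.

```xql
dataset = issues
// Normalize the field to a JSON string before extracting from it
| alter raw = to_json_string(xdm.issue.extended_fields)
| alter cve = json_extract_scalar(raw, "$.cve_id")
// Show the raw payload next to the extracted value to spot path or casing mistakes
| fields raw, cve
| limit 10
```

Looking at raw and cve together on a handful of rows usually shows whether the problem is the field type, the path, or the key casing.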
Mastering these JSON functions transforms you from a basic user into a technical power user who can squeeze every bit of value from Cortex logs.
Happy Hunting!