Connected JSON Specification

1. Introduction

We want a JSON-based document for exchanging graphs. Graphs contain nodes and edges. Undirected edges, directed edges (DAG), typed edges (Hello RDF), weighted edges (Hello flow algorithms) and even hyper-edges (Hello biologists). We want subgraphs (Hello diagrams). We want data attached to nodes and edges (Hello knowledge graphs).

1.1. Goals and Motivation

Yes, we know, but the last effort (JGF, the JSON Graph Format) is over 10 years old and GraphML over 20 years by now. And some GraphML features (mixed hyper-edges, nested graphs) are not supported in JGF. In fact, none of the existing JSON graph interchange formats has the same breadth of features as the over 20-year-old XML-based GraphML.

Connected JSON aims to be a full GraphML replacement. It supports the semantic capabilities and data representation found in GraphML, while adopting a more flexible, schema-less JSON approach.

This format is intended as a universal interchange format for all kinds of graphs, which can be as complex as what GraphML allows — and that is a lot.

For ways to interpret similar, much more flexible formats unambiguously as Connected JSON, see Extended CJ.

To support streaming for large graphs (> 1 GB) and to make textual diffing of Connected JSON files easy, we also define Canonical Connected JSON.

1.2. Example

Connected JSON Example File
{
  "$schema": "https://j-s-o-n.org/schema/connected-json/7.0.0",
  "connectedJson": {
    "versionDate": "2026-01-15",
    "versionNumber": "7.0.0"
  },
  "baseUri": "https://example.org/",
  "graphs": [{
    "nodes": [
      { "id":  "12" },
      { "id":  "a",
        "ports": [
          { "id": "a1"},
          { "id": "a2",
            "ports": [ { "id": "a2-1" }, { "id": "a2-2" } ]
          }]},
      { "id":  "b", "data": {"foo": "bar"} },
      { "id":  "c" },
      { "id":  "d" },
      { "id":  "e" },
      { "id":  "f" }
    ],
    "edges": [
      { "endpoints": [
        { "direction": "in", "node":  "12"},
        { "direction": "out", "node":  "a"}
      ]},
      { "endpoints":  [
        { "direction": "in", "node": "12"},
        { "direction": "out", "node": "a", "port":  "a2-1"}
      ]},
      { "endpoints":  [
        { "direction": "in", "node": "12"},
        { "direction": "in", "node": "a", "port": "a2-1" },
        { "direction": "out", "node": "d"},
        { "direction": "out", "node": "e"}
      ]},
      { "endpoints":  [
        { "direction": "in", "node": "12"},
        { "direction": "in", "node": "a"},
        { "direction": "out", "node": "d"},
        { "direction": "out", "node": "e"},
        { "direction": "undir", "node": "f"}
      ]}
    ],
    "data": {
      "hello": ["My data","can be","here"]
    }
  }]
}

1.3. Change Log

Every potentially breaking change increments the major version number. See semantic versioning.

2026-01-15: Version 7.0.0
  • Simplified IDs: Duplicates are not allowed.

  • Added URI (Uniform Resource Identifier) concept.

  • Added baseUri also to graphs (was only document before).

  • Simplified edge and endpoint type definition.

  • Added node types.

  • JSON schema: Fixed $schema as allowed property, removed $id.

  • Extended Connected JSON is becoming a JSON Graph Entry Format (JSON GEF) in contrast to CJ, which is a graph interchange format.

  • Migrated label to a JSON object with an entries array.

2025-09-23: Version 6.0.0
  • Removed graph.meta properties nodeCountTotal, edgeCountTotal, nodeCountInGraph, and edgeCountInGraph.

  • Moved canonical from graph.meta to document connectedJson.

  • Removed graph metadata.

2025-07-14: Version 5.0.0
  • Split spec into two parts: Connected JSON for writing strict files, where there is always only one option to encode a structure, and Extended CJ, which is much more liberal and flexible in parsing.

  • Moved edgeDefault to Extended CJ.

2025-07-10: Version 4.0.0
  • Simplified graph nesting. Now a CJ document is a graph (or array of graphs).

2025-07-03: Version 3.0.0
  • Renamed all properties with a dash to camelCase form. This makes it pragmatically easier to represent properties in programming languages as variable names or enum values.

    • type-node → typeNode

    • type-uri → typeUri

  • Renamed some lowercase properties to camelCase form. This avoids IDEs and editors complaining about spelling.

    • baseuri → baseUri

    • edgedefault → edgeDefault

2025-06-26: Version 2.0.0
  • Multilingual labels (Label): switched from a JSON object with language tags as property keys to a more canonical array-form.

2025-04-30: Version 1.1.0
  • Clearer ID section

  • Allow graph inside edge (consistent with diagrams and GraphML)

2025-04-08: Version 1.0.0
  • Initial public release

2. Overview

Suggested MIME type: application/connected+json (not yet registered).

We define two main formats:

Connected JSON (CJ)

A strict format for writing. There is always only one option to encode a structure.

Extended CJ (ECJ)

A relaxed superset of CJ for reading. It offers many aliases, shortcuts and variants to interpret JSON as a graph. See Extended CJ Specification.

These main formats are refined based on allowing comments (JSON5 adds comments to JSON) and canonicalization:

Table 1. The Connected JSON Formats

| Defined in | Name | Default file extension | Purpose | Allows JSON comments |
|---|---|---|---|---|
| Connected JSON (this specification) | Connected JSON | .cj or .cj.json | Written by tools | no |
| Connected JSON (this specification) | Connected JSON | .cj.json5 | Written by tools, commented by humans | yes |
| Connected JSON (this specification) | Canonical Connected JSON | .cj | Optimized for streaming and diffing | no |
| Extended CJ | Extended Connected JSON | .json | Read diverse JSON files | no |
| Extended CJ | Extended Connected JSON | .json5 | Read diverse JSON files | yes |

All formats restrict JSON to the I-JSON subset defined in RFC 7493: no duplicate object properties, UTF-8 encoding, and no unpaired surrogates in strings.
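The I-JSON restrictions can be checked while parsing. The following is a minimal sketch using only the Python standard library; the function names `reject_duplicate_keys` and `load_cj` are hypothetical and not part of this specification, and the sketch covers only the three restrictions named above, not all of RFC 7493.

```python
import json


def reject_duplicate_keys(pairs):
    """object_pairs_hook that raises on a repeated property name."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate property name: {key!r}")
        seen[key] = value
    return seen


def load_cj(raw: bytes):
    """Parse bytes as I-JSON-restricted JSON (illustrative checks only)."""
    # Strict UTF-8 decoding: raises UnicodeDecodeError on invalid input.
    text = raw.decode("utf-8")
    document = json.loads(text, object_pairs_hook=reject_duplicate_keys)
    # An escaped lone surrogate such as "\ud800" survives parsing as a Python
    # string; re-encoding to UTF-8 raises UnicodeEncodeError in that case.
    json.dumps(document, ensure_ascii=False).encode("utf-8")
    return document
```

Both `UnicodeDecodeError` and `UnicodeEncodeError` are subclasses of `ValueError`, so a caller can catch a single exception type for all three violations.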

2.1. Conceptual Model

Before diving into JSON structures, it is helpful to describe how Connected JSON sees a graph. In general, Connected JSON supports hyperedges with mixed directionality, like GraphML. It also keeps the node and optional port model from GraphML. It supports two ways of Graph Nesting. Connected JSON allows (multilingual) labels on many elements.

  • A document contains graphs.

  • A graph contains nodes and edges.

  • A node may optionally consist of a hierarchical tree of ports.

  • An edge refers to nodes via endpoints.

  • An endpoint defines the direction of each edge-node connection: the node goes into the edge, comes out of the edge, or has no direction.

  • An endpoint can connect to a node and optionally fine-tune to a port within that node.

Figure 1. Conceptual Model
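As an illustration, the conceptual model can be written down as a handful of record types. This is a sketch only: the class names are hypothetical, the field names follow the property names used later in this specification, and optional properties such as id on edges, label and data are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Port:
    id: str
    ports: list["Port"] = field(default_factory=list)  # hierarchical tree of ports


@dataclass
class Node:
    id: str
    ports: list[Port] = field(default_factory=list)


@dataclass
class Endpoint:
    node: str                     # id of the referenced node
    port: Optional[str] = None    # optional port within that node
    direction: str = "undir"      # "in", "out" or "undir"


@dataclass
class Edge:
    endpoints: list[Endpoint] = field(default_factory=list)


@dataclass
class Graph:
    nodes: list[Node] = field(default_factory=list)
    edges: list[Edge] = field(default_factory=list)


@dataclass
class Document:
    graphs: list[Graph] = field(default_factory=list)
```

A hyperedge with mixed directionality is then simply an `Edge` with more than two `Endpoint` entries whose directions differ.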

3. Addressing Elements

3.1. ID

IDs (identifiers) are used in Connected JSON to address nodes, ports, edges and graphs. IDs are strings.

Identifier Scope

A CJ document MUST NOT contain the same id twice within one scope. The identifiers of different element kinds have different scopes in which they must be unique.

Summary: All ids must be document-wide unique, except port ids, which are only node-local.

| Scope | Comment |
|---|---|
| Document | Node ids, Edge ids and Graph ids are unique per document. Nested graphs do not provide a new id scope. |
| Node | Port ids are only unique within their corresponding Node. |
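The scoping rules above can be sketched as a small validator. The function name `find_duplicate_ids` is hypothetical. The sketch assumes that node, edge and graph ids share one document-wide scope (as the summary suggests) and that every node opens a fresh scope for its whole port tree.

```python
def find_duplicate_ids(document):
    """Return the ids that violate the scoping rules, as a sorted list."""
    document_scope = set()
    duplicates = set()

    def claim(scope, element):
        ident = element.get("id")
        if isinstance(ident, str) and ident:
            if ident in scope:
                duplicates.add(ident)
            scope.add(ident)

    def visit_ports(ports, node_scope):
        for port in ports:
            claim(node_scope, port)
            visit_ports(port.get("ports", []), node_scope)  # nested ports share the scope

    def visit_graph(graph):
        claim(document_scope, graph)
        for node in graph.get("nodes", []):
            claim(document_scope, node)
            visit_ports(node.get("ports", []), set())  # fresh scope per node
            for sub in node.get("graphs", []):
                visit_graph(sub)
        for edge in graph.get("edges", []):
            claim(document_scope, edge)
            for sub in edge.get("graphs", []):
                visit_graph(sub)
        for sub in graph.get("graphs", []):
            visit_graph(sub)  # nested graphs do not open a new id scope

    for graph in document.get("graphs", []):
        visit_graph(graph)
    return sorted(duplicates)
```

Two different nodes may both own a port with id p, but a node nested in a subgraph may not reuse the id of a node further up.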

3.2. Base URI

Similar to the way HTML resolves local links, CJ resolves local ids via a baseUri. CJ supports nested graphs and each graph may define a baseUri. A baseUri declared in a more deeply nested graph overrides one declared further up.

Active Base URI

For any element (graph, node, edge) the active base URI is the baseUri of the nearest enclosing graph. For a graph, this can be the baseUri property of itself.

  • If no baseUri is defined in any graph, the active base URI is the baseUri of the document.

  • If the baseUri of the document is not defined within CJ, it may be provided at export time by the environment as a parameter.

  • If no baseUri is given by any means, the baseUri is implied as the empty string.
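The lookup order above can be sketched as follows. The function name `active_base_uri` and its parameters are hypothetical: `graph_chain` is assumed to list the enclosing graphs from outermost to nearest, and `env_base_uri` stands for a value supplied by the environment.

```python
def active_base_uri(graph_chain, document, env_base_uri=None):
    """Resolve the active base URI for an element.

    graph_chain: enclosing graphs, outermost first, nearest last.
    """
    # The nearest enclosing graph that declares a baseUri wins.
    for graph in reversed(graph_chain):
        if "baseUri" in graph:
            return graph["baseUri"]
    # Otherwise fall back to the document, then the environment.
    if "baseUri" in document:
        return document["baseUri"]
    if env_base_uri is not None:
        return env_base_uri
    # No baseUri given by any means: the empty string is implied.
    return ""
```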

See also the Interpretation as RDF of a CJ document.

3.3. URI (Uniform Resource Identifier)

For each graph, edge and node, CJ computes a URI (RFC 3986), which serves as a globally unique identifier. URIs in CJ are computed from the id property.

  • id is not present (the property is missing, is the empty string, or has another JSON type): the element has no URI.

  • id contains a colon (:): URI is set to the id string.

  • id contains no colon: URI is computed by concatenating the active Base URI with the id string.

Nodes are duplicates if they have the same ID or URI. Duplicates are not allowed in a CJ graph.

Example

A node with id aaa would be a duplicate of a node with id https://example.org#aaa if the baseUri is set to https://example.org#.
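The three URI rules and the duplicate example can be sketched in a few lines. The function name `compute_uri` is hypothetical, and the active base URI is assumed to have been resolved already.

```python
def compute_uri(element, base_uri=""):
    """Return the computed URI of a graph, edge or node, or None."""
    ident = element.get("id")
    if not isinstance(ident, str) or ident == "":
        return None  # no usable id: the element has no URI
    if ":" in ident:
        return ident  # the id already is a URI
    return base_uri + ident  # plain concatenation with the active base URI


base = "https://example.org#"
local = {"id": "aaa"}
absolute = {"id": "https://example.org#aaa"}
# Same computed URI, so the two nodes are duplicates.
print(compute_uri(local, base) == compute_uri(absolute, base))
```

Note that the rule is plain string concatenation, not relative-reference resolution as defined in RFC 3986.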

Ports have no URI.

RDF

Graph URIs are only used for RDF export. RDF blank nodes are represented using the pseudo-URI scheme _: followed by a locally unique identifier. Nodes with a blank node id (starting with _:) or without an id are both exported as blank nodes in RDF.

Figure 2. Computed URIs

3.4. Addressing

An endpoint refers to a node, and optionally to a port within the referenced node. An edge or endpoint type refers to a node.

  • endpoint.node may be a node id or node URI.

  • endpoint.port may be a port id, but not a URI.

  • endpoint.type may be a node id or node URI.

  • edge.type may be a node id or node URI.

Why? This mechanism allows CJ to be used while completely ignoring URIs and baseUri. Once desired, however, CJ graphs can be exported with URIs to the semantic web, with the baseUri recording this publishing decision. Mixed usage is also possible.
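Resolving an endpoint under these rules can be sketched as below. All function names are hypothetical. The sketch assumes a single active base URI for all nodes and strict CJ input in which ports are objects with an id.

```python
def build_node_index(nodes, base_uri=""):
    """Map both the id and the computed URI of every node to the node."""
    index = {}
    for node in nodes:
        ident = node["id"]
        index[ident] = node
        uri = ident if ":" in ident else base_uri + ident
        index[uri] = node
    return index


def find_port(ports, port_id):
    """Depth-first search through a node's port tree."""
    for port in ports:
        if port.get("id") == port_id:
            return port
        found = find_port(port.get("ports", []), port_id)
        if found is not None:
            return found
    return None


def resolve_endpoint(endpoint, index):
    """Return (node, port); port is None if the endpoint names no port."""
    node = index.get(endpoint["node"])
    if node is None:
        raise LookupError(f"unknown node reference: {endpoint['node']!r}")
    port_id = endpoint.get("port")
    if port_id is None:
        return node, None
    port = find_port(node.get("ports", []), port_id)
    if port is None:
        raise LookupError(f"node {node['id']!r} has no port {port_id!r}")
    return node, port
```

Because port ids are node-local and have no URI, the port is always looked up inside the node that was resolved first.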

Figure 3. Addressing

4. Elements

4.1. Document

Every file is a document.

Table 2. Property Table in Canonical / Streaming Order

| Property | Type | Description |
|---|---|---|
| connectedJson | object(Document Metadata) | Optional. Document Metadata. |
| baseUri | string(URI) | Optional. See Base URI. |
| data | any | Optional. Allows user-attached Data. |
| graphs | array(Graph []) | Default: Empty. See also Graph Nesting. |

4.1.1. Document Metadata

A document MAY state a connectedJson property. It is only interpreted at root level.

| Property | Type | Description |
|---|---|---|
| canonical | boolean | Optional. If true, this document is considered a canonical representation: All properties are ordered according to the property tables. Default: false. |
| versionDate | string | Optional. Version date identifier to define the Connected JSON version used by the document. E.g. 2025-07-10 |
| versionNumber | string | Optional. Version number identifier to define the Connected JSON version used by the document. E.g. 4.0.0 |

4.2. Label

Labels are used in Connected JSON to label nodes, ports, edges and graphs. In Connected JSON, labels are multilingual: Each Label Entry is an object with an optional language property and a required value property. The label itself is an object with an entries array of label entries.

Label Example
{
  "entries": [
    {"language":"de", "value": "Hallo, Welt"},
    {"language":"en", "value": "Hello, World"},
    // a value without language information is also allowed
    { "value": "Hi"}
  ]
}
Table 3. Property Table in Canonical / Streaming Order

| Property | Type | Description |
|---|---|---|
| entries | array(Label Entry) | Optional. |
| data | any | Optional. Allows user-attached Data. |

4.2.1. Label Entry

A language tag with an empty string is interpreted as the default language, the same as an absent language tag. Each language tag (including the absent one) may be used at most once.

Table 4. Property Table in Canonical / Streaming Order

| Property | Type | Description |
|---|---|---|
| language | string | Optional. Language tag. Usually according to BCP 47. |
| value | string | Required. The label value. |
| data | any | Optional. Allows user-attached Data. |

Multilingual labels in Connected JSON have been modelled similarly to labels in JSON-LD 1.1, expanded form.
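Reading a label in a preferred language can be sketched as follows. The function names are hypothetical, and falling back to the default-language entry when the requested language is missing is an assumption of this sketch, not a rule of the specification.

```python
def label_value(label, language=""):
    """Return the label value for a language tag, or None if unavailable."""
    by_language = {}
    for entry in label.get("entries", []):
        # An absent language and the empty string both mean "default language".
        by_language[entry.get("language", "")] = entry["value"]
    if language in by_language:
        return by_language[language]
    return by_language.get("")  # assumed fallback: the default-language entry


def duplicate_languages(label):
    """Return language tags used more than once (each may occur at most once)."""
    tags = [entry.get("language", "") for entry in label.get("entries", [])]
    return sorted({tag for tag in tags if tags.count(tag) > 1})
```

Applied to the label example above, asking for de yields the German entry, while asking for an unlisted language such as fr falls back to the entry without a language.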

4.3. Graph

A graph contains nodes and edges. Both arrays may be empty.

Table 5. Property Table in Canonical / Streaming Order

| Property | Type | Description |
|---|---|---|
| id | string | Optional. Unique identifier for the graph within a Document. See ID. |
| baseUri | string(URI) | Optional. See Base URI. |
| label | object | Optional. Label (name) of the graph. See Label. |
| data | any | Optional. Allows user-attached Data. |
| nodes | array(Node []) | 0 to n nodes. Default: Empty. |
| edges | array(Edge []) | 0 to n edges (which may be bi- or hyperedges). Default: Empty. |
| graphs | array(Graph []) | Default: Empty. See Graph Nesting. |

4.4. Node

A node is an atom in the graph.

Table 6. Property Table in Canonical / Streaming Order

| Property | Type | Description |
|---|---|---|
| id | string | Required. Unique identifier for the node. See ID. |
| label | object | Optional. Label (name) of the node. See Label. |
| ports | array(Port []) | Optional array of Port. |
| types | array(string) | Optional. The types of this node, each given as a node id or node URI. |
| data | any | Optional. Allows user-attached Data. |
| graphs | array(Graph []) | Optional. Graph(s) nested within the node. This turns the node into a compound node. The edges in a subgraph can refer to nodes higher up in the tree of graphs. See Graph Nesting. |

4.5. Port

A port is always a part of a Node. A layout should place a port on the border of the node widget. Ports may be hierarchically nested. In practice, this is used in graphical editors, where a port is a connection point on a node.

Table 7. Property Table in Canonical / Streaming Order

| Property | Type | Description |
|---|---|---|
| id | string | Required. ID unique within the Node. All ports, even nested ones, share the same ID space per node. See also ID. |
| label | object | Optional. Label (name) of the port. See Label. |
| ports | array(Port []) | Optional array of sub-ports. Recursively. |
| data | any | Optional. Allows user-attached Data. |

4.6. Edge

Uses endpoints to link to nodes.

The structural model for any edge is this:

Figure 4. Edge Model
  • An edge has n endpoints.

  • An endpoint defines the direction of the attached node, relative to the edge: the node is incoming, outgoing or undirected (from the perspective of the edge).

  • A target can be a node or a port attached to a node. Yes, a port can also be nested within other ports, forming a kind of recursive port-tree. GraphML has this.

Edges have been modelled like GraphML. They have been extended with a type-property, to make it easier to express RDF.
Table 8. Property Table in Canonical / Streaming Order

| Property | Type | Description |
|---|---|---|
| id | string | Optional id. Unique per document. See ID. |
| label | object | Optional. Label (name) of the edge. See Label. |
| type | string | Optional. The kind of edge. Any type defined here applies to all endpoints. An endpoint's own type, if set, overrides it. See Edge Endpoint and Addressing. |
| endpoints | array(Edge Endpoint []) | The endpoints define the nodes to which this edge is attached. |
| data | any | Optional. Allows user-attached Data. |
| graphs | array(Graph []) | Optional. Graph(s) nested within the edge. This turns the edge into a compound edge. The edges in a sub-graph can refer to nodes higher up in the tree of graphs. See Graph Nesting. |

Weighted edges should use data/weight to store a JSON number as edge weight.

4.7. Edge Endpoint

Table 9. Property Table in Canonical / Streaming Order

| Property | Type | Description |
|---|---|---|
| node | string | Required. A string containing a single node id or node URI (see ID and Addressing). This identifies the Node to which this endpoint is attached. |
| port | string | Optional. Port id. Port ids are only unique per node. See ID. If a port is referenced, it defines, in addition to the node, where precisely the endpoint is attached. NOTE: All port ids are unique within a node (see ID), so that a single string can address all ports directly. |
| direction | One of: in, out or undir | Optional. Maps to incoming (in), outgoing (out), or undirected (undir). Default is undir. |
| type | string | Optional. The type of relation from the edge entity to the endpoint node. Default type is related. See Edge Endpoint and Addressing. |
| data | any | Optional. Allows user-attached Data. |

Edge Type (type)
  • Usually, the type of edge is defined at the Edge level. However, in hyper-edges more complex relations (tuples) may need to be expressed. In this case, endpoint-level typing can be used.
    If both edge and endpoint types are given, the endpoint type has precedence. See also Interpretation as RDF.
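The defaults and the precedence rule can be sketched as a small normalization step. The name `normalize_endpoint` is hypothetical, and the sketch assumes that the default type is the literal string related.

```python
DEFAULT_TYPE = "related"      # assumed literal value of the default type
DEFAULT_DIRECTION = "undir"


def normalize_endpoint(edge, endpoint):
    """Return the endpoint with direction and type defaults applied."""
    return {
        "node": endpoint["node"],
        "port": endpoint.get("port"),
        "direction": endpoint.get("direction", DEFAULT_DIRECTION),
        # Precedence: endpoint type, then edge type, then the default.
        "type": endpoint.get("type") or edge.get("type") or DEFAULT_TYPE,
    }


edge = {
    "type": "knows",
    "endpoints": [
        {"node": "a", "direction": "in"},
        {"node": "b", "direction": "out", "type": "mentor"},
    ],
}
for endpoint in edge["endpoints"]:
    print(normalize_endpoint(edge, endpoint)["type"])
```

The first endpoint inherits knows from the edge, while the second keeps its own type mentor.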

5. Features

5.1. Data

User-defined data can be attached to Document, Graph, Node, Edge, Port and Edge Endpoint via the data property. The value may be any JSON value. An array can be used, together with the OCIF extension mechanism.

This can be used, for example, to attach style data (e.g. line-color), domain data (e.g. population, sales volume), provenance data (e.g. source), or any other relevant information.

5.2. Graph Nesting

Graphs can be nested within other graphs (Graphs In Graphs) or within other nodes and edges (Graphs In Nodes And Edges; a GraphML mechanism). The nesting depth is not limited. This allows for hierarchical, recursive graph structures.

All nodes in a top-level graph, including all nodes nested within subgraphs, recursively, share the same ID space. The same is true for edges. Any edges, including those nested in nested graphs, may link to any node within the top-level graph, including those within nested graphs.

Figure 5. Graph Nesting

5.2.1. Graphs In Graphs

This form of nesting partitions nodes and edges into subsets. All nodes and edges are treated as one large graph. Any edge can refer to any node. The subgraph is merely used as a container entity. Its id and label do not contribute to the resulting nodes and edges model.

5.2.2. Graphs In Nodes And Edges

In Connected JSON, like in GraphML, nodes and edges can also contain subgraphs. Those subgraphs additionally turn their container node into a compound node (or their container edge into a compound edge).

In a compound node, the ID and Label of the subgraph(s) are mapped to id and label of synthetic, implied compound node(s). Typically, this is represented in an application by adding synthetic 'contains'-edges from container element to contained elements.
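Both nesting mechanisms can be flattened in one recursive pass, as in the following sketch. The function name `flatten` is hypothetical. As a simplification, it records 'contains' pairs only from a container node or edge to the nodes of its subgraphs, skips containers without an id, and does not create the synthetic compound nodes described above.

```python
def flatten(document):
    """Collect all nodes and edges, plus (container id, node id) pairs."""
    nodes, edges, contains = [], [], []

    def visit(graph, container_id):
        for node in graph.get("nodes", []):
            nodes.append(node)
            if container_id is not None:
                contains.append((container_id, node["id"]))
            for sub in node.get("graphs", []):
                visit(sub, node["id"])  # graph in node: node becomes the container
        for edge in graph.get("edges", []):
            edges.append(edge)
            for sub in edge.get("graphs", []):
                visit(sub, edge.get("id"))  # graph in edge: edge becomes the container
        for sub in graph.get("graphs", []):
            visit(sub, container_id)  # graph in graph: a grouping only

    for graph in document.get("graphs", []):
        visit(graph, None)
    return nodes, edges, contains
```

Since all nested nodes share one ID space, the flattened node list can be indexed by id directly and used to resolve any endpoint in any nested edge.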

5.3. Streaming

JSON in general is not ideal for streaming data, see also Notes on Streaming JSON. However, Canonical CJ is designed to be streamed efficiently. The property tables are sorted for optimized stream processing. This order is in contrast to RFC 8785 (JSON Canonicalization Scheme, JCS), which defines strict lexicographical order. Canonical CJ requires the order of properties to be followed exactly.

Rationale

Most entities are expected to be reasonably small, so that they can be completely processed in memory. Some entities may occur a large number of times. In general, small properties must come before the large properties (due to values with many child elements).

6. Canonical Connected JSON

Canonical CJ defines a strict order on property keys, compatible with Streaming, so that files can also be used in textual diffs. Canonical CJ is a strict subset of Connected JSON. It forbids using comments (no JSON5). Canonical CJ mandates a strict formatting, described below. Properties in which the value is an empty array should be omitted.

Summary
  • Mandatory pretty-printing

  • Mandatory property order

6.1. Formatting (Pretty-Print)

There is no RFC defining JSON pretty-printing, so here is a small spec. We need a compact, well-defined format, so that different CJ tools create the exact same syntax. We also need line breaks to make textual diffing work. Canonical CJ compliant tools MUST adhere to these rules:

Indentation
  • Each level of nesting within an object or array must be indented.

  • The indentation must consist of two spaces. Tabs must not be used.

Line-Breaks
  • The line break character is \n.

  • The opening brace { of an object and the opening bracket [ of an array must be placed on the same line as their corresponding key or at the beginning of the document.

  • Each key-value pair in an object and each element in an array must be placed on its own line.

  • The closing brace } or bracket ] must be placed on a new line, aligned with the indentation level of its opening brace or bracket.

Spacing
  • There must be one space after the colon : in a key-value pair.

  • No other whitespace (except the indentation spaces and line-breaks) is permitted.

Commas
  • A comma , must follow every element in an array and every key-value pair in an object, except for the last one.

Example for Canonical Connected JSON
{
  "$schema": "https://j-s-o-n.org/schema/connected-json/7.0.0",
  "connectedJson": {
    "canonical": true,
    "versionDate": "2026-01-15",
    "versionNumber": "7.0.0"
  },
  "baseUri": "https://example.org/",
  "data": {
    "author": "Max Völkel"
  },
  "graphs": [
    {
      "id": "graph1",
      "label": {
        "entries": [
          {
            "language": "en",
            "value": "Example Graph"
          }
        ]
      },
      "nodes": [
        {
          "id": "node1",
          "label": {
            "entries": [
              {
                "language": "en",
                "value": "Node 1"
              }
            ]
          }
        }
      ],
      "edges": [
        {
          "id": "edge1",
          "label": {
            "entries": [
              {
                "language": "en",
                "value": "Edge from Node 1 to Node 2"
              }
            ]
          },
          "endpoints": [
            {
              "node": "node1",
              "direction": "out"
            }
          ]
        }
      ]
    }
  ]
}
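A writer for Canonical CJ can be sketched with the Python standard library, whose `json.dumps(..., indent=2)` output appears to match the formatting rules above (two-space indentation, one space after each colon, one item per line). The names `ORDER`, `CHILD`, `canonicalize` and `to_canonical_text` are hypothetical. Placing `$schema` first follows the examples in this document and is an assumption. The sketch drops properties that are not listed in the property tables and leaves data values untouched.

```python
import json

# Property order per element kind, copied from the property tables.
ORDER = {
    "document": ["$schema", "connectedJson", "baseUri", "data", "graphs"],
    "meta": ["canonical", "versionDate", "versionNumber"],
    "graph": ["id", "baseUri", "label", "data", "nodes", "edges", "graphs"],
    "node": ["id", "label", "ports", "types", "data", "graphs"],
    "port": ["id", "label", "ports", "data"],
    "edge": ["id", "label", "type", "endpoints", "data", "graphs"],
    "endpoint": ["node", "port", "direction", "type", "data"],
    "label": ["entries", "data"],
    "entry": ["language", "value", "data"],
}

# Which element kind a structural property contains.
CHILD = {
    "connectedJson": "meta", "graphs": "graph", "nodes": "node",
    "ports": "port", "edges": "edge", "endpoints": "endpoint",
    "label": "label", "entries": "entry",
}


def canonicalize(value, kind="document"):
    """Return a copy with properties in table order and empty arrays omitted."""
    out = {}
    for key in ORDER[kind]:
        if key not in value:
            continue
        child = value[key]
        if isinstance(child, list) and not child:
            continue  # empty arrays are omitted
        child_kind = CHILD.get(key)
        if child_kind is None:
            out[key] = child  # scalars, types and user data are kept as they are
        elif isinstance(child, list):
            out[key] = [canonicalize(item, child_kind) for item in child]
        else:
            out[key] = canonicalize(child, child_kind)
    return out


def to_canonical_text(document):
    """Serialize with two-space indentation and \\n line breaks."""
    return json.dumps(canonicalize(document), indent=2, ensure_ascii=False)
```

Python dictionaries preserve insertion order, which is what lets `json.dumps` emit the properties in exactly the order that `canonicalize` built them.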

Appendix A: JSON Schema

Appendix B: Reserved Property Names

The following property names are used by Connected JSON in certain places.

| Property | Usage |
|---|---|
| baseUri | Document, Graph as base URI for RDF interpretation |
| connectedJson | Document |
| canonical | Document Metadata |
| data | Reserved property for user data. Connected JSON does not interpret contents of this property for any element. |
| direction | Edge Endpoint direction (in/out/undir) |
| edges | Graph edges |
| entries | Label entries |
| endpoints | Edge endpoints |
| graphs | Node nested graphs, Edge nested graphs |
| id | Node id, Edge id, Graph id, Port id |
| label | Node, Edge, Graph, Port |
| language | Label Entry |
| node | Edge Endpoint referenced node id |
| nodes | Graph nodes |
| port | Edge Endpoint referenced port id |
| ports | Node ports |
| type | Edge, Edge Endpoint |
| types | Node |
| value | Label Entry |
| versionDate | Document Metadata |
| versionNumber | Document Metadata |