Gephi (GEXF)

GEXF (Graph Exchange XML Format) is an XML-based interchange format used by Gephi and other graph tools. This notebook first loads a small local sample, then a medium-sized dataset that carries GEXF viz metadata.

[1]:
import os
from pathlib import Path
from urllib.request import urlretrieve


import graphistry

# To specify Graphistry account & server, use:
# graphistry.register(api=3, username='...', password='...', protocol='https', server='hub.graphistry.com')
# For more options: https://pygraphistry.readthedocs.io/en/latest/server/register.html

[2]:
GRAPHISTRY_SERVER = os.environ.get("GRAPHISTRY_SERVER", "hub.graphistry.com")
GRAPHISTRY_PROTOCOL = os.environ.get("GRAPHISTRY_PROTOCOL", "https")
GRAPHISTRY_USERNAME = os.environ.get("GRAPHISTRY_USERNAME")
GRAPHISTRY_PASSWORD = os.environ.get("GRAPHISTRY_PASSWORD")

if not GRAPHISTRY_USERNAME or not GRAPHISTRY_PASSWORD:
    raise RuntimeError("Set GRAPHISTRY_USERNAME and GRAPHISTRY_PASSWORD to upload.")

graphistry.register(
    api=3,
    protocol=GRAPHISTRY_PROTOCOL,
    server=GRAPHISTRY_SERVER,
    username=GRAPHISTRY_USERNAME,
    password=GRAPHISTRY_PASSWORD,
)

[2]:
<graphistry.pygraphistry.GraphistryClient at 0x7a5cbe8579b0>
[3]:
gexf_path = Path("demos/demos_databases_apis/gexf/sample.gexf")
if not gexf_path.exists():
    gexf_path = Path("sample.gexf")
g = graphistry.gexf(str(gexf_path))

g._nodes.head()

[3]:
   node_id    label  category  viz_color  viz_opacity  viz_x  viz_y  viz_z  viz_size  viz_shape  viz_shape_icon
0      n10    Delta     typeA    #EFAD42          0.5   10.0   20.5    0.0      2.50       disc          circle
1      n11  Epsilon     typeB    #0A141E          1.0   -5.0    7.5    0.0      1.25     square          square
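
The viz_* columns in this output correspond to elements in the GEXF viz namespace. As a rough illustration of where such values live in the XML, the sketch below parses a hand-written fragment with the standard library. It assumes the GEXF 1.2 draft schema; the fragment is not the contents of sample.gexf, node_rows is a hypothetical helper, and this is not how PyGraphistry itself reads the file.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment for illustration only (assumes the GEXF 1.2 draft schema).
GEXF_TEXT = """
<gexf xmlns="http://www.gexf.net/1.2draft"
      xmlns:viz="http://www.gexf.net/1.2draft/viz" version="1.2">
  <graph defaultedgetype="undirected">
    <nodes>
      <node id="n10" label="Delta">
        <viz:color r="239" g="173" b="66" a="0.5"/>
        <viz:position x="10.0" y="20.5" z="0.0"/>
        <viz:size value="2.5"/>
        <viz:shape value="disc"/>
      </node>
    </nodes>
    <edges/>
  </graph>
</gexf>
""".strip()

NS = {
    "g": "http://www.gexf.net/1.2draft",
    "viz": "http://www.gexf.net/1.2draft/viz",
}


def node_rows(gexf_text):
    """Flatten each <node> and its viz children into a dict (hypothetical helper)."""
    root = ET.fromstring(gexf_text)
    rows = []
    for node in root.iterfind(".//g:node", NS):
        row = {"node_id": node.get("id"), "label": node.get("label")}
        color = node.find("viz:color", NS)
        if color is not None:
            r, g, b = (int(color.get(k)) for k in ("r", "g", "b"))
            row["viz_color"] = f"#{r:02X}{g:02X}{b:02X}"  # RGB channels to hex
            row["viz_opacity"] = float(color.get("a", "1.0"))
        position = node.find("viz:position", NS)
        if position is not None:
            for axis in ("x", "y", "z"):
                row[f"viz_{axis}"] = float(position.get(axis, "0"))
        size = node.find("viz:size", NS)
        if size is not None:
            row["viz_size"] = float(size.get("value"))
        shape = node.find("viz:shape", NS)
        if shape is not None:
            row["viz_shape"] = shape.get("value")
        rows.append(row)
    return rows


rows = node_rows(GEXF_TEXT)
```

Each dict in rows plays the role of one row of the node table: the RGB channels collapse into a hex string, and the alpha channel becomes an opacity value.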

GEXF viz attributes map to Graphistry bindings (color, size, position, opacity, icons). You can plot directly using the GEXF defaults:

[4]:
g.name("GEXF sample").plot()

[4]:

Medium GEXF demo: SiS Words

This dataset includes GEXF viz encodings for node color, size, and position. The source assigns one color and one size to its nodes (the rows previewed below all show #999999 and 10.0), so the default plot looks uniform. The cells below show the faithful default binding, then how to drop the GEXF colors and sizes while keeping the layout, and finally how to apply Graphistry encodings.

[5]:
DATA_URL = "https://raw.githubusercontent.com/medialab/medialab-network-dataset/master/SiS%20Words.gexf"
DATA_DIR = Path("demos/demos_databases_apis/gexf/data")
if not DATA_DIR.exists():
    DATA_DIR = Path("data")
GEXF_PATH = DATA_DIR / "sis_words.gexf"

DATA_DIR.mkdir(parents=True, exist_ok=True)
if not GEXF_PATH.exists():
    urlretrieve(DATA_URL, GEXF_PATH)

GEXF_PATH.exists()

[5]:
True
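
One caveat with an exists() check like the one above: if a download is interrupted, a partial file can be left behind, and later runs will then skip the fetch. A minimal sketch of a more defensive variant follows. It assumes a hypothetical helper named fetch_once, which is not part of PyGraphistry or of this notebook. It downloads to a temporary name and renames the file only once the transfer has finished.

```python
import os
from pathlib import Path
from urllib.request import urlretrieve


def fetch_once(url, dest):
    """Download url to dest unless dest already exists (hypothetical helper)."""
    dest = Path(dest)
    if dest.exists():
        return dest  # cached copy: skip the download
    dest.parent.mkdir(parents=True, exist_ok=True)
    tmp = dest.with_name(dest.name + ".part")
    try:
        urlretrieve(url, tmp)
        os.replace(tmp, dest)  # rename only after the download completed
    finally:
        if tmp.exists():
            tmp.unlink()  # remove a leftover partial file after a failure
    return dest
```

With the names used in this notebook, the call would be fetch_once(DATA_URL, GEXF_PATH).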
[6]:
g = graphistry.gexf(str(GEXF_PATH))
counts = {"nodes": len(g._nodes), "edges": len(g._edges)}
bindings = {
    "point_color": g._point_color,
    "point_size": g._point_size,
    "point_x": g._point_x,
    "point_y": g._point_y,
    "edge_color": g._edge_color,
    "play": g._url_params.get("play"),
}
counts, bindings

[6]:
({'nodes': 6704, 'edges': 71744},
 {'point_color': 'viz_color',
  'point_size': 'viz_size',
  'point_x': 'viz_x',
  'point_y': 'viz_y',
  'edge_color': None,
  'play': 0})
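
The node and edge counts above, and the claim that the source uses a single color and size, can be cross-checked without Graphistry by streaming the XML. The sketch below is one way to do that with the standard library. summarize_gexf and local_name are hypothetical helpers, and the code assumes node colors are given as r/g/b attributes on a viz color element, as in the GEXF 1.2 draft.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def summarize_gexf(path):
    """Count nodes and edges and collect the distinct node colors and sizes."""
    counts = {"nodes": 0, "edges": 0}
    colors, sizes = set(), set()
    for _event, elem in ET.iterparse(str(path), events=("end",)):
        name = local_name(elem.tag)
        if name == "node":
            counts["nodes"] += 1
            # At the node's end event its direct children are fully parsed.
            for child in elem:
                child_name = local_name(child.tag)
                if child_name == "color":
                    colors.add(tuple(child.get(k) for k in ("r", "g", "b")))
                elif child_name == "size":
                    sizes.add(child.get("value"))
            elem.clear()  # drop parsed children to limit memory use
        elif name == "edge":
            counts["edges"] += 1
            elem.clear()
    return counts, colors, sizes
```

On the downloaded file this would be called as summarize_gexf(GEXF_PATH). A single entry in each returned set would be consistent with a uniform default plot.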
[7]:
g._nodes.head()

[7]:
   node_id  label                                   class                                              main  occurences  viz_size       viz_x        viz_y  viz_z  viz_color
0  w70401   populations indigènes                   populations et amélioration des conditions de vie  True           3      10.0  -649.47797  -996.466860    0.0  #999999
1  w70416   impact des activités humaines           réchauffement climatique et elavation du nivea...  True           2      10.0   789.25270    10.201024    0.0  #999999
2  w70453   préservation de la qualité              développement durable et environnement             True           3      10.0  1131.04210  -927.317500    0.0  #999999
3  w70455   préservation de la nature               préservation de la nature et de la biodiversité    True           2      10.0  1068.51270  -995.344000    0.0  #999999
4  w70454   préservation des ressources naturelles  développement durable et environnement             True           4      10.0   982.29270  -890.796300    0.0  #999999
[8]:
g._edges.head()

[8]:
source target
0 w70401 w69745
1 w70401 w69741
2 w70401 w54632
3 w70401 w53692
4 w70401 w53637
[9]:
g.name("SiS Words (GEXF defaults)").plot()

[9]:

Drop GEXF colors/sizes (keep layout)

Pass bind_node_viz / bind_edge_viz to graphistry.gexf to keep only the bindings you want. Here we keep position for the layout, drop the color, size, opacity, and icon bindings on nodes, and pass an empty list for edges.

[10]:
g_layout_only = graphistry.gexf(
    str(GEXF_PATH),
    bind_node_viz=["position"],
    bind_edge_viz=[],
)

[11]:
g_layout_only.name("SiS Words (layout only)").plot()

[11]:

Apply Graphistry encodings

After dropping bindings, use Graphistry encodings for color and size. Here we color by class, using a categorical mapping for the eight most frequent classes (with a default color for everything else), and size by the occurences column (spelled as in the dataset).

[12]:
# Fail early if the dataset no longer has the columns used below.
required_cols = ["class", "occurences"]
missing_cols = [col for col in required_cols if col not in g_layout_only._nodes.columns]
assert not missing_cols, f"Missing expected node columns: {missing_cols}"

# Pair each of the 8 most frequent classes with one palette color.
class_counts = g_layout_only._nodes["class"].value_counts()
top_classes = class_counts.head(8).index.tolist()
palette = ["#4C78A8", "#F58518", "#54A24B", "#E45756", "#72B7B2", "#EECA3B", "#B279A2", "#FF9DA6"]
class_color_map = dict(zip(top_classes, palette))

g_encoded = (
    g_layout_only
    .encode_point_color(
        "class",
        categorical_mapping=class_color_map,
        default_mapping="#D0D0D0",
        as_categorical=True,
    )
    .encode_point_size("occurences")
)

class_color_map

[12]:
{'génétique': '#4C78A8',
 'préservation de la nature et de la biodiversité': '#F58518',
 'commerce équitable': '#54A24B',
 'développement durable et environnement': '#E45756',
 'gaz à effet de serre et pollution de l air': '#72B7B2',
 'maîtrise de l énergie': '#EECA3B',
 'connaissance et domaine scientifique': '#B279A2',
 'culture scientifique et technique et pédagogie des sciences': '#FF9DA6'}
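
To make the "top classes plus a default" idea concrete, here is a small standard-library sketch of the same logic on toy values. top_k_color_map and resolve_color are hypothetical helpers, not Graphistry functions, and they only mirror the behavior described above: mapped values get their palette color, and everything else gets the default. Tie-breaking between equally frequent values may differ from pandas value_counts.

```python
from collections import Counter


def top_k_color_map(values, palette):
    """Pair the most frequent values with palette colors, most frequent first."""
    counts = Counter(values)
    top = [value for value, _count in counts.most_common(len(palette))]
    return dict(zip(top, palette))


def resolve_color(value, mapping, default="#D0D0D0"):
    """Return the mapped color, or the default for unmapped values."""
    return mapping.get(value, default)


# Toy data: "a" and "b" are the two most frequent values, "c" is the long tail.
toy_values = ["a", "a", "a", "b", "b", "c"]
toy_map = top_k_color_map(toy_values, ["#4C78A8", "#F58518"])
toy_colors = [resolve_color(v, toy_map) for v in toy_values]
```

With a two-color palette, only "a" and "b" receive palette colors here, and "c" falls back to the gray default.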
[13]:
g_encoded.name("SiS Words (layout + encodings)").plot()

[13]: