Gephi (GEXF)
GEXF (Graph Exchange XML Format) is an XML-based graph interchange format used by Gephi and other tools. This notebook loads two GEXF files that carry viz metadata: a small local sample and a medium-sized public dataset (SiS Words).
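To see where the viz_* columns come from, here is a minimal, hand-written GEXF 1.2 fragment parsed with Python's standard xml.etree.ElementTree. It is an illustration only: it is not the contents of sample.gexf and not how graphistry.gexf() parses files internally. The node values are chosen to mirror the first row of the sample table below. In GEXF 1.2, viz:color stores separate r/g/b integers plus an optional alpha a, which this sketch turns into a hex string and an opacity. The helper name read_viz_nodes is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written GEXF 1.2 fragment (illustrative only): one node with viz metadata.
GEXF_DOC = """<gexf xmlns="http://www.gexf.net/1.2draft"
      xmlns:viz="http://www.gexf.net/1.2draft/viz" version="1.2">
  <graph defaultedgetype="directed">
    <nodes>
      <node id="n10" label="Delta">
        <viz:color r="239" g="173" b="66" a="0.5"/>
        <viz:position x="10.0" y="20.5" z="0.0"/>
        <viz:size value="2.5"/>
        <viz:shape value="disc"/>
      </node>
    </nodes>
    <edges/>
  </graph>
</gexf>"""

# Namespace prefixes for the path queries below.
NS = {"g": "http://www.gexf.net/1.2draft", "viz": "http://www.gexf.net/1.2draft/viz"}


def read_viz_nodes(doc: str) -> list:
    """Flatten each <node> and its viz:* children into a dict of viz_* fields."""
    root = ET.fromstring(doc)
    rows = []
    for node in root.findall("g:graph/g:nodes/g:node", NS):
        color = node.find("viz:color", NS)
        pos = node.find("viz:position", NS)
        size = node.find("viz:size", NS)
        shape = node.find("viz:shape", NS)
        r, g, b = (int(color.get(ch)) for ch in ("r", "g", "b"))
        rows.append({
            "node_id": node.get("id"),
            "label": node.get("label"),
            "viz_color": f"#{r:02X}{g:02X}{b:02X}",       # r/g/b ints -> hex string
            "viz_opacity": float(color.get("a", "1.0")),  # missing alpha means opaque
            "viz_x": float(pos.get("x")),
            "viz_y": float(pos.get("y")),
            "viz_z": float(pos.get("z", "0.0")),
            "viz_size": float(size.get("value")),
            "viz_shape": shape.get("value"),
        })
    return rows


print(read_viz_nodes(GEXF_DOC))
```

For this node the sketch yields viz_color "#EFAD42" and viz_opacity 0.5, the same values as the n10 row in the sample preview.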
[1]:
import os
from pathlib import Path
from urllib.request import urlretrieve
import graphistry
# To specify Graphistry account & server, use:
# graphistry.register(api=3, username='...', password='...', protocol='https', server='hub.graphistry.com')
# For more options: https://pygraphistry.readthedocs.io/en/latest/server/register.html
[2]:
GRAPHISTRY_SERVER = os.environ.get("GRAPHISTRY_SERVER", "hub.graphistry.com")
GRAPHISTRY_PROTOCOL = os.environ.get("GRAPHISTRY_PROTOCOL", "https")
GRAPHISTRY_USERNAME = os.environ.get("GRAPHISTRY_USERNAME")
GRAPHISTRY_PASSWORD = os.environ.get("GRAPHISTRY_PASSWORD")
if not GRAPHISTRY_USERNAME or not GRAPHISTRY_PASSWORD:
    raise RuntimeError("Set GRAPHISTRY_USERNAME and GRAPHISTRY_PASSWORD to upload.")
graphistry.register(
    api=3,
    protocol=GRAPHISTRY_PROTOCOL,
    server=GRAPHISTRY_SERVER,
    username=GRAPHISTRY_USERNAME,
    password=GRAPHISTRY_PASSWORD,
)
[2]:
<graphistry.pygraphistry.GraphistryClient at 0x7a5cbe8579b0>
[3]:
gexf_path = Path("demos/demos_databases_apis/gexf/sample.gexf")
if not gexf_path.exists():
gexf_path = Path("sample.gexf")
g = graphistry.gexf(str(gexf_path))
g._nodes.head()
[3]:
| | node_id | label | category | viz_color | viz_opacity | viz_x | viz_y | viz_z | viz_size | viz_shape | viz_shape_icon |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | n10 | Delta | typeA | #EFAD42 | 0.5 | 10.0 | 20.5 | 0.0 | 2.50 | disc | circle |
| 1 | n11 | Epsilon | typeB | #0A141E | 1.0 | -5.0 | 7.5 | 0.0 | 1.25 | square | square |
GEXF viz attributes map to Graphistry bindings (color, size, position, opacity, icons). You can plot directly using the GEXF defaults:
[4]:
g.name("GEXF sample").plot()
Medium GEXF demo: SiS Words
This dataset includes GEXF viz encodings for node color, size, and position. The source gives every node the same color and size (the node preview below shows #999999 and 10.0), so the default plot looks uniform. The cells below show three variants: the default GEXF bindings, layout only (GEXF colors and sizes dropped, positions kept), and layout plus Graphistry color and size encodings.
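One way to confirm that the GEXF encodings are uniform before plotting is to count distinct values per viz column. This sketch runs that check on a small DataFrame rebuilt by hand from the five rows of the node preview further down (viz columns only); on the full dataset you would run the same nunique() expression on g._nodes.

```python
import pandas as pd

# Viz columns copied from the first five rows of the SiS Words node preview.
nodes = pd.DataFrame({
    "node_id": ["w70401", "w70416", "w70453", "w70455", "w70454"],
    "viz_size": [10.0, 10.0, 10.0, 10.0, 10.0],
    "viz_x": [-649.47797, 789.25270, 1131.04210, 1068.51270, 982.29270],
    "viz_y": [-996.466860, 10.201024, -927.317500, -995.344000, -890.796300],
    "viz_color": ["#999999"] * 5,
})

# Distinct values per viz column; a count of 1 means the encoding carries no information.
distinct = nodes[["viz_color", "viz_size", "viz_x", "viz_y"]].nunique()
uniform_cols = distinct[distinct == 1].index.tolist()
print(distinct.to_dict(), uniform_cols)
```

Color and size collapse to a single value while positions vary, which is why keeping the GEXF layout but replacing color and size is worthwhile.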
[5]:
DATA_URL = "https://raw.githubusercontent.com/medialab/medialab-network-dataset/master/SiS%20Words.gexf"
DATA_DIR = Path("demos/demos_databases_apis/gexf/data")
if not DATA_DIR.exists():
DATA_DIR = Path("data")
GEXF_PATH = DATA_DIR / "sis_words.gexf"
DATA_DIR.mkdir(parents=True, exist_ok=True)
if not GEXF_PATH.exists():
urlretrieve(DATA_URL, GEXF_PATH)
GEXF_PATH.exists()
[5]:
True
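The cell above skips the download whenever the target file exists, so an interrupted urlretrieve could leave a truncated file that later runs treat as complete. A minimal sketch of a more defensive variant, assuming you want the same caching behavior: download to a temporary .part file next to the target and rename it into place only after the transfer finishes. fetch_once is a hypothetical helper, not part of PyGraphistry.

```python
from pathlib import Path
from urllib.request import urlretrieve


def fetch_once(url: str, dest: Path) -> Path:
    """Download url to dest unless dest already exists, writing via a .part file."""
    dest = Path(dest)
    if dest.exists():
        return dest  # cached copy from an earlier run
    dest.parent.mkdir(parents=True, exist_ok=True)
    tmp = dest.with_name(dest.name + ".part")
    try:
        urlretrieve(url, tmp)
        tmp.replace(dest)  # rename only after the download completed
    finally:
        if tmp.exists():
            tmp.unlink()  # remove a partial file if the download failed
    return dest
```

Usage would look like GEXF_PATH = fetch_once(DATA_URL, DATA_DIR / "sis_words.gexf"). Path.replace is atomic when source and target are on the same filesystem, which holds here because the .part file sits in the same directory.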
[6]:
g = graphistry.gexf(str(GEXF_PATH))
counts = {"nodes": len(g._nodes), "edges": len(g._edges)}
bindings = {
    "point_color": g._point_color,
    "point_size": g._point_size,
    "point_x": g._point_x,
    "point_y": g._point_y,
    "edge_color": g._edge_color,
    "play": g._url_params.get("play"),
}
counts, bindings
[6]:
({'nodes': 6704, 'edges': 71744},
{'point_color': 'viz_color',
'point_size': 'viz_size',
'point_x': 'viz_x',
'point_y': 'viz_y',
'edge_color': None,
'play': 0})
[7]:
g._nodes.head()
[7]:
| | node_id | label | class | main | occurences | viz_size | viz_x | viz_y | viz_z | viz_color |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | w70401 | populations indigènes | populations et amélioration des conditions de vie | True | 3 | 10.0 | -649.47797 | -996.466860 | 0.0 | #999999 |
| 1 | w70416 | impact des activités humaines | réchauffement climatique et elavation du nivea... | True | 2 | 10.0 | 789.25270 | 10.201024 | 0.0 | #999999 |
| 2 | w70453 | préservation de la qualité | développement durable et environnement | True | 3 | 10.0 | 1131.04210 | -927.317500 | 0.0 | #999999 |
| 3 | w70455 | préservation de la nature | préservation de la nature et de la biodiversité | True | 2 | 10.0 | 1068.51270 | -995.344000 | 0.0 | #999999 |
| 4 | w70454 | préservation des ressources naturelles | développement durable et environnement | True | 4 | 10.0 | 982.29270 | -890.796300 | 0.0 | #999999 |
[8]:
g._edges.head()
[8]:
| | source | target |
|---|---|---|
| 0 | w70401 | w69745 |
| 1 | w70401 | w69741 |
| 2 | w70401 | w54632 |
| 3 | w70401 | w53692 |
| 4 | w70401 | w53637 |
[9]:
g.name("SiS Words (GEXF defaults)").plot()
Drop GEXF colors/sizes (keep layout)
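Pass bind_node_viz and bind_edge_viz to graphistry.gexf() to keep only the viz bindings you want. Here we keep node positions for layout and drop the node color, size, opacity, and icon bindings; the empty bind_edge_viz list keeps no edge viz bindings.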
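Conceptually, bind_node_viz acts as an allow-list over groups of viz bindings. The toy model below is a hypothetical illustration, not PyGraphistry's implementation: the group names other than "position", the NODE_VIZ_GROUPS table, and the select_bindings helper are assumptions inferred from the default bindings printed in cell [6]. Opacity and icon groups are left out for brevity.

```python
# Hypothetical group -> {binding: column} table, inferred from the default
# SiS Words bindings; not taken from PyGraphistry's source.
NODE_VIZ_GROUPS = {
    "color": {"point_color": "viz_color"},
    "size": {"point_size": "viz_size"},
    "position": {"point_x": "viz_x", "point_y": "viz_y"},
}


def select_bindings(requested, groups=NODE_VIZ_GROUPS):
    """Return only the bindings whose group appears in the requested allow-list."""
    unknown = set(requested) - set(groups)
    if unknown:
        raise ValueError(f"unknown viz groups: {sorted(unknown)}")
    selected = {}
    for name in requested:
        selected.update(groups[name])
    return selected


print(select_bindings(["position"]))  # only x/y survive; color and size stay unbound
print(select_bindings([]))            # an empty allow-list keeps nothing
```

In this model, bindings that are not selected stay unset, which leaves room for Graphistry encodings to supply color and size later.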
[10]:
g_layout_only = graphistry.gexf(
    str(GEXF_PATH),
    bind_node_viz=["position"],
    bind_edge_viz=[],
)
[11]:
g_layout_only.name("SiS Words (layout only)").plot()
Apply Graphistry encodings
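After dropping the GEXF bindings, apply Graphistry encodings for color and size. Here we color nodes by class, giving each of the 8 most frequent classes its own palette color and all other classes a default light gray (#D0D0D0), and size nodes by occurences (the column name is spelled this way in the dataset).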
[12]:
required_cols = ["class", "occurences"]
missing_cols = [col for col in required_cols if col not in g_layout_only._nodes.columns]
assert not missing_cols, f"Missing expected node columns: {missing_cols}"
class_counts = g_layout_only._nodes["class"].value_counts()
top_classes = class_counts.head(8).index.tolist()
palette = ["#4C78A8", "#F58518", "#54A24B", "#E45756", "#72B7B2", "#EECA3B", "#B279A2", "#FF9DA6"]
class_color_map = dict(zip(top_classes, palette))
g_encoded = (
    g_layout_only
    .encode_point_color(
        "class",
        categorical_mapping=class_color_map,
        default_mapping="#D0D0D0",
        as_categorical=True,
    )
    .encode_point_size("occurences")
)
class_color_map
[12]:
{'génétique': '#4C78A8',
'préservation de la nature et de la biodiversité': '#F58518',
'commerce équitable': '#54A24B',
'développement durable et environnement': '#E45756',
'gaz à effet de serre et pollution de l air': '#72B7B2',
'maîtrise de l énergie': '#EECA3B',
'connaissance et domaine scientifique': '#B279A2',
'culture scientifique et technique et pédagogie des sciences': '#FF9DA6'}
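The categorical mapping with a default can be reproduced in plain pandas, which is handy for checking how many nodes will fall back to the default gray. This sketch uses a toy class column and a two-color top-N palette (both assumptions); it mirrors the value_counts().head(N) selection of cell [12] but does not call Graphistry, whose client-side coloring may differ in detail.

```python
import pandas as pd

# Toy class column standing in for g_layout_only._nodes["class"].
classes = pd.Series(["a", "b", "a", "c", "a", "b", "d"], name="class")

TOP_N = 2
palette = ["#4C78A8", "#F58518"]
default_color = "#D0D0D0"

# Same top-N selection as cell [12]: most frequent classes first.
top_classes = classes.value_counts().head(TOP_N).index.tolist()
color_map = dict(zip(top_classes, palette))

# Classes missing from the map become NaN after .map(), then take the default color.
node_colors = classes.map(color_map).fillna(default_color)
fallback_share = (node_colors == default_color).mean()
print(color_map)
print(node_colors.tolist(), f"{fallback_share:.0%} of nodes use the default")
```

Running the same fallback_share expression against g_layout_only._nodes["class"] and class_color_map would show how much of the SiS Words graph is drawn in the default gray.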
[13]:
g_encoded.name("SiS Words (layout + encodings)").plot()