Development Setup#
See also CONTRIBUTING.md and ARCHITECTURE.md
Development is set up for local native and containerized Python coding and testing, with automatic GitHub Actions for CI and CD. The CI server tests mirror the local ones, but run against a wider test matrix of environments.
LFS#
We are starting to use git lfs for data:
# install git lfs: os-specific commands below
git lfs install
git lfs checkout
git lfs: ubuntu#
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt-get install git-lfs
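If a data file looks suspiciously small or fails to parse after cloning, it may still be a Git LFS pointer stub rather than the real content. The sketch below is one way to check this, relying on the fact that LFS pointer files begin with a `version https://git-lfs.github.com/spec/v1` line. The helper name `is_lfs_pointer` is hypothetical and not part of this repository.

```python
from pathlib import Path

# First line of every Git LFS pointer file, per the LFS pointer format.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """Return True if `path` looks like an un-fetched Git LFS pointer stub.

    Hypothetical helper for illustration; it only inspects the first bytes.
    """
    with open(path, "rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX
```

If this returns True for a data file, rerunning the git lfs commands above is the likely fix.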
Docker#
Install#
cd docker && docker compose build && docker compose up -d
For just CPU tests, you can focus on test-cpu and use the run instructions below:
cd docker && docker compose build test-cpu
Run local tests without rebuild#
Containerized call to pytest for CPU + GPU modes:
cd docker
# cpu - pandas
./test-cpu-local.sh
# cpu - fast & targeted
WITH_LINT=0 WITH_TYPECHECK=0 WITH_BUILD=0 ./test-cpu-local.sh graphistry/tests/test_hyper_dask.py::TestHypergraphPandas::test_hyper_to_pa_mixed2
# gpu - pandas, cudf, dask, dask_cudf; test only one file
./test-gpu-local.sh graphistry/tests/test_hyper_dask.py
Connector tests (currently neo4j-only): cd docker && WITH_NEO4J=1 ./test-cpu-local.sh (optional WITH_SUDO=" ")
This starts a local neo4j (docker) instance, then enables and runs the tests against it
Remote Graphistry integration tests are opt-in because they require a live server and credentials:
TEST_REMOTE_INTEGRATION=1 \
GRAPHISTRY_API_TOKEN=<jwt> \
python -m pytest graphistry/tests/compute/test_chain_let_remote_integration.py
Use GRAPHISTRY_USERNAME/GRAPHISTRY_PASSWORD instead of GRAPHISTRY_API_TOKEN when token auth is not available. For service-account style authentication in application code, prefer personal_key_id + personal_key_secret. Optional env vars: GRAPHISTRY_SERVER and GRAPHISTRY_TEST_DATASET_ID.
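The environment-variable rules above can be summarized in code. The following is a minimal sketch, assuming the opt-in flag must equal `1` and that a token takes precedence over username/password. The helper `remote_auth_from_env` and the shape of its return value are hypothetical; the actual test module may gate and authenticate differently.

```python
import os
from typing import Mapping, Optional


def remote_auth_from_env(env: Mapping[str, str] = os.environ) -> Optional[dict]:
    """Hypothetical helper: decide whether and how to authenticate remote tests.

    Returns None when the opt-in flag is unset (tests should be skipped),
    otherwise a dict describing the credentials to use.
    """
    if env.get("TEST_REMOTE_INTEGRATION") != "1":
        return None  # opt-in flag not set, so skip remote tests

    token = env.get("GRAPHISTRY_API_TOKEN")
    username = env.get("GRAPHISTRY_USERNAME")
    password = env.get("GRAPHISTRY_PASSWORD")

    if token:
        # Assumption: token auth wins when both styles are configured.
        auth = {"mode": "token", "token": token}
    elif username and password:
        auth = {"mode": "password", "username": username, "password": password}
    else:
        raise RuntimeError(
            "Set GRAPHISTRY_API_TOKEN or GRAPHISTRY_USERNAME/GRAPHISTRY_PASSWORD"
        )

    # Optional overrides named in the text above.
    if env.get("GRAPHISTRY_SERVER"):
        auth["server"] = env["GRAPHISTRY_SERVER"]
    if env.get("GRAPHISTRY_TEST_DATASET_ID"):
        auth["dataset_id"] = env["GRAPHISTRY_TEST_DATASET_ID"]
    return auth
```

A test module could call this once at import time and skip its tests when the result is None.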
Docs#
Docs are built automatically by ReadTheDocs from inline definitions.
To build manually, see docs/.
Ignore files#
You may need to add ignore rules:
ruff: pyproject.toml (or bin/lint.sh)
mypy: mypy.ini
sphinx: docs/source/conf.py
Remote#
Some databases, such as Neptune, can be easier to work with by editing in the cloud, especially within Jupyter:
git clone https://github.com/graphistry/pygraphistry.git
cd pygraphistry
git checkout origin/my_branch
pip install --user -e .
git diff
and
import logging
logging.basicConfig(level=logging.DEBUG)
import graphistry
graphistry.__version__
CI#
GitHub Actions: See .github/workflows
CI runs on every PR and updates them
Cypher Surface Growth Guard#
CI includes cypher-frontend-surface-guard, which enforces bounded growth for:
graphistry/compute/gfql/cypher/lowering.py total line count
CompiledCypherQuery, CompiledGraphBinding, CompiledCypherGraphQuery dataclass field/property counts
Guard implementation + baseline:
Script: bin/ci_cypher_surface_guard.py
Baseline: bin/ci_cypher_surface_guard_baseline.json
If growth is intentional, regenerate baseline in your branch and include explicit PR rationale:
python bin/ci_cypher_surface_guard.py --write-baseline
Then commit both code changes and baseline update together.
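Conceptually, a bounded-growth guard measures a few size metrics and compares them to a committed baseline. The sketch below illustrates that idea only. It is not the contents of bin/ci_cypher_surface_guard.py, and the `ExampleCompiledQuery` dataclass, the metric names, and the "no growth past baseline" policy are assumptions made for the example.

```python
import dataclasses
from typing import Dict, List


@dataclasses.dataclass
class ExampleCompiledQuery:  # hypothetical stand-in for a guarded dataclass
    text: str
    params: dict


def measure(source: str, guarded: List[type]) -> Dict[str, int]:
    """Collect illustrative metrics: total line count and per-class field counts."""
    metrics = {"total_lines": len(source.splitlines())}
    for cls in guarded:
        metrics[f"{cls.__name__}.fields"] = len(dataclasses.fields(cls))
    return metrics


def violations(current: Dict[str, int], baseline: Dict[str, int]) -> List[str]:
    """Return one message per metric that grew past its baseline value."""
    return [
        f"{name}: {value} > baseline {baseline.get(name, 0)}"
        for name, value in sorted(current.items())
        if value > baseline.get(name, 0)
    ]
```

In this picture, regenerating the baseline amounts to writing the output of `measure` to the baseline JSON file, which is why the rationale for the growth belongs in the PR.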
GPU CI#
GPU CI can be manually triggered by core dev team members:
Push intended changes to protected branches gpu-public or master
Manually trigger action ci-gpu on one of the above branches
GPU tests can also be run locally via ./docker/test-gpu-local.sh .
Debugging Tips#
Use the unit tests
Use the logging module per-file
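One common way to apply the per-file logging tip is a module-level logger named after the module, which is the standard-library pattern. The function `normalize_ids` below is purely illustrative and does not exist in the codebase.

```python
import logging

# One logger per file, named after the module, so output can be filtered by file.
logger = logging.getLogger(__name__)


def normalize_ids(ids):
    """Illustrative function that reports what it is doing at DEBUG level."""
    logger.debug("normalizing %d ids", len(ids))
    return [str(i).strip() for i in ids]


if __name__ == "__main__":
    # Enable DEBUG output globally, as in the Remote section above.
    logging.basicConfig(level=logging.DEBUG)
    print(normalize_ids([1, " 2 "]))
```

Because each logger carries its module name, verbosity can be raised for a single file by calling `setLevel` on that module's logger, leaving the others unchanged.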
Publish: Merge, Tag, & Upload#
Update CHANGELOG.md in your PR branch
Convert ## [Development] section to ## [X.Y.Z - YYYY-MM-DD]
Document all changes following Keep a Changelog format
Commit and push to PR branch
Merge the PR to master (via GitHub UI or gh pr merge)
Switch to master and pull the merged changes
git checkout master
git pull --ff-only origin master
git status --short # should be empty before tagging
Tag the repository with the new version number (semantic versioning X.Y.Z)
git tag X.Y.Z
git push origin refs/tags/X.Y.Z
Confirm the publish GitHub Action published to PyPI
Auto-triggers on tag push
Expected gate: on tag-triggered releases, the final Publish distribution to PyPI job can pause in waiting until a maintainer approves Review deployments for environment pypi-release.
If the run is waiting, open the run page and approve Review deployments, then wait for the PyPI job to complete.
If manually triggering (workflow_dispatch), choose release_mode:
evidence: build + SBOM + provenance + evidence artifacts only (no publish)
test: includes TestPyPI publish, skips PyPI (uses synthetic runner-local version 0.0.dev<run_id> to avoid local-version upload rejection)
release: TestPyPI + PyPI publish (restricted to master, with pypi-release approval)
Do not rerun publish for a version that is already on PyPI (duplicate-file uploads are rejected)
Verify version appears on PyPI:
curl -s https://pypi.org/pypi/graphistry/json | jq -r '.info.version'
Verify release evidence artifacts from the workflow run:
built distributions (dist/*.whl, dist/*.tar.gz)
SBOM (evidence/sbom-cyclonedx.json)
GitHub build provenance attestation for built distributions (dist/*.whl, dist/*.tar.gz)
Keep the PyPI Trusted Publisher binding aligned with this workflow:
repository: graphistry/pygraphistry
workflow file: .github/workflows/publish-pypi.yml
environment: pypi-release
refs: tag pushes and workflow_dispatch on master only
This workflow publishes with attestations enabled for both TestPyPI and PyPI.
Toggle version as active at ReadTheDocs
Create GitHub Release with detailed release notes
gh release create X.Y.Z --title "vX.Y.Z - Brief Title" --notes "Release notes in markdown..."
Or create via GitHub UI: https://github.com/graphistry/pygraphistry/releases/new?tag=X.Y.Z
Release notes should include:
Critical fixes and breaking changes (if any)
Major features from current and recent versions
Links to full CHANGELOG and installation instructions
Highlight important API changes, new capabilities, and use cases
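The "verify version" and "do not rerun publish" steps in the checklist above can also be checked from Python. This is a minimal sketch, assuming the PyPI JSON API response contains `info.version` and a `releases` mapping keyed by version string. The helper names are hypothetical, and the network fetch is kept separate so the parsing logic can be exercised offline.

```python
import json
import re
import urllib.request

PYPI_JSON_URL = "https://pypi.org/pypi/graphistry/json"  # same URL as the curl step
SEMVER_RE = re.compile(r"^\d+\.\d+\.\d+$")  # plain X.Y.Z tags only


def fetch_pypi_metadata(url: str = PYPI_JSON_URL) -> dict:
    """Download the project's metadata from the PyPI JSON API (needs network)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def is_valid_tag(tag: str) -> bool:
    """True if `tag` has the X.Y.Z shape used for release tags."""
    return SEMVER_RE.match(tag) is not None


def latest_version(metadata: dict) -> str:
    """Equivalent of `jq -r '.info.version'` on the JSON response."""
    return metadata["info"]["version"]


def already_published(metadata: dict, version: str) -> bool:
    """True if `version` already has uploaded files, so publish should not be rerun."""
    return bool(metadata.get("releases", {}).get(version))
```

For example, `already_published(fetch_pypi_metadata(), "X.Y.Z")` with the real version number indicates whether a rerun would hit the duplicate-file rejection.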
CI Dependency Lockfiles#
CI uses per-Python-version hashed lockfiles for supply chain security:
Generation: A generate-lockfiles CI job runs bin/generate-lockfiles.sh to produce lockfiles for all profile × Python version combos. Most are uploaded as artifacts, not committed.
ReadTheDocs lockfile: requirements/rtd-py3.12.lock is committed because .readthedocs.yml consumes it directly. Update it when changing RTD’s Python version, docs/pygraphviz extras, setup.py dependency constraints that affect docs, or RTD install steps:
PROFILES=rtd VERSIONS=3.12 ./bin/generate-lockfiles.sh
CI’s check-rtd-lockfile job regenerates only the RTD profile using the committed lockfile’s --exclude-newer timestamp and fails if requirements/rtd-py3.12.lock is out of sync. To fix a red check-rtd-lockfile, rerun the RTD command (PROFILES=rtd VERSIONS=3.12 ./bin/generate-lockfiles.sh) and commit the resulting lockfile.
Spark lockfile: requirements/spark-py3.14.lock is committed because the test-spark job installs a small Spark-specific smoke-test environment without the broader test extras. Update requirements/spark-py3.14.in when changing the direct Spark smoke dependencies, then regenerate and commit the lockfile:
PROFILES=spark VERSIONS=3.14 ./bin/generate-lockfiles.sh
CI’s check-spark-lockfile job uses the committed lockfile’s --exclude-newer timestamp and fails if requirements/spark-py3.14.lock is out of sync.
6-day cooldown: --exclude-newer ensures no package published in the last 6 days is included, mitigating 0-day supply chain attacks. UV_EXCLUDE_NEWER is also set globally as belt-and-suspenders.
Hash verification: --require-hashes on install ensures tamper-proof installs (except AI/umap profiles where torch conflicts prevent it).
Adding a dependency: After modifying most setup.py extras, CI automatically regenerates artifact lockfiles. If the change affects ReadTheDocs docs dependencies, also update and commit requirements/rtd-py3.12.lock.
Emergency override: Set COOLDOWN_DAYS=0 in bin/generate-lockfiles.sh to disable the 6-day cooldown for urgent patches.
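To make the 6-day cooldown concrete: the value passed to --exclude-newer is a timestamp some number of days in the past. The sketch below assumes the cutoff is simply "now minus COOLDOWN_DAYS" in UTC, formatted as an RFC 3339 timestamp. How bin/generate-lockfiles.sh actually computes it is not shown here, and the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def exclude_newer_cutoff(now: datetime, cooldown_days: int = 6) -> str:
    """Return an RFC 3339 UTC timestamp `cooldown_days` before `now`.

    `now` must be timezone-aware; it is converted to UTC before subtracting.
    """
    cutoff = now.astimezone(timezone.utc) - timedelta(days=cooldown_days)
    return cutoff.strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    # Print the cutoff for the current moment with the default 6-day cooldown.
    print(exclude_newer_cutoff(datetime.now(timezone.utc)))
```

With `cooldown_days=0` the cutoff equals the current time, which corresponds to the emergency override described above.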