Introduction by Example

We will briefly introduce the fundamental concepts of NeuroGraph through self-contained examples. We closely follow the data representation format of PyG; interested readers are therefore referred to the PyG documentation for an introduction to graph machine learning and PyG's data representation formats.

Loading Benchmark Datasets

NeuroGraph provides two classes for loading static and dynamic benchmark datasets.

Loading Static Benchmarks

NeuroGraph utilizes the PyG InMemoryDataset class to facilitate the loading of datasets. This provides an easy-to-use interface for applying graph machine learning pipelines. For example, the HCPGender benchmark can be loaded as follows:

from NeuroGraph.datasets import NeuroGraphDataset
dataset = NeuroGraphDataset(root="data/", name="HCPGender")
print(dataset.num_classes)
print(dataset.num_features)
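
Since the loaded dataset behaves like any PyG InMemoryDataset, it can be fed straight into a standard training loop. Here is a minimal sketch using PyG's DataLoader (the batch size is chosen arbitrarily):

from torch_geometric.loader import DataLoader

loader = DataLoader(dataset, batch_size=32, shuffle=True)
for batch in loader:
    print(batch)  # each iteration yields a PyG Batch of graphs
    break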

Loading Dynamic Datasets

To efficiently store and utilize dynamic datasets in the PyG Batch format, we provide the corresponding functionality. Here is an example of loading the DynHCPGender dataset.
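
The snippet below is a minimal sketch, assuming the NeuroGraphDynamic loader class:

from NeuroGraph.datasets import NeuroGraphDynamic

data_obj = NeuroGraphDynamic(root="data/", name="DynHCPGender")
dataset = data_obj.dataset
labels = data_obj.labels
print(len(dataset), len(labels))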

The dataset is a list of dynamic graphs represented in the PyG Batch format, making it compatible with graph machine learning pipelines.

Preprocessing Examples

To bridge the gap between NeuroGraph and graph machine learning domains, NeuroGraph offers tools to easily preprocess and construct graph-based neuroimaging datasets. Here, we demonstrate how to preprocess your own data to construct functional connectomes and generate the corresponding graph-based representations.
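
As a starting point, assuming fmri and regs are NumPy arrays holding the fMRI scan and the movement regressor (loading them from disk is shown further below), a functional connectome can be obtained in a single call:

from NeuroGraph import utils

fc = utils.preprocess(fmri, regs, n_rois=100)  # returns the functional connectome matrix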

The corresponding adjacency matrix and PyG data object can then be created from the functional connectome as follows.

from NeuroGraph import utils
adj = utils.construct_adj(fc, threshold=5) # construct the adjacency matrix
data = utils.construct_data(fc, label=1, threshold=5) # construct PyG data object

We use the correlation values as node features when constructing the data object from the functional connectome.
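
For a quick sanity check, the resulting Data object can be inspected directly. With n_rois=100, x should be the 100 x 100 correlation matrix, i.e., each node's feature vector is its row of correlations:

print(data.x.shape)          # expected: torch.Size([100, 100])
print(data.edge_index.shape) # [2, num_edges] retained after thresholding
print(data.y)                # the supplied label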

The following is the source code for processing one fMRI scan with its corresponding movement regressor using our preprocessing pipeline.

from NeuroGraph import utils
import numpy as np
from nilearn.image import load_img

img = load_img("data/raw/1.nii.gz") # 1.nii.gz is the fMRI scan
regs = np.loadtxt("data/raw/1.txt") # 1.txt is the movement regressor
fmri = img.get_fdata()
fc = utils.preprocess(fmri, regs, n_rois=100)
adj = utils.construct_adj(fc, threshold=5) # construct the adjacency matrix
data = utils.construct_data(fc, label=1, threshold=5) # construct PyG Data object

Our preprocessing pipeline consists of five steps: parcellation, removal of scanner drifts, regression of head-motion artifacts, construction of the correlation matrix, and construction of the adjacency matrix and data object. The steps can also be applied separately, as follows.

from NeuroGraph import utils
import numpy as np
from nilearn.image import load_img

img = load_img("data/raw/1.nii.gz")
regs = np.loadtxt("data/raw/1.txt")
fmri = img.get_fdata()
parcells = utils.parcellation(fmri, n_rois=100) # uses the Schaefer atlas by default
Y = utils.remove_drifts(parcells)
Y = utils.regress_head_motions(Y, regs)
fc = utils.construct_corr(Y)
adj = utils.construct_adj(fc, threshold=5) # construct the adjacency matrix
data = utils.construct_data(fc, label=1, threshold=5)

Preprocessing Human Connectome Project (HCP1200) Dataset

NeuroGraph utilizes the HCP1200 dataset as a primary data source for exploring the dataset generation search space and constructing benchmarks. The HCP1200 dataset can be accessed from the HCP website after accepting the data usage terms. It is also available in an AWS S3 bucket, which can be accessed once authorization has been obtained from HCP. In this section, we provide various functions that allow you to crawl and preprocess the HCP datasets, enabling the construction of graph-based neuroimaging datasets. These functions streamline the process of obtaining and preparing the data for further analysis and modeling.

Download and preprocess static datasets

from NeuroGraph.preprocess import Brain_Connectome_Rest_Download
import boto3

root = "data/"
name = "HCPGender"
threshold = 5
path_to_data = "data/raw/HCPGender"  # store the raw downloaded scans
n_rois = 100
n_jobs = 5  # the pipeline runs in parallel and takes the number of jobs as input

ACCESS_KEY = ''  # your ConnectomeDB credentials
SECRET_KEY = ''
s3 = boto3.client('s3', aws_access_key_id=ACCESS_KEY, aws_secret_access_key=SECRET_KEY)
# this function requires both HCP_behavioral.csv and ids.pkl under the root directory;
# both files are provided and can be found under the data directory
rest_dataset = Brain_Connectome_Rest_Download(root, name, n_rois, threshold, path_to_data, n_jobs, s3)

The provided function downloads the data from the AWS S3 bucket, performs the preprocessing steps, and generates a graph-based dataset. Note that the resulting rest_dataset carries four labels: gender, age, working memory, and fluid intelligence. To create separate datasets based on these labels, the following functionality can be used.

from NeuroGraph import preprocess

rest_dataset = preprocess.Brain_Connectome_Rest_Download(root, name, n_rois, threshold, path_to_data, n_jobs, s3)
gender_dataset = preprocess.Gender_Dataset(root, "HCPGender", rest_dataset)
age_dataset = preprocess.Age_Dataset(root, "HCPAge", rest_dataset)
wm_dataset = preprocess.WM_Dataset(root, "HCPWM", rest_dataset)
fi_dataset = preprocess.FI_Dataset(root, "HCPFI", rest_dataset)

To construct the State dataset, the following function can be used.

from NeuroGraph import preprocess

state_dataset = preprocess.Brain_Connectome_State_Download(root, name, n_rois, threshold, path_to_data, n_jobs, s3)

If you have the data locally, the following functions can be used to preprocess it.

from NeuroGraph import preprocess

rest_dataset = preprocess.Brain_Connectome_Rest(root, name, n_rois, threshold, path_to_data, n_jobs)

Similarly, for constructing the State dataset, the following function can be used.

from NeuroGraph import preprocess

state_dataset = preprocess.Brain_Connectome_State(root, name, n_rois, threshold, path_to_data, n_jobs)

Download and preprocess dynamic datasets

We also offer similar functionalities for constructing dynamic datasets. You can create a dynamic REST dataset from the data stored locally as follows.

from NeuroGraph import preprocess

ngd = preprocess.Dyn_Prep(fmri, regs, n_rois=100, window_size=50, stride=3, dynamic_length=None)
dataset = ngd.dataset
labels = ngd.labels
print(len(dataset), len(labels))

Here the dataset is a list containing dynamic graphs in the form of PyG Batch objects, which can be easily fed into graph machine learning pipelines. The following examples demonstrate how a dynamic REST dataset can be downloaded and preprocessed on the fly.

from NeuroGraph import preprocess

dyn_obj = preprocess.Dyn_Down_Prep(root, name, s3, n_rois=100, threshold=10, window_size=50, stride=3, dynamic_length=150)
dataset = dyn_obj.data_dict

The Dyn_Down_Prep class downloads and preprocesses the rest dataset and provides a dictionary that contains a list of dynamic graphs for each subject id. The dataset can be further processed as follows to construct each benchmark.

from NeuroGraph import preprocess
from torch_geometric.data import Data, Batch
import math

dyn_obj = preprocess.Dyn_Down_Prep(root, name, s3, n_rois=100, threshold=10, window_size=50, stride=3, dynamic_length=150)
dataset = dyn_obj.data_dict

gender_dataset, labels = [], []
for k, v in dataset.items():
    if v is None:
        continue
    l = v[0].y
    gender = int(l[0].item())
    sub = []
    for d in v:
        new_data = Data(x=d.x, edge_index=d.edge_index, y=gender)
        sub.append(new_data)
    batch = Batch.from_data_list(sub)
    gender_dataset.append(batch)
    labels.append(gender)
print("gender dataset created with {} batches and {} labels".format(len(gender_dataset), len(labels)))
new_dataset = {'labels': labels, "batches": gender_dataset}

age_dataset, labels = [], []
for k, v in dataset.items():
    if v is None:
        continue
    l = v[0].y
    age = int(l[1].item())
    if age <= 2:  # ignore subjects in the oldest age group (36+)
        sub = []
        for d in v:
            new_data = Data(x=d.x, edge_index=d.edge_index, y=age)
            sub.append(new_data)
        batch = Batch.from_data_list(sub)
        age_dataset.append(batch)
        labels.append(age)
print("age dataset created with {} batches and {} labels".format(len(age_dataset), len(labels)))
new_dataset = {'labels': labels, "batches": age_dataset}

wm_dataset, labels = [], []
for k, v in dataset.items():
    if v is None:
        continue
    l = v[0].y
    wm = l[2].item()
    if not math.isnan(wm):  # some subjects have missing working memory scores
        wm = int(wm)
        sub = []
        for d in v:
            new_data = Data(x=d.x, edge_index=d.edge_index, y=wm)
            sub.append(new_data)
        batch = Batch.from_data_list(sub)
        wm_dataset.append(batch)
        labels.append(wm)
print("working memory dataset created with {} batches and {} labels".format(len(wm_dataset), len(labels)))
new_dataset = {'labels': labels, "batches": wm_dataset}

fi_dataset, labels = [], []
for k, v in dataset.items():
    if v is None:
        continue
    l = v[0].y
    fi = l[3].item()
    if not math.isnan(fi):  # some subjects have missing fluid intelligence scores
        fi = int(fi)
        sub = []
        for d in v:
            new_data = Data(x=d.x, edge_index=d.edge_index, y=fi)
            sub.append(new_data)
        batch = Batch.from_data_list(sub)
        fi_dataset.append(batch)
        labels.append(fi)
print("fluid intelligence dataset created with {} batches and {} labels".format(len(fi_dataset), len(labels)))
new_dataset = {'labels': labels, "batches": fi_dataset}
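
Since each constructed benchmark is a plain dictionary of labels and PyG Batch objects, it can be persisted with standard PyTorch serialization. The sketch below uses an example file path; weights_only=False is needed on recent PyTorch versions to unpickle PyG objects:

import torch

torch.save(new_dataset, "data/DynHCPFI.pt")  # example output path
loaded = torch.load("data/DynHCPFI.pt", weights_only=False)
print(len(loaded["batches"]), len(loaded["labels"]))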